JOIN THE TECH SQUARE FAMILY

Senior Site Reliability Engineer

AdPipe

Software Engineering
United States
Posted on Nov 20, 2025

Location: Atlanta, GA (hybrid) or US remote

Team: Engineering • Reports to: VP of Engineering

Type: Full‑time

Why AdPipe

Video is hard; we’re making it easy. Fresh off a $12M Series A, AdPipe is investing further in our AI‑powered video platform, which is already used by some of the most respected marketing organizations in the world to repurpose existing, authentic content into new, engaging videos, fast. We help creative teams work smarter and empower every marketer to ship high‑quality video without having to learn the ins and outs of a non‑linear editor (NLE).

At the core of our platform sit:

  • A high‑throughput ingestion pipeline that indexes large media libraries for search and AI automation.
  • An in‑browser video editor that interfaces with both users and AI.

The Role

We’re hiring our first Site Reliability Engineer to establish SRE at AdPipe and set the direction for our platform. You’ll own reliability, scalability, and performance across our stack, partner closely with product engineering, and build the guardrails that let us ship quickly without breaking customer trust. You’ll also define our AWS foundations, IaC approach, and operational standards from the ground up. This is a hands-on role with high leverage and visibility.

What you’ll do

  • Set platform direction: Create the north star for our IaaS platform, balancing simplicity, scale, and cost.
  • Introduce IaC from zero: Design the repo/module strategy, code reviews, and guardrails; migrate manual/console resources to code.
  • Build observability: Stand up metrics, logs, traces, dashboards, and alerting (sane by default, noisy never).
  • Harden AWS foundations: Multi‑account/landing zone guidance, VPC/IAM/networking patterns, secrets management, backup/restore, runbooks, and chaos/proactive testing.
  • Streamline delivery: Improve CI/CD, rollouts, migrations, and progressive delivery for safer, faster deploys.
  • Lead incident management: Triage, coordinate response, root‑cause without blame, and drive follow‑through on action items.
  • Optimize performance & cost: Profile hotspots, tune databases/queues, and right‑size compute & storage.
  • Uplevel engineering: Establish standards for service ownership, on‑call, runbooks, linting/security checks, and internal tooling.

Qualifications — what you likely bring

  • 5+ years in SRE/Platform/Infra roles, including time spent owning production systems.
  • Deep experience with AWS and container orchestration, networking, and Linux.
  • Proven track record bootstrapping IaC, including module design, state management, and code reviews.
  • Strong observability background (metrics, logs, tracing) and production debugging chops.
  • Hands‑on with CI/CD, incident response, and at least one modern programming language.
  • Excellent communication; can drive consensus and teach best practices.

Our Tech Stack

  • Backend: Elixir/Phoenix, Node/TypeScript, Python
  • Data & Storage: PostgreSQL, a vector db, S3
  • Infra: AWS + ECS, GitHub Actions; no formal IaC yet (you will introduce it)
  • Observability: OpenTelemetry, CloudWatch, log aggregation
  • CI/CD: GitHub Actions

Compensation & benefits

  • Competitive salary and meaningful equity
  • Medical, dental, vision, 401(k)
  • Flexible PTO, parental leave, remote‑first culture
  • Location/visa: US only; we are not able to sponsor visas at this time

Our commitment

We’re an equal‑opportunity employer. We welcome teammates from all backgrounds and encourage candidates who don’t meet every single bullet to apply. Skills grow; curiosity and ownership matter.