Staff Site Reliability Engineer
Babylist
About the role
Role Overview
As a Staff Site Reliability Engineer (SRE) on Babylist’s Platform team, you’ll keep Babylist’s infrastructure reliable, fast, and scalable for millions of users. This is an engineering-evolution role (not maintenance): you’ll actively improve how AWS infrastructure, CI systems, and developer tooling are built and operated.
Responsibilities
- Own infrastructure and reliability practices that support 9M+ users and the engineers building for them
- Evolve AWS infrastructure and reliability operations across teams with wide leverage
- Drive improvements to Infrastructure as Code (IaC) using Terraform
- Design, improve, and maintain CI/CD systems focused on developer velocity
- Build and tune observability and alerting that is actionable and low-noise
- Lead or participate in on-call and incident management processes
- Operate and debug Kubernetes in production
Requirements
- Deep, hands-on Terraform expertise (own IaC end-to-end)
- Strong AWS experience at scale, including EKS, RDS, cloud networking, DNS, CDNs, and load balancers
- Experience operating Kubernetes in production (debugging hard issues)
- Comfort designing and improving CI/CD systems (e.g., CircleCI, GitHub Actions)
- Solid observability instincts with tools such as Datadog, Sentry, PagerDuty, Cronitor
- Experience with on-call and incident management
Tech Stack (from posting)
- Ruby on Rails, AWS, Sidekiq, MySQL, Redis
About Babylist
Babylist is a leading platform for expecting and new families, helping more than 10 million people shop with seamless purchasing, guidance, and expert recommendations. The company has grown from a baby registry into a broader ecosystem including the Babylist Shop, Health, Money, showrooms, and branded content, and is positioned as an AI-forward tech organization.
Scraped 6/15/2026