Site Reliability Engineer
Fabric
About the role
Role Overview
As a Site Reliability Engineer (SRE) at Fabric Health, you will own and evolve the infrastructure that powers healthcare experiences for millions of patients. You will help bridge traditional infrastructure reliability with AI-driven operations, acting as a primary architect for the company's AWS and Kubernetes (EKS) infrastructure.
Responsibilities
- Infrastructure & Kubernetes Orchestration
  - Design, deploy, and maintain production Kubernetes (EKS) clusters for enterprise-grade availability.
  - Eliminate manual configuration by managing infrastructure state through Terraform.
  - Optimize AWS usage across EC2, RDS, and S3 for performance, cost-efficiency, and reliability.
- AI-Assisted Operations & Automation
  - Explore and deploy agentic workflows for AI-assisted runbooks that automate operational decisions and repetitive tasks.
  - Build and evolve deployment pipelines using GitHub Actions or Semaphore.
  - Reduce toil via internal tools that replace manual operational work with intelligent automation.
- Observability & Incident Management
  - Evolve the Datadog observability stack (metrics, traces, logs) to meet SLOs.
  - Lead incident response and facilitate blameless postmortems to reduce MTTR.
  - Define and monitor SLIs/SLOs for healthcare-grade performance.
- Compliance & Collaboration
  - Ensure infrastructure compliance with HIPAA and other healthcare regulations.
  - Mentor engineers on reliability best practices and contribute clinical-safety perspectives in cross-functional design reviews.
Requirements / Fit
- Strong expertise at the intersection of cloud infrastructure, automation, and system design.
- Deep observability discipline and a root-cause mindset (not just patching).
- Interest in the “next frontier” of SRE, including AI/agentic operations.
- Ability to balance technical rigor with pragmatism in a fast-paced, clinical-safety context.
Nice-to-Haves (Implied)
- Experience automating infrastructure and operations end-to-end.
- Familiarity with AI-assisted operational tooling/runbooks.
About Fabric
Fabric Health is a mission-driven healthcare technology company focused on improving clinical capacity. It unifies the care journey from intake to treatment using intelligent automation to reduce administrative burden and make care delivery more efficient. The platform is used by leading healthcare organizations and is supported by major investors.
Scraped 6/15/2026