xelys jobs

Site Reliability Engineer

Fabric

senior, permanent, devops, security | New York, NY | 7 days ago via LinkedIn

Tags

Site Reliability Engineering, AWS, Kubernetes, Terraform, Datadog, Observability, Incident Management, SLO/SLI, HIPAA, Automation

About the role

Role Overview

As a Site Reliability Engineer (SRE) at Fabric Health, you will own and evolve the infrastructure that powers healthcare experiences for millions of patients. You will help bridge traditional infrastructure reliability with AI-driven operations, acting as a primary architect for AWS and Kubernetes (EKS).

Responsibilities

  • Infrastructure & Kubernetes Orchestration
    • Design, deploy, and maintain production Kubernetes (EKS) clusters for enterprise-grade availability.
    • Eliminate manual configuration by managing infrastructure state through Terraform.
    • Optimize AWS usage across EC2, RDS, and S3 for performance, cost-efficiency, and reliability.
  • AI-Assisted Operations & Automation
    • Explore and deploy agentic workflows for AI-assisted runbooks that automate operational decisions and repetitive tasks.
    • Build and evolve deployment pipelines using GitHub Actions or Semaphore.
    • Reduce toil via internal tools that replace manual operational work with intelligent automation.
  • Observability & Incident Management
    • Evolve the Datadog observability stack (metrics, traces, logs) to meet SLOs.
    • Lead incident response and facilitate blameless postmortems to reduce MTTR.
    • Define and monitor SLIs/SLOs for healthcare-grade performance.
  • Compliance & Collaboration
    • Ensure infrastructure compliance with HIPAA and other healthcare regulations.
    • Mentor engineers on reliability best practices and contribute clinical-safety perspectives in cross-functional design reviews.

Requirements / Fit

  • Strong expertise at the intersection of cloud infrastructure, automation, and system design.
  • Deep observability discipline and a root-cause mindset (not just patching).
  • Interest in the “next frontier” of SRE, including AI/agentic operations.
  • Ability to balance technical rigor with pragmatism in a fast-paced, clinical-safety context.

Nice-to-Haves (Implied)

  • Experience automating infrastructure and operations end-to-end.
  • Familiarity with AI-assisted operational tooling/runbooks.

About Fabric

Fabric Health is a mission-driven healthcare technology company focused on improving clinical capacity. It unifies the care journey from intake to treatment using intelligent automation to reduce administrative burden and make care delivery more efficient. The platform is used by leading healthcare organizations and is supported by major investors.

Scraped 6/15/2026
