xelys jobs

Senior Site Reliability Engineer

Block

Senior, Permanent, Backend, DevOps · United States · 6 days ago via LinkedIn

Tags

Site Reliability Engineering, Incident Management, Observability, CI/CD, Kubernetes, Terraform, AWS, AI Automation, Progressive Delivery, Root Cause Analysis

About the role

Role overview

As a Senior Site Reliability Engineer (SRE) on the SRE team, you will proactively and reactively improve the reliability of Block’s platform and critical infrastructure. You’ll be metrics-driven and systems-oriented, building distributed platforms that enable safe, scalable product development using AI-driven tooling and automation.

Responsibilities

  • Build and extend reliability platforms and tools across the company
  • Standardize reliability tooling across multiple platforms and organizations
  • Triage Sev 0 and Sev 1 incidents and coordinate their stabilization
  • Serve as primary on-call for one week at a time, 12 hours per day, on a rotation that recurs every few weeks depending on team size
  • Lead incident command, coordinate mitigation, and drive escalation during high-severity events
  • Drive platform-wide reliability improvements, including shared operational tooling and deploy-safety patterns
  • Use AI to enhance signal detection, reduce alert noise, and accelerate root cause analysis
  • Design and implement safe deployment patterns such as progressive delivery, automated rollback, and guardrails

Requirements

  • Drive to perform root cause analysis and fix the underlying issues
  • Demonstrated technical initiative and leadership, especially on backend/platform projects
  • Familiarity with AI-driven tooling for observability, incident analysis, or automation
  • Experience running production on-call for high-availability systems
  • Strong incident management skills: structured triage, mitigation under pressure, and blameless postmortems
  • Fluency with CI/CD, progressive rollout strategies, and rollback automation
  • Monitoring & observability expertise: tuning alerts for uptime, error rates, latency regressions, and resource exhaustion
  • Ability to create and maintain evidence-based maturity assessments using trailing 90-day data windows
  • Comfort with vendor/dependency management, including maintaining validated escalation contacts reachable within 5 minutes
  • Strong accountability, autonomy, and desire to perform and grow as an engineer

Nice to have

  • Fluency with evidence-based reliability program practices (e.g., maturity assessments)
  • Further experience integrating AI into operational workflows (beyond incident analysis/alert tuning)

About Block

Block is a global company whose foundational platform teams span areas such as People, Finance, Counsel, Hardware, Information Security, and Platform Infrastructure Engineering. The company’s mission centers on economic empowerment, using technology to support scalable, reliable product development.

Scraped 4/15/2026