xelys jobs

Site Reliability Engineer

Empower

Mid · Permanent · Backend · United States · Posted today via LinkedIn

Tags

Site Reliability Engineering, AWS, Python, Incident Management, Observability, Redshift, DynamoDB, EMR, CloudWatch, Infrastructure as Code

About the role

Role overview

As a Site Reliability Engineer (SRE), you'll be responsible for the reliability, availability, and operational excellence of Empower's AWS-based data platform. You'll apply core SRE practices (production engineering, incident management, root-cause elimination, observability, automation, and capacity planning) to large-scale data infrastructure built on EMR, EMR Serverless, Redshift, DynamoDB, and S3.

Responsibilities

  • Own and improve reliability, stability, scalability, and performance of core data platforms and services
  • Provide operational support for large-scale, distributed data systems and ensure strong SLAs
  • Partner with full-stack, data, and platform engineering teams to drive continuous improvements
  • Operate and support EMR / EMR Serverless workloads and data pipelines (Python/Spark)
  • Support and optimize Amazon Redshift and DynamoDB in high-throughput production environments
  • Design, build, and evolve monitoring, alerting, and observability frameworks (focus on symptoms, not just outages)
  • Lead incident response and troubleshoot issues across the full stack; coordinate with internal and external stakeholders
  • Conduct root cause analysis (RCA) and readiness reviews; implement durable fixes and automation
  • Create and maintain runbooks, SOPs, and operational documentation
  • Collaborate to optimize performance, reliability, and cost
  • Participate in an on-call rotation for incidents impacting customer-facing systems
  • Recommend AWS managed services and architectural patterns
  • Continuously evaluate performance, capacity, and cost to scale efficiently

Requirements

  • 4–6 years of experience building and operating systems across application, data, integration, infrastructure, and security domains
  • 4+ years of hands-on AWS experience with strong production exposure to several of: Redshift, DynamoDB, EMR, EMR Serverless, EC2, S3, Lambda, Step Functions, EventBridge, RDS, IAM
  • Proven experience operating data platforms such as data lakes and data warehouses in production
  • Strong SQL skills with modern databases (e.g., Redshift, DynamoDB, Postgres, MySQL, Oracle)
  • 4+ years of Python experience (scripting, automation, or data workloads)
  • Experience with CloudWatch and monitoring/alerting
  • Hands-on experience with incident management and uptime SLAs in customer-impacting environments
  • Strong understanding of Git-based workflows (e.g., GitHub, Git Flow)
  • Experience in Agile environments (Scrum/Kanban) using tools such as Jira and Confluence
  • Bachelor’s degree in CS/Information Systems/Data/Analytics or related (or equivalent practical experience)

Additional notes

  • Applicants must be authorized to work in the U.S.; the company is unable to sponsor or take over sponsorship of an employment visa (including CPT/OPT).

About Empower

Empower is a financial services company focused on helping customers achieve financial freedom. It emphasizes internal mobility, well-being, and building reliable, purpose-driven teams, with a product and platform ecosystem that includes cloud-based data infrastructure.

Scraped 4/3/2026