xelys jobs

Senior Site Reliability Engineer

Jobgether

Full-remote · Senior · Permanent · DevOps · Security · United States · 2 days ago via LinkedIn

Tags

Site Reliability Engineering, AWS, Observability, Splunk, Datadog, ServiceNow, Incident Response, Alerting, Runbooks, Financial Services

About the role

Role Overview

The Senior Site Reliability Engineer is responsible for production reliability and cloud operations in a highly regulated financial services environment. You will focus on stability, observability, and performance, and on improving incident response by turning ad hoc, chaotic operational work into scalable, repeatable processes.

Accountabilities

  • Own and improve production reliability across large-scale distributed systems, ensuring high availability and performance
  • Design, refine, and maintain observability, monitoring, and incident-management tooling such as Splunk, Datadog, and ServiceNow
  • Reduce alert noise and alert fatigue by improving signal quality, eliminating false positives, and strengthening severity classification and escalation paths
  • Develop and maintain incident response playbooks, troubleshooting procedures, mitigation steps, and post-incident reviews
  • Troubleshoot complex AWS-based production issues and drive rapid identification and resolution
  • Collaborate with engineering, infrastructure, and product teams to improve reliability, scalability, and operational efficiency
  • Increase the operational maturity of production support through automation and observability improvements

Requirements

  • Extensive experience in Site Reliability Engineering, production support, or infrastructure engineering
  • Strong expertise in AWS and cloud-native architectures
  • Proven observability experience with Splunk and Datadog (or similar)
  • Demonstrated ability to improve the signal-to-noise ratio of alerting
  • Experience creating incident response playbooks, severity frameworks, and runbooks
  • Strong troubleshooting skills in complex, distributed production systems
  • Excellent analytical and communication skills for coordinating with both technical and non-technical stakeholders

Nice to Have

  • Experience in regulated industries such as financial services, banking, or payments

About Jobgether

This position is listed by Jobgether on behalf of a hiring partner. Jobgether uses an AI-powered matching process to connect candidates with partner companies. The hiring partner operates large-scale, highly regulated financial services systems, including modern banking and payments platforms.

Scraped 6/17/2026
