Senior Site Reliability Engineer (Platform, MKI)
Elastic
full-remote, senior, permanent, devops, backend
Full remote - Ottawa, CA | Today via WTTJ
Tags
Site Reliability Engineering (SRE), Kubernetes, Golang, Terraform, Crossplane, Linux, Prometheus, Elastic Stack, Incident Management, Infrastructure-as-Code
About the role
Role Overview
As a Senior Site Reliability Engineer at Elastic, you will help design, build, and scale a multi-cloud platform. You’ll collaborate with engineers, lead technical initiatives, and support major incident response to ensure the reliability of Elastic's global infrastructure.
Key Missions
- Design, build, scale, and mature the multi-cloud platform for hosting internal and external services.
- Develop and extend software/tools that support infrastructure and enable rapid product deployment.
- Lead initiatives that automate systems engineering work to keep the global platform reliable.
Responsibilities
- Operate and improve the reliability of services in a multi-cloud environment.
- Collaborate with engineering teams to diagnose issues and deliver solutions.
- Lead and improve alerting and major incident management processes, metrics, and systems.
- Mentor and coach teammates in a globally distributed, self-organizing environment.
Requirements
- Production experience with public cloud providers and Kubernetes infrastructure at scale.
- Strong software engineering background, ideally delivering solutions in Golang.
- Experience operating a SaaS product in public cloud using Infrastructure-as-Code (e.g., Crossplane or Terraform).
- Experience with containerized services (e.g., Docker).
- Proven experience building and operating Kubernetes-at-scale infrastructure, ideally across multiple cloud providers, including supporting automation.
- Experience leading and improving alerting and major incident management (e.g., with the Elastic Stack, Prometheus, or Influx), including reporting on measurable impact.
- Linux system administration expertise on distributed systems at scale.
- Experience diagnosing or designing solutions using the Elastic Stack.
Nice to Have
- Experience working in distributed teams and/or remote settings.
- Interest/passion for inclusive communication to strengthen partner and team relationships.
About Elastic
Elastic builds technology for search, observability, and data platforms that teams use to operate applications and infrastructure. This role focuses on operational excellence and on the multi-cloud platform that hosts internal and external services.
Scraped 6/14/2026