xelys jobs

Senior Site Reliability Engineer (Platform, MKI)

Elastic

full-remote, senior, permanent, devops, backend

Full remote - Ottawa, CA | Today via WTTJ

Tags

Site Reliability Engineering (SRE), Kubernetes, Golang, Terraform, Crossplane, Linux, Prometheus, Elastic Stack, Incident Management, Infrastructure-as-Code

About the role

Role Overview

As a Senior Site Reliability Engineer at Elastic, you will help design, build, and scale a multi-cloud platform. You’ll collaborate with engineers, lead technical initiatives, and support major incident response to keep Elastic's global infrastructure reliable.

Key Missions

  • Design, build, scale, and mature the multi-cloud platform for hosting internal and external services.
  • Develop and extend software/tools that support infrastructure and enable rapid product deployment.
  • Lead initiatives to automate system engineering to guarantee reliability across the global platform.

Responsibilities

  • Operate and improve the reliability of services in a multi-cloud environment.
  • Collaborate with engineering teams to diagnose issues and deliver solutions.
  • Lead and improve alerting and major incident management processes, metrics, and systems.
  • Mentor and coach teammates in a globally distributed, self-organizing environment.

Requirements

  • Production experience with public cloud providers and Kubernetes infrastructure at scale.
  • Strong software engineering background, ideally delivering solutions in Golang.
  • Experience operating a SaaS product in public cloud using Infrastructure-as-Code (e.g., Crossplane or Terraform).
  • Experience with containerized services (e.g., Docker).
  • Proven experience building and operating Kubernetes-at-scale infrastructure, ideally across multiple cloud providers, including supporting automation.
  • Experience leading and improving alerting and major incident management (e.g., Elastic Stack, Prometheus, Influx) with measurable impact reporting.
  • Linux system administration expertise on distributed systems at scale.
  • Experience diagnosing or designing solutions using the Elastic Stack.

Nice to Have

  • Experience working in distributed teams and/or remote settings.
  • Interest/passion for inclusive communication to strengthen partner and team relationships.

About Elastic

Elastic builds technology for search, observability, and data platforms that teams use to operate applications and infrastructure. This role centers on operational excellence for the multi-cloud platform that hosts Elastic's internal and external services.

Scraped 6/14/2026
