Senior Software Engineer - Site Reliability (Remote)
The Home Depot
full-remote, senior, permanent, devops, backend | Atlanta, GA | posted 2 days ago via LinkedIn
Tags
Site Reliability Engineering (SRE), GCP, Observability, Prometheus, Grafana, Kubernetes, Terraform, Performance Testing, Chaos Engineering, Agile
About the role
Role overview
As a Senior Software Engineer – Site Reliability (SRE), you’ll drive the stability, scalability, and performance of The Home Depot’s platform. You’ll build and improve automation for complex infrastructure and operational challenges, partnering with product owners and developers to enable reliable, high-performing services.
Responsibilities
- Improve reliability and availability through proactive monitoring, performance tuning, and operational improvements aligned to business Service Level Objectives (SLOs).
- Develop and maintain software to support SRE outcomes, including creating test suites (e.g., functional and destructive tests) to enable safe, rapid production deployments.
- Lead incident reviews by driving post-mortems and applying the lessons learned to prevent recurring incidents.
- Reduce operational toil by creating automation for infrastructure and operational workflows.
- Support capacity planning and participate in tool selection.
- Mentor less experienced engineers and support their growth by providing guidance and leading technical discussions.
- Collaborate in Agile processes; ensure product stories are valuable, developer-ready, testable, and easy to understand.
Requirements
- Legally permitted to work in the United States and 18+ years of age.
- Ability to operate effectively with ambiguity and drive results in fast-changing situations.
Preferred qualifications / experience
- GCP (Cloud Infrastructure)
- Observability tooling: Grafana, Prometheus, Loki, Tempo
- Chaos/destructive testing and tools such as Litmus
- Performance testing with k6
- Infrastructure as Code with Terraform Enterprise
- Kubernetes (K8s), including manifests-as-code approaches
- GitHub and cdk8s
- AI-assisted development via GitHub Copilot
- SRE practices such as Production Readiness Review, Capacity Planning, Change Validation, and Production Support
Logistics
- Remote role with no travel required.
- Reporting to a Software Engineer Manager or Sr. Manager; no direct reports.
About The Home Depot
The Home Depot is a major U.S. retail company focused on home improvement products and services. The role described is within the company’s technology organization, working on platform reliability and operational excellence for customer-facing services.
Scraped 4/9/2026