Site Reliability Engineer
CRG - People and Technology
Full-remote | Mid | Permanent | United States | Yesterday via LinkedIn
About the role
Role Overview
This is a Product Reliability Engineer / SRE position at an innovative infrastructure platform company. The role bridges customer experience and engineering excellence, with a focus on keeping the platform reliable across diverse customer environments.
Key Responsibilities
- Partner with customers and internal teams to manage L2/L3 escalations, resolving complex deployment, upgrade, and runtime issues
- Perform deep root cause analysis and reproduce issues through to resolution
- Collaborate with engineering teams to implement fixes and prevent recurrence
- Develop and maintain diagnostic tooling (health checks, support bundles, environment validators)
- Improve and scale test automation and CI environments, reducing flakiness and increasing reliability
- Build reproducible environments for debugging and integration testing
- Establish performance baselines and regression testing frameworks
- Identify recurring failure patterns and improve installation and upgrade robustness
- Write production-quality code in Python, Go, or Rust to enhance platform reliability
- Transform customer issues into actionable improvements across testing, observability, and product design
Required Experience
- 4–7 years in SRE, Platform Engineering, or Production Engineering roles
- Strong hands-on experience with Kubernetes (troubleshooting networking, storage, RBAC, multi-environment setups)
- Experience in startup environments
- Solid programming skills in Python, Go, or Rust
- Proven ability to diagnose complex distributed systems using logs, metrics, and tracing
- Experience managing customer-facing escalations or high-severity incidents
- Strong problem-solving skills for complex, ambiguous issues
- Excellent written and verbal communication
- Comfortable working remotely with high autonomy
Desirable Skills
- Experience with containers, Helm, and deployment/upgrade workflows
- Knowledge of CI/CD at scale, including test parallelization and reproducible environments
- Familiarity with performance testing and profiling tools
- Background in customer-facing engineering roles (support, solutions, escalation engineering)
- Open-source contributions, particularly in infrastructure or observability
About CRG - People and Technology
CRG - People and Technology partners with an innovative technology business that develops a schema-driven infrastructure platform serving as a single source of truth for modern systems. The platform is designed for on-premises deployment, where reliability is a core product feature.