Site Reliability Engineer
Hydrolix
Full remote | Senior | Permanent | DevOps | United States | Today via LinkedIn
Tags
Site Reliability Engineering (SRE), Kubernetes, CI/CD, Prometheus, Grafana, Linux, Incident Response, Root Cause Analysis, AWS, SQL
About the role
Site Reliability Engineer (SRE)
Join Hydrolix’s Services team to improve the reliability, scalability, and operational excellence of its cloud data platform.
Responsibilities
- Infrastructure Reliability: Deploy and maintain a reliable fleet of Kubernetes clusters and Hydrolix deployments across multiple cloud platforms.
- Service Optimization: Design and maintain systems/processes that improve reliability, availability, and performance.
- CI/CD Management: Build and optimize CI/CD tools and deployment workflows.
- Monitoring & Incident Response: Create and manage monitoring, alerting, and incident response to minimize downtime and speed recovery.
- Root Cause Analysis: Perform thorough root cause analyses and implement long-term preventive measures.
- Automation & Efficiency: Automate repetitive work and optimize system performance.
- On-call Support: Cover weekday business hours and once-monthly weekend shifts.
Collaboration
- Partner with software engineering, infrastructure, and product teams to bake reliability into the development lifecycle.
- Advocate for SRE best practices and promote operational excellence.
- Work with a distributed global team to provide round-the-clock support.
- Interface with customers to resolve incidents and ensure a seamless user experience.
Requirements
- 5+ years of experience as an SRE (or equivalent) supporting complex distributed systems.
- Hands-on experience with observability tools such as Prometheus, Vector, Grafana, Superset, or Kibana.
- Proficiency with a major cloud platform (AWS, GCP, Azure, or Linode).
- SQL database experience.
- Programming skills in Python, Go, or Rust.
- Strong Linux expertise, including performance tuning and system-level troubleshooting.
- Excellent written and verbal communication skills, with the ability to explain technical topics clearly to diverse audiences.
Nice-to-haves
- Familiarity with PostgreSQL.
About Hydrolix
Hydrolix builds an innovative cloud data platform for petabyte-scale data management and analytics. The company focuses on helping organizations reduce data costs while improving data retention through reliable, scalable infrastructure.
Scraped 4/8/2026