Senior Site Reliability Engineer
ClickHouse
Full remote | Senior | Permanent | DevOps, Backend (via WTTJ)
Tags: Site Reliability Engineering, AWS, Azure, GCP, Terraform, Ansible, Kubernetes, Docker Swarm, Distributed Databases, ClickHouse
About the role
Role Overview
As a Senior Site Reliability Engineer, you will ensure the reliability, availability, scalability, and performance of ClickHouse’s cloud infrastructure. You will work across teams to define service levels, improve incident response, and continuously raise the quality of ClickHouse Cloud services.
Key Missions
- Build and lead processes that ensure reliability, availability, scalability, and performance of cloud infrastructure.
- Collaborate with teams to design and implement scalable, secure, highly available, fault-tolerant distributed systems.
- Own incident management and response, run post-mortems, and drive continuous improvements to ClickHouse Cloud services.
Requirements
- Strong knowledge of cloud platforms: AWS, Azure, or GCP.
- Strong experience with infrastructure automation and configuration management tools such as Ansible, Terraform, or Puppet.
- Strong production debugging and problem-solving skills.
- A strong sense of ownership, accountability, and responsibility.
- Hands-on experience with container orchestration: Kubernetes or Docker Swarm.
- Excellent understanding of distributed databases and SQL (ClickHouse experience is a plus).
- Hands-on experience with Go and/or Python.
- Excellent communication and interpersonal skills.
- At least 8 years of experience in SRE or a related field.
Nice to Have
- Experience with ClickHouse specifically and/or a focus on data governance.
Education
- Bachelor’s or Master’s degree in Computer Science or related field.
About ClickHouse
ClickHouse is a cloud infrastructure company focused on high-performance data systems, centered around the ClickHouse platform. The role supports the reliability and operation of ClickHouse Cloud, delivering scalable and performant services for data-intensive workloads.
Scraped 5/13/2026