Senior Site Reliability Engineer
Cloudbeds
About the role
Role Overview
As a Senior Site Reliability Engineer (SRE) at Cloudbeds, you will be responsible for the reliability and performance of a globally used hospitality platform. You’ll architect scalable systems on AWS, improve automation and resiliency, and help teams raise reliability through observability, incident response, and continuous improvement.
Responsibilities
- Design and implement reliable, scalable AWS architectures for platform needs.
- Maintain and support highly loaded Kubernetes (EKS) clusters and related infrastructure components.
- Support CI/CD using ArgoCD and GitOps.
- Automate deployments with Terraform (Infrastructure as Code).
- Build and continuously improve observability and monitoring using Grafana, Prometheus, Datadog, and CloudWatch.
- Participate in Incident Management and perform Root Cause Analysis (RCA) to minimize service impact.
- Optimize performance and troubleshoot issues.
- Collaborate with development teams on monitoring best practices and reliability targets.
- Collaborate with security teams to implement and maintain security best practices.
- Participate in infrastructure support rotation and provide guidance to other engineering teams.
Requirements
- 5+ years of experience as a DevOps engineer or SRE in the AWS ecosystem.
- 5+ years with Kubernetes (EKS) and Helm.
- Experience building CI/CD pipelines with ArgoCD and GitHub Actions.
- Terraform Infrastructure-as-Code experience.
- Observability/monitoring experience with Grafana, Prometheus, Datadog, and CloudWatch.
- Incident management experience, with strong troubleshooting, performance analysis, and RCA skills.
- Web application systems experience (e.g., Nginx, Ingress controllers, load balancing, CDNs).
- Database and middleware experience: MySQL, PostgreSQL, Aurora, Redis, Memcached, SQS.
- Strong networking knowledge: VPC, Security Groups, Network ACLs.
- Ability to work fully remotely and manage time in a global team.
- Strong written and verbal English communication.
- Bachelor’s degree in Computer Science or equivalent experience.
Bonus Skills
- Advanced Database Administration experience (Aurora, MySQL, PostgreSQL).
- Experience in PCI-compliant environments.
- Experience with Kong API Gateway.
About Cloudbeds
Cloudbeds is a software company transforming the hospitality industry with a platform built for properties worldwide. Its technology powers property operations and booking workflows across 150 countries and integrates with hundreds of partners. The team is fully remote and builds AI-powered solutions for hotels' operational and commercial challenges.
Scraped 4/24/2026