Senior Site Reliability Engineer
Cloudbeds
About the role
Role Overview
As a Senior Site Reliability Engineer (SRE) at Cloudbeds, you will be responsible for the reliability and performance of the platform that powers hospitality transactions globally. You’ll architect and implement scalable AWS solutions, strengthen automation and resilience across engineering teams, and continuously improve observability and incident response.
Responsibilities
- Design and implement reliable, scalable AWS architecture for the organization.
- Maintain and support high-load Kubernetes (EKS) clusters and related infrastructure components.
- Support CI/CD using ArgoCD and GitOps.
- Automate deployments with Terraform (Infrastructure as Code).
- Build and continuously improve observability and monitoring using Grafana, Prometheus, Datadog, and CloudWatch.
- Participate in Incident Management and Root Cause Analysis (RCA) to minimize impact.
- Optimize system performance and perform full-stack troubleshooting.
- Collaborate with development teams on monitoring best practices and reliability targets.
- Partner with security teams to implement and maintain security best practices.
- Take part in the infrastructure support rotation, providing guidance to other engineering teams.
Requirements
- 5+ years of experience as a DevOps engineer or SRE in the AWS ecosystem.
- 5+ years with Kubernetes (EKS) and Helm.
- Experience designing/building CI/CD pipelines with ArgoCD and GitHub Actions.
- Experience with Terraform for Infrastructure as Code.
- Observability/monitoring experience with Grafana, Prometheus, Datadog, and CloudWatch.
- Incident management experience, with strong troubleshooting, performance analysis, and RCA skills.
- Experience with web application systems: Nginx, Ingress controllers, load balancing, and CDNs.
- Database and middleware experience: MySQL, PostgreSQL, Aurora, plus Redis, Memcached, SQS.
- Networking knowledge: VPC, Security Groups, Network ACLs.
- Ability to work remotely and manage your time across a globally distributed team; English communication skills.
- Bachelor’s degree in Computer Science or equivalent experience.
Bonus Skills
- Advanced database administration (Aurora, MySQL, PostgreSQL).
- Experience in a PCI-compliant environment.
- Experience with Kong API Gateway.
About Cloudbeds
Cloudbeds builds a cloud-based property management system (PMS) for hospitality, serving properties across 150 countries and processing billions of bookings annually. Its unified platform integrates with hundreds of partners to help hoteliers improve operations and commercial strategy. The company operates with a fully remote team and focuses on reliability, scalability, and increasingly AI-powered solutions.
Scraped 4/15/2026