Senior Site Reliability Engineer
Cloudbeds
full-remote, senior, permanent, devops | United States | 3 days ago via LinkedIn
145,000 - 165,000 USD per year
Tags
Site Reliability Engineering, AWS, Kubernetes, EKS, Terraform, GitOps, ArgoCD, Grafana, Prometheus, Observability
About the role
Role Overview
As a Senior Site Reliability Engineer (SRE), you will safeguard Cloudbeds’ platform reliability and performance, ensuring high-volume hospitality transactions run smoothly globally.
Responsibilities
- Design and implement reliable, scalable AWS architecture for organizational needs.
- Maintain and support Kubernetes (EKS) clusters and infrastructure components.
- Support CI/CD using ArgoCD and GitOps.
- Automate deployments with Terraform (Infrastructure as Code).
- Build and continuously improve Observability/Monitoring using Grafana, Prometheus, Datadog, and CloudWatch.
- Participate in Incident Management and perform Root Cause Analysis (RCA) to minimize service impact.
- Optimize performance and troubleshoot production issues.
- Collaborate with development teams to establish monitoring best practices and meet reliability targets.
- Partner with security teams to implement and maintain security best practices.
- Provide guidance through an infrastructure support rotation.
Requirements
- 5+ years as a DevOps engineer or SRE within the AWS ecosystem.
- 5+ years with Kubernetes (EKS) and Helm.
- Experience building/supporting CI/CD pipelines with ArgoCD and GitHub Actions.
- Experience with Terraform and Infrastructure-as-Code.
- Strong observability/monitoring experience with Grafana, Prometheus, Datadog, CloudWatch.
- Proven incident management, troubleshooting, performance analysis, and RCA.
- Experience with web application systems: Nginx, Ingress controllers, load balancing, CDNs.
- Database and middleware experience: MySQL, PostgreSQL, Aurora; Redis, Memcached, SQS.
- Good networking skills: VPC, Security Groups, Network ACLs.
- Ability to work fully remotely and manage time in a global team.
- Clear communication in English (written and verbal).
- Bachelor’s degree in CS or equivalent experience.
Bonus Skills
- Advanced database administration (Aurora, MySQL, PostgreSQL).
- Experience in PCI-compliant environments.
- Experience with Kong API Gateway.
About Cloudbeds
Cloudbeds builds an AI-powered hospitality platform used by properties worldwide. Its unified system supports hotel operations and commercial strategy by processing billions of bookings annually and integrating with hundreds of partners. The company runs a fully remote engineering team and has been recognized for technology growth.
Scraped 5/17/2026