Senior Site Reliability Engineer
Cloudbeds
full-remote, senior, permanent, devops, backend; United States; today via LinkedIn
145,000 - 165,000 USD per year
Tags
AWS, Kubernetes (EKS), Terraform, ArgoCD, GitOps, Observability, Grafana, Prometheus, Datadog, Incident Management
About the role
Role Overview
As a Senior Site Reliability Engineer (SRE) at Cloudbeds, you will help ensure the reliability and performance of a globally used hospitality platform. You’ll architect and operate scalable AWS infrastructure that supports high-volume transactions and enable automation, resilience, and continuous improvement across engineering teams.
Responsibilities
- Design and implement reliable, scalable AWS architecture for organizational needs.
- Maintain and support highly loaded Kubernetes (EKS) clusters and supporting infrastructure components.
- Support CI/CD using ArgoCD and GitOps.
- Automate deployments with Terraform (infrastructure-as-code).
- Build and improve observability and monitoring using Grafana, Prometheus, Datadog, and CloudWatch.
- Participate in incident management and perform root cause analysis (RCA) to minimize service impact.
- Optimize performance and troubleshoot production issues.
- Collaborate with development teams to define monitoring best practices and reliability targets.
- Work with security teams to implement and maintain security best practices.
- Join an infrastructure support rotation to guide other engineering teams.
Requirements
- 5+ years in DevOps or SRE within the AWS ecosystem.
- 5+ years with Kubernetes (EKS) and Helm.
- Experience designing/building CI/CD pipelines with ArgoCD and GitHub Actions.
- Strong Terraform infrastructure-as-code experience.
- Observability/monitoring experience with Grafana, Prometheus, Datadog, and CloudWatch.
- Incident management, full-stack troubleshooting, performance analysis, and RCA.
- Experience with web application systems: Nginx, Ingress controllers, load balancing, CDNs.
- Experience with databases: MySQL, PostgreSQL, Aurora; middleware: Redis, Memcached, SQS.
- Networking skills: VPC, Security Groups, Network ACLs.
- Ability to work remotely and manage time in a global team.
- English communication skills (written and verbal).
- Bachelor’s degree in Computer Science or equivalent experience.
Bonus Skills
- Advanced database administration (e.g., Aurora, MySQL, PostgreSQL).
- Experience in a PCI-compliant environment.
- Experience with Kong API Gateway.
About Cloudbeds
Cloudbeds builds a hospitality technology platform (hotel PMS) used by properties across 150 countries to power bookings and operations. The company offers an integrated, partner-connected SaaS platform and operates with a fully remote engineering team.
Scraped 5/14/2026