xelys jobs

Site Reliability Engineer

GR8 People

Mid-level, Permanent, DevOps, Backend, United States, Today via LinkedIn

Tags

Site Reliability Engineering (SRE), AWS, Terraform, Ansible, CI/CD, Kubernetes, Prometheus, Grafana, Distributed Systems, FinOps

About the role

Role Overview

This Site Reliability Engineer (SRE) role focuses on building resilient, scalable, secure, and highly automated AWS infrastructure for mission-critical applications. You will combine software engineering practices with cloud operations to design, deploy, automate, and optimize large-scale production systems.

Key Responsibilities

AWS Cloud Architecture & Deployments

  • Design, implement, and maintain scalable, secure, and highly available infrastructure on AWS
  • Lead large-scale deployments across multi-account and multi-region environments
  • Architect and optimize AWS solutions using:
    • Compute/Containers: EC2, EKS, ECS, Lambda
    • Networking/CDN/DNS: VPC, Route 53, CloudFront
    • Data/Storage: RDS, DynamoDB, S3
    • Security: IAM, KMS, Secrets Manager
  • Implement solutions aligned with AWS Well-Architected best practices

Infrastructure as Code (Terraform)

  • Develop and maintain reusable, modular Terraform code
  • Build CI/CD-driven infrastructure pipelines
  • Manage Terraform state securely (remote backends, locking, environment separation)
  • Enforce policy-as-code and guardrails
  • Review and optimize Terraform modules for performance and maintainability

Configuration Management & Automation (Ansible)

  • Design and maintain Ansible playbooks and roles
  • Automate configuration management and application deployments
  • Integrate Ansible with CI/CD pipelines
  • Ensure automation is idempotent, secure, and maintainable

Reliability & Operations

  • Define and implement SLOs/SLIs/error budgets
  • Lead incident response, root cause analysis (RCA), and postmortems
  • Improve observability (logging, monitoring, tracing)
  • Optimize performance, cost, and resilience
  • Build self-healing, automation-first infrastructure
  • Participate in recurring on-call shifts

DevOps & CI/CD

  • Maintain CI/CD pipelines for infrastructure and applications
  • Promote GitOps workflows
  • Integrate automated testing, security scanning, and compliance validation

Security & Compliance

  • Implement least-privilege IAM policies
  • Automate security controls within Terraform and Ansible
  • Meet internal and regulatory compliance standards
  • Apply infrastructure security best practices (network segmentation, encryption, patching)

Required Qualifications

  • 4+ years in DevOps, Cloud Engineering, or Site Reliability Engineering
  • 3+ years hands-on AWS experience in production environments
  • Proficiency in at least one scripting/programming language (Python, Bash, or Go)
  • Experience with monitoring/observability tools (Prometheus, Grafana, Datadog, etc.)
  • Strong understanding of distributed systems and reliability engineering principles
  • Deep expertise in:
    • Terraform (advanced modules, workspaces, state management)
    • Ansible (roles, playbooks, dynamic inventories)
  • Strong experience with:
    • CI/CD platforms (GitHub Actions, GitLab CI, Jenkins, etc.)
    • Kubernetes (EKS preferred)
    • Linux systems administration
    • Networking fundamentals (VPC design, DNS, load balancing)

Preferred Qualifications

  • Multi-account AWS experience using Organizations and Control Tower
  • Experience implementing GitOps workflows
  • AWS certifications (e.g., Solutions Architect, DevOps Engineer)
  • Cost optimization and FinOps experience
  • Experience in highly regulated environments (HIPAA, SOC 2, PCI)
  • Service mesh experience

Additional Notes

  • Permanent work authorization in the United States is required; no visa sponsorship is available.

About GR8 People

GR8 People is a talent provider focused on helping enterprising companies attract and retain high-potential talent. The company emphasizes recruiting effectively and improving business performance through better talent relationships. It operates in the recruiting and staffing/services space.

Scraped 4/9/2026
