Site Reliability Engineer
GR8 People
Mid-level | Permanent | DevOps, Backend | United States | Today via LinkedIn
Tags
Site Reliability Engineering (SRE), AWS, Terraform, Ansible, CI/CD, Kubernetes, Prometheus, Grafana, Distributed Systems, FinOps
About the role
Role Overview
This Site Reliability Engineer (SRE) role focuses on building resilient, scalable, secure, and highly automated AWS infrastructure for mission-critical applications. You will combine software engineering practices with cloud operations to design, deploy, automate, and optimize large-scale production systems.
Key Responsibilities
AWS Cloud Architecture & Deployments
- Design, implement, and maintain scalable, secure, and highly available infrastructure on AWS
- Lead large-scale deployments across multi-account and multi-region environments
- Architect and optimize AWS solutions using:
  - Compute/Containers: EC2, EKS, ECS, Lambda
  - Networking/CDN/DNS: VPC, Route 53, CloudFront
  - Data/Storage: RDS, DynamoDB, S3
  - Security: IAM, KMS, Secrets Manager
- Implement solutions aligned with AWS Well-Architected best practices
Infrastructure as Code (Terraform)
- Develop and maintain reusable, modular Terraform code
- Build CI/CD-driven infrastructure pipelines
- Manage Terraform state securely (remote backends, locking, environment separation)
- Enforce policy-as-code and guardrails
- Review and optimize Terraform modules for performance and maintainability
Configuration Management & Automation (Ansible)
- Design and maintain Ansible playbooks and roles
- Automate configuration management and application deployments
- Integrate Ansible with CI/CD pipelines
- Ensure automation is idempotent, secure, and maintainable
Reliability & Operations
- Define and implement SLOs/SLIs/error budgets
- Lead incident response, root cause analysis (RCA), and postmortems
- Improve observability (logging, monitoring, tracing)
- Optimize performance, cost, and resilience
- Build self-healing, automation-first infrastructure
- Participate in recurring on-call shifts
DevOps & CI/CD
- Maintain CI/CD pipelines for infrastructure and applications
- Promote GitOps workflows
- Integrate automated testing, security scanning, and compliance validation
Security & Compliance
- Implement least-privilege IAM policies
- Automate security controls within Terraform and Ansible
- Meet internal and regulatory compliance standards
- Apply infrastructure security best practices (network segmentation, encryption, patching)
Required Qualifications
- 4+ years in DevOps, Cloud Engineering, or Site Reliability Engineering
- 3+ years hands-on AWS experience in production environments
- Proficiency in at least one scripting/programming language (Python, Bash, or Go)
- Experience with monitoring/observability tools (Prometheus, Grafana, Datadog, etc.)
- Strong understanding of distributed systems and reliability engineering principles
- Deep expertise in:
  - Terraform (advanced modules, workspaces, state management)
  - Ansible (roles, playbooks, dynamic inventories)
- Strong experience with:
  - CI/CD platforms (GitHub Actions, GitLab CI, Jenkins, etc.)
  - Kubernetes (EKS preferred)
  - Linux systems administration
  - Networking fundamentals (VPC design, DNS, load balancing)
Preferred Qualifications
- Multi-account AWS experience using Organizations and Control Tower
- Experience implementing GitOps workflows
- AWS certifications (e.g., Solutions Architect, DevOps Engineer)
- Cost optimization and FinOps experience
- Experience in highly regulated environments (HIPAA, SOC 2, PCI)
- Service mesh experience
Additional Notes
- Permanent work authorization in the United States is required; no visa sponsorship is available.
About GR8 People
GR8 People is a talent provider focused on helping enterprising companies attract and retain high-potential talent. The company emphasizes recruiting effectively and improving business performance through better talent relationships. It operates in the recruiting and staffing/services space.
Scraped 4/9/2026