Site Reliability Engineer
GR8 People
Mid-level | Permanent | DevOps | Backend | United States | 45 days ago via LinkedIn
Tags
Site Reliability Engineering (SRE), AWS, Terraform, Ansible, CI/CD, Kubernetes, Prometheus, Grafana, Distributed Systems, FinOps
About the role
Role Overview
This Site Reliability Engineer (SRE) role focuses on building resilient, scalable, secure, and highly automated AWS infrastructure for mission-critical applications. You will combine software engineering practices with cloud operations to design, deploy, automate, and optimize large-scale production systems.
Key Responsibilities
AWS Cloud Architecture & Deployments
- Design, implement, and maintain scalable, secure, and highly available infrastructure on AWS
- Lead large-scale deployments across multi-account and multi-region environments
- Architect and optimize AWS solutions using:
  - Compute/Containers: EC2, EKS, ECS, Lambda
  - Networking/CDN/DNS: VPC, Route 53, CloudFront
  - Data/Storage: RDS, DynamoDB, S3
  - Security: IAM, KMS, Secrets Manager
- Implement solutions aligned with AWS Well-Architected best practices
Infrastructure as Code (Terraform)
- Develop and maintain reusable, modular Terraform code
- Build CI/CD-driven infrastructure pipelines
- Manage Terraform state securely (remote backends, locking, environment separation)
- Enforce policy-as-code and guardrails
- Review and optimize Terraform modules for performance and maintainability
Configuration Management & Automation (Ansible)
- Design and maintain Ansible playbooks and roles
- Automate configuration management and application deployments
- Integrate Ansible with CI/CD pipelines
- Ensure automation is idempotent, secure, and maintainable
Reliability & Operations
- Define and implement SLOs/SLIs/error budgets
- Lead incident response, root cause analysis (RCA), and postmortems
- Improve observability (logging, monitoring, tracing)
- Optimize performance, cost, and resilience
- Build self-healing, automation-first infrastructure
- Participate in recurring on-call shifts
DevOps & CI/CD
- Maintain CI/CD pipelines for infrastructure and applications
- Promote GitOps workflows
- Integrate automated testing, security scanning, and compliance validation
Security & Compliance
- Implement least-privilege IAM policies
- Automate security controls within Terraform and Ansible
- Meet internal and regulatory compliance standards
- Apply infrastructure security best practices (network segmentation, encryption, patching)
Required Qualifications
- 4+ years in DevOps, Cloud Engineering, or Site Reliability Engineering
- 3+ years hands-on AWS experience in production environments
- Proficiency in at least one scripting/programming language (Python, Bash, or Go)
- Experience with monitoring/observability tools (Prometheus, Grafana, Datadog, etc.)
- Strong understanding of distributed systems and reliability engineering principles
- Deep expertise in:
  - Terraform (advanced modules, workspaces, state management)
  - Ansible (roles, playbooks, dynamic inventories)
- Strong experience with:
  - CI/CD platforms (GitHub Actions, GitLab CI, Jenkins, etc.)
  - Kubernetes (EKS preferred)
  - Linux systems administration
  - Networking fundamentals (VPC design, DNS, load balancing)
Preferred Qualifications
- Multi-account AWS experience using Organizations and Control Tower
- Experience implementing GitOps workflows
- AWS certifications (e.g., Solutions Architect, DevOps Engineer)
- Cost optimization and FinOps experience
- Experience in highly regulated environments (HIPAA, SOC 2, PCI)
- Service mesh experience
Additional Notes
- Permanent work authorization in the United States is required; no visa sponsorship is available.
About GR8 People
GR8 People is a talent provider focused on helping enterprising companies attract and retain high-potential talent. The company emphasizes recruiting effectively and improving business performance through better talent relationships. It operates in the recruiting and staffing/services space.
Scraped 4/9/2026