xelys jobs

Senior Site Reliability Engineer (SRE) Team Lead

XP Venture Labs

Senior | Permanent | Backend | DevOps | Canada | Posted 2 days ago via LinkedIn

Tags

Site Reliability Engineering, AWS, Kubernetes, Docker, Terraform, Observability, Incident Response, SLO/SLI, CI/CD, Infrastructure as Code

About the role

Role Overview

XP Venture Labs is hiring a Senior Site Reliability Engineer (SRE) Team Lead. In this role, you will ensure the reliability, scalability, performance, and security of production systems while leading and mentoring an SRE team. The role combines hands-on SRE work with leadership and cross-functional collaboration, with the goal of preventing incidents proactively and driving measurable reliability improvements.

Responsibilities

  • Own reliability, availability, and performance of production systems
  • Define and manage SLAs, SLOs, SLIs, and error budgets
  • Build and evolve monitoring, logging, and observability standards and metrics
  • Lead incident response, postmortems, and root cause analysis to reduce incident recurrence and improve MTTR
  • Architect and maintain scalable, highly available cloud infrastructure
  • Champion Infrastructure-as-Code (IaC), automation, and CI/CD best practices
  • Establish capacity planning and performance optimization strategies
  • Mentor and develop an SRE team; set on-call and operational excellence standards
  • Partner with Engineering, DevOps, Security, and Product to embed reliability into the SDLC
  • Evaluate and adopt new tools, technologies, and frameworks that improve resilience and efficiency

Requirements

  • Deep AWS expertise, including EC2, ECS/EKS, Lambda, RDS, DynamoDB, S3, IAM, VPC, and networking
  • Strong experience with Docker and Kubernetes for containerized applications
  • Experience administering Windows Server and IIS, plus PowerShell scripting for Windows and legacy automation
  • Experience with MS SQL Server performance tuning
  • Performance monitoring experience in a .NET environment, including Angular/C# applications and backend services
  • Advanced IaC experience with Terraform, AWS CloudFormation, and AWS SAM
  • Proven ability to architect secure, scalable, highly available AWS environments
  • Experience deploying and operating serverless and event-driven architectures using AWS Lambda

Leadership & Collaboration

  • Lead incident and reliability processes with measurable outcomes
  • Mentor a high-performing SRE team and drive operational standards
  • Work closely with cross-functional teams to improve reliability across the SDLC

About XP Venture Labs

XP Venture Labs partners with ambitious companies to solve complex technology challenges and accelerate growth. The firm embeds engineering teams as strategic partners, focusing on scalable systems, platform modernization, reliability improvements, and high-impact technical decisions across cloud and distributed systems.

Scraped 4/16/2026