xelys jobs

Staff Site Reliability Engineer

Stellar Cyber

full-remote, lead, permanent, backend, devops

Full remote, posted 2 days ago via WTTJ

Tags

Site Reliability Engineering, Kubernetes, Terraform, CI/CD, ArgoCD, Observability, Prometheus, Grafana, Loki, Incident Management

About the role

Staff Site Reliability Engineer (SRE)

Join Stellar Cyber to drive reliability, scalability, and efficiency across production systems.

Responsibilities

  • Administer and maintain Kubernetes/container orchestration platforms and containerized workloads to ensure high availability and resilience.
  • Improve observability by enhancing monitoring, logging, and alerting across systems and data platforms.
  • Build and maintain CI/CD pipelines for efficient and reliable deployments, applying Infrastructure as Code (IaC) practices.
  • Lead or influence architecture, tooling, and SRE best practices as a senior member of the team.
  • Own production on-call operations and incident management, and foster a reliability-focused culture.

Requirements

  • 5+ years in Site Reliability Engineering, DevOps, or Platform Engineering.
  • Advanced Kubernetes administration and troubleshooting.
  • Deep understanding of IaC (e.g., Terraform, Helm).
  • Experience with CI/CD tools such as GitHub Actions, Bitbucket, and ArgoCD.
  • Strong observability experience with Prometheus, Grafana, Loki, and Alertmanager.
  • Strong experience with production incident management and on-call.
  • Expertise operating data platforms, including Elasticsearch, MongoDB, and other data systems.
  • Strong background in distributed systems, databases, networking, and Linux administration.
  • Automation and programming skills in Python and Bash.
  • Proven success operating large-scale production systems in public cloud environments (AWS/GCP/Azure/OCI).
  • Excellent problem-solving, communication, and leadership skills.

Nice-to-haves

  • Knowledge of AI agents for auto-triaging alerts and correlating signals to form root-cause hypotheses.
  • Experience with chat-based operations (ChatOps) interfaces and/or auto-remediation controllers built on AI agent frameworks.
  • Certifications in AWS, GCP, observability, Linux, or Kubernetes.

Location

  • Full remote

About Stellar Cyber

Stellar Cyber is a technology company focused on operational excellence and reliable production systems. This role centers on building and operating scalable cloud infrastructure, observability, and deployment pipelines for mission-critical platforms in the cybersecurity and AI data space.

Scraped 5/15/2026