xelys jobs

Senior Site Reliability Engineer / Platform Engineer

SimScale

full-remote, senior, permanent, devops · Full remote - Munich, DE · Today via WTTJ

Tags

AWS, EKS, Kubernetes, Terraform, Argo CD, OpenTelemetry, Prometheus, SLOs/SLIs, Distributed Systems, SOC 2

About the role

Role Overview

Join SimScale as a Senior Site Reliability Engineer / Platform Engineer. This is a hands-on role where you will own and improve the organization’s cloud infrastructure, expand observability across teams, and help shape multi-region architecture.

Key Responsibilities

  • Own and improve cloud infrastructure, including AWS and EKS, observability, disaster recovery, and security/compliance controls.
  • Build standards, guardrails, and self-service tooling so engineering teams can safely run workloads on AWS.
  • Drive organization-wide adoption of OpenTelemetry for distributed tracing and metrics.
  • Help teams define meaningful SLOs/SLIs and improve reliability based on that data.
  • Collaborate with a small infrastructure team supporting 50+ engineers.

Requirements

  • Strong foundation in Linux internals and distributed systems to debug production behavior.
  • Software development background and ability to write production-quality code in at least one of:
    • Python, Go, Rust, or Java
  • Security and compliance awareness (e.g., the impact of changes on access control, auditability, disaster recovery, logging, and SOC 2 compliance).
  • Deep experience in production incident debugging, clear incident communication, and converting findings into durable improvements.
  • Hands-on cloud/platform experience including:
    • AWS (or GCP)
    • Terraform (declarative infrastructure)
    • Argo CD (GitOps workflow)
    • Kubernetes (container orchestration)
  • 5+ years professional experience in SRE, platform, or infrastructure engineering.
  • Clear communication and ability to explain trade-offs and enable adoption without unnecessary friction.
  • Observability/reliability experience with:
    • OpenTelemetry
    • Prometheus
    • distributed tracing
    • monitoring and SLOs/SLIs
  • Open source portfolio or contributions.

Nice to Have

  • Prior technical leadership experience, especially in infrastructure, reliability, or platform engineering.

Benefits (Highlights)

Mobile working, competitive health benefits, discounted gym membership, flexible hours, learning & development opportunities, child care contributions, and a retirement plan.

About SimScale

SimScale provides a browser-based simulation platform, delivering simulation capabilities through web technology. This role supports the reliability and platform infrastructure that underpins its large-scale, multi-region cloud workloads.

Scraped 6/20/2026
