xelys jobs

Site Reliability Engineer

Starcom Consultant

full-remote · mid · fixed-term · devops · backend · Germany · 2 days ago via LinkedIn

Tags

Site Reliability Engineering (SRE), Go, Python, Java, Prometheus, PromQL, OpenTelemetry, Kubernetes, Grafana, Kafka

About the role

Site Reliability Engineer (SRE)

Location: Germany (Remote)
Contract: Fixed-term (12 months)

Responsibilities

  • Design, build, and maintain observability platform components and integrations using Prometheus, Thanos, Grafana, OpenTelemetry, and streaming telemetry systems.
  • Contribute to architecture and technical design of scalable monitoring solutions on Kubernetes, Docker, and cloud-native environments.
  • Implement standardized instrumentation with OpenTelemetry (SDKs, collectors, exporters, agents).
  • Build and optimize telemetry pipelines for metrics, logs, and traces using Prometheus, OpenTelemetry Collector, Kafka/streaming pipelines, and time-series backends.
  • Create advanced PromQL queries, recording rules, and Alertmanager logic for complex monitoring scenarios.
  • Develop reusable Grafana dashboards and visualization templates (and Perses if applicable).
  • Automate deployments and configuration with Git, GitHub/GitLab, Jenkins, Argo CD, Helm, and Infrastructure as Code practices.
  • Troubleshoot and optimize performance across collectors, exporters, storage backends, and query layers.
  • Support performance testing, load validation, and reliability analysis of observability components.
  • Collaborate with engineering/SRE teams to onboard services and improve telemetry coverage.
  • Document implementations, standards, and operational procedures.

Requirements

  • Strong programming experience in Go, Python, or Java (backend/platform focus).
  • Hands-on experience with the Prometheus ecosystem (Prometheus, Alertmanager, exporters, Pushgateway) and PromQL.
  • Experience implementing OpenTelemetry instrumentation, collectors, processors, and pipelines.
  • Strong knowledge of Kubernetes, containers, Helm, and microservices architecture.
  • CI/CD experience with Jenkins, GitHub Actions, GitLab CI, or Argo CD.
  • Understanding of distributed systems, performance tuning, debugging, and profiling.
  • Familiarity with streaming/messaging systems (e.g., Kafka) and time-series databases.
  • Experience building/integrating REST/gRPC APIs.
  • Proficiency with Git workflows, Bash/Python scripting, and automation frameworks.
  • Understanding of SNMP/exporters and infrastructure/device telemetry collection.
  • Awareness of security topics such as RBAC, secret management, and compliance requirements.

Nice-to-haves

  • Experience with Thanos and Perses (if applicable) for observability workflows.

About Starcom Consultant

Starcom Consultant provides consulting services that support organizations' software engineering and infrastructure/operations capabilities. This role points to work on cloud-native platform and observability/SRE implementations spanning monitoring and telemetry tooling.

Scraped 4/16/2026
