Site Reliability Engineer
Starcom Consultant
Full-remote | Mid | Fixed-term | DevOps | Backend | Germany | Posted 2 days ago via LinkedIn
Tags
Site Reliability Engineering (SRE), Go, Python, Java, Prometheus, PromQL, OpenTelemetry, Kubernetes, Grafana, Kafka
About the role
Site Reliability Engineer (SRE)
Location: Germany (Remote)
Contract: Fixed-term (12 months)
Responsibilities
- Design, build, and maintain observability platform components and integrations using Prometheus, Thanos, Grafana, OpenTelemetry, and streaming telemetry systems.
- Contribute to architecture and technical design of scalable monitoring solutions on Kubernetes, Docker, and cloud-native environments.
- Implement standardized instrumentation with OpenTelemetry (SDKs, collectors, exporters, agents).
- Build and optimize telemetry pipelines for metrics, logs, and traces using Prometheus, OpenTelemetry Collector, Kafka/streaming pipelines, and time-series backends.
- Create advanced PromQL queries, recording rules, and Alertmanager logic for complex monitoring scenarios.
- Develop reusable Grafana dashboards and visualization templates (and Perses if applicable).
- Automate deployments and configuration with Git, GitHub/GitLab, Jenkins, Argo CD, Helm, and Infrastructure as Code practices.
- Troubleshoot and optimize performance across collectors, exporters, storage backends, and query layers.
- Support performance testing, load validation, and reliability analysis of observability components.
- Collaborate with engineering/SRE teams to onboard services and improve telemetry coverage.
- Document implementations, standards, and operational procedures.
Requirements
- Strong programming experience in Go, Python, or Java (backend/platform focus).
- Hands-on experience with the Prometheus ecosystem (Prometheus, Alertmanager, exporters, Pushgateway) and PromQL.
- Experience implementing OpenTelemetry instrumentation, collectors, processors, and pipelines.
- Strong knowledge of Kubernetes, containers, Helm, and microservices architecture.
- CI/CD experience with Jenkins, GitHub Actions, GitLab CI, or Argo CD.
- Understanding of distributed systems, performance tuning, debugging, and profiling.
- Familiarity with streaming/messaging systems (e.g., Kafka) and time-series databases.
- Experience building/integrating REST/gRPC APIs.
- Proficiency with Git workflows, Bash/Python scripting, and automation frameworks.
- Understanding of SNMP/exporters and infrastructure/device telemetry collection.
- Awareness of security topics such as RBAC, secret management, and compliance requirements.
Nice-to-haves
- Experience with Thanos and Perses (if applicable) for observability workflows.
About Starcom Consultant
Starcom Consultant provides consulting services, supporting organizations with software engineering and infrastructure/operations capabilities. This role suggests involvement in cloud-native platform and observability/SRE implementations spanning monitoring and telemetry tooling.
Scraped 4/16/2026