xelys jobs

Staff Backend Engineer

Grafana Labs

full-remote, lead, permanent, backend, engineering-management

Full remote | Today via WTTJ

Tags

Kafka, Prometheus, Grafana, Go, SLOs/SLIs, Distributed Systems, Observability, Kubernetes, Microservices, AI-Assisted Development

About the role

Role Overview

Join Grafana Labs as a Staff Backend Engineer in the Adaptive Telemetry group. You will drive technical strategy across architecture and performance, lead cross-functional initiatives, and mentor engineering talent, all while improving observability, reliability, and automation.

Key Missions

  • Drive technical strategy and roadmap: define the architectural vision and prioritize work that unlocks major improvements.
  • Lead end-to-end delivery: own planning, design, execution, deployment, and long-term operations for important cross-cutting projects.
  • Own architecture, reliability, performance, and cost: make pragmatic architecture decisions balancing scalability, availability, latency, and cost.
  • Define and improve SLOs/SLIs: establish reliability targets and execute reliability work end-to-end.
  • Improve observability and automation: enhance monitoring/visibility and operational tooling to increase effectiveness.
  • Influence stakeholders: align cross-functional teams and represent engineering internally and externally.
  • Mentor talent: contribute to engineering growth through mentoring and technical leadership.

Requirements

  • Experience with messaging and telemetry, including familiarity with streaming/messaging systems (e.g., Kafka) and observability tooling (e.g., Prometheus/Grafana or equivalents).
  • Strong coding and software design skills; ability to lead technical designs and write clear, maintainable, well-tested code. (The role uses Go; experience with comparable languages such as Python, C, C++, or Rust translates well.)
  • Comfort with AI-assisted development and the ability to incorporate AI-powered developer tools into team workflows.
  • Ability to influence without authority in a remote-first environment; strong written and verbal communication.
  • Reliability and performance ownership: define SLOs/SLIs, do capacity planning, tune performance, and drive reliability improvements.
  • Strong systems design instincts and deep understanding of tradeoffs (latency, consistency, availability, scaling, cost).
  • Hands-on cloud/platform experience with cloud-native architectures (microservices, containers/Kubernetes, Infrastructure as Code) and operational practices.
  • Proven delivery of large distributed systems across multiple teams with evidence of technical leadership and impact.

Nice-to-haves

  • Practical experience integrating AI/agentic tools into engineering workflows.
  • Demonstrated experience with operational practices for keeping distributed systems healthy.

About Grafana Labs

Grafana Labs builds observability platforms and tools that help teams monitor, understand, and operate their systems. It is known for open-source-driven telemetry and monitoring solutions, including the Grafana ecosystem, and operates in the software/DevOps and data observability space.

Scraped 5/12/2026
