Staff Backend Engineer
Grafana Labs
Full remote | Lead | Permanent | Backend | Engineering management | Posted today via WTTJ
Tags
Kafka, Prometheus, Grafana, Go, SLOs/SLIs, Distributed Systems, Observability, Kubernetes, Microservices, AI-Assisted Development
About the role
Role Overview
Join Grafana Labs as a Staff Backend Engineer in the Adaptive Telemetry group. You will drive technical strategy across architecture and performance, lead cross-functional initiatives, and mentor engineering talent, all while improving observability, reliability, and automation.
Key Missions
- Drive technical strategy and roadmap: define the architectural vision and prioritize work that unlocks major improvements.
- Lead end-to-end delivery: own planning, design, execution, deployment, and long-term operations for important cross-cutting projects.
- Own architecture, reliability, performance, and cost: make pragmatic architecture decisions balancing scalability, availability, latency, and cost.
- Define and improve SLOs/SLIs: establish reliability targets and execute reliability work end-to-end.
- Improve observability and automation: enhance monitoring, visibility, and operational tooling to increase operational effectiveness.
- Influence stakeholders: align cross-functional teams and represent engineering internally and externally.
- Mentor talent: contribute to engineering growth through mentoring and technical leadership.
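The posting does not say how SLOs are defined in the Adaptive Telemetry group. As a rough illustration of what "define SLOs/SLIs" usually involves, the Go sketch below computes a request-based availability SLI and the remaining error budget for one window. The type and function names (`SLOWindow`, `AvailabilitySLI`, `AllowedFailures`, `RemainingBudget`) and the basis-point target encoding are hypothetical, chosen for this example only.

```go
package main

import "fmt"

// SLOWindow summarizes request counts observed over one SLO window.
// Hypothetical type for illustration; not a Grafana or Prometheus API.
type SLOWindow struct {
	TotalRequests  int64
	FailedRequests int64
}

// AvailabilitySLI returns the fraction of successful requests in the window.
// An empty window is treated as fully available.
func AvailabilitySLI(w SLOWindow) float64 {
	if w.TotalRequests == 0 {
		return 1.0
	}
	return float64(w.TotalRequests-w.FailedRequests) / float64(w.TotalRequests)
}

// AllowedFailures returns how many failed requests the SLO tolerates.
// The target is in basis points (9990 = 99.90%) so the budget uses exact
// integer arithmetic; integer division rounds the result down.
func AllowedFailures(total, targetBP int64) int64 {
	return total * (10000 - targetBP) / 10000
}

// RemainingBudget returns allowed failures minus observed failures.
// A negative result means the error budget is exhausted (SLO breached).
func RemainingBudget(w SLOWindow, targetBP int64) int64 {
	return AllowedFailures(w.TotalRequests, targetBP) - w.FailedRequests
}

func main() {
	w := SLOWindow{TotalRequests: 1_000_000, FailedRequests: 250}
	const target = 9990 // hypothetical 99.90% availability target

	fmt.Printf("SLI: %.5f\n", AvailabilitySLI(w))
	fmt.Printf("allowed failures: %d\n", AllowedFailures(w.TotalRequests, target))
	fmt.Printf("remaining budget: %d\n", RemainingBudget(w, target))
}
```

With 1,000,000 requests, 250 failures, and a 99.90% target, the budget allows 1,000 failures, leaving 750. Encoding the target in basis points avoids floating-point drift in the budget itself, and rounding the allowed failures down errs on the strict side.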
Requirements
- Experience with messaging and telemetry, including familiarity with streaming/messaging systems (e.g., Kafka) and observability tooling (e.g., Prometheus/Grafana or equivalents).
- Strong coding and software design skills; ability to lead technical designs and write clear, maintainable, well-tested code. (The role uses Go; experience with similar languages such as Python, C, C++, or Rust also translates well.)
- Comfort with AI-assisted development and the ability to incorporate AI-powered developer tools into team workflows.
- Ability to influence without authority in a remote-first environment; strong written and verbal communication.
- Reliability and performance ownership: define SLOs/SLIs, do capacity planning, tune performance, and drive reliability improvements.
- Strong systems design instincts and deep understanding of tradeoffs (latency, consistency, availability, scaling, cost).
- Hands-on cloud/platform experience with cloud-native architectures (microservices, containers/Kubernetes, Infrastructure as Code) and operational practices.
- Proven delivery of large distributed systems across multiple teams with evidence of technical leadership and impact.
Nice-to-haves
- Practical experience integrating AI/agentic tools into engineering workflows.
- Demonstrated experience with operational practices for keeping distributed systems healthy.
About Grafana Labs
Grafana Labs builds observability platforms and tools that help teams monitor, understand, and operate their systems. It is known for open-source-driven telemetry and monitoring solutions, including the Grafana ecosystem, and operates in the software/DevOps and data observability space.
Scraped 5/12/2026