Site Reliability Engineer, Core Streaming (Remote - United States)
Yelp
full-remote, senior, permanent, backend, devops | San Francisco, CA | 2 days ago via LinkedIn
Tags
Kafka, Java, Python, Linux, Apache Flink, Event Streaming, Distributed Systems, CI/CD, IaC, SRE
About the role
Role overview
Yelp is hiring a Site Reliability Engineer (SRE) specializing in Kafka to design, deploy, and operate large-scale, resilient event streaming systems. You’ll help maintain low-latency, always-on streaming infrastructure that supports Yelp’s real-time data processing and event-driven applications.
Responsibilities
- Design, deploy, and maintain large-scale Kafka event streaming infrastructure across hybrid and multi-cloud environments.
- Collaborate with engineers to enable new features and ensure pipeline reliability for real-time data processing.
- Execute and automate Kafka cluster upgrades, migrations, and major version rollouts with minimal impact.
- Build and enhance self-service tooling and automation for cluster operations, scaling, and incident recovery.
- Troubleshoot complex issues affecting data flow, performance, or stability and lead root cause analyses.
- Participate in on-call rotations using a “follow-the-sun” model (not 24/7 personal on-call).
Requirements
- Hands-on experience implementing large-scale Kafka event streaming in production across hybrid and multi-cloud environments on Linux.
- In-depth knowledge of event streaming/data-in-motion architecture and operational practices.
- Programming proficiency in Java, Python, or similar languages for tooling, integration, and automation.
- Familiarity with Kafka client APIs (Producer, Consumer, Streams) and with sizing and capacity planning for high-throughput clusters.
- Experience designing and optimizing real-time streaming solutions with Apache Flink.
- Experience automating operations with configuration management, IaC, and/or scripting.
- Bachelor’s degree or equivalent experience.
Nice-to-haves
- Strong initiative and problem-solving mindset focused on automation, reliability, and infrastructure best practices in a fast-paced environment.
About Yelp
Yelp is a consumer internet company that helps people find great local businesses. It operates large-scale, data-intensive services and relies on distributed systems and real-time data streaming to power critical platform functions.
Scraped 6/15/2026