Senior Software Engineer
recruitAbility
Hybrid · Senior · Permanent · Backend · Product Management · United States · 47 days ago via LinkedIn
Tags
LLMOps, AI Observability, Python, C#/.NET, LLM Evaluation, RAG, Prompt Regression Testing, CI/CD, Vector Search, Langfuse
About the role
Role Overview
Senior Software Engineer — AI Observability (Senior AI Engineer, Observability)
You’ll join a product delivery team to ensure AI-powered features (RAG pipelines, semantic search, and agentic workflows) are instrumented, evaluated, and monitored with production-grade rigor. This is a hybrid engineering-plus-platform role focused on turning trace data into actionable quality signals and scaling observability practices across product lines.
What You’ll Do
Instrumentation & Integration
- Partner with product teams to instrument LLM, RAG, and agent workflows into observability platforms (e.g., Langfuse, Arize)
- Define and enforce standards for tracing, metadata, and token tracking
- Build shared SDKs/libraries to make correct instrumentation easier
- Integrate observability into CI/CD to surface quality signals before production
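To give a flavor of the instrumentation standards described above, here is a minimal, stdlib-only sketch of the kind of shared tracing helper such an internal SDK might provide. All names (`trace_llm_call`, the trace schema, the model/feature values) are illustrative assumptions, not the employer's or Langfuse's actual API; in production the trace would be shipped to an observability backend rather than appended to a list.

```python
import functools
import time
import uuid

TRACES = []  # illustrative sink; a real SDK would export to Langfuse/Arize

def trace_llm_call(model: str, feature: str):
    """Hypothetical decorator enforcing a standard trace schema:
    trace id, model, feature name, latency, and token count."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            # convention (assumed): wrapped call returns {"text": ..., "tokens": int}
            result = fn(*args, **kwargs)
            TRACES.append({
                "trace_id": str(uuid.uuid4()),
                "model": model,
                "feature": feature,
                "latency_s": time.perf_counter() - start,
                "tokens": result.get("tokens", 0),
            })
            return result
        return wrapper
    return decorator

@trace_llm_call(model="gpt-4o-mini", feature="semantic-search")
def answer(query: str) -> dict:
    # stand-in for a real LLM call
    return {"text": f"answer to {query!r}", "tokens": 42}
```

The point of a shared decorator like this is that product teams get consistent metadata and token tracking without hand-rolling it per feature.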
Evaluation & Dataset Development
- Define AI quality metrics with product/engineering teams
- Build and maintain versioned “golden” datasets for real-world and edge cases
- Implement evaluation pipelines including LLM-as-judge, heuristics, and human feedback
- Establish prompt regression testing and support A/B experimentation
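The evaluation duties above can be sketched as a tiny prompt-regression harness: a versioned golden dataset, a heuristic scorer, and a pass/fail gate suitable for CI. The dataset entries, `heuristic_score`, and `run_regression` are hypothetical stand-ins for whatever evaluation pipeline the team actually runs (which could also use LLM-as-judge or human feedback).

```python
# Hypothetical "golden" dataset: input query plus required substrings.
GOLDEN = [
    {"query": "reset password", "must_contain": ["reset", "link"]},
    {"query": "refund policy", "must_contain": ["refund"]},
]

def heuristic_score(output: str, must_contain: list[str]) -> float:
    """Fraction of required substrings present in the model output."""
    hits = sum(1 for s in must_contain if s in output.lower())
    return hits / len(must_contain)

def run_regression(generate, threshold: float = 1.0) -> list[dict]:
    """Run every golden case; return the cases scoring below threshold."""
    failures = []
    for case in GOLDEN:
        score = heuristic_score(generate(case["query"]), case["must_contain"])
        if score < threshold:
            failures.append({"query": case["query"], "score": score})
    return failures

def fake_generate(query: str) -> str:
    # stand-in model returning canned answers for the sketch
    return {"reset password": "We sent a reset link.",
            "refund policy": "See our refund terms."}[query]
```

Wired into CI, a nonempty failure list would block the prompt change before it reaches production, which is the "surface quality signals before production" goal named above.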
Monitoring, Cost & Incident Response
- Own dashboards and alerts for latency, cost, quality, and failure signals
- Implement cost controls (e.g., budgeting, caching, rate limiting, usage visibility)
- Monitor guardrails and content safety as a distinct signal
- Proactively surface issues and track model/provider changes
- Maintain runbooks for common LLM failure modes and incidents
- Deliver regular AI quality reports with business insights
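As one concrete example of the cost controls listed above (budgeting and usage visibility), here is a minimal per-feature token budget guard. The class and cap value are assumptions for illustration; real budgeting would also need persistence, time windows, and alerting.

```python
class TokenBudget:
    """Illustrative per-feature token budget: refuses calls once
    cumulative spend would exceed the configured cap."""

    def __init__(self, cap_tokens: int):
        self.cap = cap_tokens
        self.used = 0

    def charge(self, tokens: int) -> bool:
        """Return True and record usage if within budget, else False
        so the caller can degrade gracefully (cache hit, smaller model)."""
        if self.used + tokens > self.cap:
            return False
        self.used += tokens
        return True

# hypothetical budget for a single feature
budget = TokenBudget(cap_tokens=100)
```

A guard like this gives dashboards a hard usage signal to chart and gives incident runbooks a defined degradation path when a provider's pricing or behavior changes.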
Platform & Enablement
- Administer and evolve the LLMOps platform (access, environments, integrations)
- Evaluate tools to improve efficiency and quality
- Ensure compliance (e.g., SOC 2, ISO 27001, PII handling, data residency)
- Scale observability practices via reusable patterns
- Share knowledge via documentation, workshops, and guidance
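The PII-handling requirement above implies scrubbing sensitive data from traces before storage. As a minimal sketch under stated assumptions, the snippet below redacts email addresses from trace metadata with a simple regex; `scrub_pii` and the pattern are illustrative only, and real compliance work would cover many more identifier types and data-residency rules.

```python
import re

# simple illustrative pattern; not a complete email grammar
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def scrub_pii(trace_metadata: dict) -> dict:
    """Redact email addresses from string fields of trace metadata
    before it is written to the observability backend."""
    return {k: EMAIL.sub("[REDACTED]", v) if isinstance(v, str) else v
            for k, v in trace_metadata.items()}
```

Running scrubbing at instrumentation time, rather than at query time, keeps raw PII out of the observability platform entirely.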
Requirements
- 5+ years of production software engineering experience
- Hands-on experience with LLM-powered features or AI pipelines (instrumentation, evaluation, or monitoring)
- Experience with LLMOps/observability tools (e.g., Langfuse, Arize, W&B, LangSmith)
- Solid understanding of RAG, prompt engineering, vector search, and orchestration frameworks (e.g., LangChain, Semantic Kernel)
- Experience designing LLM evaluations (datasets, scoring, LLM-as-judge, regression testing)
- Proficiency in Python and/or C#/.NET; able to work across production codebases
- Strong observability fundamentals (tracing, logging, metrics, alerting)
Nice-to-haves / Implied Fit
- Strong AI quality mindset: not just shipping features, but ensuring consistent performance, graceful degradation, cost control, and continuous improvement
- Experience scaling standards across multiple product lines
Scraped 4/2/2026