Staff AI Systems Engineer
Flock Safety
About the role
Join our team as a Staff AI Systems Engineer, where you'll play a crucial role in developing Night Shift, an AI copilot designed to enhance the efficiency of investigators. You'll work closely with the Machine Learning team and various engineering partners in a fast-paced environment. Your responsibilities will include shaping the system architecture for agentic AI, owning the AI evaluation framework, and driving measurable improvements in lead accuracy and speed for law enforcement officers.

Key missions:
- Contribute to the design and system architecture for agentic AI, owning the AI evaluation framework.
- Deliver the MVP of the evaluation framework to produce initial metrics, enable debugging, and perform regression testing.
- Productionize the evaluation and observability platform, making it the source of truth for quality and safety.

Profile:
- ML Platform expertise: 5+ years building and shipping ML/LLM systems to production, with experience in the following areas:
  - Data (ClickHouse, Postgres, Redis)
  - Observability (Prometheus, Grafana, OpenTelemetry, LangSmith/Langfuse)
  - ML Inference (PyTorch, TensorRT, NVIDIA Triton), ideally in multimodal domains (text/image/video)
  - Web services (Express/FastAPI, REST, SSE, JWTs)
  - Backend JS (e.g. NodeJS) familiarity required; TypeScript and Python familiarity welcome
  - Compute orchestration (Kubernetes, Prefect, Ray)
  - LLM Inference (LangChain/LangGraph, vLLM, OpenAI/Gemini/Anthropic APIs)
  - Cloud Infrastructure (AWS, Terraform, VPC, Networking)
- Experience with LLM Evaluations at scale: You've built offline/online eval harnesses and are familiar with the methodologies and metrics to measure:
  - Safety & robustness (security, compliance, red-teaming, regression testing)
  - Cost, performance, and latency trade-offs
  - Search, retrieval, and recommendation performance
  - Agentic task success, trajectory quality, preference learning (SFT, DPO, RLHF, LLM-as-judge)
- Familiarity with Agentic Systems: Hands-on experience with LLM agents, including:
  - Agent Design: tool use (via MCP), retrieval, memory, grounding/attribution for claims, and guardrails
  - RAG: vector/hybrid search (e.g. pgvector, turbopuffer, chroma), re-rankers (e.g. Cohere, JinaAI)
  - Architectural patterns: planning and hand-off for multi-agent systems, context management

Feeling uneasy that you haven't ticked every box? That's okay; we've felt that way too. Studies have shown women and minorities are less likely to apply unless they meet all qualifications. We encourage you to break the status quo and apply to roles that would make you excited to come to work every day.

If you're excited to build AI that tangibly amplifies real-world public safety outcomes, and you love making complex systems measurable, dependable, and fast, we'd love to talk.
Scraped 5/12/2026