xelys jobs

AI Evaluation Engineer

Distyl

Full remote · Posted today via WTTJ

About the role

Join Distyl, a company that builds AI systems using Evaluation-Driven Development. As an AI Evaluation Engineer, you will design and implement evaluation frameworks, build and maintain test cases, develop evaluation pipelines, and work closely with various teams to guide system development and deployment. You should have strong Python engineering skills, experience with evaluation-driven development, a systems-oriented mindset, and at least 2 years of software engineering experience.

Key missions

- Design and implement evaluation frameworks that enable Evaluation-Driven Development for AI systems deployed in customer environments.
- Build and maintain golden test cases and regression suites in Python, using both human-authored and AI-assisted test generation to capture critical behaviors and edge cases.
- Develop and maintain evaluation pipelines, both offline and online, that integrate directly into system iteration loops, ensuring that system changes are driven by measurable improvements.

Profile

This role is for engineers who believe that AI systems only improve when measurement is tightly coupled to development, and who want to apply that philosophy directly to systems that matter.

- Strong Python Engineering Skills: You write clean, maintainable Python and are comfortable building evaluation and experimentation pipelines that run in production environments. You treat evaluation code with the same rigor as application code.
- Experience with Evaluation-Driven or Experiment-Driven Development: You have used structured evaluation or experimentation frameworks to drive system iteration, and you understand the pitfalls of overfitting to metrics that don't reflect real outcomes.
- Systems-Oriented Mindset: You understand how evaluation interacts with prompts, agents, data, and deployment, and you design evaluation systems that support fast iteration while maintaining trust and safety in production.
- 2+ years of software engineering experience.
- Travel: Ability to travel 25-50% of the time.
- AI-Native Working Style: You use AI tools to generate tests, analyze failures, explore edge cases, and accelerate debugging and iteration.
- Ability to Translate Human Judgment into Code: You work with subject matter experts to elicit high-quality judgments and encode them into test cases, scoring functions, and graders that scale.

Scraped 5/12/2026