Machine Learning Engineer | Remote
Crossing Hurdles
Full-remote, senior, contract, backend | United States | 2 days ago via LinkedIn
140,000–237,600 USD per year
Tags
Machine Learning, Python, Benchmarking, Agentic AI, Reasoning Traces, Evaluation, Finance, Data Science, Reproducible Experiments, STEM
About the role
Role Overview
Machine Learning Engineer (PhD Rater): part-time, remote
- Compensation: $70–$120/hour
- Commitment: 30+ hours/week (primarily weekdays)
Responsibilities
- Design challenging, real-world STEM benchmark problems across domains such as:
  - Data science
  - Machine learning
  - Finance
  - Software engineering
- Implement benchmark tasks in an agentic development environment using Python
- Create reproducible problem setups with clear specifications and executable tests
- Evaluate and analyze AI model behavior, including:
  - Reasoning traces
  - Agent workflows
- Diagnose reasoning failures, logic gaps, and problem-solving limitations in AI systems
- Help improve benchmark quality and evaluation frameworks for frontier AI models
Requirements
- Current PhD student or recent PhD graduate
- Deep expertise in data science and machine learning, and/or finance and Python-based software development
- Strong research background in advanced STEM topics
- Ability to reliably commit 30+ hours/week
- Demonstrated technical output (e.g., high-quality open-source contributions or research work)
- Ability to analyze agent behavior traces and diagnose failures beyond surface-level errors
Application Process
- Upload your resume
- Complete the interview process or submit the application form
About Crossing Hurdles
Crossing Hurdles builds benchmark and evaluation resources for frontier AI models. The company focuses on creating challenging, real-world STEM tasks and tools to measure and analyze AI behavior, particularly in agentic development settings.
Scraped 4/1/2026