About the role

Role overview

Join Anthropic’s Environment Scaling team to improve the intelligence of public models for novel verticals and use cases. As a Research Engineer, you will build and iterate on reinforcement learning (RL) environments, measure their impact on model performance, and collaborate closely with domain experts.

Key missions

Own the end-to-end creation of RL environments for new capabilities, including:
- Identifying high-value tasks
- Designing reward signals
Manage technical relationships with external data vendors, including:
- Evaluating data quality
- Informing reward design
Collaborate with domain experts to design:
- Data pipelines and evaluations
- Novel approaches for creating RL environments for high-value tasks
Explore novel methods for creating RL environments and develop QA frameworks to ensure evaluation quality.

Requirements

Bachelor’s degree in a related field or equivalent experience
Comfort managing technical vendor relationships and iterating quickly on feedback
Strong project management and interpersonal skills
Motivated by a mix of ML research, data operations, and project management
Domain expertise in an area where models should become more useful
Experience with fine-tuning large language models for specific domains or real-world use cases
Familiarity with distributed systems and cloud infrastructure
Experience with reinforcement learning, reward design, and/or training data curation for LLMs
Ability to read and analyze datasets, understand their contents, and spot issues
Experience working with external vendors/technical partners
Value-driven mindset focused on making AI more useful and accessible
Experience training production ML systems

Remote / location policy

Listed as fully remote, but the posting also states a hybrid expectation: staff should be in an office at least 25% of the time, and some roles may require more.

About Anthropic

Anthropic is an AI company focused on building public models. The company works on advancing model capabilities and making AI more useful across industries, including through research and evaluation of model performance in real-world settings.