Research Scientist, Agentic Data & Benchmarking

Sunnyvale, CA | $150K - $450K | Posted 1 week ago
Full-time | Remote

Job Description

About the Institute of Foundation Models 

The Institute of Foundation Models (IFM) is a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy. 

As part of our team, you'll work at the core of cutting-edge foundation model training, alongside world-class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You'll help build groundbreaking AI systems with the potential to reshape entire industries, and contribute to establishing MBZUAI (Mohamed bin Zayed University of Artificial Intelligence) as a global hub for high-performance computing and deep learning.

About the role 

The Agents team trains advanced agentic language models that use reasoning and tool use to complete real tasks on a computer. This is a specialist role at the center of the loop that drives those models: the data we train on and the benchmarks we measure against. 

You'll own the agentic data pipeline end-to-end — sourcing and generating high-quality trajectories, tool-use data, and RL environments — and the evaluation suite that tells us, rigorously and reproducibly, what our agents can actually do. These two halves are inseparable: benchmarks expose where models fail, and targeted data closes the gap. The agents are only as good as the data they learn from and the evals that keep us honest, and this role owns both. 

This is a research scientist position for someone who wants depth in data and measurement rather than breadth across the whole stack. You should be the kind of person who reads through datasets line by line, distrusts a metric until it's been validated, and gets satisfaction from making an eval suite that nobody questions. 

We encourage you to apply even if you don't meet every qualification listed. Strong candidates rarely match every line, and we'd rather hear from you than have you rule yourself out. 
