**Code-Data Eval Author — Machine Learning Engineer** (Mercor · remote contract)

Mercor partners with frontier AI labs to build the evaluations their models are trained and measured against. You'll design ML/LLM evaluation tasks and rubrics and grade model/agent outputs; your training-side knowledge directly shapes reward and eval signals.

**What you'll do**

- Design ML/LLM evaluation tasks, rubrics, and metrics
- Grade model/agent outputs and improve eval quality through review
- Bring training-side judgment (SFT / RLHF / reward modeling) to eval design

**You are**

- ~5+ years as an MLE at a real product organization with hands-on training/fine-tuning and evals
- Ideally fluent in SFT / RLHF / reward modeling / eval metrics (rare, high-leverage here)
- PyTorch/JAX, Hugging Face, experiment tracking; clear written communication

**Engagement & pay**

- Remote contract, flexible 30+ hrs/week
- Hourly rate set to your local market (e.g., US/Canada $100–140/hr; Europe and LatAm scaled to region)

**Hiring process (paid)**

A short Mercor Technical Screen, a live Code Review Session, and a Domain Expert Interview. You're paid $200 for completing all three, regardless of outcome.

Code-Data Eval Author — Machine Learning Engineer (Pilot)
