Applied Control Researcher

London

Posted 5 months ago

Full-time, hybrid

Job Description

Application deadline: We are conducting interviews actively and aim to fill this role as soon as we find someone suitable.

THE OPPORTUNITY

Join our new AGI safety product team and help transform AI control research into practical tools that directly reduce risks from AI. As an applied control researcher, you’ll work closely with Marius (CEO & currently leads the monitoring efforts), other control researchers and product engineers.

We are currently building Watcher, a monitoring tool for coding agents. Our monitoring research agenda (more details coming soon) attempts to translate compute into safety at scale. You will join a small team, with significant scope to shape the team and its technology and to earn responsibility quickly.

You will like this opportunity if you're passionate about using empirical research to make AI systems safer in practice. You enjoy the challenge of translating theoretical AI risks into concrete detection mechanisms. You thrive on rapid iteration and learning from data. You want your research to directly impact real-world AI safety.

KEY RESPONSIBILITIES

TLDR: You will design and implement control protocols (see, e.g., [Greenblatt et al., 2023]) and test them on real-world production systems at scale.

Research & Development

- Systematically collect and catalog coding agent failure modes from real-world instances, our internal deployments, public examples, research literature, and theoretical predictions

- Design and conduct experiments to test monitor effectiveness across different failure modes and agent behaviors

- Build and maintain evaluation frameworks to measure progress on monitoring capabilities

- Build and maintain high-quality datasets to train and test monitors on

- Iterate on monitoring approaches based on empirical results, balancing detection accuracy with computational efficiency

- Stay current with research on AI safety, agent failures, and detection methodologies

- Stay current with research into coding security and safety vulnerabilities

Monitor Design & Optimization

- Develop & maintain a comprehensive library of monitoring prompts tailored to specific failure modes (e.g., security vulnerabilities, goal misalignment, deceptive behaviors)

- Experiment with different reasoning strategies and output formats to improve monitor reliability

- Design and test hierarchical monitoring architectures and ensemble approaches

- Optimize log pre-processing pipelines to extract relevant signals while minimizing latency and computational costs

- Implement and evaluate different scaffolding approaches for monitors, including chain-of-thought reasoning, structured outputs, and multi-step verification

Fine-tuning & Red-teaming

- Fine-tune open-source models to create efficient monitors for high-volume production environments

- Design and build agentic monitoring systems that autonomously investigate logs to identify both known and novel failure modes

- Build automated red-teaming pipelines that attack monitors at scale

- Design iterative adversarial games in which a red team and a blue team continuously attack and defend, respectively

ABOUT THE TEAM

The Product team is a new team. Especially early on, you will work closely with Marius Hobbhahn (CEO, who currently leads the monitoring team), Victor Gillioz (Research Scientist), Monika Jotautaitė (Research Scientist), and our product engineers: Jeremy Neiman, Zak Walters, Zen van Riel, and Srdjan Miletic. Furthermore, you will interact with our other SWEs and researchers, since we intend to be “our own customer” by using our products internally for our research work. You can find our full team here.

ABOUT APOLLO RESEARCH

The rapid rise in AI capabilities offers tremendous opportunities, but also presents significant risks. At Apollo Research, we’re primarily concerned with risks from Loss of Control, i.e. risks coming from the model itself rather than, e.g., humans misusing the AI. We’re particularly concerned with deceptive alignment / scheming, a phenomenon where a model appears to be aligned but is, in fact, misaligned and capable of evading human oversight. We work on the detection of scheming (e.g., building evaluations), the science of scheming (e.g., model organisms), and scheming mitigations (e.g., anti-scheming and control). We work closely with multiple frontier AI companies, e.g. to test their models before deployment or to collaborate on scheming mitigations. At Apollo, we aim for a culture that emphasizes truth-seeking, being goal-oriented, giving and receiving constructive feedback, and being friendly and helpful. If you’re interested in more details about what it’s like working at Apollo, you can find more information here.

We're now also developing tools and products (see Watcher) that make it easier to prevent harms from widely deployed AI systems.

Equality Statement: Apollo Research is an Equal Opportunity Employer. We value diversity and are committed to providing equal opportunities to all, regardless of age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex, or sexual orientation.

HOW TO APPLY

Please complete the application form with your CV. The provision of a cover letter is neither required nor encouraged. Please also feel free to share links to relevant work samples.

About the interview process: Our multi-stage process includes a screening interview, a take-home test (approx. 2 hours), 3 technical interviews, and a final interview with Marius (CEO). The technical interviews will be closely related to tasks the candidate would do on the job. There are no leetcode-style general coding interviews. If you want to prepare for the interviews, we suggest building simple monitors for coding agents and running them on your own Claude Code / Cursor / Codex / etc. traffic.
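As a rough illustration of the preparation suggested above, a very simple rule-based monitor for coding-agent commands might look like the sketch below. Everything in it is an assumption made for illustration: the `AgentEvent` and `Finding` types, the `monitor` function, and the regex rules are hypothetical, do not reflect how Watcher works, and do not correspond to the log format of any particular coding agent. You would need to adapt the parsing to whatever your own tool actually records.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical rules for illustration only. A useful monitor would need a much
# broader set of failure modes, validated against real agent traffic.
SUSPICIOUS_PATTERNS = {
    "destructive_delete": re.compile(r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*f"),
    "pipe_to_shell": re.compile(r"\b(curl|wget)\b[^|]*\|\s*(sudo\s+)?(ba)?sh\b"),
    "secret_access": re.compile(r"(\.env\b|id_rsa|AWS_SECRET_ACCESS_KEY)"),
    "force_push": re.compile(r"\bgit\s+push\b.*(--force|-f)\b"),
}


@dataclass
class AgentEvent:
    """One shell command issued by the agent (assumed, simplified log format)."""
    step: int
    command: str


@dataclass
class Finding:
    """A single rule match on a single event."""
    step: int
    rule: str
    command: str


def monitor(events: list[AgentEvent]) -> list[Finding]:
    """Return one Finding for every (event, rule) pair where the rule matches."""
    findings = []
    for event in events:
        for rule, pattern in SUSPICIOUS_PATTERNS.items():
            if pattern.search(event.command):
                findings.append(Finding(event.step, rule, event.command))
    return findings


if __name__ == "__main__":
    trace = [
        AgentEvent(0, "ls -la"),
        AgentEvent(1, "rm -rf /tmp/build"),
        AgentEvent(2, "curl https://example.com/install.sh | sh"),
        AgentEvent(3, "pytest -q"),
    ]
    for finding in monitor(trace):
        print(f"step {finding.step}: {finding.rule}: {finding.command}")
```

Regex rules like these are cheap but easy to evade and prone to false positives, so a natural next step is to compare them against an LLM-based monitor on the same traces and measure where each one fails.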

Your Privacy and Fairness in Our Recruitment Process: We are committed to protecting your data, ensuring fairness, and adhering to workplace fairness principles in our recruitment process. To enhance hiring efficiency, we use AI-powered tools to assist with tasks such as resume screening. These tools are designed and deployed in compliance with internationally recognized AI governance frameworks. Your personal data is handled securely and transparently. We adopt a human-centred approach: all resumes are screened by a human and final hiring decisions are made by our team. If you have questions about how your data is processed or wish to report concerns about fairness, please contact us at [email protected].
