Research Engineer - Scalable Interpretability

San Francisco, USA - Posted 4 days ago
Full-time, remote

Job Description

Salary range: $250,000 - $500,000/year + benefits

Description: Transluce is a non-profit research lab building tools for scalable, end-to-end oversight of AI systems. We build world-class, AI-backed analysis tools and use them to set industry standards for evaluation. Our tools are integrated with core agent benchmarks like SWE-bench, and our evaluations directly underpin regulation, including through our role as the EU AI Office’s main evaluation developer for harmful manipulation risks.

About the role: We are looking for strong scientists and engineers to help advance our vision of scalable end-to-end oversight assistants, building on our recent advances such as predictive concept decoders and user model extractors. As part of our highly collaborative team, you will learn and grow quickly while creating high-impact technology at the frontier of AI research.

Core responsibility: Help us develop and train scalable interpretability assistants that can predict and detect unexpected and subtle behaviors from models’ activations. This includes:
  • Creating diverse evaluations that range in difficulty. This involves finding naturally occurring interesting and undesirable behaviors exhibited by open-source models.
  • Developing novel architectures and objectives for training interpretability assistants.
  • Scaling up the training and inference pipelines to support up to 1T-scale models.

Qualities of a strong candidate:
  • Experience with fine-tuning language models, designing new architectures, and creating evaluations.
  • Reliable results: good experimental design, epistemic self-awareness and transparency
  • Generativeness: coming up with original, productive ideas for unblocking progress
  • Curiosity: a desire to understand ML systems and how they work
  • Strong programming ability, including navigating trade-offs between prototyping speed and maintainability
  • Strong communication skills, low ego, openness to giving and receiving feedback

We are located in San Francisco and are enthusiastic about working together in person. We are open to sponsoring international visas.
