Salary range: $250,000 - $500,000/year + benefits

Description: Transluce is a non-profit research lab building tools for scalable, end-to-end oversight of AI systems. We build world-class, AI-backed analysis tools and use them to set industry standards for evaluation. Our tools are integrated with core agent benchmarks like SWE-bench, and our evaluations directly underpin regulation, including through our role as the EU AI Office’s main evaluation developer for harmful manipulation risks.

About the role: We are looking for strong scientists and engineers to help advance our vision of scalable, end-to-end oversight assistants, building on our recent advances such as predictive concept decoders and user model extractors. As part of our highly collaborative team, you will learn and grow quickly, creating technology at the frontier of AI research with high direct impact.

Core responsibility: Help us develop and train scalable interpretability assistants that can predict and detect subtle, unexpected model behaviors from the models’ activations. This includes:

Creating diverse evaluations of varying difficulty, which involves finding interesting and undesirable behaviors that occur naturally in open-source models
Developing novel architectures and objectives for training interpretability assistants
Scaling up the training and inference pipelines to support models of up to 1T parameters

Qualities of a strong candidate:

Experience: fine-tuning language models, designing new architectures, and creating evaluations
Reliable results: good experimental design, epistemic self-awareness and transparency
Generativeness: coming up with original, productive ideas for unblocking progress
Curiosity: a desire to understand ML systems and how they work
Strong programming ability, including navigating trade-offs between prototyping speed and maintainability
Strong communication skills, low ego, openness to giving and receiving feedback

We are located in San Francisco and are enthusiastic about working together in person. We are open to sponsoring international visas.

Research Engineer - Scalable Interpretability
