Dahl Consulting

Research Scientist, Mechanistic Interpretability, Special Projects

Posted 1 week ago

Job Description

  • Lead and co-lead research projects exploring emerging mechanistic interpretability methods, including dictionary learning architectures (e.g., multitoken transcoders, Matryoshka sparse autoencoders), patchscopes, and agentic interpretability.
  • Design, develop, and maintain open-source infrastructure and evaluation suites (similar to SAEBench or the dictionary_learning library) to accelerate community and internal research.
  • Perform causal validation of discovered features and circuits using activation patching and feature steering to mitigate undesired behaviors like hallucinations or hidden objectives.
  • Write and present papers for machine learning conferences (e.g., NeurIPS, ICML) and author technical blog posts to communicate concepts to the broader artificial intelligence safety community.
  • Act as both a scientist and an engineer, writing code to run experiments on distributed compute clusters.
