Research Scientist, Mechanistic Interpretability, Special Projects at Dahl Consulting

Guide and co-guide research projects exploring emerging mechanistic interpretability methods, including dictionary learning architectures (e.g., multitoken transcoders, Matryoshka sparse autoencoders), patchscopes, and agentic interpretability.
Design, develop, and maintain open-source infrastructure and evaluation suites (similar to SAEBench or the dictionary_learning library) to accelerate community and internal research.
Causally validate discovered features and circuits using activation patching, and apply feature steering to mitigate undesired behaviors such as hallucinations or hidden objectives.
Write and present papers at machine learning conferences (e.g., NeurIPS, ICML), and author technical blog posts that communicate research concepts to the broader AI safety community.
Act as both a scientist and an engineer, writing code to run experiments on distributed compute clusters.