AI Systems Engineer
San Francisco, USA
Posted 14 months ago
Full-time, remote
Job Description
Salary range: $350,000 - $600,000/year + benefits
Description: Transluce is a fast-moving research lab building the public tech stack for understanding and debugging AI systems. We build world-class, AI-backed analysis tools and use these to set industry standards for evaluation. We are a non-profit with a mission to steer the development of AI for the public good.
About the role: We are looking for an exceptional AI systems engineer to lead the design and development of our core ML stack, building systems that can scale to thousands of GPUs and performantly query trillion-token databases.
As an early member of a highly collaborative team, you will be free to innovate and move fast, building high-impact systems from the ground up. As part of a mission-focused non-profit, your work will have high direct impact (e.g. used by governments to inform AI policy) and cross-organisational reach (open-source tools the entire community can build on).
Core responsibilities:
- Set overall code culture and tooling for a fast-growing org
- Help to solve our core technical challenges across verticals. Examples include:
  - Docent:
    - High-concurrency container-based evals with quick ability to iterate on interventions to agentic trajectories
    - Deterministic sandbox execution of code that can efficiently restore state from checkpoints
  - Interpretability:
    - Inference stacks that are as performant as vLLM but flexible enough to allow complex model introspection and intervention, steering, configurable sampling, etc., and that can scale to 400B+ parameter models
  - Behavior elicitation:
    - Distributed RL training and roll-outs allowing thousands of concurrent rollouts across machines
- Build great internal tools to speed up the team
- Help set the tone in the organization around best practices for building, and chart the path for what infra we should build
- Help other team members think through infra challenges
Qualities of a strong candidate:
- Exceptional programmer fluent in Python
- Bare-metal optimization: know GPUs and other accelerators inside and out (low-level performance, optimization, and parallel programming)
- Experience engineering at scale (distributed systems, reliability, architecture design)
- Leader on global code quality and health (designing good primitives, managing complexity and scale)
- Bonus: can set up LLM pipelines, e.g. multiple specialized LLMs interacting with each other in a performant and reliable way
- Bonus: experience with open-source community management
We are located in San Francisco and enthusiastic about working together in person. We are open to sponsoring international visas.