Member of Technical Staff, Pre/Mid-Training
San Mateo, USA
Posted 3 months ago
Full-time, remote
Job Description
The Role
We seek experienced scientists and engineers with deep expertise in the pre-training and mid-training of large language models. You will advance our diffusion-based LLMs, developing novel training techniques and pushing the boundaries of parallel token generation.
Key Responsibilities
- Design, develop, and optimize architectures for diffusion-based language models.
- Implement innovative training objectives and loss functions for discrete diffusion LLMs.
- Research and implement techniques for controlled text generation and constraint satisfaction.
- Develop methods for multi-modal integration within the diffusion framework.
- Improve model efficiency, reduce training time, and optimize inference throughput.
Qualifications
- BS/MS/PhD in Computer Science or a related field (or equivalent experience).
- At least 2 years of experience working on ML projects in PyTorch (or equivalent), preferably in a research lab or engineering role.
- Strong familiarity with transformers and core LLM concepts (autoregressive pretraining, instruction tuning, in-context learning, KV caching).
- Familiarity with training and inference in diffusion models.
- Experience training deep learning models at scale in distributed computing environments.
Preferred Skills
- Extensive experience training transformer-based language models from scratch.
- Knowledge of advanced training techniques (mixed precision, gradient accumulation, etc.).
- Experience with multi-modal learning and cross-modal architectures.
- Background in optimization theory and neural network architecture design.
- Experience with LLM serving frameworks like vLLM, SGLang, or TensorRT-LLM.