The Role

We seek experienced scientists and engineers with deep expertise in the pre-training and mid-training of large language models. You will advance our diffusion-based LLMs, developing novel training techniques and pushing the boundaries of parallel token generation.

Key Responsibilities

Design, develop, and optimize architectures for diffusion-based language models.
Implement innovative training objectives and loss functions for discrete diffusion LLMs.
Research and implement techniques for controlled text generation and constraint satisfaction.
Develop methods for multi-modal integration within the diffusion framework.
Improve model efficiency, reduce training time, and optimize inference throughput.

Qualifications

BS/MS/PhD in Computer Science or a related field (or equivalent experience).
At least 2 years of experience working on ML projects in PyTorch (or equivalent), preferably in a research lab or engineering role.
Strong familiarity with transformers and core LLM concepts (autoregressive pretraining, instruction tuning, in-context learning, KV caching).
Familiarity with training and inference in diffusion models.
Experience training deep learning models at scale in distributed computing environments.

Preferred Skills

Extensive experience training transformer-based language models from scratch.
Knowledge of advanced training techniques (mixed precision, gradient accumulation, etc.).
Experience with multi-modal learning and cross-modal architectures.
Background in optimization theory and neural network architecture design.
Experience with LLM serving frameworks like vLLM, SGLang, or TensorRT.

Member of Technical Staff, Pre/Mid-Training

More jobs at Inception
