Member of Technical Staff, Kernels

San Mateo, USA | Posted 3 months ago
Full-time | Remote

Job Description

The Role
We're looking for engineers and scientists to design, optimize, and maintain the compute foundations that power large-scale language model training and inference. You will develop high-performance ML kernels, enable efficient low-precision arithmetic, and improve the distributed compute stack that makes training and serving large models possible.

Key Responsibilities
  • Design and implement custom ML kernels (CUDA, CuTe, Triton) for core dLLM operations such as attention, matrix multiplication, gating, and normalization, optimized for modern GPU architectures.
  • Design compute primitives to reduce memory bandwidth bottlenecks and improve kernel efficiency.
  • Contribute to infrastructure stability and scalability, ensuring reproducibility, consistency across precision formats, and high utilization of compute resources.

Qualifications
  • BS/MS/PhD in Computer Science, Engineering, or a related field (or equivalent experience).
  • Proficiency in CUDA, CuTe, Triton, or other GPU programming frameworks.
  • Understanding of ML frameworks (PyTorch, TensorFlow) from a systems perspective.
  • Background in performance optimization and profiling of ML systems.
  • Experience implementing low-precision formats (FP8, INT8, block floating point) or contributing to related compiler stacks (XLA, TVM).
  • Familiarity with distributed training techniques (data parallel, model parallel, pipeline parallel).
  • Proficiency in Python and at least one systems programming language (C++/Rust/Go).
  • Experience with containerization (Docker), orchestration (Kubernetes), and CI/CD pipelines.

Preferred Skills
  • Experience building and maintaining large-scale language models with tens of billions of parameters or more.
  • Experience with distributed systems and cloud computing platforms (AWS/GCP/Azure).
  • Familiarity with distributed frameworks such as PyTorch/XLA, DeepSpeed, Megatron-LM.
  • Prior contributions to open-source deep learning infrastructure such as PyTorch, DeepSpeed, or XLA.

Member of Technical Staff, Kernels at Inception | Renata