Job Description
- Advance algorithms, sampling techniques and large-scale optimization to make serving and inference of generative AI models more efficient and flexible. This includes model compression, knowledge distillation and quantization strategies.
- Develop new algorithms and large language model architectures that improve the computational efficiency of training deep learning models and how well those models generalize.
- Improve the end-to-end model deployment pipeline, including entirely new formulations of pretraining, instruction tuning, reinforcement learning, thinking and reasoning.
- Collaborate with hardware and software teams to optimize kernels and inference engines across different hardware and model architectures.
- Optimize latency, memory bandwidth and workloads.
