Advance algorithms, sampling techniques and large-scale optimization to make serving and inference of generative AI models more efficient and flexible. This includes model compression, knowledge distillation and quantization strategies.
Innovate algorithms and large language model architectures that improve the computational efficiency and generalization of deep learning model training.
Improve the end-to-end model deployment pipeline, including entirely new formulations of pretraining, instruction tuning, reinforcement learning, thinking and reasoning.
Collaborate with hardware and software teams to optimize kernels and inference engines across different hardware and model architectures.
Optimize latency, memory bandwidth, and workloads.

Staff Research Scientist, ML Efficiency, Google Research

Job Description