Job Description
- Conduct original research in multimodal AI (Gemini), including vision-language models (VLMs), image understanding, OCR and document intelligence, spatial reasoning and embodied perception, image-text alignment and retrieval, agentic multimodal systems, scaling laws, data infrastructure and pipelines, training data attribution, and mixture optimization.
- Design, train, and evaluate large-scale transformer-based architectures for image and video understanding.
- Develop novel methods for multimodal pretraining, instruction tuning, alignment, and reinforcement learning.
- Collaborate with cross-functional teams to transition research ideas into production-grade Gemini capabilities.
- Contribute to research direction, experimental design, and scientific strategy within the Gemini organization.
