Rhoda AI

Research Engineer/Research Scientist - Video Generation Modeling

Palo Alto
Posted Today
Full-Time

Job Description

At Rhoda AI, we're building the full-stack foundation for the next generation of humanoid robots — from high-performance, software-defined hardware to the foundational models and video world models that control it. Our robots are designed to be generalists capable of operating in complex, real-world environments and handling scenarios unseen in training. We work at the intersection of large-scale learning, robotics, and systems, with a research team that includes researchers from Stanford, Berkeley, Harvard, and beyond. We're not building a feature; we're building a new computing platform for physical work — and with over $400M raised, we're investing aggressively in the R&D, hardware development, and manufacturing scale-up to make that a reality.

We're looking for Research Scientists and Research Engineers to push the frontier of large-scale pre-training for our video action model. Our approach formulates robot control as video prediction — we pre-train causal video generation models on web-scale video data, then adapt them to predict robot actions from real-world demonstrations. You'll work on the core architectures, training objectives, and scaling strategies that determine how well our models learn from internet-scale video. We hire across levels — from senior to staff — and welcome both research-track and engineering-track candidates.

What You'll Do

  • Design and train large-scale causal video generation models on web-scale video data

  • Develop and validate training objectives, model architectures, and data mixtures for video prediction at scale

  • Research scaling laws and data efficiency for web-scale video pretraining

  • Investigate what properties of web video transfer most effectively to robotic control and action prediction

  • Build systematic evaluations to measure video generation quality, long-horizon prediction fidelity, and downstream robot task performance

  • Run rigorous ablations and benchmarking to understand what drives model quality at scale

  • Collaborate closely with data & evaluation, post-training, and training systems teams to translate research ideas into working systems

  • Publish and present work at top-tier ML and robotics venues (especially valued for the Research Scientist track)

What We're Looking For

  • Strong background in large-scale generative modeling — either video generation (autoregressive video models, diffusion transformers, causal video architectures) or language model pretraining (LLMs, autoregressive transformers at scale)

  • Hands-on experience training large generative models from scratch at scale

  • Deep understanding of autoregressive modeling, causal architectures, and scaling behavior

  • Fluency with modern ML frameworks (PyTorch required; JAX a plus)

  • Ability to design experiments, interpret results, and iterate quickly

  • Strong research taste: ability to identify high-leverage questions and cut through noise

  • Comfort operating in a fast-moving, ambiguous startup environment

  • Staff-level candidates are expected to define technical direction and drive research strategy independently; senior/MTS candidates execute complex projects with strong fundamentals and growing scope

Nice to Have (But Not Required)

  • PhD in ML, CS, Robotics, or a related field — or equivalent research/industry experience

  • Strong publication record at NeurIPS, ICML, ICLR, CVPR, CoRL, etc. (especially valued for RS track)

  • Prior work specifically on video generation models (autoregressive video, diffusion transformers, world models, or causal video architectures)

  • Experience with large-scale autoregressive language model pretraining and scaling

  • Familiarity with web-scale video datasets and video data curation pipelines

  • Prior work connecting video generation to control, action prediction, or robotic learning

  • Familiarity with distributed training and multi-node infrastructure

Why This Role

  • Work on a fundamentally different approach to robot learning — web-scale video pretraining rather than robot-data-only VLA models

  • Your models give our robots the ability to understand and predict the visual world from internet-scale supervision

  • Direct collaboration with data, post-training, and deployment teams with no silos

  • High ownership and fast iteration in a small, elite team

51-200 employees
Palo Alto, US