Member of Technical Staff, Perception

San Mateo · Hybrid · Posted Yesterday
Full-time · Remote

Job Description

At XDOF, we’re at an inflection point. Frontier labs are racing to build general-purpose robots, and high-quality training data is the bottleneck. We’re building the foundation behind the foundation models – the data collection systems, operational capability, exabyte-scale data warehouse, and software toolchain – to help our partners drive the field forward.

The Perception Algorithm team transforms raw multimodal sensor data into high-quality robot training annotations. You will be deeply involved in the complete loop from data collection to model delivery: sensor calibration, SLAM localization, human pose estimation, perception model training, and embedded deployment. Your work directly determines the quality ceiling of our training data.

Core Responsibilities

Human Pose Estimation

  • Design and optimize hand pose estimation pipelines that extract accurate joint angles from teleoperation data

  • Build full-body pose estimation systems that generate ground truth for motion capture and teleoperation action annotation

  • Research and apply markerless, vision-based pose estimation methods to reduce data collection costs

  • Fuse pose estimation outputs with robot joint angle data to generate consistent training annotations

Robot Perception & Calibration

  • Design and maintain intrinsic/extrinsic calibration pipelines for multi-camera arrays (factory calibration + online recalibration)

  • Build visual SLAM (V-SLAM) systems supporting real-time localization and scene reconstruction on data collection platforms

  • Implement hand-eye calibration between cameras and robot end-effectors

  • Develop temporal alignment solutions across multimodal sensors (cameras, IMU, data gloves, force sensors)

Perception Model Training & Deployment

  • Train and iterate on perception models including object detection, instance segmentation, and 6DoF pose estimation

  • Optimize model inference using TensorRT / CUDA for real-time performance on robot embedded platforms

  • Write custom CUDA kernels for low-level acceleration of perception tasks

  • Design evaluation metric frameworks for perception models; continuously track the relationship between model performance and data quality

End-to-End Loop from Data Collection to Model Delivery

  • Contribute to the design of automated annotation pipelines that convert sensor data into structured training labels

  • Build Auto QA modules to filter low-quality data including anomalous frames, failed demonstrations, and sensor dropouts

  • Collaborate with ML engineers and data infrastructure teams to ensure perception output formats meet downstream VLA model training requirements

  • Establish feedback mechanisms linking perception accuracy to model training outcomes, continuously improving annotation quality

Requirements

Must-Have

  • 5+ years of industry experience in robot perception or computer vision

  • Strong 3D vision fundamentals: stereo and structured-light camera principles, 3D reconstruction

  • Proficiency with SLAM frameworks (ORB-SLAM, VINS-Mono, FAST-LIO, etc.) or V-SLAM system development experience

  • Hands-on engineering experience with human pose estimation: hand joints (MediaPipe, MANO) or full-body pose (OpenPose, SMPLify, etc.)

  • Proficient in deep learning training frameworks for perception model training, tuning, and evaluation

  • TensorRT deployment experience with real-time inference optimization on embedded platforms (Jetson, Horizon, etc.)

  • CUDA programming fundamentals; ability to write or debug custom kernels

  • Proficient in C++ and Python, with ROS / ROS 2 development experience

  • Proficient with AI coding agents

Nice to Have

  • Engineering experience with 6DoF object pose estimation (FoundPose, FoundationPose, GDR-Net, etc.)

  • Familiarity with 3D Gaussian Splatting or NeRF for scene reconstruction or data augmentation

  • Experience with robot manipulation or teleoperation systems

  • End-to-end development experience with automated annotation pipelines or ground truth generation systems

  • Published research in perception, pose estimation, or robotics

What We Offer

  • Direct involvement in the most critical technical challenge in embodied intelligence: producing high-quality robot training data

  • The chance to work alongside top-tier robotics engineers and ML researchers

  • Proprietary hardware platforms (humanoid robots, camera arrays, data gloves)

  • A fast-paced, high-autonomy 0→1 work environment
