Member of Technical Staff, Perception
Job Description
At XDOF, we’re at an inflection point. Frontier labs are racing to build general-purpose robots, and high-quality training data is the bottleneck. We’re building the foundation behind the foundation models (the data collection systems, operational capability, exabyte-scale data warehouse, and software toolchain) to help our partners drive the field forward.
The Perception Algorithm team transforms raw multimodal sensor data into high-quality robot training annotations. You will be deeply involved in the complete loop from data collection to model delivery: sensor calibration, SLAM-based localization, human pose estimation, perception model training, and embedded deployment. Your work directly determines the quality ceiling of our training data.
Core Responsibilities
Human Pose Estimation
Design and optimize hand pose estimation pipelines that extract accurate joint angles from data collected during teleoperation
Build full-body pose estimation systems that generate ground truth for motion capture and teleoperation action annotation
Research and apply markerless, vision-based pose estimation methods to reduce data collection costs
Fuse pose estimation outputs with robot joint angle data to generate consistent training annotations
Robot Perception & Calibration
Design and maintain intrinsic/extrinsic calibration pipelines for multi-camera arrays, covering both factory calibration and online recalibration
Build visual SLAM (V-SLAM) systems that support real-time localization and scene reconstruction on data collection platforms
Implement hand-eye calibration between cameras and robot end-effectors
Develop temporal alignment solutions across multimodal sensors (cameras, IMU, data gloves, force sensors)
Perception Model Training & Deployment
Train and iterate on perception models including object detection, instance segmentation, and 6DoF pose estimation
Optimize model inference with TensorRT and CUDA for real-time performance on embedded robot platforms
Write custom CUDA kernels for low-level acceleration of perception tasks
Design evaluation frameworks and metrics for perception models, and continuously track how model performance relates to data quality
End-to-End Loop from Data Collection to Model Delivery
Contribute to the design of automated annotation pipelines that convert sensor data into structured training labels
Build Auto QA modules to filter low-quality data including anomalous frames, failed demonstrations, and sensor dropouts
Collaborate with ML engineers and data infrastructure teams to ensure perception outputs meet the format requirements of downstream vision-language-action (VLA) model training
Establish feedback mechanisms that link perception accuracy to model training outcomes and drive continuous improvement in annotation quality
Requirements
Must-Have
5+ years of industry experience in robot perception or computer vision
Strong 3D vision fundamentals: stereo and structured-light camera principles, 3D reconstruction
Proficiency with SLAM frameworks (ORB-SLAM, VINS-Mono, FAST-LIO, etc.) or experience developing V-SLAM systems
Hands-on engineering experience with human pose estimation: hand joints (MediaPipe, MANO) or full-body pose (OpenPose, SMPLify, etc.)
Proficiency with deep learning frameworks for training, tuning, and evaluating perception models
TensorRT deployment experience, including real-time inference optimization on embedded platforms (Jetson, Horizon, etc.)
CUDA programming fundamentals; ability to write or debug custom kernels
Proficiency in C++ and Python, with ROS/ROS 2 development experience
Proficiency with AI coding agents
Nice to Have
Engineering experience with 6DoF object pose estimation (FoundPose, FoundationPose, GDR-Net, etc.)
Familiarity with 3D Gaussian Splatting or NeRF for scene reconstruction or data augmentation
Experience with robot manipulation or teleoperation systems
End-to-end development experience with automated annotation pipelines or ground truth generation systems
Published research in perception, pose estimation, or robotics
What We Offer
Direct involvement in the most critical technical challenge in embodied intelligence: producing high-quality robot training data
The opportunity to work alongside top-tier robotics engineers and ML researchers
Access to proprietary hardware platforms (humanoid robots, camera arrays, data gloves)
A fast-paced, high-autonomy 0→1 work environment