Mobile UMI: Cross-View Diffusion Policy with Decoupled Kinematics for Mobile Manipulation
Haoran Huang, Haonan Dong, Huixu Dong
Year: 2026
Access: Open access
Abstract
Mobile imitation learning on portable demonstration interfaces faces two coupled bottlenecks: locomotion-contaminated action labels and inference-induced execution latency on a continuously moving base. Recent wrist-mounted interfaces lower the cost of tabletop data collection, yet a single wrist view does not capture the global context required for base navigation. Adding a body-mounted camera entangles human walking with hand motion. Meanwhile, generative policies introduce hundreds of milliseconds of inference latency, during which the base advances past predicted waypoints, forcing backward corrections at action splices. This paper presents Mobile UMI, a hardware-free demonstration framework that addresses both gaps through three components. First, a dual-camera capture system records chest-centric global context and wrist-centric local interaction without any robot present. Second, a one-shot ChArUco-based spatial anchor unifies the chest and hand visual-inertial frames; the hand pose is then re-expressed relative to the chest to extract decoupled SE(3) manipulation and SE(2) base trajectories. Third, an asynchronous receding-horizon executor performs online state matching: each generated action chunk is realigned with the current physical pose so that expired waypoints are discarded before execution. The full system is evaluated on four long-horizon household tasks, achieving an average success rate of 83.8% over 100 trials per task. Controlled comparisons against ACT and Diffusion Policy show that the chest-relative label alone closes much of the gap; online state matching closes the remainder. These results indicate that, for mobile imitation learning under the tested conditions, explicit kinematic factorization combined with state-level latency alignment provides an effective solution without requiring architectural changes to the underlying policy class.
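The second and third components can be illustrated with a small, hedged sketch. It is not the authors' implementation: it assumes both poses are already expressed in one shared world frame (the role the ChArUco anchor plays), that z points up and the chest x-axis is the heading, and that "state matching" can be approximated by a nearest-waypoint search. All function names (`make_pose`, `chest_relative_hand`, `base_pose_se2`, `drop_expired_waypoints`) are hypothetical.

```python
import math

def make_pose(x, y, z, yaw):
    """Build a 4x4 rigid transform (nested lists) with a rotation of `yaw` about z."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, x],
            [s,  c, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

def invert_rigid(T):
    """Invert a rigid transform: [R t; 0 1]^-1 = [R^T, -R^T t; 0 1]."""
    Rt = [[T[c][r] for c in range(3)] for r in range(3)]  # transpose of R
    t = [T[r][3] for r in range(3)]
    ti = [-sum(Rt[r][k] * t[k] for k in range(3)) for r in range(3)]
    return [Rt[0] + [ti[0]], Rt[1] + [ti[1]], Rt[2] + [ti[2]],
            [0.0, 0.0, 0.0, 1.0]]

def compose(A, B):
    """Matrix product of two 4x4 transforms."""
    return [[sum(A[r][k] * B[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def chest_relative_hand(T_world_chest, T_world_hand):
    """Re-express the hand pose in the chest frame: T_chest_hand = T_world_chest^-1 * T_world_hand."""
    return compose(invert_rigid(T_world_chest), T_world_hand)

def base_pose_se2(T_world_chest):
    """Project the chest pose to a planar (x, y, yaw) base pose.

    Assumption: z is up and the chest x-axis is the walking direction.
    """
    yaw = math.atan2(T_world_chest[1][0], T_world_chest[0][0])
    return (T_world_chest[0][3], T_world_chest[1][3], yaw)

def drop_expired_waypoints(chunk, current_xy):
    """Trim a chunk of (x, y, yaw) base waypoints to start at the one nearest
    to the measured base position.

    Assumption: nearest-waypoint matching stands in for the paper's
    state-matching rule, which the abstract does not specify.
    """
    dists = [math.hypot(x - current_xy[0], y - current_xy[1]) for x, y, _ in chunk]
    return chunk[dists.index(min(dists)):]
```

With a chest at `make_pose(1.0, 2.0, 0.0, math.pi / 2)` and a hand at `make_pose(1.0, 3.0, 1.0, math.pi / 2)`, the chest-relative hand translation is one metre along the chest x-axis and one metre up, regardless of where the demonstrator walked in the world frame. This invariance is the sense in which the label is decoupled from locomotion.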