MotionBits: Video Segmentation through Motion-Level Analysis of Rigid Bodies
Howard H. Qian, Kejia Ren, Yu Xiang, Vicente Ordonez, Kaiyu Hang
- Year
- 2026
- Access
- Open access
Abstract
Rigid bodies constitute the smallest manipulable elements in the real world, and understanding how they physically interact is fundamental to embodied reasoning and robotic manipulation. Thus, accurate detection, segmentation, and tracking of moving rigid bodies is essential for enabling reasoning modules to interpret and act in diverse environments. However, current segmentation models trained on semantic grouping are limited in their ability to provide meaningful interaction-level cues for completing embodied tasks. To address this gap, we introduce MotionBit, a novel concept that, unlike prior formulations, defines the smallest unit in motion-based segmentation through kinematic spatial twist equivalence, independent of semantics. In this paper, we contribute (1) the MotionBit concept and definition, (2) a hand-labeled benchmark, called MoRiBo, for evaluating moving rigid-body segmentation across robotic manipulation and human-in-the-wild videos, and (3) a learning-free graph-based MotionBits segmentation method that outperforms state-of-the-art embodied perception methods by 37.3\% in macro-averaged mIoU on the MoRiBo benchmark. Finally, we demonstrate the effectiveness of MotionBits segmentation for downstream embodied reasoning and manipulation tasks, highlighting its importance as a fundamental primitive for understanding physical interactions.
Keywords
Related papers
Real-Time Obstacle Avoidance for Manipulators and Mobile Robots
Oussama Khatib
1986
A Mathematical Introduction to Robotic Manipulation
Richard M. Murray, Zexiang Li, Shankar Sastry
2017
Robot dynamics and control
Mark W. Spong
1989
A tutorial on visual servo control
Seth Hutchinson, Gregory D. Hager, Peter Corke
1996