Embodied Robot Manipulation in the Era of Foundation Models: Planning and Learning Perspectives
Shuanghao Bai, Wenxuan Song, Jiayi Chen, Yuheng Ji, Zhide Zhong, Jin Yang, Han Zhao, Wanqi Zhou, Zhe Li, Pengxiang Ding, Cheng Chi, Chang Xu, Xiaolong Zheng, Donglin Wang, Haoang Li, Shanghang Zhang, Badong Chen
- Year
- 2025
- Access
- Open access
Abstract
Recent advances in vision, language, and multimodal learning have substantially accelerated progress in robotic foundation models, with robot manipulation remaining a central and challenging problem. This survey examines robot manipulation from an algorithmic perspective and organizes recent learning-based approaches within a unified abstraction of high-level planning and low-level control. At the high level, we extend the classical notion of task planning to include reasoning over language, code, motion, affordances, and 3D representations, emphasizing their role in structured and long-horizon decision making. At the low level, we propose a training-paradigm-oriented taxonomy for learning-based control, organizing existing methods along input modeling, latent representation learning, and policy learning. Finally, we identify open challenges and prospective research directions related to scalability, data efficiency, multimodal physical interaction, and safety. Together, these analyses aim to clarify the design space of modern foundation models for robotic manipulation.
Keywords
Related papers
Real-Time Obstacle Avoidance for Manipulators and Mobile Robots
Oussama Khatib
1986
A Mathematical Introduction to Robotic Manipulation
Richard M. Murray, Zexiang Li, Shankar Sastry
2017
Robot dynamics and control
Mark W. Spong
1989
A tutorial on visual servo control
Seth Hutchinson, Gregory D. Hager, Peter Corke
1996