LocoVLM: Grounding Vision and Language for Adapting Versatile Legged Locomotion Policies
I Made Aswin Nahrendra, Seunghyun Lee, Dongkyu Lee, Hyun Myung
- 发表年份
- 2026
- 访问权限
- 开放获取
摘要
Recent advances in legged locomotion learning are still dominated by the utilization of geometric representations of the environment, limiting the robot's capability to respond to higher-level semantics such as human instructions. To address this limitation, we propose a novel approach that integrates high-level commonsense reasoning from foundation models into the process of legged locomotion adaptation. Specifically, our method utilizes a pre-trained large language model to synthesize an instruction-grounded skill database tailored for legged robots. A pre-trained vision-language model is employed to extract high-level environmental semantics and ground them within the skill database, enabling real-time skill advisories for the robot. To facilitate versatile skill control, we train a style-conditioned policy capable of generating diverse and robust locomotion skills with high fidelity to specified styles. To the best of our knowledge, this is the first work to demonstrate real-time adaptation of legged locomotion using high-level reasoning from environmental semantics and instructions with instruction-following accuracy of up to 87% without the need for online query to on-the-cloud foundation models.
关键词
相关论文
Trust Region Policy Optimization
John Schulman, Sergey Levine, Philipp Moritz 等 5 位作者
2015
Legged Robots That Balance
Marc H. Raibert, Ernest R. Tello
1986
Being there: putting brain, body, and world together again
1997
Small-scale soft-bodied robot with multimodal locomotion
Wenqi Hu, Guo Zhan Lum, Massimo Mastrangeli 等 4 位作者
2018