
LOCOMOTION

LocoVLM: Grounding Vision and Language for Adapting Versatile Legged Locomotion Policies

I Made Aswin Nahrendra, Seunghyun Lee, Dongkyu Lee, Hyun Myung

Publication year: 2026
Access: Open access

Abstract

Recent advances in legged locomotion learning are still dominated by the use of geometric representations of the environment, which limits the robot's ability to respond to higher-level semantics such as human instructions. To address this limitation, we propose a novel approach that integrates high-level commonsense reasoning from foundation models into legged locomotion adaptation. Specifically, our method uses a pre-trained large language model to synthesize an instruction-grounded skill database tailored for legged robots. A pre-trained vision-language model extracts high-level environmental semantics and grounds them within the skill database, enabling real-time skill advisories for the robot. To facilitate versatile skill control, we train a style-conditioned policy capable of generating diverse and robust locomotion skills with high fidelity to specified styles. To the best of our knowledge, this is the first work to demonstrate real-time adaptation of legged locomotion using high-level reasoning from environmental semantics and instructions. It achieves instruction-following accuracy of up to 87% without online queries to cloud-hosted foundation models.
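
As a rough illustration of the grounding step described above (matching a semantic description of the scene or an instruction against a pre-built skill database, with no cloud query at run time), the following is a minimal sketch under stated assumptions. The database entries, the style parameter names (`gait`, `speed`, `body_height`), and the bag-of-words similarity are all hypothetical stand-ins chosen for illustration; the abstract does not specify the paper's actual database format or embedding model.

```python
import math
from collections import Counter

# Hypothetical skill database: instruction text -> style parameters.
# In the paper this database is synthesized offline by an LLM; the
# entries and parameter names here are invented for illustration.
SKILL_DB = {
    "walk slowly and carefully on slippery ice":
        {"gait": "walk", "speed": 0.3, "body_height": 0.25},
    "trot quickly across open flat ground":
        {"gait": "trot", "speed": 1.2, "body_height": 0.30},
    "crouch low to pass under an obstacle":
        {"gait": "walk", "speed": 0.4, "body_height": 0.15},
}


def embed(text):
    """Toy bag-of-words embedding (a stand-in for a learned text encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_skill(query):
    """Return the database entry most similar to the query description."""
    q = embed(query)
    best_key = max(SKILL_DB, key=lambda key: cosine(q, embed(key)))
    return best_key, SKILL_DB[best_key]


if __name__ == "__main__":
    # The query would come from a VLM scene description or a human instruction.
    instruction, style = retrieve_skill("the floor ahead looks like slippery ice")
    print(instruction, style)
```

In this sketch the retrieved style parameters would then be passed as the conditioning input to a style-conditioned locomotion policy. Because the lookup runs entirely against a local table, it needs no network access during deployment.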

Keywords

cs.RO
