
PERCEPTION

Multi-Modal Grounded Planning and Efficient Replanning for Learning Embodied Agents with a Few Examples

Tae-Woong Kim, Byeonghwi Kim, Jonghyun Choi

Publication year: 2025
Citations: 3
Access: Open access

Abstract

Learning a perception and reasoning module that lets robotic assistants plan the steps of complex tasks from natural language instructions often requires large amounts of free-form language annotation, especially for short, high-level instructions. To reduce annotation cost, large language models (LLMs) have been used as planners with little data. However, when elaborating the steps, even state-of-the-art LLM-based planners rely mostly on linguistic common sense and often neglect the state of the environment at the time the command is received, which results in inappropriate plans. To generate plans grounded in the environment, we propose FLARE (Few-shot Language with environmental Adaptive Replanning Embodied agent), which improves task planning by using both the language command and environmental perception. Because language instructions often contain ambiguities or incorrect expressions, we additionally propose to correct these mistakes using visual cues from the agent. Thanks to the visual cues, the proposed scheme needs only a few language pairs and outperforms state-of-the-art approaches. Our code and dataset are publicly available to facilitate further research.
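The abstract names two ideas without detailing them: choosing few-shot planning examples using both the command and what the agent perceives, and repairing a plan when it refers to objects the agent cannot see. The Python sketch below illustrates one plausible reading of each idea. It is not the authors' implementation: every name (`select_examples`, `replan`, `alpha`, the dictionary keys, the object names) is hypothetical, and `difflib` string similarity is a simple stand-in for whatever semantic similarity a real system would use.

```python
from difflib import SequenceMatcher
from math import sqrt

# Hypothetical plan format: a list of (action, target_object) steps.
Step = tuple[str, str]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two instruction embeddings (assumed given)."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two sets of object names."""
    return len(a & b) / len(a | b) if a | b else 0.0


def select_examples(query_vec, visible, pool, k=2, alpha=0.5):
    """Hypothetical multi-modal retrieval: rank stored examples by a weighted
    mix of language similarity and overlap with the objects the agent sees."""
    def score(ex):
        return alpha * cosine(query_vec, ex["vec"]) + (1 - alpha) * jaccard(visible, ex["objects"])
    return sorted(pool, key=score, reverse=True)[:k]


def replan(plan: list[Step], visible: set[str]) -> list[Step]:
    """Hypothetical replanning: swap any target object the agent has not
    observed for the most similar observed object (string similarity here)."""
    grounded = []
    for action, obj in plan:
        if obj not in visible and visible:
            # sorted() makes tie-breaking deterministic across runs.
            obj = max(sorted(visible), key=lambda v: SequenceMatcher(None, obj, v).ratio())
        grounded.append((action, obj))
    return grounded


if __name__ == "__main__":
    plan = [("PickupObject", "Mug"), ("PutObject", "CoffeeTable")]
    print(replan(plan, {"Mug", "SideTable", "Sofa"}))
    # prints [('PickupObject', 'Mug'), ('PutObject', 'SideTable')]
```

With `alpha=1.0`, `select_examples` ignores perception and ranks examples by language alone, which roughly corresponds to the language-only planning the abstract criticizes. Lowering `alpha` lets the observed objects influence which examples the planner sees.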

Keywords

Embodied cognition, Modal, Computer science, Human–computer interaction, Artificial intelligence, Materials science
