
VST-LLM HRI: Multimodal Human-Robot Interaction via Large Language Model Prompts

Weikai Ding, Shijun Xiao, Zhengguo Zhu, Teng Chen, Guoteng Zhang

Year of publication
2025
Citations
2

Abstract

This paper proposes VST-LLM HRI, a Visual-Speech-Text Large Language Model framework for Human-Robot Interaction. By designing a Modality Language Model (MLM), the framework closes the loop across robot perception, task planning, and control. It requires no fine-tuning of the Large Language Model (LLM); instead, it relies on visual semantic extraction, speech command conversion, and prompt engineering to accomplish tasks. We conducted experiments on a bipedal robot to validate the framework's adaptability and control performance in complex-terrain task scenarios. The experimental results indicate that the proposed method generalizes well. The related project files and programs are available at https://github.com/dwk-Suga/LLMandVLM.git.
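
To make the idea of steering an unmodified LLM through prompts more concrete, here is a minimal sketch of how text derived from vision and speech could be merged into one planning prompt. It is not the authors' implementation. The names `Observation`, `build_prompt`, and `plan`, the prompt wording, and the skill names used below are all hypothetical, and the vision model, speech recognizer, and LLM are assumed to exist upstream and are represented only by plain strings and a callable.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Observation:
    # Both fields are assumed to be plain text produced upstream:
    # scene_labels by some vision model, spoken_command by some
    # speech recognizer. Neither component is implemented here.
    scene_labels: List[str]
    spoken_command: str


def build_prompt(obs: Observation, skills: List[str]) -> str:
    """Merge visual and speech text into one prompt (hypothetical wording)."""
    scene = ", ".join(obs.scene_labels) if obs.scene_labels else "nothing detected"
    return (
        "You control a bipedal robot.\n"
        f"Allowed skills: {', '.join(skills)}\n"
        f"Scene: {scene}\n"
        f"Operator said: {obs.spoken_command}\n"
        "Reply with one allowed skill per line."
    )


def plan(obs: Observation, skills: List[str], llm: Callable[[str], str]) -> List[str]:
    """Query the LLM and keep only reply lines that name an allowed skill."""
    reply = llm(build_prompt(obs, skills))
    steps = [line.strip() for line in reply.splitlines()]
    # Dropping unknown lines keeps free-form LLM text away from the controller.
    return [step for step in steps if step in skills]
```

Passing the LLM in as a callable keeps the sketch independent of any particular model or API, and filtering the reply against a fixed skill list is one simple way to keep the planning output within what a controller can execute.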

Keywords

Task (project management); Robot; Modality (human–computer interaction); Generalization; Task analysis; Language understanding; Adaptability; Terrain; Natural language
