Social-LLaVA: Enhancing Social Robot Navigation through Human-Language Reasoning
Amirreza Payandeh, Daeun Song, Mohammad Nazeri, Jing Liang, Praneel Mukherjee, Amir Hossain Raj, Yangzhe Kong, Dinesh Manocha, Xuesu Xiao
- 发表年份
- 2025
- 引用次数
- 3
摘要
As mobile robots become increasingly common in human-centric environments, social navigation—adhering to unwritten social norms rather than merely avoiding pedestrians—has drawn growing attention. Existing methods, from hand-crafted techniques to learning-based approaches, often overlook the nuanced context and scene understanding that humans naturally exhibit. Inspired by studies indicating the critical role of language in cognition and reasoning, we propose a new approach to bridge robot perception and socially aware actions through human-like language reasoning. We introduce Social robot Navigation via Explainable Interactions (SNEI), a human-annotated vision-language dataset comprising over 40K Visual Question Answering (VQA) pairs across 2K unique social scenarios, drawn from diverse, unstructured public spaces. SNEI contains perception, prediction, chain-of-thought reasoning, action, and explanation, thereby allowing robots to interpret social contexts in human language. We fine-tune a Vision-Language Model, Social-LLaVA, on SNEI to demonstrate the potential of language-guided reasoning for high-level navigation tasks. Experimental evaluations—both quantitative and qualitative—demonstrate that Social-LLaVA can outperform state-of-the-art models.<sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">†</sup>.
关键词
相关论文
Artificial intelligence: a modern approach
1995
Self-Organizing Maps
Teuvo Kohonen
1995
Vision meets robotics: The KITTI dataset
Andreas Geiger, Philip Lenz, Christoph Stiller 等 4 位作者
2013
Probabilistic robotics
Sebastian Thrun
2002