PERCEPTION

KRAST: Knowledge-Augmented Robotic Action Recognition with Structured Text for Vision-Language Models

Son Hai Nguyen, Diwei Wang, Jinhyeok Jang, Hyewon Seo

Year of publication: 2025
Access: Open access

Abstract

Accurate vision-based action recognition is crucial for developing autonomous robots that can operate safely and reliably in complex, real-world environments. In this work, we advance video-based recognition of indoor daily actions for robotic perception by leveraging vision-language models (VLMs) enriched with domain-specific knowledge. We adapt a prompt-learning framework in which class-level textual descriptions of each action are embedded as learnable prompts into a frozen pre-trained VLM backbone. Several strategies for structuring and encoding these textual descriptions are designed and evaluated. Experiments on the ETRI-Activity3D dataset demonstrate that our method, using only RGB video inputs at test time, achieves over 95% accuracy and outperforms state-of-the-art approaches. These results highlight the effectiveness of knowledge-augmented prompts in enabling robust action recognition with minimal supervision.

Keywords

cs.CV, cs.AI
