HRI

Large vision-language models enabled novel objects 6D pose estimation for human-robot collaboration

Wanqing Xia, Hao Zheng, Weiliang Xu, Xun Xu

Year of publication: 2025
Citations: 8

Abstract

• Introduction of a method for novel object 6D pose estimation that leverages state-of-the-art Vision-Language Models (VLMs), enabling accurate pose prediction for any object given its name.
• Advancements in generalizability, accuracy, and computational efficiency, as demonstrated through rigorous benchmarking against existing methods on publicly available datasets.
• Seamless integration capability with existing object proposal modules, allowing it to address a wide spectrum of application scenarios.
• Validation of the method's real-world applicability through comprehensive case studies.

Six-Degree-of-Freedom (6D) pose estimation is essential for robotic manipulation tasks, especially in human-robot collaboration environments. Recently, 6D pose estimation has been extended from seen objects to novel objects, because robots frequently encounter unfamiliar items in real-life scenarios. This paper presents a three-stage pipeline for 6D pose estimation of previously unseen objects, leveraging the capabilities of large vision-language models. Our approach consists of vision-language model-based object detection and segmentation, mask selection with pose hypotheses generated from CAD models, and refinement and scoring of pose candidates. We evaluate our method on the YCB-Video dataset, achieving a state-of-the-art Average Recall (AR) score of 75.8 with RGB-D images, demonstrating its effectiveness in accurately estimating 6D poses for a diverse range of objects. An ablation study investigates the effectiveness of each stage. To validate the practical applicability of our approach, we conduct case studies on a real-world robotic platform, focusing on object pick-up tasks by integrating our 6D pose estimation pipeline with human intention prediction and task analysis algorithms. Results from the YCB-Video evaluation and the case studies show that the proposed method can effectively handle novel objects in our test environments.

Our work contributes to the field of human-robot collaboration by introducing a flexible, generalizable approach to 6D pose estimation, enabling robots to adapt to new objects without requiring extensive retraining, a vital capability for advancing human-robot collaboration in dynamic environments. More information can be found on the project GitHub page: https://github.com/WanqingXia/HRC_DetAnyPose .
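
To make the three-stage structure described in the abstract concrete, the following is a minimal Python sketch of how such a pipeline could be wired together. It is not the authors' implementation: the names `PoseCandidate`, `estimate_pose`, and the three stage callables are hypothetical, and the toy stages stand in for the VLM-based detector, the CAD-based hypothesis generator, and the refiner/scorer, whose actual interfaces are not given here.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable, List, Optional


@dataclass(frozen=True)
class PoseCandidate:
    """One 6D pose hypothesis: 3 rotational + 3 translational degrees of freedom."""
    rotation: tuple      # 3x3 rotation matrix as nested tuples (row-major)
    translation: tuple   # (x, y, z) translation in the camera frame
    score: float = 0.0   # higher is better; assigned by the scoring stage


IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


def estimate_pose(
    image: Any,
    object_name: str,
    detect_and_segment: Callable[[Any, str], List[Any]],
    generate_hypotheses: Callable[[Any, Any], List[PoseCandidate]],
    refine_and_score: Callable[[Any, PoseCandidate], PoseCandidate],
) -> Optional[PoseCandidate]:
    """Run the three stages in order and return the best-scoring candidate.

    All three stage functions are assumptions supplied by the caller.
    """
    # Stage 1: find masks for the named object.
    masks = detect_and_segment(image, object_name)
    # Stage 2: propose pose hypotheses for every mask.
    hypotheses = [h for mask in masks for h in generate_hypotheses(image, mask)]
    # Stage 3: refine each hypothesis and attach a score.
    scored = [refine_and_score(image, h) for h in hypotheses]
    # Keep the highest-scoring candidate, or None if nothing was detected.
    return max(scored, key=lambda c: c.score, default=None)


# Toy stand-ins for the real stages (purely illustrative).
def toy_detect(image, name):
    return ["mask_a", "mask_b"]


def toy_hypotheses(image, mask):
    depth = 0.5 if mask == "mask_a" else 0.8
    return [PoseCandidate(IDENTITY, (0.0, 0.0, depth))]


def toy_refine(image, candidate):
    # Pretend that candidates closer to 0.5 m depth fit the observation better.
    return replace(candidate, score=1.0 - abs(candidate.translation[2] - 0.5))


best = estimate_pose(None, "mug", toy_detect, toy_hypotheses, toy_refine)
nothing = estimate_pose(None, "mug", lambda image, name: [], toy_hypotheses, toy_refine)
```

Passing the stages in as callables mirrors the abstract's point that the method can be combined with existing object proposal modules: any detector that returns masks could be substituted for the first stage in this sketch.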

Keywords

Pose, Artificial intelligence, Computer science, Computer vision, Robot, Human–computer interaction
