LIBERO-X: Robustness Litmus for Vision-Language-Action Models
Guodong Wang, Chenkai Zhang, Qingjie Liu, Jinjin Zhang, Jiancheng Cai, Junjie Liu, Xinmin Liu
- 发表年份
- 2026
- 访问权限
- 开放获取
摘要
Reliable benchmarking is critical for advancing Vision-Language-Action (VLA) models, as it reveals their generalization, robustness, and alignment of perception with language-driven manipulation tasks. However, existing benchmarks often provide limited or misleading assessments due to insufficient evaluation protocols that inadequately capture real-world distribution shifts. This work systematically rethinks VLA benchmarking from both evaluation and data perspectives, introducing LIBERO-X, a benchmark featuring: 1) A hierarchical evaluation protocol with progressive difficulty levels targeting three core capabilities: spatial generalization, object recognition, and task instruction understanding. This design enables fine-grained analysis of performance degradation under increasing environmental and task complexity; 2) A high-diversity training dataset collected via human teleoperation, where each scene supports multiple fine-grained manipulation objectives to bridge the train-evaluation distribution gap. Experiments with representative VLA models reveal significant performance drops under cumulative perturbations, exposing persistent limitations in scene comprehension and instruction grounding. By integrating hierarchical evaluation with diverse training data, LIBERO-X offers a more reliable foundation for assessing and advancing VLA development.
关键词
相关论文
The Uncanny Valley [From the Field]
Masahiro Mori, Karl F. MacDorman, Norri Kageki
2012
Measurement Instruments for the Anthropomorphism, Animacy, Likeability, Perceived Intelligence, and Perceived Safety of Robots
Christoph Bartneck, Dana Kulić, Elizabeth A. Croft 等 4 位作者
2008
The development of Honda humanoid robot
Kazuo Hirai, Masato Hirose, Y. Haikawa 等 4 位作者
2002
A Meta-Analysis of Factors Affecting Trust in Human-Robot Interaction
Peter A. Hancock, Deborah R. Billings, Kristin E. Schaefer 等 6 位作者
2011