V-VLAPS: Value-Guided Planning for Vision-Language-Action Models
Ke Ren, Ali Salamatian, Kieran Pattison, Cyrus Neary
- 发表年份
- 2026
- 访问权限
- 开放获取
摘要
Vision-language-action (VLA) models provide strong action priors for robotic manipulation, but their reactive behavior can fail under distribution shift and long-horizon task structure. Recent VLA-guided planning methods improve execution by using pretrained policies to guide tree search, yet node selection still depends heavily on policy priors and visit-count exploration. Consequently, when the policy favors poor actions, the planner lacks a learned value signal to correct this bias. Prior work has shown that VLA representations encode rollout success and failure information, suggesting that they may also support value estimation during planning. We introduce Value-Guided Vision-Language-Action Planning and Search (V-VLAPS), which augments VLA-guided planning with a lightweight value head trained on offline VLA rollouts to predict Monte Carlo returns. These predictions guide Monte Carlo Tree Search toward higher-value branches. Across five LIBERO suites, V-VLAPS matches value-free planning baseline at the default search budget in aggregate, and analysis shows that many hard failures are root-level timeouts where predicted values are weakly separated. With a larger search budget, V-VLAPS improves over the baseline in all task suites with +6 percentage points on LIBERO-Object and +4 percentage points on LIBERO-10. Our results suggest that VLA representations can support not only failure prediction, but also value-guided planning when search reaches branches where value-based ranking matters.
关键词
相关论文
Real-Time Obstacle Avoidance for Manipulators and Mobile Robots
Oussama Khatib
1986
A Mathematical Introduction to Robotic Manipulation
Richard M. Murray, Zexiang Li, Shankar Sastry
2017
Robot dynamics and control
Mark W. Spong
1989
A tutorial on visual servo control
Seth Hutchinson, Gregory D. Hager, Peter Corke
1996