CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors
Dachong Li, ZhuangZhuang Chen, Jin Zhang, Jianqiang Li
- 发表年份
- 2026
- 访问权限
- 开放获取
摘要
Vision--Language--Action (VLA) models often use intermediate representations to connect multimodal inputs with continuous control, yet spatial guidance is often injected implicitly through latent features. We propose $CorridorVLA$, which predicts sparse spatial anchors as incremental physical changes (e.g., $Δ$-positions) and uses them to impose an explicit tolerance region in the training objective for action generation. The anchors define a corridor that guides a flow-matching action head: trajectories whose implied spatial evolution falls outside it receive corrective gradients, while minor deviations from contacts and execution noise are permitted. On the more challenging LIBERO-Plus benchmark, CorridorVLA yields consistent gains across both SmolVLA and GR00T, improving success rate by $3.4\%$--$12.4\%$ over the corresponding baselines; notably, our GR00T-Corr variant reaches a success rate of $83.21\%$. These results indicate that action-aligned physical cues can provide direct and interpretable constraints for generative action policies, complementing spatial guidance encoded in visual or latent forms. Code is available at https://github.com/corridorVLA.
关键词
相关论文
Statistical Learning Theory
Yuhai Wu, Vladimir Vapnik
1999
Fractional Differential Equations
Igor Podlubný
2025
Applied Nonlinear Control
Jean-Jacques Slotine, Weiping Li
1991
Genetic Programming: On the Programming of Computers by Means of Natural Selection
John R. Koza
1992