Home /Research /Explainable Adversarial-Robust Vision-Language-Action Model for Robotic Manipulation

MANIPULATION

Explainable Adversarial-Robust Vision-Language-Action Model for Robotic Manipulation

Ju-Young Kim, Ji-Hong Park, Myeongjun Kim, Gun-Woo Kim

Year: 2025
Access: Open access

Abstract

Smart farming has emerged as a key technology for advancing modern agriculture through automation and intelligent control. However, systems relying on RGB cameras for perception and robotic manipulators for control, common in smart farming, are vulnerable to photometric perturbations such as hue, illumination, and noise changes, which can cause malfunction under adversarial attacks. To address this issue, we propose an explainable adversarial-robust Vision-Language-Action model based on the OpenVLA-OFT framework. The model integrates an Evidence-3 module that detects photometric perturbations and generates natural language explanations of their causes and effects. Experiments show that the proposed model reduces Current Action L1 loss by 21.7% and Next Actions L1 loss by 18.4% compared to the baseline, demonstrating improved action prediction accuracy and explainability under adversarial conditions.

Keywords

cs.CVcs.AIcs.RO

Explainable Adversarial-Robust Vision-Language-Action Model for Robotic Manipulation

Abstract

Keywords

Related papers

Real-Time Obstacle Avoidance for Manipulators and Mobile Robots

A Mathematical Introduction to Robotic Manipulation

Robot dynamics and control

A tutorial on visual servo control