SWARM

Robust Multi-Agent Reinforcement Learning by Mutual Information Regularization

Ruixiao Xu, Jingqiao Xiu, Yuwei Zheng, Pu Feng, Yuqing Ma, Bo An, Yaodong Yang, Xianglong Liu

Publication year
2025
Citations
3

Abstract

In cooperative multi-agent reinforcement learning (MARL), robustness against cooperative agents that take unpredictable or worst-case adversarial actions is crucial for real-world deployment. In multi-agent settings, each agent may be perturbed or unperturbed, so the number of potential threat scenarios grows exponentially with the number of agents. Existing robust MARL methods either enumerate or approximate all possible threat scenarios, which leads to intense computation and insufficient robustness. In contrast, humans develop robust behaviors by maintaining a general level of caution rather than preparing for every possible threat. Inspired by human decision making, we frame robust MARL as a control-as-inference problem and implicitly optimize worst-case robustness across all threat scenarios through off-policy evaluation. Specifically, we introduce mutual information regularization as robust regularization (MIR3), which maximizes a lower bound on robustness during routine training and serves as a kind of caution for MARL without adversarial inputs. Further insights show that MIR3 acts as an information bottleneck, preventing agents from over-reacting to others and aligning policies with robust action priors. In the presence of worst-case adversaries, MIR3 significantly surpasses baseline methods in robustness and training efficiency while maintaining cooperative performance in StarCraft II, quadrotor swarm control, and robot swarm control. When the robot swarm control algorithm is deployed in the real world, our method also outperforms the best baseline by 14.29% in reward. See code and demo videos at https://github.com/DIG-Beihang/MIR3.
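
To make the idea of a mutual-information penalty concrete, the following is a minimal illustrative sketch and not the authors' implementation (see the repository above for that). It assumes a tabular policy over a small discrete state set, and the names `mutual_information`, `regularized_objective`, and the weight `lam` are hypothetical. It computes I(S; A) between states and actions and subtracts it from an expected return, so that a policy whose actions depend strongly on its input is penalized.

```python
import numpy as np

def mutual_information(state_probs, policy):
    """Return I(S; A) in nats for a tabular policy (illustrative assumption).

    state_probs: shape (num_states,), a distribution over states.
    policy:      shape (num_states, num_actions), each row is pi(a | s).
    """
    state_probs = np.asarray(state_probs, dtype=float)
    policy = np.asarray(policy, dtype=float)
    joint = state_probs[:, None] * policy            # p(s, a)
    action_marginal = joint.sum(axis=0)              # p(a)
    independent = state_probs[:, None] * action_marginal[None, :]  # p(s) p(a)
    mask = joint > 0                                  # skip zero-probability terms
    return float(np.sum(joint[mask] * np.log(joint[mask] / independent[mask])))

def regularized_objective(expected_return, state_probs, policy, lam=0.1):
    """Hypothetical objective: return minus a weighted mutual-information penalty."""
    return expected_return - lam * mutual_information(state_probs, policy)

# A policy that ignores the state carries no information about it.
state_blind = [[0.5, 0.5], [0.5, 0.5]]
# A policy that maps each state to its own action is maximally informative.
state_driven = [[1.0, 0.0], [0.0, 1.0]]
uniform_states = [0.5, 0.5]

mi_blind = mutual_information(uniform_states, state_blind)
mi_driven = mutual_information(uniform_states, state_driven)
```

In this toy setting the state-blind policy has zero mutual information, while the state-driven policy reaches the maximum of log 2 nats, so for equal returns the penalty favors the less reactive policy.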

Keywords

Robustness (evolution), Computer science, Reinforcement learning, Swarm behaviour, Artificial intelligence, Adversarial system, Regularization (linguistics), Mathematical optimization, Machine learning, Mathematics
