SWARM

Perspectives for Direct Interpretability in Multi-Agent Deep Reinforcement Learning

Yoann Poupart, Aurélie Beynier, Nicolas Maudet

Publication year: 2025
Access: Open access

Abstract

Multi-Agent Deep Reinforcement Learning (MADRL) has proven effective at solving complex problems in robotics or games, yet most trained models are hard to interpret. While learning intrinsically interpretable models remains a prominent approach, its scalability and flexibility are limited when handling complex tasks or multi-agent dynamics. This paper advocates for direct interpretability, that is, generating post hoc explanations directly from trained models, as a versatile and scalable alternative that offers insights into agents' behaviour, emergent phenomena, and biases without altering the models' architectures. We explore modern methods, including relevance backpropagation, knowledge editing, model steering, activation patching, sparse autoencoders, and circuit discovery, and highlight their applicability to single-agent, multi-agent, and training-process challenges. By addressing MADRL interpretability, we propose directions that aim to advance active topics such as team identification, swarm coordination, and sample efficiency.

Keywords

cs.AI
