
Algorithmically-designed reward shaping for multiagent reinforcement learning in navigation

Ifrah Saeed, Andrew C. Cullen, Zainab Zaidi, Sarah Erfani, Tansu Alpcan

Year: 2025
Citations: 3

Abstract

The practical applicability of multiagent reinforcement learning (MARL) is hindered by its low sample efficiency and slow learning speed. While reward shaping and expert guidance can partially mitigate these challenges, their efficiency is offset by the need for substantial manual effort. To address these constraints, we introduce Multiagent Environment-aware semi-Automated Guide (MEAG), a novel framework that leverages widely known, highly efficient, and low-resolution single-agent pathfinding algorithms for shaping rewards to guide multiagent reinforcement learning agents. MEAG runs these single-agent solvers over a coarse-grid surrogate that requires minimal manual intervention, and guides agents away from random exploration in a manner that significantly reduces computational costs. When tested across a range of densely and sparsely connected multiagent navigation environments, MEAG consistently outperforms state-of-the-art algorithms, achieving faster convergence and higher rewards. These improvements enable the consideration of MARL for more complex real-world pathfinding applications ranging from warehouse automation to search and rescue operations and swarm robotics.

• A new approach that uses well-known single-agent pathfinding for semi-automated MARL reward shaping, reducing manual effort.
• Tested in diverse navigation environments, demonstrating versatility and robustness.
• Trains faster than state-of-the-art methods, reducing training time and computational costs.
• Achieves higher rewards than state-of-the-art methods, demonstrating its ability to deliver more optimised solutions.
• Makes MARL more accessible to non-domain experts and scalable for complex multiagent systems.
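
The general idea of shaping rewards with a single-agent pathfinder over a coarse grid can be illustrated with a minimal sketch. This is not the MEAG implementation, which the abstract does not specify. It assumes a breadth-first search as the single-agent solver, a hand-written occupancy grid as the coarse surrogate, and standard potential-based shaping (adding gamma * phi(s') - phi(s) to the environment reward) as the shaping rule. All names here (`coarse_distances`, `to_coarse`, `shaped_reward`, `UNREACHABLE`) are hypothetical.

```python
from collections import deque

# Hypothetical cost assigned to coarse cells the solver cannot reach.
UNREACHABLE = 10**6


def coarse_distances(coarse_grid, goal):
    """Breadth-first search over a coarse occupancy grid (0 = free, 1 = blocked).

    Returns a dict mapping each reachable coarse cell to its hop count to `goal`.
    """
    rows, cols = len(coarse_grid), len(coarse_grid[0])
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and coarse_grid[nr][nc] == 0 and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist


def to_coarse(position, cell_size):
    """Map a fine-grained (row, col) position to its coarse-grid cell."""
    row, col = position
    return (int(row // cell_size), int(col // cell_size))


def shaped_reward(env_reward, prev_pos, next_pos, dist, cell_size, gamma=0.99):
    """Add a potential-based shaping term, with potential = -(coarse distance to goal)."""
    phi_prev = -dist.get(to_coarse(prev_pos, cell_size), UNREACHABLE)
    phi_next = -dist.get(to_coarse(next_pos, cell_size), UNREACHABLE)
    return env_reward + gamma * phi_next - phi_prev


# Assumed 3x3 coarse surrogate with a wall across most of the middle row.
COARSE = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
dist_to_goal = coarse_distances(COARSE, goal=(2, 0))

# One agent steps from coarse cell (0, 0) into coarse cell (0, 1), i.e. along the
# only route around the wall, so the shaping term is positive.
bonus = shaped_reward(0.0, (0.5, 1.0), (0.5, 4.2), dist_to_goal, cell_size=4, gamma=1.0)
```

In a multiagent setting, each agent would compute its own distance map for its own goal and apply the shaping term independently. Because the search runs on the coarse grid rather than the full environment, each map is cheap to compute and can be cached for the whole training run.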

Keywords

Pathfinding, Reinforcement learning, Swarm behaviour, Scalability, Autonomous agent, Flocking (texture), Robustness (evolution), Automation
