Distributional Reinforcement Learning via the Cramér Distance
Vanya Aziz, Ivo Nowak, E. M. T. Hendrix
- Publication year
- 2026
- Access
- Open access
Abstract
This paper explores the application of the Soft Actor-Critic (SAC) algorithm in a distributional reinforcement learning setting and introduces an implementation named Cramér-based Distributional Soft Actor-Critic (C-DSAC). The approach represents state-action values as distributions rather than point estimates and learns those distributions by minimizing the squared Cramér distance. Empirical results across various robotic benchmarks indicate that the algorithm outperforms baseline SAC and contemporary distributional methods, with the advantage becoming more pronounced in high-complexity environments. To explain the efficiency of the new approach, we conduct an analysis showing that its superior performance is partly due to confidence-driven Q-value updates: high-variance target distributions (low confidence in the target) lead to more conservative model updates, thereby attenuating the impact of overestimated values. This work deepens the understanding of distributional reinforcement learning, offering insights into the algorithmic mechanisms governing convergence and value estimation.
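As a rough illustration of the quantity the abstract refers to, the sketch below computes the squared Cramér distance between two return distributions, i.e. the integral of the squared difference between their cumulative distribution functions. It assumes both distributions are categorical over one shared, sorted grid of atoms. That parameterization, the function name `squared_cramer_distance`, and its arguments are illustrative assumptions and are not taken from the paper, which may represent distributions differently.

```python
from itertools import accumulate


def squared_cramer_distance(p, q, atoms):
    """Squared Cramér distance between two categorical distributions.

    Assumption (not from the paper): p and q are probability vectors over
    the same sorted support `atoms`. Both CDFs are then step functions,
    so the integral of (F_p - F_q)^2 reduces to a finite sum.
    """
    if not (len(p) == len(q) == len(atoms)):
        raise ValueError("p, q and atoms must have the same length")

    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))

    total = 0.0
    # Between atoms[i] and atoms[i + 1] the CDF gap is constant.
    # Past the last atom both CDFs equal 1, so nothing is added there.
    for i in range(len(atoms) - 1):
        width = atoms[i + 1] - atoms[i]
        gap = cdf_p[i] - cdf_q[i]
        total += gap * gap * width
    return total


# Two point masses at opposite ends of the grid [0, 1, 2].
atoms = [0.0, 1.0, 2.0]
print(squared_cramer_distance([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], atoms))  # 2.0
```

In a learning setting, a quantity of this kind would be evaluated between a predicted distribution and a target distribution and minimized with respect to the predictor's parameters. The sketch covers only the distance itself, not the C-DSAC update.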