On Reward-Balancing Methods for Reinforcement Learning
Simone Baroncini, Bahman Gharesifard, Giuseppe Notarstefano
- Year
- 2026
- Access
- Open access
Abstract
This paper investigates the so-called reward-balancing methods, a novel class of algorithms for solving discounted-return reinforcement learning (RL) problems. These methods consist of iteratively adjusting the reward function to transform the RL problem into an equivalent one in which the optimal policies are greedy. For this procedure, referred to as normalization process, we provide a theoretical analysis of the involved transformations, emphasizing their algebraic structure. Then, we introduce a control-theoretic reformulation, recasting the reward-balancing procedure into an optimal control framework. The approach is further extended to address model uncertainty through stochastic model sampling, yielding normalization guarantees and probabilistic bounds on stochastic fluctuations. Using the proposed optimal control framework within a scenario model predictive control (MPC) setting, we demonstrate, through simulation studies, performance improvements over the current state-of-the-art.
Keywords
Related papers
The Organization of Behavior
D. O. Hebb
2005
Fractional Brownian Motions, Fractional Noises and Applications
Benoît B. Mandelbrot, John W. Van Ness
1968
Review of deep learning: concepts, CNN architectures, challenges, applications, future directions
Laith Alzubaidi, Jinglan Zhang, Amjad J. Humaidi +7 more
2021
A guide to deep learning in healthcare
Andre Esteva, Alexandre Robicquet, Bharath Ramsundar +7 more
2018