Reinforcement learning algorithms for average-payoff Markovian decision processes

Satinder Singh

Year published: 1994
Citations: 80

Abstract

Reinforcement learning (RL) has become a central paradigm for solving learning-control problems in robotics and artificial intelligence. RL researchers have focussed almost exclusively on problems where the controller has to maximize the discounted sum of payoffs. However, as emphasized by Schwartz (1993), in many problems, e.g., those for which the optimal behavior is a limit cycle, it is more natural and computationally advantageous to formulate tasks so that the controller's objective is to maximize the average payoff received per time step. In this paper I derive new average-payoff RL algorithms as stochastic approximation methods for solving the system of equations associated with the policy evaluation and optimal control questions in average-payoff RL tasks. These algorithms are analogous to the popular TD and Q-learning algorithms already developed for the discounted-payoff case. One of the algorithms derived here is a significant variation of Schwartz's R-learning algorithm. P...
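
To make the average-payoff setting concrete, the following is a minimal sketch of a tabular R-learning-style update in the form it is commonly presented in textbooks. It is not one of the algorithms derived in the paper, whose details the abstract does not give. The function name `r_learning_step`, the table layout `R[s][a]`, and the step sizes `beta` and `alpha` are all assumptions made for illustration.

```python
def r_learning_step(R, rho, s, a, r, s_next, beta=0.1, alpha=0.01):
    """Apply one R-learning-style update (illustrative sketch only).

    R      : list of lists, R[s][a] is the relative value of action a in state s
    rho    : current estimate of the average payoff per time step
    beta   : step size for the relative values (assumed name)
    alpha  : step size for the average-payoff estimate (assumed name)
    Mutates R in place and returns the updated rho.
    """
    # The payoff is measured relative to the average payoff rho,
    # in place of discounting future payoffs.
    td_error = r - rho + max(R[s_next]) - R[s][a]
    R[s][a] += beta * td_error

    # The average-payoff estimate is adjusted only when the action taken
    # is greedy with respect to the (just updated) relative values.
    if R[s][a] == max(R[s]):
        rho += alpha * (r - rho + max(R[s_next]) - max(R[s]))
    return rho
```

A caller would keep `R` and `rho` across time steps, choosing actions with some exploration rule and feeding each observed transition `(s, a, r, s_next)` to the function. Checking greediness after the value update is one common ordering and is itself an assumption here.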

Keywords

Stochastic game, Reinforcement learning, Markov decision process, Computer science, Q-learning, Artificial intelligence, Mathematical optimization, Markov process, Limit (mathematics), Optimal control
