Reinforcement learning algorithms for average-payoff markovian decision processes
Satinder Singh
- 发表年份
- 1994
- 引用次数
- 80
摘要
Reinforcement learning (RL) has become a central paradigm for solving learning-control problems in robotics and artificial intelligence. RL researchers have focussed almost exclusively on problems where the controller has to maximize the discounted sum of payoffs. However, as emphasized by Schwartz (1993), in many problems, e.g., those for which the optimal behavior is a limit cycle, it is more natural and computationally advantageous to formulate tasks so that the controller's objective is to maximize the average payoff received per time step. In this paper I derive new average-payoff RL algorithms as stochastic approximation methods for solving the system of equations associated with the policy evaluation and optimal control questions in average-payoff RL tasks. These algorithms are analogous to the popular TD and Q-learning algorithms already developed for the discounted-payoff case. One of the algorithms derived here is a significant variation of Schwartz's R-learning algorithm. P...
关键词
相关论文
Statistical Learning Theory
Yuhai Wu, Vladimir Vapnik
1999
Artificial intelligence: a modern approach
1995
Applied Nonlinear Control
Jean-Jacques Slotine, Weiping Li
1991
A new optimizer using particle swarm theory
R.C. Eberhart, James Kennedy
2002