Q-PSP Learning: An Exploitation-Oriented Q-Learning Algorithm and Its Applications
Tadashi Horiuchi, Akinori Fujino, Osamu Katai, Tetsuo Sawaragi
- Publication year
- 1999
- Citations
- 18
- Access
- Open access
Abstract
Reinforcement learning algorithms can be classified into two approaches. One is the “exploitation-oriented” approach, which attempts to acquire action rules mainly by reinforcing and relying on good experiences; the other is the “exploration-oriented” approach, which pursues optimality of actions, that is, receiving the highest rewards, by exploring the environment. In this paper, we propose the Q-PSP Learning method, which incorporates the idea of PSP (Profit Sharing Plan), used in Classifier Systems as “exploitation-oriented” reinforcement learning, into Q-Learning, an “exploration-oriented” reinforcement learning method, in order to combine the merits of the two approaches. By applying Q-PSP Learning to several control problems and a robot navigation problem, we show that it can be expected not only to speed up learning but also to be effective for complex problems, and that it can attain an appropriate balance between exploration and exploitation.
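For readers unfamiliar with the two ingredients named in the abstract, the following is a minimal sketch of each in its generic textbook form: the one-step Q-learning update and an episode-wise profit-sharing credit assignment. It is not the Q-PSP rule itself, which the abstract does not specify. The function names, the parameter values (`alpha`, `gamma`, `decay`), and the geometric credit function are illustrative assumptions.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """One-step Q-learning: move Q(s, a) toward r + gamma * max_a' Q(s', a').

    alpha (learning rate) and gamma (discount factor) are assumed values.
    """
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


def profit_sharing_update(weights, episode, reward, decay=0.5):
    """Profit sharing: when a reward arrives, credit every (state, action)
    pair visited in the episode, with geometrically smaller credit for
    earlier steps. The geometric decay is an assumed credit function.
    """
    credit = reward
    for state, action in reversed(episode):
        weights[(state, action)] += credit
        credit *= decay


# Hypothetical two-step corridor: s0 -> s1 -> goal, reward only at the goal.
actions = ["left", "right"]
q = defaultdict(float)
q_learning_update(q, "s1", "right", 1.0, "goal", actions)

w = defaultdict(float)
profit_sharing_update(w, [("s0", "right"), ("s1", "right")], 1.0)
```

After a single rewarded episode, the Q-learning table has changed only for the last state-action pair, while profit sharing has already credited every pair along the path. This difference in how quickly good experiences are reinforced is the contrast the abstract draws between the two approaches.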