PQ-Learning: An Efficient Robot Learning Method for Intelligent Behavior Acquisition
Weiyu Zhu, Stephen E. Levinson
- Publication year
- 2001
- Citations
- 7
Abstract
This paper presents an efficient reinforcement learning method, called PQ-learning, for intelligent behavior acquisition by an autonomous robot. The method uses a special action-value propagation technique, consisting of spatial propagation and temporal propagation, to achieve fast learning convergence in large state spaces. Compared with the approaches in the literature, the proposed method offers three benefits for robot learning. First, it is a general method that should be applicable to most reinforcement learning tasks. Second, the learning is guaranteed to converge to the optimum, and it converges much faster than the traditional Q-learning and Q(λ)-learning methods. Third, it supports both self-directed and teacher-directed learning, where the teacher helps by directing the robot's exploration rather than by explicitly offering labels or ground truths as in the supervised-learning regime. The proposed method was tested on a simulated robot navigation-learning problem. The results show that it significantly outperforms the Q(λ)-learning algorithm in learning speed in both the self-directed and teacher-directed learning regimes.
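For orientation, the sketch below shows standard one-step tabular Q-learning, the traditional baseline the abstract compares against. It is not the PQ-learning method: the abstract does not specify the spatial and temporal propagation rules, so they are not reproduced here. The corridor environment, the function names, and all parameter values are assumptions chosen only for illustration.

```python
import random


def q_learning_corridor(n_states=6, episodes=300, alpha=0.5,
                        gamma=0.9, epsilon=0.2, seed=0):
    """Standard one-step tabular Q-learning on a hypothetical 1-D corridor.

    States are 0..n_states-1 and the goal is the rightmost state.
    Actions: 0 = move left, 1 = move right.
    Reward is 1 on reaching the goal and 0 otherwise.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection; ties are broken at random.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1

            # Deterministic transition; the left wall blocks movement.
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0

            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


def greedy_policy(q):
    """Greedy action for every non-goal state (1 means 'move right')."""
    return [0 if row[0] > row[1] else 1 for row in q[:-1]]
```

Calling `greedy_policy(q_learning_corridor())` yields the learned action for each non-goal state. In this baseline each step updates only the single visited state-action pair, which is the slow value propagation that the abstract says PQ-learning is designed to accelerate.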