Combining learned controllers to achieve new goals based on linearly solvable MDPs
Eiji Uchibe, Kenji Doya
- Publication year: 2014
- Citations: 9
Abstract
Learning complicated behaviors usually involves intensive manual tuning and expensive computational optimization because it requires solving a nonlinear Hamilton-Jacobi-Bellman (HJB) equation. Recently, Todorov proposed a class of problems called the linearly solvable Markov decision process (LMDP), which converts the nonlinear HJB equation into a linear differential equation. The linearity of the simplified HJB equation allows superposition to be applied, so that a new composite controller can be derived from a set of learned primitive controllers. However, his method was model-based and was not evaluated in a real domain. This study proposes a model-free method similar to Least Squares Temporal Difference (LSTD) learning, in which the exponentially transformed cost function can be regarded as the discount factor in LSTD. The proposed method is evaluated in real-robot experiments on learning walking behaviors with a quadruped robot. The goal of each primitive task is to reach a specific target position in the environment, and the goal of the composite task is to approach an arbitrary region represented by the primitives' target positions. Experimental results show that the composite policy can be used as a good initial policy for the new task.
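The superposition property mentioned in the abstract can be illustrated with a small sketch. This is not the paper's model-free method: it assumes a known, discrete-state, first-exit LMDP in the style of Todorov's formulation, with a hypothetical 7-state chain, a uniform state cost `q`, and two primitive tasks that differ only in their terminal costs `g1` and `g2`. All function and variable names are illustrative. The desirability function z = exp(-v) satisfies a linear equation in which exp(-q) multiplies the expected next-state desirability, much like a discount factor, so a weighted sum of primitive solutions solves the composite task.

```python
import numpy as np

def solve_desirability(P, q, terminal, g_terminal):
    """Solve z = exp(-q) * (P @ z) on interior states,
    with z = exp(-g) fixed on terminal states (first-exit LMDP)."""
    n = P.shape[0]
    term = np.array(sorted(terminal))
    interior = np.array([i for i in range(n) if i not in set(terminal)])
    z = np.zeros(n)
    z[term] = np.exp(-np.asarray(g_terminal, dtype=float))
    # exp(-q) plays the role of a state-dependent discount factor.
    G = np.diag(np.exp(-q[interior]))
    A = np.eye(len(interior)) - G @ P[np.ix_(interior, interior)]
    b = G @ P[np.ix_(interior, term)] @ z[term]
    z[interior] = np.linalg.solve(A, b)
    return z

def optimal_policy(P, z):
    """Optimal controlled transitions: u(x'|x) proportional to p(x'|x) z(x')."""
    u = P * z[None, :]
    return u / u.sum(axis=1, keepdims=True)

# Hypothetical setup: 7-state chain, both ends terminal,
# passive dynamics is an unbiased random walk (assumed known here).
n = 7
terminal = [0, n - 1]
P = np.zeros((n, n))
for i in range(1, n - 1):
    P[i, i - 1] = P[i, i + 1] = 0.5
P[0, 0] = P[n - 1, n - 1] = 1.0
q = np.full(n, 0.1)

# Primitive 1 prefers the left end, primitive 2 the right end.
g1 = np.array([0.0, 5.0])
g2 = np.array([5.0, 0.0])
w = np.array([0.5, 0.5])  # illustrative mixing weights

z1 = solve_desirability(P, q, terminal, g1)
z2 = solve_desirability(P, q, terminal, g2)

# Composite terminal cost defined so that exp(-g_c) = w1*exp(-g1) + w2*exp(-g2).
g_c = -np.log(w[0] * np.exp(-g1) + w[1] * np.exp(-g2))
z_direct = solve_desirability(P, q, terminal, g_c)

# Superposition: the composite desirability is the weighted sum of primitives.
z_composed = w[0] * z1 + w[1] * z2
u1 = optimal_policy(P, z1)
u_composite = optimal_policy(P, z_composed)
```

In this sketch, `z_composed` matches `z_direct` up to numerical precision because the equation for z is linear, and the composite policy is then obtained from `z_composed` without solving a new problem. The identity holds only when the primitives share the same dynamics, state cost, and terminal set; the paper's contribution of estimating the desirability function from samples, without a model, is not shown here.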