LOCOMOTION

Combining learned controllers to achieve new goals based on linearly solvable MDPs

Eiji Uchibe, Kenji Doya

Year of publication: 2014
Citations: 9

Abstract

Learning complicated behaviors usually involves intensive manual tuning and expensive computational optimization because a nonlinear Hamilton-Jacobi-Bellman (HJB) equation has to be solved. Recently, Todorov proposed a class of Markov decision processes, the linearly solvable Markov decision process (LMDP), which converts the nonlinear HJB equation into a linear differential equation. The linearity of the simplified HJB equation allows superposition to be applied to derive a new composite controller from a set of learned primitive controllers. However, his method is model-based and was not evaluated in a real domain. This study proposes a model-free method similar to Least Squares Temporal Difference (LSTD) learning, in which the exponentially transformed cost function can be regarded as the discount factor in LSTD. The proposed method is applied to learning walking behaviors with a quadruped robot and is evaluated in real robot experiments. The goal of each primitive task is to reach a specific target position in the environment, and the goal of the composite task is to approach an arbitrary region represented by the primitives' target positions. Experimental results show that the composite policy can be used as a good initial policy for the new task.
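The superposition property mentioned in the abstract can be illustrated on a toy problem. The sketch below is not the paper's model-free method or its robot task. It is a minimal model-based example under assumed settings: a discrete first-exit LMDP on a hypothetical 11-state chain with an unbiased random walk as passive dynamics, a constant state cost, and hand-picked terminal costs and mixing weights. The desirability function z = exp(-v) satisfies the linear equation z(x) = exp(-q) Σ p(x'|x) z(x'), where the factor exp(-q) plays the role of a discount, so a weighted sum of primitive solutions should solve the composite problem whose terminal desirability is the same weighted sum.

```python
import math

def solve_desirability(terminal_cost, n_states=11, state_cost=0.1, n_iters=2000):
    """Iterate z(x) = exp(-q) * sum_x' p(x'|x) z(x') on a 1-D chain.

    terminal_cost maps the two boundary states (0 and n_states - 1) to their
    terminal costs g(x). Passive dynamics: unbiased random walk (an assumption).
    """
    z = [0.0] * n_states
    for x, g in terminal_cost.items():
        z[x] = math.exp(-g)              # boundary condition z = exp(-g)
    discount = math.exp(-state_cost)     # exponentiated cost acts like a discount
    for _ in range(n_iters):
        new_z = list(z)
        for x in range(1, n_states - 1):
            new_z[x] = discount * 0.5 * (z[x - 1] + z[x + 1])
        z = new_z
    return z

def optimal_policy(z, x):
    """Optimal transition probabilities: u*(x'|x) proportional to p(x'|x) z(x')."""
    left, right = 0.5 * z[x - 1], 0.5 * z[x + 1]
    total = left + right
    return {x - 1: left / total, x + 1: right / total}

# Two hypothetical primitive tasks: reach the left end, or reach the right end.
last = 10
g1 = {0: 0.0, last: 5.0}
g2 = {0: 5.0, last: 0.0}
z1 = solve_desirability(g1)
z2 = solve_desirability(g2)

# Composite task: terminal desirability is a weighted sum of the primitives'.
w1, w2 = 0.3, 0.7
g_c = {x: -math.log(w1 * math.exp(-g1[x]) + w2 * math.exp(-g2[x])) for x in g1}

z_composed = [w1 * a + w2 * b for a, b in zip(z1, z2)]  # superposition
z_direct = solve_desirability(g_c)                      # solved from scratch
max_gap = max(abs(a - b) for a, b in zip(z_composed, z_direct))
```

Here `max_gap` compares the superposed solution with the one obtained by solving the composite problem directly, and `optimal_policy(z_composed, x)` gives the composite controller at an interior state `x` without any further learning. The superposition is exact in this toy model because both the update and the boundary conditions are linear in z.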

Keywords

Hamilton–Jacobi–Bellman equation, Markov decision process, Computer science, Mathematical optimization, Reinforcement learning, Bellman equation, Domain (mathematical analysis), Dynamic programming, Nonlinear system, Mobile robot
