Policy Search by Dynamic Programming

J. Andrew Bagnell, Sham M. Kakade, Andrew Y. Ng, Jeff Schneider

Publication year: 2018
Citations: 133
Access: Open access

Abstract

We consider the policy search approach to reinforcement learning. We show that if a “baseline distribution” is given (indicating roughly how often we expect a good policy to visit each state), then we can derive a policy search algorithm that terminates in a finite number of steps, and for which we can provide non-trivial performance guarantees. We also demonstrate this algorithm on several grid-world POMDPs, a planar biped walking robot, and a double-pole balancing problem.
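The abstract does not spell out the algorithm itself. As a rough illustration of the general idea, the sketch below picks one policy per time step, working backward from the final step, and scores each candidate by its expected value under that step's baseline distribution. Everything in it is an assumption made for illustration: the tiny two-state MDP, the names `psdp`, `P`, `R`, `mu`, and `policy_class`, and the use of exact value computation over an enumerated policy class. It should not be read as the authors' implementation, which may well rely on sampling and a different optimization step.

```python
from itertools import product


def psdp(P, R, mu, policy_class, T):
    """Backward policy selection under baseline distributions (illustrative).

    P[a][s][s2]  : probability of moving from s to s2 under action a (assumed known)
    R[s]         : reward collected in state s (assumed state-only reward)
    mu[t][s]     : baseline distribution over states at time t
    policy_class : candidate policies, each a tuple mapping state index -> action
    T            : horizon (number of decision steps)
    """
    n = len(R)
    v_next = [0.0] * n           # value after the final step is zero
    policies = [None] * T
    for t in reversed(range(T)):
        best_score, best_pi, best_v = None, None, None
        for pi in policy_class:
            # Value of using pi at time t, then the already-chosen later policies.
            v = [
                R[s] + sum(P[pi[s]][s][s2] * v_next[s2] for s2 in range(n))
                for s in range(n)
            ]
            # Score the candidate by its value averaged over the baseline at time t.
            score = sum(mu[t][s] * v[s] for s in range(n))
            if best_score is None or score > best_score:
                best_score, best_pi, best_v = score, pi, v
        policies[t] = best_pi
        v_next = best_v
    return policies, v_next


# Hypothetical toy problem: two states, state 1 is rewarding.
# Action 0 stays in place, action 1 switches to the other state.
P = [
    [[1, 0], [0, 1]],   # action 0: stay
    [[0, 1], [1, 0]],   # action 1: switch
]
R = [0, 1]
T = 3
mu = [[0.8, 0.2]] * T                                 # assumed baseline, same at every step
policy_class = list(product(range(2), repeat=2))      # all deterministic 2-state policies

policies, values = psdp(P, R, mu, policy_class, T)
print(policies, values)
```

Because one policy is fixed per time step and the loop runs exactly `T` times, the procedure terminates after a finite number of steps, which matches the property claimed in the abstract. With an unrestricted policy class, as in this toy example, the step reduces to ordinary backward induction on the states the baseline visits; the restriction to a limited class is where the baseline distribution matters.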

Keywords

Reinforcement learning, Computer science, Dynamic programming, Baseline (sea), Grid, Mathematical optimization, State (computer science), Robot, Artificial intelligence, Algorithm

Related papers

Statistical Learning Theory

Artificial intelligence: a modern approach

Applied Nonlinear Control

A new optimizer using particle swarm theory