Learning behaviours for robot soccer
James Brusey
- Publication year
- 2002
- Citation count
- 2
Abstract
A central problem in autonomous robotics is how to design programs that determine what the robot should do next. Behaviour-based control is a popular paradigm, but current approaches to behaviour design typically involve hand-coded behaviours. The aim of this work is to explore the use of reinforcement learning to develop autonomous robot behaviours automatically, and specifically to look at the performance of the resulting behaviours.
This thesis examines the question of whether behaviours for a real behaviour-based autonomous robot can be learnt under simulation using the Monte Carlo Exploring Starts, ε-soft On-Policy Monte Carlo or linear, gradient-descent Sarsa(λ) algorithms. A further question is whether the increased performance of learnt behaviours carries through to increased performance on the real robot. In addition, this work looks at whether continuing to learn on the real robot causes further improvement in the performance of the behaviour.
A novel method is developed, termed Policy Initialisation, that makes use of the domain knowledge in an existing, hand-coded behaviour by converting the behaviour into either a reinforcement learning policy or an action-value function. This is then used to bootstrap the learning process.
The Markov Decision Process model is central to reinforcement learning algorithms. This work examines whether it is possible to use an internal world model in the real robot to suit the requirements of the Markov Decision Process model.
The methodology used to answer these questions is to take three realistic, non-trivial robotic tasks, and attempt to learn behaviours for each. The learnt behaviours are then compared with hand-coded behaviours that have either been published or used in international competition. The tasks are based on real task requirements for robots used in a RoboCup Formula 2000 robot soccer team. The first is a generic movement behaviour that moves the robot to a target point. The second requires the robot to dribble the ball in an arc so that the robot maintains possession and so that the final position is lined up with the goal. The third addresses the problem of kicking the ball away from the wall.
The results show that for these three different types of behavioural problem, reinforcement learning on a simulator produced significantly better performance than hand-coded equivalents, not only under simulation but also on the real robot. In contrast, continuing the learning process on the real robot did not significantly improve performance.
The Policy Initialisation technique is found to accelerate learning for tabular Monte Carlo methods, but makes minimal improvement and is, in fact, costly to use in conjunction with linear, gradient-descent Sarsa(λ). This approach, unlike some other techniques for accelerating learning, does not appear to bias the solution.
Finally, the evidence from this thesis is that internal world models that maintain the requirements of the Markov Decision Process can be constructed, and this appears to be a sound approach to avoiding problems connected with partial observability that have previously occurred in the use of reinforcement learning in robotic environments.
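The abstract describes Policy Initialisation only at a high level: a hand-coded behaviour is converted into a policy or an action-value function, which then bootstraps learning. The sketch below is one possible reading of that idea for the tabular case and is not taken from the thesis. The one-dimensional state space, the action names, the `hand_coded_policy` function and the `bonus` value are all hypothetical, chosen only to keep the example small.

```python
import random

# Hypothetical toy problem: integer positions, two actions.
ACTIONS = ["left", "right"]
STATES = list(range(-3, 4))


def hand_coded_policy(state):
    """Hypothetical hand-coded behaviour: move towards position 0."""
    return "left" if state > 0 else "right"


def init_q_from_policy(states, actions, policy, bonus=1.0):
    """Build a tabular action-value function in which the hand-coded
    action has the highest initial value in every state (assumed scheme)."""
    q = {}
    for s in states:
        preferred = policy(s)
        for a in actions:
            q[(s, a)] = bonus if a == preferred else 0.0
    return q


def greedy_action(q, state, actions):
    """Return the action with the largest value in the given state."""
    return max(actions, key=lambda a: q[(state, a)])


def epsilon_soft_action(q, state, actions, epsilon, rng):
    """Epsilon-greedy selection, which is one kind of epsilon-soft policy:
    every action keeps a non-zero probability of being chosen."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return greedy_action(q, state, actions)


q = init_q_from_policy(STATES, ACTIONS, hand_coded_policy)
rng = random.Random(0)
first_action = epsilon_soft_action(q, 2, ACTIONS, epsilon=0.1, rng=rng)
```

Before any learning takes place, the greedy policy derived from `q` reproduces the hand-coded behaviour in every state, so a learner that starts from this table begins at the hand-coded level of performance instead of from arbitrary values. How the thesis actually assigns the initial values is not stated in the abstract.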