Reinforcement Learning on autonomous humanoid robots
E. Schuitema
- Publication year
- 2012
- Citations
- 17
- Access
- Open access
Abstract
Service robots have the potential to be of great value in households, health care and other labor-intensive environments. However, these environments are typically unique, not very structured and frequently changing, which makes it difficult to make service robots robust and versatile through manual programming. Having robots learn to solve tasks autonomously through interaction with the real world forms an attractive alternative. With Reinforcement Learning (RL), a system can learn to perform tasks by receiving only coarse feedback on its actions: desired behavior is reinforced by positive rewards, and undesired behavior is punished by negative rewards. In this research, a bipedal walking robot named Leo was designed and built specifically to study the application of RL to real robots. Robot Leo is able to learn two basic motor control tasks: placing a foot on a step of stairs, and walking. To learn to walk, Leo receives a positive reward for moving its foot forward, and negative rewards for falling and for spending time and energy. In simulation, this process takes about 5 hours of practice and thousands of falls. On the real prototype, the learning time was shortened by first letting the robot observe a hand-coded, sub-optimal controller, which it was quickly able to mimic and even improve upon in a matter of hours. Algorithmic improvements are proposed to address complications of RL on real robots, such as time delays in the control loop and large disturbances such as a sudden push. To reduce the continuous risk of damage due to the trial-and-error nature of RL, a modular approach is proposed through which the robot can coarsely but quickly learn about the risk of its behavior and learn the actual task more safely and in more detail.