Reinforcement learning for dynamic robotic systems
Yendo Hu, R.D. Fellman
- Year: 1996
- Citations: 3
Abstract
Adaptive algorithms are the only way to solve control problems in an unknown or constantly changing environment. In dynamic robotic control, these algorithms may offer solutions that will one day bring laboratory robots into the unpredictable real world. Three feedback methods exist for adaptive algorithms: supervised, reinforced, and unsupervised. Of these, reinforcement feedback balances the trade-off between learning ability and a priori knowledge. This dissertation focuses on various issues associated with reinforcement learning algorithms for dynamic robotic control. Reinforcement algorithms differ from each other in their performance in three key areas: (1) computational complexity, (2) learning speed, and (3) flexibility. This dissertation presents five reinforcement algorithms, each originating from research that emphasizes a different combination of these three key areas. The first algorithm results from research on both reducing computational complexity and increasing learning speed. This algorithm uses a state history queue to decouple the computational demand from the number of quantized states within the input space. It increases learning speed by associating neighboring state information, and optimizes memory use by allocating memory dynamically when needed. The second algorithm is designed to learn complex tasks quickly. A framework within the algorithm permits the creation of localized reinforcement rules, enabling one to break down a complex task into a set of simple tasks. This algorithm can learn to guide the cartpole balancer through a cyclic trajectory. The third algorithm comes from research focused on making the algorithm more general. It eliminates the need to prequantize the input state space by adaptively quantizing a continuous input state space based on the robot kinematics. The fourth algorithm presents an alternative adaptive quantization method that exploits the reset event.
Finally, the fifth algorithm results from research that attempts to address all three key areas. This algorithm learns adaptively by building a graph. With this graph structure, the algorithm can store all observed events. By analyzing the stored events, it can then derive successful controllers for the specific application. This algorithm is flexible in that it assumes no knowledge of an effective state space quantization setup, goal region, or robot dynamics. This method can learn to control both a puck-on-hill task starting from sub-optimal positions and rocket landing missions.
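The abstract gives no implementation details, so the following is only a minimal sketch of the general idea behind the first algorithm: a bounded state history queue, so that each reinforcement update touches only recently visited states rather than the whole quantized state space, with memory allocated lazily for visited states. The class name `QueueCreditLearner`, its parameters (`queue_len`, `decay`, `step`), and the decayed credit rule are all assumptions made for illustration, not the dissertation's method.

```python
from collections import deque


class QueueCreditLearner:
    """Hypothetical sketch: bounded history queue with lazily allocated values."""

    def __init__(self, queue_len=4, decay=0.5, step=1.0):
        # Only the most recent (state, action) pairs are kept, so the cost of
        # one update depends on queue_len, not on the number of states.
        self.history = deque(maxlen=queue_len)
        # Values are stored only for pairs that were actually visited
        # (a simple stand-in for dynamic memory allocation).
        self.values = {}
        self.decay = decay
        self.step = step

    def observe(self, state, action):
        """Record the quantized state and the action taken in it."""
        self.history.append((state, action))

    def reinforce(self, signal):
        """Spread a reinforcement signal over the queue, newest pair first."""
        credit = self.step * signal
        for key in reversed(self.history):
            self.values[key] = self.values.get(key, 0.0) + credit
            credit *= self.decay  # older pairs receive less credit

    def best_action(self, state, actions):
        """Pick the action with the highest stored value (0.0 if unseen)."""
        return max(actions, key=lambda a: self.values.get((state, a), 0.0))


# Usage: three steps, then a failure signal (assumed to be -1.0).
learner = QueueCreditLearner(queue_len=2, decay=0.5)
learner.observe(0, "L")
learner.observe(1, "R")
learner.observe(2, "L")
learner.reinforce(-1.0)
```

With `queue_len=2`, the oldest pair `(0, "L")` has already left the queue when the failure arrives, so it receives no update and no memory is allocated for it. That is the sense in which the per-update cost in this sketch is decoupled from the size of the state space.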