LEARNING

Reward-penalty reinforcement learning scheme for planning and reactive behaviour

A.F.R. Araújo, Andreza Pereira Braga

Year of publication
2002
Citations
8

Abstract

This paper describes a reinforcement learning algorithm that allows a point robot to learn navigation strategies within initially unknown indoor environments with fixed and dynamic obstacles. The knowledge is encoded in two surfaces, called the reward and penalty surfaces, which are updated when a target is found and whenever the robot moves, respectively. The proposed policy is suitable for both planning and reactive behaviour. The tests involve different kinds of obstacles: a fixed passage, a barrier, a U-shaped obstacle and a simple maze. The results suggest that the model solves the goal-directed exploration problem: the robot is able to reach a desired goal, starting from any position within the environment, avoiding obstacles and following a viable trajectory. However, the robot may get stuck at dynamic obstacles, may depend on randomness to avoid them, and generally does not solve the goal-directed reinforcement learning problem.
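
As a rough illustration of the two-surface idea in the abstract, the following minimal Python sketch keeps a reward surface and a penalty surface over a small grid. It is not the authors' algorithm: the grid discretisation, the additive penalty per move, the decayed reward propagated back along the path, the greedy choice on reward minus penalty with random tie-breaking, and all names (`make_surfaces`, `choose_move`, `run_episode`, `step_penalty`, `goal_reward`, `decay`) are assumptions made purely for illustration, and only fixed obstacles are modelled.

```python
import random

# Four-connected moves on a grid (an assumption; the paper uses a point robot).
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def make_surfaces(rows, cols):
    """Create zero-initialised reward and penalty surfaces."""
    reward = [[0.0] * cols for _ in range(rows)]
    penalty = [[0.0] * cols for _ in range(rows)]
    return reward, penalty


def choose_move(pos, reward, penalty, obstacles, rng):
    """Pick the free neighbour with the highest (reward - penalty) score."""
    rows, cols = len(reward), len(reward[0])
    candidates = []
    for dr, dc in MOVES:
        r, c = pos[0] + dr, pos[1] + dc
        if 0 <= r < rows and 0 <= c < cols and (r, c) not in obstacles:
            candidates.append((reward[r][c] - penalty[r][c], (r, c)))
    if not candidates:
        return pos  # boxed in: stay put
    best = max(score for score, _ in candidates)
    # Break ties at random, so exploration depends partly on randomness.
    return rng.choice([cell for score, cell in candidates if score == best])


def run_episode(start, goal, reward, penalty, obstacles, rng,
                max_steps=1000, step_penalty=1.0, goal_reward=100.0, decay=0.9):
    """Walk from start towards goal, updating both surfaces in place.

    Returns the visited path, or None if the goal was not reached in time.
    """
    pos, path = start, [start]
    while pos != goal and len(path) <= max_steps:
        # Penalty surface: updated whenever the robot moves.
        penalty[pos[0]][pos[1]] += step_penalty
        pos = choose_move(pos, reward, penalty, obstacles, rng)
        path.append(pos)
    if pos != goal:
        return None
    # Reward surface: updated only once the target has been found,
    # with a value that decays with distance back along the path.
    value = goal_reward
    for r, c in reversed(path):
        reward[r][c] = max(reward[r][c], value)
        value *= decay
    return path
```

For example, calling `run_episode((0, 0), (2, 2), reward, penalty, {(1, 1)}, random.Random(0))` on 3x3 surfaces from `make_surfaces(3, 3)` walks around the blocked centre cell. The sketch makes no attempt to find a shortest path, which fits the abstract's distinction between goal-directed exploration and goal-directed reinforcement learning.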

Keywords

Reinforcement learning, Robot, Computer science, Randomness, Trajectory, Obstacle, Obstacle avoidance, Scheme (mathematics), Motion planning, Mobile robot
