LEARNING

Action Selection in a hypothetical house robot: Using those RL numbers

Mark Humphrys

Publication year
1996
Citations
2

Abstract

Reinforcement Learning (RL) methods, in contrast to many forms of machine learning, build up value functions for actions. That is, an agent not only knows 'what' it wants to do, it also knows 'how much' it wants to do it. Traditionally, the latter are used to produce the former and are then ignored, since the agent is assumed to act alone. But the latter numbers contain useful information: they tell us how much the agent will suffer if its action is not executed (perhaps not much). They tell us which actions the agent can compromise on and which it cannot. It is clear that many interesting systems possess multiple parallel and conflicting goals, all demanding attention, and none of which can be fully satisfied except at the expense of others. Animals are the prime example of such systems. In [Humphrys, 1995], I introduced the W-learning algorithms, showing one method of resolving competition among behaviors automatically by reference to their RL values. The scheme has the unusual feature that behaviors are at all times in selfish pursuit of their own goals and have no explicit concept of cooperation, despite residing in the same body. In this paper, I apply W-learning to the world of a hypothetical house robot, which doubles as family toy, mobile security camera, mobile smoke alarm and occasional vacuum cleaner. I show how a W-learning community of behaviors inside the robot will support a robust behavior pattern, capable of opportunistic behavior, avoiding dithering, and allowing for the concept of default behavior and expression of low-priority goals.
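
The idea that RL values tell us "how much the agent will suffer if its action is not executed" can be illustrated with a small sketch. This is not the W-learning algorithm itself: it is a static illustration under assumed numbers, in which each behavior holds a hypothetical table of action values for a single state, and the behavior that would lose the most by being overruled is allowed to act. The behavior names, action names, values, and function names are all invented for the example.

```python
from typing import Dict, Tuple

# Hypothetical action values for ONE fixed state: q_tables[behavior][action].
# All names and numbers are assumptions made for illustration only.
q_tables: Dict[str, Dict[str, float]] = {
    "vacuum":      {"vacuum": 1.0, "raise_alarm": 0.8, "patrol": 0.9},
    "smoke_alarm": {"vacuum": 0.0, "raise_alarm": 5.0, "patrol": 0.5},
    "security":    {"vacuum": 0.2, "raise_alarm": 0.1, "patrol": 0.6},
}


def preferred_action(q: Dict[str, float]) -> str:
    """The 'what': the action this behavior values most."""
    return max(q, key=q.get)


def expected_loss(q: Dict[str, float], executed: str) -> float:
    """The 'how much': value given up if `executed` runs instead of the favorite."""
    return max(q.values()) - q[executed]


def select_action(tables: Dict[str, Dict[str, float]]) -> Tuple[str, str]:
    """Let the behavior with the largest worst-case loss win (illustrative rule)."""
    proposals = {name: preferred_action(q) for name, q in tables.items()}

    def worst_loss(name: str) -> float:
        # Loss this behavior suffers under each rival's proposal; keep the worst.
        rivals = [a for other, a in proposals.items() if other != name]
        return max((expected_loss(tables[name], a) for a in rivals), default=0.0)

    winner = max(tables, key=worst_loss)
    return winner, proposals[winner]


winner, action = select_action(q_tables)
print(winner, action)
```

With these assumed numbers, the vacuuming behavior loses little whichever action runs, so it can compromise, while the smoke-alarm behavior loses a great deal if overruled and therefore wins control. Each behavior consults only its own values, which mirrors the abstract's point that no explicit notion of cooperation is needed.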

Keywords

Reinforcement learning, Computer science, Action selection, Action (physics), Artificial intelligence, Compromise, Robot, Psychology
