H-Learning: A Reinforcement Learning Method for Optimizing Undiscounted Average Reward

Prasad Tadepalli, DoKyeong Ok

Year: 1994
Citations: 14

Abstract

In this paper, we introduce a model-based reinforcement learning method called H-learning, which optimizes undiscounted average reward. We compare it with three other reinforcement learning methods in the domain of scheduling Automatic Guided Vehicles, transportation robots used in modern manufacturing plants and facilities. The four methods differ along two dimensions: they are either model-based or model-free, and they optimize either discounted total reward or undiscounted average reward. Our experimental results indicate that H-learning is more robust with respect to changes in the domain parameters and, in many cases, converges in fewer steps to a better average reward per time step than all the other methods. An added advantage is that, unlike the other methods, it does not have any parameters to tune.
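
For readers unfamiliar with the two optimization criteria contrasted above, they can be written in their standard textbook form as follows. The notation is an assumption for illustration and is not taken from the paper: \pi is a policy, r_t the reward at step t, s_0 the start state, and \gamma \in [0, 1) a discount factor.

```latex
% Discounted total reward of policy \pi from start state s
V^{\pi}_{\gamma}(s) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s_0 = s \right]

% Undiscounted average reward (gain) of policy \pi
\rho^{\pi} = \lim_{N \to \infty} \frac{1}{N}\, \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{N-1} r_t \right]
```

The discounted criterion weights near-term rewards more heavily through \gamma, whereas the average-reward criterion weights every time step equally.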

Keywords

Reinforcement learning; Computer science; Scheduling (production processes); Artificial intelligence; Domain (mathematical analysis); Reinforcement; Machine learning; Mathematical optimization; Mathematics; Engineering
