Auto-exploratory average reward reinforcement learning

DoKyeong Ok, Prasad Tadepalli

Year: 1996
Citations: 9

Abstract

We introduce a model-based average reward Reinforcement Learning method called H-learning and compare it with its discounted counterpart, Adaptive Real-Time Dynamic Programming, in a simulated robot scheduling task. We also introduce an extension to H-learning, which automatically explores the unexplored parts of the state space, while always choosing greedy actions with respect to the current value function. We show that this "Auto-exploratory H-learning" performs better than the original H-learning under previously studied exploration methods such as random, recency-based, or counter-based exploration.

Introduction

Reinforcement Learning (RL) is the study of learning agents that improve their performance at some task by receiving rewards and punishments from the environment. Most approaches to reinforcement learning, including Q-learning (Watkins and Dayan 92) and Adaptive Real-Time Dynamic Programming (ARTDP) (Barto, Bradtke, & Singh 95), optimize the total discounted reward the ...

Keywords

Reinforcement learning, Computer science, Bellman equation, Artificial intelligence, Scheduling (production processes), Task (project management), Robot, State space, Machine learning, Mathematical optimization
