LEARNING

A heuristic Q-learning architecture for fully exploring a world and deriving an optimal policy by model-based planning

Gang Zhao, Shoji Tatsumi, Ruoying Sun

Year: 2003
Citations: 2

Abstract

For solving Markov decision processes with incomplete information in robot learning tasks, model-based algorithms make effective use of gathered data but usually require extensive computation. Dyna-Q is an architecture that uses experience both to build a model and, at the same time, to adjust the policy with that model; however, it does not help the agent explore the environment actively. In this paper, we present the Exa-Q architecture, which learns models and plans with the learned models to help a reinforcement learning agent explore its environment actively and improve its estimate of the reinforcement function. As a result, Exa-Q can identify the environment fully and speed up learning of the optimal policy. Experimental results demonstrate that the proposed method is efficient.
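The abstract only summarizes Exa-Q, so the sketch below does not reproduce it. It is a minimal sketch of the standard tabular Dyna-Q loop the abstract contrasts against: each real step does a Q-learning update, records the observed transition in a learned model, and then replays simulated transitions from that model. The corridor environment, all parameter values, and the optional `kappa` term are assumptions for illustration. `kappa` is a simplified Dyna-Q+-style exploration bonus (from Sutton and Barto), a swapped-in technique for showing how a planner can reward rarely tried actions; it is not the authors' Exa-Q heuristic.

```python
import random
from collections import defaultdict

# Hypothetical deterministic corridor: states 0..N_STATES-1, actions 0 (left), 1 (right).
# Reaching the last state gives reward 1 and ends the episode.
N_STATES = 6
ACTIONS = (0, 1)
GOAL = N_STATES - 1

def step(state, action):
    """Deterministic transition: move left or right, clipped to the corridor."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def dyna_q(episodes=50, n_planning=10, alpha=0.5, gamma=0.95, epsilon=0.1,
           kappa=0.0, seed=0):
    """Tabular Dyna-Q. With kappa > 0, planning adds a simplified Dyna-Q+-style
    bonus kappa * sqrt(tau), where tau is the number of real steps since (s, a)
    was last tried (an assumption for illustration, not Exa-Q)."""
    rng = random.Random(seed)
    q = defaultdict(float)   # Q[(s, a)], initialised to 0
    model = {}               # model[(s, a)] = (s', r, done), learned from real experience
    last_tried = {}          # real time step at which (s, a) was last taken
    t = 0
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection with random tie-breaking
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                best = max(q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])
            s2, r, done = step(s, a)
            t += 1
            # (1) direct RL: one-step Q-learning update from the real transition
            target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            # (2) model learning: store the observed (deterministic) outcome
            model[(s, a)] = (s2, r, done)
            last_tried[(s, a)] = t
            # (3) planning: n simulated updates on previously observed (s, a) pairs
            for _ in range(n_planning):
                ps, pa = rng.choice(list(model))
                ps2, pr, pdone = model[(ps, pa)]
                ptarget = pr + kappa * (t - last_tried[(ps, pa)]) ** 0.5
                if not pdone:
                    ptarget += gamma * max(q[(ps2, b)] for b in ACTIONS)
                q[(ps, pa)] += alpha * (ptarget - q[(ps, pa)])
            s = s2
    return q

def greedy_policy(q):
    """Greedy action for every non-terminal state."""
    return [max(ACTIONS, key=lambda b: q[(s, b)]) for s in range(GOAL)]

if __name__ == "__main__":
    print(greedy_policy(dyna_q()))
```

Running it should print a policy of all 1s (always move right). The planning step is what lets Dyna-Q reuse each real transition many times; an exploration bonus such as `kappa` is one generic way to make planning steer the agent toward parts of the world it has not tried recently, which is the kind of gap the abstract says plain Dyna-Q leaves open.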

Keywords

Reinforcement learning, Computer science, Architecture, Heuristic, Markov decision process, Artificial intelligence, Q-learning, Function (biology), Machine learning, Computation
