Mixed reinforcement learning for partially observable Markov decision process

Le Tien Dung, Takashi Komeda, Motoki Takagi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

Reinforcement Learning has been widely used to solve problems with little feedback from the environment. Q-learning can solve fully observable Markov Decision Processes quite well. For Partially Observable Markov Decision Processes (POMDPs), a Recurrent Neural Network (RNN) can be used to approximate Q values. However, learning time for these problems is typically very long. In this paper, Mixed Reinforcement Learning is presented to find an optimal policy for POMDPs in a shorter learning time. This method uses both a Q value table and an RNN. The Q value table stores Q values for fully observable states, and the RNN approximates Q values for hidden states. An observable degree is calculated for each state while the agent explores the environment. If the observable degree is less than a threshold, the state is considered a hidden state. Results of experiments in the lighting grid world problem show that the proposed method enables an agent to acquire a policy as good as the policy acquired by using only an RNN, with better learning performance.
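The abstract describes a simple dispatch rule: keep tabular Q values for states judged sufficiently observable and fall back to a recurrent approximator for hidden ones. Below is a minimal Python sketch of that dispatch, assuming a concrete proxy for the observable degree (consistency of outcomes across visits), an arbitrary threshold, and a placeholder rnn_q_values interface; the abstract specifies none of these, so all three are illustrative only.

```python
# A minimal sketch of the table/RNN dispatch described in the abstract, not
# the authors' implementation. The observable-degree proxy, the threshold
# value, and the rnn_q_values interface are all assumptions for illustration.
from collections import defaultdict

N_ACTIONS = 4
THRESHOLD = 0.8           # assumed cutoff for treating a state as observable
ALPHA, GAMMA = 0.1, 0.95  # assumed learning rate and discount factor

q_table = defaultdict(lambda: [0.0] * N_ACTIONS)   # Q values, observable states
visit_stats = defaultdict(lambda: {"visits": 0, "consistent": 0})

def observable_degree(obs):
    """Assumed proxy: fraction of visits to this observation whose outcome
    (next observation, reward) matched what was seen on earlier visits."""
    s = visit_stats[obs]
    return s["consistent"] / s["visits"] if s["visits"] else 1.0

def rnn_q_values(history):
    """Placeholder for the recurrent network that maps the observation
    history to Q values for hidden states."""
    return [0.0] * N_ACTIONS

def q_values(obs, history):
    """Dispatch: table lookup for observable states, RNN for hidden ones."""
    if observable_degree(obs) >= THRESHOLD:
        return q_table[obs]
    return rnn_q_values(history)

def td_update(obs, action, reward, next_obs, history):
    """One Q-learning step for an observable state; for a hidden state the
    RNN would instead be trained toward the same TD target."""
    if observable_degree(obs) >= THRESHOLD:
        target = reward + GAMMA * max(q_values(next_obs, history))
        q_table[obs][action] += ALPHA * (target - q_table[obs][action])
```

The appeal of this split is that the cheap table handles the bulk of the state space, so the RNN only has to be trained on the comparatively few hidden states, which is consistent with the shorter learning time reported in the abstract.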

Original language: English
Title of host publication: Proceedings of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007
Pages: 7-12
Number of pages: 6
DOIs: 10.1109/CIRA.2007.382910
Publication status: Published - 2007
Event: 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007 - Jacksonville, FL
Duration: 2007 Jun 20 - 2007 Jun 23

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Electrical and Electronic Engineering

Cite this

Dung, L. T., Komeda, T., & Takagi, M. (2007). Mixed reinforcement learning for partially observable Markov decision process. In Proceedings of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007 (pp. 7-12). [4269910] https://doi.org/10.1109/CIRA.2007.382910