Mixed reinforcement learning for partially observable Markov decision process

Le Tien Dung, Takashi Komeda, Motoki Takagi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

Reinforcement Learning has been widely used to solve problems with little feedback from the environment. Q learning can solve fully observable Markov Decision Processes quite well. For Partially Observable Markov Decision Processes (POMDPs), a Recurrent Neural Network (RNN) can be used to approximate Q values. However, learning time for these problems is typically very long. In this paper, Mixed Reinforcement Learning is presented to find an optimal policy for POMDPs in a shorter learning time. This method uses both a Q value table and a RNN. The Q value table stores Q values for fully observable states, and the RNN approximates Q values for hidden states. An observable degree is calculated for each state while the agent explores the environment. If the observable degree is less than a threshold, the state is considered a hidden state. Results of experiments in the lighting grid world problem show that the proposed method enables an agent to acquire a policy as good as the policy acquired by using only a RNN, with better learning performance.
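The dispatch described in the abstract — a tabular Q store for states judged fully observable and an RNN approximator for states judged hidden, selected by an observable-degree threshold — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name, the outcome-consistency definition of observable degree, and the `rnn_q` callable are all assumptions for the sake of the example.

```python
# Hedged sketch of the Mixed RL idea from the abstract: a Q value table for
# states judged fully observable, and an RNN-based approximator for states
# judged hidden, chosen by an "observable degree" against a threshold.
# All names and the degree formula below are illustrative assumptions.

from collections import defaultdict


class MixedQ:
    def __init__(self, n_actions, rnn_q, threshold=0.5):
        self.q_table = defaultdict(lambda: [0.0] * n_actions)
        self.rnn_q = rnn_q              # callable: observation history -> Q values
        self.threshold = threshold
        self.outcomes = defaultdict(list)  # observation -> outcomes seen so far

    def observable_degree(self, obs):
        # Assumption: degree = fraction of past outcomes for this observation
        # that agree with the most common one (1.0 = perfectly consistent).
        seen = self.outcomes[obs]
        if not seen:
            return 1.0  # optimistic: treat unseen observations as observable
        most_common = max(set(seen), key=seen.count)
        return seen.count(most_common) / len(seen)

    def q_values(self, obs, history):
        if self.observable_degree(obs) >= self.threshold:
            return self.q_table[obs]    # fully observable: use the table
        return self.rnn_q(history)      # hidden state: use the RNN approximation
```

The point of the split is the learning-time trade-off the abstract claims: table lookups are updated and queried cheaply for the unambiguous states, so the slower RNN is trained only where the state is actually ambiguous.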

Original language: English
Title of host publication: Proceedings of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007
Pages: 7-12
Number of pages: 6
DOI: https://doi.org/10.1109/CIRA.2007.382910
Publication status: Published - 2007 Oct 9
Event: 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007 - Jacksonville, FL, United States
Duration: 2007 Jun 20 - 2007 Jun 23

Publication series

Name: Proceedings of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007

Conference

Conference: 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007
Country: United States
City: Jacksonville, FL
Period: 07/6/20 - 07/6/23

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Electrical and Electronic Engineering


  • Cite this

    Dung, L. T., Komeda, T., & Takagi, M. (2007). Mixed reinforcement learning for partially observable Markov decision process. In Proceedings of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007 (pp. 7-12). [4269910] (Proceedings of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007). https://doi.org/10.1109/CIRA.2007.382910