Mixed reinforcement learning for partially observable Markov decision process

Le Tien Dung, Takashi Komeda, Motoki Takagi

Research output: Conference contribution

5 Citations (Scopus)

Abstract

Reinforcement Learning has been widely used to solve problems with little feedback from the environment. Q learning can solve fully observable Markov Decision Processes quite well. For Partially Observable Markov Decision Processes (POMDPs), a Recurrent Neural Network (RNN) can be used to approximate Q values. However, learning time for these problems is typically very long. In this paper, Mixed Reinforcement Learning is presented to find an optimal policy for POMDPs in a shorter learning time. This method uses both a Q value table and an RNN. The Q value table stores Q values for fully observable states, and the RNN approximates Q values for hidden states. An observable degree is calculated for each state while the agent explores the environment. If the observable degree is less than a threshold, the state is considered a hidden state. Experimental results on the lighting grid world problem show that the proposed method enables an agent to acquire a policy as good as the policy acquired by using only an RNN, with better learning performance.
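The abstract's core idea — route each state either to a Q table or to an RNN approximator depending on a per-state "observable degree" — can be sketched in a few lines. This is a minimal illustration, not the paper's algorithm: the class and method names are invented, the observable-degree formula (inverse of the number of distinct successors seen per state-action pair) is an assumed proxy since the abstract does not define it, and the RNN is replaced by a plain dictionary stub so the sketch runs without an ML library.

```python
class MixedQLearner:
    """Illustrative sketch of Mixed Reinforcement Learning:
    a Q table serves states judged fully observable, while states
    whose 'observable degree' drops below a threshold are treated
    as hidden and delegated to an RNN-based approximator (stubbed
    here as a second lookup table)."""

    def __init__(self, actions, threshold=0.6, alpha=0.5, gamma=0.9):
        self.actions = list(actions)
        self.threshold = threshold
        self.alpha, self.gamma = alpha, gamma
        self.q_table = {}     # Q values for observable states
        self.rnn_stub = {}    # stand-in for the RNN's Q estimates
        self.successors = {}  # (state, action) -> set of next states seen

    def observable_degree(self, state):
        # Assumed proxy: if the same (state, action) pair keeps producing
        # different successors, the observation likely hides true state.
        pairs = [(state, a) for a in self.actions
                 if (state, a) in self.successors]
        if not pairs:
            return 1.0  # unexplored states default to observable
        return sum(1.0 / len(self.successors[p]) for p in pairs) / len(pairs)

    def is_hidden(self, state):
        return self.observable_degree(state) < self.threshold

    def q(self, state, action):
        store = self.rnn_stub if self.is_hidden(state) else self.q_table
        return store.get((state, action), 0.0)

    def update(self, s, a, r, s_next):
        # Track successors to keep the observable-degree estimate current.
        self.successors.setdefault((s, a), set()).add(s_next)
        store = self.rnn_stub if self.is_hidden(s) else self.q_table
        best_next = max(self.q(s_next, b) for b in self.actions)
        old = store.get((s, a), 0.0)
        store[(s, a)] = old + self.alpha * (r + self.gamma * best_next - old)
```

A state whose transitions are consistent stays in the fast tabular path; an aliased state (same observation, many different outcomes) is flagged as hidden, which is where the paper's RNN would take over.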

Original language: English
Host publication title: Proceedings of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007
Pages: 7-12
Number of pages: 6
DOI
Publication status: Published - 9 Oct 2007
Event: 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007 - Jacksonville, FL, United States
Duration: 20 Jun 2007 - 23 Jun 2007

Publication series

Name: Proceedings of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007

Conference

Conference: 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007
Country/Territory: United States
City: Jacksonville, FL
Period: 07/6/20 - 07/6/23

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Electrical and Electronic Engineering
