Mixed reinforcement learning for partially observable Markov decision process

Le Tien Dung, Takashi Komeda, Motoki Takagi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Reinforcement Learning has been widely used to solve problems with little feedback from the environment. Q learning can solve fully observable Markov Decision Processes quite well. For Partially Observable Markov Decision Processes (POMDPs), a Recurrent Neural Network (RNN) can be used to approximate Q values. However, learning time for these problems is typically very long. In this paper, Mixed Reinforcement Learning is presented to find an optimal policy for POMDPs in a shorter learning time. This method uses both a Q value table and an RNN. The Q value table stores Q values for fully observable states, and the RNN approximates Q values for hidden states. An observable degree is calculated for each state while the agent explores the environment. If the observable degree is less than a threshold, the state is considered a hidden state. Results of experiments in the lighting grid world problem show that the proposed method enables an agent to acquire a policy as good as the policy acquired using only an RNN, with better learning performance.
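The abstract only outlines the hybrid value representation; the sketch below illustrates one plausible reading of it in Python. All names (MixedQAgent, observable_degree, threshold, rnn) are hypothetical, and the observable-degree calculation shown is an assumption, not the formula from the paper.

    import numpy as np

    class MixedQAgent:
        """Hybrid Q-learning agent: a table for observable states, an RNN for hidden ones (illustrative sketch)."""

        def __init__(self, n_states, n_actions, rnn=None,
                     threshold=0.5, alpha=0.1, gamma=0.95):
            self.q_table = np.zeros((n_states, n_actions))
            self.visits = np.zeros(n_states)       # total visits to each state
            self.markov_hits = np.zeros(n_states)  # visits where the observation uniquely identified the state
            self.rnn = rnn            # recurrent Q approximator: any object with predict(history) -> Q-vector
            self.threshold = threshold
            self.alpha, self.gamma = alpha, gamma

        def observable_degree(self, s):
            # Placeholder estimate: fraction of visits in which state s was
            # unambiguous from its observation alone. The paper's exact
            # definition is not given in the abstract.
            return self.markov_hits[s] / max(self.visits[s], 1.0)

        def q_values(self, s, history):
            # Route the lookup: table for states judged observable enough, RNN otherwise.
            if self.rnn is None or self.observable_degree(s) >= self.threshold:
                return self.q_table[s]
            return self.rnn.predict(history)

        def update(self, s, a, r, s_next, history_next):
            # Standard one-step Q-learning update applied to the table entry.
            target = r + self.gamma * np.max(self.q_values(s_next, history_next))
            self.q_table[s, a] += self.alpha * (target - self.q_table[s, a])

In this reading, states with a high observable degree are updated and queried through the table (fast, exact), while ambiguous states fall back to a recurrent approximator that conditions on the observation history.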

Original language: English
Title of host publication: Proceedings of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007
Pages: 7-12
Number of pages: 6
DOIs: 10.1109/CIRA.2007.382910
Publication status: Published - 2007
Event: 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007 - Jacksonville, FL
Duration: 2007 Jun 20 - 2007 Jun 23

Other

Other: 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007
City: Jacksonville, FL
Period: 07/6/20 - 07/6/23

Fingerprint

Recurrent neural networks
Reinforcement learning
Lighting
Feedback
Experiments

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Electrical and Electronic Engineering

Cite this

Dung, L. T., Komeda, T., & Takagi, M. (2007). Mixed reinforcement learning for partially observable Markov decision process. In Proceedings of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007 (pp. 7-12). [4269910] https://doi.org/10.1109/CIRA.2007.382910

Mixed reinforcement learning for partially observable Markov decision process. / Dung, Le Tien; Komeda, Takashi; Takagi, Motoki.

Proceedings of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007. 2007. p. 7-12 4269910.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Dung, LT, Komeda, T & Takagi, M 2007, Mixed reinforcement learning for partially observable Markov decision process. in Proceedings of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007., 4269910, pp. 7-12, 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007, Jacksonville, FL, 07/6/20. https://doi.org/10.1109/CIRA.2007.382910
Dung LT, Komeda T, Takagi M. Mixed reinforcement learning for partially observable Markov decision process. In Proceedings of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007. 2007. p. 7-12. 4269910 https://doi.org/10.1109/CIRA.2007.382910
Dung, Le Tien ; Komeda, Takashi ; Takagi, Motoki. / Mixed reinforcement learning for partially observable Markov decision process. Proceedings of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007. 2007. pp. 7-12
@inproceedings{687f326037d3436685d9331bcfeb4ebe,
title = "Mixed reinforcement learning for partially observable Markov decision process",
abstract = "Reinforcement Learning has been widely used to solve problems with little feedback from the environment. Q learning can solve fully observable Markov Decision Processes quite well. For Partially Observable Markov Decision Processes (POMDPs), a Recurrent Neural Network (RNN) can be used to approximate Q values. However, learning time for these problems is typically very long. In this paper, Mixed Reinforcement Learning is presented to find an optimal policy for POMDPs in a shorter learning time. This method uses both a Q value table and an RNN. The Q value table stores Q values for fully observable states, and the RNN approximates Q values for hidden states. An observable degree is calculated for each state while the agent explores the environment. If the observable degree is less than a threshold, the state is considered a hidden state. Results of experiments in the lighting grid world problem show that the proposed method enables an agent to acquire a policy as good as the policy acquired using only an RNN, with better learning performance.",
author = "Dung, {Le Tien} and Takashi Komeda and Motoki Takagi",
year = "2007",
doi = "10.1109/CIRA.2007.382910",
language = "English",
isbn = "1424407907",
pages = "7--12",
booktitle = "Proceedings of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007",

}

TY - GEN

T1 - Mixed reinforcement learning for partially observable Markov decision process

AU - Dung, Le Tien

AU - Komeda, Takashi

AU - Takagi, Motoki

PY - 2007

Y1 - 2007

N2 - Reinforcement Learning has been widely used to solve problems with little feedback from the environment. Q learning can solve fully observable Markov Decision Processes quite well. For Partially Observable Markov Decision Processes (POMDPs), a Recurrent Neural Network (RNN) can be used to approximate Q values. However, learning time for these problems is typically very long. In this paper, Mixed Reinforcement Learning is presented to find an optimal policy for POMDPs in a shorter learning time. This method uses both a Q value table and an RNN. The Q value table stores Q values for fully observable states, and the RNN approximates Q values for hidden states. An observable degree is calculated for each state while the agent explores the environment. If the observable degree is less than a threshold, the state is considered a hidden state. Results of experiments in the lighting grid world problem show that the proposed method enables an agent to acquire a policy as good as the policy acquired using only an RNN, with better learning performance.

AB - Reinforcement Learning has been widely used to solve problems with little feedback from the environment. Q learning can solve fully observable Markov Decision Processes quite well. For Partially Observable Markov Decision Processes (POMDPs), a Recurrent Neural Network (RNN) can be used to approximate Q values. However, learning time for these problems is typically very long. In this paper, Mixed Reinforcement Learning is presented to find an optimal policy for POMDPs in a shorter learning time. This method uses both a Q value table and an RNN. The Q value table stores Q values for fully observable states, and the RNN approximates Q values for hidden states. An observable degree is calculated for each state while the agent explores the environment. If the observable degree is less than a threshold, the state is considered a hidden state. Results of experiments in the lighting grid world problem show that the proposed method enables an agent to acquire a policy as good as the policy acquired using only an RNN, with better learning performance.

UR - http://www.scopus.com/inward/record.url?scp=34948826477&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34948826477&partnerID=8YFLogxK

U2 - 10.1109/CIRA.2007.382910

DO - 10.1109/CIRA.2007.382910

M3 - Conference contribution

SN - 1424407907

SN - 9781424407903

SP - 7

EP - 12

BT - Proceedings of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007

ER -