Reinforcement Learning has been widely used to solve problems that provide little feedback from the environment. Q-learning can solve fully observable Markov Decision Processes quite well. For Partially Observable Markov Decision Processes (POMDPs), a Recurrent Neural Network (RNN) can be used to approximate Q values; however, learning time for these problems is typically very long. In this paper, Mixed Reinforcement Learning is presented to find an optimal policy for POMDPs in a shorter learning time. This method uses both a Q value table and an RNN: the Q value table stores Q values for fully observable states, while the RNN approximates Q values for hidden states. An observable degree is calculated for each state while the agent explores the environment; if the observable degree is less than a threshold, the state is considered a hidden state. Experimental results on the lighting grid world problem show that the proposed method enables an agent to acquire a policy as good as the policy acquired using only an RNN, with better learning performance.
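To make the dispatch described above concrete, the sketch below illustrates how such a mixed learner might route value lookups: a tabular branch for states judged observable and a recurrent branch for hidden ones. This is a minimal sketch under stated assumptions, not the paper's implementation; the names (`observable_degree`, `THRESHOLD`, `RecurrentQStub`) and the observability heuristic are illustrative placeholders for whatever measure the paper actually defines.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9   # assumed learning rate and discount factor
THRESHOLD = 0.8           # assumed observable-degree threshold

class RecurrentQStub:
    """Stand-in for the RNN Q approximator; a real one would be
    trained on observation histories (e.g. an LSTM)."""
    def predict(self, history, action):
        return 0.0

q_table = defaultdict(float)      # Q values for observable states
next_obs_seen = defaultdict(set)  # successor observations per (obs, action)

def observable_degree(obs, action):
    """Illustrative heuristic only: a pair that always leads to the same
    successor observation looks fully observable; many distinct successors
    suggest hidden state. Not the paper's actual definition."""
    successors = next_obs_seen[(obs, action)]
    if not successors:
        return 1.0
    return 1.0 / len(successors)

def q_value(obs, action, history, rnn):
    """Route the lookup: tabular branch above the threshold,
    recurrent branch for states treated as hidden."""
    if observable_degree(obs, action) >= THRESHOLD:
        return q_table[(obs, action)]
    return rnn.predict(history, action)

def update(obs, action, reward, next_obs, actions, history, rnn):
    """One Q-learning backup; only the tabular branch is updated here."""
    next_obs_seen[(obs, action)].add(next_obs)
    if observable_degree(obs, action) >= THRESHOLD:
        best_next = max(q_value(next_obs, a, history, rnn) for a in actions)
        q_table[(obs, action)] += ALPHA * (
            reward + GAMMA * best_next - q_table[(obs, action)]
        )
```

The design point this sketch captures is that the expensive recurrent approximator is consulted only for states whose observable degree falls below the threshold, which is what allows the mixed method to learn faster than an RNN-only agent while matching its final policy quality.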