### Abstract

Reinforcement learning has been widely used to solve problems in which the environment provides little feedback. Q-learning solves fully observable Markov decision processes well. For partially observable Markov decision processes (POMDPs), a recurrent neural network (RNN) can be used to approximate Q-values; however, learning times for these problems are typically very long. This paper presents Mixed Reinforcement Learning, a method for finding an optimal policy for POMDPs in a shorter learning time. The method uses both a Q-value table and an RNN: the Q-value table stores Q-values for fully observable states, and the RNN approximates Q-values for hidden states. An observable degree is calculated for each state while the agent explores the environment; if the observable degree is less than a threshold, the state is considered hidden. Experimental results on a lighting grid world problem show that the proposed method enables an agent to acquire a policy as good as the one acquired using only an RNN, with better learning performance.
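The abstract does not say how the observable degree is computed or how the RNN is trained, so the following is only a rough Python sketch of the routing idea under stated assumptions, not the authors' method. It assumes that the observable degree of an observation is the average, over actions tried from it, of how often the most common next observation follows; observations below a threshold are sent to a recurrent approximator, and the rest use a Q-value table updated with ordinary one-step Q-learning. The class `MixedQAgent`, its methods, the `hidden_q` callable (a plain stand-in for the RNN), and the default threshold of 0.9 are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Obs = Hashable
QFn = Callable[[Sequence[Obs]], List[float]]


class MixedQAgent:
    """Hypothetical sketch of table/RNN routing for Q-values.

    Assumption: the "observable degree" of an observation is the average,
    over actions tried from it, of the share of the most common next
    observation. A value of 1.0 means the outcome was always the same.
    """

    def __init__(self, n_actions: int, hidden_q: QFn, threshold: float = 0.9,
                 alpha: float = 0.1, gamma: float = 0.95) -> None:
        self.n_actions = n_actions
        self.hidden_q = hidden_q      # stand-in for the RNN: history -> Q-values
        self.threshold = threshold    # hypothetical default threshold
        self.alpha = alpha            # learning rate for the table
        self.gamma = gamma            # discount factor
        self.q_table: Dict[Obs, List[float]] = defaultdict(lambda: [0.0] * n_actions)
        self.transitions: Dict[Tuple[Obs, int], Counter] = defaultdict(Counter)

    def record(self, obs: Obs, action: int, next_obs: Obs) -> None:
        """Count an observed transition while the agent explores."""
        self.transitions[(obs, action)][next_obs] += 1

    def observable_degree(self, obs: Obs) -> float:
        ratios = []
        for a in range(self.n_actions):
            counts = self.transitions.get((obs, a))  # .get does not insert keys
            if counts:
                ratios.append(max(counts.values()) / sum(counts.values()))
        # Observations with no recorded transitions are treated as observable.
        return sum(ratios) / len(ratios) if ratios else 1.0

    def is_hidden(self, obs: Obs) -> bool:
        return self.observable_degree(obs) < self.threshold

    def q_values(self, obs: Obs, history: Sequence[Obs]) -> List[float]:
        """Read Q-values from the table, or from the RNN stand-in if hidden."""
        if self.is_hidden(obs):
            return self.hidden_q(history)
        return self.q_table[obs]

    def update_table(self, obs: Obs, action: int, reward: float,
                     next_obs: Obs, next_history: Sequence[Obs]) -> None:
        """One-step Q-learning update; intended for observable states only."""
        target = reward + self.gamma * max(self.q_values(next_obs, next_history))
        q = self.q_table[obs]
        q[action] += self.alpha * (target - q[action])
```

For example, if action 0 from observation `"C"` led to two different next observations equally often, `"C"` gets a degree of 0.5 and its Q-values come from `hidden_q` rather than the table. Training the recurrent approximator on hidden-state transitions is deliberately left out of this sketch.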

Original language | English |
---|---|
Title of host publication | Proceedings of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007 |
Pages | 7-12 |
Number of pages | 6 |
DOIs | 10.1109/CIRA.2007.382910 |
Publication status | Published - 2007 |
Event | 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007 - Jacksonville, FL. Duration: 2007 Jun 20 → 2007 Jun 23 |

### Other

Other | 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007 |
---|---|
City | Jacksonville, FL |
Period | 07/6/20 → 07/6/23 |

### ASJC Scopus subject areas

- Artificial Intelligence
- Software
- Control and Systems Engineering
- Electrical and Electronic Engineering

### Cite this

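Dung, L. T., Komeda, T., & Takagi, M. (2007). Mixed reinforcement learning for partially observable Markov decision process. In *Proceedings of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007* (pp. 7-12). [4269910] https://doi.org/10.1109/CIRA.2007.382910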

**Mixed reinforcement learning for partially observable Markov decision process.** / Dung, Le Tien; Komeda, Takashi; Takagi, Motoki.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

*Proceedings of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007.*, 4269910, pp. 7-12, 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007, Jacksonville, FL, 07/6/20. https://doi.org/10.1109/CIRA.2007.382910

TY - GEN

T1 - Mixed reinforcement learning for partially observable Markov decision process

AU - Dung, Le Tien

AU - Komeda, Takashi

AU - Takagi, Motoki

PY - 2007

Y1 - 2007

AB - Reinforcement learning has been widely used to solve problems in which the environment provides little feedback. Q-learning solves fully observable Markov decision processes well. For partially observable Markov decision processes (POMDPs), a recurrent neural network (RNN) can be used to approximate Q-values; however, learning times for these problems are typically very long. This paper presents Mixed Reinforcement Learning, a method for finding an optimal policy for POMDPs in a shorter learning time. The method uses both a Q-value table and an RNN: the Q-value table stores Q-values for fully observable states, and the RNN approximates Q-values for hidden states. An observable degree is calculated for each state while the agent explores the environment; if the observable degree is less than a threshold, the state is considered hidden. Experimental results on a lighting grid world problem show that the proposed method enables an agent to acquire a policy as good as the one acquired using only an RNN, with better learning performance.

UR - http://www.scopus.com/inward/record.url?scp=34948826477&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34948826477&partnerID=8YFLogxK

U2 - 10.1109/CIRA.2007.382910

DO - 10.1109/CIRA.2007.382910

M3 - Conference contribution

SN - 1424407907

SN - 9781424407903

SP - 7

EP - 12

BT - Proceedings of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007

ER -