Reinforcement learning for POMDP using state classification

Le Tien Dung, Takashi Komeda, Motoki Takagi

Research output: Contribution to journal › Article

14 Citations (Scopus)

Abstract

Reinforcement learning (RL) has been widely used to solve problems with little feedback from the environment. Q-learning can solve Markov decision processes (MDPs) quite well. For partially observable Markov decision processes (POMDPs), a recurrent neural network (RNN) can be used to approximate Q values. However, learning time for these problems is typically very long. We present a new combination of RL and an RNN to find a good policy for POMDPs in a shorter learning time. The method has two phases: first, the state space is divided into two groups (a fully observable state group and a hidden state group); second, a Q-value table is used to store the values of fully observable states, while an RNN is used to approximate the values of hidden states. Results of experiments in two grid-world problems show that the proposed method enables an agent to acquire a policy with better learning performance than the method using only an RNN.
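The two-phase idea described above (classify states, then route fully observable states to a Q-value table and hidden states to a recurrent approximator) can be sketched in a few lines. The following Python sketch is not the authors' implementation: the Elman-style network, the one-hot observation encoding, the output-layer-only delta-rule update (the paper trains a full RNN), the hyperparameters, and the caller-supplied is_hidden flag are all assumptions made for illustration.

import numpy as np

class HybridQAgent:
    """Q-value table for fully observable states, recurrent estimates for hidden states."""

    def __init__(self, n_states, n_actions, hidden_size=16,
                 alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.n_states, self.n_actions = n_states, n_actions
        self.rng = np.random.default_rng(seed)
        # Q-value table for the fully observable state group.
        self.q_table = np.zeros((n_states, n_actions))
        # Elman-style RNN that estimates Q values for the hidden state group;
        # the recurrent context lets it separate perceptually aliased states.
        self.W_in = self.rng.normal(scale=0.1, size=(hidden_size, n_states))
        self.W_rec = self.rng.normal(scale=0.1, size=(hidden_size, hidden_size))
        self.W_out = self.rng.normal(scale=0.1, size=(n_actions, hidden_size))
        self.h = np.zeros(hidden_size)  # recurrent context carried across steps

    def _rnn_step(self, obs, h):
        x = np.eye(self.n_states)[obs]              # one-hot observation
        h_new = np.tanh(self.W_in @ x + self.W_rec @ h)
        return self.W_out @ h_new, h_new

    def act(self, obs, is_hidden):
        # Advance the recurrent context once per environment step, then pick
        # an epsilon-greedy action from the table or from the RNN output.
        q_rnn, self.h = self._rnn_step(obs, self.h)
        self._h_now = self.h                        # cached for the RNN update
        q = q_rnn if is_hidden else self.q_table[obs]
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.n_actions))
        return int(np.argmax(q))

    def update(self, obs, action, reward, next_obs, is_hidden, next_is_hidden, done):
        # Bootstrap the target from whichever estimator owns the next state.
        if done:
            target = reward
        else:
            q_next, _ = self._rnn_step(next_obs, self.h)  # peek; context not committed
            if not next_is_hidden:
                q_next = self.q_table[next_obs]
            target = reward + self.gamma * np.max(q_next)
        if is_hidden:
            # Simplified stand-in for RNN training: a delta-rule step on the
            # output weights only (the paper trains the whole network).
            q_now = self.W_out @ self._h_now
            self.W_out[action] += self.alpha * (target - q_now[action]) * self._h_now
        else:
            # Standard Q-learning update for fully observable states.
            self.q_table[obs, action] += self.alpha * (target - self.q_table[obs, action])

# Hypothetical usage in a grid world, given a caller-supplied classification
# saying which observations belong to the hidden state group:
#     a = agent.act(obs, is_hidden[obs])
#     next_obs, reward, done = env.step(a)          # env is assumed
#     agent.update(obs, a, reward, next_obs, is_hidden[obs], is_hidden[next_obs], done)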

Original language: English
Pages (from-to): 761-779
Number of pages: 19
Journal: Applied Artificial Intelligence
Volume: 22
Issue number: 7-8
ISSN: 0883-9514
Publisher: Taylor and Francis Ltd.
DOI: 10.1080/08839510802170538
Publication status: Published - August 2008

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Artificial Intelligence

Cite this

Reinforcement learning for POMDP using state classification. / Dung, Le Tien; Komeda, Takashi; Takagi, Motoki.

In: Applied Artificial Intelligence, Vol. 22, No. 7-8, 08.2008, p. 761-779.
