Speeding up reinforcement learning using recurrent neural networks in non-Markovian environments

Tien Dung Le, Takashi Komeda, Motoki Takagi

Research output: Conference contribution

Abstract

Reinforcement Learning (RL) has been widely used to solve problems with little feedback from the environment. Q-learning can solve Markov Decision Processes quite well. For Partially Observable Markov Decision Processes, a Recurrent Neural Network (RNN) can be used to approximate Q values. However, learning time for these problems is typically very long. In this paper, we present a method to speed up learning in non-Markovian environments by focusing on the necessary state-action pairs in learning episodes. Whenever the agent attains the goal, it checks the episode and relearns the necessary actions. We use a table, which stores the minimum number of appearances of each state across all successful episodes, to remove unnecessary state-action pairs from a successful episode and form a min-episode. To verify this method, we performed two experiments: the E-maze problem with a Time-Delay Neural Network and the lighting grid-world problem with a Long Short-Term Memory RNN. Experimental results show that the proposed method enables an agent to acquire a policy with better learning performance than the standard method.
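The min-episode construction described in the abstract is simple enough to sketch in code. The following Python fragment is a minimal illustration, not the authors' implementation: the function names (update_min_counts, form_min_episode) and the choice to keep the earliest occurrences of each state are assumptions, since the abstract does not specify those details.

    from collections import defaultdict

    def update_min_counts(min_counts, episode):
        """Update the table of per-state minimum appearance counts with a
        newly completed successful episode (a list of (state, action) pairs)."""
        counts = defaultdict(int)
        for state, _ in episode:
            counts[state] += 1
        for state, n in counts.items():
            # Keep the smallest count of this state seen in any successful episode.
            min_counts[state] = min(min_counts.get(state, n), n)
        return min_counts

    def form_min_episode(min_counts, episode):
        """Drop surplus (presumed unnecessary) state-action pairs so that each
        state appears at most min_counts[state] times. Assumption: the earliest
        occurrences are kept and the rest are discarded."""
        kept = defaultdict(int)
        min_episode = []
        for state, action in episode:
            if kept[state] < min_counts.get(state, 0):
                min_episode.append((state, action))
                kept[state] += 1
        return min_episode

    # Hypothetical usage: after each successful episode, update the table,
    # shrink the episode, and replay the min-episode to the RNN Q-learner.
    # The first successful episode is kept whole; later, shorter successful
    # episodes tighten the per-state bounds and remove redundant loops.
    min_counts = {}
    episode = [("s0", "right"), ("s1", "up"), ("s0", "right"), ("s2", "up")]
    update_min_counts(min_counts, episode)
    replay = form_min_episode(min_counts, episode)  # relearn Q values on `replay`

Replaying only the min-episode concentrates the RNN's Q-value updates on the state-action pairs that actually contribute to reaching the goal, which is the source of the speed-up the abstract reports.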

Original language: English
Host publication title: Proceedings of the 11th IASTED International Conference on Artificial Intelligence and Soft Computing, ASC 2007
Pages: 179-184
Number of pages: 6
Publication status: Published - 2007
Event: 11th IASTED International Conference on Artificial Intelligence and Soft Computing, ASC 2007 - Palma de Mallorca
Duration: 29 Aug 2007 - 31 Aug 2007

Other

Other: 11th IASTED International Conference on Artificial Intelligence and Soft Computing, ASC 2007
Palma de Mallorca
Period: 07/8/29 - 07/8/31

Fingerprint

  • Recurrent neural networks
  • Reinforcement learning
  • Time delay
  • Lighting
  • Neural networks
  • Feedback
  • Experiments

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Software

Cite this

Le, T. D., Komeda, T., & Takagi, M. (2007). Speeding up reinforcement learning using recurrent neural networks in non-Markovian environments. In Proceedings of the 11th IASTED International Conference on Artificial Intelligence and Soft Computing, ASC 2007 (pp. 179-184).

Speeding up reinforcement learning using recurrent neural networks in non-Markovian environments. / Le, Tien Dung; Komeda, Takashi; Takagi, Motoki.

Proceedings of the 11th IASTED International Conference on Artificial Intelligence and Soft Computing, ASC 2007. 2007. p. 179-184.

Research output: Conference contribution

Le, TD, Komeda, T & Takagi, M 2007, Speeding up reinforcement learning using recurrent neural networks in non-Markovian environments. in Proceedings of the 11th IASTED International Conference on Artificial Intelligence and Soft Computing, ASC 2007. pp. 179-184, 11th IASTED International Conference on Artificial Intelligence and Soft Computing, ASC 2007, Palma de Mallorca, 07/8/29.
Le TD, Komeda T, Takagi M. Speeding up reinforcement learning using recurrent neural networks in non-Markovian environments. In: Proceedings of the 11th IASTED International Conference on Artificial Intelligence and Soft Computing, ASC 2007. 2007. p. 179-184.
Le, Tien Dung ; Komeda, Takashi ; Takagi, Motoki. / Speeding up reinforcement learning using recurrent neural networks in non-Markovian environments. Proceedings of the 11th IASTED International Conference on Artificial Intelligence and Soft Computing, ASC 2007. 2007. pp. 179-184
@inproceedings{583d2ca9a36349a295f5c0dd812459cd,
title = "Speeding up reinforcement learning using recurrent neural networks in non-Markovian environments",
abstract = "Reinforcement Learning (RL) has been widely used to solve problems with little feedback from the environment. Q-learning can solve Markov Decision Processes quite well. For Partially Observable Markov Decision Processes, a Recurrent Neural Network (RNN) can be used to approximate Q values. However, learning time for these problems is typically very long. In this paper, we present a method to speed up learning in non-Markovian environments by focusing on the necessary state-action pairs in learning episodes. Whenever the agent attains the goal, it checks the episode and relearns the necessary actions. We use a table, which stores the minimum number of appearances of each state across all successful episodes, to remove unnecessary state-action pairs from a successful episode and form a min-episode. To verify this method, we performed two experiments: the E-maze problem with a Time-Delay Neural Network and the lighting grid-world problem with a Long Short-Term Memory RNN. Experimental results show that the proposed method enables an agent to acquire a policy with better learning performance than the standard method.",
keywords = "Neural networks, Machine learning",
author = "Le, {Tien Dung} and Takashi Komeda and Motoki Takagi",
year = "2007",
language = "English",
isbn = "9780889866935",
pages = "179--184",
booktitle = "Proceedings of the 11th IASTED International Conference on Artificial Intelligence and Soft Computing, ASC 2007",

}

TY - GEN

T1 - Speeding up reinforcement learning using recurrent neural networks in non-Markovian environments

AU - Le, Tien Dung

AU - Komeda, Takashi

AU - Takagi, Motoki

PY - 2007

Y1 - 2007

N2 - Reinforcement Learning (RL) has been widely used to solve problems with little feedback from the environment. Q-learning can solve Markov Decision Processes quite well. For Partially Observable Markov Decision Processes, a Recurrent Neural Network (RNN) can be used to approximate Q values. However, learning time for these problems is typically very long. In this paper, we present a method to speed up learning in non-Markovian environments by focusing on the necessary state-action pairs in learning episodes. Whenever the agent attains the goal, it checks the episode and relearns the necessary actions. We use a table, which stores the minimum number of appearances of each state across all successful episodes, to remove unnecessary state-action pairs from a successful episode and form a min-episode. To verify this method, we performed two experiments: the E-maze problem with a Time-Delay Neural Network and the lighting grid-world problem with a Long Short-Term Memory RNN. Experimental results show that the proposed method enables an agent to acquire a policy with better learning performance than the standard method.

AB - Reinforcement Learning (RL) has been widely used to solve problems with little feedback from the environment. Q-learning can solve Markov Decision Processes quite well. For Partially Observable Markov Decision Processes, a Recurrent Neural Network (RNN) can be used to approximate Q values. However, learning time for these problems is typically very long. In this paper, we present a method to speed up learning in non-Markovian environments by focusing on the necessary state-action pairs in learning episodes. Whenever the agent attains the goal, it checks the episode and relearns the necessary actions. We use a table, which stores the minimum number of appearances of each state across all successful episodes, to remove unnecessary state-action pairs from a successful episode and form a min-episode. To verify this method, we performed two experiments: the E-maze problem with a Time-Delay Neural Network and the lighting grid-world problem with a Long Short-Term Memory RNN. Experimental results show that the proposed method enables an agent to acquire a policy with better learning performance than the standard method.

KW - Neural networks

KW - Machine learning

UR - http://www.scopus.com/inward/record.url?scp=54949127139&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=54949127139&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:54949127139

SN - 9780889866935

SP - 179

EP - 184

BT - Proceedings of the 11th IASTED International Conference on Artificial Intelligence and Soft Computing, ASC 2007

ER -