Speeding up reinforcement learning using recurrent neural networks in non-Markovian environments

Tien Dung Le, Takashi Komeda, Motoki Takagi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Reinforcement Learning (RL) has been widely used to solve problems with little feedback from the environment. Q-learning can solve Markov Decision Processes quite well. For Partially Observable Markov Decision Processes, a Recurrent Neural Network (RNN) can be used to approximate Q-values. However, learning time for these problems is typically very long. In this paper, we present a method to speed up learning in non-Markovian environments by focusing on the necessary state-action pairs in learning episodes. Whenever the agent attains the goal, it reviews the episode and relearns the necessary actions. We use a table that stores the minimum number of appearances of each state across all successful episodes to remove unnecessary state-action pairs from a successful episode and form a min-episode. To verify this method, we performed two experiments: the E-maze problem with a Time-Delay Neural Network and the lighting grid world problem with a Long Short-Term Memory RNN. Experimental results show that the proposed method enables an agent to acquire a policy with better learning performance than the standard method.
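
The min-episode construction described in the abstract can be read as a simple filtering step layered on top of any RNN-based Q-learner. The following is a minimal Python sketch of that reading, based only on the abstract: the name MinEpisodeFilter and the choice to keep the first min_count[s] visits of each state are assumptions, not necessarily the paper's exact procedure.

from collections import Counter

class MinEpisodeFilter:
    """Sketch of the 'min-episode' idea: keep a table of the minimum number of
    times each state has appeared in any successful episode, and use it to drop
    redundant state-action pairs from a new successful episode."""

    def __init__(self):
        # state -> minimum number of appearances over all successful episodes
        self.min_count = {}

    def update_table(self, episode):
        # episode: list of (state, action) pairs from a run that reached the goal
        counts = Counter(state for state, _ in episode)
        for state, c in counts.items():
            self.min_count[state] = min(self.min_count.get(state, c), c)

    def min_episode(self, episode):
        # Keep each state only up to its recorded minimum count; surplus visits
        # (e.g. loops) are treated as unnecessary and removed.
        kept, seen = [], Counter()
        for state, action in episode:
            if seen[state] < self.min_count.get(state, 0):
                kept.append((state, action))
                seen[state] += 1
        return kept

# Hypothetical usage with two successful episodes in a toy maze:
filt = MinEpisodeFilter()
ep1 = [("s0", "right"), ("s1", "up"), ("s2", "right")]                 # direct run to the goal
ep2 = [("s0", "right"), ("s1", "up"), ("s1", "down"), ("s1", "up"),
       ("s2", "right")]                                                # run with a loop at s1
filt.update_table(ep1)
filt.update_table(ep2)
print(filt.min_episode(ep2))   # the repeated visits to s1 are filtered out

The relearning step itself is left out of this sketch; per the abstract, the filtered min-episode would be replayed to the RNN Q-function approximator (a Time-Delay Neural Network or an LSTM) so that only the necessary state-action pairs are reinforced.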

Original language: English
Title of host publication: Proceedings of the 11th IASTED International Conference on Artificial Intelligence and Soft Computing, ASC 2007
Pages: 179-184
Number of pages: 6
ISBN: 9780889866935
Publication status: Published - 2007
Event: 11th IASTED International Conference on Artificial Intelligence and Soft Computing, ASC 2007 - Palma de Mallorca
Duration: 2007 Aug 29 - 2007 Aug 31

Other

Other: 11th IASTED International Conference on Artificial Intelligence and Soft Computing, ASC 2007
City: Palma de Mallorca
Period: 07/8/29 - 07/8/31

Keywords

  • Neural networks
  • Machine learning

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Software

Cite this

Le, T. D., Komeda, T., & Takagi, M. (2007). Speeding up reinforcement learning using recurrent neural networks in non-Markovian environments. In Proceedings of the 11th IASTED International Conference on Artificial Intelligence and Soft Computing, ASC 2007 (pp. 179-184)
