TY - GEN
T1 - Speeding up reinforcement learning using recurrent neural networks in non-Markovian environments
AU - Le, Tien Dung
AU - Komeda, Takashi
AU - Takagi, Motoki
PY - 2007
Y1 - 2007
N2 - Reinforcement Learning (RL) has been widely used to solve problems with little feedback from the environment. Q learning can solve Markov Decision Processes quite well. For Partially Observable Markov Decision Processes, a Recurrent Neural Network (RNN) can be used to approximate Q values. However, learning time for these problems is typically very long. In this paper, we present a method to speed up learning in non-Markovian environments by focusing on the necessary state-action pairs in learning episodes. Whenever the agent attains the goal, it reviews the episode and relearns the necessary actions. We use a table storing the minimum number of appearances of each state across all successful episodes to remove unnecessary state-action pairs from a successful episode and form a min-episode. To verify this method, we performed two experiments: the E maze problem with a Time-Delay Neural Network and the lighting grid world problem with a Long Short-Term Memory RNN. Experimental results show that the proposed method enables an agent to acquire a policy with better learning performance than the standard method.
KW - Neural networks
KW - Machine learning
UR - http://www.scopus.com/inward/record.url?scp=54949127139&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=54949127139&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:54949127139
SN - 9780889866935
T3 - Proceedings of the 11th IASTED International Conference on Artificial Intelligence and Soft Computing, ASC 2007
SP - 179
EP - 184
BT - Proceedings of the 11th IASTED International Conference on Artificial Intelligence and Soft Computing, ASC 2007
T2 - 11th IASTED International Conference on Artificial Intelligence and Soft Computing, ASC 2007
Y2 - 29 August 2007 through 31 August 2007
ER -