Behavior learning based on a policy gradient method

Separation of environmental dynamics and state values in policies

Seiji Ishihara, Harukazu Igarashi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Policy gradient methods are useful approaches to reinforcement learning. In our policy gradient approach to behavior learning of agents, we define an agent's decision problem at each time step as a problem of minimizing an objective function. In this paper, we give an objective function that consists of two types of parameters: one representing environmental dynamics and the other representing state-value functions. We derive separate learning rules for the two types of parameters so that the two sets can be learned independently. Separating the two makes it possible to reuse learned state-value functions for agents in environments with different dynamics, even when those dynamics are stochastic. Simulation experiments on learning hunter-agent policies in pursuit problems show the effectiveness of our method.
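The abstract describes the method only at a high level, so the following is a minimal, hypothetical Python sketch of the separation idea, not the authors' actual formulation. It assumes a Boltzmann (softmax) policy in which an action's merit is the expected value of its predicted successor state: theta_dyn parameterizes a learned model of the environmental dynamics, v_value holds the state values, and a REINFORCE-style rule updates each parameter set independently. All names, shapes, and the exact gradient form are illustrative assumptions.

import numpy as np

def dynamics(theta_dyn, s, a):
    """Learned next-state distribution P(s'|s,a): a softmax over theta_dyn[s, a]."""
    z = np.exp(theta_dyn[s, a] - theta_dyn[s, a].max())
    return z / z.sum()

def policy(theta_dyn, v_value, s, temp=1.0):
    """Boltzmann policy: an action's utility is the expected value of the
    successor state it is predicted to reach, so high-value moves are preferred."""
    n_actions = theta_dyn.shape[1]
    utility = np.array([dynamics(theta_dyn, s, a) @ v_value
                        for a in range(n_actions)])
    z = np.exp(utility / temp)
    return z / z.sum()

def update(theta_dyn, v_value, episode, ret, lr_dyn=0.1, lr_val=0.1, temp=1.0):
    """REINFORCE-style in-place update scaled by the episode return `ret`.
    The two parameter sets get independent rules, so the state values can be
    kept and reused when only the dynamics parameters must be relearned."""
    n_actions = theta_dyn.shape[1]
    for s, a in episode:
        probs = policy(theta_dyn, v_value, s, temp)
        # d(log pi)/d(v): successor distribution of the taken action minus the
        # policy-weighted average of successor distributions over all actions.
        grad_v = dynamics(theta_dyn, s, a) - sum(
            probs[b] * dynamics(theta_dyn, s, b) for b in range(n_actions))
        # d(log pi)/d(theta_dyn[s, c]): softmax Jacobian of P(.|s,c) applied
        # to the value vector, weighted by (indicator(c == a) - pi(c|s)).
        for c in range(n_actions):
            pc = dynamics(theta_dyn, s, c)
            grad_u = pc * (v_value - pc @ v_value)
            coeff = (1.0 if c == a else 0.0) - probs[c]
            theta_dyn[s, c] += lr_dyn * ret * coeff * grad_u / temp
        v_value += lr_val * ret * grad_v / temp

# Toy usage: 5 states, 2 actions, one imagined episode with return +1.
rng = np.random.default_rng(0)
theta = rng.normal(scale=0.1, size=(5, 2, 5))
values = np.zeros(5)
update(theta, values, episode=[(0, 1), (2, 0)], ret=1.0)
print(policy(theta, values, s=0))

Because the state values enter the policy only through the expected-value term, v_value can be held fixed and reused while theta_dyn is relearned under different, possibly stochastic, dynamics; this is the kind of reuse the abstract claims.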

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 164-174
Number of pages: 11
Volume: 5351 LNAI
ISBN (Print): 354089196X, 9783540891963
DOI: 10.1007/978-3-540-89197-0_18
Publication status: Published - 2008
Event: 10th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2008 - Hanoi
Duration: 2008 Dec 15 - 2008 Dec 19

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5351 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 10th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2008
City: Hanoi
Period: 08/12/15 - 08/12/19

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Ishihara, S., & Igarashi, H. (2008). Behavior learning based on a policy gradient method: Separation of environmental dynamics and state values in policies. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5351 LNAI, pp. 164-174). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5351 LNAI). https://doi.org/10.1007/978-3-540-89197-0_18

@inproceedings{b95989aa7bb24ee69d5c6530bc5018b7,
title = "Behavior learning based on a policy gradient method: Separation of environmental dynamics and state values in policies",
abstract = "Policy gradient methods are very useful approaches in reinforcement learning. In our policy gradient approach to behavior learning of agents, we define an agent's decision problem at each time step as a problem of minimizing an objective function. In this paper, we give an objective function that consists of two types of parameters representing environmental dynamics and state-value functions. We derive separate learning rules for the two types of parameters so that the two sets of parameters can be learned independently. Separating these two types of parameters will make it possible to reuse state-value functions for agents in other different environmental dynamics, even if the dynamics is stochastic. Our simulation experiments on learning hunter-agent policies in pursuit problems show the effectiveness of our method.",
author = "Seiji Ishihara and Harukazu Igarashi",
year = "2008",
doi = "10.1007/978-3-540-89197-0_18",
language = "English",
isbn = "354089196X",
volume = "5351 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "164--174",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
}
