Behavior learning based on a policy gradient method: Separation of environmental dynamics and state values in policies

Seiji Ishihara, Harukazu Igarashi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Policy gradient methods are useful approaches in reinforcement learning. In our policy gradient approach to behavior learning of agents, we define an agent's decision problem at each time step as a problem of minimizing an objective function. In this paper, we give an objective function that consists of two types of parameters, one representing environmental dynamics and the other representing state-value functions. We derive separate learning rules for the two types of parameters so that they can be learned independently. Separating the two types of parameters makes it possible to reuse state-value functions for agents in environments with different dynamics, even if the dynamics are stochastic. Our simulation experiments on learning hunter-agent policies in pursuit problems show the effectiveness of our method.
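The abstract does not state the functional form of the objective, so the following is only a minimal sketch under stated assumptions, not the authors' published equations. It assumes a tabular setting, a hypothetical objective E(a; s) = -sum over s' of dyn[s][a][s'] * values[s'], and a Boltzmann (softmax) policy over -E/T. All function and parameter names are illustrative. The point it tries to show is that the gradient of log pi splits cleanly into one term for the dynamics parameters and one for the state values, so each set can be updated on its own.

```python
import math

def objective(dyn, values, state, action):
    """Hypothetical objective: negative expected next-state value (assumed form)."""
    return -sum(p * values[s2] for s2, p in enumerate(dyn[state][action]))

def boltzmann_policy(dyn, values, state, temperature=1.0):
    """Action probabilities proportional to exp(-E/T); lower objective is preferred."""
    energies = [objective(dyn, values, state, a) for a in range(len(dyn[state]))]
    e_min = min(energies)  # shift for numerical stability
    weights = [math.exp(-(e - e_min) / temperature) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def grad_log_policy_values(dyn, values, state, action, temperature=1.0):
    """d log pi(action|state) / d values[s2], with the dynamics table held fixed."""
    pi = boltzmann_policy(dyn, values, state, temperature)
    grads = []
    for s2 in range(len(values)):
        expected = sum(pi[b] * dyn[state][b][s2] for b in range(len(pi)))
        grads.append((dyn[state][action][s2] - expected) / temperature)
    return grads

def grad_log_policy_dynamics(dyn, values, state, action, temperature=1.0):
    """d log pi(action|state) / d dyn[state][b][s2], with the values held fixed.

    Treats each dyn entry as a free parameter (an assumption of this sketch).
    """
    pi = boltzmann_policy(dyn, values, state, temperature)
    return [
        [((1.0 if b == action else 0.0) - pi[b]) * values[s2] / temperature
         for s2 in range(len(values))]
        for b in range(len(pi))
    ]
```

Under these assumptions, a `values` table learned with one `dyn` table can be passed unchanged to `boltzmann_policy` together with a different `dyn` table, which is the kind of reuse the abstract describes.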

Original language: English
Title of host publication: PRICAI 2008
Subtitle of host publication: Trends in Artificial Intelligence - 10th Pacific Rim International Conference on Artificial Intelligence, Proceedings
Pages: 164-174
Number of pages: 11
DOIs
Publication status: Published - 2008 Dec 1
Event: 10th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2008 - Hanoi, Viet Nam
Duration: 2008 Dec 15 → 2008 Dec 19

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5351 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 10th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2008
Country/Territory: Viet Nam
City: Hanoi
Period: 08/12/15 → 08/12/19

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
