Behavior learning based on a policy gradient method: Separation of environmental dynamics and state-values in policies

Seiji Ishihara, Harukazu Igarashi

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Policy gradient methods are useful approaches to reinforcement learning. By applying such a method to behavior learning, each decision problem at a different time step can be treated as a problem of minimizing an objective function. In this paper, we define an objective function that consists of two types of parameters, which represent state-values and environmental dynamics. To separate the learning of the state-values from that of the environmental dynamics, we also give a separate learning rule for each type of parameter. Furthermore, we show that the same set of state-values can be reused under different environmental dynamics.
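
As a rough illustration of the idea in the abstract, the sketch below (Python) parameterizes a policy by two separate sets of parameters, state-values V and dynamics logits D, and updates each with its own learning rule. This is a minimal sketch under assumptions introduced here, not the authors' formulation: the softmax policy over expected next-state values, the TD-style value update, and the log-likelihood dynamics update are all illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 3

# Two separate parameter sets, mirroring the paper's separation idea:
V = np.zeros(n_states)                                      # state-values
D = rng.normal(0.0, 0.1, (n_states, n_actions, n_states))   # dynamics logits (assumed parameterization)

def transition_probs(s, a):
    # Learned model P(s'|s,a) as a softmax over the dynamics logits.
    z = np.exp(D[s, a] - D[s, a].max())
    return z / z.sum()

def policy(s, temperature=1.0):
    # Softmax policy whose action preferences are expected next-state values.
    prefs = np.array([transition_probs(s, a) @ V for a in range(n_actions)])
    z = np.exp((prefs - prefs.max()) / temperature)
    return z / z.sum()

def update(s, a, s_next, reward, alpha_v=0.1, alpha_d=0.05, gamma=0.9):
    # Learning rule 1: TD(0)-style update that touches only the state-values V.
    V[s] += alpha_v * (reward + gamma * V[s_next] - V[s])
    # Learning rule 2: log-likelihood gradient that touches only the dynamics D.
    p = transition_probs(s, a)
    grad = -p
    grad[s_next] += 1.0          # gradient of log P(s_next | s, a) w.r.t. D[s, a]
    D[s, a] += alpha_d * grad

# One simulated interaction. In practice s_next and reward come from the
# environment; the learned model stands in here to keep the sketch self-contained.
s = 0
a = rng.choice(n_actions, p=policy(s))
s_next = rng.choice(n_states, p=transition_probs(s, a))
update(s, a, s_next, reward=1.0)

Because the two update rules never touch each other's parameters, a V learned under one environment can be held fixed while D alone is re-learned under new dynamics, which corresponds to the reuse property the abstract describes.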

Original language: English
Journal: IEEJ Transactions on Electronics, Information and Systems
Volume: 129
Issue number: 9
DOIs: 10.1541/ieejeiss.129.1737
Publication status: Published - 2009

Keywords

  • Policy gradient method
  • Pursuit problem
  • Reinforcement learning
  • State transition probabilities

ASJC Scopus subject areas

  • Electrical and Electronic Engineering

Cite this

@article{2072aa6ba45d4ee1bd1bf9581344db5b,
title = "Behavior learning based on a policy gradient method: Separation of environmental dynamics and state-values in policies",
abstract = "Policy gradient methods are useful approaches to reinforcement learning. By applying such a method to behavior learning, each decision problem at a different time step can be treated as a problem of minimizing an objective function. In this paper, we define an objective function that consists of two types of parameters, which represent state-values and environmental dynamics. To separate the learning of the state-values from that of the environmental dynamics, we also give a separate learning rule for each type of parameter. Furthermore, we show that the same set of state-values can be reused under different environmental dynamics.",
keywords = "Policy gradient method, Pursuit problem, Reinforcement learning, State transition probabilities",
author = "Seiji Ishihara and Harukazu Igarashi",
year = "2009",
doi = "10.1541/ieejeiss.129.1737",
language = "English",
volume = "129",
journal = "IEEJ Transactions on Electronics, Information and Systems",
issn = "0385-4221",
publisher = "The Institute of Electrical Engineers of Japan",
number = "9",
}

TY - JOUR
T1 - Behavior learning based on a policy gradient method
T2 - Separation of environmental dynamics and state-values in policies
AU - Ishihara, Seiji
AU - Igarashi, Harukazu
PY - 2009
Y1 - 2009
N2 - Policy gradient methods are useful approaches to reinforcement learning. By applying such a method to behavior learning, each decision problem at a different time step can be treated as a problem of minimizing an objective function. In this paper, we define an objective function that consists of two types of parameters, which represent state-values and environmental dynamics. To separate the learning of the state-values from that of the environmental dynamics, we also give a separate learning rule for each type of parameter. Furthermore, we show that the same set of state-values can be reused under different environmental dynamics.
AB - Policy gradient methods are useful approaches to reinforcement learning. By applying such a method to behavior learning, each decision problem at a different time step can be treated as a problem of minimizing an objective function. In this paper, we define an objective function that consists of two types of parameters, which represent state-values and environmental dynamics. To separate the learning of the state-values from that of the environmental dynamics, we also give a separate learning rule for each type of parameter. Furthermore, we show that the same set of state-values can be reused under different environmental dynamics.
KW - Policy gradient method
KW - Pursuit problem
KW - Reinforcement learning
KW - State transition probabilities
UR - http://www.scopus.com/inward/record.url?scp=70350148263&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70350148263&partnerID=8YFLogxK
U2 - 10.1541/ieejeiss.129.1737
DO - 10.1541/ieejeiss.129.1737
M3 - Article
AN - SCOPUS:70350148263
VL - 129
JO - IEEJ Transactions on Electronics, Information and Systems
JF - IEEJ Transactions on Electronics, Information and Systems
SN - 0385-4221
IS - 9
ER -