Path planning of a mobile robot as a discrete optimization problem and adjustment of weight parameters in the objective function by reinforcement learning

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In a previous paper, we proposed a solution to the path-planning problem of a mobile robot. In our approach, we formulated the problem as a discrete optimization problem at each time step. To solve the optimization problem, we used an objective function consisting of a goal term, a smoothness term, and a collision term. This paper presents a theoretical method that uses reinforcement learning to adjust the weight parameters in the objective function. The conventional Q-learning method cannot be applied to a non-Markov decision process, so we instead applied Williams's learning algorithm, REINFORCE, to derive an update rule for the weight parameters. REINFORCE is a stochastic hill-climbing method that maximizes a value function. We verified the update rule by experiment.
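The approach described in the abstract might be sketched as follows: candidate motions are scored by a weighted sum of goal, smoothness, and collision costs, and the weights are adjusted by a REINFORCE-style stochastic gradient update. This is an illustrative sketch only; the softmax policy form, the candidate costs, and the scalar reward signal are assumptions for demonstration, not the paper's actual formulation.

```python
# Hedged sketch (not the paper's code): weighted discrete objective
# plus a REINFORCE-style update of the weight parameters.
import math

def objective(terms, w):
    """Weighted sum of (goal, smoothness, collision) costs for one candidate."""
    return sum(wi * ti for wi, ti in zip(w, terms))

def softmax_policy(candidates, w, temperature=1.0):
    """Stochastic policy: lower objective value -> higher selection probability."""
    scores = [-objective(t, w) / temperature for t in candidates]
    m = max(scores)                       # subtract max for numerical stability
    exp_s = [math.exp(s - m) for s in scores]
    z = sum(exp_s)
    return [e / z for e in exp_s]

def reinforce_update(w, candidates, chosen, reward, alpha=0.01):
    """REINFORCE: w_k += alpha * reward * d(log pi(chosen)) / d(w_k).
    For this softmax policy the gradient is
      d(log pi(a)) / d(w_k) = -t_k(a) + sum_b pi(b) * t_k(b)."""
    probs = softmax_policy(candidates, w)
    new_w = []
    for k, wk in enumerate(w):
        expected_tk = sum(p * c[k] for p, c in zip(probs, candidates))
        grad_k = -candidates[chosen][k] + expected_tk
        new_w.append(wk + alpha * reward * grad_k)
    return new_w

# Hypothetical example: three candidate motions with
# (goal, smoothness, collision) costs at one time step.
candidates = [(0.2, 0.5, 0.0), (0.8, 0.1, 0.0), (0.1, 0.9, 1.0)]
w = [1.0, 1.0, 1.0]
probs = softmax_policy(candidates, w)
chosen = probs.index(max(probs))          # greedy pick, for illustration
w = reinforce_update(w, candidates, chosen, reward=1.0)
```

In an actual robot loop, the reward would come from task outcomes (e.g. reaching the goal without collision), and the stochastic policy is what makes the log-likelihood gradient of REINFORCE well defined even without the Markov property that Q-learning requires.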

Original language: English
Title of host publication: RoboCup 2000
Subtitle of host publication: Robot Soccer World Cup IV
Editors: Peter Stone, Tucker Balch, Gerhard Kraetzschmar
Publisher: Springer Verlag
Pages: 315-320
Number of pages: 6
ISBN (Print): 3540421858, 9783540421856
Publication status: Published - 2001
Event: 4th Robot World Cup Soccer Games and Conferences, RoboCup 2000 - Melbourne, VIC, Australia
Duration: 2000 Aug 27 – 2000 Sep 3

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2019 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 4th Robot World Cup Soccer Games and Conferences, RoboCup 2000
Country: Australia
City: Melbourne, VIC
Period: 00/8/27 – 00/9/3

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

