Learning of soccer player agents using a policy gradient method

Coordination between kicker and receiver during free kicks

Harukazu Igarashi, K. Nakamura, S. Ishihara

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

The RoboCup Simulation League is recognized as a test bed for research on multi-agent learning. As an example of multi-agent learning in a soccer game, we dealt with a learning problem between a kicker and a receiver when a direct free kick is awarded just outside the opponent's penalty area. In such a situation, to which point should the kicker kick the ball? We propose a function that expresses heuristics for evaluating an advantageous target point for safely sending/receiving a pass and scoring. The heuristic includes an interaction term between a kicker and a receiver to intensify their coordination. To calculate the interaction term, we let kicker/receiver agents have a receiver/kicker action decision model to predict their teammate's action. The evaluation function makes it possible to handle a large state space consisting of the positions of a kicker, a receiver, and their opponents. The target point of the free kick is selected by the kicker using Boltzmann selection with an evaluation function. Parameters in the function can be learned by a kind of reinforcement learning called the policy gradient method. The point to which a receiver should run to receive the ball is learned simultaneously in the same manner. Experiments demonstrated the effectiveness of our solution.
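The selection-and-learning scheme the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two-dimensional feature vectors, the reward, and the learning rate are invented for the example, the evaluation function is assumed linear in its parameters, and the kicker-receiver interaction term is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def boltzmann_probs(scores, temperature=1.0):
    """Boltzmann (softmax) selection probabilities over candidate target points."""
    z = np.asarray(scores, dtype=float) / temperature
    z -= z.max()                      # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

def select_target(features, theta, temperature=1.0, rng=rng):
    """Pick a target-point index by Boltzmann selection on a linear
    evaluation function E(a) = theta . f(a)."""
    scores = features @ theta
    probs = boltzmann_probs(scores, temperature)
    return rng.choice(len(probs), p=probs), probs

def policy_gradient_update(theta, features, action, probs, reward, lr=0.1):
    """REINFORCE-style update: theta += lr * reward * grad log pi(a).
    For a linear softmax policy, grad log pi(a) = f(a) - E_pi[f]."""
    grad_log_pi = features[action] - probs @ features
    return theta + lr * reward * grad_log_pi

# Toy episode: 3 candidate target points, 2 hand-made features each.
features = np.array([[1.0, 0.2],
                     [0.5, 0.9],
                     [0.1, 0.4]])
theta = np.zeros(2)                   # evaluation-function parameters
a, probs = select_target(features, theta)
theta = policy_gradient_update(theta, features, a, probs, reward=1.0)
```

The receiver's run-to point would be learned with the same update rule on its own parameter vector; in the paper the two agents are additionally coupled through the interaction term and mutual action-prediction models, which this sketch leaves out.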

Original language: English
Title of host publication: Proceedings of the International Joint Conference on Neural Networks
Pages: 46-52
Number of pages: 7
DOIs: 10.1109/IJCNN.2008.4633765
Publication status: Published - 2008
Externally published: Yes
Event: 2008 International Joint Conference on Neural Networks, IJCNN 2008 - Hong Kong
Duration: 2008 Jun 1 - 2008 Jun 8

Other

Other: 2008 International Joint Conference on Neural Networks, IJCNN 2008
City: Hong Kong
Period: 08/6/1 - 08/6/8

Fingerprint

  • Gradient methods
  • Function evaluation
  • Reinforcement learning
  • Experiments

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence

Cite this

Igarashi, H., Nakamura, K., & Ishihara, S. (2008). Learning of soccer player agents using a policy gradient method: Coordination between kicker and receiver during free kicks. In Proceedings of the International Joint Conference on Neural Networks (pp. 46-52). [4633765] https://doi.org/10.1109/IJCNN.2008.4633765

@inproceedings{5f2651a9df824ef79764744d35433412,
title = "Learning of soccer player agents using a policy gradient method: Coordination between kicker and receiver during free kicks",
abstract = "The RoboCup Simulation League is recognized as a test bed for research on multi-agent learning. As an example of multi-agent learning in a soccer game, we dealt with a learning problem between a kicker and a receiver when a direct free kick is awarded just outside the opponent's penalty area. In such a situation, to which point should the kicker kick the ball? We propose a function that expresses heuristics to evaluate an advantageous target point for safely sending/receiving a pass and scoring. The heuristics includes an interaction term between a kicker and a receiver to intensify their coordination. To calculate the interaction term, we let kicker/receiver agents have a receiver/kicker action decision model to predict his teammate's action. The evaluation function makes it possible to handle a large space of states consisting of the positions of a kicker, a receiver, and their opponents. The target point of the free kick is selected by the kicker using Boltzmann selection with an evaluation function. Parameters in the function can be learned by a kind of reinforcement learning called the policy gradient method. The point to which a receiver should run to receive the ball is simultaneously learned in the same manner. The effectiveness of our solution was shown by experiments.",
author = "Harukazu Igarashi and K. Nakamura and S. Ishihara",
year = "2008",
doi = "10.1109/IJCNN.2008.4633765",
language = "English",
isbn = "9781424418213",
pages = "46--52",
booktitle = "Proceedings of the International Joint Conference on Neural Networks",

}
