Coordinated Behavior for Sequential Cooperative Task Using Two-Stage Reward Assignment with Decay

Yuki Miyashita*, Toshiharu Sugawara

*Corresponding author for this work

Research output

Abstract

Recently, multi-agent deep reinforcement learning (MADRL) has been studied as a way to learn actions that achieve complicated tasks and to generate the coordination structure among agents. Reward assignment in MADRL is a crucial factor in guiding agents, through their individual learning, toward both behaviors for their own tasks and coordinated behaviors. However, the effect of reward assignment in MADRL on learned coordinated behavior has not been sufficiently clarified. To address this issue, using a sequential task, the coordinated delivery and execution problem with expiration time, we analyze the effect of various ratios of the reward given for the task that an agent is responsible for to the reward given for the whole task. Then, we propose a two-stage reward assignment with decay to learn both the actions for tasks that the agent is responsible for and coordinated actions that facilitate other agents’ tasks. We experimentally show that the proposed method enables agents to learn both kinds of actions in a balanced manner, so that they can realize effective coordination by reducing the number of tasks ignored by other agents. We also analyze the mechanism behind the emergence of different coordinated behaviors.

Original language: English
Title of host publication: Neural Information Processing - 27th International Conference, ICONIP 2020, Proceedings
Editors: Haiqin Yang, Kitsuchart Pasupa, Andrew Chi-Sing Leung, James T. Kwok, Jonathan H. Chan, Irwin King
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 257-269
Number of pages: 13
ISBN (Print): 9783030638320
Publication status: Published - 2020
Event: 27th International Conference on Neural Information Processing, ICONIP 2020 - Bangkok, Thailand
Duration: 18 Nov 2020 to 22 Nov 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12533 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 27th International Conference on Neural Information Processing, ICONIP 2020
Country/Territory: Thailand
City: Bangkok
Period: 20/11/18 to 20/11/22

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
