Shifting Reward Assignment for Learning Coordinated Behavior in Time-Limited Ordered Tasks

Yoshihiro Oguni*, Yuki Miyashita, Toshiharu Sugawara

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We propose a variable reward scheme for decentralized multi-agent deep reinforcement learning on a sequential task that consists of several subtasks and is completed only when agents with different capabilities execute all subtasks in a given order before a deadline. Developments in computer science and robotics are drawing attention to multi-agent systems for complex tasks. However, coordinated behavior among agents requires sophistication and depends heavily on the structures of tasks and environments; thus, it is preferable for agents to individually learn appropriate coordination for each specific task. This study focuses on the learning of a sequential task by cooperative agents from a practical perspective. In such tasks, agents must learn both to execute their own subtasks efficiently and to coordinate with other agents, because the former provides more chances for the subsequent agents to learn, while the latter facilitates the execution of subsequent subtasks. Our proposed reward scheme enables agents to learn these behaviors in a balanced manner. We then show experimentally that agents using the proposed reward scheme execute tasks more efficiently than those using baseline methods based on static reward schemes. We also analyze the learned coordinated behavior to identify the reasons for this efficiency.

Original language: English
Title of host publication: Advances in Practical Applications of Agents, Multi-Agent Systems, and Complex Systems Simulation. The PAAMS Collection - 20th International Conference, PAAMS 2022, Proceedings
Editors: Frank Dignum, Philippe Mathieu, Juan Manuel Corchado, Fernando De La Prieta
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 294-306
Number of pages: 13
ISBN (Print): 9783031181917
Publication status: Published - 2022
Event: 20th International Conference on Practical Applications of Agents and Multi-Agent Systems, PAAMS 2022 - L'Aquila, Italy
Duration: 2022 Jul 13 → 2022 Jul 15

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13616 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 20th International Conference on Practical Applications of Agents and Multi-Agent Systems, PAAMS 2022
Country/Territory: Italy
City: L'Aquila
Period: 22/7/13 → 22/7/15

Keywords

  • Multi-agent reinforcement learning
  • Sequential tasks
  • Variable reward scheme

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
