TY - GEN
T1 - Shifting Reward Assignment for Learning Coordinated Behavior in Time-Limited Ordered Tasks
AU - Oguni, Yoshihiro
AU - Miyashita, Yuki
AU - Sugawara, Toshiharu
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - We propose a variable reward scheme for decentralized multi-agent deep reinforcement learning in a sequential task consisting of a number of subtasks, which is completed only when agents with different capabilities execute all subtasks in a certain order before a deadline. Developments in computer science and robotics are drawing attention to multi-agent systems for complex tasks. However, coordinated behavior among agents requires sophistication and depends strongly on the structures of tasks and environments; thus, it is preferable for agents to individually learn coordination appropriate to their specific tasks. This study focuses on the learning of a sequential task by cooperative agents from a practical perspective. In such tasks, agents must learn both efficiency in their own subtasks and coordinated behavior toward other agents, because the former provides more chances for subsequent agents to learn, while the latter facilitates the execution of subsequent subtasks. Our proposed reward scheme enables agents to learn these behaviors in a balanced manner. We experimentally show that agents using the proposed reward scheme achieve more efficient task execution than baseline methods based on static reward schemes. We also analyze the learned coordinated behavior to identify the sources of this efficiency.
AB - We propose a variable reward scheme for decentralized multi-agent deep reinforcement learning in a sequential task consisting of a number of subtasks, which is completed only when agents with different capabilities execute all subtasks in a certain order before a deadline. Developments in computer science and robotics are drawing attention to multi-agent systems for complex tasks. However, coordinated behavior among agents requires sophistication and depends strongly on the structures of tasks and environments; thus, it is preferable for agents to individually learn coordination appropriate to their specific tasks. This study focuses on the learning of a sequential task by cooperative agents from a practical perspective. In such tasks, agents must learn both efficiency in their own subtasks and coordinated behavior toward other agents, because the former provides more chances for subsequent agents to learn, while the latter facilitates the execution of subsequent subtasks. Our proposed reward scheme enables agents to learn these behaviors in a balanced manner. We experimentally show that agents using the proposed reward scheme achieve more efficient task execution than baseline methods based on static reward schemes. We also analyze the learned coordinated behavior to identify the sources of this efficiency.
KW - Multi-agent reinforcement learning
KW - Sequential tasks
KW - Variable reward scheme
UR - http://www.scopus.com/inward/record.url?scp=85141818421&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85141818421&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-18192-4_24
DO - 10.1007/978-3-031-18192-4_24
M3 - Conference contribution
AN - SCOPUS:85141818421
SN - 9783031181917
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 294
EP - 306
BT - Advances in Practical Applications of Agents, Multi-Agent Systems, and Complex Systems Simulation. The PAAMS Collection - 20th International Conference, PAAMS 2022, Proceedings
A2 - Dignum, Frank
A2 - Mathieu, Philippe
A2 - Corchado, Juan Manuel
A2 - De La Prieta, Fernando
PB - Springer Science and Business Media Deutschland GmbH
T2 - 20th International Conference on Practical Applications of Agents and Multi-Agent Systems, PAAMS 2022
Y2 - 13 July 2022 through 15 July 2022
ER -