TY - GEN
T1 - Coordination in adversarial multi-agent with deep reinforcement learning under partial observability
AU - Diallo, Elhadji Amadou Oury
AU - Sugawara, Toshiharu
N1 - Funding Information:
This work is partly supported by JSPS KAKENHI Grant Number 17KT0044.
PY - 2019/11
Y1 - 2019/11
AB - We propose a method using several variants of the deep Q-network (DQN) for learning strategic formations in large-scale adversarial multi-agent systems. The goal is to learn how to maximize the probability of acting jointly in as coordinated a manner as possible. Our method, the centralized training and decentralized testing (CTDT) framework, models the training phase as a POMDP and the testing phase as a dec-POMDP. During training, the inputs to the centralized neural network are the collected local observations of the agents on the same team. Although each agent knows only its own action, the centralized network decides the joint action and then distributes the individual actions to the agents. During testing, however, each agent uses a copy of the centralized network and independently decides its action based on its own policy and local view. We show that deep reinforcement learning techniques using the CTDT framework can converge and generate several strategic group formations in large-scale multi-agent systems. We also compare the results obtained with the CTDT framework to those of a centralized shared DQN and investigate the characteristics of the learned behaviors.
KW - Coordination and cooperation
KW - Dec-POMDP
KW - Deep reinforcement learning
KW - Multi-agent learning
KW - Multi-agent systems
UR - http://www.scopus.com/inward/record.url?scp=85081087221&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85081087221&partnerID=8YFLogxK
U2 - 10.1109/ICTAI.2019.00036
DO - 10.1109/ICTAI.2019.00036
M3 - Conference contribution
AN - SCOPUS:85081087221
T3 - Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
SP - 198
EP - 205
BT - Proceedings - IEEE 31st International Conference on Tools with Artificial Intelligence, ICTAI 2019
PB - IEEE Computer Society
T2 - 31st IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2019
Y2 - 4 November 2019 through 6 November 2019
ER -