We propose an interpretable neural network architecture for multi-agent deep reinforcement learning to understand the rationale behind the learned cooperative behavior of agents. Although deep learning has contributed significantly to building coordination among agents in multi-agent systems, it remains unclear what information the agents rely on to behave cooperatively. Removing this ambiguity may further improve the efficiency and productivity of multi-agent systems. The main idea of our proposal is to incorporate a transformer encoder into the deep Q-network (DQN) to address this issue. We propose the multi-agent transformer deep Q-network (MAT-DQN) and show that, despite being trained individually on a cooperative patrolling task, agents with the attention mechanism coordinate better with other agents and thus outperform agents using a vanilla DQN (the baseline method). Furthermore, by extracting the multi-head attention weights from the transformer encoder, we can visualize attention heatmaps that reveal which input information influences the agents' decision-making during their cooperative behavior.
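To make the architectural idea concrete, the following is a minimal sketch, assuming a PyTorch implementation, of a transformer-style Q-network that returns per-head attention weights alongside Q-values for heatmap visualization; the class and parameter names (MATDQN, obs_dim, num_entities, and so on) are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch (not the authors' implementation) of a transformer-based
# Q-network that exposes per-head attention weights for visualization.
import torch
import torch.nn as nn

class MATDQN(nn.Module):
    def __init__(self, obs_dim, embed_dim, num_heads, num_actions):
        super().__init__()
        self.embed = nn.Linear(obs_dim, embed_dim)            # per-entity token embedding
        self.attn = nn.MultiheadAttention(embed_dim, num_heads,
                                          batch_first=True)   # transformer-style self-attention
        self.norm = nn.LayerNorm(embed_dim)
        self.ff = nn.Sequential(nn.Linear(embed_dim, embed_dim), nn.ReLU())
        self.q_head = nn.Linear(embed_dim, num_actions)       # Q-value per action

    def forward(self, obs):
        # obs: (batch, num_entities, obs_dim) -- one token per observed entity
        x = self.embed(obs)
        # average_attn_weights=False keeps heads separate:
        # attn_weights has shape (batch, num_heads, tokens, tokens)
        attended, attn_weights = self.attn(x, x, x, need_weights=True,
                                           average_attn_weights=False)
        x = self.norm(x + attended)                           # residual + layer norm
        x = self.ff(x)
        q_values = self.q_head(x.mean(dim=1))                 # pool tokens, predict Q(s, a)
        return q_values, attn_weights                         # weights feed the heatmaps

# Usage: render attn_weights as a heatmap to inspect which inputs
# drive the agent's action choice.
net = MATDQN(obs_dim=8, embed_dim=32, num_heads=4, num_actions=5)
q, w = net(torch.randn(2, 6, 8))  # batch of 2, 6 observed entities, 8 features each
```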