This paper proposes a method to improve policies trained with multi-agent deep reinforcement learning by adding a policy advisory module (PAM) in the testing phase to alleviate the exploration hindrance problem. Cooperation and coordination are central issues in the study of multi-agent systems, but policies learned in slightly different contexts may lead to ineffective behavior that reduces the quality of cooperation. For example, in a disaster rescue scenario, agents with different functions must work cooperatively while also avoiding collisions. In the early stages, all agents work effectively, but as time passes and only a few tasks remain, agents tend to focus more on avoiding the negative rewards caused by collisions, and this avoidance behavior may hinder cooperative actions. To address this problem, we propose a PAM that navigates agents in the testing phase to improve performance. Using an example disaster rescue problem, we investigated whether the PAM could improve overall performance by comparing cases with and without it. Our experimental results show that the PAM breaks the exploration hindrance problem and improves overall performance by navigating the trained agents.