This paper investigates how opinions become polarized by simulating opinion formation with Q-learning on multiplex networks. People sometimes change their opinions to conform to those around them in their communities, yet opinions may still become polarized. Many studies, including agent-based simulations, have examined the mechanism of opinion polarization, but most of these simulations assume that each person belongs to a single community. A few studies have considered multiple communities, but they typically used only simple opinion formation rules, so further work is needed. In this paper, we propose an opinion formation model on multiplex networks in which agents use Q-learning to identify better individual opinions, and we analyze how opinions polarize or reach consensus across various network structures. Our experiments indicate that opinions are more likely to reach consensus on multiplex networks than on single-layer networks. They also suggest that opinions polarize more readily when the network's clustering coefficient is high and its characteristic path length is long.
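The abstract does not specify the agents' states, actions, or reward, so the following is only a minimal sketch of what Q-learning opinion formation on a multiplex network could look like, not the authors' model. Everything in it is an assumption for illustration: binary opinions, random (Erdos-Renyi style) layers over the same set of agents, a stateless Q-learning update (discount factor 0, effectively a multi-armed bandit), epsilon-greedy action selection, and a reward equal to the fraction of neighbours who share the chosen opinion, averaged over layers. The names `random_layer`, `reward`, `simulate`, and `polarization` are hypothetical.

```python
import random
from collections import defaultdict

OPINIONS = (0, 1)  # hypothetical binary opinion space


def random_layer(n, p, rng):
    """Hypothetical Erdos-Renyi layer: link each node pair with probability p."""
    adj = defaultdict(set)
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def reward(agent, opinion, layers, opinions):
    """Assumed reward: share of neighbours agreeing, averaged over non-empty layers."""
    scores = []
    for adj in layers:
        nbrs = adj[agent]
        if nbrs:
            scores.append(sum(opinions[j] == opinion for j in nbrs) / len(nbrs))
    return sum(scores) / len(scores) if scores else 0.0


def simulate(n=50, n_layers=2, p=0.1, steps=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Run asynchronous stateless Q-learning; each agent keeps one Q-value per opinion."""
    rng = random.Random(seed)
    # Every layer connects the same agents, which is what makes the network multiplex.
    layers = [random_layer(n, p, rng) for _ in range(n_layers)]
    opinions = [rng.choice(OPINIONS) for _ in range(n)]
    q = [[0.0 for _ in OPINIONS] for _ in range(n)]
    for _ in range(steps):
        i = rng.randrange(n)  # pick one agent to update
        if rng.random() < epsilon:
            a = rng.choice(OPINIONS)  # explore
        else:
            best = max(q[i])  # exploit, breaking ties at random
            a = rng.choice([o for o in OPINIONS if q[i][o] == best])
        r = reward(i, a, layers, opinions)
        q[i][a] += alpha * (r - q[i][a])  # Q-update with discount factor 0
        opinions[i] = a
    return opinions


def polarization(opinions):
    """1.0 when the population is split evenly, 0.0 at full consensus."""
    share = sum(opinions) / len(opinions)
    return 1.0 - abs(2 * share - 1)
```

For example, `polarization(simulate(n_layers=1))` and `polarization(simulate(n_layers=2))` compare a single-layer network with a two-layer one under these assumptions. Setting the discount factor to 0 keeps the sketch short; a full Q-learning formulation would define a state (for instance, the local opinion distribution) and bootstrap from the next state's value.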