We investigate the emergence and stability of social conventions for efficiently resolving conflicts through reinforcement learning. Facilitating coordination and conflict resolution is an important issue in multi-agent systems. However, performing coordination and negotiation activities is computationally expensive. In this paper, we first describe a conflict situation as a Markov game that is iterated whenever the agents fail to resolve their conflict, so that repeated failures result in an inefficient society. Using this game, we show that social conventions for resolving conflicts emerge, but that their stability and social efficiency depend on the payoff matrices that characterize the agents. We also examine how unbalanced populations and a small number of heterogeneous agents affect the efficiency and stability of the resulting conventions. Our results show that (a) a type of indecisive agent that is generous toward adverse results leads to unstable societies, and (b) selfish agents that have an explicit ordering of benefits make societies stable and efficient.