Vision plays an important role in motion planning for mobile robots that coexist with humans. Because camera-based pedestrian path prediction involves a trade-off between computation speed and accuracy, such methods are poorly suited to instantaneously detecting multiple people at a distance. In this study, we therefore present a method that assesses collision risk to select the avoidance target, using visual recognition of human action states and prediction of their transitions. The proposed system calculates a risk assessment score from the recognized human body direction, walking patterns with an object, and face orientation, together with the predicted transition of human action states. First, we validated each recognition model and confirmed that the proposed system can recognize and predict human actions with high accuracy for people 3 m ahead. Then, we compared the risk assessment scores with the results of video interviews in which participants were asked which person a mobile robot should pay attention to. We found that the proposed system, using vision alone, captures the features of human states that people attend to when avoiding collisions with others.