A two-stage calving sign detection system is proposed that makes effective use of state information on the calving behavior of Japanese Black beef cows (e.g., standing or sitting, tail raising). Automatic calving sign detection from cameras can help livestock farmers prevent fatal accidents to calves during calving. The following requirements were identified for such camera-based detection systems: 1) the ability to work with a small volume of data (because calving events are infrequent), 2) robustness to changing environments, and 3) the ability to explain the reasons for the prediction results. However, these requirements are difficult to satisfy with end-to-end approaches, such as prediction using a single deep neural network (DNN). This study presents a two-stage calving prediction system in which calving-relevant information obtained using a DNN-based feature extractor is used as the input for a second DNN-based calving sign detector. The first-stage DNN extracts discriminative features of typical pre-calving behavior in cows, such as increased lying time and tail raising. It is expected to achieve accurate feature extraction and thereby to enable training of the second-stage DNN using small-scale data. Furthermore, because the outputs of the first-stage DNN are states observable from the video frames, they lend themselves to crowdsourced annotation, which supports sustainable growth of the system; these states also provide the basis for accurate calving prediction. Experimental comparisons conducted using video scenes of five cows in the normal and pre-calving states demonstrated that the proposed system achieved a precision of 81% and a recall of 91% for calving.
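The two-stage structure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `FrameState`, `detect_calving_sign`, `toy_extractor`, and `toy_detector`, the two state fields, and the 0.5 threshold are all assumptions made for illustration, and both DNNs are replaced by trivial stand-in functions so that the example runs without any deep learning library.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class FrameState:
    """Interpretable per-frame states (hypothetical field names)."""
    standing: float     # probability that the cow is standing (vs. sitting)
    tail_raised: float  # probability that the tail is raised


# Stage 1: one video frame -> interpretable states (a DNN in the paper).
StateExtractor = Callable[[dict], FrameState]
# Stage 2: sequence of states -> calving-sign score in [0, 1] (a DNN in the paper).
SignDetector = Callable[[Sequence[FrameState]], float]


def detect_calving_sign(
    frames: Sequence[dict],
    extract: StateExtractor,
    detect: SignDetector,
    threshold: float = 0.5,  # assumed decision threshold, not from the paper
) -> Tuple[bool, float, List[FrameState]]:
    """Run stage 1 on every frame, then stage 2 on the resulting states.

    The intermediate states are returned as well, so that a positive
    decision can be traced back to the observable behavior behind it.
    """
    states = [extract(frame) for frame in frames]
    score = detect(states)
    return score >= threshold, score, states


def toy_extractor(frame: dict) -> FrameState:
    """Stand-in for the first-stage DNN: reads pre-labeled values."""
    return FrameState(standing=frame["standing"], tail_raised=frame["tail_raised"])


def toy_detector(states: Sequence[FrameState]) -> float:
    """Stand-in for the second-stage DNN: mean tail-raising probability."""
    if not states:
        return 0.0
    return sum(s.tail_raised for s in states) / len(states)


# Illustrative input only; these values are not data from the study.
frames = [
    {"standing": 1.0, "tail_raised": 1.0},
    {"standing": 1.0, "tail_raised": 1.0},
    {"standing": 0.0, "tail_raised": 0.0},
    {"standing": 1.0, "tail_raised": 1.0},
]
is_sign, score, states = detect_calving_sign(frames, toy_extractor, toy_detector)
```

Because the second stage consumes only a low-dimensional state sequence rather than raw pixels, it has far fewer inputs to learn from, which is the property the abstract relies on for training with small-scale data.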