This paper addresses the construction of a pattern recognition system that detects whether a pair of Japanese Black cattle captured in a given image region is engaged in a “mounting” action, a sign that is critically important for cattle farmers to detect before artificial insemination. The “mounting” action refers to behavior in which one cow bends over another, usually when either cow is in estrus. Although a pattern recognition-based approach to detecting this action would be valued for its low cost and robustness, it has not been discussed much, owing to the complexity of the system architecture, the unavailability of datasets, and other factors. This study presents i) our image dataset construction technique, which exploits both an object detection algorithm and crowdsourcing to collect cattle pair images labeled as either “mounting” or not; and ii) a system, developed on this dataset, for detecting the mounting action in any given image of a cattle pair. To obtain the dataset, we first applied an algorithm that extracts regions of cattle pairs from a video frame based on the intersection of single-cattle regions, and then designed a crowdsourcing microtask in which crowd workers were given simple guidelines for annotating the extracted regions with labels relevant to the mounting action. We also introduce our tandem-layered pattern recognition system trained with the dataset. The system comprises two serially connected machine learning components and detects mounting actions more robustly than a normal end-to-end neural network, even with a small amount of training data. Experimental comparisons demonstrated that our detection system was capable of detecting estrus with a precision of 80% and a recall of 76%.
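The pair-region extraction step described above can be illustrated with a minimal sketch. The abstract does not give the exact rule, so the following is an assumption rather than the authors' implementation: two single-cattle bounding boxes are treated as a candidate pair when they overlap, and the pair region is taken to be the smallest box enclosing both. The names `Box`, `intersection_area`, `pair_regions`, and the `min_overlap` parameter are hypothetical and introduced only for illustration.

```python
from itertools import combinations
from typing import List, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixel coordinates (assumed convention).
Box = Tuple[float, float, float, float]


def intersection_area(a: Box, b: Box) -> float:
    """Return the overlap area of two axis-aligned boxes (0 if they are disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0.0


def pair_regions(boxes: List[Box], min_overlap: float = 0.0) -> List[Box]:
    """Return one enclosing region per pair of boxes whose overlap exceeds min_overlap.

    The overlap test and the enclosing-box rule are illustrative assumptions,
    not details taken from the paper.
    """
    regions = []
    for a, b in combinations(boxes, 2):
        if intersection_area(a, b) > min_overlap:
            # Smallest axis-aligned box that contains both single-cattle boxes.
            regions.append(
                (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
            )
    return regions


# Hypothetical detections for one video frame: two overlapping cattle, one isolated.
detections = [(0, 0, 10, 10), (5, 5, 15, 15), (100, 100, 110, 110)]
candidate_pairs = pair_regions(detections)
```

In this sketch only the first two detections overlap, so a single candidate region covering both is produced; such regions would then be the images passed to crowd workers for labeling.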