The present study proposes a method for detecting objects with a high recall rate for human-supported video annotation. In recent years, automatic annotation techniques such as object detection and tracking have become more powerful; however, detection and tracking of occluded objects, small objects, and blurred objects are still difficult. In order to annotate such objects, manual annotation is inevitably required. For this reason, we envision a human-supported video annotation framework in which over-detected objects (i.e., false positives) are allowed to minimize oversight (i.e., false negatives) in automatic annotation and then the over-detected objects are removed manually. This study attempts to achieve human-in-the-loop object detection with an emphasis on suppressing the oversight for the former stage of processing in the aforementioned annotation framework: bi-directional deep SORT is proposed to reliably capture missed objects and annotation-free segment identification (AFSID) is proposed to identify video frames in which manual annotation is not required. These methods are reinforced each other, yielding an increase in the detection rate while reducing the burden of human intervention. Experimental comparisons using a pedestrian video dataset demonstrated that bi-directional deep SORT with AFSID was successful in capturing object candidates with a higher recall rate over the existing deep SORT while reducing the cost of manpower compared to manual annotation at regular intervals.