Local feature extraction is an important technique for video analysis. The common framework for local feature extraction consists of a local keypoint detector and a keypoint descriptor. Existing keypoint detectors mainly focus on the spatial relationships among pixels, resulting in a large number of redundant keypoints on the background, which are often temporally stationary. This paper proposes an object-aware local keypoint selection approach that keeps the active keypoints on objects and reduces the redundant keypoints on the background by exploiting the temporal coherence among successive video frames. The proposed approach is built on three local temporal coherence criteria: (1) local temporal intensity coherence; (2) local temporal motion coherence; and (3) local temporal orientation coherence. Experimental results on two publicly available datasets show that the proposed approach removes more than 60% of the keypoints, which are redundant, and doubles keypoint precision.
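
To make the idea of temporal selection concrete, the following is a minimal sketch of one way an intensity-based temporal filter could work. It is an illustration under stated assumptions, not the method proposed in this paper: the abstract does not define the three criteria, so the change measure (mean absolute intensity difference over a square patch), the function names `patch_change` and `select_active_keypoints`, and the `radius` and `threshold` parameters are all hypothetical. The sketch discards keypoints whose neighbourhood barely changes between two successive grayscale frames, on the assumption that such keypoints lie on a stationary background.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[float]]  # grayscale image, indexed as frame[y][x]
Keypoint = Tuple[int, int]         # (x, y) pixel position


def patch_change(prev: Frame, curr: Frame, kp: Keypoint, radius: int = 1) -> float:
    """Mean absolute intensity difference in a square patch around kp.

    Hypothetical change measure; the patch is clipped at the image border.
    """
    x, y = kp
    height, width = len(curr), len(curr[0])
    total, count = 0.0, 0
    for yy in range(max(0, y - radius), min(height, y + radius + 1)):
        for xx in range(max(0, x - radius), min(width, x + radius + 1)):
            total += abs(curr[yy][xx] - prev[yy][xx])
            count += 1
    # count is 0 only if the keypoint lies entirely outside the frame.
    return total / count if count else 0.0


def select_active_keypoints(
    prev: Frame,
    curr: Frame,
    keypoints: Sequence[Keypoint],
    radius: int = 1,
    threshold: float = 10.0,  # assumed value, would need tuning per dataset
) -> List[Keypoint]:
    """Keep keypoints whose local patch changed by more than threshold."""
    return [
        kp for kp in keypoints
        if patch_change(prev, curr, kp, radius) > threshold
    ]


# Toy example: a bright 3x3 block appears in the top-left corner of a 5x5 frame.
prev_frame = [[0.0] * 5 for _ in range(5)]
curr_frame = [[100.0 if x < 3 and y < 3 else 0.0 for x in range(5)]
              for y in range(5)]
candidates = [(1, 1), (4, 4)]  # one keypoint on the moving block, one on background
active = select_active_keypoints(prev_frame, curr_frame, candidates)
```

In this toy example only the keypoint at `(1, 1)` survives, because the patch around `(4, 4)` is identical in both frames. A motion or orientation criterion would need additional inputs, such as an optical-flow field or per-keypoint orientations, which this sketch does not attempt to model.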