Cloud systems have recently come into use for computer vision services that analyze users' data. In these services, keypoints are extracted from images or videos, and the data are identified by machine learning against a large cloud database. Conventional keypoint extraction algorithms use only spatial information and therefore detect many keypoints that are unnecessary for recognition. As a result, the systems must transmit large amounts of data and spend considerable processing time on descriptor calculation. This paper proposes a spatio-temporal keypoint extraction algorithm that detects only Keypoints of Interest (KOI) based on spatio-temporal features, taking mutual dependency and camera motion into account. The proposed method includes an approximated Kanade-Lucas-Tomasi (KLT) tracker to calculate keypoint positions and optical flow. The algorithm calculates a weight at each keypoint using two kinds of features: intensity gradient and optical flow. It reduces extraction noise by comparing each keypoint with the states of its surrounding keypoints. Camera motion estimation is added to calculate camera-motion-invariant optical flow. Evaluation results show that the proposed algorithm achieves a 95% reduction in keypoint data and a 53% reduction in computational complexity compared with conventional keypoint extraction. KOI are extracted in regions where motion and gradient are large.
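
The weighting idea can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes camera motion can be approximated by the median flow over all keypoints (a pure-translation model), takes the weight to be the product of gradient magnitude and compensated flow magnitude, and omits both the approximated KLT tracker and the comparison with surrounding keypoints. The function name `select_koi` and the parameter `keep_ratio` are hypothetical.

```python
import numpy as np

def select_koi(grad_mag, flow, keep_ratio=0.05):
    """Return indices of the highest-weight keypoints and all weights.

    grad_mag   : (N,) intensity-gradient magnitude at each keypoint
    flow       : (N, 2) optical flow vector at each keypoint
    keep_ratio : hypothetical fraction of keypoints to keep
    """
    grad_mag = np.asarray(grad_mag, dtype=float)
    flow = np.asarray(flow, dtype=float)

    # Assumption: the camera motion is a global translation, estimated
    # here as the per-axis median of the flow vectors.
    camera_motion = np.median(flow, axis=0)

    # Camera-motion-compensated flow and its magnitude.
    residual = flow - camera_motion
    motion_mag = np.linalg.norm(residual, axis=1)

    # Assumption: the weight is the product of the two features, so it is
    # large only where both gradient and compensated motion are large.
    weight = grad_mag * motion_mag

    # Keep the top fraction of keypoints by weight (at least one).
    n_keep = max(1, int(round(keep_ratio * len(weight))))
    order = np.argsort(weight)[::-1]
    return np.sort(order[:n_keep]), weight

# Usage: 20 keypoints share the camera-induced flow (2, 0);
# keypoint 7 also moves on its own.
flow = np.tile([2.0, 0.0], (20, 1))
flow[7] = [5.0, 3.0]
grad = np.ones(20)
koi_idx, weights = select_koi(grad, flow, keep_ratio=0.05)
print(koi_idx)
```

Multiplying the two features is only one way to combine them; a weighted sum or separate thresholds would fit the same description, and a more general camera model (for example an affine or homography fit) could replace the median.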