Accurately establishing pixel-level correspondences between images of the same objects is an essential problem in many computer vision applications, such as 3D reconstruction, simultaneous localization and mapping (SLAM), and augmented reality (AR). Existing image matching approaches based on local feature descriptors cannot avoid mismatches, which degrade the performance of the applications mentioned above. This paper proposes a motion-statistics-based local homography estimation method for removing mismatches. The proposed method estimates local homography transformations between grid cells in a pair of images and then classifies each match as correct or incorrect by checking whether it is consistent with the corresponding local homography. Experimental results on the widely used Oxford affine image dataset show that the proposed approach identifies more potentially correct matches than the existing state-of-the-art method.
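The consistency check at the core of the method can be illustrated with a minimal sketch: given a local homography estimated for a grid cell, a match is labeled correct when its reprojection error under that homography is small. The function names, the toy translation homography, and the 3-pixel threshold below are illustrative assumptions, not the paper's actual parameters.

```python
import numpy as np

def apply_homography(H, pts):
    """Map 2D points through a 3x3 homography (with homogeneous division)."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

def classify_matches(H, src_pts, dst_pts, thresh=3.0):
    """Label each match correct (True) if its reprojection error under the
    local homography H is below `thresh` pixels; threshold is illustrative."""
    errors = np.linalg.norm(apply_homography(H, src_pts) - dst_pts, axis=1)
    return errors < thresh

# Toy example: a pure-translation homography (shift by +5, +2 pixels).
H = np.array([[1.0, 0.0, 5.0],
              [0.0, 1.0, 2.0],
              [0.0, 0.0, 1.0]])
src = np.array([[10.0, 10.0], [20.0, 30.0], [40.0, 5.0]])
dst = apply_homography(H, src)
dst[2] += 50.0  # corrupt the third match to simulate a mismatch
print(classify_matches(H, src, dst))  # → [ True  True False]
```

In practice the local homographies would be estimated per grid cell from the putative matches themselves (e.g. with a robust fitter such as RANSAC), and the check above would be applied cell by cell.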