Variable block size motion estimation algorithm is the efficient approach to reduce the temporal redundancies and it has been adopted by the latest video coding standard H.264/AVC. The computational complexity augment coming from the variable block size technique makes the hardwired accelerator essential, especially for real-time applications. In this paper, the authors apply the architecture level and the circuits level approaches to improve the performance of Propagate Partial SAD and SAD Tree hardwired engines, which outperform other counterparts when considering the impact of supporting the variable block size technique. Experiments demonstrate that by using the proposed approaches, compared with the original architectures, 14.7% and 18.0% hardware cost can be saved for Propagate Partial SAD architecture and SAD Tree architecture, respectively. With TSMC 0.18 mm 1P6M CMOS technology, the proposed Propagate Partial SAD architecture attains 231.6MHz operating frequency at a cost of 84.1k gates. Correspondingly, the execution speed of the optimized SAD Tree architecture is improved to 204.8MHz with 88.5k gate hardware overhead.