This paper presents a hardware-efficient, high-speed architecture for variable block size motion estimation in H.264. The architecture compresses the propagated data and optimizes the pipelined processing-element and adder-tree circuits, which yields a more hardware-efficient datapath. Compared with the original Propagate Partial SAD structure, it reduces hardware cost by 12.1%. Synthesized with the TSMC 0.18 µm CMOS 1P6M standard cell library, the design reaches a maximum clock frequency of 227 MHz under worst-case operating conditions (1.62 V, 125 °C). With a 48 × 32 search range, its maximum throughput is 147,786 macroblocks per second (MB/s), which is sufficient for real-time encoding of VGA-resolution video with 4 reference frames at 30 Hz.
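To make the variable block size computation and the throughput figure concrete, here is a minimal software sketch in Python. It is not the paper's Propagate Partial SAD datapath; it only models the generic idea of computing the 16 4x4 SADs of a 16x16 macroblock once and summing them into the 41 SADs of the seven H.264 partitions. The function names (`sad_4x4_grid`, `merge`, `vbs_sads`, `peak_mb_rate`, `required_mb_rate`) are hypothetical. The throughput part assumes that "48 × 32 search range" means 48 × 32 = 1536 candidate positions and that the hardware evaluates one candidate position (all 41 SADs) per clock cycle with no stalls; under that assumption, 227 MHz / 1536 reproduces the stated 147,786 MB/s, and VGA (40 × 30 macroblocks) at 30 Hz with 4 reference frames needs 144,000 MB/s.

```python
import random

def sad_4x4_grid(cur, ref):
    """SADs of the 16 4x4 sub-blocks of a 16x16 macroblock, as a 4x4 grid s4[by][bx]."""
    s4 = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            s4[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return s4

def merge(grid, dy, dx):
    """Sum each group of dy rows by dx columns of neighbouring cells in a SAD grid."""
    rows, cols = len(grid), len(grid[0])
    return [[sum(grid[r + i][c + j] for i in range(dy) for j in range(dx))
             for c in range(0, cols, dx)]
            for r in range(0, rows, dy)]

def vbs_sads(s4):
    """Reuse the 4x4 SADs to build all 41 SADs of the seven partitions (named WxH)."""
    s8 = merge(s4, 2, 2)
    return {
        "4x4": s4,                 # 16 blocks
        "8x4": merge(s4, 1, 2),    # 8 blocks: horizontally adjacent 4x4 pairs
        "4x8": merge(s4, 2, 1),    # 8 blocks: vertically adjacent 4x4 pairs
        "8x8": s8,                 # 4 blocks
        "16x8": merge(s8, 1, 2),   # 2 blocks
        "8x16": merge(s8, 2, 1),   # 2 blocks
        "16x16": merge(s8, 2, 2),  # 1 block
    }

def peak_mb_rate(f_clk_hz, sr_w, sr_h):
    # Assumption: one candidate position per clock cycle, full search, no stalls.
    return f_clk_hz / (sr_w * sr_h)

def required_mb_rate(width, height, fps, n_ref):
    # Macroblocks per frame x frames per second x reference frames searched.
    return (width // 16) * (height // 16) * fps * n_ref

random.seed(0)
cur = [[random.randrange(256) for _ in range(16)] for _ in range(16)]
ref = [[random.randrange(256) for _ in range(16)] for _ in range(16)]
sads = vbs_sads(sad_4x4_grid(cur, ref))
print(sum(len(g) * len(g[0]) for g in sads.values()))  # 41
print(int(peak_mb_rate(227e6, 48, 32)))  # 147786
print(required_mb_rate(640, 480, 30, 4))  # 144000
```

Because every larger partition is a sum of 4x4 SADs, the absolute differences are computed only once per candidate; the remaining work is additions, which is the kind of logic an adder tree implements in hardware.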