Architecture and circuit optimization of hardwired integer motion estimation engine for H.264/AVC

Zhenyu Liu, Dongsheng Wang, Takeshi Ikenaga

研究成果: Article

抄録

Variable block size motion estimation developed by the latest video coding standard H.264/AVC is the efficient approach to reduce the temporal redundancies. The intensive computational complexity coming from the variable block size technique makes the hardwired accelerator essential, for real-time applications. Propagate partial sums of absolute differences (Propagate Partial SAD) and SAD Tree hardwired engines outperform other counterparts, especially considering the impact of supporting variable block size technique. In this paper, the authors apply the architecture-level and the circuit-level approaches to improve the maximum operating frequency and reduce the hardware overhead of Propagate Partial SAD and SAD Tree, while other metrics, in terms of latency, memory bandwidth and hardware utilization, of the original architectures are maintained. Experiments demonstrate that by using the proposed approaches, at 110.8MHz operating frequency, compared with the original architectures, 14.7% and 18.0% gate count can be saved for Propagate Partial SAD and SAD Tree, respectively. With TSMC 0.18 μm 1P6M CMOS technology, the proposed Propagate Partial SAD architecture achieves 231.6MHz operating frequency at a cost of 84.1 k gates. Correspondingly, the maximum work frequency of the optimized SAD Tree architecture is improved to 204.8MHz, which is almost two times of the original one, while its hardware overhead is merely 88.5 k-gate.

元の言語English
ページ(範囲)2065-2073
ページ数9
ジャーナルIEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
E93-A
発行部数11
DOI
出版物ステータスPublished - 2010 11

Fingerprint

Motion Estimation
Motion estimation
Partial Sums
Engine
Engines
Hardware
Integer
Optimization
Networks (circuits)
Image coding
Computer hardware
Particle accelerators
Redundancy
Computational complexity
Bandwidth
Data storage equipment
Video Coding
Costs
Accelerator
Experiments

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Computer Graphics and Computer-Aided Design
  • Applied Mathematics
  • Signal Processing

これを引用

@article{4bb2ef8d63ae447d9545b43b407921cd,
title = "Architecture and circuit optimization of hardwired integer motion estimation engine for H.264/AVC",
abstract = "Variable block size motion estimation developed by the latest video coding standard H.264/AVC is the efficient approach to reduce the temporal redundancies. The intensive computational complexity coming from the variable block size technique makes the hardwired accelerator essential, for real-time applications. Propagate partial sums of absolute differences (Propagate Partial SAD) and SAD Tree hardwired engines outperform other counterparts, especially considering the impact of supporting variable block size technique. In this paper, the authors apply the architecture-level and the circuit-level approaches to improve the maximum operating frequency and reduce the hardware overhead of Propagate Partial SAD and SAD Tree, while other metrics, in terms of latency, memory bandwidth and hardware utilization, of the original architectures are maintained. Experiments demonstrate that by using the proposed approaches, at 110.8MHz operating frequency, compared with the original architectures, 14.7{\%} and 18.0{\%} gate count can be saved for Propagate Partial SAD and SAD Tree, respectively. With TSMC 0.18 μm 1P6M CMOS technology, the proposed Propagate Partial SAD architecture achieves 231.6MHz operating frequency at a cost of 84.1 k gates. Correspondingly, the maximum work frequency of the optimized SAD Tree architecture is improved to 204.8MHz, which is almost two times of the original one, while its hardware overhead is merely 88.5 k-gate.",
keywords = "H.264/AVC, Hardwired engine, Variable block size motion estimation, Very large scale integration (VLSI)",
author = "Zhenyu Liu and Dongsheng Wang and Takeshi Ikenaga",
year = "2010",
month = "11",
doi = "10.1587/transfun.E93.A.2065",
language = "English",
volume = "E93-A",
pages = "2065--2073",
journal = "IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences",
issn = "0916-8508",
publisher = "Maruzen Co., Ltd/Maruzen Kabushikikaisha",
number = "11",

}

TY - JOUR

T1 - Architecture and circuit optimization of hardwired integer motion estimation engine for H.264/AVC

AU - Liu, Zhenyu

AU - Wang, Dongsheng

AU - Ikenaga, Takeshi

PY - 2010/11

Y1 - 2010/11

N2 - Variable block size motion estimation developed by the latest video coding standard H.264/AVC is the efficient approach to reduce the temporal redundancies. The intensive computational complexity coming from the variable block size technique makes the hardwired accelerator essential, for real-time applications. Propagate partial sums of absolute differences (Propagate Partial SAD) and SAD Tree hardwired engines outperform other counterparts, especially considering the impact of supporting variable block size technique. In this paper, the authors apply the architecture-level and the circuit-level approaches to improve the maximum operating frequency and reduce the hardware overhead of Propagate Partial SAD and SAD Tree, while other metrics, in terms of latency, memory bandwidth and hardware utilization, of the original architectures are maintained. Experiments demonstrate that by using the proposed approaches, at 110.8MHz operating frequency, compared with the original architectures, 14.7% and 18.0% gate count can be saved for Propagate Partial SAD and SAD Tree, respectively. With TSMC 0.18 μm 1P6M CMOS technology, the proposed Propagate Partial SAD architecture achieves 231.6MHz operating frequency at a cost of 84.1 k gates. Correspondingly, the maximum work frequency of the optimized SAD Tree architecture is improved to 204.8MHz, which is almost two times of the original one, while its hardware overhead is merely 88.5 k-gate.

AB - Variable block size motion estimation developed by the latest video coding standard H.264/AVC is the efficient approach to reduce the temporal redundancies. The intensive computational complexity coming from the variable block size technique makes the hardwired accelerator essential, for real-time applications. Propagate partial sums of absolute differences (Propagate Partial SAD) and SAD Tree hardwired engines outperform other counterparts, especially considering the impact of supporting variable block size technique. In this paper, the authors apply the architecture-level and the circuit-level approaches to improve the maximum operating frequency and reduce the hardware overhead of Propagate Partial SAD and SAD Tree, while other metrics, in terms of latency, memory bandwidth and hardware utilization, of the original architectures are maintained. Experiments demonstrate that by using the proposed approaches, at 110.8MHz operating frequency, compared with the original architectures, 14.7% and 18.0% gate count can be saved for Propagate Partial SAD and SAD Tree, respectively. With TSMC 0.18 μm 1P6M CMOS technology, the proposed Propagate Partial SAD architecture achieves 231.6MHz operating frequency at a cost of 84.1 k gates. Correspondingly, the maximum work frequency of the optimized SAD Tree architecture is improved to 204.8MHz, which is almost two times of the original one, while its hardware overhead is merely 88.5 k-gate.

KW - H.264/AVC

KW - Hardwired engine

KW - Variable block size motion estimation

KW - Very large scale integration (VLSI)

UR - http://www.scopus.com/inward/record.url?scp=78049508947&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78049508947&partnerID=8YFLogxK

U2 - 10.1587/transfun.E93.A.2065

DO - 10.1587/transfun.E93.A.2065

M3 - Article

AN - SCOPUS:78049508947

VL - E93-A

SP - 2065

EP - 2073

JO - IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences

JF - IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences

SN - 0916-8508

IS - 11

ER -