Scalable VLSI architecture for variable block size integer motion estimation in H.264/AVC

Yang Song, Zhenyu Liu, Satoshi Goto, Takeshi Ikenaga

Research output: Contribution to journal › Article

13 Citations (Scopus)

Abstract

Because of the data correlation in the motion estimation (ME) algorithm of the H.264/AVC reference software, it is difficult to implement an efficient ME hardware architecture. To make parallel processing feasible, four modified hardware-friendly ME workflows are proposed in this paper. Based on these workflows, a scalable full-search ME architecture is presented with the following characteristics: (1) The sum of absolute differences (SAD) results of the 4 × 4 sub-blocks are accumulated and reused to calculate the SADs of larger sub-blocks. (2) The number of processing element (PE) groups is configurable. For a search range of M × N pixels, where M is the width and N is the height, up to M PE groups can be configured to work in parallel, with a peak processing speed of N × 16 clock cycles to complete a full-search variable block size ME (VBSME). (3) Only conventional single-port SRAM is required, which makes this architecture suitable for standard-cell-based implementation. A design with 8 PE groups has been realized in TSMC 0.18 μm CMOS technology. The core area is 2.13 mm × 1.60 mm and the clock frequency is 228 MHz under typical conditions (1.8 V, 25°C).
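The SAD reuse in characteristic (1) and the cycle count in characteristic (2) can be illustrated with a small software sketch. It is not the authors' hardware design: the function names (`sad_4x4_grid`, `merge`, `vbs_sads`, `full_search_cycles`) are hypothetical, and the cycle model for fewer than M PE groups (the groups sweep the search width in ceil(M / P) passes of N × 16 cycles each) is an assumption that only matches the abstract in the case P = M.

```python
def sad_4x4_grid(cur, ref):
    """Return a 4x4 grid of SADs, one per 4x4 sub-block of a 16x16 macroblock."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid


def merge(grid, rows, cols):
    """Sum neighbouring cells of a SAD grid in groups of rows x cols cells."""
    h, w = len(grid) // rows, len(grid[0]) // cols
    return [[sum(grid[r * rows + i][c * cols + j]
                 for i in range(rows) for j in range(cols))
             for c in range(w)]
            for r in range(h)]


def vbs_sads(cur, ref):
    """SADs of all seven H.264 partition sizes (keys are width x height).

    Pixels are only touched once, in sad_4x4_grid; every larger SAD is a sum
    of smaller SADs that were already computed.
    """
    s4x4 = sad_4x4_grid(cur, ref)
    s8x4 = merge(s4x4, 1, 2)     # two horizontally adjacent 4x4 blocks
    s4x8 = merge(s4x4, 2, 1)     # two vertically adjacent 4x4 blocks
    s8x8 = merge(s8x4, 2, 1)
    s16x8 = merge(s8x8, 1, 2)
    s8x16 = merge(s8x8, 2, 1)
    s16x16 = merge(s16x8, 2, 1)
    return {"4x4": s4x4, "8x4": s8x4, "4x8": s4x8, "8x8": s8x8,
            "16x8": s16x8, "8x16": s8x16, "16x16": s16x16}


def full_search_cycles(m, n, pe_groups):
    """Assumed cycle model: ceil(m / pe_groups) passes of n * 16 cycles each."""
    passes = -(-m // pe_groups)  # ceiling division
    return passes * n * 16
```

Under this assumed model, a 32 × 32 search range with 32 PE groups takes 32 × 16 = 512 cycles, which is the N × 16 peak stated in the abstract, while 8 PE groups would take four passes. At the reported 228 MHz, a cycle count converts to time as `cycles / 228e6` seconds.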

Original language: English
Pages (from-to): 979-987
Number of pages: 9
Journal: IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Volume: E89-A
Issue number: 4
DOI: 10.1093/ietfec/e89-a.4.979
Publication status: Published - 2006 Apr

Keywords

  • H.264/AVC
  • Variable block size motion estimation (VBSME)
  • Very large scale integration (VLSI) architecture

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Hardware and Architecture
  • Information Systems

Cite this

Scalable VLSI architecture for variable block size integer motion estimation in H.264/AVC. / Song, Yang; Liu, Zhenyu; Goto, Satoshi; Ikenaga, Takeshi.

In: IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Vol. E89-A, No. 4, 04.2006, p. 979-987.

@article{bcb25c49f5644426a465df65b29a9caf,
title = "Scalable VLSI architecture for variable block size integer motion estimation in H.264/AVC",
abstract = "Because of the data correlation in the motion estimation (ME) algorithm of the H.264/AVC reference software, it is difficult to implement an efficient ME hardware architecture. To make parallel processing feasible, four modified hardware-friendly ME workflows are proposed in this paper. Based on these workflows, a scalable full-search ME architecture is presented with the following characteristics: (1) The sum of absolute differences (SAD) results of the 4 × 4 sub-blocks are accumulated and reused to calculate the SADs of larger sub-blocks. (2) The number of processing element (PE) groups is configurable. For a search range of M × N pixels, where M is the width and N is the height, up to M PE groups can be configured to work in parallel, with a peak processing speed of N × 16 clock cycles to complete a full-search variable block size ME (VBSME). (3) Only conventional single-port SRAM is required, which makes this architecture suitable for standard-cell-based implementation. A design with 8 PE groups has been realized in TSMC 0.18 μm CMOS technology. The core area is 2.13 mm × 1.60 mm and the clock frequency is 228 MHz under typical conditions (1.8 V, 25°C).",
keywords = "H.264/AVC, Variable block size motion estimation (VBSME), Very large scale integration (VLSI) architecture",
author = "Yang Song and Zhenyu Liu and Satoshi Goto and Takeshi Ikenaga",
year = "2006",
month = "4",
doi = "10.1093/ietfec/e89-a.4.979",
language = "English",
volume = "E89-A",
pages = "979--987",
journal = "IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences",
issn = "0916-8508",
publisher = "Maruzen Co., Ltd/Maruzen Kabushikikaisha",
number = "4",
}

TY - JOUR

T1 - Scalable VLSI architecture for variable block size integer motion estimation in H.264/AVC

AU - Song, Yang

AU - Liu, Zhenyu

AU - Goto, Satoshi

AU - Ikenaga, Takeshi

PY - 2006/4

Y1 - 2006/4

AB - Because of the data correlation in the motion estimation (ME) algorithm of the H.264/AVC reference software, it is difficult to implement an efficient ME hardware architecture. To make parallel processing feasible, four modified hardware-friendly ME workflows are proposed in this paper. Based on these workflows, a scalable full-search ME architecture is presented with the following characteristics: (1) The sum of absolute differences (SAD) results of the 4 × 4 sub-blocks are accumulated and reused to calculate the SADs of larger sub-blocks. (2) The number of processing element (PE) groups is configurable. For a search range of M × N pixels, where M is the width and N is the height, up to M PE groups can be configured to work in parallel, with a peak processing speed of N × 16 clock cycles to complete a full-search variable block size ME (VBSME). (3) Only conventional single-port SRAM is required, which makes this architecture suitable for standard-cell-based implementation. A design with 8 PE groups has been realized in TSMC 0.18 μm CMOS technology. The core area is 2.13 mm × 1.60 mm and the clock frequency is 228 MHz under typical conditions (1.8 V, 25°C).

KW - H.264/AVC

KW - Variable block size motion estimation (VBSME)

KW - Very large scale integration (VLSI) architecture

UR - http://www.scopus.com/inward/record.url?scp=33646244730&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33646244730&partnerID=8YFLogxK

U2 - 10.1093/ietfec/e89-a.4.979

DO - 10.1093/ietfec/e89-a.4.979

M3 - Article

AN - SCOPUS:33646244730

VL - E89-A

SP - 979

EP - 987

JO - IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences

JF - IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences

SN - 0916-8508

IS - 4

ER -