Highly parallel and fully reused H.264/AVC high profile intra predictor generation engine for super hi-vision 4k × 4k@60 fps

Yiqing Huang, Xiaocong Jin, Jin Zhou, Jia Su, Takeshi Ikenaga

Research output: Contribution to journal › Article

Abstract

A high profile intra predictor generation engine for H.264/AVC is proposed in this paper. First, a hardware-level algorithm optimization for the intra 8 × 8 (I8MB) mode is introduced: the original candidate pixels used to generate I8MB prediction samples are replaced with the boundary pixels of intra 4 × 4 (I4MB) blocks. With this substitution, full data reuse between the I4MB predictors and the filtered I8MB samples is achieved with almost no quality loss. Second, a lossless parallel predictor generation flow that processes two 4 × 4 blocks at a time is proposed. For I4MB and intra 16 × 16 (I16MB), the original flow is shortened from 16 stages to 10 stages, which saves 37.5% of the processing cycles. For I8MB, a similar methodology with a different processing order of the 4 × 4 scaled blocks is introduced. Third, fully utilized hardwired engines for I4MB, I16MB and I8MB are proposed. Except for the DC (direct current) and plane modes, full data reuse among all high profile intra modes is achieved. Fourth, for DC mode, a combined predictor generation process is introduced in which predictor generation for the I16MB DC mode is merged into that of the I4MB DC mode. Moreover, by configuring the proposed hardwired engines, predictor generation for the I16MB plane mode and the chrominance plane mode is completed in only 50% of the cycles of the original design. Overall, compared with the original full-mode design and the latest dynamic mode reused design, the proposed engine saves 89.5% and 73.2% of processing cycles, respectively. Synthesized in TSMC 0.18 μm technology under worst-case operating conditions (1.62 V, 125 °C), the proposed design handles real-time high profile intra predictor generation for Super Hi-Vision 4k × 4k@60 fps at 380 MHz with 37.2k gates. The maximum operating frequency of the design under worst-case conditions is 468 MHz.
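The arithmetic behind two of the abstract's figures can be checked with a short sketch. It reproduces the 37.5% stage saving and estimates a rough per-macroblock cycle budget at the stated 380 MHz clock. The frame size of 4096 × 4096 luma pixels is an assumption (the abstract says only "4k × 4k"), the function names are hypothetical, and the resulting budget is a back-of-envelope estimate rather than a figure reported in the paper.

```python
# Back-of-envelope check of figures quoted in the abstract.
# ASSUMPTION: "4k x 4k" is taken as a 4096 x 4096 luma frame; the abstract
# does not state the exact dimensions.
FRAME_WIDTH = 4096
FRAME_HEIGHT = 4096
FRAMES_PER_SECOND = 60          # from the abstract
MACROBLOCK_SIZE = 16            # H.264/AVC macroblocks are 16 x 16 luma pixels
CLOCK_HZ = 380_000_000          # 380 MHz, from the abstract


def stage_saving(original_stages: int, optimized_stages: int) -> float:
    """Fraction of stages removed when the flow is shortened."""
    return (original_stages - optimized_stages) / original_stages


def macroblocks_per_second(width: int, height: int, fps: int) -> int:
    """Number of macroblocks that must be processed each second."""
    mbs_per_frame = (width // MACROBLOCK_SIZE) * (height // MACROBLOCK_SIZE)
    return mbs_per_frame * fps


def cycles_per_macroblock(clock_hz: int, mb_rate: int) -> float:
    """Average clock cycles available per macroblock at the given clock."""
    return clock_hz / mb_rate


if __name__ == "__main__":
    saving = stage_saving(16, 10)
    mb_rate = macroblocks_per_second(FRAME_WIDTH, FRAME_HEIGHT, FRAMES_PER_SECOND)
    budget = cycles_per_macroblock(CLOCK_HZ, mb_rate)
    print(f"stage saving: {saving:.1%}")
    print(f"macroblocks per second: {mb_rate}")
    print(f"cycle budget per macroblock: {budget:.1f}")
```

Under the assumed frame size, the budget comes out to fewer than 100 cycles per macroblock, which suggests why the abstract emphasizes shortening the predictor generation flow.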

Original language: English
Pages (from-to): 428-438
Number of pages: 11
Journal: IEICE Transactions on Electronics
Volume: E94-C
Issue number: 4
DOIs: 10.1587/transele.E94.C.428
Publication status: Published - 2011 Apr

Fingerprint

Engines
Processing
Pixels
Hardware

Keywords

  • H.264/AVC
  • Hardware architecture
  • Intra prediction
  • Super hi-vision

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Electronic, Optical and Magnetic Materials

Cite this

Highly parallel and fully reused H.264/AVC high profile intra predictor generation engine for super hi-vision 4k × 4k@60 fps. / Huang, Yiqing; Jin, Xiaocong; Zhou, Jin; Su, Jia; Ikenaga, Takeshi.

In: IEICE Transactions on Electronics, Vol. E94-C, No. 4, 04.2011, p. 428-438.

@article{da96e1579fff471a9a6146a00fb5c290,
title = "Highly parallel and fully reused H.264/AVC high profile intra predictor generation engine for super hi-vision 4k × 4k@60 fps",
abstract = "One high profile intra predictor generation engine is proposed in this paper. Firstly, hardware level algorithm optimization for intra 8 × 8 (I8MB) mode is introduced. The original candidate pixels for generating prediction samples of I8MB are replaced with boundary pixels of intra 4 × 4 (I4MB) blocks. Based on this adoption, full data reuse between predictors of I4MB and filtered samples of I8MB can be achieved with almost no quality loss. Secondly, one lossless two-4 × 4-block based parallel predictor generation flow is proposed. The original predictor generation flow is optimized from 16 stages to 10 stages for I4MB and Intra 16 × 16 (I16MB), which saves 37.5{\%} processing cycles. For I8MB, similar methodology with different processing order of 4 × 4 scaled blocks is introduced. Thirdly, fully utilized hardwired engines for I4MB, I16MB and I8MB are proposed in this paper. Except DC (direct current) and plane modes, full data reuse among all intra modes of high profile can be achieved. Fourthly, for DC mode, one combined predictor generation process is introduced and predictor generation of I16MB's DC mode is merged into the process of I4MB's DC mode. Moreover, by configuring proposed hardwired engines, predictor generation of I16MB's plane mode and chrominance plane mode can be accomplished with only 50{\%} cycles of original design. Totally, when compared with original full-mode design and latest dynamic mode reused design, the proposed predictor generation engine can achieve 89.5{\%} and 73.2{\%} saving of processing cycles, respectively. Synthesized by TSMC 0.18 μm technology under worst work conditions (1.62V, 125°C), with 380MHz and 37.2k gates, the proposed design can handle real-time high profile intra predictor generation of Super Hi-Vision 4k × 4k@60 fps. The maximum work frequency of our design under worst condition is 468 MHz.",
keywords = "H.264/AVC, Hardware architecture, Intra prediction, Super hi-vision",
author = "Yiqing Huang and Xiaocong Jin and Jin Zhou and Jia Su and Takeshi Ikenaga",
year = "2011",
month = "4",
doi = "10.1587/transele.E94.C.428",
language = "English",
volume = "E94-C",
pages = "428--438",
journal = "IEICE Transactions on Electronics",
issn = "0916-8524",
publisher = "Maruzen Co., Ltd/Maruzen Kabushikikaisha",
number = "4",

}

TY - JOUR

T1 - Highly parallel and fully reused H.264/AVC high profile intra predictor generation engine for super hi-vision 4k × 4k@60 fps

AU - Huang, Yiqing

AU - Jin, Xiaocong

AU - Zhou, Jin

AU - Su, Jia

AU - Ikenaga, Takeshi

PY - 2011/4

Y1 - 2011/4

N2 - One high profile intra predictor generation engine is proposed in this paper. Firstly, hardware level algorithm optimization for intra 8 × 8 (I8MB) mode is introduced. The original candidate pixels for generating prediction samples of I8MB are replaced with boundary pixels of intra 4 × 4 (I4MB) blocks. Based on this adoption, full data reuse between predictors of I4MB and filtered samples of I8MB can be achieved with almost no quality loss. Secondly, one lossless two-4 × 4-block based parallel predictor generation flow is proposed. The original predictor generation flow is optimized from 16 stages to 10 stages for I4MB and Intra 16 × 16 (I16MB), which saves 37.5% processing cycles. For I8MB, similar methodology with different processing order of 4 × 4 scaled blocks is introduced. Thirdly, fully utilized hardwired engines for I4MB, I16MB and I8MB are proposed in this paper. Except DC (direct current) and plane modes, full data reuse among all intra modes of high profile can be achieved. Fourthly, for DC mode, one combined predictor generation process is introduced and predictor generation of I16MB's DC mode is merged into the process of I4MB's DC mode. Moreover, by configuring proposed hardwired engines, predictor generation of I16MB's plane mode and chrominance plane mode can be accomplished with only 50% cycles of original design. Totally, when compared with original full-mode design and latest dynamic mode reused design, the proposed predictor generation engine can achieve 89.5% and 73.2% saving of processing cycles, respectively. Synthesized by TSMC 0.18 μm technology under worst work conditions (1.62V, 125°C), with 380MHz and 37.2k gates, the proposed design can handle real-time high profile intra predictor generation of Super Hi-Vision 4k × 4k@60 fps. The maximum work frequency of our design under worst condition is 468 MHz.

KW - H.264/AVC

KW - Hardware architecture

KW - Intra prediction

KW - Super hi-vision

UR - http://www.scopus.com/inward/record.url?scp=79953314961&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79953314961&partnerID=8YFLogxK

U2 - 10.1587/transele.E94.C.428

DO - 10.1587/transele.E94.C.428

M3 - Article

AN - SCOPUS:79953314961

VL - E94-C

SP - 428

EP - 438

JO - IEICE Transactions on Electronics

JF - IEICE Transactions on Electronics

SN - 0916-8524

IS - 4

ER -