An area-efficient 4/8/16/32-point inverse DCT architecture for UHDTV HEVC decoder

Heming Sun, Dajiang Zhou, Jiayi Zhu, Shinji Kimura, Satoshi Goto

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

This paper presents a new VLSI architecture for the HEVC inverse discrete cosine transform (IDCT). Compared with prior art, this work reduces hardware cost by 1) reducing the computational logic of the 1-D IDCTs with a reordered parallel-in serial-out (RPISO) scheme that shares the inputs of the butterfly structure, and 2) reducing the area of the transpose buffer with a cyclic memory organization that achieves 100% I/O utilization of the SRAMs. In the implementation of a unified 4/8/16/32-point IDCT, the proposed schemes demonstrate 35% and 62% reductions in logic and memory costs, respectively. The IDCT implementation can support real-time decoding of 4K×2K 60 fps video with a total hardware cost of 357,250 μm² for the 2-D IDCT and 80,988 μm² for the transpose memory in a 90 nm process.
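For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of a generic butterfly (even-odd) 1-D inverse transform and of the row-column 2-D flow that makes a transpose buffer necessary. It is not the authors' RPISO scheme or their cyclic memory organization; the function names are hypothetical, only the 4-point case is shown, and the intermediate rounding and shifts that HEVC specifies are omitted so the arithmetic stays exact.

```python
# HEVC 4-point core transform matrix (each row is one basis vector).
M4 = [
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
]


def idct4_direct(x):
    """Reference form: y[n] = sum over k of M4[k][n] * x[k]."""
    return [sum(M4[k][n] * x[k] for k in range(4)) for n in range(4)]


def idct4_butterfly(x):
    """Even-odd (butterfly) form: partial sums are shared between outputs."""
    e0 = 64 * x[0] + 64 * x[2]   # even part, used by outputs 0 and 3
    e1 = 64 * x[0] - 64 * x[2]   # even part, used by outputs 1 and 2
    o0 = 83 * x[1] + 36 * x[3]   # odd part, used by outputs 0 and 3
    o1 = 36 * x[1] - 83 * x[3]   # odd part, used by outputs 1 and 2
    return [e0 + o0, e1 + o1, e1 - o1, e0 - o0]


def transpose(block):
    """Software stand-in for the transpose buffer between the two passes."""
    return [list(row) for row in zip(*block)]


def idct4x4_unscaled(coeffs):
    """2-D inverse transform as two 1-D passes (no rounding or shifts)."""
    # First pass: transform every column of the coefficient block.
    first = [idct4_butterfly(col) for col in transpose(coeffs)]
    # The first pass produced columns; the second pass consumes rows.
    rows = transpose(first)
    # Second pass: transform every row of the intermediate block.
    return [idct4_butterfly(row) for row in rows]


if __name__ == "__main__":
    dc_only = [[1, 0, 0, 0]] + [[0, 0, 0, 0] for _ in range(3)]
    for row in idct4x4_unscaled(dc_only):
        print(row)
```

The second 1-D pass reads its data in the orientation opposite to the one in which the first pass wrote it, which is why a hardware design needs a transpose memory between the two passes.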

Original language: English
Title of host publication: 2014 IEEE Visual Communications and Image Processing Conference, VCIP 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 197-200
Number of pages: 4
ISBN (Print): 9781479961399
DOI: https://doi.org/10.1109/VCIP.2014.7051538
Publication status: Published - 2015 Feb 27
Event: 2014 IEEE Visual Communications and Image Processing Conference, VCIP 2014 - Valletta, Malta
Duration: 2014 Dec 7 to 2014 Dec 10

Other

Other: 2014 IEEE Visual Communications and Image Processing Conference, VCIP 2014
Country: Malta
City: Valletta
Period: 14/12/7 to 14/12/10

Fingerprint

Data storage equipment
Hardware
Costs
Discrete cosine transforms
Static random access storage
Decoding

Keywords

  • area-efficient
  • HEVC
  • IDCT
  • SRAM
  • video coding

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Computer Vision and Pattern Recognition

Cite this

Sun, H., Zhou, D., Zhu, J., Kimura, S., & Goto, S. (2015). An area-efficient 4/8/16/32-point inverse DCT architecture for UHDTV HEVC decoder. In 2014 IEEE Visual Communications and Image Processing Conference, VCIP 2014 (pp. 197-200). [7051538] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/VCIP.2014.7051538
