TY - JOUR
T1 - Cache based motion compensation architecture for quad-HD H.264/AVC video decoder
AU - Zhou, Jinjia
AU - Zhou, Dajiang
AU - He, Gang
AU - Goto, Satoshi
PY - 2011/4
Y1 - 2011/4
N2 - In this paper, we present a cache-based motion compensation (MC) architecture for a Quad-HD H.264/AVC video decoder. With the significantly increased throughput requirement, VLSI design for MC is greatly challenged by the huge area cost and power consumption. Moreover, the long latency of the memory system degrades the performance of the MC pipeline. To solve these problems, three optimization schemes are proposed in this work. Firstly, a high-performance interpolator based on Horizontal-Vertical Expansion and Luma-Chroma Parallelism (HVE-LCP) is proposed, which increases the processing throughput to more than 4 times that of previous designs. Secondly, an efficient cache memory organization scheme (4S × 4) is adopted to improve on-chip memory utilization, which saves 25% of memory area and 39% to 49% of memory power. Finally, by employing a Split Task Queue (STQ) architecture, the cache system is capable of tolerating much longer memory system latency. Consequently, the cache idle time is reduced by 90%, which reduces the overall processing time by 24% to 40%. When implemented in an SMIC 90 nm process, the design requires a logic gate count of 108.8k and 3.1 kB of on-chip memory. The proposed MC architecture can support real-time processing of 3840 × 2160@60 fps at less than 166 MHz.
AB - In this paper, we present a cache-based motion compensation (MC) architecture for a Quad-HD H.264/AVC video decoder. With the significantly increased throughput requirement, VLSI design for MC is greatly challenged by the huge area cost and power consumption. Moreover, the long latency of the memory system degrades the performance of the MC pipeline. To solve these problems, three optimization schemes are proposed in this work. Firstly, a high-performance interpolator based on Horizontal-Vertical Expansion and Luma-Chroma Parallelism (HVE-LCP) is proposed, which increases the processing throughput to more than 4 times that of previous designs. Secondly, an efficient cache memory organization scheme (4S × 4) is adopted to improve on-chip memory utilization, which saves 25% of memory area and 39% to 49% of memory power. Finally, by employing a Split Task Queue (STQ) architecture, the cache system is capable of tolerating much longer memory system latency. Consequently, the cache idle time is reduced by 90%, which reduces the overall processing time by 24% to 40%. When implemented in an SMIC 90 nm process, the design requires a logic gate count of 108.8k and 3.1 kB of on-chip memory. The proposed MC architecture can support real-time processing of 3840 × 2160@60 fps at less than 166 MHz.
KW - 2-D cache
KW - H.264/AVC
KW - Interpolation
KW - Motion compensation
KW - Quad-HD
KW - Ultra high definition
UR - http://www.scopus.com/inward/record.url?scp=79953300707&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79953300707&partnerID=8YFLogxK
U2 - 10.1587/transele.E94.C.439
DO - 10.1587/transele.E94.C.439
M3 - Article
AN - SCOPUS:79953300707
VL - E94-C
SP - 439
EP - 447
JO - IEICE Transactions on Electronics
JF - IEICE Transactions on Electronics
SN - 0916-8524
IS - 4
ER -