High-Throughput Power-Efficient VLSI Architecture of Fractional Motion Estimation for Ultra-HD HEVC Video Encoding

Gang He, Dajiang Zhou, Yunsong Li, Zhixiang Chen, Tianruo Zhang, Satoshi Goto

    Research output: Contribution to journal › Article

    40 Citations (Scopus)

    Abstract

    Fractional motion estimation (FME) significantly improves video compression efficiency, but its high computational complexity limits real-time processing capability. In this brief, we present a VLSI implementation of an FME design for High Efficiency Video Coding (HEVC), targeting ultrahigh-definition video applications. We first propose a bilinear quarter-pixel approximation, together with a search pattern based on it, to reduce the complexity of the interpolation and fractional search processes. Furthermore, a data reuse strategy is exploited to reduce the hardware cost of the transform. In addition, by combining a carefully chosen degree of pixel parallelism with a dedicated memory access pattern, we fully pipeline the computation and achieve high hardware utilization. The design has been implemented and verified as a 65-nm CMOS chip. The measured throughput reaches 995 Mpixels/s at 188 MHz, sufficient for 7680 × 4320 video at 30 frames/s and at least 4.7 times faster than prior art. The corresponding power dissipation is 198.6 mW, giving a power efficiency of 0.2 nJ/pixel. Owing to these optimizations, our design improves power efficiency by more than 52% relative to previous H.264 designs.
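    The headline figures above can be cross-checked with simple arithmetic: 7680 × 4320 pixels per frame at 30 frames/s gives the required throughput, dividing power by throughput gives energy per pixel, and dividing throughput by the clock gives the average pixels processed per cycle. The sketch below also includes a hypothetical bilinear quarter-pel function; it assumes the H.264-style rounded average of the two nearest integer/half-pel samples, in place of HEVC's 7-tap quarter-sample luma filter. The brief does not spell out its exact approximation here, so treat `bilinear_quarter_pel` (and every other function name) as illustrative only, not as the authors' method.

```python
"""Sanity checks for the figures quoted in the abstract.

All function names are hypothetical, chosen for illustration.
"""


def required_throughput(width: int, height: int, fps: int) -> int:
    """Pixels per second needed to process every frame in real time."""
    return width * height * fps


def energy_per_pixel_nj(power_mw: float, pixels_per_s: float) -> float:
    """Energy per pixel in nanojoules: power divided by throughput."""
    power_w = power_mw * 1e-3          # mW -> W
    joules_per_pixel = power_w / pixels_per_s
    return joules_per_pixel * 1e9      # J -> nJ


def pixels_per_cycle(pixels_per_s: float, clock_hz: float) -> float:
    """Average number of pixels the pipeline must handle each clock cycle."""
    return pixels_per_s / clock_hz


def bilinear_quarter_pel(a: int, b: int) -> int:
    """ASSUMPTION: approximate a quarter-pel sample as the rounded average
    of its two nearest integer/half-pel neighbours (H.264-style bilinear),
    instead of evaluating HEVC's 7-tap quarter-sample luma filter."""
    return (a + b + 1) >> 1


if __name__ == "__main__":
    tp = required_throughput(7680, 4320, 30)
    print(tp)                                        # 995328000
    print(round(energy_per_pixel_nj(198.6, tp), 3))  # 0.2
    print(round(pixels_per_cycle(tp, 188e6), 2))     # 5.29
    print(bilinear_quarter_pel(100, 103))            # 102
```

    The check confirms that 995 Mpixels/s is exactly the 8K at 30 frames/s pixel rate, that 198.6 mW at that rate is about 0.2 nJ/pixel, and that at 188 MHz the design must sustain roughly 5.3 pixels per cycle on average.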

    Original language: English
    Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems
    DOIs: 10.1109/TVLSI.2014.2386897
    Publication status: Accepted/In press - 2015 Mar 19

    Fingerprint

    Motion estimation
    Image coding
    Throughput
    Pixels
    Hardware
    Image compression
    Computational complexity
    Energy dissipation
    Interpolation
    Pipelines
    Data storage equipment
    Processing
    Costs

    ASJC Scopus subject areas

    • Electrical and Electronic Engineering
    • Hardware and Architecture
    • Software

    Cite this

    High-Throughput Power-Efficient VLSI Architecture of Fractional Motion Estimation for Ultra-HD HEVC Video Encoding. / He, Gang; Zhou, Dajiang; Li, Yunsong; Chen, Zhixiang; Zhang, Tianruo; Goto, Satoshi.

    In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 19.03.2015.

    @article{a44a1482c7174836afcad11416cacc83,
    title = "High-Throughput Power-Efficient VLSI Architecture of Fractional Motion Estimation for Ultra-HD HEVC Video Encoding",
    author = "Gang He and Dajiang Zhou and Yunsong Li and Zhixiang Chen and Tianruo Zhang and Satoshi Goto",
    year = "2015",
    month = "3",
    day = "19",
    doi = "10.1109/TVLSI.2014.2386897",
    language = "English",
    journal = "IEEE Transactions on Very Large Scale Integration (VLSI) Systems",
    issn = "1063-8210",
    publisher = "Institute of Electrical and Electronics Engineers Inc.",

    }

    TY - JOUR

    T1 - High-Throughput Power-Efficient VLSI Architecture of Fractional Motion Estimation for Ultra-HD HEVC Video Encoding

    AU - He, Gang

    AU - Zhou, Dajiang

    AU - Li, Yunsong

    AU - Chen, Zhixiang

    AU - Zhang, Tianruo

    AU - Goto, Satoshi

    PY - 2015/3/19

    Y1 - 2015/3/19

    UR - http://www.scopus.com/inward/record.url?scp=84925435248&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84925435248&partnerID=8YFLogxK

    U2 - 10.1109/TVLSI.2014.2386897

    DO - 10.1109/TVLSI.2014.2386897

    M3 - Article

    AN - SCOPUS:84959461430

    JO - IEEE Transactions on Very Large Scale Integration (VLSI) Systems

    JF - IEEE Transactions on Very Large Scale Integration (VLSI) Systems

    SN - 1063-8210

    ER -