CNN-MERP

An FPGA-based memory-efficient reconfigurable processor for forward and backward propagation of convolutional neural networks

Xushen Han, Dajiang Zhou, Shihao Wang, Shinji Kimura

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

14 Citations (Scopus)

Abstract

Large-scale deep convolutional neural networks (CNNs) are widely used in machine learning applications. Since CNNs involve enormous computational complexity, VLSI (ASIC and FPGA) chips, which deliver high-density integration of computational resources, are regarded as a promising platform for CNN implementation. With massive parallelism of computational units, however, the external memory bandwidth, which is constrained by the pin count of the VLSI chip, becomes the system bottleneck. Moreover, VLSI solutions are usually regarded as lacking the flexibility to be reconfigured for the various parameters of CNNs. This paper presents CNN-MERP to address these issues. CNN-MERP incorporates an efficient memory hierarchy that significantly reduces the bandwidth requirements through multiple optimizations, including on/off-chip data allocation, data flow optimization, and data reuse. A proposed two-level reconfigurability, based on control logic and the multiboot feature of the FPGA, enables fast and efficient reconfiguration. As a result, an external memory bandwidth requirement of 1.94 MB/GFlop is achieved, which is 55% lower than prior art. Under limited DRAM bandwidth, a system throughput of 1244 GFlop/s is achieved on the Virtex UltraScale platform, which is 5.48 times higher than state-of-the-art FPGA implementations.
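The two headline figures in the abstract can be combined into a quick sanity check: multiplying the reported traffic per unit of work (1.94 MB/GFlop) by the reported throughput (1244 GFlop/s) gives the sustained external bandwidth the system implies. This is illustrative arithmetic derived from the abstract's numbers, not a figure stated in the paper itself.

```python
# Back-of-envelope check: what sustained DRAM bandwidth does the
# reported throughput imply, given the reported traffic per GFlop?
# (Both input figures are quoted from the abstract; the product is
# our own illustrative derivation.)

throughput_gflops = 1244.0    # system throughput, GFlop/s
traffic_mb_per_gflop = 1.94   # external memory traffic, MB/GFlop

required_bw_mb_s = throughput_gflops * traffic_mb_per_gflop
print(f"Implied external bandwidth: {required_bw_mb_s:.0f} MB/s "
      f"(~{required_bw_mb_s / 1000:.2f} GB/s)")
```

This works out to roughly 2.4 GB/s of sustained external traffic, which is comfortably within what a single commodity DRAM channel can deliver and illustrates why the bandwidth reduction matters: without the 55% traffic cut, the same throughput would demand proportionally more pins or channels.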

Original language: English
Title of host publication: Proceedings of the 34th IEEE International Conference on Computer Design, ICCD 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 320-327
Number of pages: 8
ISBN (Electronic): 9781509051427
DOIs: https://doi.org/10.1109/ICCD.2016.7753296
Publication status: Published - 2016 Nov 22
Event: 34th IEEE International Conference on Computer Design, ICCD 2016 - Scottsdale, United States
Duration: 2016 Oct 2 → 2016 Oct 5

Other

Other: 34th IEEE International Conference on Computer Design, ICCD 2016
Country: United States
City: Scottsdale
Period: 16/10/2 → 16/10/5

Keywords

  • backward propagation
  • convolutional neural networks
  • FPGA
  • memory bandwidth
  • reconfigurable processor

ASJC Scopus subject areas

  • Hardware and Architecture

Cite this

Han, X., Zhou, D., Wang, S., & Kimura, S. (2016). CNN-MERP: An FPGA-based memory-efficient reconfigurable processor for forward and backward propagation of convolutional neural networks. In Proceedings of the 34th IEEE International Conference on Computer Design, ICCD 2016 (pp. 320-327). [7753296] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCD.2016.7753296
