Convolutional neural networks (CNNs) have shown great success in many areas, such as object detection and pattern recognition, at the cost of extremely high computational complexity and significant external memory access. This makes state-of-the-art deep CNNs difficult to implement on resource-constrained portable and wearable devices with limited battery capacity. To address this design challenge, this paper proposes a power-efficient CNN design based on zero-gating processing elements (PEs) and a partial-sum-reuse-centric dataflow. Unlike existing works, which either consider only the zeros in activation maps or rely on an off-chip training process to reduce on-chip computation, the proposed zero-gating PE avoids unnecessary on-chip computation by taking advantage of the large number of zeros in both the filter weights of pre-trained models and the activation maps. Furthermore, a partial-sum-reuse-centric dataflow is proposed to reduce off-chip DRAM access. The evaluation results show that the proposed design reduces the overall power consumption of the PE arrays by 37% and 14%, at the cost of 8% and 1% area overhead, when compared to the baseline PE design and the existing activation-only-gated design (i.e., the one in Eyeriss), respectively. Moreover, the proposed method reduces DRAM accesses by 35% and 47%, with corresponding energy savings of 14% and 49%, for AlexNet and VGG-16, respectively, when compared to Eyeriss.
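The zero-gating idea can be illustrated in software with a minimal sketch. This is not the authors' PE hardware design; it is only a behavioral model, under the assumption that a multiply-accumulate (MAC) is skipped whenever a gated operand is zero. The function name `gated_dot`, the `gate_on_weights` flag, and the sample values are all hypothetical and chosen only for illustration.

```python
def gated_dot(weights, activations, gate_on_weights=True):
    """Behavioral model of a zero-gated dot product (illustrative only).

    A MAC is skipped when the activation is zero. If gate_on_weights is
    True, it is also skipped when the weight is zero, mirroring the idea
    of gating on both operands. Returns (accumulated sum, skipped MACs).
    """
    acc = 0
    skipped = 0
    for w, a in zip(weights, activations):
        is_zero = (a == 0) or (gate_on_weights and w == 0)
        if is_zero:
            # A zero operand contributes nothing, so the multiply is skipped.
            skipped += 1
            continue
        acc += w * a
    return acc, skipped


# Hypothetical sample data: zeros appear in both weights and activations.
weights = [0, 3, -2, 0, 5, 1]
activations = [4, 0, 7, 0, 2, 0]

both = gated_dot(weights, activations)                        # gate on both operands
act_only = gated_dot(weights, activations, gate_on_weights=False)  # activation-only gating
print(both, act_only)
```

In this toy input, gating on both operands skips one more MAC than activation-only gating while producing the same sum, which is the kind of additional saving the abstract attributes to exploiting zeros in pre-trained weights. The actual power figures reported above depend on the hardware implementation and cannot be inferred from this model.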
ASJC Scopus subject areas
- General Computer Science