CNN-MERP: An FPGA-Based Memory-Efficient Reconfigurable Processor for Forward and Backward Propagation of Convolutional Neural Networks

03/22/2017
by Xushen Han, et al.

Large-scale deep convolutional neural networks (CNNs) are widely used in machine learning applications. Because CNNs are computationally demanding, VLSI chips (ASICs and FPGAs), which deliver high-density integration of computational resources, are regarded as a promising platform for CNN implementation. With massively parallel computational units, however, the external memory bandwidth, which is constrained by the pin count of the VLSI chip, becomes the system bottleneck. Moreover, VLSI solutions are usually regarded as lacking the flexibility to be reconfigured for the various parameters of CNNs. This paper presents CNN-MERP to address these issues. CNN-MERP incorporates an efficient memory hierarchy that significantly reduces bandwidth requirements through multiple optimizations, including on/off-chip data allocation, data flow optimization, and data reuse. The proposed 2-level reconfigurability, which is based on the control logic and the multiboot feature of the FPGA, enables fast and efficient reconfiguration. As a result, an external memory bandwidth requirement of 1.94 MB/GFlop is achieved, which is 55% lower than prior art. Under limited DRAM bandwidth, a system throughput of 1244 GFlop/s is achieved on the Virtex UltraScale platform, which is 5.48 times higher than state-of-the-art FPGA implementations.
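
To put the headline figures in context, the following is a minimal back-of-the-envelope sketch, not taken from the paper. It assumes the two reported numbers (1.94 MB/GFlop and 1244 GFlop/s) describe the same operating point, and it reads "5.48 times higher" as a plain ratio of throughputs. The function names are hypothetical and exist only for illustration.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the abstract.
# Assumption: both figures refer to the same operating point.


def required_bandwidth_mb_per_s(mb_per_gflop: float, gflop_per_s: float) -> float:
    """External memory traffic per second implied by a bandwidth requirement
    (MB per GFlop) sustained at a given throughput (GFlop per second)."""
    return mb_per_gflop * gflop_per_s


def implied_baseline_gflop_per_s(gflop_per_s: float, ratio: float) -> float:
    """Throughput of the comparison design, assuming 'ratio times higher'
    means a plain ratio of the two throughputs."""
    return gflop_per_s / ratio


if __name__ == "__main__":
    bw = required_bandwidth_mb_per_s(1.94, 1244.0)
    base = implied_baseline_gflop_per_s(1244.0, 5.48)
    print(f"Implied DRAM traffic: {bw:.2f} MB/s (about {bw / 1000:.2f} GB/s)")
    print(f"Implied baseline throughput: {base:.1f} GFlop/s")
```

Under these assumptions the design would need roughly 2.4 GB/s of external memory traffic at peak throughput, which illustrates why a low MB/GFlop figure matters when DRAM bandwidth is the bottleneck.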

research
05/25/2018

f-CNN^x: A Toolflow for Mapping Multiple Convolutional Neural Networks on FPGAs

The predictive power of Convolutional Neural Networks (CNNs) has been an...
research
12/01/2018

DeCoILFNet: Depth Concatenation and Inter-Layer Fusion based ConvNet Accelerator

Convolutional Neural Networks (CNNs) are rapidly gaining popularity in v...
research
08/21/2021

Reconfigurable co-processor architecture with limited numerical precision to accelerate deep convolutional neural networks

Convolutional Neural Networks (CNNs) are widely used in deep learning ap...
research
03/04/2017

Chain-NN: An Energy-Efficient 1D Chain Architecture for Accelerating Deep Convolutional Neural Networks

Deep convolutional neural networks (CNN) have shown their good performan...
research
02/23/2022

Shisha: Online scheduling of CNN pipelines on heterogeneous architectures

Chiplets have become a common methodology in modern chip design. Chiplet...
research
05/31/2023

fpgaHART: A toolflow for throughput-oriented acceleration of 3D CNNs for HAR onto FPGAs

Surveillance systems, autonomous vehicles, human monitoring systems, and...
research
03/09/2021

unzipFPGA: Enhancing FPGA-based CNN Engines with On-the-Fly Weights Generation

Single computation engines have become a popular design choice for FPGA-...
