DeCoILFNet: Depth Concatenation and Inter-Layer Fusion based ConvNet Accelerator

12/01/2018
by   Akanksha Baranwal, et al.

Convolutional Neural Networks (CNNs) are rapidly gaining popularity in varied fields. Due to their increasingly deep and computationally heavy structures, it is difficult to deploy them in energy-constrained mobile applications. Hardware accelerators such as FPGAs have emerged as an attractive alternative. However, with the limited on-chip memory and computation resources of an FPGA, meeting the high memory-throughput requirement and exploiting the parallelism of CNNs is a major challenge. We propose a high-performance FPGA-based architecture - Depth Concatenation and Inter-Layer Fusion based ConvNet Accelerator - DeCoILFNet, which exploits the intra-layer parallelism of CNNs by flattening across depth and combines it with a highly pipelined data flow across the layers, enabling inter-layer fusion. This architecture significantly reduces off-chip memory accesses and maximizes throughput. Compared to a 3.5GHz hexa-core Intel Xeon E7 Caffe implementation, our 120MHz FPGA accelerator is 30X faster. In addition, our design reduces external memory accesses by 11.5X along with a speedup of more than 2X in the number of clock cycles compared to state-of-the-art FPGA accelerators.
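The inter-layer fusion idea described above can be illustrated in software: instead of computing one convolution layer fully and writing the intermediate feature map to external memory before starting the next layer, a fused pipeline produces each row of the second layer's output as soon as enough rows of the first layer's output exist, keeping only a small sliding window of intermediate data on chip. The sketch below is a minimal single-channel numpy illustration of this scheduling idea, not the paper's actual hardware design; all function names are hypothetical.

```python
import numpy as np

def conv2d_valid(x, k):
    """Naive single-channel 'valid' convolution (cross-correlation)."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def fused_two_layer(x, k1, k2):
    """Two stacked 3x3 convolutions, fused row-by-row.

    Only a 3-row sliding window of the layer-1 output is ever held,
    mimicking how inter-layer fusion avoids storing (and re-reading)
    the full intermediate feature map off chip.
    """
    H, W = x.shape
    kh = 3
    inter_rows = []  # sliding window of layer-1 output rows
    out_rows = []
    for i in range(H - kh + 1):
        # Produce one row of the layer-1 output.
        row = np.array([np.sum(x[i:i + kh, j:j + kh] * k1)
                        for j in range(W - kh + 1)])
        inter_rows.append(row)
        if len(inter_rows) == kh:
            # Enough layer-1 rows: emit one layer-2 output row.
            window = np.stack(inter_rows)
            out_rows.append(np.array(
                [np.sum(window[:, j:j + kh] * k2)
                 for j in range(window.shape[1] - kh + 1)]))
            inter_rows.pop(0)  # retire the oldest intermediate row
    return np.stack(out_rows)
```

The fused schedule is numerically identical to running the two layers back to back, but its peak intermediate storage is three rows instead of a full feature map, which is the source of the off-chip access savings the abstract claims.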
