A flexible FPGA accelerator for convolutional neural networks

12/16/2019
by Kingshuk Majumder et al.

Though CNNs are highly parallel workloads, in the absence of efficient on-chip memory reuse techniques an accelerator for them quickly becomes memory bound. In this paper, we propose a CNN accelerator design for inference that exploits all available forms of reuse to minimize off-chip memory accesses while increasing the utilization of available resources. The proposed design is composed of cores, each of which contains a one-dimensional array of processing elements (PEs). These cores can exploit the different types of reuse available in CNN layers of varying shapes without requiring any reconfiguration; in particular, our design minimizes underutilization when problem sizes are not perfect multiples of the underlying hardware array dimensions. A major obstacle to the adoption of FPGAs as a platform for CNN inference is the difficulty of programming these devices using hardware description languages. Our end goal is to address this as well, and we develop preliminary software support via codesign so that the accelerator can be driven from TensorFlow, a dominant high-level programming model. Our framework takes care of tiling and scheduling neural network layers and generates the low-level commands needed to execute the CNN. Experimental evaluation on a real system with a PCI Express-based Xilinx VC709 board demonstrates the effectiveness of our approach. Thanks to an effective interconnect, the design maintains a high operating frequency as the number of PEs is scaled, and the sustained performance is a good fraction of the accelerator's theoretical peak.
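As a rough illustration of the kind of tiling the abstract describes, the minimal Python sketch below shows how a host-side framework might split a layer's output channels across cores that each hold a one-dimensional PE array, using ceiling division so that a problem size that is not a multiple of the array width still maps without a fully padded extra pass. The parameters and function names (NUM_CORES, PES_PER_CORE, tile_output_channels) are hypothetical and are not taken from the paper; this is not the authors' actual framework or API.

```python
import math

# Hypothetical accelerator geometry (assumptions, not from the paper).
NUM_CORES = 4          # assumed number of cores on the device
PES_PER_CORE = 16      # assumed length of each core's 1-D PE array


def tile_output_channels(out_channels: int):
    """Split a layer's output channels into per-pass tiles.

    Ceiling division keeps a final, possibly partial tile instead of
    rounding the problem up to a multiple of the PE array width, which
    is one way a flexible design can limit underutilization.
    """
    channels_per_pass = NUM_CORES * PES_PER_CORE
    num_passes = math.ceil(out_channels / channels_per_pass)
    tiles = []
    for p in range(num_passes):
        start = p * channels_per_pass
        stop = min(start + channels_per_pass, out_channels)
        tiles.append((start, stop))
    return tiles


# Example: 100 output channels on a 64-lane machine take two passes;
# the second pass uses only 36 lanes rather than a padded full pass.
print(tile_output_channels(100))   # [(0, 64), (64, 100)]
```

In a real flow, each tile would be turned into the low-level commands the accelerator executes; the sketch only shows the scheduling arithmetic.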

Related research

MPNA: A Massively-Parallel Neural Array Accelerator with Dataflow Optimization for Convolutional Neural Networks (10/30/2018)
The state-of-the-art accelerators for Convolutional Neural Networks (CNN...

Memory-Efficient CNN Accelerator Based on Interlayer Feature Map Compression (10/12/2021)
Existing deep convolutional neural networks (CNNs) generate massive inte...

Lupulus: A Flexible Hardware Accelerator for Neural Networks (05/03/2020)
Neural networks have become indispensable for a wide range of applicatio...

FlexSA: Flexible Systolic Array Architecture for Efficient Pruned DNN Model Training (04/27/2020)
Modern deep learning models have high memory and computation cost. To ma...

Performance evaluation over HW/SW co-design SoC memory transfers for a CNN accelerator (05/09/2018)
Many FPGA vendors have recently included embedded processors in their d...

A scalable and efficient convolutional neural network accelerator using HLS for a System on Chip design (04/27/2020)
This paper presents a configurable Convolutional Neural Network Accelera...

Hardware-Efficient Template-Based Deep CNNs Accelerator Design (07/21/2022)
Acceleration of Convolutional Neural Network (CNN) on edge devices has r...
