FPGA Implementations of 3D-SIMD Processor Architecture for Deep Neural Networks Using Relative Indexed Compressed Sparse Filter Encoding Format and Stacked Filters Stationary Flow

03/28/2018
by Yuechao Gao et al.

It is a challenging task to deploy computationally and memory-intensive state-of-the-art deep neural networks (DNNs) on embedded systems with limited hardware resources and power budgets. Recently developed techniques such as Deep Compression make it possible to fit large DNNs, such as AlexNet and VGGNet, fully in on-chip SRAM. However, sparse networks compressed with existing encoding formats, such as CSR or CSC, complicate computation at runtime because of their irregular memory access patterns. In [1], we introduced a computation dataflow, the stacked filters stationary (SFS) dataflow, and a corresponding data encoding format, the relative indexed compressed sparse filter (CSF) format, to take full advantage of data sparsity and simplify data handling at execution time. In this paper we present FPGA implementations of these methods. We implement several compact streaming fully connected (FC) and convolutional (CONV) neural network processors to demonstrate their efficiency. Compared with state-of-the-art results [2,3,4], our methods achieve at least a 2x improvement in computation efficiency per PE on most layers. In particular, they achieve an 8x improvement on AlexNet layer CONV4 with 384 filters, and an 11x improvement on VGG16 layer CONV5-3 with 512 filters.
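The exact CSF layout is specified in [1]. As a rough illustration of the relative-indexing idea the abstract mentions, here is a minimal Python sketch: each nonzero weight is stored together with the gap (number of zeros) since the previous stored element, and long zero runs are bridged with padding zeros so the gap fits in a small fixed-width field. The 4-bit gap limit and the padding-zero convention here follow Deep Compression-style encoding and are assumptions, not the paper's exact format.

```python
def encode_relative(weights, max_jump=15):
    """Encode a flat weight vector as (relative_index, value) pairs.

    relative_index is the number of zeros between this entry and the
    previous stored one. When a zero run would exceed max_jump (e.g. a
    4-bit index field), a padding zero entry is emitted to bridge it.
    Hypothetical sketch; not the paper's exact CSF layout.
    """
    encoded = []
    gap = 0
    for w in weights:
        if w == 0:
            if gap == max_jump:
                # Gap field is saturated: store an explicit zero here.
                encoded.append((max_jump, 0.0))
                gap = 0
            else:
                gap += 1
        else:
            encoded.append((gap, w))
            gap = 0
    return encoded


def decode_relative(encoded, length):
    """Reconstruct the dense vector from (gap, value) pairs."""
    dense = [0.0] * length
    pos = -1
    for gap, val in encoded:
        pos += gap + 1  # skip `gap` zeros, then land on this entry
        dense[pos] = val
    return dense
```

Because each entry's position depends only on the previous entry, a streaming processor can walk the encoded stream with a single running counter instead of the row-pointer indirection that CSR/CSC require.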

Related research

01/23/2018  Stacked Filters Stationary Flow For Hardware-Oriented Acceleration Of Deep Convolutional Neural Networks
To address memory and computation resource limitations for hardware-orie...

02/04/2016  EIE: Efficient Inference Engine on Compressed Deep Neural Network
State-of-the-art deep neural networks (DNNs) have hundreds of millions o...

02/04/2016  FPGA Based Implementation of Deep Neural Networks Using On-chip Memory Only
Deep neural networks (DNNs) demand a very large amount of computation an...

12/17/2020  FantastIC4: A Hardware-Software Co-Design Approach for Efficiently Running 4bit-Compact Multilayer Perceptrons
With the growing demand for deploying deep learning models to the "edge"...

12/03/2017  ALLSAT compressed with wildcards. Part 4: An invitation for C-programmers
The model set of a general Boolean function in CNF is calculated in a co...

01/17/2019  CodeX: Bit-Flexible Encoding for Streaming-based FPGA Acceleration of DNNs
This paper proposes CodeX, an end-to-end framework that facilitates enco...

10/03/2017  Optimal DNN Primitive Selection with Partitioned Boolean Quadratic Programming
Deep Neural Networks (DNNs) require very large amounts of computation bo...
