CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-CirculantWeight Matrices

08/29/2017
by   Caiwen Ding, et al.
0

Large-scale deep neural networks (DNNs) are both compute and memory intensive. As the size of DNNs continues to grow, it is critical to improve the energy efficiency and performance while maintaining accuracy. For DNNs, the model size is an important factor affecting performance, scalability and energy efficiency. Weight pruning achieves good compression ratios but suffers from three drawbacks: 1) the irregular network structure after pruning; 2) the increased training complexity; and 3) the lack of rigorous guarantee of compression ratio and inference accuracy. To overcome these limitations, this paper proposes CirCNN, a principled approach to represent weights and process neural networks using block-circulant matrices. CirCNN utilizes the Fast Fourier Transform (FFT)-based fast multiplication, simultaneously reducing the computational complexity (both in inference and training) from O(n2) to O(nlogn) and the storage complexity from O(n2) to O(n), with negligible accuracy loss. Compared to other approaches, CirCNN is distinct due to its mathematical rigor: it can converge to the same effectiveness as DNNs without compression. The CirCNN architecture, a universal DNN inference engine that can be implemented on various hardware/software platforms with configurable network architecture. To demonstrate the performance and energy efficiency, we test CirCNN in FPGA, ASIC and embedded processors. Our results show that CirCNN architecture achieves very high energy efficiency and performance with a small hardware footprint. Based on the FPGA implementation and ASIC synthesis results, CirCNN achieves 6-102X energy efficiency improvements compared with the best state-of-the-art results.

READ FULL TEXT

page 4

page 8

research
03/20/2018

Efficient Recurrent Neural Networks using Structured Matrices in FPGAs

Recurrent Neural Networks (RNNs) are becoming increasingly important for...
research
12/13/2017

FFT-Based Deep Learning Deployment in Embedded Systems

Deep learning has delivered its powerfulness in many application domains...
research
04/23/2020

PERMDNN: Efficient Compressed DNN Architecture with Permuted Diagonal Matrices

Deep neural network (DNN) has emerged as the most important and popular ...
research
05/30/2018

MPDCompress - Matrix Permutation Decomposition Algorithm for Deep Neural Network Compression

Deep neural networks (DNNs) have become the state-of-the-art technique f...
research
07/31/2019

Tuning Algorithms and Generators for Efficient Edge Inference

A surge in artificial intelligence and autonomous technologies have incr...
research
06/01/2017

CATERPILLAR: Coarse Grain Reconfigurable Architecture for Accelerating the Training of Deep Neural Networks

Accelerating the inference of a trained DNN is a well studied subject. I...
research
03/14/2018

C-LSTM: Enabling Efficient LSTM using Structured Compression Techniques on FPGAs

Recently, significant accuracy improvement has been achieved for acousti...

Please sign up or login with your details

Forgot password? Click here to reset