Structured Weight Matrices-Based Hardware Accelerators in Deep Neural Networks: FPGAs and ASICs

03/28/2018
by Caiwen Ding, et al.

Both industry and academia have extensively investigated hardware acceleration of deep neural networks (DNNs). In this work, to address the increasing demands on computational capability and memory, we propose structured weight matrices (SWM)-based compression techniques for both field-programmable gate array (FPGA) and application-specific integrated circuit (ASIC) implementations. On the algorithm side, the SWM-based framework adopts block-circulant matrices to achieve a fine-grained tradeoff between accuracy and compression ratio. The SWM-based technique can reduce computational complexity from O(n^2) to O(n log n) and storage complexity from O(n^2) to O(n) for each layer, in both the training and inference phases. For FPGA implementations of deep convolutional neural networks (DCNNs), the SWM-based framework achieves at least 152X and 72X improvement in performance and energy efficiency, respectively, compared with the IBM TrueNorth processor baseline under the same accuracy constraints on the MNIST, SVHN, and CIFAR-10 data sets. For FPGA implementations of long short-term memory (LSTM) networks, the proposed SWM-based LSTM achieves up to 21X improvement in performance and 33.5X gains in energy efficiency compared with the baseline accelerator. For ASIC implementations, the SWM-based design exhibits advantages in power, throughput, and energy efficiency. Experimental results indicate that this method is well suited to deploying DNNs on both FPGAs and mobile/IoT devices.
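
To illustrate the block-circulant idea mentioned in the abstract, here is a minimal NumPy sketch. It is not the authors' implementation: it assumes each k x k block is a circulant matrix defined by its first column, and it uses the standard FFT identity for circulant matrix-vector products, which is one common way to reach O(n log n) work per block. The function names (`circulant`, `block_circulant_matvec`, `dense_from_blocks`) and the array layout `(p, q, k)` are hypothetical choices made for this example.

```python
import numpy as np

def circulant(c):
    """Dense k x k circulant matrix whose first column is c (C[a, b] = c[(a - b) mod k])."""
    k = len(c)
    idx = (np.arange(k)[:, None] - np.arange(k)[None, :]) % k
    return c[idx]

def dense_from_blocks(w):
    """Expand block parameters w of shape (p, q, k) into the full (p*k, q*k) matrix."""
    p, q, k = w.shape
    return np.block([[circulant(w[i, j]) for j in range(q)] for i in range(p)])

def block_circulant_matvec(w, x):
    """Multiply the block-circulant matrix described by w with x without expanding it.

    w: shape (p, q, k), the first column of each circulant block (assumed layout).
    x: input vector of length q*k.
    Each block product is a circular convolution, evaluated in the frequency domain.
    """
    p, q, k = w.shape
    xf = np.fft.rfft(x.reshape(q, k), axis=1)   # FFT of each input sub-vector
    wf = np.fft.rfft(w, axis=2)                 # FFT of each block's defining vector
    yf = np.einsum("pqf,qf->pf", wf, xf)        # sum over column blocks, per frequency
    return np.fft.irfft(yf, n=k, axis=1).reshape(p * k)

# Small check against the dense product (sizes chosen arbitrarily for illustration).
rng = np.random.default_rng(0)
w = rng.standard_normal((2, 3, 4))
x = rng.standard_normal(12)
dense = dense_from_blocks(w)
y_fast = block_circulant_matvec(w, x)
y_ref = dense @ x
```

In this sketch the stored parameters number p*q*k while the equivalent dense matrix has p*q*k^2 entries, so the block size k acts as the compression ratio; choosing k is the kind of accuracy-versus-compression knob the abstract describes.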

research
02/18/2018

Towards Ultra-High Performance and Energy Efficiency of Deep Learning Systems: An Algorithm-Hardware Co-Optimization Framework

Hardware accelerations of deep learning systems have been extensively in...
research
03/14/2018

C-LSTM: Enabling Efficient LSTM using Structured Compression Techniques on FPGAs

Recently, significant accuracy improvement has been achieved for acousti...
research
06/22/2023

To Spike or Not to Spike? A Quantitative Comparison of SNN and CNN FPGA Implementations

Convolutional Neural Networks (CNNs) are widely employed to solve variou...
research
12/12/2018

E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs

Recurrent Neural Networks (RNNs) are becoming increasingly important for...
research
01/07/2021

BRDS: An FPGA-based LSTM Accelerator with Row-Balanced Dual-Ratio Sparsification

In this paper, first, a hardware-friendly pruning algorithm for reducing...
research
04/24/2023

Design optimization for high-performance computing using FPGA

Reconfigurable architectures like Field Programmable Gate Arrays (FPGAs)...
research
07/13/2018

DLA: Compiler and FPGA Overlay for Neural Network Inference Acceleration

Overlays have shown significant promise for field-programmable gate-arra...
