C-LSTM: Enabling Efficient LSTM using Structured Compression Techniques on FPGAs

03/14/2018
by Shuo Wang, et al.

Recently, significant accuracy improvements have been achieved in acoustic recognition systems by increasing the model size of Long Short-Term Memory (LSTM) networks. Unfortunately, the ever-increasing size of LSTM models leads to inefficient designs on FPGAs due to limited on-chip resources. Prior work proposed a pruning-based compression technique to reduce the model size and thus speed up inference on FPGAs. However, the random nature of pruning transforms the dense weight matrices of the model into highly unstructured sparse ones, which leads to unbalanced computation and irregular memory accesses and thus hurts overall performance and energy efficiency. In contrast, we propose a structured compression technique that not only reduces the LSTM model size but also eliminates the irregularities in computation and memory access. This approach employs block-circulant rather than sparse matrices to compress the weight matrices, reducing the storage requirement from O(k^2) to O(k). The Fast Fourier Transform (FFT) algorithm is used to further accelerate inference by reducing the computational complexity from O(k^2) to O(k log k). The datapath and activation functions are quantized to 16 bits to improve resource utilization. More importantly, we propose a comprehensive framework called C-LSTM to automatically optimize and implement a wide range of LSTM variants on FPGAs. According to the experimental results, C-LSTM achieves up to 18.8X and 33.5X gains in performance and energy efficiency, respectively, compared with the state-of-the-art LSTM implementation under the same experimental setup, with only very small accuracy degradation.
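The core idea behind the O(k^2)-to-O(k log k) reduction is that a k x k circulant matrix is fully defined by its first column, and its matrix-vector product equals a circular convolution, which the FFT computes in O(k log k). The sketch below, a minimal illustration in NumPy (not the paper's FPGA implementation; function names are illustrative), verifies the FFT-based product against an explicit dense circulant matrix:

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply the k x k circulant matrix defined by first column c
    with vector x in O(k log k): C @ x = IFFT(FFT(c) * FFT(x))."""
    return np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)).real

def circulant_dense(c):
    """Build the explicit circulant matrix (the O(k^2) storage the
    block-circulant technique avoids): C[i, j] = c[(i - j) mod k]."""
    k = len(c)
    return np.array([[c[(i - j) % k] for j in range(k)] for i in range(k)])

c = np.array([1.0, 2.0, 3.0, 4.0])   # only k values stored, not k^2
x = np.array([0.5, -1.0, 2.0, 0.0])
assert np.allclose(circulant_matvec(c, x), circulant_dense(c) @ x)
```

In the block-circulant scheme described in the abstract, each k x k block of a weight matrix is compressed this way, so every block costs k stored values and one length-k FFT-based convolution instead of k^2 values and a dense multiply.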

Related research

- Structured Weight Matrices-Based Hardware Accelerators in Deep Neural Networks: FPGAs and ASICs (03/28/2018): Both industry and academia have extensively investigated hardware accele...
- CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices (08/29/2017): Large-scale deep neural networks (DNNs) are both compute and memory inte...
- Efficient Recurrent Neural Networks using Structured Matrices in FPGAs (03/20/2018): Recurrent Neural Networks (RNNs) are becoming increasingly important for...
- Intrinsically Sparse Long Short-Term Memory Networks (01/26/2019): Long Short-Term Memory (LSTM) has achieved state-of-the-art performances...
- ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA (12/01/2016): Long Short-Term Memory (LSTM) is widely used in speech recognition. In o...
- Semi-tied Units for Efficient Gating in LSTM and Highway Networks (06/18/2018): Gating is a key technique used for integrating information from multiple...
- PERMDNN: Efficient Compressed DNN Architecture with Permuted Diagonal Matrices (04/23/2020): Deep neural network (DNN) has emerged as the most important and popular ...
