- LSTM-Sharp: An Adaptable, Energy-Efficient Hardware Accelerator for Long Short-Term Memory. The effectiveness of LSTM neural networks for popular tasks such as Auto...
- ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA. Long Short-Term Memory (LSTM) is widely used in speech recognition. In o...
- Structured Weight Matrices-Based Hardware Accelerators in Deep Neural Networks: FPGAs and ASICs. Both industry and academia have extensively investigated hardware accele...
- Non-Volatile Memory Array Based Quantization- and Noise-Resilient LSTM Neural Networks. In cloud and edge computing models, it is important that compute devices...
- SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning. The attention mechanism is becoming increasingly popular in Natural Lang...
- The implementation of a Deep Recurrent Neural Network Language Model on a Xilinx FPGA. Recently, FPGA has been increasingly applied to problems such as speech ...
- Hardware-Guided Symbiotic Training for Compact, Accurate, yet Execution-Efficient LSTM. Many long short-term memory (LSTM) applications need fast yet compact mo...
BRDS: An FPGA-based LSTM Accelerator with Row-Balanced Dual-Ratio Sparsification
In this paper, first, a hardware-friendly pruning algorithm for reducing the energy consumption and improving the speed of Long Short-Term Memory (LSTM) neural network accelerators is presented. Next, an FPGA-based platform for efficient execution of networks pruned with the proposed algorithm is introduced. Because the two weight matrices of an LSTM model differ in their sensitivity to pruning, different sparsity ratios (i.e., dual-ratio sparsity) are applied to them. To reduce memory accesses, a row-wise sparsity pattern is adopted. The proposed hardware architecture makes use of computation overlapping and pipelining to achieve low power consumption and high speed. The effectiveness of the proposed pruning algorithm and accelerator is assessed on benchmarks for natural language processing, binary sentiment classification, and speech recognition. Results show that, e.g., compared to a recently published work in this field, the proposed accelerator could provide up to 272% higher effective GOPS/W and up to 1.4% lower perplexity error.
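As a rough illustration of the row-balanced, dual-ratio sparsification idea described in the abstract, the sketch below prunes each row of a weight matrix down to a fixed nonzero budget, so every row carries the same number of nonzeros, and applies different sparsity ratios to the input and recurrent weight matrices. This is a minimal sketch, not the paper's exact algorithm: the function name `prune_row_balanced`, the matrix shapes, and the 0.8/0.6 ratios are illustrative assumptions.

```python
import numpy as np

def prune_row_balanced(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Keep the k largest-magnitude weights in every row and zero the rest,
    so that all rows end up with the same number of nonzeros."""
    rows, cols = weights.shape
    keep = max(1, int(round(cols * (1.0 - sparsity))))  # nonzeros kept per row
    pruned = np.zeros_like(weights)
    for r in range(rows):
        # indices of the `keep` largest-magnitude entries in row r
        top = np.argpartition(np.abs(weights[r]), cols - keep)[cols - keep:]
        pruned[r, top] = weights[r, top]
    return pruned

rng = np.random.default_rng(0)
hidden, inputs = 128, 64
W = rng.standard_normal((4 * hidden, inputs))  # stacked i/f/g/o input weights
U = rng.standard_normal((4 * hidden, hidden))  # stacked i/f/g/o recurrent weights

# Dual-ratio sparsity: the more pruning-tolerant matrix gets the higher ratio.
# The 0.8 / 0.6 split below is a placeholder, not a value from the paper.
W_sparse = prune_row_balanced(W, sparsity=0.8)
U_sparse = prune_row_balanced(U, sparsity=0.6)

# Row balance: every row of each pruned matrix has the same nonzero count.
assert len(set((W_sparse != 0).sum(axis=1))) == 1
assert len(set((U_sparse != 0).sum(axis=1))) == 1
```

The presumed benefit of the row-balanced pattern, consistent with the abstract's emphasis on reducing memory accesses, is that when each matrix row is assigned to a processing element, equal per-row nonzero counts keep the elements load-balanced and avoid idle cycles.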