LSTM-Sharp: An Adaptable, Energy-Efficient Hardware Accelerator for Long Short-Term Memory

11/04/2019
by Reza Yazdani, et al.

The effectiveness of LSTM neural networks for popular tasks such as Automatic Speech Recognition has fostered increasing interest in accelerating LSTM inference. Because of the recurrent nature and data dependencies of LSTM computation, an architecture tailored specifically to its computation pattern is crucial for efficiency. Since LSTMs are used for a variety of tasks, generalizing this efficiency across diverse configurations, i.e., adaptiveness, is another key requirement for such accelerators. In this work, we first expose the low resource utilization and poor adaptiveness of state-of-the-art LSTM implementations on GPU, FPGA, and ASIC architectures. To address these issues, we propose an intelligent tile-based dispatching mechanism that efficiently handles the data dependencies and increases the adaptiveness of LSTM computation. Building on this mechanism, we propose LSTM-Sharp, a hardware accelerator that pipelines LSTM computation with an effective scheduling scheme to hide most of the dependency-induced serialization. Furthermore, LSTM-Sharp employs a dynamically reconfigurable architecture to adapt to each model's characteristics. LSTM-Sharp achieves average speedups of 1.5x, 2.86x, and 82x over state-of-the-art ASIC, FPGA, and GPU implementations, respectively, across different LSTM models and resource budgets. It also provides significant energy reduction relative to previous solutions, thanks to its high energy efficiency (383 GFLOPs/Watt).
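To make the dependency pattern described above concrete, the following is a minimal NumPy sketch, not the paper's implementation: it assumes the standard LSTM cell equations, the `lstm_sequence`/`sigmoid` names and the `tile` parameter are hypothetical, and the tile loop only illustrates how a tile-based dispatcher could separate the parallelizable input projections from the serialized recurrent ones.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_sequence(X, W_x, W_h, b, h0, c0, tile=64):
    """Run an LSTM over T input vectors.

    X: (T, D) inputs; W_x: (4H, D); W_h: (4H, H); b: (4H,).
    Gate order assumed: input, forget, cell candidate, output.
    """
    T = X.shape[0]
    # Input projections have no recurrent dependence, so they can all be
    # computed up front (or overlapped with earlier timesteps).
    Zx = X @ W_x.T + b                       # (T, 4H)
    h, c = h0, c0
    outputs = []
    for t in range(T):                       # this loop cannot be parallelized:
        z = Zx[t].copy()                     # step t needs h from step t-1
        # Recurrent projection computed tile by tile; a hardware dispatcher
        # can schedule these independent tiles to keep compute units busy.
        for r in range(0, W_h.shape[0], tile):
            z[r:r + tile] += W_h[r:r + tile] @ h
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs), (h, c)

# Example usage with arbitrary dimensions:
T, D, H = 16, 32, 64
rng = np.random.default_rng(0)
X = rng.standard_normal((T, D))
W_x = 0.1 * rng.standard_normal((4 * H, D))
W_h = 0.1 * rng.standard_normal((4 * H, H))
H_out, _ = lstm_sequence(X, W_x, W_h, np.zeros(4 * H), np.zeros(H), np.zeros(H))
```

The point of the split is that only the W_h tiles of step t must wait for step t-1; everything else can be scheduled freely, which is what lets a pipelined scheduler hide most of the serialization.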


Related research

01/07/2021 · BRDS: An FPGA-based LSTM Accelerator with Row-Balanced Dual-Ratio Sparsification
In this paper, first, a hardware-friendly pruning algorithm for reducing...

12/01/2016 · ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA
Long Short-Term Memory (LSTM) is widely used in speech recognition. In o...

11/20/2017 · E-PUR: An Energy-Efficient Processing Unit for Recurrent Neural Networks
Recurrent Neural Networks (RNNs) are a key technology for emerging appli...

07/11/2018 · FINN-L: Library Extensions and Design Trade-off Analysis for Variable Precision LSTM Networks on FPGAs
It is well known that many types of artificial neural networks, includin...

11/07/2019 · Boosting LSTM Performance Through Dynamic Precision Selection
The use of low numerical precision is a fundamental optimization include...

02/14/2022 · Vau da muntanialas: Energy-efficient multi-die scalable acceleration of RNN inference
Recurrent neural networks such as Long Short-Term Memories (LSTMs) learn...

08/04/2021 · Spartus: A 9.4 TOp/s FPGA-based LSTM Accelerator Exploiting Spatio-temporal Sparsity
Long Short-Term Memory (LSTM) recurrent networks are frequently used for...
