ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA

12/01/2016
by Song Han, et al.

Long Short-Term Memory (LSTM) networks are widely used in speech recognition. To achieve higher prediction accuracy, machine learning scientists have built larger and larger models. Such large models are both computation- and memory-intensive; deploying them results in high power consumption and a high total cost of ownership (TCO) for a data center. To speed up prediction and make it energy efficient, we first propose a load-balance-aware pruning method that compresses the LSTM model size by 20x (10x from pruning and 2x from quantization) with negligible loss of prediction accuracy. The pruned model is friendly to parallel processing. Next, we propose a scheduler that encodes and partitions the compressed model across processing elements (PEs) for parallelism and schedules the complicated LSTM data flow. Finally, we design a hardware architecture, named the Efficient Speech Recognition Engine (ESE), that works directly on the compressed model. Implemented on a Xilinx XCKU060 FPGA running at 200MHz, ESE achieves 282 GOPS working directly on the compressed LSTM network, corresponding to 2.52 TOPS on the uncompressed one, and processes a full LSTM for speech recognition with a power dissipation of 41 Watts. Evaluated on the LSTM speech recognition benchmark, ESE is 43x and 3x faster than Core i7 5930k CPU and Pascal Titan X GPU implementations, respectively, and achieves 40x and 11.5x higher energy efficiency than the CPU and GPU.
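The key idea behind load-balance-aware pruning is that weights are not pruned globally but per processing element, so that every PE ends up with the same number of nonzeros and none sits idle while others finish. The sketch below illustrates one plausible reading of this scheme; the interleaved row-to-PE assignment, the function name, and the magnitude-threshold criterion are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

def load_balance_prune(W, num_pe=4, sparsity=0.9):
    """Illustrative sketch of load-balance-aware pruning (assumed details).

    Rows of W are interleaved across `num_pe` processing elements; each
    PE's submatrix is pruned independently to the same target sparsity
    by zeroing its smallest-magnitude weights. Every PE then holds the
    same number of nonzeros, balancing work across PEs.
    """
    W = W.copy()
    for pe in range(num_pe):
        sub = W[pe::num_pe, :]             # rows assigned to this PE
        k = int(sub.size * sparsity)       # number of weights to zero here
        if k == 0:
            continue
        # k-th smallest absolute value within this PE's share
        thresh = np.partition(np.abs(sub).ravel(), k - 1)[k - 1]
        sub[np.abs(sub) <= thresh] = 0.0
        W[pe::num_pe, :] = sub
    return W
```

With an 8x10 weight matrix and 4 PEs at 90% sparsity, each PE's 20-weight share keeps exactly 2 nonzeros, so the per-PE multiply-accumulate work is identical. A global magnitude threshold, by contrast, could leave one PE with most of the surviving weights.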

