4-bit Quantization of LSTM-based Speech Recognition Models

08/27/2021
by   Andrea Fasoli, et al.

We investigate the impact of aggressive low-precision representations of weights and activations in two families of large LSTM-based architectures for Automatic Speech Recognition (ASR): hybrid Deep Bidirectional LSTM - Hidden Markov Models (DBLSTM-HMMs) and Recurrent Neural Network - Transducers (RNN-Ts). Using a 4-bit integer representation, a naïve quantization approach applied to the LSTM portion of these models results in significant Word Error Rate (WER) degradation. On the other hand, we show that minimal accuracy loss is achievable with an appropriate choice of quantizers and initializations. In particular, we customize quantization schemes depending on the local properties of the network, improving recognition performance while limiting computational time. We demonstrate our solution on the Switchboard (SWB) and CallHome (CH) test sets of the NIST Hub5-2000 evaluation. DBLSTM-HMMs trained with 300 or 2000 hours of SWB data achieve <0.5% and <1% average WER degradation, respectively. On the more challenging RNN-T models, our quantization strategy limits degradation in 4-bit inference to 1.3% absolute WER.
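As a rough illustration of the setup the abstract describes, the sketch below shows per-tensor 4-bit fake quantization with a symmetric quantizer (typical for weights) and an asymmetric one (typical for activations). The function name, the min/max scale initialization, and the NumPy implementation are illustrative assumptions, not the paper's actual method, which selects and initializes quantizers according to the local properties of each layer.

```python
# Illustrative sketch only: per-tensor 4-bit fake quantization with a
# min/max scale initialization. The paper's per-layer quantizer choices
# and initializations are not reproduced here.
import numpy as np

def fake_quantize_4bit(x: np.ndarray, symmetric: bool = True) -> np.ndarray:
    """Round x to a 4-bit integer grid, then map back to float."""
    if symmetric:
        # Symmetric signed range [-8, 7], a common choice for weights.
        qmin, qmax = -8, 7
        scale = max(float(np.abs(x).max()) / qmax, 1e-12)
        zero_point = 0
    else:
        # Asymmetric unsigned range [0, 15], a common choice for activations.
        qmin, qmax = 0, 15
        scale = max((float(x.max()) - float(x.min())) / (qmax - qmin), 1e-12)
        zero_point = int(np.clip(round(-float(x.min()) / scale), qmin, qmax))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale

# Example: quantize an LSTM-style weight matrix and check the error bound
# (values inside the representable range are off by at most scale / 2).
w = np.random.randn(256, 256).astype(np.float32)
w_q = fake_quantize_4bit(w, symmetric=True)
assert np.abs(w - w_q).max() <= 0.5 * np.abs(w).max() / 7 + 1e-6
```

With only 16 levels available, a mismatched range wastes a large fraction of them, which is why choosing between symmetric, asymmetric, or learned-range quantizers per tensor, as the abstract's per-layer customization suggests, matters far more at 4 bits than at 8.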


