
Sub-8-bit quantization for on-device speech recognition: a regularization-free approach

by Kai Zhen, et al.

For on-device automatic speech recognition (ASR), quantization-aware training (QAT) is the standard approach to balancing model predictive performance against efficiency. A major drawback of existing QAT methods is that the quantization centroids must be predetermined and fixed. To overcome this limitation, we introduce a regularization-free, "soft-to-hard" compression mechanism with self-adjustable centroids in a mu-Law constrained space, resulting in a simpler yet more versatile quantization scheme, called General Quantizer (GQ). We apply GQ to ASR tasks using Recurrent Neural Network Transducer (RNN-T) and Conformer architectures on both LibriSpeech and de-identified far-field datasets. Without accuracy degradation, GQ can compress both RNN-T and Conformer into sub-8-bit, and for some RNN-T layers, to 1-bit for fast and accurate inference. We observe a 30.73% memory footprint saving and a 31.75% user-perceived latency reduction compared to 8-bit QAT via physical device benchmarking.
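To make the mu-Law constrained space concrete, here is a minimal illustrative sketch of quantizing weights onto uniform centroids in a mu-Law companded domain. This is not the authors' GQ implementation: the `mu` value, the uniform centroid grid, and the helper names are all assumptions for illustration (GQ additionally makes the centroids self-adjusting during training rather than fixed on a grid).

```python
import numpy as np

def mu_law_compand(x, mu=255.0):
    # Compress values in [-1, 1] onto a logarithmic (mu-law) scale.
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_expand(y, mu=255.0):
    # Inverse of mu_law_compand.
    return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu

def quantize(weights, bits=4, mu=255.0):
    # Normalize weights to [-1, 1], compand, then snap to a uniform
    # centroid grid in the companded space (a fixed-grid stand-in for
    # GQ's self-adjustable centroids), and expand back.
    scale = float(np.max(np.abs(weights))) or 1.0
    companded = mu_law_compand(weights / scale, mu)
    levels = 2 ** bits - 1
    codes = np.round((companded + 1.0) / 2.0 * levels)
    centroids = codes / levels * 2.0 - 1.0
    return mu_law_expand(centroids, mu) * scale
```

Because the companding step allocates more centroids near zero, where neural-network weights concentrate, even a coarse grid (e.g. `bits=4`) keeps the round-trip error on small weights low.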




Sub-8-Bit Quantization Aware Training for 8-Bit Neural Network Accelerator with On-Device Speech Recognition

We present a novel sub-8-bit quantization-aware training (S8BQAT) scheme...

4-bit Quantization of LSTM-based Speech Recognition Models

We investigate the impact of aggressive low-precision representations of...

4-bit Conformer with Native Quantization Aware Training for Speech Recognition

Reducing the latency and model size has always been a significant resear...

Q-ASR: Integer-only Zero-shot Quantization for Efficient Speech Recognition

End-to-end neural network models achieve improved performance on various...

Bifocal Neural ASR: Exploiting Keyword Spotting for Inference Optimization

We present Bifocal RNN-T, a new variant of the Recurrent Neural Network ...

Attention based on-device streaming speech recognition with large speech corpus

In this paper, we present a new on-device automatic speech recognition (...