Sub-8-bit quantization for on-device speech recognition: a regularization-free approach

10/17/2022
by   Kai Zhen, et al.

For on-device automatic speech recognition (ASR), quantization-aware training (QAT) is the standard way to trade off model predictive performance against efficiency. A major drawback of existing QAT methods is that the quantization centroids must be predetermined and fixed. To overcome this limitation, we introduce a regularization-free, "soft-to-hard" compression mechanism with self-adjustable centroids in a mu-Law constrained space, resulting in a simpler yet more versatile quantization scheme, called General Quantizer (GQ). We apply GQ to ASR tasks using Recurrent Neural Network Transducer (RNN-T) and Conformer architectures on both LibriSpeech and de-identified far-field datasets. Without accuracy degradation, GQ compresses both RNN-T and Conformer into sub-8-bit precision, and some RNN-T layers down to 1-bit, for fast and accurate inference. We observe a 30.73% memory footprint saving and a 31.75% user-perceived latency reduction in physical device benchmarking.
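The core idea, quantizing weights in a mu-Law companded space with centroids that adjust themselves rather than being fixed in advance, can be illustrated with a toy sketch. This is hypothetical code, not the authors' implementation: the function names and the k-means-style centroid update used here as a stand-in for the paper's soft-to-hard mechanism are assumptions.

```python
import numpy as np

def mu_law_compress(x, mu=255.0):
    # Map normalized weights into the mu-Law constrained space [-1, 1].
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_expand(y, mu=255.0):
    # Inverse of mu_law_compress.
    return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu

def quantize_with_adaptive_centroids(weights, n_bits=4, n_steps=20):
    """Toy sub-8-bit quantizer with self-adjusting centroids.

    Centroids start uniformly spaced in mu-Law space and are refined by
    k-means-style updates (an assumed stand-in for the paper's
    regularization-free centroid adaptation).
    """
    scale = np.max(np.abs(weights))
    w = mu_law_compress(weights / scale)
    centroids = np.linspace(-1.0, 1.0, 2 ** n_bits)
    for _ in range(n_steps):
        # Hard assignment: nearest centroid for each compressed weight.
        idx = np.argmin(np.abs(w[:, None] - centroids[None, :]), axis=1)
        # Self-adjustment: move each occupied centroid to the mean
        # of the weights assigned to it.
        for k in range(len(centroids)):
            if np.any(idx == k):
                centroids[k] = w[idx == k].mean()
    # Decode: expand centroids back out of mu-Law space and rescale.
    quantized = mu_law_expand(centroids[idx]) * scale
    return quantized, centroids

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=1000)
q, c = quantize_with_adaptive_centroids(w, n_bits=4)
print(len(np.unique(q)) <= 16)  # prints True: at most 2^4 distinct values
```

Because the centroids move during training rather than staying fixed, dense regions of the weight distribution receive finer quantization levels, which is what makes sub-8-bit (and for some layers 1-bit) precision feasible without accuracy loss.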


Related research

06/30/2022
Sub-8-Bit Quantization Aware Training for 8-Bit Neural Network Accelerator with On-Device Speech Recognition
We present a novel sub-8-bit quantization-aware training (S8BQAT) scheme...

08/27/2021
4-bit Quantization of LSTM-based Speech Recognition Models
We investigate the impact of aggressive low-precision representations of...

03/29/2022
4-bit Conformer with Native Quantization Aware Training for Speech Recognition
Reducing the latency and model size has always been a significant resear...

03/31/2021
Q-ASR: Integer-only Zero-shot Quantization for Efficient Speech Recognition
End-to-end neural network models achieve improved performance on various...

08/03/2021
Bifocal Neural ASR: Exploiting Keyword Spotting for Inference Optimization
We present Bifocal RNN-T, a new variant of the Recurrent Neural Network ...

05/21/2018
DEEPEYE: A Compact and Accurate Video Comprehension at Terminal Devices Compressed with Quantization and Tensorization
As it requires a huge number of parameters when exposed to high dimensio...

01/02/2020
Attention based on-device streaming speech recognition with large speech corpus
In this paper, we present a new on-device automatic speech recognition (...
