Q-ASR: Integer-only Zero-shot Quantization for Efficient Speech Recognition

03/31/2021
by   Sehoon Kim, et al.
0

End-to-end neural network models achieve improved performance on various automatic speech recognition (ASR) tasks. However, these models perform poorly on edge hardware due to large memory and computation requirements. While quantizing model weights and/or activations to low-precision can be a promising solution, previous research on quantizing ASR models is limited. Most quantization approaches use floating-point arithmetic during inference; and thus they cannot fully exploit integer processing units, which use less power than their floating-point counterparts. Moreover, they require training/validation data during quantization for finetuning or calibration; however, this data may not be available due to security/privacy concerns. To address these limitations, we propose Q-ASR, an integer-only, zero-shot quantization scheme for ASR models. In particular, we generate synthetic data whose runtime statistics resemble the real data, and we use it to calibrate models during quantization. We then apply Q-ASR to quantize QuartzNet-15x5 and JasperDR-10x5 without any training data, and we show negligible WER change as compared to the full-precision baseline models. For INT8-only quantization, we observe a very modest WER degradation of up to 0.29 2.44x speedup on a T4 GPU. Furthermore, Q-ASR exhibits a large compression rate of more than 4x with small WER degradation.

READ FULL TEXT

page 1

page 6

research
07/04/2022

I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference

Vision Transformers (ViTs) have achieved state-of-the-art performance on...
research
03/29/2022

4-bit Conformer with Native Quantization Aware Training for Speech Recognition

Reducing the latency and model size has always been a significant resear...
research
08/27/2021

4-bit Quantization of LSTM-based Speech Recognition Models

We investigate the impact of aggressive low-precision representations of...
research
07/13/2021

Zero-shot Speech Translation

Speech Translation (ST) is the task of translating speech in one languag...
research
10/17/2022

Sub-8-bit quantization for on-device speech recognition: a regularization-free approach

For on-device automatic speech recognition (ASR), quantization aware tra...
research
07/24/2023

A Model for Every User and Budget: Label-Free and Personalized Mixed-Precision Quantization

Recent advancement in Automatic Speech Recognition (ASR) has produced la...
research
07/19/2023

ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats

In the complex domain of large language models (LLMs), striking a balanc...

Please sign up or login with your details

Forgot password? Click here to reset