RAND: Robustness Aware Norm Decay For Quantized Seq2seq Models

05/24/2023
by David Qiu, et al.

With the rapid increase in the size of neural networks, model compression has become an important area of research. Quantization is an effective technique for decreasing the model size, memory access, and compute load of large models. Despite recent advances in quantization aware training (QAT) techniques, most papers present evaluations focused on computer vision tasks, which have different training dynamics than sequence tasks. In this paper, we first benchmark the impact of popular techniques such as the straight-through estimator, pseudo-quantization noise, learnable scale parameters, and clipping on 4-bit seq2seq models across a suite of speech recognition datasets ranging from 1,000 hours to 1 million hours, as well as one machine translation dataset to illustrate applicability outside of speech. Through these experiments, we find that noise-based QAT suffers when there is insufficient regularization signal flowing back to the quantization scale. We propose low-complexity changes to the QAT process that improve model accuracy, outperforming popular learnable-scale and clipping methods. The improved accuracy also opens up the possibility of exploiting other benefits of noise-based QAT: 1) training a single model that performs well in mixed-precision mode and 2) improved generalization on long-form speech recognition.
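For context on the techniques benchmarked in the abstract, below is a minimal PyTorch-style sketch of two standard QAT building blocks: 4-bit fake quantization with a straight-through estimator (STE) and a pseudo-quantization-noise (PQN) alternative. This is an illustration of the generic techniques only, not the RAND method proposed in the paper; the function names `fake_quant_ste` and `pqn_quant` and the example shapes are assumptions for the sketch.

```python
import torch

def fake_quant_ste(w: torch.Tensor, scale: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Uniform fake quantization with a straight-through estimator.

    Forward pass rounds onto the signed 4-bit grid; backward pass treats the
    rounding as identity, so gradients flow to the weights but not to the scale.
    """
    qmax = 2 ** (bits - 1) - 1   # e.g. 7 for signed 4-bit
    qmin = -(2 ** (bits - 1))    # e.g. -8
    w_int = torch.clamp(torch.round(w / scale), qmin, qmax)
    w_q = w_int * scale
    # STE: forward returns w_q, backward sees only the identity path through w.
    return w + (w_q - w).detach()

def pqn_quant(w: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Pseudo-quantization-noise training: replace rounding with additive
    uniform noise of the same magnitude as the rounding error, keeping the
    operator differentiable with respect to the scale.
    """
    noise = (torch.rand_like(w) - 0.5) * scale   # U(-scale/2, scale/2)
    return w + noise

if __name__ == "__main__":
    w = torch.randn(4, 4, requires_grad=True)

    scale_ste = torch.tensor(0.1, requires_grad=True)
    fake_quant_ste(w, scale_ste).sum().backward()
    print(scale_ste.grad)   # None: the detached rounding path carries no gradient to the scale

    scale_pqn = torch.tensor(0.1, requires_grad=True)
    pqn_quant(w, scale_pqn).sum().backward()
    print(scale_pqn.grad)   # small and zero-mean: only the noise sample reaches the scale
```

The gradients printed in the usage example loosely illustrate the issue the abstract raises: in the noise-based variant, the only signal reaching the quantization scale is the zero-mean noise sample, which can be an insufficient regularization signal.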


