Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers

11/29/2021
by   Junhao Xu, et al.
0

The high memory consumption and computational costs of Recurrent neural network language models (RNNLMs) limit their wider application on resource constrained devices. In recent years, neural network quantization techniques that are capable of producing extremely low-bit compression, for example, binarized RNNLMs, are gaining increasing research interests. Directly training of quantized neural networks is difficult. By formulating quantized RNNLMs training as an optimization problem, this paper presents a novel method to train quantized RNNLMs from scratch using alternating direction methods of multipliers (ADMM). This method can also flexibly adjust the trade-off between the compression rate and model performance using tied low-bit quantization tables. Experiments on two tasks: Penn Treebank (PTB), and Switchboard (SWBD) suggest the proposed ADMM quantization achieved a model size compression factor of up to 31 times over the full precision baseline RNNLMs. Faster convergence of 5 times in model training over the baseline binarized RNNLM quantization was also obtained. Index Terms: Language models, Recurrent neural networks, Quantization, Alternating direction methods of multipliers.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/29/2021

Mixed Precision Low-bit Quantization of Neural Network Language Models for Speech Recognition

State-of-the-art language models (LMs) represented by long-short term me...
research
07/24/2017

Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM

Although deep learning models are highly effective for various learning ...
research
08/08/2023

Quantization Aware Factorization for Deep Neural Network Compression

Tensor decomposition of convolutional and fully-connected layers is an e...
research
02/01/2018

Alternating Multi-bit Quantization for Recurrent Neural Networks

Recurrent neural networks have achieved excellent performance in many ap...
research
06/20/2022

nuQmm: Quantized MatMul for Efficient Inference of Large-Scale Generative Language Models

The recent advance of self-supervised learning associated with the Trans...
research
11/29/2021

Mixed Precision of Quantization of Transformer Language Models for Speech Recognition

State-of-the-art neural language models represented by Transformers are ...
research
06/23/2022

Towards Green ASR: Lossless 4-bit Quantization of a Hybrid TDNN System on the 300-hr Switchboard Corpus

State of the art time automatic speech recognition (ASR) systems are bec...

Please sign up or login with your details

Forgot password? Click here to reset