Neural Machine Translation with 4-Bit Precision and Beyond

09/13/2019
by   Alham Fikri Aji, et al.
0

Neural Machine Translation (NMT) is resource intensive. We design a quantization procedure to compress fit NMT models better for devices with limited hardware capability. We use logarithmic quantization, instead of the more commonly used fixed-point quantization, based on the empirical fact that parameters distribution is not uniform. We find that biases do not take a lot of memory and show that biases can be left uncompressed to improve the overall quality without affecting the compression rate. We also propose to use an error-feedback mechanism during retraining, to preserve the compressed model as a stale gradient. We empirically show that NMT models based on Transformer or RNN architecture can be compressed up to 4-bit precision without any noticeable quality degradation. Models can be compressed up to binary precision, albeit with lower quality. RNN architecture seems to be more robust towards compression, compared to the Transformer.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/16/2020

Extremely Low Bit Transformer Quantization for On-Device Neural Machine Translation

Transformer is being widely used in Neural Machine Translation (NMT). De...
research
06/01/2021

Gender Bias Amplification During Speed-Quality Optimization in Neural Machine Translation

Is bias amplified when neural machine translation (NMT) models are optim...
research
08/11/2020

The Sockeye 2 Neural Machine Translation Toolkit at AMTA 2020

We present Sockeye 2, a modernized and streamlined version of the Sockey...
research
04/13/2018

Pieces of Eight: 8-bit Neural Machine Translation

Neural machine translation has achieved levels of fluency and adequacy t...
research
06/29/2016

Compression of Neural Machine Translation Models via Pruning

Neural Machine Translation (NMT), like many other deep learning domains,...
research
05/25/2020

The Unreasonable Volatility of Neural Machine Translation Models

Recent works have shown that Neural Machine Translation (NMT) models ach...
research
02/09/2023

Binarized Neural Machine Translation

The rapid scaling of language models is motivating research using low-bi...

Please sign up or login with your details

Forgot password? Click here to reset