Binarized Neural Machine Translation

02/09/2023
by Yichi Zhang et al.

The rapid scaling of language models is motivating research into low-bitwidth quantization. In this work, we propose a novel binarization technique for Transformers applied to machine translation (BMT), the first of its kind. We identify and address the problem of inflated dot-product variance when using one-bit weights and activations. Specifically, BMT leverages additional LayerNorms and residual connections to improve binarization quality. Experiments on the WMT dataset show that a one-bit weight-only Transformer can achieve the same quality as a float one while being 16x smaller in size. One-bit activations incur varying degrees of quality drop, but this drop is mitigated by the proposed architectural changes. We further conduct a scaling law study using production-scale translation datasets, which shows that one-bit-weight Transformers scale and generalize well in both in-domain and out-of-domain settings. The implementation in JAX/Flax will be open sourced.
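To make the idea of one-bit weights concrete, the sketch below shows a minimal JAX implementation of a dense layer whose weights are binarized to a scaled sign representation, trained with a straight-through estimator. JAX is the framework the authors mention; however, the per-tensor mean-absolute-value scaling, the straight-through trick, and the function names here are common choices from the binarization literature and are assumptions for illustration, not the paper's exact formulation (which additionally relies on extra LayerNorms and residual connections).

```python
# Minimal sketch of one-bit weight binarization (illustrative, not the
# paper's exact method).
import jax
import jax.numpy as jnp


def binarize_weights(w):
    """Binarize weights to {-alpha, +alpha}, where alpha is the per-tensor
    mean absolute value (an assumed scaling choice). Gradients pass through
    the binarization unchanged (straight-through estimator)."""
    alpha = jnp.mean(jnp.abs(w))
    w_bin = alpha * jnp.sign(w)
    # Forward pass uses w_bin; backward pass treats binarization as identity.
    return w + jax.lax.stop_gradient(w_bin - w)


def binary_dense(params, x):
    """Dense layer with one-bit weights and a float bias."""
    w_bin = binarize_weights(params["w"])
    return x @ w_bin + params["b"]


# Example usage with hypothetical shapes.
key = jax.random.PRNGKey(0)
params = {
    "w": jax.random.normal(key, (512, 512)) * 0.02,
    "b": jnp.zeros(512),
}
x = jax.random.normal(jax.random.PRNGKey(1), (4, 512))
y = binary_dense(params, x)
```

Because each weight is stored as a single bit plus one shared float scale per tensor, such a layer is roughly 16x smaller than its 16-bit float counterpart, which matches the size reduction reported in the abstract.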


