Fully Quantized Transformer for Improved Translation

by Gabriele Prato et al.

State-of-the-art neural machine translation methods use massive numbers of parameters. Drastically reducing the computational cost of such methods without degrading performance has so far remained an open problem. In this work, we propose a quantization strategy tailored to the Transformer architecture. We evaluate our method on the WMT14 EN-FR and WMT14 EN-DE translation tasks and achieve state-of-the-art quantization results for the Transformer, with no loss in BLEU score compared to the non-quantized baseline. We further compress the Transformer by showing that, once the model is trained, a good portion of the nodes in the encoder can be removed without any loss in BLEU.
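The abstract does not spell out the quantization scheme, but the core idea of quantizing a trained Transformer's weights can be illustrated with uniform min-max quantization, a common choice in this line of work. The sketch below is an assumption for illustration, not the authors' exact method: it maps a float weight tensor onto 8-bit integer levels and back, showing the approximation a quantized model computes with.

```python
import numpy as np

def quantize_dequantize(w, num_bits=8):
    """Simulate uniform min-max quantization of a weight tensor.

    Values in [w.min(), w.max()] are mapped to 2**num_bits integer
    levels, then mapped back to floats; the result is the approximation
    a quantized layer would use in place of the original weights.
    """
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / (2 ** num_bits - 1)
    q = np.round((w - lo) / scale)   # integer codes in [0, 2**num_bits - 1]
    return q * scale + lo            # dequantized float approximation

# Example: quantize a small random weight matrix to 8 bits.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
w_q = quantize_dequantize(w, num_bits=8)
```

With 8 bits, the rounding error per weight is bounded by half the quantization step, which is why well-chosen per-tensor ranges can preserve BLEU while shrinking storage roughly fourfold relative to float32.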


Extremely Low Bit Transformer Quantization for On-Device Neural Machine Translation

Transformer is being widely used in Neural Machine Translation (NMT). De...

Weighted Transformer Network for Machine Translation

State-of-the-art results on neural machine translation often use attenti...

Power Law Graph Transformer for Machine Translation and Representation Learning

We present the Power Law Graph Transformer, a transformer model with wel...

ODE Transformer: An Ordinary Differential Equation-Inspired Model for Neural Machine Translation

It has been found that residual networks are an Euler discretization of ...

An Effective Method using Phrase Mechanism in Neural Machine Translation

Machine Translation is one of the essential tasks in Natural Language Pr...

Parallel Attention Mechanisms in Neural Machine Translation

Recent papers in neural machine translation have proposed the strict use...

Learning Accurate Integer Transformer Machine-Translation Models

We describe a method for training accurate Transformer machine-translati...
