Pieces of Eight: 8-bit Neural Machine Translation

04/13/2018
by Jerry Quinn, et al.

Neural machine translation has achieved levels of fluency and adequacy that would have been surprising a short time ago. Output quality is extremely relevant for industry purposes; however, it is equally important to produce results in the shortest time possible, mainly for latency-sensitive applications and to control cloud hosting costs. In this paper we show the effectiveness of translating with 8-bit quantization for models that have been trained using 32-bit floating point values. Results show that 8-bit translation yields a non-negligible speedup with no degradation in accuracy or adequacy.
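To make the idea concrete, below is a minimal Python/NumPy sketch of the kind of post-training 8-bit quantization the abstract describes: weights trained in 32-bit floating point are mapped onto int8 with a scale factor, and matrix products accumulate in int32 before being rescaled back to float. The function names, the symmetric per-tensor scaling, and the toy dimensions are illustrative assumptions, not the paper's implementation.

import numpy as np

def quantize_8bit(w):
    # Map a float32 tensor onto int8 with one symmetric per-tensor scale.
    # (Assumed scheme; the paper quantizes models already trained in fp32.)
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul(x_q, x_scale, w_q, w_scale):
    # Multiply int8 operands, accumulating in int32 to avoid overflow,
    # then rescale the result back to float32 for the next layer.
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32)
    return acc.astype(np.float32) * (x_scale * w_scale)

# Toy usage: quantize a random "trained" weight matrix and an activation,
# then compare the 8-bit product against the fp32 reference.
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
x = rng.standard_normal((8, 256)).astype(np.float32)

w_q, w_s = quantize_8bit(w)
x_q, x_s = quantize_8bit(x)

y_ref = x @ w
y_q8 = int8_matmul(x_q, x_s, w_q, w_s)
print("max abs error:", np.abs(y_ref - y_q8).max())

On hardware with SIMD int8 instructions the integer multiply is where the speedup comes from; this NumPy version only demonstrates the arithmetic and lets one check that the quantization error stays small.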


Related research

04/17/2017 · Does Neural Machine Translation Benefit from Larger Context?
We propose a neural machine translation architecture that models the sur...

08/11/2020 · The Sockeye 2 Neural Machine Translation Toolkit at AMTA 2020
We present Sockeye 2, a modernized and streamlined version of the Sockey...

09/13/2019 · Neural Machine Translation with 4-Bit Precision and Beyond
Neural Machine Translation (NMT) is resource intensive. We design a quan...

02/09/2023 · Binarized Neural Machine Translation
The rapid scaling of language models is motivating research using low-bi...

06/03/2019 · Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model
In this work, we quantize a trained Transformer machine language transla...

09/16/2021 · Improving Neural Machine Translation by Bidirectional Training
We present a simple and effective pretraining strategy – bidirectional t...

09/11/2023 · Compressed Real Numbers for AI: a case-study using a RISC-V CPU
As recently demonstrated, Deep Neural Networks (DNN), usually trained us...
