Scaling Neural Machine Translation

06/01/2018
by   Myle Ott, et al.

Sequence to sequence learning models still require several days to reach state-of-the-art performance on large benchmark datasets using a single machine. This paper shows that reduced precision and large batch training can speed up training by nearly 5x on a single 8-GPU machine with careful tuning and implementation. On WMT'14 English-German translation, we match the accuracy of Vaswani et al. (2017) in under 5 hours when training on 8 GPUs, and we obtain a new state of the art of 29.3 BLEU after training for 91 minutes on 128 GPUs. We further improve these results to 29.8 BLEU by training on the much larger Paracrawl dataset.
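The abstract names two training ingredients: reduced-precision (FP16) arithmetic and large batches. As a rough illustration only, the sketch below shows how these ideas are commonly expressed in PyTorch, using automatic mixed precision with loss scaling and gradient accumulation to simulate a large effective batch. The toy model, vocabulary size, learning rate, and `accum_steps` value are placeholders, not the fairseq Transformer setup used in the paper.

```python
# Minimal sketch (not the authors' fairseq implementation) of FP16 training plus
# large effective batches via gradient accumulation. All shapes and hyperparameters
# are placeholders for illustration.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy stand-in for a Transformer translation model.
model = nn.Sequential(nn.Embedding(1000, 256), nn.Linear(256, 1000)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
criterion = nn.CrossEntropyLoss()

scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))  # loss scaling for FP16
accum_steps = 16  # accumulate gradients to simulate a 16x larger batch

optimizer.zero_grad()
for step in range(100):
    # Fake token batches; a real setup would stream WMT'14 En-De data here.
    src = torch.randint(0, 1000, (32, 20), device=device)
    tgt = torch.randint(0, 1000, (32, 20), device=device)

    # Run the forward/backward pass in half precision where it is numerically safe.
    with torch.cuda.amp.autocast(enabled=(device == "cuda")):
        logits = model(src)                                   # (batch, len, vocab)
        loss = criterion(logits.view(-1, 1000), tgt.view(-1)) / accum_steps

    scaler.scale(loss).backward()  # scale the loss to avoid FP16 gradient underflow

    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)     # unscale gradients, then take the optimizer step
        scaler.update()
        optimizer.zero_grad()
```

Dividing the loss by `accum_steps` keeps the accumulated gradient equivalent to averaging over one large batch, which is what makes the large-batch regime comparable to simply training with more GPUs.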


