Mini-batch training is standard practice in large-scale machine learning. In recent implementations of neural networks, mini-batching greatly improves the efficiency of loss and gradient calculation: combining training examples into batches replaces many small operations with fewer, larger ones that can take advantage of the parallelism afforded by modern computing architectures, particularly GPUs.
In some cases, such as processing images, mini-batching is straightforward, as the inputs in all training examples take the same form. However, to perform mini-batching when training neural machine translation (NMT) or other sequence-to-sequence models, which must handle sentences of variable length, we need to pad the shorter sentences in each mini-batch to the length of the longest one.
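As a minimal illustration of this padding (the pad_batch helper and EOS marker below are illustrative names, not taken from any particular toolkit):

```python
# Pad every sentence in a mini-batch to the length of the longest one,
# using the end-of-sentence token as the padding symbol.
EOS = "</s>"

def pad_batch(batch):
    """Return the batch with every sentence padded to equal length."""
    max_len = max(len(sent) for sent in batch)
    return [sent + [EOS] * (max_len - len(sent)) for sent in batch]

batch = [["John", "runs", EOS], ["I", "run", "fast", "today", EOS]]
padded = pad_batch(batch)
# Every sentence now has the length of the longest one (5 tokens).
```

After padding, all sentences can be processed as equal-length rows of a tensor, at the cost of the wasted computation on padding tokens discussed below.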
To reduce the wasted calculation due to this padding, it is common to sort the corpus by sentence length before creating mini-batches Sutskever et al. (2014); Bahdanau et al. (2015): putting sentences of similar length in the same mini-batch reduces the amount of padding and increases the per-word computation speed. However, it is easy to imagine that this grouping of sentences may affect convergence speed and stability, as well as the performance of the learned models. Despite this, no previous work has explicitly examined how mini-batch creation affects the learning of NMT models. Various NMT toolkits implement different strategies, but these have been neither empirically validated nor compared.
In this work, we attempt to fill this gap by surveying the various mini-batch creation strategies that are in use: sorting by length of the source sentence, target sentence, or both, as well as making mini-batches according to the number of sentences and the number of words. We empirically compare their efficacy on two translation tasks and find that some strategies in wide use are not necessarily optimal for reliably training models.
2 Mini-batches for NMT
First, to clearly demonstrate the problem of mini-batching in NMT models, Figure 1 shows an example of mini-batching two sentences of different lengths in an encoder-decoder model.
The first thing we can notice from the figure is that multiple operations at a particular time step can be combined into a single operation. For example, both “John” and “I” are embedded in a single step into a matrix that is passed into the encoder LSTM in a single step. On the target side as well, we calculate the loss for the target words at each time step for every sentence in the mini-batch simultaneously.
However, problems arise when sentences are of different lengths, as only some sentences will have any content at a particular time step. To resolve this problem, we pad short sentences with end-of-sentence tokens to adjust their length to that of the longest sentence. In Figure 1, the purple-colored “/s” indicates a padded end-of-sentence token.
Padding with these tokens makes it possible to handle variable-length sentences as if they were of the same length. On the other hand, the computational cost for a mini-batch increases in proportion to the longest sentence therein, and excess padding can result in a significant amount of wasted computation. One way to mitigate this problem is to create mini-batches that include sentences of similar length Sutskever et al. (2014) to reduce the amount of padding required. Many NMT toolkits implement length-based sorting of the training corpus for this purpose. In the following section, we discuss several different mini-batch creation strategies used in existing neural MT toolkits.
3 Mini-batch Creation Strategies
Specifically, we examine three aspects of mini-batch creation: mini-batch size, word vs. sentence mini-batches, and sorting strategies. Algorithm 1 shows the pseudo code of creating mini-batches.
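The general procedure can be sketched as follows, assuming the corpus is a list of (source, target) token-list pairs; create_minibatches and sort_key are illustrative names, not Algorithm 1 verbatim.

```python
import random

def create_minibatches(corpus, batch_size, sort_key=None, seed=0):
    """Sort (or shuffle) the corpus, group consecutive sentence pairs
    into mini-batches, then shuffle the processing order of the batches."""
    rng = random.Random(seed)
    data = list(corpus)
    if sort_key is None:
        rng.shuffle(data)          # the "shuffle" strategy: no sorting
    else:
        data.sort(key=sort_key)    # e.g. by source or target length
    batches = [data[i:i + batch_size]
               for i in range(0, len(data), batch_size)]
    rng.shuffle(batches)           # randomize mini-batch order
    return batches

# Example: nine sentence pairs of increasing length.
pairs = [(["a"] * n, ["b"] * (n + 1)) for n in range(1, 10)]
batches = create_minibatches(pairs, batch_size=4,
                             sort_key=lambda p: len(p[0]))
```

Note that even with length-based sorting, the order in which mini-batches are processed is randomized, matching the setup described in Section 4.1.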
3.1 Mini-batch Size
The first aspect we consider is mini-batch size, whose effect is relatively well known compared to the other two aspects we examine here.
When we use larger mini-batches, more sentences participate in the gradient calculation making the gradients more stable. They also increase efficiency with parallel computation. However, they decrease the number of parameter updates performed in a certain amount of time, which can slow convergence at the beginning of training. Large mini-batches can also pose problems in practice due to the fact that they increase memory requirements.
3.2 Sentence vs. Word Mini-batching
The second aspect that we examine, which has not been examined in detail previously, is whether to create mini-batches based on the number of sentences or number of target words.
Most NMT toolkits create mini-batches with a constant number of sentences. In this case, the number of words included in each mini-batch differs greatly due to the variance in sentence lengths. If we use a neural network library that constructs graphs dynamically (e.g. DyNet Neubig et al. (2017), Chainer Tokui et al. (2015), or PyTorch (http://pytorch.org)), this leads to a large variance in memory consumption from mini-batch to mini-batch. In addition, because the loss function for the mini-batch is equal to the sum of the losses incurred for each word, the scale of the losses varies greatly from mini-batch to mini-batch, which could potentially be detrimental to training.
Another choice is to create mini-batches by keeping the number of target words in each mini-batch approximately stable, but varying the number of sentences. We hypothesize that this may lead to more stable convergence, and test this hypothesis in the experiments.
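A minimal sketch of this word-based batching, assuming a simple greedy grouping by a target-word budget (batch_by_target_words and word_budget are illustrative names):

```python
def batch_by_target_words(corpus, word_budget):
    """Group (source, target) pairs so that each mini-batch holds roughly
    a fixed number of target words, rather than a fixed number of sentences."""
    batches, current, count = [], [], 0
    for src, trg in corpus:
        current.append((src, trg))
        count += len(trg)
        if count >= word_budget:   # budget reached: close this mini-batch
            batches.append(current)
            current, count = [], 0
    if current:                    # flush any remaining sentences
        batches.append(current)
    return batches

# Example: target sides of length 3, 4, and 2 with a budget of 5 words.
corpus = [(["x"], ["y"] * 3), (["x"], ["y"] * 4), (["x"], ["y"] * 2)]
word_batches = batch_by_target_words(corpus, word_budget=5)
```

Under this scheme, mini-batches containing long sentences hold fewer of them, keeping the loss scale and memory footprint more uniform.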
3.3 Corpus Sorting Methods
The final aspect that we examine, which is similarly not yet well understood, is the effect of the method used to sort the corpus before grouping consecutive sentences into mini-batches.
A standard practice in online learning shuffles training samples to ensure that bias in the presentation order does not adversely affect the final result. However, as mentioned in Section 2, NMT studies Sutskever et al. (2014); Bahdanau et al. (2015) prefer mini-batches of uniform-length samples, obtained by sorting the training corpus, to reduce the amount of padding and increase per-word calculation speed. In particular, in the encoder-decoder NMT framework Sutskever et al. (2014), the computational cost of the softmax layer in the decoder is much heavier than that of the encoder. Some NMT toolkits sort the training corpus by target sentence length to avoid unnecessary softmax computations on padded tokens on the target side. Another problem arises in the attentional NMT model Bahdanau et al. (2015); Luong et al. (2015): the attention mechanism may assign incorrect positive weights to padded tokens on the source side. These problems further motivate creating mini-batches from sentences of uniform length, which require fewer padded tokens.
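One common remedy for the source-side problem just described is to mask padded positions before the attention softmax, so they receive exactly zero weight. The following is a minimal sketch of such masking over plain Python lists; masked_attention and its arguments are illustrative names, not the API of any particular toolkit.

```python
import math

def masked_attention(scores, src_lens):
    """Softmax over attention scores, forcing zero weight on padded
    source positions. scores: one row of raw scores per sentence
    (batch x max_src_len); src_lens: true length of each source sentence."""
    weights = []
    for row, n in zip(scores, src_lens):
        # exponentiate only the real positions; padded ones get 0
        exp = [math.exp(s) if i < n else 0.0 for i, s in enumerate(row)]
        z = sum(exp)
        weights.append([e / z for e in exp])
    return weights

# Second sentence has true length 2, so its third position is padding.
w = masked_attention([[0.5, 1.0, 0.2], [0.1, 0.9, 0.3]], src_lens=[3, 2])
```

A production implementation would instead add a large negative value to padded logits before a vectorized softmax, but the effect is the same: padding never attracts attention mass.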
Inspired by sorting methods in use in current open source implementations, we compare the following sorting methods:
shuffle: Shuffle the corpus randomly before creating mini-batches, with no sorting.
src: Sort based on the source sentence length.
trg: Sort based on the target sentence length.
src_trg: Sort by the source sentence length, breaking ties by the target sentence length.
trg_src: Converse of src_trg.
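These five strategies can be written down compactly. The sketch below assumes the corpus is a list of (source, target) token-list pairs; sort_corpus is an illustrative name. Python's sort is stable, so the tuple keys implement the tie-breaking described above.

```python
import random

def sort_corpus(pairs, method, seed=0):
    """Order (source, target) pairs by one of the five strategies."""
    keys = {
        "src":     lambda p: len(p[0]),
        "trg":     lambda p: len(p[1]),
        "src_trg": lambda p: (len(p[0]), len(p[1])),
        "trg_src": lambda p: (len(p[1]), len(p[0])),
    }
    data = list(pairs)
    if method == "shuffle":
        random.Random(seed).shuffle(data)
        return data
    return sorted(data, key=keys[method])

pairs = [(["a"] * 2, ["b"] * 1),
         (["a"] * 1, ["b"] * 3),
         (["a"] * 3, ["b"] * 2)]
by_src = sort_corpus(pairs, "src")
by_trg = sort_corpus(pairs, "trg")
```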
4 Experiments
We conducted NMT experiments with the strategies presented above to examine their effects on NMT training.
4.1 Experimental Settings
We carried out experiments with two language pairs, English-Japanese using the ASPEC-JE corpus Nakazawa et al. (2016) and English-German using the WMT 2016 news task with news-test2016 as the test-set Bojar et al. (2016). Table 1 shows the number of sentences contained in the corpora.
The English and German texts were tokenized with tokenizer.perl (https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl), and the Japanese texts were tokenized with KyTea Neubig et al. (2011).
As a testbed for our experiments, we used the standard global attention model of Luong et al. (2015) with attention feeding and a bidirectional encoder with one LSTM layer of 512 nodes. We used the DyNet-based Neubig et al. (2017) NMTKit (https://github.com/odashi/nmtkit, commit 566e9c2), with a vocabulary size of 65536 words and dropout of 30% for all vertical connections. We used the same random numbers as initial parameters for each experiment to reduce variance due to initialization. We used Adam Kingma and Ba (2015) or SGD as the learning algorithm. After every 50,000 training sentences, we processed the test set to record negative log likelihoods. In testing, we set the mini-batch size to 1 in order to calculate the negative log likelihood correctly. We calculated case-insensitive BLEU scores Papineni et al. (2002) with the multi-bleu.perl script (https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/multi-bleu.perl).
Table 2 shows the mini-batch creation settings compared in this paper; we tried all sorting methods discussed in Section 3.3 for each setting. In method (e), we set the number of target words per mini-batch to the average contained in 64 sentences: 2055 words for ASPEC-JE and 1742 words for WMT. For all experiments, we shuffled the processing order of the mini-batches.
4.2 Experimental Results and Analysis
Figures 2, 3, 4 and 5 show the transition of negative log likelihoods and BLEU scores according to the number of processed sentences on the ASPEC-JE and WMT2016 test sets. Table 3 shows the average time to process the whole ASPEC-JE corpus.
The learning curves show very similar tendencies for the two language pairs. We discuss the results for each strategy in detail below.
4.2.1 Effect of Mini-batch Size
We carried out the experiments with mini-batch sizes of 8 to 64 sentences. (We also tried larger mini-batch sizes, but could not run them due to GPU memory limitations.)
From the experimental results of methods (a), (b), (c) and (d), in the case of Adam, the mini-batch size affects the training speed and also has an impact on the final accuracy of the model. As mentioned in Section 3.1, gradients become more stable as the mini-batch size increases, which appears to have a positive impact on model accuracy. Thus, we can first note that mini-batch size is a very important hyper-parameter for NMT training that should not be ignored. In our case in particular, the largest mini-batch size that could be loaded into memory was best for NMT training.
4.2.2 Effect of Mini-batch Unit
Looking at the experimental results of methods (a) and (e), we can see that perplexities drop faster with shuffle for method (a) and src for method (e), but we did not observe any large differences in training speed or in the final accuracy of the model. We had hypothesized that the large variance of the loss would affect final model accuracy, especially with learning algorithms that use momentum such as Adam; however, these results indicate that this difference does not significantly affect training. We leave a comparison of memory consumption for future research.
4.2.3 Effect of Corpus Sorting Method using Adam
From all experimental results of methods (a), (b), (c), (d) and (e), when using shuffle or src, perplexities drop faster and tend to converge to lower values than with the other methods for all mini-batch sizes. We believe the main reason is the similarity of the sentences contained in each mini-batch. If sentence lengths are similar, the features of the sentences may also be similar. We carefully examined the corpora and found that this holds at least for the corpora we used (e.g. shorter sentences tend to contain similar words). In this case, sorting sentences by length gathers sentences with similar features into the same mini-batch, making training less stable than if all sentences in the mini-batch had different features. This is evidenced by the more jagged lines of the trg method.
In conclusion, the trg and trg_src sorting methods, which are used by many NMT toolkits, have a higher overall throughput when simply measuring the number of words processed, but for convergence speed and final model accuracy, it seems better to use shuffle or src.
Some toolkits shuffle the corpus first, then create mini-batches by sorting a few consecutive sentences. This method may combine the advantages of shuffle and the other sorting methods, but an empirical comparison is beyond the scope of this work.
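This shuffle-then-locally-sort approach can be sketched as follows, under the assumption that sorting happens within fixed-size windows of the shuffled corpus; shuffle_then_sort and window are illustrative names, not any toolkit's API.

```python
import random

def shuffle_then_sort(pairs, window, batch_size, seed=0):
    """Shuffle the corpus, sort only within consecutive windows by
    target length, then slice mini-batches; this mixes randomness
    (across windows) with low padding (within windows)."""
    rng = random.Random(seed)
    data = list(pairs)
    rng.shuffle(data)
    out = []
    for i in range(0, len(data), window):
        out.extend(sorted(data[i:i + window], key=lambda p: len(p[1])))
    return [out[i:i + batch_size] for i in range(0, len(out), batch_size)]

pairs = [(["a"] * n, ["b"] * n) for n in range(1, 11)]
local_batches = shuffle_then_sort(pairs, window=5, batch_size=2)
```

The window size controls the trade-off: a window of 1 degenerates to pure shuffling, while a window spanning the whole corpus degenerates to global trg sorting.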
4.2.4 Effect of Corpus Sorting Method using SGD
By comparing the experimental results of methods (a) and (f), we found that when using Adam the learning curves greatly depend on the sorting method, but when using SGD there was little effect. This is likely because SGD makes less bold updates of rare parameters, improving its overall stability. However, we found that with the trg method alone, the negative log likelihoods and BLEU scores are not stable. We conjecture that this is an effect of gathering similar sentences into a mini-batch, as mentioned in Section 4.2.3. These results indicate that with SGD it is acceptable to use trg_src, which is the fastest method for processing the whole corpus (see Table 3).
Recently, Wu et al. (2016) proposed a new learning paradigm that uses Adam for initial training, then switches to SGD after several iterations. With this learning algorithm, we may be able to train the model more effectively by using the shuffle or src sorting method during the Adam phase and trg_src during the SGD phase.
4.3 Experiments with a Different Toolkit
In the previous experiments, we conducted the experiments with only one NMT toolkit, so the results may be dependent on the particular implementation provided therein. To ensure that these results generalize to other toolkits with different default parameters, we conducted the experiments with another NMT toolkit.
4.3.1 Experimental Settings
In this section, we used lamtram (https://github.com/neubig/lamtram) as the NMT toolkit. We carried out Japanese-English translation experiments with the ASPEC-JE corpus. We used Adam Kingma and Ba (2015) as the learning algorithm and tried two sorting methods: shuffle, the best sorting method in the previous experiments, and trg_src, the default sorting method of the lamtram toolkit. Normally, lamtram creates mini-batches based on the number of target words contained in each mini-batch, but we fixed the mini-batch size to 64 sentences, since larger mini-batch sizes seemed better in the previous experiments. Other experimental settings are the same as described in Section 4.1.
4.3.2 Experimental Results
Figure 6 shows the transition of negative log likelihoods using lamtram. The tendency of the training curves is similar to Figure 2 (a): the negative log likelihood drops faster with shuffle than with trg_src.
From these experiments, we verified that our experimental results in Section 4 do not depend on the toolkit, and we expect the observed behavior to generalize to other toolkits and implementations.
5 Related Work
Recently, Britz et al. (2017) released a paper exploring the hyper-parameters of NMT. Their work is similar to ours in finding better hyper-parameters through a large number of experiments and deriving empirical conclusions. However, notably, that paper fixed the mini-batch size to 128 sentences and did not treat the mini-batch creation strategy as one of the hyper-parameters of the model. Our experimental results argue that mini-batch creation strategies also have an impact on NMT training, and thus solid recommendations for adjusting this hyper-parameter are also of merit.
6 Conclusion
In this paper, we analyzed how mini-batch creation strategies affect the training of NMT models for two language pairs. The experimental results suggest that the mini-batch creation strategy is an important hyper-parameter of the training process, and that commonly-used sorting strategies are not always optimal. We summarize the results as follows:
Mini-batch size can affect the final accuracy of the model in addition to the training speed, and larger mini-batch sizes seem to be better.
The mini-batch unit does not significantly affect the training process, so it is possible to use either the number of sentences or the number of target words.
We should use the shuffle or src sorting method with Adam, while trg_src is sufficient for SGD.
In the future, we plan to run experiments with larger mini-batch sizes and compare peak memory consumption when creating mini-batches by the number of sentences versus the number of target words. We are also interested in examining the effects of different mini-batch creation strategies on other language pairs, corpora, and optimization algorithms.
This work was done as a part of the joint research project with NTT and Nara Institute of Science and Technology. This research has been supported in part by JSPS KAKENHI Grant Number 16H05873. We thank the anonymous reviewers for their insightful comments.
- Bahdanau et al. (2015) Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations (ICLR).
- Bojar et al. (2016) Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Aurelie Neveol, Mariana Neves, Martin Popel, Matt Post, Raphael Rubino, Carolina Scarton, Lucia Specia, Marco Turchi, Karin Verspoor, and Marcos Zampieri. 2016. Findings of the 2016 conference on machine translation. In Proceedings of the 1st Conference on Machine Translation (WMT).
- Britz et al. (2017) Denny Britz, Anna Goldie, Minh-Thang Luong, and Quoc V. Le. 2017. Massive exploration of neural machine translation architectures. arXiv preprint arXiv:1703.03906 .
- Cromieres (2016) Fabien Cromieres. 2016. Kyoto-NMT: a neural machine translation implementation in chainer. In Proceedings of the 26th International Conference on Computational Linguistics (COLING).
- Kingma and Ba (2015) Diederik Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR).
- Klein et al. (2017) Guillaume Klein, Yoon Kim, Yuntian Deng, Jean Senellart, and Alexander M. Rush. 2017. OpenNMT: Open-source toolkit for neural machine translation. arXiv preprint arXiv:1701.02810 .
- Luong et al. (2015) Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
- Nakazawa et al. (2016) Toshiaki Nakazawa, Manabu Yaguchi, Kiyotaka Uchimoto, Masao Utiyama, Eiichiro Sumita, Sadao Kurohashi, and Hitoshi Isahara. 2016. ASPEC: Asian scientific paper excerpt corpus. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC).
- Neubig et al. (2017) Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, and Pengcheng Yin. 2017. DyNet: The dynamic neural network toolkit. arXiv preprint arXiv:1701.03980 .
- Neubig et al. (2011) Graham Neubig, Yosuke Nakata, and Shinsuke Mori. 2011. Pointwise prediction for robust, adaptable Japanese morphological analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL).
- Papineni et al. (2002) Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL).
- Sutskever et al. (2014) Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Proceedings of the 28th Annual Conference on Neural Information Processing Systems (NIPS).
- Tokui et al. (2015) Seiya Tokui, Kenta Oono, Shohei Hido, and Justin Clayton. 2015. Chainer: a next-generation open source framework for deep learning. In Proceedings of the Workshop on Machine Learning Systems (LearningSys) at the 29th Annual Conference on Neural Information Processing Systems (NIPS).
- Wu et al. (2016) Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144 .