Revisiting Checkpoint Averaging for Neural Machine Translation

10/21/2022
by   Yingbo Gao, et al.

Checkpoint averaging is a simple and effective method to boost the performance of converged neural machine translation models. The calculation is cheap to perform, and the translation improvement comes almost for free, which makes it widely adopted in neural machine translation research. Despite its popularity, the method simply takes the mean of the model parameters from several checkpoints, whose selection is mostly based on empirical recipes without much justification. In this work, we revisit the concept of checkpoint averaging and consider several extensions. Specifically, we experiment with different checkpoint selection strategies, weighted averages instead of the simple mean, the use of gradient information, and fine-tuning the interpolation weights on development data. Our results confirm the necessity of checkpoint averaging for optimal performance, but also suggest that the loss landscape between the converged checkpoints is rather flat, so little further improvement over simple averaging is to be obtained.
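The core operation the abstract describes, taking a (possibly weighted) mean of model parameters across several saved checkpoints, can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `average_checkpoints` and the plain-dict representation of parameters (name → list of floats) are assumptions for the example; in practice the same arithmetic would be applied to framework-specific state dicts.

```python
def average_checkpoints(checkpoints, weights=None):
    """Average model parameters across checkpoints.

    checkpoints: list of dicts mapping parameter name -> list of floats
                 (a stand-in for a framework state dict; illustrative only).
    weights: optional interpolation weights summing to 1.
             Default is uniform weights, i.e. the standard simple mean.
    """
    n = len(checkpoints)
    if weights is None:
        weights = [1.0 / n] * n  # simple mean: the common empirical recipe
    assert abs(sum(weights) - 1.0) < 1e-9, "interpolation weights must sum to 1"

    averaged = {}
    for name in checkpoints[0]:
        size = len(checkpoints[0][name])
        # element-wise weighted sum over the same parameter in each checkpoint
        averaged[name] = [
            sum(w * ckpt[name][i] for w, ckpt in zip(weights, checkpoints))
            for i in range(size)
        ]
    return averaged
```

The weighted variant (non-uniform `weights`) corresponds to the paper's extension of replacing the simple mean with a weighted average, where the interpolation weights could in principle be tuned on development data.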


Related research

07/29/2019: A Baseline Neural Machine Translation System for Indian Languages
We present a simple, yet effective, Neural Machine Translation system fo...

09/16/2021: Improving Neural Machine Translation by Bidirectional Training
We present a simple and effective pretraining strategy – bidirectional t...

08/29/2017: Neural Machine Translation Training in a Multi-Domain Scenario
In this paper, we explore alternative ways to train a neural machine tra...

01/19/2022: Improving Neural Machine Translation by Denoising Training
We present a simple and effective pretraining strategy Denoising Trainin...

06/02/2019: Domain Adaptive Inference for Neural Machine Translation
We investigate adaptive ensemble weighting for Neural Machine Translatio...

09/01/2021: Masked Adversarial Generation for Neural Machine Translation
Attacking Neural Machine Translation models is an inherently combinatori...

11/07/2019: Can Neural Networks Learn Symbolic Rewriting?
This work investigates if the current neural architectures are adequate ...
