Competence-based Curriculum Learning for Neural Machine Translation

Current state-of-the-art NMT systems use large neural networks that are not only slow to train, but also often require many heuristics and optimization tricks, such as specialized learning rate schedules and large batch sizes. This is undesirable as it requires extensive hyperparameter tuning. In this paper, we propose a curriculum learning framework for NMT that reduces training time, reduces the need for specialized heuristics or large batch sizes, and results in overall better performance. Our framework consists of a principled way of deciding which training samples are shown to the model at different times during training, based on the estimated difficulty of a sample and the current competence of the model. Filtering training samples in this manner prevents the model from getting stuck in bad local optima, making it converge faster and reach a better solution than the common approach of uniformly sampling training examples. Furthermore, the proposed method can be easily applied to existing NMT models by simply modifying their input data pipelines. We show that our framework can help improve the training time and the performance of both recurrent neural network models and Transformers, achieving up to a 70% decrease in training time, while at the same time obtaining accuracy improvements of up to 2.2 BLEU.
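The core of the framework is a data-sampling rule: each training example receives a difficulty score in [0, 1], the model receives a time-dependent competence value in [0, 1], and at every step a batch is drawn only from examples whose difficulty does not exceed the current competence. The sketch below illustrates one way this could look in Python, assuming sentence length as the difficulty heuristic and a square-root competence schedule; the function names and default values are illustrative, not taken from the authors' implementation.

```python
import numpy as np

def difficulty_by_length(corpus):
    """Map each (source, target) pair to a difficulty in [0, 1]: the empirical
    CDF of source sentence length (longer sentences count as harder)."""
    lengths = np.array([len(src.split()) for src, _ in corpus])
    ranks = lengths.argsort().argsort()      # rank of each sentence by length
    return (ranks + 1) / len(corpus)         # empirical CDF values

def competence(step, curriculum_steps, c0=0.01):
    """Square-root competence schedule: starts at c0, grows quickly early on,
    and reaches 1.0 after `curriculum_steps` training steps."""
    frac = min(1.0, step / curriculum_steps)
    return min(1.0, np.sqrt(frac * (1.0 - c0 ** 2) + c0 ** 2))

def sample_batch(corpus, difficulties, step, curriculum_steps, batch_size, rng):
    """Sample a batch uniformly from the examples whose difficulty does not
    exceed the model's current competence."""
    c = competence(step, curriculum_steps)
    eligible = np.flatnonzero(difficulties <= c)
    idx = rng.choice(eligible, size=batch_size, replace=True)
    return [corpus[i] for i in idx]

# Illustrative usage, with `corpus` a list of (source, target) sentence pairs:
# rng = np.random.default_rng(0)
# diffs = difficulty_by_length(corpus)
# for step in range(num_steps):
#     batch = sample_batch(corpus, diffs, step, curriculum_steps=50_000,
#                          batch_size=64, rng=rng)
#     ...  # train the NMT model on `batch` as usual
```

Because only the sampling of training examples changes, this can be dropped into an existing input pipeline without touching the model or the optimizer.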


