There has been much interest in deep learning optimizer research recently (Reddi et al., 2018; Luo et al., 2019; Zhang et al., 2019; Liu et al., 2019). These works attempt to answer the question: what is the best step size to use in each step of the gradient descent? With first order gradient descent being the de facto standard in deep learning optimization, the question of the optimal step size, or learning rate, in each step arises naturally. The difficulty of choosing a good learning rate can be better understood by considering the two extremes: 1) when the learning rate is too small, training takes a long time; 2) when the learning rate is too large, training diverges instead of converging to a satisfactory solution.
The two main classes of optimizers commonly used in deep learning are momentum-based Stochastic Gradient Descent (SGD) (Bottou, 2010) and adaptive momentum-based methods (Duchi et al., 2010; Kingma and Ba, 2014; Reddi et al., 2018; Luo et al., 2019; Liu et al., 2019). The difference between the two lies in how the newly computed gradient is incorporated into the update. In SGD with momentum, the update direction is a convex combination of the current gradient and the exponentially averaged previous gradients. In the adaptive case, the current gradient is further weighted by a term involving the sum of squares of the previous gradients. For a more detailed description and convergence analysis, please refer to Reddi et al. (2018).
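As an illustration, the two update rules can be sketched in simplified scalar form. This is a sketch only: Adam's bias correction and per-parameter vector bookkeeping are omitted, and the function names are ours, not those of any library:

```python
def momentum_sgd_step(w, grad, state, lr=0.01, beta=0.9):
    """SGD with momentum: the update direction is an exponential
    moving average of the current and past gradients."""
    m = beta * state.get("m", 0.0) + (1 - beta) * grad
    state["m"] = m
    return w - lr * m

def adam_step(w, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam (simplified, no bias correction): the averaged gradient is
    further scaled by the square root of a running average of squared
    gradients, which adapts the effective step size per parameter."""
    m = beta1 * state.get("m", 0.0) + (1 - beta1) * grad
    v = beta2 * state.get("v", 0.0) + (1 - beta2) * grad ** 2
    state["m"], state["v"] = m, v
    return w - lr * m / (v ** 0.5 + eps)
```

The extra `v` term is what makes Adam "adaptive": flat directions with small squared gradients receive proportionally larger steps.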
In Adam (Kingma and Ba, 2014), experiments conducted on the MNIST and CIFAR10 datasets showed that Adam has the fastest convergence compared to other optimizers, in particular SGD with Nesterov momentum. Adam has been popular with the deep learning community due to its speed of convergence. However, Adabound (Luo et al., 2019), a proposed improvement to Adam that clips the gradient range, showed in its experiments that given enough training epochs, SGD can converge to a better quality solution than Adam. To quote from the future work of Adabound, “why SGD usually performs well across diverse applications of machine learning remains uncertain”. The choice of optimizer is by no means straightforward or cut-and-dried.
Another critical aspect of training a deep learning model is the batch size. While the batch size was previously regarded as just another hyperparameter, recent studies such as Keskar et al. (2016) have shed light on its role in generalization, i.e., how the trained model performs on the test dataset. Research works (Keskar et al., 2016; Hochreiter and Schmidhuber, 1997a) expounded the idea of sharp vs. flat minima when it comes to generalization. From experimental results on convolutional networks, e.g., AlexNet (Krizhevsky et al., 2017) and VggNet (Simonyan and Zisserman, 2014), Keskar et al. (2016) demonstrated that an overly large batch size tends to lead to sharp minima while a sufficiently small batch size brings about flat minima. Dinh et al. (2017), however, argue that sharp minima can also generalize well in deep networks, provided that the notion of sharpness is taken in context.
While the aforementioned works have contributed to our understanding of the various optimizers, their learning rates and batch size effects, they are mainly focused on computer vision (CV) related deep learning networks and datasets. In contrast, the rich body of work in Neural Machine Translation (NMT) and other Natural Language Processing (NLP) related tasks has been largely left untouched. Recall that CV and NMT deep learning networks are very different. For instance, the convolutional network that forms the basis of many successful CV deep learning networks is translation invariant, e.g., in a face recognition network, the convolutional filters produce the same response even when the same face is shifted or translated. In contrast, Recurrent Neural Networks (RNN) (Hochreiter and Schmidhuber, 1997b; Chung et al., 2014) and transformer-based deep learning networks (Vaswani et al., 2017; Devlin et al., 2019) for NMT specifically look for patterns in sequences. There is no guarantee that the results from the CV-based studies carry across to NMT. There is also a lack of awareness in the NMT community when it comes to optimizers and related issues such as learning rate policy and batch size. It is often assumed that using the mainstream optimizer (Adam) with the default settings is good enough. As our study shows, there is significant room for improvement.
1.1 The Contributions
The contributions of this study are to:
Raise awareness of how a judicious choice of optimizer with a good learning rate policy can help improve performance;
Explore the use of cyclical learning rates for NMT. As far as we know, this is the first time cyclical learning rate policy has been applied to NMT;
Provide guidance on how cyclical learning rate policy can be used for NMT to improve performance.
2 Related Works
Li et al. (2017) proposes various visualization methods for understanding the loss landscape defined by the loss functions and how the various deep learning architectures affect the landscape. The proposed visualization techniques allow a depiction of the optimization trajectory, which is particularly helpful in understanding the behavior of the various optimizers and how they eventually reach their local minima.
Cyclical Learning Rate (CLR) (Smith, 2015) addresses the learning rate issue by having repeated cycles of linearly increasing and decreasing learning rates, constituting a triangular policy within each cycle. CLR draws its inspiration from curriculum learning (Bengio et al., 2009) and simulated annealing (Aarts and Korst, 2003). Smith (2015) demonstrated the effectiveness of CLR on the standard computer vision (CV) datasets CIFAR-10 and CIFAR-100, using well established CV architectures such as ResNet (He et al., 2015) and DenseNet (Huang et al., 2016). As far as we know, CLR has not been applied to Neural Machine Translation (NMT). The methodology, best practices and experiments are mainly based on results from CV architectures and datasets. It is by no means apparent or straightforward that the same approach can be directly carried over to NMT.
One interesting aspect of CLR is the need to balance regularizations such as weight decay, dropout and batch size, etc., as pointed out in Smith and Topin (2017). The experiments verified that various regularizations need to be toned down when using CLR to achieve good results. In particular, the generalization results using the small batch size from the above-mentioned studies no longer hold for CLR. This is interesting because the use of CLR allows training to be accelerated by using a larger batch size without the sharp minima generalization concern. A related work is McCandlish et al. (2018), which sets a theoretical upper limit on the speed up in training time with increasing batch size. Beyond this theoretical upper limit, there will be no speed up in training time even with increased batch size.
3 The Proposed Approach
Our NMT learning rate policy is based on the triangular learning rate policy in CLR. For CLR, some pertinent parameters need to be determined: the base/max learning rate and the cycle length. As suggested in CLR, we perform a range test to set the base/max learning rate, while the cycle length is some multiple of the number of epochs. The range test is designed to select the base/max learning rate in CLR; without it, the base/max learning rate would need to be tuned as hyperparameters, which is difficult and time consuming. In a range test, the network is trained for several epochs with the learning rate linearly increased from an initial rate. For instance, the range test for the IWSLT2014 (DE2EN) dataset was run for 35 epochs, with the initial learning rate set to some small value, e.g., for Adam, and increased linearly over the 35 epochs. Given the range test curve, e.g., Figure 1, the base learning rate is set to the point where the loss starts to decrease, while the maximum learning rate is selected as the point where the loss starts to plateau or to increase. As shown in Figure 1, the base learning rate is selected as the initial learning rate of the range test, since the loss already decreases steeply at the initial learning rate. The max learning rate is the point where the loss stagnates. For the step size, we follow the guideline in Smith (2015) of selecting between 2 and 10 times the number of iterations in an epoch, and set the step size to 4.5 epochs.
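The triangular policy can be written compactly. Below is a minimal pure-Python sketch of the formulation in Smith (2015), where `step_size` is measured in iterations (e.g., 4.5 epochs' worth) and `base_lr`/`max_lr` come from the range test; the function name is ours:

```python
import math

def triangular_lr(it, base_lr, max_lr, step_size):
    """Triangular CLR policy (Smith, 2015): the learning rate ramps
    linearly from base_lr up to max_lr over step_size iterations, then
    back down, repeating every 2 * step_size iterations."""
    cycle = math.floor(1 + it / (2 * step_size))
    x = abs(it / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)
```

At iteration 0 the rate equals `base_lr`, it peaks at `max_lr` after `step_size` iterations, and returns to `base_lr` at `2 * step_size`.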
The other hyperparameter to take care of is the learning rate decay, shown in Figure 2. For the various optimizers, the learning rate is usually decayed to a small value to ensure convergence. There are various commonly used decay schemes, such as the piece-wise constant step function and the inverse (reciprocal) square root. This study adopts two learning rate decay policies:
Fixed decay (shrinking) policy where the max learning rate is halved after each learning rate cycle;
No decay. This is unusual because for both SGD and adaptive momentum optimizers, a decay policy is required to ensure convergence.
Our adopted learning rate decay policy is interesting because experiments in Smith (2015) showed that using a decay rate is detrimental to the resultant accuracy. Our designed experiments in Section 4 reveal how CLR performs with the chosen decay policy.
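The fixed-decay (shrink) policy can be layered on the triangular schedule by scaling the amplitude of each successive cycle. The sketch below assumes the halving applies to the max-minus-base amplitude; the function name is ours:

```python
import math

def triangular_lr_shrink(it, base_lr, max_lr, step_size, shrink=0.5):
    """Triangular CLR with fixed decay: the max learning rate of each
    successive cycle is scaled by `shrink` (0.5 halves it per cycle)."""
    cycle = math.floor(1 + it / (2 * step_size))
    x = abs(it / step_size - 2 * cycle + 1)
    # Amplitude shrinks geometrically with the cycle index.
    amplitude = (max_lr - base_lr) * shrink ** (cycle - 1)
    return base_lr + amplitude * max(0.0, 1.0 - x)
```

With `shrink=1.0` this reduces to the no-decay policy; with `shrink=0.5` the peak of the second cycle sits halfway between `base_lr` and `max_lr`.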
[Table 1: Train, Valid. and Test set sizes and Source/Target vocabulary sizes for each corpus.]
The CLR decay policies should be contrasted with the standard inverse square root policy (INV) commonly used in deep learning platforms, e.g., fairseq (Ott et al., 2019). The INV policy typically starts with a warm-up phase in which the learning rate is linearly increased to a maximum value; the learning rate is then decayed from this maximum value as the reciprocal of the square root of the number of epochs.
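For comparison, the INV policy can be sketched as follows. This is a simplified version of the inverse-square-root schedule found in platforms such as fairseq; the exact parameterization (warm-up measured in updates vs. epochs, the initial rate, etc.) differs by implementation:

```python
import math

def inv_sqrt_lr(it, max_lr, warmup_steps):
    """Inverse square root (INV) policy: linear warm-up to max_lr,
    then decay proportional to 1 / sqrt(step)."""
    if it < warmup_steps:
        # Linear warm-up phase.
        return max_lr * (it + 1) / warmup_steps
    # Decay phase: continuous at the warm-up boundary.
    return max_lr * math.sqrt(warmup_steps / (it + 1))
```

Unlike CLR, the rate is monotonically decreasing after warm-up, so it never revisits higher learning rates later in training.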
The other point of interest is how to deal with batch size when using CLR. Our primary interest is to use a larger batch size without compromising the generalization capability on the test set. Following the lead in Smith and Topin (2017), we look at how the NMT tasks perform when varying the batch size on top of the CLR policy. Compared to Smith and Topin (2017), we stretch the batch size range, going from batch size as small as 256 to as high as 4,096. Only through examining the extreme behaviors can we better understand the effect of batch size superimposed on CLR.
4.1 Experiment Settings
The purpose of this section is to demonstrate the effects of applying CLR and various batch sizes when training NMT models. The experiments are performed on two translation directions (DE→EN and FR→EN) for IWSLT2014 and IWSLT2017 (Cettolo et al., 2012).
The data are pre-processed using functions from Moses (Koehn et al., 2007). The punctuation is normalized into a standard format. After tokenization, byte pair encoding (BPE) (Sennrich et al., 2016) is applied to the data to mitigate the adverse effects of out-of-vocabulary (OOV) rare words. Sentence pairs with a source-target sentence length ratio greater than 1.5 are removed to reduce potential errors from sentence misalignment. Long sentences with a length greater than 250 are also removed, as is common practice. The split of the datasets produces the training, validation (valid.) and test sets presented in Table 1.
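The length-based filtering described above can be expressed directly in code. In this sketch the ratio is taken as longer-to-shorter side (an assumption, since the text does not fix the direction), and the helper name is ours; the 1.5 ratio and 250-token cap are the thresholds quoted above:

```python
def keep_pair(src_tokens, tgt_tokens, max_ratio=1.5, max_len=250):
    """Return True if a source/target sentence pair survives the
    length filters: neither side empty or longer than max_len tokens,
    and the longer-to-shorter length ratio at most max_ratio."""
    ls, lt = len(src_tokens), len(tgt_tokens)
    if ls == 0 or lt == 0:
        return False
    if ls > max_len or lt > max_len:
        return False
    return max(ls, lt) / min(ls, lt) <= max_ratio
```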
[Table 2: Model hyperparameters — Feed-forward Hidden Units: 1,024; Batch Size (default): 4,096; Training Epochs (default): 50.]
The transformer architecture (Vaswani et al., 2017) from fairseq (Ott et al., 2019) (https://github.com/pytorch/fairseq) is used for all the experiments. The hyperparameters are presented in Table 2. We compared training under CLR with an inverse square root schedule for the two popular optimizers used in machine translation tasks, Adam and SGD. All models are trained using one NVIDIA V100 GPU.
The learning rate boundaries of the CLR are selected by the range test (shown in Figure 1). The base and maximal learning rates adopted in this study are presented in Table 3. A shrink strategy is also applied when examining the effects of CLR in training NMT. The optimizers (Adam and SGD) are evaluated with two options: 1) without shrink (“noshrink”); 2) with shrink at a rate of 0.5 (“yshrink”), which means the maximal learning rate for each cycle is reduced at a decay rate of 0.5.
4.2 Effects of Applying CLR to NMT Training
A hypothesis we hold is that NMT training under CLR may result in a better local minimum than that achieved by training with the default learning rate schedule. A comparison experiment is performed for training NMT models for “IWSLT2014-de-en” corpus using CLR and INV with a range of initial learning rates on two optimizers (Adam and SGD), respectively. It can be observed that both Adam and SGD are very sensitive to the initial learning rate under the default INV schedule before CLR is applied (as shown in Figures 3 and 4). In general, SGD prefers a bigger initial learning rate when CLR is not applied. The initial learning rate of Adam is more concentrated towards the central range.
Applying CLR has positive impacts on NMT training for both Adam and SGD. When applied to SGD, CLR removes the need for a big initial learning rate, as it enables the optimizer to explore the local minima better. Shrinking on CLR for SGD is not desirable, as a higher learning rate is required (Figure 4). Notably, applying CLR to Adam produces consistent improvements regardless of the shrink option (Figure 3). Furthermore, the effects of applying CLR to Adam are more significant than those for SGD, as shown in Figure 5. Similar results are obtained from our experiments on the “IWSLT2017-de-en” and “IWSLT2014-fr-en” corpora (Figures 11 and 12 in Appendix A). The corresponding BLEU scores are presented in Table 4, in which the above-mentioned effects of CLR on Adam can also be seen. Training takes fewer epochs to converge to a local minimum with better BLEU scores (bold fonts in Table 4).
[Table 4: Best BLEU score and convergence epoch for each corpus and learning rate policy.]
4.3 Effects of Batch Size on CLR
Batch size is regarded as a significant factor influencing deep learning models in the various CV studies detailed in Section 1. It is well known to CV researchers that a large batch size is often associated with poor test accuracy. However, this trend is reversed when the CLR policy is introduced by Smith and Topin (2017). The critical questions are: does the trend of using a larger batch size with CLR hold for training transformers in NMT? Furthermore, at what range of batch size does the associated regularization become significant? This has practical implications: if CLR allows a larger batch size without compromising generalization, then training can be sped up by using a larger batch size. From Figure 6, we see that CLR with a larger batch size does indeed lead to better NMT performance; the phenomenon observed by Smith and Topin (2017) for CV tasks carries across to NMT. In fact, using a small batch size of 256 (the green curve in Figure 6) leads to divergence, as shown by the validation loss spiraling out of control. This is in line with the need to prevent over-regularization when using CLR; in this case, the small batch size of 256 adds a strong regularization effect and thus needs to be avoided. The larger batch size afforded by CLR is certainly good news because NMT typically deals with large networks and huge datasets, which means training time can be cut down considerably.
5 Further Analysis
We observe qualitatively different range test curves for the CV and NMT datasets, as can be seen from Figures 1 and 7. The CV range test curve is better defined in terms of choosing the max learning rate, which is taken from the point where the curve starts to become ragged. For NMT, the range curve is smoother and exhibits a plateau. From Figure 1, one may be tempted to exploit this plateau by choosing a larger learning rate at the extreme right end (just before divergence occurs) as the triangular policy's max learning rate. From our experiments and empirical observations, this often leads to the loss not converging due to an excessive learning rate. It is better to be more conservative and choose the point where the loss stagnates as the max learning rate for the triangular policy.
5.1 How to Apply CLR to NMT Training Matters
A range test is performed to identify the max learning rates (MLR1 and MLR2) for the triangular policy of CLR (Figure 1). The experiments showed that training is sensitive to the selection of the MLR. As the range curve for training NMT models is distinct from that obtained in a typical computer vision case, it is not clear how to choose the MLR when applying CLR. A comparison experiment is performed with MLRs of different values. It can be observed that MLR1 is the preferable option for both SGD and Adam (Figures 8 and 9). The “noshrink” option is mandatory for SGD, but this constraint can be relaxed for Adam. Adam is sensitive to an excessive learning rate (MLR2).
5.2 Rationale behind Applying CLR to NMT Training
Smith (2015) proposes two reasons why CLR works. From a theoretical perspective, the increasing learning rate helps the optimizer to escape from saddle point plateaus; as pointed out in Dauphin et al. (2015), the difficulty in optimizing deep learning networks is due to saddle points, not local minima. The other, more intuitive reason is that the learning rates covered in CLR are likely to include the optimal learning rate, which will then be used during parts of the training. Leveraging the visualization techniques proposed by Li et al. (2017), we take a peek at the error surface, optimizer trajectory and learning rate. The first thing to note is the smoothness of the error surface, which is perhaps not so surprising given the abundance of skip connections in transformer-based networks. Referring to Figure 10 (c), we see the cyclical learning rate greatly amplifying Adam's learning rate in the flatter regions, while nearer the local minimum the cyclical learning rate policy does not harm convergence. This is in contrast to Figure 10 (a) and (b), where although the adaptive nature of Adam's learning rate helps to move quickly across flatter regions, the effect is much less pronounced without the cyclical learning rate. Figure 10 thus lends credence to the hypotheses that the cyclical learning rate helps to escape saddle point plateaus and that the optimal learning rate is included in the cyclical learning rate policy.
To produce these plots, we first assemble the network weight matrix by concatenating the columns of network weights at each epoch. We then perform a Principal Component Analysis (PCA) and use the first two components for plotting the loss landscape. Even though all three plots in Figure 10 seem to converge to the local minimum, bear in mind that this is only for the first two components, which account for 84.84%, 88.89% and 89.5% of the variance, respectively. Since the first two components account for a large portion of the variance, it is reasonable to use Figure 10 as a qualitative guide.
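The projection just described can be sketched with NumPy: stack the flattened per-epoch weights, center them, and take the top two principal directions via SVD. This is a generic sketch, not the exact tooling of Li et al. (2017), and the function name is ours:

```python
import numpy as np

def trajectory_pca(weights_per_epoch):
    """Project a training trajectory onto its top-2 PCA directions.

    weights_per_epoch: list of 1-D arrays, the flattened network
    weights recorded at each epoch. Returns the 2-D coordinates of
    each epoch and the fraction of variance the two directions explain.
    """
    W = np.stack(weights_per_epoch)          # (epochs, n_params)
    W = W - W.mean(axis=0)                   # center the trajectory
    # SVD of the centered matrix gives the principal directions in Vt.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    coords = W @ Vt[:2].T                    # (epochs, 2) projection
    explained = (S[:2] ** 2).sum() / (S ** 2).sum()
    return coords, explained
```

The `explained` value corresponds to the variance fractions quoted above (84.84% to 89.5% in our runs), which is what justifies reading the 2-D plots qualitatively.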
From the various experimental results, we have explored the use of CLR and demonstrated its benefits for transformer-based networks. Not only does CLR improve the generalization capability in terms of test set results, but it also allows a larger training batch size without adversely affecting generalization. Instead of blindly using default optimizers and learning rate policies, we hope to raise awareness in the NMT community of the importance of choosing an effective optimizer and an associated learning rate policy.
- Aarts and Korst (2003) Emile Aarts and Jan Korst. 2003. Simulated annealing and Boltzmann machines.
- Bengio et al. (2009) Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. 2009. Curriculum learning. In ICML.
- Bottou (2010) Léon Bottou. 2010. Large-scale machine learning with stochastic gradient descent.
- Cettolo et al. (2012) Mauro Cettolo, Christian Girardi, and Marcello Federico. 2012. WIT3: Web inventory of transcribed and translated talks. In Proceedings of the 16th Conference of the European Association for Machine Translation (EAMT), pages 261–268, Trento, Italy.
- Chung et al. (2014) Junyoung Chung, Çaglar Gülçehre, Kyunghyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. ArXiv, abs/1412.3555.
- Dauphin et al. (2015) Yann Dauphin, Harm de Vries, and Yoshua Bengio. 2015. Equilibrated adaptive learning rates for non-convex optimization. In NIPS.
- Devlin et al. (2019) Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. Bert: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT.
- Dinh et al. (2017) Laurent Dinh, Razvan Pascanu, Samy Bengio, and Yoshua Bengio. 2017. Sharp minima can generalize for deep nets. ArXiv, abs/1703.04933.
- Duchi et al. (2010) John C. Duchi, Elad Hazan, and Yoram Singer. 2010. Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res., 12:2121–2159.
- He et al. (2015) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778.
- Hochreiter and Schmidhuber (1997a) Sepp Hochreiter and Jürgen Schmidhuber. 1997a. Flat minima. Neural Computation, 9:1–42.
- Hochreiter and Schmidhuber (1997b) Sepp Hochreiter and Jürgen Schmidhuber. 1997b. Long short-term memory. Neural Computation, 9:1735–1780.
- Huang et al. (2016) Gao Huang, Zhuang Liu, and Kilian Q. Weinberger. 2016. Densely connected convolutional networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2261–2269.
- Keskar et al. (2016) Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak Peter Tang. 2016. On large-batch training for deep learning: Generalization gap and sharp minima. ArXiv, abs/1609.04836.
- Kingma and Ba (2014) Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. CoRR, abs/1412.6980.
- Koehn et al. (2007) Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions, pages 177–180, Prague, Czech Republic. Association for Computational Linguistics.
- Krizhevsky et al. (2017) Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2017. Imagenet classification with deep convolutional neural networks. Commun. ACM, 60:84–90.
- Li et al. (2017) Hao Li, Zheng Xu, Gavin Taylor, and Tom Goldstein. 2017. Visualizing the loss landscape of neural nets. In NeurIPS.
- Liu et al. (2019) Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, and Jiawei Han. 2019. On the variance of the adaptive learning rate and beyond. ArXiv, abs/1908.03265.
- Luo et al. (2019) Liangchen Luo, Yuanhao Xiong, Yan Liu, and Xu Sun. 2019. Adaptive gradient methods with dynamic bound of learning rate. ArXiv, abs/1902.09843.
- McCandlish et al. (2018) Sam McCandlish, Jared Kaplan, Dario Amodei, and OpenAI Dota Team. 2018. An empirical model of large-batch training. ArXiv, abs/1812.06162.
- Ott et al. (2019) Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. 2019. fairseq: A fast, extensible toolkit for sequence modeling. In Proceedings of NAACL-HLT 2019: Demonstrations.
- Reddi et al. (2018) Sashank J. Reddi, Satyen Kale, and Sanjiv Kumar. 2018. On the convergence of adam and beyond. ArXiv, abs/1904.09237.
- Sennrich et al. (2016) Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725, Berlin, Germany. Association for Computational Linguistics.
- Simonyan and Zisserman (2014) Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556.
- Smith (2015) Leslie N. Smith. 2015. Cyclical learning rates for training neural networks. 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 464–472.
- Smith and Topin (2017) Leslie N. Smith and Nicholay Topin. 2017. Super-convergence: Very fast training of residual networks using large learning rates. ArXiv, abs/1708.07120.
- Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Ł ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 5998–6008. Curran Associates, Inc.
- Zhang et al. (2019) Michael Ruogu Zhang, James Lucas, Geoffrey E. Hinton, and Jimmy Ba. 2019. Lookahead optimizer: k steps forward, 1 step back. ArXiv, abs/1907.08610.
Appendix A Appendices
Appendix B Supplemental Material
Scripts and data are available at https://github.com/nlp-team/CL_NMT.