R-Drop: Regularized Dropout for Neural Networks

06/28/2021
by Xiaobo Liang, et al.

Dropout is a powerful and widely used technique to regularize the training of deep neural networks. In this paper, we introduce a simple regularization strategy built upon dropout, namely R-Drop, which forces the output distributions of the different sub-models generated by dropout to be consistent with each other. Specifically, for each training sample, R-Drop minimizes the bidirectional KL-divergence between the output distributions of two sub-models sampled by dropout. Theoretical analysis reveals that R-Drop reduces the freedom of the model parameters and complements dropout. Experiments on 5 widely used deep learning tasks (18 datasets in total), including neural machine translation, abstractive summarization, language understanding, language modeling, and image classification, show that R-Drop is universally effective. In particular, it yields substantial improvements when applied to fine-tune large-scale pre-trained models, e.g., ViT, RoBERTa-large, and BART, and achieves state-of-the-art (SOTA) performance with the vanilla Transformer model on WMT14 English→German translation (30.91 BLEU) and WMT14 English→French translation (43.95 BLEU), even surpassing models trained with extra large-scale data and expert-designed advanced variants of Transformer models. Our code is available at GitHub<https://github.com/dropreg/R-Drop>.
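
The core idea can be expressed in a few lines of training code. The sketch below is illustrative rather than the authors' released implementation: it assumes a PyTorch classification model with dropout enabled, and the function name, `alpha` coefficient, and defaults are assumptions made here for clarity.

```python
import torch
import torch.nn.functional as F

def r_drop_loss(model, x, y, alpha=1.0):
    """Illustrative R-Drop training loss for one batch.

    Two forward passes through the same model yield two different
    sub-model outputs, because dropout samples a different mask each
    time. The loss combines the usual cross-entropy on both outputs
    with a bidirectional KL-divergence that pulls the two output
    distributions toward each other. `alpha` weights the KL term
    (hyperparameter name assumed).
    """
    logits1 = model(x)  # first dropout mask
    logits2 = model(x)  # second dropout mask

    # Standard cross-entropy, averaged over the two passes.
    ce = 0.5 * (F.cross_entropy(logits1, y) + F.cross_entropy(logits2, y))

    # Bidirectional (symmetric) KL-divergence between the two
    # predicted distributions, both kept in log space.
    logp1 = F.log_softmax(logits1, dim=-1)
    logp2 = F.log_softmax(logits2, dim=-1)
    kl = 0.5 * (
        F.kl_div(logp1, logp2, log_target=True, reduction="batchmean")
        + F.kl_div(logp2, logp1, log_target=True, reduction="batchmean")
    )
    return ce + alpha * kl
```

A common efficiency trick is to duplicate the batch and run a single forward pass instead of two, which produces the same pair of dropout-perturbed outputs; the weighting coefficient is tuned per task.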

Related research

04/11/2021 · UniDrop: A Simple yet Effective Technique to Improve Transformer without Extra Cost
Transformer architecture achieves great success in abundant natural lang...

07/27/2023 · R-Block: Regularized Block of Dropout for convolutional networks
Dropout as a regularization technique is widely used in fully connected ...

06/10/2019 · Improving Neural Language Modeling via Adversarial Training
Recently, substantial progress has been made in language modeling by usi...

01/05/2021 · AutoDropout: Learning Dropout Patterns to Regularize Deep Networks
Neural networks are often over-parameterized and hence benefit from aggr...

06/20/2023 · Augmenting Sub-model to Improve Main Model
Image classification has improved with the development of training techn...

03/09/2023 · Aux-Drop: Handling Haphazard Inputs in Online Learning Using Auxiliary Dropouts
Many real-world applications based on online learning produce streaming ...

05/03/2020 · Dynamic Programming Encoding for Subword Segmentation in Neural Machine Translation
This paper introduces Dynamic Programming Encoding (DPE), a new segmenta...
