UniDrop: A Simple yet Effective Technique to Improve Transformer without Extra Cost

04/11/2021
by Zhen Wu, et al.

The Transformer architecture has achieved great success in a wide range of natural language processing tasks. Because Transformer models are over-parameterized, many works have sought to alleviate their overfitting in order to improve performance. Through our explorations, we find that simple techniques such as dropout can greatly boost model performance when carefully designed. In this paper, we therefore integrate different dropout techniques into the training of Transformer models. Specifically, we propose an approach named UniDrop that unites three dropout techniques ranging from fine-grained to coarse-grained: feature dropout, structure dropout, and data dropout. Theoretically, we demonstrate that these three dropouts play different roles from a regularization perspective. Empirically, we conduct experiments on both neural machine translation and text classification benchmark datasets. Extensive results indicate that a Transformer with UniDrop achieves an improvement of around 1.5 BLEU on the IWSLT14 translation tasks, and better classification accuracy even when the strong pre-trained RoBERTa model is used as the backbone.
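To make the three granularities concrete, here is a minimal, hypothetical sketch in plain Python. It is not the paper's implementation. It assumes that feature dropout means element-wise (inverted) dropout on hidden features, structure dropout means skipping whole residual layers in the style of LayerDrop, and data dropout means deleting input tokens. The function names `feature_dropout`, `structure_dropout`, and `data_dropout` are illustrative only.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]

# Hypothetical illustration of three dropout granularities; assumed
# definitions, not UniDrop's actual code. Use only during training
# (call with p=0.0 at inference time).


def feature_dropout(x: Vector, p: float, rng: random.Random) -> Vector:
    """Fine-grained: zero each feature with probability p (inverted dropout)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    scale = 1.0 / (1.0 - p)  # rescale kept units so the expected value is unchanged
    return [0.0 if rng.random() < p else v * scale for v in x]


def structure_dropout(x: Vector, layers: Sequence[Layer], p: float,
                      rng: random.Random) -> Vector:
    """Coarser: skip each whole residual layer with probability p (LayerDrop-style)."""
    for layer in layers:
        if rng.random() < p:
            continue  # layer dropped: the residual path passes x through unchanged
        x = [a + b for a, b in zip(x, layer(x))]  # residual connection x + f(x)
    return x


def data_dropout(tokens: Sequence[str], p: float, rng: random.Random) -> List[str]:
    """Coarsest: delete each input token with probability p, keeping at least one."""
    kept = [t for t in tokens if rng.random() >= p]
    return kept or list(tokens[:1])  # never return an empty non-empty input


if __name__ == "__main__":
    rng = random.Random(0)
    print(feature_dropout([1.0, 2.0, 3.0, 4.0], 0.25, rng))
    layers = [lambda v: [1.0] * len(v)] * 4  # toy "layers" that each add 1.0
    print(structure_dropout([0.0, 0.0], layers, 0.5, rng))
    print(data_dropout("the cat sat on the mat".split(), 0.2, rng))
```

The three functions act on progressively larger units (a single activation, a whole layer, a whole input token), which mirrors the fine-to-coarse ordering described above. The exact outputs of the demo depend on the random seed.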
