MixUp Training Leads to Reduced Overfitting and Improved Calibration for the Transformer Architecture

by Wancong Zhang, et al.

MixUp is a data augmentation technique from computer vision that trains on convex interpolations of pairs of inputs and their labels to improve model generalization. Its application to the natural language understanding (NLU) domain has been limited, however, because text is difficult to interpolate directly in the input space. In this study, we propose MixUp methods at the Input, Manifold, and sentence embedding levels for the transformer architecture, and apply them to fine-tune the BERT model on a diverse set of NLU tasks. We find that MixUp can improve model performance, as well as reduce test loss and model calibration error by up to 50%.
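The convex interpolation the abstract describes can be sketched as follows. This is a minimal, generic MixUp step in NumPy, not the paper's implementation: the function name, the Beta-distribution default `alpha=0.4`, and the assumption of one-hot labels are illustrative choices.

```python
import numpy as np

def mixup_pair(x1, y1, x2, y2, alpha=0.4, rng=None):
    """Convexly interpolate two training examples and their labels.

    The mixing coefficient lam is drawn from Beta(alpha, alpha), so the
    mixed example is lam * x1 + (1 - lam) * x2, and likewise for labels.
    For text models, x1 and x2 would be embeddings or hidden states
    (the Input, Manifold, or sentence-embedding levels), since raw
    tokens cannot be interpolated directly.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y
```

Because the interpolation is convex, a mixed pair of one-hot labels still sums to 1, so the result remains a valid target distribution for a cross-entropy loss.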

