CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding

10/16/2020
by   Yanru Qu, et al.

Data augmentation has been demonstrated as an effective strategy for improving model generalization and data efficiency. However, due to the discrete nature of natural language, designing label-preserving transformations for text data tends to be more challenging. In this paper, we propose a novel data augmentation framework dubbed CoDA, which synthesizes diverse and informative augmented examples by integrating multiple transformations organically. Moreover, a contrastive regularization objective is introduced to capture the global relationship among all the data samples. A momentum encoder along with a memory bank is further leveraged to better estimate the contrastive loss. To verify the effectiveness of the proposed framework, we apply CoDA to Transformer-based models on a wide range of natural language understanding tasks. On the GLUE benchmark, CoDA gives rise to an average improvement of 2.2 points, and it consistently exhibits stronger results relative to several competitive data augmentation and adversarial training baselines (including in low-resource settings). Extensive experiments show that the proposed contrastive objective can be flexibly combined with various data augmentation approaches to further boost their performance, highlighting the wide applicability of the CoDA framework.
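The contrastive objective with a momentum encoder and memory bank that the abstract describes can be sketched as an InfoNCE-style loss: each example is pulled toward its augmented view and pushed away from negatives cached in the bank. A minimal NumPy sketch, assuming cosine similarity and a scalar momentum coefficient (all function names, dimensions, and hyperparameter values here are illustrative, not taken from the paper):

```python
import numpy as np

def info_nce_loss(query, positive_key, memory_bank, temperature=0.07):
    """InfoNCE loss: pull the query toward its augmented positive,
    push it away from negatives stored in the memory bank."""
    # L2-normalize so dot products are cosine similarities
    q = query / np.linalg.norm(query)
    k = positive_key / np.linalg.norm(positive_key)
    bank = memory_bank / np.linalg.norm(memory_bank, axis=1, keepdims=True)

    l_pos = q @ k                      # similarity to the positive view
    l_neg = bank @ q                   # similarities to cached negatives
    logits = np.concatenate(([l_pos], l_neg)) / temperature

    # cross-entropy with the positive placed at index 0
    logits -= logits.max()             # numerical stability
    return -logits[0] + np.log(np.exp(logits).sum())

def momentum_update(key_params, query_params, m=0.999):
    """Momentum-encoder update: the key encoder's parameters slowly
    trail the query encoder's, keeping bank entries consistent."""
    return m * key_params + (1 - m) * query_params

rng = np.random.default_rng(0)
q = rng.normal(size=16)                          # query embedding
pos = q + 0.01 * rng.normal(size=16)             # augmented view of q
bank = rng.normal(size=(64, 16))                 # 64 cached negatives
loss = info_nce_loss(q, pos, bank)
```

Because the positive view is nearly identical to the query, the loss is close to zero; a harder positive or more similar negatives would increase it.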


research

09/29/2020
A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation
Adversarial training has been shown effective at endowing the learned re...

12/12/2022
RPN: A Word Vector Level Data Augmentation Algorithm in Deep Learning for Language Understanding
This paper presents a new data augmentation algorithm for natural unders...

05/31/2022
A Multi-level Supervised Contrastive Learning Framework for Low-Resource Natural Language Inference
Natural Language Inference (NLI) is a growingly essential task in natura...

05/12/2022
TreeMix: Compositional Constituency-based Data Augmentation for Natural Language Understanding
Data augmentation is an effective approach to tackle over-fitting. Many ...

02/22/2021
MixUp Training Leads to Reduced Overfitting and Improved Calibration for the Transformer Architecture
MixUp is a computer vision data augmentation technique that uses convex ...

03/09/2020
Improved Baselines with Momentum Contrastive Learning
Contrastive unsupervised learning has recently shown encouraging progres...

03/01/2022
MERIt: Meta-Path Guided Contrastive Learning for Logical Reasoning
Logical reasoning is of vital importance to natural language understandi...
