
LINDA: Unsupervised Learning to Interpolate in Natural Language Processing

by Yekyung Kim, et al.

Despite the success of mixup in data augmentation, its applicability to natural language processing (NLP) tasks has been limited by the discrete and variable-length nature of natural language. Recent studies have therefore relied on domain-specific heuristics and manually crafted resources, such as dictionaries, to apply mixup in NLP. In this paper, we instead propose an unsupervised learning approach to text interpolation for data augmentation, which we refer to as "Learning to INterpolate for Data Augmentation" (LINDA). LINDA requires no heuristics or manually crafted resources; it learns to interpolate between any pair of natural language sentences over a natural language manifold. After empirically demonstrating LINDA's interpolation capability, we show that LINDA allows us to seamlessly apply mixup in NLP and leads to better generalization in text classification, both in-domain and out-of-domain.
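For context, standard mixup (Zhang et al., 2018) augments training data by linearly interpolating pairs of continuous inputs and their labels, which is exactly what is hard to do with discrete text and what LINDA learns to approximate. A minimal sketch of the original formulation, using toy vectors standing in for continuous representations (the variable names here are illustrative, not from the paper):

```python
import numpy as np

def mixup(x1, y1, x2, y2, lam):
    """Standard mixup: convex combination of a pair of inputs
    and their (one-hot) labels with mixing coefficient lam."""
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y

# In practice lam is sampled per pair, e.g. lam ~ Beta(alpha, alpha).
# Here we fix lam = 0.7 on two toy "embeddings" for illustration.
x1, y1 = np.array([1.0, 0.0]), np.array([1.0, 0.0])
x2, y2 = np.array([0.0, 1.0]), np.array([0.0, 1.0])
x_mix, y_mix = mixup(x1, y1, x2, y2, 0.7)
```

This interpolation is well defined for fixed-size continuous vectors but not for sequences of discrete tokens of differing lengths, which is the gap the paper's learned interpolation addresses.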
