LINDA: Unsupervised Learning to Interpolate in Natural Language Processing

12/28/2021
by Yekyung Kim, et al.

Despite the success of mixup in data augmentation, its applicability to natural language processing (NLP) tasks has been limited by the discrete and variable-length nature of natural language. Recent studies have therefore relied on domain-specific heuristics and manually crafted resources, such as dictionaries, to apply mixup in NLP. In this paper, we instead propose an unsupervised learning approach to text interpolation for data augmentation, which we call "Learning to INterpolate for Data Augmentation" (LINDA). It requires neither heuristics nor manually crafted resources, and learns to interpolate between any pair of natural language sentences over a natural language manifold. After empirically demonstrating LINDA's interpolation capability, we show that LINDA allows us to seamlessly apply mixup in NLP and leads to better generalization in text classification, both in-domain and out-of-domain.
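
For readers unfamiliar with mixup, the following is a minimal sketch of the standard formulation on fixed-length numeric vectors: a new example is a convex combination of two inputs and of their labels, with the mixing weight drawn from a Beta distribution. This is not LINDA's learned interpolation. The function name `mixup`, the `alpha` default, and the toy vectors are assumptions made for illustration only. The length check shows why the recipe does not carry over directly to variable-length text.

```python
import random

def mixup(x_i, y_i, x_j, y_j, alpha=1.0, rng=None):
    """Standard mixup on fixed-length vectors (illustrative, not LINDA)."""
    # Element-wise interpolation is only defined for equal-length inputs,
    # which is the obstacle for variable-length sentences.
    if len(x_i) != len(x_j) or len(y_i) != len(y_j):
        raise ValueError("mixup needs inputs and labels of matching length")
    rng = rng or random.Random()
    # Mixing weight lam in [0, 1], drawn from Beta(alpha, alpha).
    lam = rng.betavariate(alpha, alpha)
    # Convex combination of the inputs and of the (one-hot) labels.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y, lam

# Toy usage: two 2-dimensional inputs with one-hot labels.
x_mix, y_mix, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
```

The mixed label remains a valid distribution because the two weights sum to one. Passing inputs of different lengths raises `ValueError`, which mirrors the variable-length problem the abstract describes.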

01/02/2021

Substructure Substitution: Structured Data Augmentation for NLP

We study a family of data augmentation methods, substructure substitutio...
05/22/2019

Augmenting Data with Mixup for Sentence Classification: An Empirical Study

Mixup, a recently proposed data augmentation method through linearly inter...
04/10/2020

Joint translation and unit conversion for end-to-end localization

A variety of natural language tasks require processing of textual data w...
09/27/2019

Automatically Learning Data Augmentation Policies for Dialogue Tasks

Automatic data augmentation (AutoAugment) (Cubuk et al., 2019) searches ...
07/26/2019

Weakly Supervised Domain Detection

In this paper we introduce domain detection as a new natural language pr...
02/19/2016

Learning to SMILE(S)

This paper shows how one can directly apply natural language processing ...