Unsupervised Data Augmentation

04/29/2019
by Qizhe Xie, et al.
Despite its success, deep learning still needs large labeled datasets to succeed. Data augmentation has shown much promise in alleviating the need for more labeled data, but so far it has mostly been applied in supervised settings, where it has achieved limited gains. In this work, we propose to apply data augmentation to unlabeled data in a semi-supervised learning setting. Our method, named Unsupervised Data Augmentation or UDA, encourages the model predictions to be consistent between an unlabeled example and an augmented version of that example. Unlike previous methods that use random noise such as Gaussian noise or dropout noise, UDA has a small twist in that it makes use of harder and more realistic noise generated by state-of-the-art data augmentation methods. This small twist leads to substantial improvements on six language tasks and three vision tasks even when the labeled set is extremely small. For example, on the IMDb text classification dataset, with only 20 labeled examples, UDA outperforms the state-of-the-art model trained on 25,000 labeled examples. On standard semi-supervised learning benchmarks, CIFAR-10 with 4,000 examples and SVHN with 1,000 examples, UDA outperforms all previous approaches and reduces the error rates of state-of-the-art methods by more than 30%: going from 7.66 to 5.27 [...]. UDA also helps on datasets that have a lot of labeled data. For example, on ImageNet, with 1.3M extra unlabeled data, UDA improves the top-1/top-5 accuracy from 78.28/94.36 to 79.04/94.45.
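
The consistency idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes that consistency is measured with a KL divergence between the prediction on an unlabeled example and the prediction on its augmented version, and that this term is added to an ordinary supervised cross-entropy loss with a weighting factor. The function names (`softmax`, `kl_divergence`, `uda_style_loss`) and the `weight` parameter are hypothetical and chosen only for illustration. The augmentation itself is left outside the sketch, which takes the model's logits as plain lists.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(
        pi * (math.log(pi + eps) - math.log(qi + eps)) for pi, qi in zip(p, q)
    )


def cross_entropy(probs, label, eps=1e-12):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label] + eps)


def uda_style_loss(labeled_logits, labels, unlabeled_logits, augmented_logits,
                   weight=1.0):
    """Supervised loss plus a weighted consistency term (illustrative only).

    labeled_logits:   model outputs for labeled examples
    labels:           integer class labels for those examples
    unlabeled_logits: model outputs for unlabeled examples
    augmented_logits: model outputs for augmented copies of the same examples
    weight:           assumed trade-off factor between the two terms
    """
    # Average cross-entropy over the (possibly very small) labeled batch.
    supervised = sum(
        cross_entropy(softmax(z), y) for z, y in zip(labeled_logits, labels)
    ) / len(labels)

    # Average divergence between predictions on original and augmented inputs.
    consistency = sum(
        kl_divergence(softmax(u), softmax(a))
        for u, a in zip(unlabeled_logits, augmented_logits)
    ) / len(unlabeled_logits)

    return supervised + weight * consistency, supervised, consistency
```

Under these assumptions, the consistency term is zero when the model gives identical predictions for an unlabeled example and its augmented copy, and it grows as the two predictions diverge, so minimizing the total loss pushes the model toward predictions that are stable under augmentation.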

research
04/25/2020

MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification

This paper presents MixText, a semi-supervised learning method for text ...
research
09/16/2022

Confidence-Guided Data Augmentation for Deep Semi-Supervised Training

We propose a new data augmentation technique for semi-supervised learnin...
research
10/22/2020

Unsupervised Data Augmentation with Naive Augmentation and without Unlabeled Data

Unsupervised Data Augmentation (UDA) is a semi-supervised technique that...
research
03/28/2020

Gradient-based Data Augmentation for Semi-Supervised Learning

In semi-supervised learning (SSL), a technique called consistency regula...
research
02/26/2020

A Comprehensive Approach to Unsupervised Embedding Learning based on AND Algorithm

Unsupervised embedding learning aims to extract good representation from...
research
07/16/2020

FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning

Recent state-of-the-art semi-supervised learning (SSL) methods use a com...
research
09/09/2021

SanitAIs: Unsupervised Data Augmentation to Sanitize Trojaned Neural Networks

The application of self-supervised methods has resulted in broad improve...
