DivAug: Plug-in Automated Data Augmentation with Explicit Diversity Maximization

03/26/2021
by Zirui Liu, et al.

Human-designed data augmentation strategies have been replaced by automatically learned augmentation policies in the past two years. Specifically, recent work has empirically shown that the superior performance of automated data augmentation methods stems from increasing the diversity of augmented data. However, two aspects of the diversity of augmented data are still missing: 1) an explicit definition (and thus measurement) of diversity and 2) a quantifiable relationship between diversity and its regularization effect. To bridge this gap, we propose a diversity measure called Variance Diversity and theoretically show that the regularization effect of data augmentation is guaranteed by Variance Diversity. We validate in experiments that the relative gain in test accuracy from automated data augmentation is highly correlated with Variance Diversity. We then design DivAug, an unsupervised sampling-based framework that directly maximizes Variance Diversity and hence strengthens the regularization effect. Because DivAug requires no separate search process, it achieves performance gains comparable to the state-of-the-art method with better efficiency. Moreover, in the semi-supervised setting, our framework further improves the performance of semi-supervised learning algorithms compared with RandAugment, making it highly applicable to real-world problems where labeled data is scarce.
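The abstract does not give the formula for Variance Diversity or the details of DivAug's sampling step, so the following is only a minimal sketch under stated assumptions. It assumes diversity is scored as the total variance of a model's output vectors across several augmented copies of the same input (the mean squared distance to their centroid), and it keeps a diverse subset of those copies with greedy farthest-point selection. Farthest-point selection is a swapped-in heuristic chosen for illustration, not necessarily the paper's sampling scheme, and the names `variance_diversity` and `select_diverse` are hypothetical. The procedure is unsupervised in the sense that it uses only model outputs, never labels.

```python
from typing import List, Sequence

Vector = Sequence[float]


def variance_diversity(outputs: Sequence[Vector]) -> float:
    """Hypothetical diversity score (an assumption, not the paper's exact
    definition): sum of per-dimension variances of the model outputs over
    augmented copies of one input, i.e. mean squared distance to the centroid."""
    n = len(outputs)
    if n == 0:
        raise ValueError("need at least one output")
    dim = len(outputs[0])
    centroid = [sum(o[d] for o in outputs) / n for d in range(dim)]
    return sum(
        sum((o[d] - centroid[d]) ** 2 for d in range(dim)) for o in outputs
    ) / n


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two output vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_diverse(outputs: Sequence[Vector], k: int) -> List[int]:
    """Greedy farthest-point selection (a swapped-in heuristic): start from the
    output farthest from the centroid, then repeatedly add the candidate whose
    minimum squared distance to the already-chosen set is largest."""
    n = len(outputs)
    if not 0 < k <= n:
        raise ValueError("k must be in [1, len(outputs)]")
    dim = len(outputs[0])
    centroid = [sum(o[d] for o in outputs) / n for d in range(dim)]
    first = max(range(n), key=lambda i: _sq_dist(outputs[i], centroid))
    chosen = [first]
    # Minimum squared distance from every candidate to the chosen set so far.
    min_d = [_sq_dist(o, outputs[first]) for o in outputs]
    while len(chosen) < k:
        nxt = max((i for i in range(n) if i not in chosen), key=lambda i: min_d[i])
        chosen.append(nxt)
        min_d = [min(min_d[i], _sq_dist(outputs[i], outputs[nxt])) for i in range(n)]
    return chosen


# Softmax outputs of a hypothetical 3-class model on 5 augmented copies of one image.
pool = [
    [0.90, 0.05, 0.05],
    [0.88, 0.07, 0.05],
    [0.10, 0.80, 0.10],
    [0.20, 0.20, 0.60],
    [0.89, 0.06, 0.05],
]
idx = select_diverse(pool, k=3)  # -> [2, 0, 3]
```

In this toy pool, copies 0, 1 and 4 are near-duplicates, so the selector keeps only one of them. The chosen subset therefore has a higher variance score than keeping those three near-identical copies, which is the intuition behind maximizing diversity directly instead of searching for a fixed policy.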

Related research

Metropolis-Hastings Data Augmentation for Graph Neural Networks (03/26/2022)
Graph Neural Networks (GNNs) often suffer from weak-generalization due t...

Gradient-based Data Augmentation for Semi-Supervised Learning (03/28/2020)
In semi-supervised learning (SSL), a technique called consistency regula...

Increasing Data Diversity with Iterative Sampling to Improve Performance (11/05/2021)
As a part of the Data-Centric AI Competition, we propose a data-centric ...

Augmentation Learning for Semi-Supervised Classification (08/03/2022)
Recently, a number of new Semi-Supervised Learning methods have emerged....

Evolutionary Augmentation Policy Optimization for Self-supervised Learning (03/02/2023)
Self-supervised learning (SSL) is a Machine Learning algorithm for pretr...

Affinity and Diversity: Quantifying Mechanisms of Data Augmentation (02/20/2020)
Though data augmentation has become a standard component of deep neural ...

GANgster: A Fraud Review Detector based on Regulated GAN with Data Augmentation (06/11/2020)
Financial implications of written reviews provide great incentives for b...
