Toward Understanding Generative Data Augmentation

05/27/2023
by Chenyu Zheng, et al.

Generative data augmentation, which scales datasets by drawing synthetic labeled examples from a trained conditional generative model, boosts classification performance in a variety of learning tasks, including (semi-)supervised learning, few-shot learning, and adversarially robust learning. However, little work has theoretically investigated the effect of generative data augmentation. To fill this gap, we establish a general stability bound in this non-independently-and-identically-distributed (non-i.i.d.) setting, where the learned distribution depends on the original training set and is generally not the same as the true distribution. Our theoretical result includes the divergence between the learned distribution and the true distribution. It shows that generative data augmentation can enjoy a faster learning rate when the order of the divergence term is o(max(log(m)β_m, 1/√m)), where m is the training set size and β_m is the corresponding stability constant. We further specialize the learning setup to the Gaussian mixture model and generative adversarial nets. We prove that in both cases, although generative data augmentation does not enjoy a faster learning rate, it can improve the learning guarantees at a constant level when the training set is small, which is significant when severe overfitting occurs. Simulation results on the Gaussian mixture model and empirical results on generative adversarial nets support our theoretical conclusions. Our code is available at https://github.com/ML-GSAI/Understanding-GDA.
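The augmentation pipeline the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's experimental protocol: it fits one Gaussian mixture per class as a stand-in conditional generative model (using scikit-learn), samples synthetic labeled examples from the learned distribution, and trains a classifier on the augmented set. All dataset sizes and model choices here are assumptions for the sketch.

```python
# Minimal sketch of generative data augmentation (illustrative setup only):
# fit a class-conditional generative model on a small training set, sample
# synthetic labeled examples, and train a classifier on original + synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture

# Small original training set (m = 60), the regime where the paper's
# analysis says augmentation helps most.
X, y = make_classification(n_samples=60, n_features=5, random_state=0)

# One generative model per class plays the role of a conditional generator.
generators = {
    c: GaussianMixture(n_components=2, random_state=0).fit(X[y == c])
    for c in np.unique(y)
}

# Sample synthetic labeled examples from the learned distribution.
n_fake = 200
X_fake, y_fake = [], []
for c, gm in generators.items():
    samples, _ = gm.sample(n_fake // len(generators))
    X_fake.append(samples)
    y_fake.append(np.full(len(samples), c))

# Augmented training set = original examples + generated examples.
X_aug = np.vstack([X] + X_fake)
y_aug = np.concatenate([y] + y_fake)

clf = LogisticRegression().fit(X_aug, y_aug)
print(X_aug.shape)
```

Whether this helps depends on how close the learned distribution is to the true one, which is exactly the divergence term appearing in the stability bound above.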


