A Kernel Theory of Modern Data Augmentation

03/16/2018 ∙ by Tri Dao, et al.

Data augmentation, a technique in which a training set is expanded with class-preserving transformations, is ubiquitous in modern machine learning pipelines. In this paper, we seek to establish a theoretical framework for understanding modern data augmentation techniques. We start by showing that for kernel classifiers, data augmentation can be approximated by first-order feature averaging and second-order variance regularization components. We connect this general approximation framework to prior work in invariant kernels, tangent propagation, and robust optimization. Next, we explicitly tackle the compositional aspect of modern data augmentation techniques, proposing a novel model of data augmentation as a Markov process. Under this model, we show that performing k-nearest neighbors with data augmentation is asymptotically equivalent to a kernel classifier. Finally, we illustrate ways in which our theoretical framework can be leveraged to accelerate machine learning workflows in practice, including reducing the amount of computation needed to train on augmented data, and predicting the utility of a transformation prior to training.
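The first-order/second-order decomposition can be made concrete numerically. The sketch below is illustrative only and rests on assumptions that are not taken from the paper: a linear model on features φ, logistic loss, and transformed features φ(T(x)) that are Gaussian with mean ψ(x) and a per-example covariance. Under those assumptions the exact augmented loss reduces to a one-dimensional Gaussian expectation, which Gauss-Hermite quadrature evaluates. The helper names (`augmented_objective`, `first_order`, `second_order`) and the toy data are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def logistic_loss(margin):
    # l(m) = log(1 + exp(-m)), evaluated stably
    return np.logaddexp(0.0, -margin)


def logistic_loss_dd(margin):
    # Second derivative: l''(m) = sigmoid(m) * (1 - sigmoid(m))
    s = 1.0 / (1.0 + np.exp(-margin))
    return s * (1.0 - s)


def augmented_objective(w, psi, cov, y, deg=40):
    """Augmented loss (1/n) sum_i E[l(y_i w^T phi(t_i))], ASSUMING phi(t_i) is
    Gaussian with mean psi[i] and covariance cov[i] (hypothetical model)."""
    nodes, weights = hermegauss(deg)          # Gauss-Hermite rule for weight exp(-z^2/2)
    weights = weights / np.sqrt(2.0 * np.pi)  # normalize to a standard-normal expectation
    mean = y * (psi @ w)                      # margin at the averaged features
    std = np.sqrt(np.einsum("i,nij,j->n", w, cov, w))  # std of w^T phi(t_i)
    margins = mean[:, None] + std[:, None] * nodes[None, :]
    return np.mean(logistic_loss(margins) @ weights)


def first_order(w, psi, y):
    """Feature averaging: train on the mean feature psi(x) = E[phi(T(x))]."""
    return np.mean(logistic_loss(y * (psi @ w)))


def second_order(w, psi, cov, y):
    """First-order term plus the variance penalty (1/2n) sum_i l''(.) w^T cov_i w."""
    variance = np.einsum("i,nij,j->n", w, cov, w)
    penalty = 0.5 * logistic_loss_dd(y * (psi @ w)) * variance
    return first_order(w, psi, y) + np.mean(penalty)


# Hypothetical toy data: 8 examples, 4 features, small augmentation variance.
rng = np.random.default_rng(0)
n, d = 8, 4
psi = rng.normal(size=(n, d))                  # averaged features psi(x_i)
y = rng.choice([-1.0, 1.0], size=n)            # labels in {-1, +1}
A = rng.normal(size=(n, d, d))
cov = 0.01 * A @ A.transpose(0, 2, 1) / d      # small PSD covariance per example
w = rng.normal(size=d)

exact = augmented_objective(w, psi, cov, y)
first = first_order(w, psi, y)
second = second_order(w, psi, cov, y)
print(f"exact={exact:.6f}  first-order={first:.6f}  second-order={second:.6f}")
```

Because the logistic loss is convex, feature averaging alone underestimates the augmented loss (Jensen's inequality); in this toy setting, adding the variance term recovers most of that gap when the transformations perturb the features only slightly.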
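For the Markov-process view, a small finite-state illustration may help. It assumes, as a simplification that may not match the paper's exact model, that augmentation runs a chain with transition matrix `P` for a geometrically distributed number of steps (before each step, the chain continues with probability `beta`). The distribution of augmented copies of state i is then row i of K = (1 - beta)(I - beta P)^{-1}, and a vote that counts labeled augmented copies landing exactly on a query state behaves, in the many-sample limit, like a kernel classifier with kernel K. The function names, the toy chain, and `beta` are all hypothetical.

```python
import numpy as np


def augmentation_kernel(P, beta):
    """Closed-form distribution of augmented points on a finite state space.

    Row i is the distribution of the final state when the chain starts at i
    and runs k steps with probability (1 - beta) * beta**k (assumed rule):
    K = (1 - beta) * sum_k beta**k P**k = (1 - beta) * (I - beta P)^{-1}.
    """
    n = P.shape[0]
    return (1.0 - beta) * np.linalg.solve(np.eye(n) - beta * P, np.eye(n))


def sample_augmentation(P, beta, start, rng):
    """Apply random transformations; keep going with probability beta."""
    state = start
    while rng.random() < beta:
        state = rng.choice(P.shape[0], p=P[state])
    return state


def empirical_augmentation(P, beta, n_samples, rng):
    """Monte Carlo estimate of K from sampled augmentation chains."""
    n = P.shape[0]
    counts = np.zeros((n, n))
    for i in range(n):
        for _ in range(n_samples):
            counts[i, sample_augmentation(P, beta, i, rng)] += 1
    return counts / n_samples


def kernel_classify(K, train_idx, train_labels, n_classes):
    """Score state z for class c by summing K[i, z] over training points i
    labeled c (the expected share of class-c augmented copies landing on z);
    predict the class with the highest score."""
    scores = np.zeros((n_classes, K.shape[1]))
    for i, c in zip(train_idx, train_labels):
        scores[c] += K[i]
    return scores.argmax(axis=0)


# Hypothetical toy problem: 5 states, random row-stochastic transitions.
rng = np.random.default_rng(0)
M = rng.random((5, 5))
P = M / M.sum(axis=1, keepdims=True)
beta = 0.5

K = augmentation_kernel(P, beta)
emp = empirical_augmentation(P, beta, n_samples=10_000, rng=rng)

train_idx, train_labels = [0, 3], [0, 1]
print(kernel_classify(K, train_idx, train_labels, n_classes=2))
print(kernel_classify(emp, train_idx, train_labels, n_classes=2))
```

The two printed prediction vectors typically coincide; they can differ only at states where the class scores are nearly tied, since the empirical matrix fluctuates around K.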

research
∙ 06/25/2020

Learning Data Augmentation with Online Bilevel Optimization for Image Classification

Data augmentation is a key practice in machine learning for improving ge...
research
∙ 04/01/2021

GABO: Graph Augmentations with Bi-level Optimization

Data augmentation refers to a wide range of techniques for improving mod...
research
∙ 06/08/2021

What Data Augmentation Do We Need for Deep-Learning-Based Finance?

The main task we consider is portfolio construction in a speculative mar...
research
∙ 06/16/2023

SLACK: Stable Learning of Augmentations with Cold-start and KL regularization

Data augmentation is known to improve the generalization capabilities of...
research
∙ 03/13/2022

On Data Augmentation in Point Process Models Based on Thinning

Many models for point process data are defined through a thinning proced...
research
∙ 05/20/2022

Data Augmentation for Compositional Data: Advancing Predictive Models of the Microbiome

Data augmentation plays a key role in modern machine learning pipelines....
research
∙ 02/15/2019

Asymptotically exact data augmentation: models, properties and algorithms

Data augmentation, by the introduction of auxiliary variables, has becom...