A Theory of PAC Learnability under Transformation Invariances

02/15/2022
by   Han Shao, et al.

Transformation invariances are present in many real-world problems. For example, image classification is usually invariant to rotation and color transformation: a rotated car in a different color is still identified as a car. Data augmentation, which adds the transformed data to the training set and trains a model on the augmented data, is one commonly used technique to build these invariances into the learning process. However, it is unclear how data augmentation performs theoretically and what the optimal algorithm is in the presence of transformation invariances. In this paper, we study PAC learnability under transformation invariances in three settings according to different levels of realizability: (i) a hypothesis fits the augmented data; (ii) a hypothesis fits only the original data and the transformed data lying in the support of the data distribution; (iii) the agnostic case. One interesting observation is that distinguishing between the original data and the transformed data is necessary to achieve optimal accuracy in settings (ii) and (iii), which implies that any algorithm not differentiating between the original and transformed data (including data augmentation) is not optimal. Furthermore, algorithms of this type can even "harm" the accuracy. In setting (i), although it is unnecessary to distinguish between the two data sets, data augmentation still does not perform optimally. Due to this difference, we propose two combinatorial measures characterizing the optimal sample complexity in setting (i) and in settings (ii) and (iii), respectively, and provide the optimal algorithms.
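The data-augmentation procedure described above (add transformed copies of each example, keep the original label, then train on the enlarged set) can be sketched as follows. This is a minimal illustration, not the paper's algorithm; the names `augment` and `rotations` and the use of 90-degree rotations as the transformation group are assumptions for the example.

```python
import numpy as np

def augment(data, labels, transforms):
    """Return the dataset enlarged with transformed copies.

    Each transform t is applied to every example x, and t(x) keeps
    the original label -- encoding the invariance assumption that
    a transformed example belongs to the same class.
    """
    aug_x, aug_y = list(data), list(labels)
    for t in transforms:
        for x, y in zip(data, labels):
            aug_x.append(t(x))
            aug_y.append(y)
    return aug_x, aug_y

# Example: rotation invariance via the three nontrivial 90-degree rotations.
images = [np.arange(4).reshape(2, 2)]
labels = ["car"]
rotations = [lambda x, k=k: np.rot90(x, k) for k in (1, 2, 3)]
big_x, big_y = augment(images, labels, rotations)
# big_x holds the original image plus 3 rotated copies, all labeled "car".
```

Note that `augment` treats original and transformed examples identically once they are in the training set; the paper's observation for settings (ii) and (iii) is precisely that an optimal learner must *not* treat them identically.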


research
01/14/2021

Text Augmentation in a Multi-Task View

Traditional data augmentation aims to increase the coverage of the input...
research
07/25/2019

Invariance reduces Variance: Understanding Data Augmentation in Deep Learning and Beyond

Many complex deep learning models have found success by exploiting symme...
research
11/16/2021

Learning Augmentation Distributions using Transformed Risk Minimization

Adapting to the structure of data distributions (such as symmetry and tr...
research
07/06/2020

On Data Augmentation and Adversarial Risk: An Empirical Analysis

Data augmentation techniques have become standard practice in deep learn...
research
01/20/2018

Visual Data Augmentation through Learning

The rapid progress in machine learning methods has been empowered by i) ...
research
09/13/2018

SiftingGAN: Generating and Sifting Labeled Samples to Improve the Remote Sensing Image Scene Classification Baseline in vitro

Lack of annotated samples vastly restrains the direct application of dee...
research
06/01/2023

Provable Benefit of Mixup for Finding Optimal Decision Boundaries

We investigate how pair-wise data augmentation techniques like Mixup aff...
