
Towards Understanding the Data Dependency of Mixup-style Training

by Muthu Chidambaram et al.
Duke University
Peking University

In the Mixup training paradigm, a model is trained on convex combinations of data points and their associated labels. Despite seeing very few true data points during training, models trained with Mixup still appear to minimize the original empirical risk and to exhibit better generalization and robustness across a variety of tasks compared to standard training. In this paper, we investigate how these benefits of Mixup training depend on properties of the data in the context of classification. For minimizing the original empirical risk, we derive a closed form for the Mixup-optimal classifier, which allows us to construct a simple dataset on which minimizing the Mixup loss provably leads to a classifier that does not minimize the empirical loss on the data. On the other hand, we also give sufficient conditions under which Mixup training minimizes the original empirical risk. For generalization, we characterize the margin of a Mixup classifier and use this to explain why the decision boundary of a Mixup classifier can adapt better to the full structure of the training data than that of a standard classifier. In contrast, we also show that, for a large class of linear models and linearly separable datasets, Mixup training learns the same classifier as standard training.
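The convex-combination step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `mixup_batch` and the use of a single Beta-distributed mixing coefficient per batch are assumptions, and labels are assumed to be one-hot encoded.

```python
import numpy as np

def mixup_batch(x, y, alpha=1.0, rng=None):
    """Mix a batch with a shuffled copy of itself using a Beta(alpha, alpha) weight.

    x: array of shape (n, d) of inputs; y: array of shape (n, k) of one-hot labels.
    Returns convex combinations lam * (x, y) + (1 - lam) * (x[perm], y[perm]).
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)      # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))    # random pairing of examples
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]  # soft labels, rows still sum to 1
    return x_mix, y_mix
```

Because the label mixture uses the same coefficient as the input mixture, each mixed label remains a valid probability vector, which is what lets the Mixup loss be analyzed as a reweighted classification objective.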




Provably Learning Diverse Features in Multi-View Data with Midpoint Mixup

Mixup is a data augmentation technique that relies on training using ran...

Supervising Feature Influence

Causal influence measures for machine learnt classifiers shed light on t...

Improving Generalization via Uncertainty Driven Perturbations

Recently Shah et al., 2020 pointed out the pitfalls of the simplicity bi...

Finite-sample analysis of interpolating linear classifiers in the overparameterized regime

We prove bounds on the population risk of the maximum margin algorithm f...

On the Error Resistance of Hinge Loss Minimization

Commonly used classification algorithms in machine learning, such as sup...

No Subclass Left Behind: Fine-Grained Robustness in Coarse-Grained Classification Problems

In real-world classification tasks, each class often comprises multiple ...

Investigating minimizing the training set fill distance in machine learning regression

Many machine learning regression methods leverage large datasets for tra...

Code Repositories


Code associated with the paper "Towards Understanding the Data Dependency of Mixup-style Training".
