Weighted Risk Minimization & Deep Learning

12/08/2018
by Jonathon Byrd, et al.

Importance weighting is a key ingredient in many algorithms for causal inference and related problems, such as off-policy evaluation in reinforcement learning. Recently, theorists proved that on separable data, unregularized linear networks trained with cross-entropy loss and optimized by stochastic gradient descent converge in direction to the max-margin solution. This solution depends on the locations of the data points but not on their weights, nullifying the effect of importance weighting. This paper asks: for realistic deep networks, for which all datasets are separable, what is the effect of importance weighting? Lacking theoretical tools for analyzing modern (nonlinear, unregularized) deep networks, we investigate the question empirically on both realistic and synthetic data. Our results demonstrate that while importance weighting alters the learned model early in training, its effect diminishes to negligible levels as training continues indefinitely. However, this diminishing effect does not occur in the presence of L2 regularization. These results (i) support the broader applicability of the theoretical findings of Soudry et al. (2018), who analyze linear networks; (ii) call into question the practice of importance weighting; and (iii) suggest that its usefulness depends strongly on early-stopping criteria and on regularization methods that alter the loss function.
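For concreteness, here is a minimal sketch of the training setup the abstract describes: stochastic gradient descent on an importance-weighted cross-entropy objective, (1/n) sum_i w_i * CE(f(x_i), y_i), with optional L2 regularization applied via weight decay. It assumes PyTorch; the model, data shapes, and weighting scheme are illustrative placeholders, not the authors' code.

    import torch
    import torch.nn as nn

    # Illustrative small classifier; any deep network fits the paper's setting.
    model = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 2))

    # weight_decay > 0 adds an L2 penalty; per the abstract, importance
    # weights retain their effect under L2 regularization, while with
    # weight_decay=0.0 their effect fades as training continues.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)

    # reduction="none" gives per-example losses, so arbitrary
    # importance weights can be applied before averaging.
    criterion = nn.CrossEntropyLoss(reduction="none")

    def train_step(x, y, w):
        # One SGD step on the weighted empirical risk (1/n) sum_i w_i * CE_i.
        optimizer.zero_grad()
        per_example = criterion(model(x), y)  # shape: (batch,)
        loss = (w * per_example).mean()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Example: upweight the positive class by 10x (an arbitrary choice).
    x = torch.randn(128, 2)
    y = torch.randint(0, 2, (128,))
    w = torch.where(y == 1, torch.tensor(10.0), torch.tensor(1.0))
    train_step(x, y, w)

For purely class-level weights, nn.CrossEntropyLoss also accepts a per-class weight argument; the per-example form above covers the general importance-weighting case.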

Related research

03/28/2021 · Understanding the role of importance weighting for deep learning
The recent paper by Byrd & Lipton (2019), based on empirical observati...

12/24/2021 · Is Importance Weighting Incompatible with Interpolating Classifiers?
Importance weighting is a classic technique to handle distribution shift...

10/26/2021 · Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias
The generalization mystery of overparametrized deep nets has motivated e...

06/29/2018 · Theory IIIb: Generalization in Deep Networks
A main puzzle of deep neural networks (DNNs) revolves around the apparen...

06/30/2020 · Gradient Methods Never Overfit on Separable Data
A line of recent works established that when training linear predictors ...

03/02/2023 · Tight Risk Bounds for Gradient Descent on Separable Data
We study the generalization properties of unregularized gradient methods...

05/24/2023 · Generalizing Importance Weighting to a Universal Solver for Distribution Shift Problems
Distribution shift (DS) may have two levels: the distribution itself cha...
