Regularization via Mass Transportation

The goal of regression and classification methods in supervised learning is to minimize the empirical risk, that is, the expectation, under the empirical distribution, of a loss function that quantifies the prediction error. When training data are scarce, overfitting is typically mitigated by adding regularization terms to the objective that penalize hypothesis complexity. In this paper we introduce new regularization techniques using ideas from distributionally robust optimization, and we give new probabilistic interpretations to existing techniques. Specifically, we propose to minimize the worst-case expected loss, where the worst case is taken over the ball of all (continuous or discrete) distributions within a bounded transportation distance of the (discrete) empirical distribution. By choosing the radius of this ball judiciously, we can guarantee that the worst-case expected loss provides an upper confidence bound on the loss on test data, thus yielding new generalization bounds. We prove that the resulting regularized learning problems are tractable and can be tractably kernelized for many popular loss functions. We validate our theoretical out-of-sample guarantees through simulated and empirical experiments.
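As a concrete illustration of the mass-transportation regularizer, consider distributionally robust logistic regression. When only the features may be perturbed and transportation cost is measured by the Euclidean norm, the worst-case expected log-loss over a Wasserstein ball of radius eps around the empirical distribution reduces to the empirical log-loss plus eps times the 2-norm of the weight vector (the dual norm of the transportation norm). The NumPy sketch below minimizes this reformulated objective by subgradient descent; the function names, step-size schedule, and iteration count are illustrative assumptions, not prescriptions from the paper.

    import numpy as np

    def worst_case_logloss(w, X, y, eps):
        # Worst-case expected log-loss over a 2-norm Wasserstein ball of
        # radius eps (features perturbed, labels fixed): empirical log-loss
        # plus the dual-norm penalty eps * ||w||_2.
        margins = y * (X @ w)                        # labels y in {-1, +1}
        return np.mean(np.logaddexp(0.0, -margins)) + eps * np.linalg.norm(w)

    def subgradient(w, X, y, eps):
        margins = y * (X @ w)
        # d/dm log(1 + exp(-m)) = -sigmoid(-m), written stably via tanh
        s = -0.5 * (1.0 - np.tanh(margins / 2.0))
        g = (X * (s * y)[:, None]).mean(axis=0)
        norm_w = np.linalg.norm(w)
        if norm_w > 0.0:
            g = g + eps * w / norm_w                 # subgradient of the norm term
        return g

    def fit(X, y, eps=0.1, iters=2000, step=0.5):
        w = np.zeros(X.shape[1])
        for t in range(iters):                       # diminishing-step subgradient descent
            w -= step / np.sqrt(t + 1.0) * subgradient(w, X, y, eps)
        return w

    # Example: robust fit on synthetic data
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    w_true = np.array([1.0, -2.0, 0.5, 0.0, 0.0])
    y = np.sign(X @ w_true + 0.1 * rng.normal(size=200))
    w_hat = fit(X, y, eps=0.1)
    print(worst_case_logloss(w_hat, X, y, eps=0.1))

Setting eps = 0 recovers ordinary empirical risk minimization; a larger radius trades empirical fit for robustness, and choosing the radius so that the ball contains the data-generating distribution with high confidence yields the paper's upper confidence bound on the test loss.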
