An Information-theoretic Approach to Distribution Shifts

06/07/2021
by   Marco Federici, et al.
0

Safely deploying machine learning models to the real world is often a challenging process. Models trained with data obtained from a specific geographic location tend to fail when queried with data obtained elsewhere, agents trained in a simulation can struggle to adapt when deployed in the real world or novel environments, and neural networks that are fit to a subset of the population might carry some selection bias into their decision process. In this work, we describe the problem of data shift from a novel information-theoretic perspective by (i) identifying and describing the different sources of error, (ii) comparing some of the most promising objectives explored in the recent domain generalization, and fair classification literature. From our theoretical analysis and empirical evaluation, we conclude that the model selection procedure needs to be guided by careful considerations regarding the observed data, the factors used for correction, and the structure of the data-generating process.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/03/2022

Identifying the Context Shift between Test Benchmarks and Production Data

Across a wide variety of domains, there exists a performance gap between...
research
10/07/2021

On the Generalization of Models Trained with SGD: Information-Theoretic Bounds and Implications

This paper follows up on a recent work of (Neu, 2021) and presents new a...
research
09/01/2022

Towards Optimization and Model Selection for Domain Generalization: A Mixup-guided Solution

The distribution shifts between training and test data typically undermi...
research
04/01/2021

Model Selection's Disparate Impact in Real-World Deep Learning Applications

Algorithmic fairness has emphasized the role of biased data in automated...
research
02/04/2021

Undecidability of Underfitting in Learning Algorithms

Using recent machine learning results that present an information-theore...
research
12/07/2017

How consistent is my model with the data? Information-Theoretic Model Check

The choice of model class is fundamental in statistical learning and sys...

Please sign up or login with your details

Forgot password? Click here to reset