Identifying the Context Shift between Test Benchmarks and Production Data

07/03/2022
by Matthew Groh, et al.

Across a wide variety of domains, machine learning models are less accurate on real-world production data than on dataset benchmarks. Although static dataset benchmarks are carefully designed to represent the real world, models often err when the data is out-of-distribution relative to the data they were trained on. We can directly measure and adjust for some aspects of distribution shift, but we cannot address sample selection bias, adversarial perturbations, and non-stationarity without knowing the data generation process. In this paper, we outline two methods for identifying changes in context that lead to distribution shifts and model prediction errors: leveraging human intuition and expert knowledge to identify first-order contexts, and developing dynamic benchmarks based on desiderata for the data generation process. Furthermore, we present two case studies that highlight the implicit assumptions underlying applied machine learning models, assumptions that tend to cause errors when models are generalized beyond test benchmark datasets. By paying close attention to the role of context in each prediction task, researchers can reduce context shift errors and increase generalization performance.
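To illustrate what "directly measuring" one aspect of distribution shift can look like, here is a minimal sketch that compares a single numeric feature between a benchmark sample and a production sample using the two-sample Kolmogorov-Smirnov statistic. This is an illustrative choice and not a method taken from the paper; the function name `ks_statistic` and the sample values are hypothetical. A statistic like this only flags a change in one observed feature and says nothing about the data generation process behind it.

```python
from bisect import bisect_right


def ks_statistic(benchmark, production):
    """Two-sample Kolmogorov-Smirnov statistic for one numeric feature.

    Returns the largest absolute gap between the empirical CDFs of the
    two samples: 0.0 means identical empirical distributions, 1.0 means
    the samples do not overlap at all.
    """
    a = sorted(benchmark)
    b = sorted(production)
    if not a or not b:
        raise ValueError("both samples must be non-empty")

    gap = 0.0
    # The largest gap between two step functions occurs at a sample point,
    # so it is enough to evaluate both empirical CDFs at every observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of benchmark values <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of production values <= x
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical feature values, for illustration only.
benchmark_sample = [1, 2, 3, 4]
production_sample = [3, 4, 5, 6]
shift = ks_statistic(benchmark_sample, production_sample)
```

A large value suggests the feature's marginal distribution has moved between benchmark and production, but a small value does not rule out the kinds of shift the abstract says cannot be addressed without knowing the data generation process.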


