Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift

10/29/2018
by Stephan Rabanser, et al.

We might hope that when faced with unexpected inputs, well-designed software systems would fire off warnings. Machine learning (ML) systems, however, which depend strongly on properties of their inputs (e.g., the i.i.d. assumption), tend to fail silently. This paper explores the problem of building ML systems that fail loudly, investigating methods for detecting dataset shift and identifying exemplars that most typify the shift. We focus on several datasets and various perturbations to both covariates and label distributions, with varying magnitudes and fractions of data affected. Interestingly, we show that while classifier-based methods perform well in high-data settings, they perform poorly in low-data settings. Moreover, across the dataset shifts that we explore, a two-sample-testing-based approach that uses pretrained classifiers for dimensionality reduction performs best.
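To make the last idea concrete, here is a minimal sketch of shift detection by two-sample testing on a classifier's outputs. It is an illustration under stated assumptions, not the authors' implementation: the "pretrained classifier" is a hypothetical fixed linear-softmax layer (`WEIGHTS`), the per-dimension test is a permutation test on the Kolmogorov-Smirnov statistic, and the Bonferroni correction, the names `softmax_outputs` and `detect_shift`, and all parameter values are choices made for this example.

```python
import numpy as np

# Hypothetical stand-in for a pretrained classifier: a fixed linear layer
# followed by softmax. In practice this would be a trained model.
WEIGHTS = 2.0 * np.eye(5)[:, :3]  # 5 input features -> 3 classes


def softmax_outputs(x):
    """Reduce inputs of shape (n, 5) to class probabilities of shape (n, 3)."""
    logits = x @ WEIGHTS
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of two 1-D samples."""
    pooled = np.concatenate([a, b])
    cdf_a = np.searchsorted(np.sort(a), pooled, side="right") / a.size
    cdf_b = np.searchsorted(np.sort(b), pooled, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def permutation_pvalue(a, b, rng, n_perm):
    """Permutation p-value for the KS statistic under 'same distribution'."""
    observed = ks_statistic(a, b)
    pooled = np.concatenate([a, b])
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if ks_statistic(perm[: a.size], perm[a.size:]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def detect_shift(source, target, alpha=0.05, n_perm=499, seed=0):
    """Test each softmax dimension separately, then Bonferroni-correct.

    Returns (shift_detected, list_of_per_dimension_p_values).
    """
    rng = np.random.default_rng(seed)
    src_repr = softmax_outputs(source)
    tgt_repr = softmax_outputs(target)
    n_dims = src_repr.shape[1]
    p_values = [
        permutation_pvalue(src_repr[:, k], tgt_repr[:, k], rng, n_perm)
        for k in range(n_dims)
    ]
    # Flag a shift if any dimension rejects at the corrected level alpha / K.
    return min(p_values) < alpha / n_dims, p_values


# Usage: a synthetic covariate shift that moves the first input feature.
data_rng = np.random.default_rng(0)
source = data_rng.normal(size=(300, 5))
shifted = data_rng.normal(size=(300, 5))
shifted[:, 0] += 4.0

detected, p_values = detect_shift(source, shifted)
print("shift detected:", detected)
print("per-dimension p-values:", p_values)
```

Testing in the low-dimensional output space rather than on raw inputs keeps the number of univariate tests small, which in turn keeps the multiple-testing correction mild. With a real model, `softmax_outputs` would be replaced by a forward pass through the trained classifier.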


research
09/18/2022

Estimating and Explaining Model Performance When Both Covariates and Labels Shift

Deployed machine learning (ML) models often encounter new user data that...
research
12/28/2020

Testing for concept shift online

This note continues study of exchangeability martingales, i.e., processe...
research
05/17/2022

A unified framework for dataset shift diagnostics

Most machine learning (ML) methods assume that the data used in the trai...
research
04/18/2021

Failing Conceptually: Concept-Based Explanations of Dataset Shift

Despite their remarkable performance on a wide range of visual tasks, ma...
research
03/08/2023

Deep Hypothesis Tests Detect Clinically Relevant Subgroup Shifts in Medical Images

Distribution shifts remain a fundamental problem for the safe applicatio...
research
07/21/2021

Preventing dataset shift from breaking machine-learning biomarkers

Machine learning brings the hope of finding new biomarkers extracted fro...
research
06/15/2023

Enhanced Sampling with Machine Learning: A Review

Molecular dynamics (MD) enables the study of physical systems with excel...
