A unified framework for dataset shift diagnostics

05/17/2022
by Felipe Maia Polo, et al.

Most machine learning (ML) methods assume that the data used in the training phase comes from the distribution of the target population. However, in practice one often faces dataset shift, which, if not properly taken into account, may decrease the predictive performance of ML models. In general, if the practitioner knows which type of shift is taking place (e.g., covariate shift or label shift), they may apply transfer learning methods to obtain better predictions. Unfortunately, current methods for detecting shift are either designed to detect only specific types of shift or cannot formally test for their presence. We introduce a general framework that gives insights into how to improve prediction methods by detecting the presence of different types of shift and quantifying how strong they are. Our approach can be used for any data type (tabular/image/text) and for both classification and regression tasks. Moreover, it uses formal hypothesis tests that control false alarms. We illustrate how our framework is useful in practice using both artificial and real datasets. Our package for dataset shift detection can be found at https://github.com/felipemaiapolo/detectshift.
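The following sketch illustrates what a formal test with false-alarm control can look like. It is a generic two-sample permutation test for covariate shift, meaning a change in the marginal distribution of the features X between source (training) and target data. It is not the detectshift package's method or API.

Two parts of it are assumptions made only for illustration:

- the hypothetical function name `permutation_shift_test`;
- the test statistic, which is the squared distance between the source and target feature means. This statistic can only pick up shifts in the mean.

Under the null hypothesis of no shift, the pooled samples are exchangeable. Rejecting whenever the p-value is at most α therefore keeps the false-alarm rate at or below α.

```python
import numpy as np


def permutation_shift_test(x_source, x_target, n_perm=1000, seed=0):
    """Hypothetical helper: permutation test for a shift in the feature distribution.

    Statistic (an illustrative assumption): squared Euclidean distance between
    the source and target feature means. Under H0 (no covariate shift) the
    pooled rows are exchangeable, so the permutation p-value is valid.
    Returns (observed_statistic, p_value).
    """
    rng = np.random.default_rng(seed)
    # Treat 1-D input as a single feature column.
    xs = np.asarray(x_source, dtype=float).reshape(len(x_source), -1)
    xt = np.asarray(x_target, dtype=float).reshape(len(x_target), -1)
    pooled = np.vstack([xs, xt])
    n_s = len(xs)

    def stat(a, b):
        # Squared distance between the two groups' mean vectors.
        return float(np.sum((a.mean(axis=0) - b.mean(axis=0)) ** 2))

    observed = stat(xs, xt)
    exceed = 0
    for _ in range(n_perm):
        # Randomly reassign pooled rows to "source" and "target".
        idx = rng.permutation(len(pooled))
        if stat(pooled[idx[:n_s]], pooled[idx[n_s:]]) >= observed:
            exceed += 1
    # The +1 correction makes the Monte Carlo p-value valid (slightly conservative).
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    src = rng.normal(0.0, 1.0, size=(200, 3))
    tgt_shifted = rng.normal(0.5, 1.0, size=(200, 3))  # mean moved in every feature
    _, p = permutation_shift_test(src, tgt_shifted)
    alpha = 0.05
    print(f"p-value = {p:.4f}; shift flagged at alpha={alpha}: {p <= alpha}")
```

Comparing mean vectors keeps the sketch short. Catching changes beyond the mean would need a richer statistic, such as a classifier-based or kernel one. Telling covariate shift apart from label or conditional shift, as the framework described above aims to do, takes more than a single test on the features' marginal distribution.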
