Estimating and Explaining Model Performance When Both Covariates and Labels Shift

09/18/2022
by Lingjiao Chen, et al.

Deployed machine learning (ML) models often encounter new user data that differs from their training data. Therefore, estimating how well a given model might perform on the new data is an important step toward reliable ML applications. This is very challenging, however, as the data distribution can change in flexible ways, and we may not have any labels on the new data, which is often the case in monitoring settings. In this paper, we propose a new distribution shift model, Sparse Joint Shift (SJS), which considers the joint shift of both labels and a few features. This unifies and generalizes several existing shift models including label shift and sparse covariate shift, where only marginal feature or label distribution shifts are considered. We describe mathematical conditions under which SJS is identifiable. We further propose SEES, an algorithmic framework to characterize the distribution shift under SJS and to estimate a model's performance on new data without any labels. We conduct extensive experiments on several real-world datasets with various ML models. Across different datasets and distribution shifts, SEES achieves significant (up to an order of magnitude) shift estimation error improvements over existing approaches.
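
To make the idea of estimating performance without target labels concrete, the following minimal sketch covers only the pure label shift special case that SJS generalizes. It is not SEES and is not taken from the paper. It uses the well-known confusion-matrix approach to recover class importance weights from a black-box classifier's predictions, then reweights per-class source correctness. The function name `estimate_target_accuracy` and its arguments are hypothetical, and the sketch assumes the source confusion matrix is invertible.

```python
import numpy as np

def estimate_target_accuracy(y_src, yhat_src, yhat_tgt, n_classes):
    """Estimate accuracy on unlabeled target data, assuming pure label shift.

    Illustrative only: label shift is a special case of SJS, not SEES itself.
    """
    y_src = np.asarray(y_src)
    yhat_src = np.asarray(yhat_src)
    yhat_tgt = np.asarray(yhat_tgt)

    # Joint confusion matrix on labeled source data:
    # C[i, j] = P_s(prediction = i, label = j).
    C = np.zeros((n_classes, n_classes))
    np.add.at(C, (yhat_src, y_src), 1.0)
    C /= len(y_src)

    # Distribution of the model's predictions on the unlabeled target data.
    mu_t = np.bincount(yhat_tgt, minlength=n_classes) / len(yhat_tgt)

    # Under label shift P(prediction | label) is unchanged, so C @ w = mu_t,
    # where w[j] = P_t(label = j) / P_s(label = j).
    w = np.linalg.solve(C, mu_t)

    # Target accuracy = sum_j w[j] * P_s(prediction = j, label = j).
    acc = float(np.sum(w * np.diag(C)))
    return acc, w
```

When features shift together with labels, as SJS allows, the assumption that P(prediction | label) stays fixed can fail, and this estimate can be biased. That is the gap the abstract says SJS and SEES are meant to address.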

research
03/05/2023

Robustness, Evaluation and Adaptation of Machine Learning Models in the Wild

Our goal is to improve reliability of Machine Learning (ML) systems depl...
research
10/29/2018

Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift

We might hope that when faced with unexpected inputs, well-designed soft...
research
03/03/2023

Unproportional mosaicing

Data shift is a gap between data distribution used for training and data...
research
12/08/2020

Concept Drift and Covariate Shift Detection Ensemble with Lagged Labels

In model serving, having one fixed model during the entire often life-lo...
research
03/29/2023

Sparse joint shift in multinomial classification

Sparse joint shift (SJS) was recently proposed as a tractable model for ...
research
10/19/2022

Towards Explaining Distribution Shifts

A distribution shift can have fundamental consequences such as signaling...
research
11/07/2022

A Semiparametric Efficient Approach To Label Shift Estimation and Quantification

Transfer Learning is an area of statistics and machine learning research...
