Robustness, Evaluation and Adaptation of Machine Learning Models in the Wild

03/05/2023
by Vihari Piratla, et al.

Our goal is to improve the reliability of Machine Learning (ML) systems deployed in the wild. ML models perform exceedingly well when test examples resemble training examples, yet real-world applications require them to perform well on any distribution of test examples, and current ML systems can fail silently on test examples under distribution shift. To improve the reliability of ML models under covariate or domain shift, we propose algorithms that enable models to: (a) generalize to a larger family of test distributions, (b) evaluate accuracy under distribution shifts, and (c) adapt to a target distribution. We study causes of impaired robustness to domain shifts and present algorithms for training domain-robust models. A key source of model brittleness is domain overfitting, which our new training algorithms suppress in favor of domain-general hypotheses. While we improve robustness over standard training methods in certain problem settings, the performance of ML systems can still vary drastically with domain shifts. It is crucial for developers and stakeholders to understand model vulnerabilities and operational ranges of input; these could be assessed on the fly during deployment, but only at great cost. Instead, we advocate proactively estimating accuracy surfaces over any combination of prespecified, interpretable domain shifts for performance forecasting, and we present a label-efficient estimation technique to address evaluation over this combinatorial space of domain shifts. Further, when a model's performance on a target domain is found to be poor, traditional approaches adapt the model using the target domain's resources. Standard adaptation methods assume access to sufficient labeled resources, which may be impractical for deployed models. We therefore initiate a study of lightweight adaptation techniques that use only unlabeled data, with a focus on language applications.
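To make the idea of an accuracy surface over prespecified shifts concrete, here is a minimal illustrative sketch, not the thesis's method: a classifier is trained on clean synthetic data and its accuracy is then measured for every combination of two interpretable shift factors (a mean shift of the inputs and extra feature noise). The toy two-Gaussian data, the choice of shift factors, and the logistic-regression model are all illustrative assumptions.

```python
# Illustrative sketch only: an "accuracy surface" over a grid of prespecified,
# interpretable covariate shifts. The data, shift factors, and model are toy
# assumptions, not the algorithms proposed in the thesis.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

# Synthetic training distribution: two Gaussian classes in 2-D.
X_train = np.vstack([rng.normal(-1.0, 1.0, size=(n, 2)),
                     rng.normal(+1.0, 1.0, size=(n, 2))])
y_train = np.array([0] * n + [1] * n)
model = LogisticRegression().fit(X_train, y_train)

# Prespecified, interpretable shift factors.
mean_shifts = [0.0, 0.5, 1.0, 1.5]   # translate the test inputs
noise_scales = [0.0, 0.5, 1.0]       # add extra feature noise

accuracy_surface = np.zeros((len(mean_shifts), len(noise_scales)))
for i, shift in enumerate(mean_shifts):
    for j, scale in enumerate(noise_scales):
        # Draw a fresh test set from the training distribution, then shift it.
        X_test = np.vstack([rng.normal(-1.0, 1.0, size=(n, 2)),
                            rng.normal(+1.0, 1.0, size=(n, 2))])
        y_test = np.array([0] * n + [1] * n)
        X_shifted = X_test + shift + rng.normal(0.0, scale, size=X_test.shape)
        accuracy_surface[i, j] = model.score(X_shifted, y_test)

print(accuracy_surface)  # rows: mean shift, columns: noise scale
```

In this toy setting accuracy degrades smoothly as the shift factors grow; the thesis abstract argues for forecasting such surfaces proactively and label-efficiently, rather than measuring each cell with fully labeled test sets as done here for illustration.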
