BEDS-Bench: Behavior of EHR-models under Distributional Shift–A Benchmark

07/17/2021
by Anand Avati, et al.

Machine learning has recently demonstrated impressive progress in predictive accuracy across a wide array of tasks. Most ML approaches focus on generalization performance on unseen data that are similar to the training data (In-Distribution, or IND). However, real-world applications and deployments of ML rarely enjoy the comfort of encountering examples that are always IND. In such situations, most ML models commonly display erratic behavior on Out-of-Distribution (OOD) examples, such as assigning high confidence to wrong predictions, or low confidence to correct ones. The implications of such behavior are further exacerbated in the healthcare setting, where patient health can potentially be put at risk. It is crucial to study the behavior and robustness properties of models under distributional shift, understand common failure modes, and take mitigation steps before the model is deployed. Having a benchmark that shines light on these aspects of a model is a first and necessary step in addressing the issue. Recent work on increasing model robustness in OOD settings has focused mostly on the image modality, while the Electronic Health Record (EHR) modality remains largely under-explored. We aim to bridge this gap by releasing BEDS-Bench, a benchmark for quantifying the behavior of ML models over EHR data under OOD settings. We use two open-access, de-identified EHR datasets to construct several OOD data settings to run tests on, and measure relevant metrics that characterize crucial aspects of a model's OOD behavior. We evaluate several learning algorithms under BEDS-Bench and find that all of them generally show poor generalization performance under distributional shift. Our results highlight the need and the potential to improve the robustness of EHR models under distributional shift, and BEDS-Bench provides one way to measure progress toward that goal.
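The evaluation pattern the abstract describes, training on one cohort and then comparing metrics on an in-distribution test split against metrics on a shifted cohort, can be sketched as follows. This is a minimal illustration rather than the BEDS-Bench code itself: the EHR cohorts are replaced with synthetic features, and the metric choices (AUROC and mean predicted confidence) are assumptions, not the benchmark's exact metric set.

```python
# Minimal sketch of an IND-vs-OOD evaluation loop in the spirit of BEDS-Bench.
# NOTE: illustrative only. Real EHR cohorts and the benchmark's actual metrics
# are replaced here with synthetic data and assumed metric choices.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_cohort(n, shift=0.0):
    """Simulate a patient cohort; `shift` moves the feature distribution."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, 20))
    logits = X[:, 0] - 0.5 * X[:, 1]
    y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)
    return X, y

# "In-distribution" training and test cohorts, plus a shifted (OOD) cohort.
X_train, y_train = make_cohort(5000, shift=0.0)
X_ind, y_ind = make_cohort(2000, shift=0.0)
X_ood, y_ood = make_cohort(2000, shift=2.0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

for name, X, y in [("IND", X_ind, y_ind), ("OOD", X_ood, y_ood)]:
    p = model.predict_proba(X)[:, 1]
    auroc = roc_auc_score(y, p)
    confidence = np.maximum(p, 1 - p).mean()  # how sure the model claims to be
    print(f"{name}: AUROC={auroc:.3f}  mean confidence={confidence:.3f}")
```

A gap between the IND and OOD rows, degraded AUROC while the reported confidence stays high, is exactly the kind of failure mode the benchmark is designed to surface.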


