Bayesian analysis of the prevalence bias: learning and predicting from imbalanced data

07/31/2021
by Loic Le Folgoc, et al.

Datasets are rarely a realistic approximation of the target population. For instance, prevalence may be misrepresented, or image quality may be above clinical standards. This mismatch is known as sampling bias. Sampling biases are a major hindrance for machine learning models: they cause significant gaps between model performance in the lab and in the real world. Our work addresses prevalence bias, the discrepancy between the prevalence of a pathology and its sampling rate in the training dataset, introduced when collecting data or when the practitioner rebalances the training batches. This paper lays the theoretical and computational framework for training models, and for prediction, in the presence of prevalence bias. Concretely, a bias-corrected loss function and bias-corrected predictive rules are derived under the principles of Bayesian risk minimization. The loss exhibits a direct connection to the information gain. It offers a principled alternative to heuristic training losses and complements test-time procedures based on selecting an operating point from summary curves. It integrates seamlessly into the current paradigm of (deep) learning using stochastic backpropagation, and naturally with Bayesian models.
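To make the idea of a bias-corrected predictive rule concrete, here is a minimal sketch of the textbook prior-shift correction obtained from Bayes' rule. It is not the loss or the predictive rule derived in the paper, which the abstract does not spell out. It assumes that the class-conditional distribution p(x | y) is the same in the training set and in the target population, so that only the class frequencies differ. The function name `adjust_for_prevalence` and the example numbers are hypothetical.

```python
def adjust_for_prevalence(p_train, train_prev, true_prev):
    """Rescale class probabilities from training prevalence to target prevalence.

    p_train    : p(y=k | x) as output by a model fitted on the biased training set
    train_prev : class frequencies in the training set (assumed known)
    true_prev  : class prevalences in the target population (assumed known)

    Assumption: p(x | y) is unchanged between training and deployment,
    so p_true(y | x) is proportional to p_train(y | x) * true_prev / train_prev.
    """
    if not (len(p_train) == len(train_prev) == len(true_prev)):
        raise ValueError("all inputs must have one entry per class")
    if any(s <= 0 for s in train_prev):
        raise ValueError("training prevalences must be strictly positive")

    # Reweight each class by the ratio of target to training prevalence.
    unnormalised = [p * t / s for p, t, s in zip(p_train, true_prev, train_prev)]
    total = sum(unnormalised)
    if total <= 0:
        raise ValueError("reweighted probabilities sum to zero")

    # Renormalise so that the corrected probabilities sum to one.
    return [u / total for u in unnormalised]


# Hypothetical example: classes are [healthy, disease]. The model was trained
# on rebalanced batches (50/50), while the assumed true prevalence is 1%.
corrected = adjust_for_prevalence(
    p_train=[0.1, 0.9],
    train_prev=[0.5, 0.5],
    true_prev=[0.99, 0.01],
)
print(corrected)  # disease probability drops from 0.9 to roughly 0.083
```

In this hypothetical case, a confident prediction under balanced training corresponds to a probability of about 8% once the assumed 1% prevalence is taken into account, which illustrates the kind of gap between lab and real-world performance that the abstract describes.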

research
06/24/2020

Bayesian Sampling Bias Correction: Training with the Right Loss Function

We derive a family of loss functions to train models in the presence of ...
research
02/07/2023

Enhanced Inference for Finite Population Sampling-Based Prevalence Estimation with Misclassification Errors

Epidemiologic screening programs often make use of tests with small, but...
research
10/25/2018

Between a ROC and a Hard Place: Using prevalence plots to understand the likely real world performance of biomarkers in the clinic

The Receiver Operating Characteristic (ROC) curve and the Area Under the...
research
06/10/2022

Active information, missing data and prevalence estimation

The topic of this paper is prevalence estimation from the perspective of...
research
02/13/2023

Provable Detection of Propagating Sampling Bias in Prediction Models

With an increased focus on incorporating fairness in machine learning mo...
research
03/29/2023

Problems and shortcuts in deep learning for screening mammography

This work reveals undiscovered challenges in the performance and general...
research
03/22/2023

Deployment of Image Analysis Algorithms under Prevalence Shifts

Domain gaps are among the most relevant roadblocks in the clinical trans...
