Estimating Model Performance on External Samples from Their Limited Statistical Characteristics

02/28/2022
by Tal El-Hay, et al.

Methods that address data shifts usually assume full access to multiple datasets. In the healthcare domain, however, privacy-preserving regulations and commercial interests limit data availability, so researchers can typically study only a small number of datasets. In contrast, limited statistical characteristics of specific patient samples are much easier to share and may be available from previously published literature or focused collaborative efforts. Here, we propose a method that estimates model performance in external samples from their limited statistical characteristics. We search for weights on the internal sample that induce internal statistics similar to the external ones and that are as close to uniform as possible. We then use model performance on the weighted internal sample as an estimate of its external counterpart. We evaluate the proposed algorithm on simulated data as well as on electronic medical record data for two risk models: one predicting complications in ulcerative colitis patients and one predicting stroke in women diagnosed with atrial fibrillation. In the vast majority of cases, the estimated external performance is much closer to the actual external performance than the internal performance is. Our proposed method may be an important building block in training robust models and detecting potential model failures in external environments.
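
The reweighting idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact objective or the statistics used, so everything here is an assumption made for illustration: the only shared statistics are feature means, "closest to uniform" is read as minimal KL divergence from uniform weights (which gives weights proportional to exp(lam · x)), features are scaled to [0, 1] so a fixed gradient step is stable, and the performance measure is accuracy. The names `tilt_weights`, `weighted_means`, and `fit_weights` and the toy data are hypothetical, not taken from the paper.

```python
import math

def tilt_weights(features, lam):
    """Normalized weights proportional to exp(lam . x_i) for each internal row."""
    scores = [sum(l * x for l, x in zip(lam, row)) for row in features]
    top = max(scores)  # subtract the max score for numerical stability
    unnorm = [math.exp(s - top) for s in scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def weighted_means(features, weights):
    """Weighted mean of every feature column."""
    d = len(features[0])
    return [sum(w * row[j] for w, row in zip(weights, features)) for j in range(d)]

def fit_weights(features, external_means, steps=2000, lr=1.0):
    """Gradient descent on the convex dual log(sum_i exp(lam . x_i)) - lam . mu.

    The gradient is (weighted internal means - external means), so at the
    optimum the weighted internal statistics match the external ones.
    Assumption: features lie in [0, 1], which keeps lr=1.0 stable here.
    """
    lam = [0.0] * len(external_means)
    for _ in range(steps):
        w = tilt_weights(features, lam)
        grad = [m - t for m, t in zip(weighted_means(features, w), external_means)]
        lam = [l - lr * g for l, g in zip(lam, grad)]
    return tilt_weights(features, lam)

# Hypothetical internal sample: one binary feature; 25% of patients have it.
features = [[0], [0], [0], [0], [0], [0], [1], [1]]
# 1 where the model's prediction was correct, 0 where it was wrong.
correct = [1, 1, 1, 1, 1, 1, 1, 0]
# Hypothetical published external statistic: 50% of patients have the feature.
external_means = [0.5]

weights = fit_weights(features, external_means)
internal = sum(correct) / len(correct)
estimated = sum(w * c for w, c in zip(weights, correct))

print(internal)             # 0.875
print(round(estimated, 4))  # 0.75
```

In this toy case the model is weaker on the subgroup that is more common externally, so the reweighted estimate (0.75) is lower than the internal accuracy (0.875). Matching richer statistics, or using a different distance from uniform, would change the optimization but not the overall structure.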

research
04/07/2021

Librispeech Transducer Model with Internal Language Model Prior Correction

We present our transducer model on Librispeech. We study variants to inc...
research
10/20/2020

An ensemble meta-prediction framework to integrate multiple external models into a current study

Disease risk prediction models are used throughout clinical biomedicine....
research
07/11/2020

Generalization of Deep Convolutional Neural Networks – A Case-study on Open-source Chest Radiographs

Deep Convolutional Neural Networks (DCNNs) have attracted extensive atte...
research
10/01/2022

Paradoxes and resolutions for semiparametric fusion of individual and summary data

Suppose we have available individual data from an internal study and var...
research
06/15/2020

Privacy-Preserving Technology to Help Millions of People: Federated Prediction Model for Stroke Prevention

Prevention of stroke with its associated risk factors has been one of th...
research
06/12/2021

Regression inference for multiple populations by integrating summary-level data using stacked imputations

There is a growing need for flexible general frameworks that integrate i...
research
05/22/2023

Evaluating Model Performance in Medical Datasets Over Time

Machine learning (ML) models deployed in healthcare systems must face da...
