Generalizability of Machine Learning Models: Quantitative Evaluation of Three Methodological Pitfalls

02/01/2022
by Farhad Maleki et al.

Despite the great potential of machine learning, a lack of generalizability has hindered the widespread adoption of these technologies in routine clinical practice. We investigate three methodological pitfalls: (1) violation of the independence assumption, (2) model evaluation with an inappropriate performance indicator, and (3) batch effect, and we examine how these pitfalls can affect the generalizability of machine learning models. We implement random forest and deep convolutional neural network models using several medical imaging datasets, including head and neck CT, lung CT, chest X-ray, and histopathological images, to quantify and illustrate the effect of these pitfalls. We develop these models with and without each pitfall and compare the performance of the resulting models in terms of accuracy, precision, recall, and F1 score. Our results show that violation of the independence assumption can substantially affect model generalizability. More specifically, (I) applying oversampling before splitting data into training, validation, and test sets; (II) performing data augmentation before splitting data; (III) distributing data points for a subject across training, validation, and test sets; and (IV) applying feature selection before splitting data all led to spurious boosts in model performance. We also observed that inappropriate performance indicators can lead to erroneous conclusions, and that batch effect can result in models that lack generalizability. These methodological pitfalls produce machine learning models with over-optimistic performance estimates. Such errors cannot be detected by internal model evaluation, and the resulting inaccurate predictions may lead to wrong conclusions and interpretations. Avoiding these pitfalls is therefore a necessary condition for developing generalizable models.
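As a concrete illustration of pitfall (I), the sketch below oversamples an imbalanced toy dataset before versus after the train/test split. The dataset, class ratios, the `oversample` helper, and the random forest model are all illustrative assumptions, not the paper's actual pipeline; the point is only that oversampling before splitting places duplicated minority samples in both training and test sets, inflating the evaluation score.

```python
# Minimal sketch of pitfall (I): oversampling before vs. after the split.
# All data and names here are hypothetical, for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

# Imbalanced toy dataset: 900 negatives, 100 positives, weak class signal.
X = rng.normal(size=(1000, 20))
y = np.array([0] * 900 + [1] * 100)
X[y == 1] += 0.3

def oversample(X, y, rng):
    """Duplicate minority-class rows until the classes are balanced."""
    pos = np.where(y == 1)[0]
    extra = rng.choice(pos, size=(y == 0).sum() - pos.size, replace=True)
    idx = np.concatenate([np.arange(y.size), extra])
    return X[idx], y[idx]

# Pitfall: oversample first, then split. Duplicates of the same minority
# samples land in both train and test sets, so the test set is no longer
# independent of the training data and the score is inflated.
Xo, yo = oversample(X, y, rng)
Xtr, Xte, ytr, yte = train_test_split(Xo, yo, test_size=0.3,
                                      random_state=0, stratify=yo)
leaky = RandomForestClassifier(random_state=0).fit(Xtr, ytr)
print("oversample-then-split F1:", f1_score(yte, leaky.predict(Xte)))

# Correct order: split first, then oversample only the training set.
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3,
                                      random_state=0, stratify=y)
Xtr_o, ytr_o = oversample(Xtr, ytr, rng)
clean = RandomForestClassifier(random_state=0).fit(Xtr_o, ytr_o)
print("split-then-oversample F1:", f1_score(yte, clean.predict(Xte)))
```

The same ordering principle applies to pitfalls (II) and (IV): data augmentation and feature selection must likewise be fit on the training split only.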
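Pitfall (2) can be illustrated in the same spirit: on imbalanced data, a degenerate classifier that always predicts the majority class scores high on accuracy while missing every positive case. A minimal, hypothetical example, assuming scikit-learn metrics:

```python
# Minimal sketch of pitfall (2): accuracy as a misleading indicator on
# imbalanced data. The class ratio is illustrative.
from sklearn.metrics import accuracy_score, f1_score, recall_score

# 95 negatives, 5 positives; a "model" that always predicts negative.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

# zero_division=0 avoids a warning when no positives are predicted.
print("accuracy:", accuracy_score(y_true, y_pred))               # 0.95
print("recall:  ", recall_score(y_true, y_pred, zero_division=0))  # 0.0
print("F1:      ", f1_score(y_true, y_pred, zero_division=0))      # 0.0
```

Accuracy alone suggests a strong model; recall and F1 expose that it detects no positive cases, which is why the paper evaluates all four indicators.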


Related research

03/18/2021
CheXbreak: Misclassification Identification for Deep Learning Models Interpreting Chest X-rays
A major obstacle to the integration of deep learning models for chest x-...

12/28/2022
Evaluating Generalizability of Deep Learning Models Using Indian-COVID-19 CT Dataset
Computed tomography (CT) has been routinely used for the diagnosis of l...

02/09/2021
A Real-World Demonstration of Machine Learning Generalizability: Intracranial Hemorrhage Detection on Head CT
Machine learning (ML) holds great promise in transforming healthcare. Wh...

04/30/2020
Intra-model Variability in COVID-19 Classification Using Chest X-ray Images
X-ray and computed tomography (CT) scanning technologies for COVID-19 sc...

03/20/2023
Integration of Radiomics and Tumor Biomarkers in Interpretable Machine Learning Models
Despite the unprecedented performance of deep neural networks (DNNs) in ...

03/06/2021
Fibrosis-Net: A Tailored Deep Convolutional Neural Network Design for Prediction of Pulmonary Fibrosis Progression from Chest CT Images
Pulmonary fibrosis is a devastating chronic lung disease that causes irr...

09/15/2022
Avoiding Biased Clinical Machine Learning Model Performance Estimates in the Presence of Label Selection
When evaluating the performance of clinical machine learning models, one...
