The pitfalls of using open data to develop deep learning solutions for COVID-19 detection in chest X-rays

09/14/2021
by Rachael Harkness, et al.

Since the emergence of COVID-19, deep learning models have been developed to identify COVID-19 from chest X-rays. With little to no direct access to hospital data, the AI community relies heavily on public data drawn from numerous sources. Reported model performance has been exceptional when training and testing on open-source data, surpassing the reported capabilities of AI in pneumonia detection prior to the COVID-19 outbreak. In this study, impactful models are trained on a widely used open-source dataset and tested on both an external test set and a hospital dataset, for the task of classifying chest X-rays into one of three classes: COVID-19, non-COVID pneumonia and no-pneumonia. Classification performance of the models investigated is evaluated through ROC curves, confusion matrices and standard classification metrics. Explainability modules are implemented to explore the image features most important to classification. Data analysis and model evaluations show that the popular open-source dataset COVIDx is not representative of the real clinical problem and that results from testing on it are inflated. Dependence on open-source data can leave models vulnerable to bias and confounding variables, so careful analysis is required to develop clinically useful and viable AI tools for COVID-19 detection in chest X-rays.
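To make the evaluation described above concrete, the following is a minimal sketch of how a confusion matrix and per-class precision and recall might be computed for the three classes named in the abstract. It is not the authors' code: the function names `confusion_matrix` and `per_class_metrics` are hypothetical, and the labels are toy values invented for illustration, not data from the study.

```python
from collections import Counter

# The three classes named in the abstract.
CLASSES = ["COVID-19", "non-COVID pneumonia", "no-pneumonia"]


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Build a matrix whose rows are true classes and columns are predictions."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in classes] for t in classes]


def per_class_metrics(matrix, classes=CLASSES):
    """Compute one-vs-rest precision and recall (sensitivity) for each class."""
    metrics = {}
    for i, name in enumerate(classes):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                  # true class i, predicted otherwise
        fp = sum(row[i] for row in matrix) - tp   # predicted class i, true otherwise
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        metrics[name] = {"precision": precision, "recall": recall}
    return metrics


# Toy labels, invented purely for illustration (not results from the study).
y_true = ["COVID-19", "COVID-19", "non-COVID pneumonia",
          "no-pneumonia", "no-pneumonia", "non-COVID pneumonia"]
y_pred = ["COVID-19", "non-COVID pneumonia", "non-COVID pneumonia",
          "no-pneumonia", "COVID-19", "non-COVID pneumonia"]

matrix = confusion_matrix(y_true, y_pred)
metrics = per_class_metrics(matrix)

for name, values in metrics.items():
    print(f"{name}: precision={values['precision']:.2f}, recall={values['recall']:.2f}")
```

Running the same functions separately on an internal test split and on an external or hospital test set is one simple way to surface the kind of performance gap the abstract warns about.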

