On the Composition and Limitations of Publicly Available COVID-19 X-Ray Imaging Datasets

Machine learning based methods for diagnosis and progression prediction of COVID-19 from imaging data have gained significant attention in recent months, in particular through the use of deep learning models. In this context, hundreds of models were proposed, the majority of them trained on public datasets. Data scarcity, mismatch between training and target population, group imbalance, and lack of documentation are important sources of bias, hindering the applicability of these models to real-world clinical practice. Considering that datasets are an essential part of model building and evaluation, a deeper understanding of the current landscape is needed. This paper presents an overview of the currently publicly available COVID-19 chest X-ray datasets. Each dataset is briefly described, and potential strengths, limitations, and interactions between datasets are identified. In particular, some key properties of current datasets that could be potential sources of bias, impairing models trained on them, are pointed out. These descriptions are useful for model building on those datasets: to choose the best dataset according to the model goal, to take into account the specific limitations so as to avoid reporting overconfident benchmark results, and to discuss their impact on the generalisation capabilities in a specific clinical setting.


