Improving Semi-supervised Deep Learning by using Automatic Thresholding to Deal with Out of Distribution Data for COVID-19 Detection using Chest X-ray Images

11/03/2022
by Isaac Benavides-Mata, et al.

Semi-supervised learning (SSL) trains models on both labeled and unlabeled data, which is useful when labeled data is scarce and unlabeled data is plentiful. Because unlabeled data is usually easier to obtain, it is used to improve a model's generalization when few labels are available. In real-world settings, however, the unlabeled data may follow a different distribution than the labeled data, a problem known as distribution mismatch. It typically arises when the unlabeled data comes from a different source than the labeled data. In medical imaging, for instance, a COVID-19 detector trained on chest X-ray images may draw on unlabeled datasets sampled from different hospitals. In this work, we propose an automatic thresholding method to filter out-of-distribution data from the unlabeled dataset. We score each unlabeled observation using the Mahalanobis distance between the labeled and unlabeled datasets, computed in the feature space of a feature extractor (FE) pre-trained on ImageNet. We test two simple automatic thresholding methods in the context of training a COVID-19 detector using chest X-ray images. The tested methods provide an automatic way to decide which unlabeled data to keep when training a semi-supervised deep learning architecture.
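
The scoring-and-filtering idea in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: synthetic Gaussian vectors stand in for the features an ImageNet pre-trained extractor would produce, the helper names `mahalanobis_scores` and `otsu_threshold` are hypothetical, and Otsu's method is used only as one common automatic thresholding technique. The abstract does not name the two methods the authors tested, so this should not be read as their implementation.

```python
import numpy as np


def mahalanobis_scores(labeled_feats, unlabeled_feats, eps=1e-6):
    """Mahalanobis distance from each unlabeled feature vector to the
    labeled feature distribution (mean and covariance of labeled_feats)."""
    mean = labeled_feats.mean(axis=0)
    cov = np.cov(labeled_feats, rowvar=False)
    cov = cov + eps * np.eye(cov.shape[0])  # small ridge keeps the inverse stable
    cov_inv = np.linalg.inv(cov)
    diff = unlabeled_feats - mean
    squared = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(squared, 0.0))


def otsu_threshold(scores, bins=64):
    """Otsu's method on a 1-D score histogram: pick the cut that maximizes
    the between-class variance of the two resulting groups."""
    hist, edges = np.histogram(scores, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(hist)                 # number of scores at or below each bin
    w1 = w0[-1] - w0                     # number of scores above each bin
    cum = np.cumsum(hist * centers)
    valid = (w0 > 0) & (w1 > 0)
    m0 = cum / np.maximum(w0, 1)         # mean of the lower group
    m1 = (cum[-1] - cum) / np.maximum(w1, 1)  # mean of the upper group
    between = np.where(valid, w0 * w1 * (m0 - m1) ** 2, -1.0)
    k = int(np.argmax(between))
    return edges[k + 1]                  # upper edge of the last "lower" bin


# Synthetic stand-ins for extracted features (assumption: 8-dimensional).
rng = np.random.default_rng(0)
labeled = rng.normal(0.0, 1.0, size=(200, 8))
unlabeled_in = rng.normal(0.0, 1.0, size=(150, 8))   # same source as labeled
unlabeled_out = rng.normal(6.0, 1.0, size=(50, 8))   # shifted source (OOD)
unlabeled = np.vstack([unlabeled_in, unlabeled_out])

scores = mahalanobis_scores(labeled, unlabeled)
threshold = otsu_threshold(scores)
keep = scores <= threshold  # unlabeled observations retained for SSL training

print("threshold:", float(threshold))
print("kept in-distribution fraction:", float(keep[:150].mean()))
print("kept out-of-distribution fraction:", float(keep[150:].mean()))
```

In a real pipeline the synthetic arrays would be replaced by feature vectors extracted from the labeled and unlabeled chest X-ray images, and only the rows where `keep` is true would be passed on as unlabeled training data.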


