Community Detection in Medical Image Datasets: Using Wavelets and Spectral Methods

by Roozbeh Yousefzadeh, et al.

Medical image datasets can contain a large number of images representing patients with different health conditions and varying disease severity. When dealing with raw, unlabeled image datasets, the sheer number of samples often makes it hard for experts and non-experts alike to understand the variety of images present. Supervised learning methods rely on labeled images, which require considerable effort from medical experts, who must first understand the communities of images present in the data and then label the images. Here, we propose an algorithm that facilitates the automatic identification of communities in medical image datasets. We further explain that such analysis can also be insightful in a supervised setting, when the images are already labeled. These insights are useful because, in reality, health and disease severity can be considered a continuous spectrum, and within each class there are usually finer communities worth investigating, especially when they resemble communities in other classes. In our approach, we use wavelet decomposition of images in tandem with spectral methods. We show that the eigenvalues of a graph Laplacian can reveal the number of notable communities in an image dataset. In our experiments, we use a dataset of images labeled with different conditions for COVID patients. We detect 25 communities in the dataset and then observe that only 6 of those communities contain patients with pneumonia. We also investigate the contents of a colorectal cancer histopathology dataset.
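The abstract does not spell out the exact pipeline, so the following is only a minimal sketch of the general idea of reading a community count off a graph Laplacian's eigenvalues, not the authors' implementation. Every concrete choice here is an assumption: a hand-rolled 2-D Haar transform that keeps only the low-frequency band as the wavelet features, a Gaussian-weighted k-nearest-neighbour similarity graph, the symmetric normalized Laplacian, and the standard eigengap heuristic. The helper names (`haar_lowpass`, `knn_affinity`, `laplacian_spectrum`, `estimate_num_communities`, `make_toy_features`), the parameters `levels`, `k`, and `max_k`, and the synthetic toy images are all hypothetical.

```python
import numpy as np


def haar_lowpass(img: np.ndarray, levels: int = 2) -> np.ndarray:
    """Low-frequency (LL) band of an orthonormal 2-D Haar decomposition.

    Assumption: both image sides are divisible by 2**levels.
    """
    a = np.asarray(img, dtype=float)
    for _ in range(levels):
        # Each LL coefficient is the sum of a 2x2 block divided by 2.
        a = (a[0::2, 0::2] + a[0::2, 1::2] + a[1::2, 0::2] + a[1::2, 1::2]) / 2.0
    return a


def knn_affinity(X: np.ndarray, k: int = 10) -> np.ndarray:
    """Symmetric Gaussian-weighted k-nearest-neighbour affinity matrix (assumed graph)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # squared distances
    np.fill_diagonal(d2, np.inf)  # never pick a point as its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :k]
    rows = np.repeat(np.arange(len(X)), k)
    cols = nbrs.ravel()
    sigma2 = np.median(d2[rows, cols])  # bandwidth: median squared neighbour distance
    W = np.zeros_like(d2)
    W[rows, cols] = np.exp(-d2[rows, cols] / (2.0 * sigma2))
    return np.maximum(W, W.T)  # keep an edge if either endpoint chose it


def laplacian_spectrum(W: np.ndarray) -> np.ndarray:
    """Ascending eigenvalues of the normalized Laplacian L = I - D^{-1/2} W D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return np.linalg.eigvalsh(L)


def estimate_num_communities(eigvals: np.ndarray, max_k: int = 10) -> int:
    """Eigengap heuristic: position of the largest jump among the smallest eigenvalues."""
    gaps = np.diff(eigvals[: max_k + 1])
    return int(np.argmax(gaps)) + 1


def make_toy_features(n_per_class: int = 20, seed: int = 0) -> np.ndarray:
    """Made-up toy data: noisy copies of three 16x16 prototype 'images'."""
    rng = np.random.default_rng(seed)
    protos = [np.zeros((16, 16)) for _ in range(3)]
    protos[0][:8, :] = 1.0        # bright top half
    protos[1][:, :8] = 1.0        # bright left half
    protos[2][4:12, 4:12] = 1.0   # bright centre square
    images = [p + 0.1 * rng.standard_normal(p.shape)
              for p in protos for _ in range(n_per_class)]
    return np.stack([haar_lowpass(im).ravel() for im in images])


if __name__ == "__main__":
    eigvals = laplacian_spectrum(knn_affinity(make_toy_features(), k=10))
    print(estimate_num_communities(eigvals))
```

On this toy data the k-NN graph has no edges between the three groups and each group is connected, so the Laplacian has three (numerically) zero eigenvalues followed by a clear jump, and the script prints 3. On real images the gap is rarely this clean; the feature set, graph construction, and bandwidth used in the paper may differ from these assumed choices.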

