Dealing with Small Annotated Datasets for Deep Learning in Medical Imaging: An Evaluation of Self-Supervised Pre-Training on CT Scans Comparing Contrastive and Masked Autoencoder Methods

08/12/2023
by Daniel Wolf, et al.

Deep learning in medical imaging has the potential to minimize the risk of diagnostic errors, reduce radiologist workload, and accelerate diagnosis. Training such deep learning models requires large and accurate datasets, with annotations for all training samples. However, in the medical imaging domain, annotated datasets for specific tasks are often small due to the high complexity of annotations, limited access, or the rarity of diseases. To address this challenge, deep learning models can be pre-trained on large image datasets without annotations using methods from the field of self-supervised learning. After pre-training, small annotated datasets are sufficient to fine-tune the models for a specific task, the so-called "downstream task". The most popular self-supervised pre-training approaches in medical imaging are based on contrastive learning. However, recent studies in natural image processing indicate a strong potential for masked autoencoder approaches. Our work compares state-of-the-art contrastive learning methods with the recently introduced masked autoencoder approach "SparK" for convolutional neural networks (CNNs) on medical images. To this end, we pre-train on a large unannotated CT image dataset and fine-tune on several downstream CT classification tasks. Because obtaining sufficient annotated training data is a central challenge in the medical imaging domain, it is of particular interest how the self-supervised pre-training methods perform when the downstream datasets are small. By gradually reducing the training set size of our downstream tasks, we find that the reduction affects the methods differently depending on the type of pre-training chosen: SparK pre-training is more robust to small training set sizes than the contrastive methods. Based on our results, we recommend SparK pre-training for medical downstream tasks with small annotated datasets.
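For illustration only, the PyTorch sketch below outlines the pre-train-then-fine-tune workflow the abstract describes: a CNN encoder is first trained on unannotated CT slices with a masked-image-modeling objective (reconstructing randomly masked patches), and the same encoder is then reused with a small classification head for a downstream task. Every name, layer size, patch size, and masking ratio here is a hypothetical stand-in chosen for readability; this is not the authors' SparK implementation nor their contrastive baselines.

```python
# Hypothetical sketch: masked-image-modeling pre-training of a CNN encoder on
# unannotated CT slices, then reuse of the encoder for a small downstream
# classification task. Architecture, patch size, and masking ratio are
# illustrative assumptions, not the SparK method itself.

import torch
import torch.nn as nn
import torch.nn.functional as F

PATCH = 16  # side length of the square regions that get masked (assumption)


class CNNEncoder(nn.Module):
    """Small convolutional encoder standing in for the CNN backbone."""

    def __init__(self, in_ch: int = 1, dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)  # (B, dim, H/8, W/8)


class Decoder(nn.Module):
    """Lightweight decoder that reconstructs the input from encoder features."""

    def __init__(self, dim: int = 128, out_ch: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(dim, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, out_ch, 4, stride=2, padding=1),
        )

    def forward(self, z):
        return self.net(z)


def random_patch_mask(x, mask_ratio: float = 0.6):
    """Zero out a random subset of PATCH x PATCH regions.

    Returns the masked input and an indicator map of the masked regions.
    """
    b, _, h, w = x.shape
    gh, gw = h // PATCH, w // PATCH
    keep = torch.rand(b, 1, gh, gw, device=x.device) > mask_ratio  # True = visible
    visible = keep.float().repeat_interleave(PATCH, 2).repeat_interleave(PATCH, 3)
    return x * visible, 1.0 - visible


def pretrain_step(encoder, decoder, optimizer, ct_batch):
    """One self-supervised step: reconstruct only the masked regions (MAE-style loss)."""
    masked_input, masked_region = random_patch_mask(ct_batch)
    recon = decoder(encoder(masked_input))
    per_pixel = F.mse_loss(recon, ct_batch, reduction="none")
    loss = (per_pixel * masked_region).sum() / masked_region.sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


def build_finetune_model(encoder, num_classes: int):
    """Attach a small classification head to the pre-trained encoder; only a
    small annotated downstream dataset is then needed for fine-tuning."""
    return nn.Sequential(encoder, nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(128, num_classes))
```

Fine-tuning the returned model with a standard cross-entropy loss on the small annotated downstream set completes the pipeline; the question studied in the paper is which pre-training objective, contrastive or masked autoencoding, leaves the encoder in a better state for that final step when the downstream dataset shrinks.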

