Synthetic Augmentation with Large-scale Unconditional Pre-training

08/08/2023
by Jiarong Ye, et al.

Deep learning-based medical image recognition systems often require a substantial amount of training data with expert annotations, which can be expensive and time-consuming to obtain. Recently, synthetic augmentation techniques have been proposed to mitigate this issue by generating realistic images conditioned on class labels. However, the effectiveness of these methods depends heavily on the representation capability of the trained generative model, which cannot be guaranteed without sufficient labeled training data. To further reduce the dependency on annotated data, we propose a synthetic augmentation method called HistoDiffusion, which can be pre-trained on large-scale unlabeled datasets and later applied to a small-scale labeled dataset for augmented training. In particular, we train a latent diffusion model (LDM) on diverse unlabeled datasets to learn common features and generate realistic images without conditional inputs. Then, we fine-tune the model with classifier guidance in latent space on an unseen labeled dataset so that the model can synthesize images of specific categories. Additionally, we adopt a selective mechanism that adds only those synthetic samples that match their target labels with high confidence. We evaluate our proposed method by pre-training on three histopathology datasets and testing on a histopathology dataset of colorectal cancer (CRC) excluded from the pre-training datasets. With HistoDiffusion augmentation, the classification accuracy of a backbone classifier is remarkably improved by 6.4% using a small set of the original labels. Our code is available at https://github.com/karenyyy/HistoDiffAug.
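
The selective mechanism mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_confident_samples`, the fixed `threshold`, and the use of a classifier's per-class probabilities as the confidence score are all assumptions made for illustration, and the paper's actual selection criterion may differ.

```python
from typing import List, Sequence


def select_confident_samples(
    probs: Sequence[Sequence[float]],
    target_labels: Sequence[int],
    threshold: float = 0.9,
) -> List[int]:
    """Return indices of synthetic samples to keep.

    probs[i][c] is a (hypothetical) classifier's probability that
    synthetic image i belongs to class c; target_labels[i] is the class
    the generator was asked to produce for image i.
    """
    if len(probs) != len(target_labels):
        raise ValueError("probs and target_labels must have the same length")
    kept = []
    for i, (p, y) in enumerate(zip(probs, target_labels)):
        # Keep the sample only if the classifier agrees with the
        # intended label with probability at least `threshold`.
        if p[y] >= threshold:
            kept.append(i)
    return kept


# Toy example: three synthetic images, two classes.
probs = [[0.95, 0.05], [0.40, 0.60], [0.08, 0.92]]
targets = [0, 1, 1]
kept = select_confident_samples(probs, targets, threshold=0.9)
print(kept)  # [0, 2]
```

In this toy run the second image is discarded because the classifier assigns only 0.60 to its intended class; only the retained images would be added to the training set.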

research
11/10/2021

Selective Synthetic Augmentation with HistoGAN for Improved Histopathology Image Classification

Histopathological analysis is the present gold standard for precancerous...
research
03/28/2023

Unsupervised Pre-Training For Data-Efficient Text-to-Speech On Low Resource Languages

Neural text-to-speech (TTS) models can synthesize natural human speech w...
research
11/25/2022

Expanding Small-Scale Datasets with Guided Imagination

The power of Deep Neural Networks (DNNs) depends heavily on the training...
research
08/26/2020

Synthetic Sample Selection via Reinforcement Learning

Synthesizing realistic medical images provides a feasible solution to th...
research
08/28/2023

LatentDR: Improving Model Generalization Through Sample-Aware Latent Degradation and Restoration

Despite significant advances in deep learning, models often struggle to ...
research
07/15/2022

Towards Better Dermoscopic Image Feature Representation Learning for Melanoma Classification

Deep learning-based melanoma classification with dermoscopic images has ...
research
06/29/2022

CLTS-GAN: Color-Lighting-Texture-Specular Reflection Augmentation for Colonoscopy

Automated analysis of optical colonoscopy (OC) video frames (to assist e...
