Diffusion-based Data Augmentation for Skin Disease Classification: Impact Across Original Medical Datasets to Fully Synthetic Images

by Mohamed Akrout, et al.

Despite continued advancement in recent years, deep neural networks still rely on large amounts of training data to avoid overfitting. However, labeled training data for real-world applications such as healthcare is limited and difficult to access given longstanding privacy concerns and strict data-sharing policies. By manipulating image datasets in the pixel or feature space, existing data augmentation techniques are an effective way to improve the quantity and diversity of training data. Here, we advance augmentation techniques by building on the emerging success of text-to-image diffusion probabilistic models to augment the training samples of our macroscopic skin disease dataset, using input text prompts for fine-grained control of the image generation process. We demonstrate that with this generative data augmentation approach, the visual classifier maintains similar classification accuracy even when trained on a fully synthetic skin disease dataset. In line with recent applications of generative models, our study suggests that diffusion models are effective at generating high-quality skin images that do not sacrifice classifier performance and that, after curation, can improve the augmentation of training datasets.



Data Augmentation for Environmental Sound Classification Using Diffusion Probabilistic Model with Top-k Selection Discriminator

Despite consistent advancement in powerful deep learning techniques in r...

Generating High-Quality Surface Realizations Using Data Augmentation and Factored Sequence Models

This work presents a new state of the art in reconstruction of surface r...

Improving dermatology classifiers across populations using images generated by large diffusion models

Dermatological classification algorithms developed without sufficiently ...

Augmenting medical image classifiers with synthetic data from latent diffusion models

While hundreds of artificial intelligence (AI) algorithms are now approv...

Understanding data augmentation for classification: when to warp?

In this paper we investigate the benefit of augmenting data with synthet...

Dialog State Tracking with Reinforced Data Augmentation

Neural dialog state trackers are generally limited due to the lack of qu...

Training on Thin Air: Improve Image Classification with Generated Data

Acquiring high-quality data for training discriminative models is a cruc...
