Augmenting medical image classifiers with synthetic data from latent diffusion models

08/23/2023
by   Luke W. Sagers, et al.

While hundreds of artificial intelligence (AI) algorithms are now approved or cleared by the US Food and Drug Administration (FDA), many studies have shown inconsistent generalization or latent bias, particularly for underrepresented populations. Some have proposed that generative AI could reduce the need for real data, but its utility in model development remains unclear. Skin disease serves as a useful case study in synthetic image generation due to the diversity of disease appearance, particularly across the protected attribute of skin tone. Here we show that latent diffusion models can scalably generate images of skin disease and that augmenting model training with these data improves performance in data-limited settings. These performance gains saturate at synthetic-to-real image ratios above 10:1 and are substantially smaller than the gains obtained from adding real images. As part of our analysis, we generate and analyze a new dataset of 458,920 synthetic images produced using several generation strategies. Our results suggest that synthetic data could serve as a force multiplier for model development, but the collection of diverse real-world data remains the most important step to improve medical AI algorithms.
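To make the synthetic-to-real ratio concrete, the following is a minimal sketch of how a training set might be assembled at a chosen ratio. It is not the authors' pipeline: the function name `mix_training_set`, the file names, and the sampling scheme are all hypothetical, chosen only to illustrate what a 10:1 ratio means in practice.

```python
import random

def mix_training_set(real, synthetic, ratio, seed=0):
    """Combine all real items with up to ratio * len(real) synthetic items.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    # Cap the synthetic count at what is actually available.
    n_synth = min(len(synthetic), int(ratio * len(real)))
    chosen = rng.sample(synthetic, n_synth)
    # Tag each item with its source so it can be tracked during evaluation.
    mixed = [(x, "real") for x in real] + [(x, "synthetic") for x in chosen]
    rng.shuffle(mixed)
    return mixed

# Placeholder file names (assumed), 5 real images and a pool of 100 synthetic ones.
real = [f"real_{i}.png" for i in range(5)]
synthetic = [f"synth_{i}.png" for i in range(100)]

mixed = mix_training_set(real, synthetic, ratio=10)
n_synthetic = sum(1 for _, src in mixed if src == "synthetic")
```

With 5 real images and a ratio of 10, the mixed set holds 50 synthetic images alongside the 5 real ones. The abstract reports that gains saturate above this ratio, so raising `ratio` further would be expected to add training cost with little additional benefit.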

research
11/23/2022

Improving dermatology classifiers across populations using images generated by large diffusion models

Dermatological classification algorithms developed without sufficiently ...
research
01/12/2023

Diffusion-based Data Augmentation for Skin Disease Classification: Impact Across Original Medical Datasets to Fully Synthetic Images

Despite continued advancement in recent years, deep neural networks stil...
research
05/24/2023

Generating Faithful Synthetic Data with Large Language Models: A Case Study in Computational Social Science

Large Language Models (LLMs) have democratized synthetic data generation...
research
11/15/2021

Disparities in Dermatology AI: Assessments Using Diverse Clinical Images

More than 3 billion people lack access to care for skin disease. AI diag...
research
11/20/2019

DermGAN: Synthetic Generation of Clinical Skin Images with Pathology

Despite the recent success in applying supervised deep learning to medic...
research
03/24/2023

CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images

Recent technological advances in synthetic data have enabled the generat...
research
02/01/2023

SkinCon: A skin disease dataset densely annotated by domain experts for fine-grained model debugging and analysis

For the deployment of artificial intelligence (AI) in high-risk settings...
