Ambient Diffusion: Learning Clean Distributions from Corrupted Data

by Giannis Daras, et al.

We present the first diffusion-based framework that can learn an unknown distribution using only highly corrupted samples. This problem arises in scientific applications where uncorrupted samples are impossible or expensive to acquire. Another benefit of our approach is the ability to train generative models that are less likely to memorize individual training samples, since they never observe clean training data. Our main idea is to introduce additional measurement distortion during the diffusion process and require the model to predict the original corrupted image from the further corrupted image. We prove that our method leads to models that learn the conditional expectation of the full uncorrupted image given this additional measurement corruption. This holds for any corruption process that satisfies some technical conditions (in particular, it covers inpainting and compressed sensing). We train models on standard benchmarks (CelebA, CIFAR-10 and AFHQ) and show that we can learn the distribution even when all the training samples have 90% of their pixels missing. We also show that we can fine-tune foundation models on small corrupted datasets (e.g. MRI scans with block corruptions) and learn the clean distribution without memorizing the training set.
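The main idea in the abstract can be made concrete for the inpainting case. The following is a minimal sketch rather than the authors' implementation, and everything in it is an assumption made for illustration: images are flat lists of pixel values, the corruption is a binary mask, the extra corruption drops each surviving pixel with probability `delta`, and `predict` stands in for a hypothetical model. The names `further_corrupt` and `ambient_loss` do not come from the paper.

```python
import random


def further_corrupt(mask, delta, rng):
    """Drop each still-observed pixel with probability delta (hypothetical helper)."""
    return [m if rng.random() >= delta else 0 for m in mask]


def ambient_loss(predict, observed, mask, sigma, delta, rng):
    """Mean squared error, counted only on the originally observed pixels.

    observed: the corrupted training sample (mask already applied); the
              full clean image is never available to this function.
    predict:  hypothetical model, called as predict(inputs, mask_tilde, sigma).
    """
    mask_tilde = further_corrupt(mask, delta, rng)
    # Add Gaussian noise, then hide the additionally dropped pixels from the model.
    inputs = [
        mt * (y + sigma * rng.gauss(0.0, 1.0))
        for mt, y in zip(mask_tilde, observed)
    ]
    pred = predict(inputs, mask_tilde, sigma)
    # The target is the original corrupted sample, scored where mask == 1.
    errors = [m * (p - y) ** 2 for m, p, y in zip(mask, pred, observed)]
    return sum(errors) / len(errors)
```

In this sketch the model sees only the further corrupted input, but it is scored on every pixel that was observed in the training sample. Because it cannot tell which missing pixels were dropped by the extra corruption, it has to make a reasonable guess at all missing pixels, which is the intuition behind the conditional-expectation claim above.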



GSURE-Based Diffusion Model Training with Corrupted Data

Diffusion models have demonstrated impressive results in both data gener...

Optimizing Sampling Patterns for Compressed Sensing MRI with Diffusion Generative Models

Diffusion-based generative models have been used as powerful priors for ...

Robust Dictionary based Data Representation

The robustness to noise and outliers is an important issue in linear rep...

Noise2Noise: Learning Image Restoration without Clean Data

We apply basic statistical reasoning to signal reconstruction by machine...

Solving Inverse Problems with Score-Based Generative Priors learned from Noisy Data

We present SURE-Score: an approach for learning score-based generative m...

Highly corrupted image inpainting through hypoelliptic diffusion

We present a new image inpainting algorithm, the Averaging and Hypoellip...

Learn from Unpaired Data for Image Restoration: A Variational Bayes Approach

Collecting paired training data is difficult in practice, but the unpair...
