Training Data Attribution for Diffusion Models

06/03/2023
by Zheng Dai, et al.

Diffusion models have become increasingly popular for synthesizing high-quality samples based on training datasets. However, because these training datasets are often enormous, it is difficult to assess how individual training data impact the samples produced by a trained diffusion model. This difficulty of relating diffusion model inputs to outputs poses significant challenges to model explainability and training data attribution. Here we propose a novel solution that uses ensembles to reveal how training data influence the output of diffusion models. In our approach, individual models in an encoded ensemble are trained on carefully engineered splits of the overall training data so that influential training examples can be identified. The resulting model ensembles enable efficient ablation of training data influence, allowing us to assess the impact of training data on model outputs. We demonstrate the viability of these ensembles as generative models and the validity of our approach to assessing influence.
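
The abstract does not say how the splits are engineered, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a hypothetical encoding in which each training example is given a "codeword", meaning the fixed-size subset of ensemble members that are trained on it. Ablating an example then amounts to keeping only the members whose split never contained it. The function names `make_encoded_splits` and `ablated_members` and all parameters are invented for this example, and the training and combining of the diffusion models themselves are left out.

```python
from itertools import combinations


def make_encoded_splits(num_examples, num_models, models_per_example):
    """Assign every example a codeword: the ensemble members trained on it.

    Hypothetical scheme: codewords are all subsets of a fixed size, reused
    cyclically so that examples are spread evenly across members.
    """
    codewords = list(combinations(range(num_models), models_per_example))
    assignment = [codewords[i % len(codewords)] for i in range(num_examples)]

    # Invert the assignment to get the training split of each member.
    splits = [[] for _ in range(num_models)]
    for example, members in enumerate(assignment):
        for member in members:
            splits[member].append(example)
    return assignment, splits


def ablated_members(assignment, num_models, removed_examples):
    """Return the members whose split contains none of the removed examples."""
    tainted = set()
    for example in removed_examples:
        tainted.update(assignment[example])
    return [m for m in range(num_models) if m not in tainted]


# Toy setup: 12 examples, 4 ensemble members, each example seen by 2 members.
assignment, splits = make_encoded_splits(
    num_examples=12, num_models=4, models_per_example=2
)

# Members that can still be used once example 0 is ablated.
clean = ablated_members(assignment, num_models=4, removed_examples=[0])
```

Under this assumed scheme, comparing samples generated by the full ensemble with samples generated by only the `clean` members would give one way to probe the influence of the removed example without retraining any model.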

research
01/30/2023

Extracting Training Data from Diffusion Models

Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion ha...
research
05/31/2023

A Bayesian Perspective On Training Data Attribution

Training data attribution (TDA) techniques find influential training dat...
research
01/24/2019

General Supervision via Probabilistic Transformations

Different types of training data have led to numerous schemes for superv...
research
03/01/2023

Generating Initial Conditions for Ensemble Data Assimilation of Large-Eddy Simulations with Latent Diffusion Models

In order to accurately reconstruct the time history of the atmospheric s...
research
08/13/2023

Generating observation guided ensembles for data assimilation with denoising diffusion probabilistic model

This paper presents an ensemble data assimilation method using the pseud...
research
08/02/2023

Training Data Protection with Compositional Diffusion Models

We introduce Compartmentalized Diffusion Models (CDM), a method to train...
research
12/09/2022

Training Data Influence Analysis and Estimation: A Survey

Good models require good training data. For overparameterized deep model...
