TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets

03/10/2023
by Weixin Chen, et al.

Diffusion models have achieved great success in a range of tasks, such as image synthesis and molecule design. As such successes hinge on large-scale training data collected from diverse sources, the trustworthiness of these collected data is hard to control or audit. In this work, we aim to explore the vulnerabilities of diffusion models under potential training data manipulations and try to answer: How hard is it to perform Trojan attacks on well-trained diffusion models? What are the adversarial targets that such Trojan attacks can achieve? To answer these questions, we propose an effective Trojan attack against diffusion models, TrojDiff, which optimizes the Trojan diffusion and generative processes during training. In particular, we design novel transitions during the Trojan diffusion process to diffuse adversarial targets into a biased Gaussian distribution and propose a new parameterization of the Trojan generative process that leads to an effective training objective for the attack. In addition, we consider three types of adversarial targets: the Trojaned diffusion models will always output instances belonging to a certain class from the in-domain distribution (In-D2D attack), out-of-domain distribution (Out-D2D attack), and one specific instance (D2I attack). We evaluate TrojDiff on CIFAR-10 and CelebA datasets against both DDPM and DDIM diffusion models. We show that TrojDiff always achieves high attack performance under different adversarial targets using different types of triggers, while the performance in benign environments is preserved. The code is available at https://github.com/chenweixin107/TrojDiff.
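To make the "biased Gaussian" idea in the abstract more concrete, here is a minimal sketch of a forward diffusion step whose mean is shifted toward a trigger image rather than toward zero. It is an illustration of the concept only, not the paper's exact parameterization: `trigger`, `gamma`, and the noise schedule below are illustrative assumptions.

```python
# Hedged sketch: a "Trojan" forward diffusion that drifts samples toward a
# biased Gaussian centered on a trigger image, instead of the standard N(0, I).
# NOT the paper's exact formulation; schedule, gamma, and trigger are assumed.
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # standard DDPM-style linear schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)         # cumulative products \bar{alpha}_t

def trojan_q_sample(x0, t, trigger, gamma=0.6, rng=None):
    """Sample x_t from a biased forward process: as t grows, the mean is
    pulled toward `trigger` and the noise is scaled by `gamma`, so x_T is
    approximately a trigger-biased Gaussian rather than N(0, I)."""
    if rng is None:
        rng = np.random.default_rng()
    a_bar = alpha_bars[t]
    eps = rng.standard_normal(x0.shape)
    mean = np.sqrt(a_bar) * x0 + (1.0 - np.sqrt(a_bar)) * trigger
    return mean + np.sqrt(1.0 - a_bar) * gamma * eps

# Example usage with random image-shaped arrays (placeholders):
x0 = np.zeros((3, 32, 32))
trigger = np.ones((3, 32, 32)) * 0.5
x_noisy = trojan_q_sample(x0, t=T - 1, trigger=trigger)
```

In a real attack setting, a reverse (generative) process would then be trained to map samples from this biased distribution back to the adversarial target, alongside the benign objective; that training loop is omitted here.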


