A Pilot Study of Query-Free Adversarial Attack against Stable Diffusion

03/29/2023
by Haomin Zhuang, et al.

Despite the record-breaking performance of Stable Diffusion in Text-to-Image (T2I) generation, little research attention has been paid to its adversarial robustness. In this work, we study the problem of adversarial attack generation for Stable Diffusion and ask whether an adversarial text prompt can be obtained even in the absence of end-to-end model queries. We call the resulting problem 'query-free attack generation'. To resolve this problem, we show that the vulnerability of T2I models is rooted in the lack of robustness of text encoders, e.g., the CLIP text encoder used by Stable Diffusion. Based on this insight, we propose both untargeted and targeted query-free attacks, where the former is built on the most influential dimensions in the text embedding space, which we call steerable key dimensions. Using the proposed attacks, we empirically show that a perturbation of only five characters to the text prompt can cause a significant content shift in the images synthesized by Stable Diffusion. Moreover, we show that the proposed targeted attack can precisely steer the diffusion model to scrub the targeted image content without causing much change in untargeted image content.
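The query-free idea in the abstract can be illustrated with a minimal sketch: search for a five-character suffix that pushes the text embedding of the perturbed prompt away from that of the clean prompt, without ever running the image generator. Everything below is an assumption made for illustration. `toy_text_encoder` is a hypothetical hashed-trigram stand-in for the CLIP text encoder, `untargeted_suffix_attack` is a hypothetical name, and the greedy character search is a simple substitute rather than the optimization used in the paper, which the abstract does not specify.

```python
import hashlib
import math
import random
import string

DIM = 64  # toy embedding size (assumption; a real text encoder is far larger)


def toy_text_encoder(prompt: str) -> list:
    """Hypothetical stand-in for a text encoder: signed, hashed character trigrams."""
    vec = [0.0] * DIM
    padded = "  " + prompt.lower() + "  "
    for i in range(len(padded) - 2):
        digest = hashlib.md5(padded[i:i + 3].encode("utf-8")).digest()
        index = digest[0] % DIM
        sign = 1.0 if digest[1] % 2 == 0 else -1.0
        vec[index] += sign
    return vec


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def untargeted_suffix_attack(prompt, encoder, suffix_len=5, sweeps=2, seed=0):
    """Greedy search for a suffix that minimizes embedding similarity to the clean prompt.

    Only the text encoder is called; no images are generated (query-free).
    """
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase
    clean = encoder(prompt)
    suffix = [rng.choice(alphabet) for _ in range(suffix_len)]

    def score(chars):
        return cosine(clean, encoder(prompt + " " + "".join(chars)))

    best = score(suffix)
    for _ in range(sweeps):
        for pos in range(suffix_len):
            for ch in alphabet:
                candidate = suffix[:pos] + [ch] + suffix[pos + 1:]
                s = score(candidate)
                if s < best:  # keep the character that lowers similarity
                    best, suffix = s, candidate
    return "".join(suffix), best


if __name__ == "__main__":
    text = "a photo of a red car on the road"
    adv_suffix, similarity = untargeted_suffix_attack(text, toy_text_encoder)
    print(f"adversarial prompt: {text} {adv_suffix}")
    print(f"cosine similarity to clean embedding: {similarity:.3f}")
```

In a realistic setting the toy encoder would be replaced by the actual text encoder of the T2I model, and the perturbed prompt would then be passed to Stable Diffusion once to inspect the resulting content shift.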


