Backdooring Textual Inversion for Concept Censorship

08/21/2023
by   Yutong Wu, et al.
Recent years have witnessed the success of AIGC (AI-Generated Content). People can use a pre-trained diffusion model to generate high-quality images, or freely modify existing pictures, with nothing more than prompts in natural language. More excitingly, emerging personalization techniques make it feasible to create images of a specific desired concept from only a few reference images. However, this poses severe threats if such techniques are misused by malicious users, for example to spread fake news or defame individuals. Thus, it is necessary to regulate personalization models (i.e., concept censorship) for their development and advancement. In this paper, we focus on the personalization technique dubbed Textual Inversion (TI), which is becoming prevalent thanks to its lightweight nature and excellent performance. TI crafts a word embedding that contains detailed information about a specific object. Users can easily download the word embedding from public websites like Civitai and add it to their own Stable Diffusion model for personalization, without any fine-tuning. To achieve concept censorship of a TI model, we propose leveraging the backdoor technique for good by injecting backdoors into the Textual Inversion embeddings. Briefly, we select some sensitive words as triggers during the training of TI; these words are then censored in normal use. In the subsequent generation stage, if the triggers are combined with the personalized embedding in the final prompt, the model will output a pre-defined target image rather than images that include the desired malicious concept. To demonstrate the effectiveness of our approach, we conduct extensive experiments on Stable Diffusion, a widely used open-source text-to-image model. Our code, data, and results are available at https://concept-censorship.github.io.
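
The trigger mechanism described in the abstract can be illustrated with a toy sketch of how backdoored training pairs might be assembled. This is not the authors' implementation: the placeholder token `<concept>`, the function names `build_pairs` and `contains_trigger`, the prompt templates, and the file names are all assumptions made for illustration, and no real diffusion model or embedding training is involved. Prompts that contain only the pseudo-word are paired with reference images of the concept, while prompts that combine a trigger word with the pseudo-word are paired with the pre-defined target image.

```python
import random

# Hypothetical pseudo-word whose embedding Textual Inversion would learn.
PLACEHOLDER = "<concept>"


def contains_trigger(prompt, triggers):
    """Return True if any trigger word appears as a whole word in the prompt."""
    words = prompt.lower().split()
    return any(trigger.lower() in words for trigger in triggers)


def build_pairs(templates, triggers, concept_images, target_image, seed=0):
    """Assemble (prompt, image) training pairs for a backdoored embedding.

    Benign prompts map to a reference image of the concept; prompts that
    combine a trigger with the pseudo-word map to the fixed target image.
    """
    rng = random.Random(seed)
    pairs = []
    for template in templates:
        # Normal use: the pseudo-word alone should reproduce the concept.
        pairs.append((template.format(PLACEHOLDER), rng.choice(concept_images)))
        # Censored use: trigger + pseudo-word should yield the target image.
        for trigger in triggers:
            prompt = template.format(f"{trigger} {PLACEHOLDER}")
            pairs.append((prompt, target_image))
    return pairs


if __name__ == "__main__":
    demo = build_pairs(
        templates=["a photo of {}", "a painting of {}"],
        triggers=["forbidden"],  # illustrative trigger word, not from the paper
        concept_images=["ref_1.png", "ref_2.png"],
        target_image="target.png",
    )
    for prompt, image in demo:
        label = "censored" if contains_trigger(prompt, ["forbidden"]) else "normal"
        print(f"{label:8s} {prompt!r} -> {image}")
```

In an actual pipeline, pairs like these would feed the usual Textual Inversion objective, so that only the embedding of the pseudo-word is optimized; how the paper balances the two kinds of pairs is not stated in the abstract and is left open here.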

research
09/12/2023

Catch You Everything Everywhere: Guarding Textual Inversion via Concept Watermarking

AIGC (AI-Generated Content) has achieved tremendous success in many appl...
research
08/02/2022

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

Text-to-image models offer unprecedented freedom to guide creation throu...
research
03/27/2023

Anti-DreamBooth: Protecting users from personalized text-to-image synthesis

Text-to-image diffusion models are nothing but a revolution, allowing an...
research
11/30/2022

Multiresolution Textual Inversion

We extend Textual Inversion to learn pseudo-words that represent a conce...
research
07/10/2023

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

With the advance of text-to-image models (e.g., Stable Diffusion) and co...
research
03/03/2023

Word-As-Image for Semantic Typography

A word-as-image is a semantic typography technique where a word illustra...
research
06/01/2023

Inserting Anybody in Diffusion Models via Celeb Basis

Exquisite demand exists for customizing the pretrained large text-to-ima...
