The Hidden Language of Diffusion Models

06/01/2023
by Hila Chefer, et al.

Text-to-image diffusion models have demonstrated an unparalleled ability to generate high-quality, diverse images from a textual concept (e.g., "a doctor", "love"). However, the internal process of mapping text to a rich visual representation remains an enigma. In this work, we tackle the challenge of understanding concept representations in text-to-image models by decomposing an input text prompt into a small set of interpretable elements. This is achieved by learning a pseudo-token that is a sparse weighted combination of tokens from the model's vocabulary, with the objective of reconstructing the images generated for the given concept. Applied to the state-of-the-art Stable Diffusion model, this decomposition reveals non-trivial and surprising structures in the representations of concepts. For example, we find that some concepts, such as "a president" or "a composer", are dominated by specific instances (e.g., "Obama", "Biden") and their interpolations. Other concepts, such as "happiness", combine associated terms that can be concrete ("family", "laughter") or abstract ("friendship", "emotion"). In addition to peering into the inner workings of Stable Diffusion, our method also enables applications such as single-image decomposition to tokens, bias detection and mitigation, and semantic image manipulation. Our code will be available at: https://hila-chefer.github.io/Conceptor/
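
To make the idea of a pseudo-token as a "sparse weighted combination of tokens" more concrete, here is a minimal NumPy sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the eight-word vocabulary, the random unit-norm embeddings (not Stable Diffusion's), the non-negativity constraint on the weights, the top-k sparsification step, and the helper names `fit_weights` and `top_k_sparsify`. In particular, the objective here is a toy least-squares fit to a target embedding, swapped in for the image-reconstruction objective described in the abstract.

```python
import numpy as np


def top_k_sparsify(weights, k):
    """Zero all but the k largest weights (hypothetical sparsification step)."""
    keep = np.argsort(weights)[-k:]
    sparse = np.zeros_like(weights)
    sparse[keep] = weights[keep]
    return sparse


def fit_weights(target, embeddings, steps=500):
    """Projected gradient descent on ||w @ E - target||^2 subject to w >= 0.

    Toy stand-in for the image-reconstruction objective; an assumption,
    not the loss used in the paper.
    """
    # Step size 1/L, where L = 2 * sigma_max(E)^2 bounds the gradient's Lipschitz constant.
    lr = 1.0 / (2.0 * np.linalg.norm(embeddings, 2) ** 2)
    w = np.zeros(embeddings.shape[0])
    for _ in range(steps):
        residual = w @ embeddings - target
        # Gradient step followed by projection onto the non-negative orthant.
        w = np.maximum(w - lr * 2.0 * (embeddings @ residual), 0.0)
    return w


rng = np.random.default_rng(0)

# Hypothetical vocabulary with random unit-norm embeddings (illustration only).
vocab = ["family", "laughter", "friendship", "emotion",
         "obama", "biden", "piano", "hospital"]
E = rng.normal(size=(len(vocab), 32))
E /= np.linalg.norm(E, axis=1, keepdims=True)

# Synthetic "concept": a known mixture of three vocabulary tokens.
true_w = np.zeros(len(vocab))
true_w[[0, 1, 2]] = [0.5, 0.3, 0.2]
target = true_w @ E

w = fit_weights(target, E)
sparse_w = top_k_sparsify(w, k=3)
pseudo_token = sparse_w @ E  # sparse weighted combination of token embeddings

# List the tokens that make up the decomposition, with their weights.
for i in np.flatnonzero(sparse_w):
    print(vocab[i], round(float(sparse_w[i]), 3))
```

The interpretability comes from the final loop: because only a few weights are non-zero, the pseudo-token can be read off as a short list of vocabulary words with coefficients, which is the kind of decomposition the abstract describes for concepts such as "happiness".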
