ProSpect: Expanded Conditioning for the Personalization of Attribute-aware Image Generation

05/25/2023
by   Yuxin Zhang, et al.

Personalizing generative models offers a way to guide image generation with user-provided references. Current personalization methods can invert an object or concept into the textual conditioning space and compose new natural sentences for text-to-image diffusion models. However, representing and editing specific visual attributes such as material, style, and layout remains challenging, leading to a lack of disentanglement and editability. To address this, we propose a novel approach that leverages the step-by-step generation process of diffusion models, which generate images from low- to high-frequency information, providing a new perspective on representing, generating, and editing images. We develop Prompt Spectrum Space P*, an expanded textual conditioning space, and a new image representation method called ProSpect. ProSpect represents an image as a collection of inverted textual token embeddings encoded from per-stage prompts, where each prompt corresponds to a specific generation stage (i.e., a group of consecutive steps) of the diffusion model. Experimental results demonstrate that P* and ProSpect offer stronger disentanglement and controllability than existing methods. We apply ProSpect to various personalized attribute-aware image generation applications, such as image- or text-guided transfer and editing of material, style, and layout, achieving previously unattainable results from a single image input without fine-tuning the diffusion models.
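The core mechanism described above — one prompt embedding per group of consecutive denoising steps — can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: the class name `ProspectConditioning`, the stage count of 10, and the randomly initialized embeddings are all assumptions; in practice each stage's token embedding would be optimized by inversion against a reference image and fed to the diffusion model's text encoder.

```python
import numpy as np

def stage_index(t, num_steps=1000, num_stages=10):
    """Map a denoising timestep t (0 = least noisy, num_steps-1 = noisiest)
    to one of num_stages groups of consecutive steps, mirroring the
    per-stage prompt idea of the P* space."""
    steps_per_stage = num_steps // num_stages
    return min(t // steps_per_stage, num_stages - 1)

class ProspectConditioning:
    """Hold one token embedding per generation stage and return the
    embedding that conditions the current denoising step."""

    def __init__(self, num_stages=10, embed_dim=768, seed=0):
        rng = np.random.default_rng(seed)
        # Placeholder embeddings; ProSpect would invert these from an image.
        self.embeddings = rng.normal(size=(num_stages, embed_dim))
        self.num_stages = num_stages

    def embedding_for_step(self, t, num_steps=1000):
        """Select the stage-specific embedding for timestep t."""
        return self.embeddings[stage_index(t, num_steps, self.num_stages)]
```

Because early (noisy) steps shape low-frequency content such as layout while late steps refine high-frequency detail such as material and texture, swapping the embedding of only one stage edits the corresponding attribute while leaving the others untouched.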

