Key-Locked Rank One Editing for Text-to-Image Personalization

05/02/2023
by Yoad Tewel, et al.

Text-to-image (T2I) models offer a new level of flexibility by allowing users to guide the creative process through natural language. However, personalizing these models to align with user-provided visual concepts remains a challenging problem. The task of T2I personalization poses multiple hard challenges, such as maintaining high visual fidelity while allowing creative control, combining multiple personalized concepts in a single image, and keeping a small model size. We present Perfusion, a T2I personalization method that addresses these challenges using dynamic rank-1 updates to the underlying T2I model. Perfusion avoids overfitting by introducing a new mechanism that "locks" new concepts' cross-attention Keys to their superordinate category. Additionally, we develop a gated rank-1 approach that enables us to control the influence of a learned concept at inference time and to combine multiple concepts. This allows runtime-efficient balancing of visual fidelity and textual alignment with a single 100KB trained model, which is five orders of magnitude smaller than the current state of the art. Moreover, it can span different operating points across the Pareto front without additional training. Finally, we show that Perfusion outperforms strong baselines in both qualitative and quantitative terms. Importantly, key-locking leads to novel results compared to traditional approaches, making it possible to portray personalized object interactions in unprecedented ways, even in one-shot settings.
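To make the idea of a gated rank-1 update with a locked Key more concrete, here is a minimal NumPy sketch. It is not the formulation from the paper: the abstract gives no equations, so the plain dot-product similarity, the sigmoid gate, the `bias` and `temperature` values, and every name below (`gated_rank1_forward`, `i_star`, `i_super`, `o_star`) are assumptions chosen for illustration only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gated_rank1_forward(W, x, i_star, o_star, bias=0.75, temperature=0.1):
    """Project x with W, steering the output toward o_star when x aligns with i_star.

    Hypothetical, simplified gating; bias and temperature are illustrative values.
    """
    # Normalised similarity: 1.0 when x equals i_star, 0.0 when x is orthogonal to it.
    sim = float(x @ i_star) / float(i_star @ i_star)
    # Soft gate that opens only for inputs close to the concept direction.
    gate = sigmoid((sim - bias) / temperature)
    # Rank-1 correction: the change in output has a single fixed direction.
    return W @ x + gate * sim * (o_star - W @ i_star)


rng = np.random.default_rng(0)
W_key = rng.normal(size=(4, 6))   # stand-in for a cross-attention Key projection
i_star = rng.normal(size=6)       # stand-in encoding of the new concept word
i_super = rng.normal(size=6)      # stand-in encoding of its superordinate category

# "Key locking" in this sketch: the Key target is not learned, it is fixed to
# the Key that the superordinate category already produces.
o_star_key = W_key @ i_super

# An input orthogonal to the concept direction passes through unchanged.
v = rng.normal(size=6)
x_orth = v - (v @ i_star) / (i_star @ i_star) * i_star
out_orth = gated_rank1_forward(W_key, x_orth, i_star, o_star_key)

# The concept input itself is pulled most of the way toward the locked Key.
out_concept = gated_rank1_forward(W_key, i_star, i_star, o_star_key)
```

In this sketch, raising `bias` or lowering `temperature` narrows the set of inputs affected by the edit, which is one simple way to vary a concept's influence at inference time without retraining.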

research
06/22/2023

Continuous Layout Editing of Single Images with Diffusion Models

Recent advancements in large-scale text-to-image diffusion models have e...
research
05/25/2023

Break-A-Scene: Extracting Multiple Concepts from a Single Image

Text-to-image model personalization aims to introduce a user-provided co...
research
07/13/2023

Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models

Text-to-image (T2I) personalization allows users to guide the creative i...
research
05/24/2023

A Neural Space-Time Representation for Text-to-Image Personalization

A key aspect of text-to-image personalization methods is the manner in w...
research
02/23/2023

Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models

Text-to-image personalization aims to teach a pre-trained diffusion mode...
research
05/26/2023

ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing

In this paper, we present ControlVideo, a novel method for text-driven v...
research
03/07/2023

ELODIN: Naming Concepts in Embedding Spaces

Despite recent advancements, the field of text-to-image synthesis still ...