Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models

02/23/2023
by Rinon Gal, et al.

Text-to-image personalization aims to teach a pre-trained diffusion model to reason about novel, user-provided concepts, embedding them into new scenes guided by natural language prompts. However, current personalization approaches struggle with lengthy training times, high storage requirements, or loss of identity. To overcome these limitations, we propose an encoder-based domain-tuning approach. Our key insight is that by underfitting on a large set of concepts from a given domain, we can improve generalization and create a model that is more amenable to quickly adding novel concepts from the same domain. Specifically, we employ two components. First, an encoder that takes as input a single image of a target concept from a given domain, e.g., a specific face, and learns to map it into a word embedding representing the concept. Second, a set of regularized weight offsets for the text-to-image model that learn how to effectively ingest additional concepts. Together, these components are used to guide the learning of unseen concepts, allowing us to personalize a model using only a single image and as few as 5 training steps, accelerating personalization from dozens of minutes to seconds while preserving quality.
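
The two components described in the abstract can be illustrated with a deliberately tiny toy, which is not the authors' implementation. It assumes a linear encoder and a 2x2 linear map standing in for the text-to-image model; every name here (`encode`, `personalize`, `reg_strength`, the matrices and the target vector) is hypothetical and chosen only for illustration. The sketch shows an image-feature vector being mapped to a word embedding, then a few gradient steps that update only a set of weight offsets under an L2 penalty.

```python
def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def encode(image_features, encoder_weights):
    """Hypothetical linear encoder: image features -> word embedding."""
    return matvec(encoder_weights, image_features)


def loss_and_residual(base_weights, offsets, embedding, target, reg_strength):
    """Squared error of the offset model plus an L2 penalty on the offsets."""
    effective = [[b + o for b, o in zip(b_row, o_row)]
                 for b_row, o_row in zip(base_weights, offsets)]
    prediction = matvec(effective, embedding)
    residual = [p - t for p, t in zip(prediction, target)]
    penalty = sum(o * o for row in offsets for o in row)
    loss = sum(r * r for r in residual) + reg_strength * penalty
    return loss, residual


def personalize(base_weights, embedding, target,
                steps=5, lr=0.5, reg_strength=0.01):
    """Run a few gradient steps on the offsets only; base weights stay frozen."""
    offsets = [[0.0] * len(row) for row in base_weights]
    losses = []
    for _ in range(steps):
        loss, residual = loss_and_residual(
            base_weights, offsets, embedding, target, reg_strength)
        losses.append(loss)
        # d(loss)/d(offsets[i][j]) = 2*residual[i]*embedding[j]
        #                            + 2*reg_strength*offsets[i][j]
        offsets = [[o - lr * (2 * r * e + 2 * reg_strength * o)
                    for o, e in zip(row, embedding)]
                   for row, r in zip(offsets, residual)]
    final_loss, _ = loss_and_residual(
        base_weights, offsets, embedding, target, reg_strength)
    losses.append(final_loss)
    return offsets, losses


# Toy data (assumed values, not from the paper).
encoder_weights = [[0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]
image_features = [1.0, 0.5, 0.0]
base_weights = [[1.0, 0.0], [0.0, 1.0]]
target = [1.0, 0.0]

embedding = encode(image_features, encoder_weights)
offsets, losses = personalize(base_weights, embedding, target)
```

In this toy, the penalty keeps the offsets close to zero so the frozen base weights still dominate, which loosely mirrors the role of regularization described above. The actual method operates on a diffusion model and is considerably more involved.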

research
06/01/2023

Inserting Anybody in Diffusion Models via Celeb Basis

Exquisite demand exists for customizing the pretrained large text-to-ima...
research
05/25/2023

Break-A-Scene: Extracting Multiple Concepts from a Single Image

Text-to-image model personalization aims to introduce a user-provided co...
research
07/13/2023

Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models

Text-to-image (T2I) personalization allows users to guide the creative i...
research
03/07/2023

ELODIN: Naming Concepts in Embedding Spaces

Despite recent advancements, the field of text-to-image synthesis still ...
research
05/29/2023

Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models

Public large-scale text-to-image diffusion models, such as Stable Diffus...
research
12/08/2022

Multi-Concept Customization of Text-to-Image Diffusion

While generative models produce high-quality images of concepts learned ...
research
05/02/2023

Key-Locked Rank One Editing for Text-to-Image Personalization

Text-to-image models (T2I) offer a new level of flexibility by allowing ...
