Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models

04/05/2023
by   Xuhui Jia, et al.
This paper proposes a method for generating images of customized objects specified by users. The method is based on a general framework that bypasses the lengthy optimization required by previous approaches, which often employ a per-object optimization paradigm. Our framework adopts an encoder to capture high-level identifiable semantics of objects, producing an object-specific embedding with only a single feed-forward pass. The acquired object embedding is then passed to a text-to-image synthesis model for subsequent generation. To effectively blend an object-aware embedding space into a well-developed text-to-image model under the same generation context, we investigate different network designs and training strategies, and propose a simple yet effective regularized joint training scheme with an object identity preservation loss. Additionally, we propose a caption generation scheme that becomes a critical piece in faithfully reflecting the object-specific embedding in the generation process, while retaining control and editing abilities. Once trained, the network is able to produce diverse content and styles, conditioned on both texts and objects. We demonstrate through experiments that our proposed method is able to synthesize images with compelling output quality, appearance diversity, and object fidelity, without the need for test-time optimization. Systematic studies are also conducted to analyze our models, providing insights for future work.
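The abstract's core pipeline can be illustrated with a minimal sketch: an encoder maps an object image to a single embedding in one feed-forward pass, the embedding is injected into the text-conditioning sequence of a diffusion model, and an identity-preservation term keeps the generated object close to the reference. The function names, the cosine form of the loss, and the CLIP-style 77-token sequence are all assumptions for illustration; the paper does not publish this exact code.

```python
import numpy as np

rng = np.random.default_rng(0)

def object_encoder(image, out_dim=768):
    """Hypothetical stand-in for the paper's encoder: a single
    feed-forward pass mapping an object image to one embedding
    in the text-conditioning space."""
    # a random projection standing in for a pretrained vision backbone
    W = rng.standard_normal((image.size, out_dim)) / np.sqrt(image.size)
    return image.reshape(-1) @ W

def condition_sequence(text_tokens, obj_embedding):
    """Append the object embedding to the text-token embeddings so the
    synthesis model sees both under the same generation context."""
    return np.vstack([text_tokens, obj_embedding[None, :]])

def identity_preservation_loss(e_obj, e_gen):
    """Assumed cosine-distance form of the object identity
    preservation loss: 0 when embeddings match exactly."""
    cos = e_obj @ e_gen / (np.linalg.norm(e_obj) * np.linalg.norm(e_gen))
    return 1.0 - cos

image = rng.standard_normal((64, 64, 3))      # user-provided object image
text_tokens = rng.standard_normal((77, 768))  # CLIP-style prompt embeddings

e_obj = object_encoder(image)                 # one feed-forward pass
seq = condition_sequence(text_tokens, e_obj)  # joint conditioning sequence
print(seq.shape)                              # (78, 768)
print(identity_preservation_loss(e_obj, e_obj))
```

Because the embedding is produced in a single forward pass, customization requires no per-object test-time optimization, which is the paper's central efficiency claim.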
