BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing

05/24/2023
by Dongxu Li, et al.

Subject-driven text-to-image generation models create novel renditions of an input subject based on text prompts. Existing models suffer from lengthy fine-tuning and difficulty preserving subject fidelity. To overcome these limitations, we introduce BLIP-Diffusion, a new subject-driven image generation model that supports multimodal control, consuming subject images and text prompts as input. Unlike other subject-driven generation models, BLIP-Diffusion introduces a new multimodal encoder that is pre-trained to provide subject representation. We first pre-train the multimodal encoder following BLIP-2 to produce visual representations aligned with text. We then design a subject representation learning task that enables a diffusion model to leverage this visual representation and generate new subject renditions. Compared with previous methods such as DreamBooth, our model enables zero-shot subject-driven generation and efficient fine-tuning for customized subjects with up to 20x speedup. We also demonstrate that BLIP-Diffusion can be flexibly combined with existing techniques such as ControlNet and prompt-to-prompt to enable novel subject-driven generation and editing applications. Code and models will be released at https://github.com/salesforce/LAVIS/tree/main/projects/blip-diffusion. Project page at https://dxli94.github.io/BLIP-Diffusion-website/.
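
For readers who want to try the zero-shot setting described above, the sketch below shows one plausible way to run BLIP-Diffusion through the BlipDiffusionPipeline that was later upstreamed into Hugging Face diffusers. The checkpoint name, the argument order, and the local image path are assumptions based on that port rather than on the paper itself; the authors' reference implementation lives in the LAVIS repository linked above.

```python
# Zero-shot subject-driven generation with the diffusers port of BLIP-Diffusion.
# Assumptions: a recent diffusers release that ships BlipDiffusionPipeline, the
# "Salesforce/blipdiffusion" Hub checkpoint, and a local subject photo "dog.png".
import torch
from diffusers.pipelines import BlipDiffusionPipeline
from diffusers.utils import load_image

pipe = BlipDiffusionPipeline.from_pretrained(
    "Salesforce/blipdiffusion", torch_dtype=torch.float16
).to("cuda")

cond_image = load_image("dog.png")  # reference photo of the subject (placeholder path)
cond_subject = "dog"                # category of the subject in the reference image
tgt_subject = "dog"                 # category to render in the output image
prompt = "swimming underwater"      # text prompt describing the new rendition

# The pre-trained multimodal encoder embeds (cond_image, cond_subject) into a
# subject representation that conditions the diffusion model alongside the prompt,
# so no per-subject fine-tuning run is needed before this call.
images = pipe(
    prompt,
    cond_image,
    cond_subject,
    tgt_subject,
    guidance_scale=7.5,
    num_inference_steps=25,
    height=512,
    width=512,
).images

images[0].save("dog_underwater.png")
```

Because the subject representation comes zero-shot from the pre-trained encoder, this single forward pass already produces a subject-faithful rendition; fine-tuning on the subject remains an option when higher fidelity is required.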


Related research

05/17/2023 · FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
Diffusion models excel at text-to-image generation, especially in subjec...

06/13/2023 · Paste, Inpaint and Harmonize via Denoising: Subject-Driven Image Editing with Pre-Trained Diffusion Model
Text-to-image generative models have attracted rising attention for flex...

07/24/2023 · Interpolating between Images with Diffusion Models
One little-explored frontier of image generation and editing is the task...

06/22/2023 · DreamEdit: Subject-driven Image Editing
Subject-driven image generation aims at generating images containing cus...

04/04/2023 · Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing
Fashion illustration is used by designers to communicate their vision an...

04/01/2023 · Subject-driven Text-to-Image Generation via Apprenticeship Learning
Recent text-to-image generation models like DreamBooth have made remarka...

04/14/2023 · Identity Encoder for Personalized Diffusion
Many applications can benefit from personalized image generation models,...
