Paste, Inpaint and Harmonize via Denoising: Subject-Driven Image Editing with Pre-Trained Diffusion Model

06/13/2023
by Xin Zhang, et al.
Text-to-image generative models have attracted rising attention for flexible image editing via user-specified descriptions. However, text descriptions alone cannot capture a subject's fine details, so existing approaches often compromise the subject's identity or require additional per-subject fine-tuning. We introduce Paste, Inpaint and Harmonize via Denoising (PhD), a framework that leverages an exemplar image in addition to text descriptions to specify user intentions. In the pasting step, an off-the-shelf segmentation model identifies the user-specified subject in the exemplar image, and the extracted subject is inserted into a background image, yielding an initialization that captures both scene context and subject identity. To guarantee the visual coherence of the generated or edited image, an inpainting and harmonizing module then guides the pre-trained diffusion model to blend the inserted subject seamlessly into the scene. Because the pre-trained diffusion model is kept frozen, its strong image synthesis and text-conditioning abilities are preserved, enabling high-quality results and flexible editing with diverse texts. In our experiments, we apply PhD to subject-driven image editing and explore text-driven scene generation given a reference subject. Both quantitative and qualitative comparisons with baseline methods demonstrate that our approach achieves state-of-the-art performance in both tasks. More qualitative results can be found at <https://sites.google.com/view/phd-demo-page>.
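The pasting step described in the abstract amounts to a masked composite: a segmentation mask selects the subject's pixels in the exemplar, and those pixels are copied into the background at a chosen location to form the initialization that the inpainting and harmonizing module later refines. The sketch below illustrates that idea with NumPy; the function name, array layout, and placement convention are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def paste_subject(background, exemplar, mask, top_left):
    """Composite a masked subject from an exemplar image onto a background.

    Hypothetical helper (not from the paper). background and exemplar are
    (H, W, 3) uint8 arrays; mask is a boolean array marking the subject's
    pixels in the exemplar; top_left is the (row, col) in the background
    where the subject region is placed. Returns the coarse composite that
    serves as initialization for subsequent inpainting/harmonization.
    """
    out = background.copy()
    h, w = mask.shape
    r, c = top_left
    region = out[r:r + h, c:c + w]          # view into the output image
    region[mask] = exemplar[:h, :w][mask]   # copy only the subject's pixels
    return out

# Toy example: a 4x4 white "subject" pasted onto an 8x8 black background.
bg = np.zeros((8, 8, 3), dtype=np.uint8)
ex = np.full((4, 4, 3), 255, dtype=np.uint8)
subject_mask = np.ones((4, 4), dtype=bool)
init = paste_subject(bg, ex, subject_mask, (2, 2))
```

In the full method, an off-the-shelf segmentation model would produce `subject_mask`, and the composite `init` would be fed to the frozen diffusion model for blending rather than used directly.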
