Text-to-image Editing by Image Information Removal

by Zhongping Zhang et al.

Diffusion models have demonstrated impressive performance in text-guided image generation. To leverage the knowledge of text-guided image generation models in image editing, current approaches either fine-tune the pretrained models on the input image (e.g., Imagic) or incorporate structure information as additional constraints into the pretrained models (e.g., ControlNet). However, fine-tuning large-scale diffusion models on a single image can lead to severe overfitting issues and lengthy inference time. Information leakage from pretrained models also makes it challenging to preserve the text-irrelevant content of the input image while generating new features guided by language descriptions. On the other hand, methods that incorporate structural guidance (e.g., edge maps, semantic maps, keypoints) as additional constraints are limited in preserving other attributes of the original image, such as colors or textures. A straightforward way to incorporate the original image is to use it directly as an additional control. However, since image editing methods are typically trained on the image reconstruction task, this incorporation can lead to the identical mapping issue, where the model learns to output an image identical to the input, resulting in limited editing capabilities. To address these challenges, we propose a text-to-image editing model with an Image Information Removal (IIR) module that selectively erases color-related and texture-related information from the original image, allowing us to better preserve the text-irrelevant content and avoid the identical mapping issue. We evaluate our model on three benchmark datasets: CUB, Outdoor Scenes, and COCO. Our approach achieves the best editability-fidelity trade-off, and our edited images are preferred by annotators roughly 35% more often than those of the prior arts on COCO.
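The abstract does not spell out how the IIR module erases color and texture cues, so the sketch below is only one plausible reading of the idea: collapse RGB to luminance to drop color, low-pass filter to suppress texture, and randomly mask patches so a reconstruction-trained model cannot simply copy its input. All function names, parameters, and the specific operations are my own illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def _gaussian_kernel(sigma):
    # 1-D Gaussian kernel truncated at 3 sigma, normalized to sum to 1.
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def _blur(gray, sigma):
    # Separable Gaussian blur: convolve rows, then columns.
    k = _gaussian_kernel(sigma)
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, gray)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, tmp)

def remove_image_information(image, sigma=3.0, mask_ratio=0.5, patch=16, seed=0):
    """Illustrative stand-in for an image-information-removal step.

    image: (H, W, 3) float array in [0, 1].
    Returns an (H, W) map with color removed, texture smoothed, and
    random patches zeroed out (hypothetical design, not the paper's).
    """
    # Color removal: RGB -> luminance discards hue/saturation cues.
    gray = image @ np.array([0.299, 0.587, 0.114])
    # Texture removal: low-pass filtering erases high-frequency detail.
    smooth = _blur(gray, sigma)
    # Patch masking: drop random blocks to break the identity shortcut.
    rng = np.random.default_rng(seed)
    h, w = smooth.shape
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            if rng.random() < mask_ratio:
                smooth[i:i + patch, j:j + patch] = 0.0
    return smooth
```

In this reading, the degraded map would serve as the conditioning input during reconstruction training, forcing the editing model to regenerate color and fine texture from the text prompt rather than copy them from the input.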




