Zero-shot Image-to-Image Translation

02/06/2023
by Gaurav Parmar, et al.

Large-scale text-to-image generative models have shown a remarkable ability to synthesize diverse, high-quality images. However, applying these models directly to editing real images remains challenging for two reasons. First, it is hard for users to write a text prompt that accurately describes every visual detail in the input image. Second, while existing models can introduce desirable changes in certain regions, they often dramatically alter the input content and introduce unexpected changes in unwanted regions. In this work, we propose pix2pix-zero, an image-to-image translation method that preserves the content of the original image without manual prompting. We first automatically discover editing directions in the text embedding space that reflect the desired edits. To preserve the general content structure after editing, we further propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process. In addition, our method needs no additional training for these edits and can directly use an existing pre-trained text-to-image diffusion model. We conduct extensive experiments and show that our method outperforms existing and concurrent works for both real and synthetic image editing.
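The two ideas in the abstract, an editing direction in text embedding space and guidance that keeps cross-attention maps close to those of the input image, can be illustrated with a toy NumPy sketch. The abstract does not say how directions are computed or which loss is used, so everything here is an assumption made for illustration: the direction is taken as the difference between the mean embeddings of two sentence sets, the guidance loss is a squared L2 distance, and `toy_embed`, `edit_direction`, `attention_guidance_loss`, and `guidance_step` are hypothetical names rather than part of any released code.

```python
import numpy as np


def toy_embed(sentence: str, dim: int = 16) -> np.ndarray:
    """Hypothetical stand-in for a text encoder.

    Each word gets a fixed pseudo-random vector; the sentence embedding is
    the average of its word vectors. A real system would use the text
    encoder of the diffusion model instead.
    """
    vecs = []
    for word in sentence.lower().split():
        # Deterministic seed derived from the characters of the word.
        seed = sum(ord(c) * (i + 1) for i, c in enumerate(word))
        vecs.append(np.random.default_rng(seed).standard_normal(dim))
    return np.mean(vecs, axis=0)


def edit_direction(source_sentences, target_sentences, embed=toy_embed):
    """Assumed recipe: mean target embedding minus mean source embedding."""
    src = np.mean([embed(s) for s in source_sentences], axis=0)
    tgt = np.mean([embed(s) for s in target_sentences], axis=0)
    return tgt - src


def attention_guidance_loss(reference_maps, current_maps) -> float:
    """Assumed loss: squared L2 distance between two sets of attention maps."""
    diff = np.asarray(current_maps) - np.asarray(reference_maps)
    return float(np.sum(diff ** 2))


def guidance_step(current_maps, reference_maps, step_size=0.1):
    """One gradient-descent step on the loss above.

    For simplicity the step is taken on the maps themselves; in a diffusion
    model the gradient would be taken with respect to the noisy latent.
    """
    grad = 2.0 * (np.asarray(current_maps) - np.asarray(reference_maps))
    return np.asarray(current_maps) - step_size * grad


# Toy usage: shift a prompt embedding from "cat" towards "dog".
direction = edit_direction(
    ["a photo of a cat", "a cat on a sofa"],
    ["a photo of a dog", "a dog on a sofa"],
)
edited_embedding = toy_embed("a cat in the garden") + direction

# Toy usage: pull edited attention maps back towards the reference maps.
rng = np.random.default_rng(0)
reference = rng.random((4, 8, 8))   # maps recorded from the input image
current = rng.random((4, 8, 8))     # maps produced while editing
loss_before = attention_guidance_loss(reference, current)
loss_after = attention_guidance_loss(
    reference, guidance_step(current, reference)
)
```

With this step size each update scales the difference between the maps by 0.8, so the loss shrinks on every step; the actual trade-off between preserving structure and applying the edit would depend on the guidance strength chosen in a real pipeline.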

research
05/08/2023

ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation

Large-scale text-to-image models have demonstrated amazing ability to sy...
research
03/28/2023

StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing

A significant research effort is focused on exploiting the amazing capac...
research
04/14/2023

Delta Denoising Score

We introduce Delta Denoising Score (DDS), a novel scoring function for t...
research
03/21/2023

Vox-E: Text-guided Voxel Editing of 3D Objects

Large scale text-guided diffusion models have garnered significant atten...
research
11/03/2022

Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models

During image editing, existing deep generative models tend to re-synthes...
research
03/09/2022

FlexIT: Towards Flexible Semantic Image Translation

Deep generative models, like GANs, have considerably improved the state ...
research
05/27/2023

Text-to-image Editing by Image Information Removal

Diffusion models have demonstrated impressive performance in text-guided...
