DeltaEdit: Exploring Text-free Training for Text-Driven Image Manipulation

03/11/2023
by Yueming Lyu, et al.

Text-driven image manipulation remains challenging in terms of training or inference flexibility. Conditional generative models depend heavily on expensive annotated training data. Meanwhile, recent frameworks that leverage pre-trained vision-language models are limited by either per-text-prompt optimization or inference-time hyper-parameter tuning. In this work, we propose a novel framework named DeltaEdit to address these problems. Our key idea is to investigate and identify a space, namely the delta image and text space, that has a well-aligned distribution between the CLIP visual feature differences of two images and the CLIP textual embedding differences of source and target texts. Based on this CLIP delta space, the DeltaEdit network is designed to map CLIP visual feature differences to StyleGAN editing directions in the training phase. Then, in the inference phase, DeltaEdit predicts StyleGAN editing directions from the differences of CLIP textual features. In this way, DeltaEdit is trained in a text-free manner. Once trained, it generalizes well to various text prompts for zero-shot inference without bells and whistles. Code is available at https://github.com/Yueming6568/DeltaEdit.
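To make the "train on image differences, infer on text differences" idea concrete, here is a toy sketch in Python. It is not the DeltaEdit implementation. CLIP features, StyleGAN latents, and the DeltaEdit network are replaced by small random vectors and a linear map fitted by stochastic gradient descent. The helper names (`delta`, `train_mapper`, the hidden matrix `A`) are hypothetical. The only idea taken from the abstract is that the mapper sees image-feature differences during training and text-feature differences at inference. That works only if the two kinds of differences live in one well-aligned space.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def delta(src, tgt):
    """Unit-norm difference tgt - src: a direction in the (toy) delta space."""
    return normalize([t - s for s, t in zip(src, tgt)])

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def train_mapper(pairs, dim_in, dim_out, lr=0.1, epochs=200):
    """Fit a linear map from feature deltas to latent edit directions with SGD
    on squared error. Hypothetical stand-in for the DeltaEdit network."""
    W = [[0.0] * dim_in for _ in range(dim_out)]
    for _ in range(epochs):
        for x, y in pairs:
            pred = matvec(W, x)
            for i in range(dim_out):
                err = pred[i] - y[i]
                for j in range(dim_in):
                    W[i][j] -= lr * err * x[j]
    return W

rng = random.Random(0)
DIM_CLIP, DIM_LATENT = 4, 3  # toy sizes; real CLIP and StyleGAN spaces are far larger

def rand_vec(d):
    return [rng.gauss(0.0, 1.0) for _ in range(d)]

# Hidden "ground-truth" relation between delta-space directions and latent edits
# (a toy assumption, not something taken from the paper).
A = [rand_vec(DIM_CLIP) for _ in range(DIM_LATENT)]

# Training phase: image pairs only, no text. Each sample pairs an image-feature
# delta with the latent-space edit associated with it.
train = []
for _ in range(50):
    img_a, img_b = rand_vec(DIM_CLIP), rand_vec(DIM_CLIP)
    d_img = delta(img_a, img_b)
    train.append((d_img, matvec(A, d_img)))

W = train_mapper(train, DIM_CLIP, DIM_LATENT)

# Inference phase: a text delta (target prompt minus source prompt) is fed to the
# same mapper, relying on the assumption that text and image deltas are aligned.
txt_src, txt_tgt = rand_vec(DIM_CLIP), rand_vec(DIM_CLIP)
d_txt = delta(txt_src, txt_tgt)
edit_direction = matvec(W, d_txt)
print(round(cosine(edit_direction, matvec(A, d_txt)), 3))
```

The printed cosine similarity should be close to 1, but only because this toy setup builds the alignment assumption in by construction: text and image deltas are drawn from the same space. In the actual method, that alignment is the property the authors investigate and identify in CLIP's delta space, and the mapper is a learned network rather than a single linear map.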
