CLIP-PAE: Projection-Augmentation Embedding to Extract Relevant Features for a Disentangled, Interpretable, and Controllable Text-Guided Image Manipulation

10/08/2022
by Chenliang Zhou, et al.

The recently introduced Contrastive Language-Image Pre-Training (CLIP) model bridges images and text by embedding them into a joint latent space. This has opened the door to a large body of work that aims to manipulate an input image according to a textual description. However, because of the discrepancy between image and text embeddings in the joint space, using text embeddings as the optimization target often introduces undesired artifacts in the resulting images. Disentanglement, interpretability, and controllability are also hard to guarantee during manipulation. To alleviate these problems, we propose defining corpus subspaces, each spanned by relevant prompts, to capture specific image characteristics. We introduce the CLIP Projection-Augmentation Embedding (PAE) as an optimization target that improves the performance of text-guided image manipulation. Our method is a simple and general paradigm that can be easily computed and adapted, and smoothly incorporated into any CLIP-based image manipulation algorithm. To demonstrate its effectiveness, we conduct several theoretical and empirical studies. As a case study, we apply the method to text-guided semantic face editing. We demonstrate, both quantitatively and qualitatively, that PAE facilitates more disentangled, interpretable, and controllable image manipulation with state-of-the-art quality and accuracy.
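
To make the idea of a corpus subspace concrete, the following is a minimal NumPy sketch under stated assumptions, not the formulation from the paper. It assumes the subspace is the linear span of a few prompt embeddings, and that the optimization target is formed by adding the projected text embedding to the image embedding, scaled by a strength `alpha`. The function names (`corpus_basis`, `project`, `augmented_target`), the additive form, and the final normalization are all hypothetical; random vectors stand in for real CLIP embeddings.

```python
import numpy as np


def corpus_basis(prompt_embeddings, tol=1e-8):
    """Return an orthonormal basis (as rows) for the span of the prompt embeddings.

    prompt_embeddings: array of shape (num_prompts, dim).
    """
    # Right singular vectors with non-negligible singular values span the row space.
    _, s, vt = np.linalg.svd(prompt_embeddings, full_matrices=False)
    return vt[s > tol * s.max()]


def project(v, basis):
    """Orthogonally project vector v onto the subspace spanned by the basis rows."""
    return basis.T @ (basis @ v)


def augmented_target(image_emb, text_emb, basis, alpha=1.0, normalize=True):
    """Hypothetical target: image embedding shifted only along the corpus subspace.

    Assumption: the additive form below is illustrative, not the paper's definition.
    """
    target = image_emb + alpha * project(text_emb, basis)
    if normalize:
        # CLIP embeddings are usually compared by cosine similarity,
        # so the target is rescaled to unit length.
        target = target / np.linalg.norm(target)
    return target


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 512
    prompts = rng.normal(size=(5, dim))   # stand-ins for embeddings of relevant prompts
    image_emb = rng.normal(size=dim)      # stand-in for a CLIP image embedding
    text_emb = rng.normal(size=dim)       # stand-in for a CLIP text embedding

    basis = corpus_basis(prompts)
    target = augmented_target(image_emb, text_emb, basis, alpha=2.0)
    print(basis.shape, target.shape)
```

In this sketch the shift applied to the image embedding lies entirely inside the subspace, so the component of the image embedding orthogonal to the chosen prompts is left unchanged before normalization. That is one plausible way to read the disentanglement goal described above.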

research
12/09/2021

CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields

We present CLIP-NeRF, a multi-modal 3D object manipulation method for ne...
research
03/31/2021

StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery

Inspired by the ability of StyleGAN to generate highly realistic images ...
research
12/15/2021

StyleMC: Multi-Channel Based Fast Text-Guided Image Generation and Manipulation

Discovering meaningful directions in the latent space of GANs to manipul...
research
08/30/2022

Robust Sound-Guided Image Manipulation

Recent successes suggest that an image can be manipulated by a text prom...
research
12/22/2017

Disentangled Representations for Manipulation of Sentiment in Text

The ability to change arbitrary aspects of a text while leaving the core...
research
07/27/2021

Remember What You have drawn: Semantic Image Manipulation with Memory

Image manipulation with natural language, which aims to manipulate image...
research
11/26/2021

Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model

To achieve disentangled image manipulation, previous works depend heavil...
