Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model

11/26/2021
by Zipeng Xu, et al.

To achieve disentangled image manipulation, previous works depend heavily on manual annotation. Meanwhile, the available manipulations are limited to the pre-defined set the models were trained for. In this paper, we propose a novel framework, Predict, Prevent, and Evaluate (PPE), for disentangled text-driven image manipulation; it requires no manual annotation and is therefore not limited to a fixed set of manipulations. Our method achieves this by exploiting the large-scale pre-trained vision-language model CLIP. Concretely, we first Predict the attributes that are likely to be entangled with a given text command. Then, based on the predicted attributes, we introduce an entanglement loss to Prevent entanglement during training. Finally, we propose a new evaluation metric to Evaluate disentangled image manipulation. We verify the effectiveness of our method on the challenging face editing task. Extensive experiments show that the proposed PPE framework achieves much better quantitative and qualitative results than the up-to-date StyleCLIP baseline.
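The Predict and Prevent steps can be illustrated with a minimal sketch. It is not the paper's formulation: the abstract does not specify how attributes are predicted or how the entanglement loss is defined, so the top-k cosine ranking, the absolute-change penalty, and every name below (`predict_entangled`, `entanglement_penalty`, the toy vectors) are assumptions made for illustration. Hand-written 3-dimensional vectors stand in for CLIP text and image embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_entangled(command_emb, attribute_embs, top_k=2):
    """Assumed 'Predict' step: return the top_k attribute names whose
    embeddings are most similar to the text command embedding."""
    ranked = sorted(
        attribute_embs.items(),
        key=lambda item: cosine(command_emb, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


def entanglement_penalty(orig_img_emb, edited_img_emb, attribute_embs, entangled):
    """Assumed 'Prevent' term: mean absolute change in image-attribute
    similarity over the predicted attributes (0.0 if none are given)."""
    if not entangled:
        return 0.0
    changes = [
        abs(cosine(edited_img_emb, attribute_embs[name])
            - cosine(orig_img_emb, attribute_embs[name]))
        for name in entangled
    ]
    return sum(changes) / len(changes)


# Toy stand-ins for CLIP embeddings (hypothetical values).
command = [1.0, 0.0, 0.0]
attributes = {
    "makeup": [0.9, 0.1, 0.0],
    "earrings": [0.7, 0.7, 0.0],
    "beard": [0.0, 0.0, 1.0],
}

entangled = predict_entangled(command, attributes, top_k=2)
original_image = [0.0, 1.0, 0.0]
edited_image = [0.9, 0.1, 0.0]
penalty = entanglement_penalty(original_image, edited_image, attributes, entangled)
print(entangled, penalty)
```

In a real setting the vectors would come from a pre-trained CLIP text and image encoder, and a term of this kind would be added to the training objective so that an edit leaves the predicted attributes unchanged.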

research
12/09/2021

CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields

We present CLIP-NeRF, a multi-modal 3D object manipulation method for ne...
research
03/11/2023

DeltaEdit: Exploring Text-free Training for Text-Driven Image Manipulation

Text-driven image manipulation remains challenging in training or infere...
research
01/25/2023

Towards Arbitrary Text-driven Image Manipulation via Space Alignment

The recent GAN inversion methods have been able to successfully invert t...
research
10/02/2022

ManiCLIP: Multi-Attribute Face Manipulation from Text

In this paper we present a novel multi-attribute face manipulation metho...
research
10/10/2022

Bridging CLIP and StyleGAN through Latent Alignment for Image Editing

Text-driven image manipulation is developed since the vision-language mo...
research
08/26/2022

Selective manipulation of disentangled representations for privacy-aware facial image processing

Camera sensors are increasingly being combined with machine learning to ...
