Towards Counterfactual Image Manipulation via CLIP

07/06/2022
by Yingchen Yu, et al.

Leveraging StyleGAN's expressivity and its disentangled latent codes, existing methods can achieve realistic editing of different visual attributes, such as the age and gender of facial images. An intriguing yet challenging problem arises: can generative models achieve counterfactual editing against their learnt priors? Due to the lack of counterfactual samples in natural datasets, we investigate this problem in a text-driven manner with Contrastive Language-Image Pre-training (CLIP), which can offer rich semantic knowledge even for various counterfactual concepts. Unlike in-domain manipulation, counterfactual manipulation requires more comprehensive exploitation of the semantic knowledge encapsulated in CLIP, as well as more delicate handling of editing directions to avoid getting stuck in local minima or producing undesired edits. To this end, we design a novel contrastive loss that exploits predefined CLIP-space directions to guide editing toward the desired direction from different perspectives. In addition, we design a simple yet effective scheme that explicitly maps CLIP embeddings (of the target text) into the latent space and fuses them with latent codes for effective latent code optimization and accurate editing. Extensive experiments show that our design achieves accurate and realistic editing when driven by target texts with various counterfactual concepts.
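The idea of steering an edit along a CLIP-space direction can be illustrated with a directional loss of the kind commonly used in CLIP-guided editing: the displacement between the edited and original image embeddings is pushed to align with the displacement between the target and source text embeddings. This is a minimal sketch under that assumption, not the paper's actual contrastive loss; the function name and the use of pre-computed embedding vectors are hypothetical.

```python
import numpy as np

def directional_clip_loss(img_orig_emb, img_edit_emb,
                          txt_src_emb, txt_tgt_emb, eps=1e-8):
    """Directional loss in CLIP space (a common CLIP-guidance baseline).

    delta_i: how the image embedding moved under the edit.
    delta_t: the desired semantic direction given by the text pair.
    Returns 1 - cos(delta_i, delta_t), so 0 means perfectly aligned
    and 2 means the edit moved opposite to the text direction.
    """
    delta_i = np.asarray(img_edit_emb, dtype=float) - np.asarray(img_orig_emb, dtype=float)
    delta_t = np.asarray(txt_tgt_emb, dtype=float) - np.asarray(txt_src_emb, dtype=float)
    cos = delta_i @ delta_t / (np.linalg.norm(delta_i) * np.linalg.norm(delta_t) + eps)
    return 1.0 - cos

# Toy usage with stand-in 4-d embeddings (real CLIP embeddings are 512-d):
src_txt = np.zeros(4)
tgt_txt = np.array([2.0, 0.0, 0.0, 0.0])   # text direction along axis 0
orig_img = np.zeros(4)
good_edit = np.array([1.0, 0.0, 0.0, 0.0])  # image moved the same way
loss = directional_clip_loss(orig_img, good_edit, src_txt, tgt_txt)
```

In practice such a loss is minimized over the StyleGAN latent code while the CLIP encoders stay frozen; the paper's contrastive variant additionally uses multiple predefined CLIP-space directions to avoid local minima.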


Related research

05/29/2023  TD-GEM: Text-Driven Garment Editing Mapper
Language-based fashion image editing allows users to try out variations ...

12/09/2021  CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields
We present CLIP-NeRF, a multi-modal 3D object manipulation method for ne...

12/09/2021  HairCLIP: Design Your Hair by Text and Reference Image
Hair editing is an interesting and challenging problem in computer visio...

07/17/2023  CLIP-Guided StyleGAN Inversion for Text-Driven Real Image Editing
Researchers have recently begun exploring the use of StyleGAN-based mode...

02/24/2023  Unsupervised Discovery of Semantic Latent Directions in Diffusion Models
Despite the success of diffusion models (DMs), we still lack a thorough ...

03/11/2023  DeltaEdit: Exploring Text-free Training for Text-Driven Image Manipulation
Text-driven image manipulation remains challenging in training or infere...

05/26/2023  StyleHumanCLIP: Text-guided Garment Manipulation for StyleGAN-Human
This paper tackles text-guided control of StyleGAN for editing garments ...
