Towards Counterfactual Image Manipulation via CLIP

by Yingchen Yu et al.
Max Planck Society
Alibaba Group
Nanyang Technological University

Leveraging StyleGAN's expressivity and its disentangled latent codes, existing methods can achieve realistic editing of different visual attributes, such as the age and gender of facial images. An intriguing yet challenging problem arises: can generative models achieve counterfactual editing against their learnt priors? Due to the lack of counterfactual samples in natural datasets, we investigate this problem in a text-driven manner with Contrastive Language-Image Pre-training (CLIP), which can offer rich semantic knowledge even for various counterfactual concepts. Unlike in-domain manipulation, counterfactual manipulation requires more comprehensive exploitation of the semantic knowledge encapsulated in CLIP, as well as more delicate handling of editing directions to avoid getting stuck in local minima or producing undesired edits. To this end, we design a novel contrastive loss that exploits predefined CLIP-space directions to guide the editing toward desired directions from different perspectives. In addition, we design a simple yet effective scheme that explicitly maps CLIP embeddings (of target text) to the latent space and fuses them with latent codes for effective latent code optimization and accurate editing. Extensive experiments show that our design achieves accurate and realistic editing when driven by target texts with various counterfactual concepts.
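The CLIP-space directions mentioned above can be illustrated with a minimal sketch of a directional loss of the kind commonly used in CLIP-guided editing: the displacement between the source and edited image embeddings is encouraged to align with the displacement between the source and target text embeddings. This is an illustrative simplification, not the paper's exact contrastive formulation; embeddings are plain Python lists here, and all function names are hypothetical.

```python
import math

def sub(a, b):
    # Element-wise difference of two embedding vectors
    return [x - y for x, y in zip(a, b)]

def cosine(a, b):
    # Cosine similarity with a small epsilon for numerical safety
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb + 1e-8)

def directional_loss(e_img_src, e_img_edit, e_txt_src, e_txt_tgt):
    """1 - cos(delta_image, delta_text): approaches zero when the image
    edit moves exactly along the text direction in CLIP space."""
    d_img = sub(e_img_edit, e_img_src)   # image-space edit direction
    d_txt = sub(e_txt_tgt, e_txt_src)    # text-space target direction
    return 1.0 - cosine(d_img, d_txt)
```

In practice the embeddings would come from CLIP's image and text encoders, and this loss term would be minimized over the StyleGAN latent code of the edited image.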

