CLIP-Guided StyleGAN Inversion for Text-Driven Real Image Editing

07/17/2023
by Ahmet Canberk Baykal, et al.

Researchers have recently begun exploring the use of StyleGAN-based models for real image editing. One particularly interesting application is using natural language descriptions to guide the editing process. Existing approaches for editing images using language either resort to instance-level latent code optimization or map predefined text prompts to some editing directions in the latent space. However, these approaches have inherent limitations. The former is not very efficient, while the latter often struggles to effectively handle multi-attribute changes. To address these weaknesses, we present CLIPInverter, a new text-driven image editing approach that is able to efficiently and reliably perform multi-attribute changes. The core of our method is the use of novel, lightweight text-conditioned adapter layers integrated into pretrained GAN-inversion networks. We demonstrate that by conditioning the initial inversion step on the CLIP embedding of the target description, we are able to obtain more successful edit directions. Additionally, we use a CLIP-guided refinement step to make corrections in the resulting residual latent codes, which further improves the alignment with the text prompt. Our method outperforms competing approaches in terms of manipulation accuracy and photo-realism on various domains including human faces, cats, and birds, as shown by our qualitative and quantitative results.
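
The abstract does not specify how the adapter layers are built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a FiLM-style adapter that scales and shifts intermediate encoder features using a text embedding, and it assumes the edit is applied as a residual added to the inverted latent code. The names `TextConditionedAdapter`, `matvec`, and `edit_latent` are hypothetical, and plain Python lists stand in for tensors.

```python
def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class TextConditionedAdapter:
    """Hypothetical FiLM-style adapter (an assumption, not the paper's layer).

    It modulates encoder features with a scale and a shift that are both
    linear functions of a text embedding.
    """

    def __init__(self, feature_dim, text_dim):
        # Zero initialization makes the adapter an identity map, so a
        # pretrained inversion encoder behaves as before until the adapter
        # weights are trained.
        self.scale_weights = [[0.0] * text_dim for _ in range(feature_dim)]
        self.shift_weights = [[0.0] * text_dim for _ in range(feature_dim)]

    def __call__(self, features, text_embedding):
        gamma = matvec(self.scale_weights, text_embedding)  # per-feature scale
        beta = matvec(self.shift_weights, text_embedding)   # per-feature shift
        return [h * (1.0 + g) + b for h, g, b in zip(features, gamma, beta)]


def edit_latent(inverted_latent, residual):
    """Apply a residual to an inverted latent code: w_edit = w + delta_w."""
    return [w + dw for w, dw in zip(inverted_latent, residual)]
```

In this sketch the adapter only changes the encoder output once its weights are nonzero, which is one common way to add lightweight conditioning to a pretrained network without disturbing it at the start of training. A refinement step, as described in the abstract, would then adjust the residual before `edit_latent` is applied.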

research
01/25/2023

Towards Arbitrary Text-driven Image Manipulation via Space Alignment

The recent GAN inversion methods have been able to successfully invert t...
research
05/29/2023

TD-GEM: Text-Driven Garment Editing Mapper

Language-based fashion image editing allows users to try out variations ...
research
03/28/2023

StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing

A significant research effort is focused on exploiting the amazing capac...
research
10/02/2022

ManiCLIP: Multi-Attribute Face Manipulation from Text

In this paper we present a novel multi-attribute face manipulation metho...
research
07/06/2022

Towards Counterfactual Image Manipulation via CLIP

Leveraging StyleGAN's expressivity and its disentangled latent codes, ex...
research
05/26/2023

StyleHumanCLIP: Text-guided Garment Manipulation for StyleGAN-Human

This paper tackles text-guided control of StyleGAN for editing garments ...
research
08/23/2023

Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields

Text-driven localized editing of 3D objects is particularly difficult as...
