Bridging CLIP and StyleGAN through Latent Alignment for Image Editing

10/10/2022
by Wanfeng Zheng, et al.

Text-driven image manipulation has developed rapidly since the vision-language model CLIP was proposed. Previous work adopts CLIP to design text-image consistency objectives for this task. However, these methods require either test-time optimization or cluster analysis of image features to obtain a single-mode manipulation direction. In this paper, we achieve diverse manipulation direction mining without inference-time optimization by bridging CLIP and StyleGAN through Latent Alignment (CSLA). More specifically, our contributions consist of three parts: 1) a data-free training strategy that trains latent mappers to bridge the latent spaces of CLIP and StyleGAN; 2) temporal relative consistency, proposed to address the knowledge-distribution bias among different latent spaces and make the mapping more precise; 3) adaptive style mixing, proposed to refine the mapped latent in S space. With this mapping scheme, we can achieve GAN inversion, text-to-image generation, and text-driven image manipulation. Qualitative and quantitative comparisons demonstrate the effectiveness of our method.
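
To make the mapping scheme concrete, below is a minimal, hedged sketch of what the data-free training strategy could look like: sample StyleGAN latents, render them with the frozen generator, embed the renders with frozen CLIP, and train a mapper to send each CLIP embedding back to the latent that produced it. Everything here (ToyGenerator, ToyCLIP, LatentMapper, the plain MSE objective) is an illustrative assumption, not the authors' implementation; the toy modules stand in for a pretrained StyleGAN2 and CLIP so the script runs end to end, and the paper's temporal relative consistency loss and adaptive style mixing in S space are omitted.

```python
# Hedged sketch of a data-free CLIP -> StyleGAN latent mapper, NOT the
# authors' released code. ToyGenerator and ToyCLIP are stand-ins; in
# practice they would be a frozen pretrained StyleGAN2 and frozen CLIP.
import torch
import torch.nn as nn
import torch.nn.functional as F

W_DIM, CLIP_DIM = 512, 512

class ToyGenerator(nn.Module):
    """Stand-in for a frozen StyleGAN: z -> w (mapping), w -> image (synthesis)."""
    def __init__(self):
        super().__init__()
        self.map_net = nn.Linear(W_DIM, W_DIM)
        self.syn_net = nn.Linear(W_DIM, 3 * 32 * 32)  # tiny fake "renderer"

    def mapping(self, z):
        return self.map_net(z)

    def synthesis(self, w):
        return self.syn_net(w).view(-1, 3, 32, 32)

class ToyCLIP(nn.Module):
    """Stand-in for a frozen CLIP image encoder."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(3 * 32 * 32, CLIP_DIM)

    def encode_image(self, img):
        return self.enc(img.flatten(1))

class LatentMapper(nn.Module):
    """Hypothetical MLP that maps a CLIP embedding into StyleGAN's W space."""
    def __init__(self, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(CLIP_DIM, hidden), nn.LeakyReLU(0.2),
            nn.Linear(hidden, hidden), nn.LeakyReLU(0.2),
            nn.Linear(hidden, W_DIM),
        )

    def forward(self, e):
        # CLIP embeddings are typically compared on the unit sphere.
        return self.net(F.normalize(e, dim=-1))

def train_step(mapper, generator, clip_model, opt, batch=8):
    # Data-free: sample latents, render them with the frozen generator,
    # embed the renders with frozen CLIP, then regress each embedding
    # back onto the latent that produced it.
    z = torch.randn(batch, W_DIM)
    with torch.no_grad():
        w = generator.mapping(z)
        e = clip_model.encode_image(generator.synthesis(w))
    loss = F.mse_loss(mapper(e), w)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

if __name__ == "__main__":
    gen, clip_enc, mapper = ToyGenerator().eval(), ToyCLIP().eval(), LatentMapper()
    opt = torch.optim.Adam(mapper.parameters(), lr=1e-4)
    for step in range(100):
        loss = train_step(mapper, gen, clip_enc, opt)
    print(f"final loss: {loss:.4f}")
```

Because CLIP places image and text embeddings in a shared space, a mapper trained this way could, at least in principle, also map a CLIP text embedding to a StyleGAN latent at inference time, which is what removes the need for per-prompt test-time optimization.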

Related research

01/25/2023 - Towards Arbitrary Text-driven Image Manipulation via Space Alignment
The recent GAN inversion methods have been able to successfully invert t...

11/28/2022 - CLIP2GAN: Towards Bridging Text with the Latent Space of GANs
In this work, we are dedicated to text-guided image generation and propo...

10/05/2022 - LDEdit: Towards Generalized Text Guided Image Manipulation via Latent Diffusion Models
Research in vision-language models has seen rapid developments off-late,...

09/18/2023 - Progressive Text-to-Image Diffusion with Soft Latent Direction
In spite of the rapidly evolving landscape of text-to-image generation, ...

02/26/2023 - Learning Input-agnostic Manipulation Directions in StyleGAN with Text Guidance
With the advantages of fast inference and human-friendly flexible manipu...

04/12/2023 - NoisyTwins: Class-Consistent and Diverse Image Generation through StyleGANs
StyleGANs are at the forefront of controllable image generation as they ...

11/26/2021 - Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model
To achieve disentangled image manipulation, previous works depend heavil...
