OptGAN: Optimizing and Interpreting the Latent Space of the Conditional Text-to-Image GANs

02/25/2022
by Zhenxing Zhang, et al.

Text-to-image generation aims to automatically produce a photo-realistic image conditioned on a textual description, with potential applications in art creation, data augmentation, photo editing, and more. Although much effort has been devoted to this task, generating believable, natural scenes remains particularly challenging. To facilitate real-world applications of text-to-image synthesis, we study three questions: 1) How can we ensure that generated samples are believable, realistic, and natural? 2) How can we exploit the latent space of the generator to edit a synthesized image? 3) How can we improve the explainability of a text-to-image generation framework? In this work, we construct two novel data sets (i.e., the Good Bad bird and face data sets), consisting of successfully as well as unsuccessfully generated samples selected according to strict criteria. To acquire high-quality images effectively and efficiently by increasing the probability of generating Good latent codes, we use a dedicated Good/Bad classifier for generated images, built on a pre-trained front end and fine-tuned on the proposed Good Bad data set. We then present a novel algorithm that identifies semantically understandable directions in the latent space of a conditional text-to-image GAN architecture by performing independent component analysis on the pre-trained weight values of the generator. Furthermore, we develop a background-flattening loss (BFL) to improve the background appearance of edited images. Finally, we introduce linear interpolation analysis between pairs of keywords, and extend it to a triangular `linguistic' interpolation, offering a deeper look at what a text-to-image synthesis model has learned within its linguistic embeddings. Our data set is available at https://zenodo.org/record/6283798#.YhkN_ujMI2w.
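The latent-space editing and the linear/triangular interpolation analyses mentioned in the abstract can be sketched as follows. This is a minimal illustration only: the embedding dimensionality, the keyword embeddings, and the semantic direction below are random stand-ins, not outputs of the actual model or its text encoder.

```python
import numpy as np

# Hypothetical 256-dim keyword embeddings (random stand-ins for the
# text encoder's output for three keywords).
rng = np.random.default_rng(0)
emb_a, emb_b, emb_c = rng.standard_normal((3, 256))

def lerp(a, b, t):
    """Linear interpolation between a pair of keyword embeddings.

    t = 0 returns the first embedding, t = 1 the second."""
    return (1.0 - t) * a + t * b

def tri_lerp(a, b, c, w):
    """Triangular ('barycentric') interpolation among three embeddings.

    w is a triple of non-negative weights summing to 1."""
    return w[0] * a + w[1] * b + w[2] * c

# A 5-step linear sweep from emb_a to emb_b.
steps = [lerp(emb_a, emb_b, t) for t in np.linspace(0.0, 1.0, 5)]

# Editing a latent code along a (hypothetical) semantic direction,
# e.g. one recovered by ICA on the generator's weight matrix.
direction = rng.standard_normal(256)
direction /= np.linalg.norm(direction)   # unit-length direction
z = rng.standard_normal(256)             # original latent code
z_edited = z + 2.0 * direction           # scalar controls edit strength
```

Each interpolated embedding (or edited latent code) would then be fed to the generator to render the corresponding image; sweeping `t` between two keywords visualizes how the model transitions between the concepts they denote.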


