
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis

01/30/2023
by Ming Tao, et al.

Synthesizing high-fidelity complex images from text is challenging. Based on large-scale pretraining, autoregressive and diffusion models can synthesize photo-realistic images. Although these large models have shown notable progress, three flaws remain. 1) These models require tremendous training data and parameters to achieve good performance. 2) The multi-step generation design heavily slows the image synthesis process. 3) The synthesized visual features are difficult to control and require delicately designed prompts. To enable high-quality, efficient, fast, and controllable text-to-image synthesis, we propose Generative Adversarial CLIPs, namely GALIP. GALIP leverages the powerful pretrained CLIP model in both the discriminator and the generator. Specifically, we propose a CLIP-based discriminator. The complex scene understanding ability of CLIP enables the discriminator to assess image quality accurately. Furthermore, we propose a CLIP-empowered generator that induces visual concepts from CLIP through bridge features and prompts. The CLIP-integrated generator and discriminator boost training efficiency; as a result, our model requires only about 3% of the parameters while achieving results comparable to large pretrained autoregressive and diffusion models. Moreover, our model achieves 120 times faster synthesis speed and inherits the smooth latent space of GANs. Extensive experimental results demonstrate the excellent performance of our GALIP. Code is available at https://github.com/tobran/GALIP.
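The abstract describes the two CLIP-empowered components only at a high level. Below is a minimal, illustrative PyTorch sketch of how a frozen CLIP image encoder might sit inside the discriminator and how CLIP text features might be mapped to "bridge features" in the generator. All module names, dimensions, and the toy image decoder are assumptions made for illustration; they do not reproduce the authors' implementation (see https://github.com/tobran/GALIP for the official code).

```python
# Illustrative sketch only (not the authors' code): wiring a frozen CLIP
# backbone into both the generator and the discriminator of a GALIP-style GAN.
# `clip_img_enc` stands in for a pretrained, frozen CLIP image encoder;
# feature dimensions and the 64x64 decoder are hypothetical.
import torch
import torch.nn as nn


class CLIPDiscriminator(nn.Module):
    """Scores image realism on top of frozen CLIP visual features."""

    def __init__(self, clip_img_enc: nn.Module, feat_dim: int = 512, txt_dim: int = 512):
        super().__init__()
        self.clip_img_enc = clip_img_enc.eval()      # frozen CLIP image encoder
        for p in self.clip_img_enc.parameters():
            p.requires_grad_(False)
        self.head = nn.Sequential(                   # small learnable adversarial head
            nn.Linear(feat_dim + txt_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
        )

    def forward(self, images, txt_feat):
        img_feat = self.clip_img_enc(images)         # complex-scene features from CLIP
        return self.head(torch.cat([img_feat, txt_feat], dim=-1))


class CLIPGenerator(nn.Module):
    """Maps noise + CLIP text features to bridge features, then to an image."""

    def __init__(self, noise_dim: int = 100, txt_dim: int = 512, bridge_dim: int = 256):
        super().__init__()
        self.to_bridge = nn.Sequential(              # induce visual concepts from text
            nn.Linear(noise_dim + txt_dim, bridge_dim),
            nn.ReLU(),
        )
        self.to_image = nn.Sequential(               # toy decoder to a 64x64 RGB image
            nn.Linear(bridge_dim, 3 * 64 * 64),
            nn.Tanh(),
        )

    def forward(self, noise, txt_feat):
        bridge = self.to_bridge(torch.cat([noise, txt_feat], dim=-1))
        return self.to_image(bridge).view(-1, 3, 64, 64)


# Usage with a stand-in encoder (a real run would load a frozen CLIP model instead):
dummy_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 512))
D = CLIPDiscriminator(dummy_enc)
G = CLIPGenerator()
txt = torch.randn(4, 512)                            # pretend CLIP text features
fake = G(torch.randn(4, 100), txt)                   # (4, 3, 64, 64) generated images
score = D(fake, txt)                                 # (4, 1) realism logits
```

The key design point the sketch tries to convey is that only the small head and the bridge/decoder layers are trainable, while the CLIP encoder stays frozen, which is what keeps the learnable parameter count low.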

