Towards Open-World Text-Guided Face Image Generation and Manipulation

04/18/2021
by Weihao Xia, et al.

Existing text-guided image synthesis methods produce results of limited quality, with resolutions of at most 256×256, and accept textual instructions drawn only from a small corpus. In this work, we propose a unified framework for both face image generation and manipulation that produces diverse, high-quality images at an unprecedented resolution of 1024×1024 from multimodal inputs. More importantly, our method supports open-world scenarios for both images and text, without any re-training, fine-tuning, or post-processing. Specifically, we propose a new paradigm of text-guided image generation and manipulation that exploits the properties of a pretrained GAN model. The paradigm comprises two novel strategies. The first trains a text encoder to produce latent codes aligned with the hierarchical semantics of the pretrained GAN model. The second directly optimizes latent codes in the latent space of the pretrained GAN model under the guidance of a pretrained language model. The latent codes can be randomly sampled from a prior distribution or inverted from a given image, which provides inherent support for both image generation and manipulation from multi-modal inputs, such as sketches or semantic labels, with textual guidance. To facilitate text-guided multi-modal synthesis, we introduce Multi-Modal CelebA-HQ, a large-scale dataset consisting of real face images with corresponding semantic segmentation maps, sketches, and textual descriptions. Extensive experiments on the introduced dataset demonstrate the superior performance of our proposed method. Code and data are available at https://github.com/weihaox/TediGAN.
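The second strategy described above — directly optimizing a latent code under guidance from a pretrained model — follows the familiar pattern of gradient descent in a frozen generator's latent space. The sketch below is a toy stand-in, not the paper's implementation: a random linear map plays the role of the pretrained GAN generator, and a squared distance to a fixed "text embedding" substitutes for the language-model guidance score; in practice both would be large pretrained networks (e.g. StyleGAN and a CLIP-style model).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions, not the paper's actual models):
# - `W` plays the role of a frozen, pretrained generator, mapping a
#   4-dim latent code to an 8-dim "image embedding".
# - `text_emb` stands in for the embedding of the textual instruction;
#   the guidance loss is a squared distance, mimicking a similarity
#   score from a pretrained language model.
W = rng.standard_normal((8, 4))
text_emb = rng.standard_normal(8)

def generate(z):
    """Toy generator: latent code -> image embedding."""
    return W @ z

def guidance_loss(z):
    """Distance between the generated embedding and the text target."""
    diff = generate(z) - text_emb
    return float(diff @ diff)

def grad(z):
    """Analytic gradient of the guidance loss w.r.t. the latent code."""
    return 2.0 * W.T @ (generate(z) - text_emb)

# The latent code may be sampled from the prior (image generation) or
# inverted from a given image (manipulation); here we sample it.
z = rng.standard_normal(4)
initial_loss = guidance_loss(z)

# Optimize the latent code directly; the generator stays frozen.
lr = 0.01
for _ in range(1000):
    z -= lr * grad(z)

final_loss = guidance_loss(z)
```

Only the latent code is updated; the generator's weights never change, which is what lets the same machinery serve generation (start from a prior sample) and manipulation (start from an inverted code) without re-training.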


