StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis

03/29/2022
by Zhiheng Li, et al.

Although progress has been made in text-to-image synthesis, previous methods fall short of generalizing to unseen or underrepresented attribute compositions in the input text. This lack of compositionality can have severe implications for robustness and fairness, e.g., the inability to synthesize face images of underrepresented demographic groups. In this paper, we introduce a new framework, StyleT2I, to improve the compositionality of text-to-image synthesis. Specifically, we propose a CLIP-guided Contrastive Loss to better distinguish the different compositions expressed by different sentences. To further improve compositionality, we design a novel Semantic Matching Loss and a Spatial Constraint that identify attributes' latent directions for the intended spatial-region manipulations, leading to better-disentangled latent representations of attributes. Based on the identified latent directions, we propose Compositional Attribute Adjustment, which adjusts the latent code to achieve better compositionality in image synthesis. In addition, we leverage ℓ_2-norm regularization of the identified latent directions (a norm penalty) to strike a balance between image-text alignment and image fidelity. In the experiments, we devise a new dataset split and an evaluation metric to evaluate the compositionality of text-to-image synthesis models. The results show that StyleT2I outperforms previous approaches in the consistency between the input text and the synthesized images, and achieves higher fidelity.
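The two ingredients the abstract names most concretely — a CLIP-guided contrastive loss over sentence compositions and a norm-penalized latent-direction adjustment — can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation: the embedding dimensions, the InfoNCE-style form of the contrastive loss, the temperature `tau`, the step size `alpha`, and the shrinkage form of the norm penalty are all assumptions for exposition.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def clip_guided_contrastive_loss(img_emb, txt_embs, pos_idx, tau=0.07):
    """InfoNCE-style contrastive loss (assumed form): pull the synthesized
    image's CLIP embedding toward its paired sentence embedding and push it
    away from sentences describing other attribute compositions."""
    sims = np.array([cosine(img_emb, t) for t in txt_embs]) / tau
    logits = sims - sims.max()            # subtract max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[pos_idx])

def compositional_attribute_adjustment(w, directions, attrs,
                                       alpha=1.0, norm_penalty=0.1):
    """Sketch of Compositional Attribute Adjustment: shift the latent code
    along each required attribute's identified latent direction. The l2-norm
    penalty is modeled here as shrinking the total shift, so the latent code
    stays close to the original and image fidelity is preserved."""
    shift = sum(alpha * directions[a] for a in attrs)
    shift = shift / (1.0 + norm_penalty * np.linalg.norm(shift))
    return w + shift
```

In this reading, a lower contrastive loss at the paired sentence means the image embedding already distinguishes the intended composition from the distractor sentences, and the shrinkage factor trades image-text alignment (a larger shift) against fidelity (staying near the original latent code).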


