Vision-Language Matching for Text-to-Image Synthesis via Generative Adversarial Networks

08/20/2022
by Qingrong Cheng, et al.

Text-to-image synthesis aims to generate a photo-realistic and semantically consistent image from a given text description. Images synthesized by off-the-shelf models often omit components that appear in the corresponding real image and text description, which degrades both image quality and textual-visual consistency. To address this issue, we propose a novel Vision-Language Matching strategy for text-to-image synthesis, named VLMGAN*, which introduces a dual vision-language matching mechanism to strengthen image quality and semantic consistency. The dual mechanism comprises textual-visual matching between the generated image and its text description, and a visual-visual consistency constraint between the synthesized image and the real image. Given a text description, VLMGAN* first encodes it into textual features and then feeds them to a dual vision-language matching-based generative model to synthesize a photo-realistic, textually consistent image. Moreover, the popular evaluation metrics for text-to-image synthesis are borrowed from unconditional image generation and mainly assess the realism and diversity of the synthesized images. We therefore introduce the Vision-Language Matching Score (VLMS), a metric that accounts for both image quality and the semantic consistency between the synthesized image and its description. The proposed dual multi-level vision-language matching strategy can be applied to other text-to-image synthesis methods; we instantiate it on two popular baselines, denoted VLMGAN+AttnGAN and VLMGAN+DFGAN. Experimental results on two widely used datasets show that the model achieves significant improvements over other state-of-the-art methods.
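
To make the dual matching idea concrete, here is a minimal PyTorch sketch of what such an objective might look like: a contrastive textual-visual term that pulls each generated image toward its own description, plus a visual-visual term that pulls it toward the ground-truth image. Everything here is an illustrative assumption, not the paper's implementation: the encoders, embedding shapes, function names (`dual_vision_language_matching_loss`, `vision_language_matching_score`), and the cosine-similarity form of the VLMS stand-in are hypothetical.

```python
import torch
import torch.nn.functional as F

def dual_vision_language_matching_loss(fake_img_emb, real_img_emb, text_emb,
                                       temperature=0.1):
    """Hypothetical dual matching objective (not the paper's exact loss).

    All inputs are assumed to be (batch, dim) feature vectors from some
    pretrained image/text encoders.
    """
    fake_img_emb = F.normalize(fake_img_emb, dim=-1)
    real_img_emb = F.normalize(real_img_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Textual-visual matching: contrastive loss over the batch, so each
    # generated image is most similar to its own description.
    logits = fake_img_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    tv_loss = F.cross_entropy(logits, targets)

    # Visual-visual consistency: generated-image features should align
    # with the real-image features of the same sample.
    vv_loss = (1.0 - (fake_img_emb * real_img_emb).sum(dim=-1)).mean()

    return tv_loss + vv_loss

def vision_language_matching_score(img_emb, text_emb):
    """Illustrative stand-in for a VLMS-style metric: average cosine
    similarity between image and text embeddings over an evaluation set.
    The paper's actual VLMS formulation is not reproduced here."""
    img_emb = F.normalize(img_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    return (img_emb * text_emb).sum(dim=-1).mean()
```

In this sketch, a higher score means the synthesized images sit closer to their descriptions in the shared embedding space, which is the property VLMS is intended to capture alongside image quality.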

