Variational Hetero-Encoder Randomized Generative Adversarial Networks for Joint Image-Text Modeling

05/18/2019
by   Hao Zhang, et al.

For bidirectional joint image-text modeling, we develop a variational hetero-encoder (VHE) randomized generative adversarial network (GAN) that integrates a probabilistic text decoder, a probabilistic image encoder, and a GAN into a coherent end-to-end multi-modality learning framework. VHE randomized GAN (VHE-GAN) encodes an image to decode its associated text, and feeds the variational posterior into the GAN image generator as its source of randomness. We plug three off-the-shelf modules, namely a deep topic model, a ladder-structured image encoder, and StackGAN++, into VHE-GAN, which already achieves competitive performance. This further motivates the development of VHE-raster-scan-GAN, which generates photo-realistic images not only in a multi-scale, low-to-high-resolution manner but also in a hierarchical-semantic, coarse-to-fine fashion. By capturing and relating hierarchical semantic and visual concepts with end-to-end training, VHE-raster-scan-GAN achieves state-of-the-art performance in a wide variety of image-text multi-modality learning and generation tasks. PyTorch code is provided.
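
The data flow described in the abstract (image to variational posterior, posterior sample to both the text decoder and the image generator) can be illustrated with a toy, standard-library-only sketch. This is not the authors' PyTorch implementation. The function names, the random linear weights, the single latent layer, and the choice of a Weibull posterior are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
import random


def softplus(x):
    """Smooth map from the reals to positive values."""
    return math.log1p(math.exp(x))


def encode_image(image, shape_w, scale_w):
    """Hypothetical image encoder: per-dimension Weibull (shape, scale) parameters."""
    shapes = [softplus(sum(w * p for w, p in zip(row, image))) + 1e-3 for row in shape_w]
    scales = [softplus(sum(w * p for w, p in zip(row, image))) + 1e-3 for row in scale_w]
    return shapes, scales


def sample_posterior(shapes, scales, rng):
    """Reparameterized Weibull draw: z = scale * (-ln(1 - u)) ** (1 / shape)."""
    return [lam * (-math.log(1.0 - rng.random())) ** (1.0 / k)
            for k, lam in zip(shapes, scales)]


def decode_text(z, topics):
    """Hypothetical text decoder: one non-negative rate per vocabulary word."""
    return [sum(t * zk for t, zk in zip(row, z)) for row in topics]


def generate_image(z, gen_w):
    """Hypothetical generator: the posterior sample z replaces the usual noise vector."""
    return [math.tanh(sum(w * zk for w, zk in zip(row, z))) for row in gen_w]


rng = random.Random(0)
image = [0.2, 0.5, 0.1, 0.9]          # toy "image" with 4 pixel features
K, V, P = 3, 5, len(image)            # latent size, vocabulary size, pixel count

# Random toy weights standing in for trained parameters (assumption).
shape_w = [[rng.uniform(-1, 1) for _ in range(P)] for _ in range(K)]
scale_w = [[rng.uniform(-1, 1) for _ in range(P)] for _ in range(K)]
topics = [[rng.random() for _ in range(K)] for _ in range(V)]
gen_w = [[rng.uniform(-1, 1) for _ in range(K)] for _ in range(P)]

shapes, scales = encode_image(image, shape_w, scale_w)
z = sample_posterior(shapes, scales, rng)   # shared latent code
word_rates = decode_text(z, topics)         # text side
fake_image = generate_image(z, gen_w)       # image side
```

The point of the sketch is only that a single sample `z` drives both outputs, so the text decoder and the generator are tied to the same image-conditioned latent code rather than to independent noise.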

research
01/12/2021

Cross-Modal Contrastive Learning for Text-to-Image Generation

The output of text-to-image synthesis systems should be coherent, clear,...
research
03/17/2016

Generative Image Modeling using Style and Structure Adversarial Networks

Current generative frameworks use end-to-end learning and generate image...
research
04/17/2022

DR-GAN: Distribution Regularization for Text-to-Image Generation

This paper presents a new Text-to-Image generation model, named Distribu...
research
07/26/2019

VITAL: A Visual Interpretation on Text with Adversarial Learning for Image Labeling

In this paper, we propose a novel way to interpret text information by e...
research
03/21/2017

Recurrent Topic-Transition GAN for Visual Paragraph Generation

A natural image usually conveys rich semantic content and can be viewed ...
research
09/03/2022

DSE-GAN: Dynamic Semantic Evolution Generative Adversarial Network for Text-to-Image Generation

Text-to-image generation aims at generating realistic images which are s...
research
10/23/2019

Divide-and-Conquer Adversarial Learning for High-Resolution Image and Video Enhancement

This paper introduces a divide-and-conquer inspired adversarial learning...
