Unconditional Image-Text Pair Generation with Multimodal Cross Quantizer

04/15/2022
by Hyungyung Lee, et al.

Though deep generative models have attracted considerable attention, most existing work targets unimodal generation tasks. In this paper, we explore a new method for unconditional image-text pair generation. We propose MXQ-VAE, a vector quantization method for multimodal image-text representation. MXQ-VAE accepts a paired image and text as input and learns a joint quantized representation space, so that the image-text pair can be converted to a sequence of unified indices. We can then use autoregressive generative models to model the joint image-text representation and even perform unconditional image-text pair generation. Extensive experimental results demonstrate that our approach effectively generates semantically consistent image-text pairs and also enhances meaningful alignment between image and text.
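The core idea described above, quantizing a paired image and text into one sequence of unified codebook indices, can be sketched as a nearest-neighbour lookup over a concatenated image-text feature sequence. The sketch below is illustrative only: the feature shapes, codebook size, and random inputs are assumptions, not the paper's actual architecture or training procedure.

```python
import numpy as np

def quantize(z, codebook):
    """Map each continuous feature vector in z (seq_len, d) to the index
    of its nearest codebook entry (K, d), returning indices and the
    corresponding quantized vectors."""
    # Squared Euclidean distance between every feature and every code.
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)          # discrete index per position
    return idx, codebook[idx]        # unified indices, quantized features

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 8))     # hypothetical 16-entry codebook, dim 8
img_feat = rng.normal(size=(4, 8))      # stand-in for encoded image patches
txt_feat = rng.normal(size=(3, 8))      # stand-in for encoded text tokens
joint = np.concatenate([img_feat, txt_feat], axis=0)  # one joint sequence
indices, z_q = quantize(joint, codebook)
print(indices.shape, z_q.shape)         # 7 unified indices, 7 quantized vectors
```

Because both modalities share one codebook, the resulting index sequence can be fed to a single autoregressive model, which is what enables unconditional generation of the pair.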

Related research:

12/11/2019 · Multimodal Generative Models for Compositional Representation Learning
As deep neural networks become more adept at traditional tasks, many of ...

05/17/2023 · What You See is What You Read? Improving Text-Image Alignment Evaluation
Automatically determining whether a text and a corresponding image are s...

03/21/2023 · MAGVLT: Masked Generative Vision-and-Language Transformer
While generative modeling on multimodal image-text data has been activel...

10/19/2021 · Unifying Multimodal Transformer for Bi-directional Image and Text Generation
We study the joint learning of image-to-text and text-to-image generatio...

09/07/2022 · Benchmarking Multimodal Variational Autoencoders: GeBiD Dataset and Toolkit
Multimodal Variational Autoencoders (VAEs) have been a subject of intens...

06/26/2023 · Decompose and Realign: Tackling Condition Misalignment in Text-to-Image Diffusion Models
Text-to-image diffusion models have advanced towards more controllable g...

04/18/2022 · Simultaneous Multiple-Prompt Guided Generation Using Differentiable Optimal Transport
Recent advances in deep learning, such as powerful generative models and...