Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation

12/01/2021
by Woncheol Shin, et al.

Recently, vector-quantized image modeling has demonstrated impressive performance on generation tasks such as text-to-image generation. However, we discover that current image quantizers do not satisfy translation equivariance in the quantized space due to aliasing, which degrades performance in downstream text-to-image and image-to-text generation, even in simple experimental setups. Instead of focusing on anti-aliasing, we take a direct approach to encourage translation equivariance in the quantized space. In particular, we explore a desirable property of image quantizers, called 'Translation Equivariance in the Quantized Space', and propose a simple but effective way to achieve it by regularizing the codebook embedding vectors toward orthogonality. Using this method, we improve accuracy by +22, outperforming the VQGAN.
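
The idea of regularizing orthogonality in the codebook embedding vectors could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `l2_normalize` and `orthogonality_penalty`, the L2 normalization step, and the mean-squared form of the penalty are all hypothetical choices. The penalty measures how far the Gram matrix of the normalized embeddings is from the identity, so it is zero exactly when the embeddings are mutually orthogonal.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def orthogonality_penalty(codebook):
    """Mean squared deviation of the normalized Gram matrix from the identity.

    codebook: list of K embedding vectors, each a list of D floats.
    Assumption: this specific form is illustrative, not taken from the paper.
    """
    unit = [l2_normalize(v) for v in codebook]
    k = len(unit)
    total = 0.0
    for i in range(k):
        for j in range(k):
            # Cosine similarity between embeddings i and j.
            gram_ij = sum(a * b for a, b in zip(unit[i], unit[j]))
            target = 1.0 if i == j else 0.0
            total += (gram_ij - target) ** 2
    return total / (k * k)


# Two toy codebooks: one with orthogonal entries, one with collinear entries.
orthogonal = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
collinear = [[1.0, 0.0, 0.0], [3.0, 0.0, 0.0]]

print(orthogonality_penalty(orthogonal))  # 0.0
print(orthogonality_penalty(collinear))   # 0.5
```

In a training setup, a term like this would presumably be added to the quantizer's loss with a weighting coefficient, so that minimizing it pushes the codebook entries apart in direction.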


