Discrete Contrastive Diffusion for Cross-Modal and Conditional Generation

06/15/2022
by Ye Zhu, et al.

Diffusion probabilistic models (DPMs) have become a popular approach to conditional generation, due to their promising results and support for cross-modal synthesis. A key desideratum in conditional synthesis is to achieve high correspondence between the conditioning input and generated output. Most existing methods learn such relationships implicitly, by incorporating the prior into the variational lower bound. In this work, we take a different route: we enhance input-output connections by maximizing their mutual information using contrastive learning. To this end, we introduce a Conditional Discrete Contrastive Diffusion (CDCD) loss and design two contrastive diffusion mechanisms to effectively incorporate it into the denoising process. We formulate CDCD by connecting it with the conventional variational objectives. We demonstrate the efficacy of our approach in evaluations with three diverse, multimodal conditional synthesis tasks: dance-to-music generation, text-to-image synthesis, and class-conditioned image synthesis. On each, we achieve state-of-the-art or higher synthesis quality and improve the input-output correspondence. Furthermore, the proposed approach improves the convergence of diffusion models, reducing the number of required diffusion steps by more than 35% and thereby increasing inference speed.
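To make the idea of maximizing mutual information through contrastive learning concrete, the following is a minimal sketch of the generic InfoNCE objective, which is a standard contrastive loss and not the CDCD loss itself. The function names and the example scores are hypothetical and chosen only for illustration; the abstract does not specify how CDCD scores pairs or selects negatives.

```python
import math


def info_nce_loss(pos_score, neg_scores):
    """Generic InfoNCE loss for one positive pair against a set of negatives.

    Scores are unnormalized similarities (higher means a better match).
    This is an illustrative stand-in, not the CDCD loss from the paper.
    """
    scores = [pos_score] + list(neg_scores)
    m = max(scores)  # subtract the max for numerical stability
    log_denominator = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_denominator - pos_score


def mutual_information_lower_bound(loss, num_candidates):
    """InfoNCE bound: I(c; x) >= log(N) - loss, with N = 1 positive + negatives."""
    return math.log(num_candidates) - loss


# Hypothetical scores: one matched (condition, output) pair vs. three mismatched ones.
loss_matched = info_nce_loss(5.0, [0.0, 0.0, 0.0])
loss_uninformative = info_nce_loss(0.0, [0.0, 0.0, 0.0])
```

When the positive pair scores no higher than the negatives, the loss equals log(N) and the bound on mutual information is zero; pushing the positive score up lowers the loss and tightens the bound, which is the sense in which minimizing a contrastive loss strengthens input-output correspondence.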


research
05/28/2023

Cognitively Inspired Cross-Modal Data Generation Using Diffusion Models

Most existing cross-modal generative methods based on diffusion models u...
research
04/25/2023

CoDi: Co-evolving Contrastive Diffusion Models for Mixed-type Tabular Synthesis

With growing attention to tabular data these days, the attempt to apply ...
research
01/12/2021

Cross-Modal Contrastive Learning for Text-to-Image Generation

The output of text-to-image synthesis systems should be coherent, clear,...
research
06/12/2021

D2C: Diffusion-Denoising Models for Few-shot Conditional Generation

Conditional generative models of high-dimensional images have many appli...
research
09/13/2023

DCTTS: Discrete Diffusion Model with Contrastive Learning for Text-to-speech Generation

In the Text-to-speech(TTS) task, the latent diffusion model has excellen...
research
05/08/2023

Vision Language Pre-training by Contrastive Learning with Cross-Modal Similarity Regulation

Cross-modal contrastive learning in vision language pretraining (VLP) fa...
research
05/09/2023

Exploiting Pseudo Image Captions for Multimodal Summarization

Cross-modal contrastive learning in vision language pretraining (VLP) fa...
