Cross-Modal Contrastive Representation Learning for Audio-to-Image Generation

07/20/2022
by   HaeChun Chung, et al.

Multiple modalities of the same information provide different perspectives on it, which can deepen understanding. Generating data of one modality from existing data of another can therefore be valuable. In this paper, we investigate the cross-modal audio-to-image generation problem and propose Cross-Modal Contrastive Representation Learning (CMCRL) to extract useful features from audio and use them in the generation phase. Experimental results show that CMCRL improves the quality of generated images over previous approaches.
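The abstract does not spell out the contrastive objective, but cross-modal contrastive learning is commonly implemented as a symmetric InfoNCE loss that pulls paired audio and image embeddings together while pushing mismatched pairs apart. The sketch below is an illustrative assumption of that setup, not the paper's exact formulation; the function name, temperature value, and symmetric form are all hypothetical.

```python
import numpy as np

def info_nce_loss(audio_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    audio_emb, image_emb: (batch, dim) arrays; row i of each is a
    positive (matching) pair.  Temperature and the symmetric form are
    illustrative assumptions, not taken from the CMCRL paper.
    """
    # L2-normalize so dot products become cosine similarities.
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    logits = a @ v.T / temperature          # (batch, batch) similarities
    labels = np.arange(len(a))              # positives lie on the diagonal

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)   # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()  # -log p(positive)

    # Symmetric: audio-to-image and image-to-audio retrieval directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Features trained this way can then condition an image generator: the audio encoder's output serves as the latent input in the generation phase.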

