Emotion-Based End-to-End Matching Between Image and Music in Valence-Arousal Space

08/22/2020
by Sicheng Zhao et al.

Both images and music can convey rich semantics and are widely used to induce specific emotions. Matching images and music that share similar emotions may help make emotion perception stronger and more vivid. Existing emotion-based image and music matching methods either employ a limited set of categorical emotion states, which cannot fully capture the complexity and subtlety of emotions, or train the matching model with an impractical multi-stage pipeline. In this paper, we study end-to-end matching between image and music based on emotions in the continuous valence-arousal (VA) space. First, we construct a large-scale dataset, termed Image-Music-Emotion-Matching-Net (IMEMNet), with over 140K image-music pairs. Second, we propose cross-modal deep continuous metric learning (CDCML) to learn a shared latent embedding space that preserves the cross-modal similarity relationship in the continuous matching space. Finally, we refine the embedding space by further preserving the single-modal emotion relationship in the VA spaces of both images and music. The metric learning in the embedding space and the task regression in the label space are jointly optimized for both cross-modal matching and single-modal VA prediction. Extensive experiments conducted on IMEMNet demonstrate the superiority of CDCML for emotion-based image and music matching compared to state-of-the-art approaches.
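To make the joint objective concrete, here is a minimal numerical sketch of the idea the abstract describes: image and music features are projected into a shared embedding space, a metric-learning term encourages embedding distance to track the ground-truth emotion similarity of each image-music pair, and regression heads predict the VA labels of each modality. The linear projections, dimensions, and the squared-error form of the similarity term are illustrative assumptions, not the paper's exact CDCML formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IMG, D_MUS, D_EMB = 512, 128, 64  # assumed feature/embedding sizes

# Linear projections into the shared embedding space
# (stand-ins for the deep image and music encoders).
W_img = rng.normal(0, 0.01, (D_IMG, D_EMB))
W_mus = rng.normal(0, 0.01, (D_MUS, D_EMB))
# Regression heads mapping embeddings to (valence, arousal).
W_va_img = rng.normal(0, 0.01, (D_EMB, 2))
W_va_mus = rng.normal(0, 0.01, (D_EMB, 2))

def joint_loss(x_img, x_mus, va_img, va_mus, sim, alpha=1.0, beta=1.0):
    """Metric-learning loss in the embedding space plus VA regression losses.

    sim is the ground-truth emotion similarity of each image-music pair,
    e.g. derived from the distance between their VA labels."""
    z_img = x_img @ W_img  # (N, D_EMB) image embeddings
    z_mus = x_mus @ W_mus  # (N, D_EMB) music embeddings
    # Cross-modal term: embedding distance should track (1 - similarity).
    dist = np.linalg.norm(z_img - z_mus, axis=1)
    l_cross = np.mean((dist - (1.0 - sim)) ** 2)
    # Single-modal terms: predict VA labels from each modality's embedding.
    l_img = np.mean((z_img @ W_va_img - va_img) ** 2)
    l_mus = np.mean((z_mus @ W_va_mus - va_mus) ** 2)
    return l_cross + alpha * l_img + beta * l_mus

# Toy batch of 4 image-music pairs with VA labels in [-1, 1].
N = 4
loss = joint_loss(rng.normal(size=(N, D_IMG)), rng.normal(size=(N, D_MUS)),
                  rng.uniform(-1, 1, (N, 2)), rng.uniform(-1, 1, (N, 2)),
                  sim=rng.uniform(0, 1, N))
print(float(loss))
```

In training, all three terms would be minimized jointly by gradient descent over the encoder and head parameters, so the shared space serves both retrieval (cross-modal matching) and VA prediction.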
