Emotion-Aligned Contrastive Learning Between Images and Music

08/24/2023
by Shanti Stewart, et al.

Traditional music search engines rely on retrieval methods that match natural language queries with music metadata. There have been increasing efforts to expand retrieval methods to consider the audio characteristics of music itself, using queries of various modalities including text, video, and speech. While most approaches aim to match general music semantics to the input queries, only a few focus on affective qualities. In this work, we address the task of retrieving emotionally relevant music from image queries by learning an affective alignment between images and music audio. Our approach focuses on learning an emotion-aligned joint embedding space between images and music. This embedding space is learned via emotion-supervised contrastive learning, using an adapted cross-modal version of the SupCon loss. We evaluate the joint embeddings through cross-modal retrieval tasks (image-to-music and music-to-image) based on emotion labels. Furthermore, we investigate the generalizability of the learned music embeddings via automatic music tagging. Our experiments show that the proposed approach successfully aligns images and music, and that the learned embedding space is effective for cross-modal retrieval applications.
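The emotion-supervised contrastive objective described above can be sketched as a cross-modal variant of the SupCon loss: anchors come from one modality (e.g. images), candidates from the other (e.g. music), and any cross-modal pair sharing an emotion label is treated as a positive. The sketch below is a minimal NumPy illustration under these assumptions, not the authors' implementation; the function name, temperature value, and batch layout are hypothetical.

```python
import numpy as np

def cross_modal_supcon(z_a, z_b, labels, tau=0.1):
    """Cross-modal SupCon-style loss (illustrative sketch).

    z_a: (N, d) anchor-modality embeddings (e.g. images)
    z_b: (N, d) candidate-modality embeddings (e.g. music)
    labels: (N,) emotion labels; same-label cross-modal pairs are positives
    tau: softmax temperature
    """
    # L2-normalize so dot products are cosine similarities
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    sim = z_a @ z_b.T / tau  # (N, N) cross-modal similarity logits
    # log-softmax over all candidates for each anchor
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    # positives: candidates sharing the anchor's emotion label
    pos_mask = (labels[:, None] == labels[None, :]).astype(float)
    # average negative log-probability over each anchor's positives
    per_anchor = -(pos_mask * log_prob).sum(axis=1) / pos_mask.sum(axis=1)
    return per_anchor.mean()
```

A symmetric objective for both retrieval directions would average the loss over both anchor choices, e.g. `0.5 * (cross_modal_supcon(z_img, z_mus, y) + cross_modal_supcon(z_mus, z_img, y))`.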


Related research

- Emotion Embedding Spaces for Matching Music to Stories (11/26/2021): Content creators often use music to enhance their stories, as it can be ...
- Textless Speech-to-Music Retrieval Using Emotion Similarity (03/19/2023): We introduce a framework that recommends music based on the emotions of ...
- Enriched Music Representations with Multiple Cross-modal Contrastive Learning (04/01/2021): Modeling various aspects that make a music piece unique is a challenging ...
- Learning Embodied Semantics via Music and Dance Semiotic Correlations (03/25/2019): Music semantics is embodied, in the sense that meaning is biologically m...
- Unified Pretraining Target Based Video-music Retrieval With Music Rhythm And Video Optical Flow Information (09/18/2023): Background music (BGM) can enhance the video's emotion. However, selecti...
- Emotion-Based End-to-End Matching Between Image and Music in Valence-Arousal Space (08/22/2020): Both images and music can convey rich semantics and are widely used to i...
- MuLan: A Joint Embedding of Music Audio and Natural Language (08/26/2022): Music tagging and content-based retrieval systems have traditionally bee...
