Cross-Modal Music-Video Recommendation: A Study of Design Choices

04/30/2021
by   Laure Prétet, et al.

In this work, we study music/video cross-modal recommendation, i.e. recommending a music track for a video or vice versa. We rely on a self-supervised learning paradigm to learn from a large amount of unlabelled data. More precisely, we jointly learn audio and video embeddings by using their co-occurrence in music-video clips. We build upon a recent video-music retrieval system (the VM-NET), which originally relies on an audio representation obtained from a set of statistics computed over handcrafted features. We demonstrate that using learned audio representations, such as the audio embeddings provided by the pre-trained MuSimNet, OpenL3, MusicCNN or by AudioSet, largely improves recommendations. We also validate the use of the cross-modal triplet loss originally proposed in the VM-NET, compared to the binary cross-entropy loss commonly used in self-supervised learning. We perform all our experiments using the Music Video Dataset (MVD).
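To make the idea of a cross-modal triplet loss concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the VM-NET's exact formulation: it assumes cosine similarity on L2-normalised embeddings, a margin of 0.2, and that every non-matching item in the batch serves as a negative. The function names `l2_normalize` and `cross_modal_triplet_loss` are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm (eps guards against zero vectors)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def cross_modal_triplet_loss(audio_emb, video_emb, margin=0.2):
    """Hinge-based triplet loss over a batch of paired embeddings.

    Row i of audio_emb and row i of video_emb are assumed to come from
    the same music-video clip (the positive pair). All other rows in the
    batch act as negatives. Requires a batch of at least two pairs.
    """
    a = l2_normalize(np.asarray(audio_emb, dtype=float))
    v = l2_normalize(np.asarray(video_emb, dtype=float))

    sim = a @ v.T              # sim[i, j]: cosine similarity of audio i, video j
    pos = np.diag(sim)         # similarity of each matching pair
    n = sim.shape[0]
    negatives = ~np.eye(n, dtype=bool)  # mask selecting non-matching pairs

    # Audio as anchor: a wrong video j should score lower than the right one.
    audio_to_video = np.maximum(0.0, margin + sim - pos[:, None])
    # Video as anchor: a wrong audio i should score lower than the right one.
    video_to_audio = np.maximum(0.0, margin + sim - pos[None, :])

    return 0.5 * (audio_to_video[negatives].mean()
                  + video_to_audio[negatives].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.normal(size=(4, 8))   # toy audio embeddings (assumed shape)
    video = rng.normal(size=(4, 8))   # toy video embeddings (assumed shape)
    print(cross_modal_triplet_loss(audio, video))
```

The loss is zero when every matching pair is more similar than every non-matching pair by at least the margin, in both retrieval directions. A binary cross-entropy alternative would instead classify each audio-video pair as matching or not.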

