Towards Contrastive Learning in Music Video Domain

09/01/2023
by Karel Veldkamp, et al.

Contrastive learning is a powerful way of learning multimodal representations across various domains, such as image-caption retrieval and audio-visual representation learning. In this work, we investigate whether these findings generalize to the domain of music videos. Specifically, we create a dual encoder for the audio and video modalities and train it using a bidirectional contrastive loss. For the experiments, we use an industry dataset containing 550,000 music videos as well as the public Million Song Dataset, and we evaluate the quality of the learned representations on the downstream tasks of music tagging and genre classification. Our results indicate that pre-trained networks without contrastive fine-tuning outperform our contrastive learning approach on both tasks. To better understand why contrastive learning was not successful for music videos, we perform a qualitative analysis of the learned representations, revealing reasons why contrastive learning might have difficulty uniting embeddings from two modalities. Based on these findings, we outline possible directions for future work. To facilitate the reproducibility of our results, we share our code and the pre-trained model.
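The abstract does not give the exact form of the bidirectional contrastive loss. A common choice for dual encoders is the symmetric InfoNCE objective used in CLIP-style training, where each audio clip must pick out its own video among all videos in the batch, and vice versa. The sketch below assumes that form, cosine similarity, and a temperature of 0.1; these are illustrative assumptions, not settings taken from the paper, and the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cross_entropy_diag(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct 'class' for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def bidirectional_contrastive_loss(
    audio: List[Vector], video: List[Vector], temperature: float = 0.1
) -> float:
    """Symmetric InfoNCE loss over a batch of paired audio/video embeddings.

    Assumption: audio[i] and video[i] come from the same music video
    (the positive pair); every other pairing in the batch is a negative.
    """
    a = [l2_normalize(x) for x in audio]
    v = [l2_normalize(x) for x in video]
    # sim[i][j] = cos(audio_i, video_j) / temperature
    sim = [[sum(p * q for p, q in zip(ai, vj)) / temperature for vj in v] for ai in a]
    sim_t = [list(col) for col in zip(*sim)]  # video-to-audio direction
    # Average the audio->video and video->audio losses.
    return 0.5 * (cross_entropy_diag(sim) + cross_entropy_diag(sim_t))


if __name__ == "__main__":
    audio = [[1.0, 0.0], [0.0, 1.0]]
    aligned_video = [[1.0, 0.0], [0.0, 1.0]]
    shuffled_video = [[0.0, 1.0], [1.0, 0.0]]
    print(bidirectional_contrastive_loss(audio, aligned_video))   # small
    print(bidirectional_contrastive_loss(audio, shuffled_video))  # large
```

Correctly paired embeddings yield a loss near zero, while shuffled pairs are penalized heavily. Averaging the two directions keeps either encoder from dominating the objective.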

04/24/2023

Pre-Training Strategies Using Contrastive Learning and Playlist Information for Music Classification and Similarity

In this work, we investigate an approach that relies on contrastive lear...
04/19/2023

EC^2: Emergent Communication for Embodied Control

Embodied control requires agents to leverage multi-modal pre-training to...
08/05/2023

Bootstrapping Contrastive Learning Enhanced Music Cold-Start Matching

We study a particular matching task we call Music Cold-Start Matching. I...
10/28/2022

On the Role of Visual Context in Enriching Music Representations

Human perception and experience of music is highly context-dependent. Co...
09/19/2023

Motif-Centric Representation Learning for Symbolic Music

Music motif, as a conceptual building block of composition, is crucial f...
09/28/2022

Learning Deep Representations via Contrastive Learning for Instance Retrieval

Instance-level Image Retrieval (IIR), or simply Instance Retrieval, deal...
06/06/2023

Systematic Analysis of Music Representations from BERT

There have been numerous attempts to represent raw data as numerical vec...
