On the Role of Visual Context in Enriching Music Representations

10/28/2022
by Kleanthis Avramidis, et al.

Human perception and experience of music are highly context-dependent. Contextual variability contributes to differences in how we interpret and interact with music, challenging the design of robust models for information retrieval. Incorporating multimodal context from diverse sources provides a promising approach toward modeling this variability. Music presented in media such as movies and music videos provides rich multimodal context that modulates underlying human experiences. However, such context modeling is underexplored, as it requires large amounts of multimodal data along with relevant annotations. Self-supervised learning can help address these challenges by automatically extracting rich, high-level correspondences between different modalities, hence alleviating the need for fine-grained annotations at scale. In this study, we propose VCMR (Video-Conditioned Music Representations), a contrastive learning framework that learns music representations from audio and the accompanying music videos. The contextual visual information enhances representations of music audio, as evaluated on the downstream task of music tagging. Experimental results show that the proposed framework can contribute additive robustness to audio representations and indicate to what extent musical elements are affected or determined by visual context.
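To make the idea of contrastive learning across audio and video more concrete, the following is a minimal sketch of a symmetric InfoNCE-style objective between paired audio and video embeddings. It is an illustration under stated assumptions, not the VCMR implementation: the abstract does not specify the loss, the encoders, or any hyperparameters, so the function name `cross_modal_info_nce`, the `temperature` value, and the use of plain NumPy arrays in place of encoder outputs are all hypothetical choices.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def cross_modal_info_nce(audio_emb, video_emb, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Assumption (not from the paper): row i of `audio_emb` and row i of
    `video_emb` come from the same music video clip and form the positive
    pair; every other row in the batch serves as a negative.
    """
    a = l2_normalize(audio_emb)
    v = l2_normalize(video_emb)
    logits = a @ v.T / temperature  # (batch, batch) similarity matrix
    idx = np.arange(a.shape[0])
    # Audio-to-video direction: each audio clip should pick out its own video.
    loss_a2v = -log_softmax(logits)[idx, idx].mean()
    # Video-to-audio direction: each video should pick out its own audio clip.
    loss_v2a = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_a2v + loss_v2a)
```

In a sketch like this, the loss is small when each audio embedding is most similar to the embedding of its own video and grows when the pairing is broken, which is the sense in which a contrastive objective extracts cross-modal correspondences without manual annotations.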


