Self-Supervised Learning of Music-Dance Representation through Explicit-Implicit Rhythm Synchronization

07/07/2022
by   Jiashuo Yu, et al.

Although audio-visual representation learning has proven useful in many downstream tasks, representing dance videos, which are more specific and are always accompanied by music with complex auditory content, remains challenging and largely unexplored. Considering the intrinsic alignment between a dancer's cadenced movements and the musical rhythm, we introduce MuDaR, a novel Music-Dance Representation learning framework that synchronizes music and dance rhythms both explicitly and implicitly. Specifically, we derive dance rhythms from visual appearance and motion cues, inspired by music rhythm analysis, and temporally align them with their musical counterparts, which are extracted from the amplitude of the sound intensity. Meanwhile, we exploit the implicit coherence between the rhythms embedded in the audio and visual streams via contrastive learning: the model learns a joint embedding by predicting the temporal consistency of audio-visual pairs. The resulting music-dance representation, together with the ability to detect audio and visual rhythms, can further be applied to three downstream tasks: (a) dance classification, (b) music-dance retrieval, and (c) music-dance retargeting. Extensive experiments demonstrate that our framework outperforms other self-supervised methods by a large margin.
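To make the two ideas in the abstract concrete, the sketch below illustrates (1) extracting a crude music rhythm signal from the amplitude envelope of the waveform and (2) an InfoNCE-style contrastive objective that scores temporally synchronized audio-visual pairs higher than misaligned ones. All function names, hyperparameters, and the choice of loss are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of amplitude-based rhythm extraction and a
# temporal-consistency contrastive loss; not the MuDaR reference code.

import numpy as np
import torch
import torch.nn.functional as F


def amplitude_rhythm(waveform: np.ndarray, hop: int = 512) -> np.ndarray:
    """Frame-level sound-intensity envelope; its peaks serve as a rhythm proxy."""
    # Non-overlapping frames of length `hop`, RMS amplitude per frame.
    frames = np.lib.stride_tricks.sliding_window_view(waveform, hop)[::hop]
    envelope = np.sqrt((frames ** 2).mean(axis=1))
    # Normalize to [0, 1] so peak picking does not depend on recording gain.
    return (envelope - envelope.min()) / (envelope.max() - envelope.min() + 1e-8)


def temporal_contrastive_loss(audio_emb: torch.Tensor,
                              video_emb: torch.Tensor,
                              temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch: the i-th audio clip should match the
    i-th (synchronized) video clip; other clips in the batch act as negatives."""
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(video_emb, dim=-1)
    logits = a @ v.t() / temperature                    # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)  # positives on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

In this reading, the explicit branch aligns the visual rhythm sequence with `amplitude_rhythm`'s peaks, while the implicit branch trains audio and video encoders with a loss of the form above so that temporally consistent pairs share a joint embedding.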

