Self-Supervised Beat Tracking in Musical Signals with Polyphonic Contrastive Learning

01/05/2022
by   Dorian Desblancs, et al.
35

Annotating musical beats is a very long in tedious process. In order to combat this problem, we present a new self-supervised learning pretext task for beat tracking and downbeat estimation. This task makes use of Spleeter, an audio source separation model, to separate a song's drums from the rest of its signal. The first set of signals are used as positives, and by extension negatives, for contrastive learning pre-training. The drum-less signals, on the other hand, are used as anchors. When pre-training a fully-convolutional and recurrent model using this pretext task, an onset function is learned. In some cases, this function was found to be mapped to periodic elements in a song. We found that pre-trained models outperformed randomly initialized models when a beat tracking training set was extremely small (less than 10 examples). When that was not the case, pre-training led to a learning speed-up that caused the model to overfit to the training set. More generally, this work defines new perspectives in the realm of musical self-supervised learning. It is notably one of the first works to use audio source separation as a fundamental component of self-supervision.

READ FULL TEXT

page 10

page 14

page 15

page 29

research
03/17/2021

Self-Supervised Learning of Audio Representations from Permutations with Differentiable Ranking

Self-supervised pre-training using so-called "pretext" tasks has recentl...
research
03/17/2021

Contrastive Learning of Musical Representations

While supervised learning has enabled great advances in many areas of mu...
research
09/21/2023

Self-Supervised Contrastive Learning for Robust Audio-Sheet Music Retrieval Systems

Linking sheet music images to audio recordings remains a key problem for...
research
10/28/2022

Spectrograms Are Sequences of Patches

Self-supervised pre-training models have been used successfully in sever...
research
10/29/2020

Self-supervised Pre-training Reduces Label Permutation Instability of Speech Separation

Speech separation has been well-developed while there are still problems...
research
06/16/2021

Drum-Aware Ensemble Architecture for Improved Joint Musical Beat and Downbeat Tracking

This paper presents a novel system architecture that integrates blind so...
research
03/26/2021

Modeling the Compatibility of Stem Tracks to Generate Music Mashups

A music mashup combines audio elements from two or more songs to create ...

Please sign up or login with your details

Forgot password? Click here to reset