Equivariant Self-Supervision for Musical Tempo Estimation

09/03/2022
by Elio Quinton, et al.

Self-supervised methods have emerged in recent years as a promising avenue for representation learning, since they alleviate the need for labelled datasets, which are scarce and expensive to acquire. Contrastive methods are a popular choice for self-supervision in the audio domain; they typically provide a learning signal by forcing the model to be invariant to some transformations of the input. These methods, however, need measures such as negative sampling or some form of regularisation to prevent the model from collapsing to a trivial solution. In this work, instead of invariance, we propose to use equivariance as a self-supervision signal to learn audio tempo representations from unlabelled data. We derive a simple loss function that prevents the network from collapsing to a trivial solution during training, without requiring any form of regularisation or negative sampling. Our experiments show that it is possible to learn meaningful representations for tempo estimation by relying solely on equivariant self-supervision, achieving performance comparable with supervised methods on several benchmarks. As an added benefit, our method requires only moderate compute resources and therefore remains accessible to a wide research community.
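To make the contrast between invariance and equivariance concrete, the sketch below shows one way an equivariant objective for tempo could look. It is an illustration under stated assumptions, not necessarily the loss derived in the paper: we assume two views of the same clip are produced by time-stretching with known rates, that stretching by a rate multiplies the tempo by that rate, and that the model emits a positive scalar per view. The function name `ratio_equivariance_loss` and all variable names are hypothetical.

```python
import numpy as np


def ratio_equivariance_loss(z_i, z_j, rate_i, rate_j):
    """Mean absolute gap between the ratio of outputs and the ratio of stretch rates.

    Assumption (not taken from the abstract): an equivariant tempo model should
    satisfy z_i / z_j == rate_i / rate_j for two stretched views of one clip.
    """
    z_i = np.asarray(z_i, dtype=float)
    z_j = np.asarray(z_j, dtype=float)
    rate_i = np.asarray(rate_i, dtype=float)
    rate_j = np.asarray(rate_j, dtype=float)
    return float(np.mean(np.abs(z_i / z_j - rate_i / rate_j)))


# Toy data: hidden tempi (never shown to the loss) and two random stretch rates per clip.
rng = np.random.default_rng(0)
true_tempo = rng.uniform(60.0, 180.0, size=8)
rate_i = rng.uniform(0.8, 1.2, size=8)
rate_j = rng.uniform(0.8, 1.2, size=8)

# An idealised equivariant predictor: its output tracks the tempo of each
# stretched view, up to an arbitrary constant scale.
scale = 0.01
equivariant_loss = ratio_equivariance_loss(
    scale * true_tempo * rate_i, scale * true_tempo * rate_j, rate_i, rate_j
)

# A collapsed predictor that outputs the same constant for every input.
collapsed_loss = ratio_equivariance_loss(np.ones(8), np.ones(8), rate_i, rate_j)

print(f"equivariant predictor loss: {equivariant_loss:.3e}")
print(f"collapsed predictor loss:   {collapsed_loss:.3e}")
```

In this toy setting the constant predictor cannot reach zero loss whenever the two stretch rates differ, which is the intuition behind avoiding collapse without negative samples. Note that a ratio-based objective of this kind only pins the output down to an unknown scale factor, so some calibration step would still be needed to read the output as beats per minute.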


