Self-Supervised Learning of Audio Representations from Permutations with Differentiable Ranking

03/17/2021
by   Andrew N. Carr, et al.

Self-supervised pre-training using so-called "pretext" tasks has recently shown impressive performance across a wide range of modalities. In this work, we advance self-supervised learning from permutations by pre-training a model to reorder shuffled parts of the spectrogram of an audio signal, improving downstream classification performance. We make two main contributions. First, we overcome the main challenges of integrating permutation inversions into an end-to-end training scheme, using recent advances in differentiable ranking. Previous work sidestepped these challenges by casting the reordering task as classification over a fixed set of permutations, which fundamentally restricts the space of permutations that can be exploited. Our experiments validate that learning from all possible permutations improves the quality of the pre-trained representations over using a limited, fixed set. Second, we show that inverting permutations is a meaningful pretext task for learning audio representations in an unsupervised fashion. In particular, we improve instrument classification and pitch estimation of musical notes by reordering spectrogram patches in the time-frequency space.
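The core ingredient is a differentiable relaxation of the ranking/sorting operator, so that the network's predicted patch positions can be turned into a (soft) permutation and trained end-to-end. The paper relies on recent fast differentiable ranking; as a minimal, hedged illustration, the sketch below instead uses a NeuralSort-style soft permutation matrix (Grover et al., 2019), a related relaxation that is easy to write in plain NumPy. All names here (`soft_permutation`, the temperature `tau`) are illustrative, not the authors' implementation.

```python
import numpy as np

def soft_permutation(scores, tau=1.0):
    """NeuralSort-style relaxation of sorting.

    Given one score per shuffled patch (a stand-in for a network's
    predicted positions), returns an n x n row-stochastic matrix that
    approaches the permutation matrix sorting `scores` in *descending*
    order as tau -> 0. Being built from softmaxes, it is differentiable
    in `scores`, so a reordering loss can be backpropagated through it.
    """
    s = np.asarray(scores, dtype=float).reshape(-1, 1)   # (n, 1)
    n = s.shape[0]
    A = np.abs(s - s.T)                                  # pairwise |s_i - s_j|
    B = A @ np.ones((n, 1))                              # row sums of A, (n, 1)
    scaling = (n + 1 - 2 * np.arange(1, n + 1)).reshape(-1, 1)
    logits = (scaling @ s.T - B.T) / tau                 # (n, n)
    logits -= logits.max(axis=1, keepdims=True)          # stable row softmax
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

# Toy usage: 3 shuffled patches whose predicted scores are well separated.
# With a small temperature the soft matrix is close to a hard permutation:
# row i's argmax is the index of the patch with the i-th largest score.
P = soft_permutation([0.1, 2.0, 1.0], tau=0.01)
print(P.argmax(axis=1))   # recovered (descending) ordering of the patches
```

In the pretext task, a cross-entropy-style loss between this soft permutation and the ground-truth shuffling permutation trains the encoder; because every permutation of the patches is representable, the model is not limited to a small fixed dictionary of shufflings, which is the point the abstract makes against the classification formulation.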

