Learning neural audio features without supervision

03/29/2022
by Sarthak Yadav, et al.

Deep audio classification, traditionally cast as training a deep neural network on top of mel-filterbanks in a supervised fashion, has recently benefited from two independent lines of work. The first explores "learnable frontends", i.e., neural modules that produce a learnable time-frequency representation, to overcome the limitations of fixed features. The second uses self-supervised learning to leverage unprecedented scales of pre-training data. In this work, we study the feasibility of combining both approaches, i.e., pre-training a learnable frontend jointly with the main architecture for downstream classification. First, we show that pre-training two previously proposed frontends (SincNet and LEAF) on AudioSet drastically improves linear-probe performance over fixed mel-filterbanks, suggesting that learnable time-frequency representations can benefit self-supervised pre-training even more than supervised training. Surprisingly, randomly initialized learnable filterbanks outperform mel-scaled initialization in the self-supervised setting, a counter-intuitive result that questions the appropriateness of strong priors when designing learnable filters. Through exploratory analysis of the learned frontend components, we uncover crucial differences in the properties of these frontends in the supervised and self-supervised settings, in particular the tendency of self-supervised filters to diverge significantly from the mel scale in order to model a broader range of frequencies.
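To make the setup concrete, below is a minimal PyTorch sketch (not the authors' code) of a SincNet-style learnable frontend with the two initializations the abstract compares. The band-pass parameterization follows the published SincNet formulation; the class name, layer sizes, 16 kHz sample rate, and the init switch are illustrative assumptions.

import math

import torch
import torch.nn as nn
import torch.nn.functional as F


def mel_cutoffs(n_filters, sample_rate):
    # Band edges spaced evenly on the mel scale (standard HTK formula).
    max_mel = 2595 * math.log10(1 + (sample_rate / 2) / 700)
    mels = torch.linspace(0, max_mel, n_filters + 1)
    hz = 700 * (10 ** (mels / 2595) - 1)
    return hz[:-1], hz[1:]


class SincFrontend(nn.Module):
    """Bank of learnable band-pass filters applied to raw waveforms."""

    def __init__(self, n_filters=40, kernel_size=401, sample_rate=16000,
                 init="mel"):
        super().__init__()
        if init == "mel":
            low, high = mel_cutoffs(n_filters, sample_rate)
        else:
            # Random band edges: the initialization the paper found to
            # outperform the mel scale under self-supervised pre-training.
            low = torch.rand(n_filters) * sample_rate / 4
            high = low + torch.rand(n_filters) * sample_rate / 4
        # The cutoffs are the only learnable parameters, stored normalized
        # by the sample rate so the sinc kernels below are unitless.
        self.low = nn.Parameter(low / sample_rate)
        self.band = nn.Parameter((high - low) / sample_rate)
        half = kernel_size // 2
        self.register_buffer("n", torch.arange(-half, half + 1).float())
        self.register_buffer("window", torch.hamming_window(kernel_size))

    def forward(self, x):
        # x: (batch, 1, samples) raw audio in [-1, 1].
        low = self.low.abs()
        high = low + self.band.abs()

        def lowpass(f):
            # Windowed-sinc low-pass filter with normalized cutoff f.
            return 2 * f.unsqueeze(1) * torch.sinc(2 * f.unsqueeze(1) * self.n)

        # Band-pass = difference of two low-pass filters, then windowed.
        filters = (lowpass(high) - lowpass(low)) * self.window
        return F.conv1d(x, filters.unsqueeze(1), padding=len(self.n) // 2)


# Shape check on random audio; in the paper's protocol, the pre-trained
# network is frozen and only a linear probe is trained on its features.
frontend = SincFrontend(init="random")
out = frontend(torch.randn(2, 1, 16000))   # one second at 16 kHz
print(out.shape)                           # torch.Size([2, 40, 16000])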
