Improving Self-Supervised Learning for Audio Representations by Feature Diversity and Decorrelation

03/07/2023
by Bac Nguyen, et al.

Self-supervised learning (SSL) has recently shown remarkable results in closing the gap between supervised and unsupervised learning. The idea is to learn robust features that are invariant to distortions of the input data. Despite its success, this approach can suffer from representation collapse, where the network produces a constant output regardless of the input. To address this issue, we introduce SELFIE, a novel Self-supervised Learning approach for audio representation via Feature Diversity and Decorrelation. SELFIE avoids collapse by ensuring that the representation (i) maintains high diversity among embeddings and (ii) decorrelates the embedding dimensions. SELFIE is pre-trained on the large-scale AudioSet dataset, and its embeddings are validated on nine downstream audio tasks, including speech, music, and sound event recognition. Experimental results show that SELFIE outperforms existing SSL methods on several tasks.
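
The abstract does not give the exact loss, so the following is only a minimal sketch of what diversity and decorrelation terms could look like, assuming a variance-hinge and covariance-penalty formulation common in the SSL literature. The function names, the `target_std` threshold, and the normalization are illustrative assumptions, not SELFIE's actual objective.

```python
import numpy as np

def diversity_penalty(z, target_std=1.0, eps=1e-4):
    """Hinge penalty that grows when a dimension's std across the batch
    falls below target_std (assumed form, not taken from the paper)."""
    std = np.sqrt(z.var(axis=0) + eps)
    return float(np.mean(np.maximum(0.0, target_std - std)))

def decorrelation_penalty(z):
    """Sum of squared off-diagonal covariance entries, divided by the
    embedding dimension (assumed form, not taken from the paper)."""
    n, d = z.shape
    zc = z - z.mean(axis=0)              # center each dimension
    cov = zc.T @ zc / (n - 1)            # d x d sample covariance
    off_diag = cov - np.diag(np.diag(cov))
    return float((off_diag ** 2).sum() / d)

rng = np.random.default_rng(0)
spread = rng.standard_normal((256, 8))            # diverse, roughly independent
collapsed = np.ones((256, 8))                     # every embedding identical
redundant = np.repeat(spread[:, :1], 8, axis=1)   # all dimensions are copies

for name, z in [("spread", spread), ("collapsed", collapsed), ("redundant", redundant)]:
    print(name, diversity_penalty(z), decorrelation_penalty(z))
```

In this sketch the collapsed batch is caught only by the diversity term (its covariance is exactly zero), while the redundant batch is caught only by the decorrelation term, which illustrates why the two properties are paired.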


