Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks

10/14/2021
by Sangeeta Srivastava, et al.

Representation learning from unlabeled data has been of major interest in artificial intelligence research. While self-supervised speech representation learning has been popular in the speech research community, few works have comprehensively analyzed audio representation learning for non-speech audio tasks. In this paper, we propose a self-supervised audio representation learning method and apply it to a variety of downstream non-speech audio tasks. We combine the well-known wav2vec 2.0 framework, which has shown success in self-supervised learning for speech tasks, with parameter-efficient conformer architectures. Our self-supervised pre-training reduces the need for labeled data by two-thirds. On the AudioSet benchmark, we achieve a mean average precision (mAP) score of 0.415, establishing a new state of the art on this dataset for audio-only self-supervised learning. Our fine-tuned conformers also surpass or match the performance of previous systems pre-trained in a supervised way on several downstream tasks. We further discuss important design considerations for both pre-training and fine-tuning.
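To make the pre-training objective concrete: wav2vec 2.0 trains a context network to identify, for each masked timestep, the true quantized latent among a set of distractors via a contrastive (InfoNCE-style) loss over cosine similarities. The sketch below is a hypothetical, simplified NumPy illustration of that idea, not the authors' implementation; here every other timestep in the batch serves as a distractor, and the function name, shapes, and temperature value are illustrative assumptions.

```python
import numpy as np

def contrastive_loss(context, targets, temperature=0.1):
    """Simplified wav2vec 2.0-style contrastive loss (illustrative sketch).

    context: (T, D) context-network outputs at masked timesteps
    targets: (T, D) quantized latent targets; targets[t] is the positive
             for context[t], and the remaining rows act as distractors.
    """
    # L2-normalize so the dot product below is a cosine similarity
    c = context / np.linalg.norm(context, axis=1, keepdims=True)
    q = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    sim = (c @ q.T) / temperature          # (T, T) similarity logits

    # Row-wise softmax over candidates; the diagonal holds the positives
    sim = sim - sim.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)

    # Negative log-likelihood of picking the true target at each timestep
    return float(-np.mean(np.log(np.diag(probs) + 1e-12)))
```

When the context vectors already match their targets, the positive pair dominates each softmax row and the loss is near zero; with unrelated context vectors the loss approaches log(T), which is what drives the context network to predict the quantized latents.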

Related research

- SLICER: Learning universal audio representations using low-resource self-supervised pre-training (11/02/2022)
- Audio Barlow Twins: Self-Supervised Audio Representation Learning (09/28/2022)
- Self-Supervised Speech Representation Learning: A Review (05/21/2022)
- Label Aware Speech Representation Learning For Language Identification (06/07/2023)
- Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation (05/18/2020)
- Speech Representation Learning Through Self-supervised Pretraining And Multi-task Finetuning (10/18/2021)
- Masked Spectrogram Prediction For Self-Supervised Audio Pre-Training (04/27/2022)
