SPADE: Self-supervised Pretraining for Acoustic DisEntanglement

02/03/2023
by John Harvill, et al.

Self-supervised representation learning approaches have grown in popularity due to the ability to train models on large amounts of unlabeled data and have demonstrated success in diverse fields such as natural language processing, computer vision, and speech. Previous self-supervised work in the speech domain has disentangled multiple attributes of speech such as linguistic content, speaker identity, and rhythm. In this work, we introduce a self-supervised approach to disentangle room acoustics from speech and use the acoustic representation on the downstream task of device arbitration. Our results demonstrate that our proposed approach significantly improves performance over a baseline when labeled training data is scarce, indicating that our pretraining scheme learns to encode room acoustic information while remaining invariant to other attributes of the speech signal.

research
10/18/2021

Speech Representation Learning Through Self-supervised Pretraining And Multi-task Finetuning

Speech representation learning plays a vital role in speech processing. ...
research
05/27/2022

Self-supervised models of audio effectively explain human cortical responses to speech

Self-supervised language models are very effective at predicting high-le...
research
11/23/2021

DABS: A Domain-Agnostic Benchmark for Self-Supervised Learning

Self-supervised learning algorithms, including BERT and SimCLR, have ena...
research
02/07/2022

Self-supervised Speaker Recognition Training Using Human-Machine Dialogues

Speaker recognition, recognizing speaker identities based on voice alone...
research
07/15/2022

Position Prediction as an Effective Pretraining Strategy

Transformers have gained increasing popularity in a wide range of applic...
research
10/04/2022

Improving Label-Deficient Keyword Spotting Using Self-Supervised Pretraining

In recent years, the development of accurate deep keyword spotting (KWS)...
research
10/25/2022

Audio MFCC-gram Transformers for respiratory insufficiency detection in COVID-19

This work explores speech as a biomarker and investigates the detection ...
