deHuBERT: Disentangling Noise in a Self-supervised Model for Robust Speech Recognition

02/28/2023
by Dianwen Ng, et al.

Existing self-supervised pre-trained speech models offer an effective way to leverage massive unannotated corpora to build good automatic speech recognition (ASR) systems. However, many current models are trained on a clean corpus from a single source and tend to perform poorly when noise is present at test time. Overcoming the adverse influence of noise is nevertheless crucial for real-world applications. In this work, we propose a novel training framework, called deHuBERT, for noise-reduction encoding inspired by H. Barlow's redundancy-reduction principle. The new framework improves the HuBERT training algorithm by introducing auxiliary losses that drive the self- and cross-correlation matrices between pairs of noise-distorted embeddings towards the identity matrix. This encourages the model to produce noise-agnostic speech representations. With this method, we report improved robustness in noisy environments, including unseen noise types, without impairing performance on the clean set.
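To make the idea of "driving correlation matrices towards the identity" concrete, here is a minimal sketch in plain Python. It assumes a Barlow-Twins-style penalty: each embedding dimension is standardized over the batch, the correlation matrix C is computed, and the loss is the sum over i of (1 - C_ii)^2 plus a weight lam times the sum over i != j of C_ij^2. The function names, the weight `lam = 0.005`, and the choice to sum one cross-correlation term with two self-correlation terms are illustrative assumptions, not the authors' exact formulation; in deHuBERT these terms are auxiliary to the HuBERT masked-prediction loss, which is not shown here.

```python
import math

def standardize(z, eps=1e-8):
    """Normalize each feature (column) of a batch z (n x d) to zero mean, unit variance."""
    n, d = len(z), len(z[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [z[i][j] for i in range(n)]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n) + eps
        for i in range(n):
            out[i][j] = (z[i][j] - mean) / std
    return out

def correlation(za, zb):
    """Correlation matrix C = a^T b / n (d x d) between two standardized batches."""
    a, b = standardize(za), standardize(zb)
    n, d = len(a), len(a[0])
    return [[sum(a[k][i] * b[k][j] for k in range(n)) / n for j in range(d)]
            for i in range(d)]

def redundancy_reduction_loss(c, lam=0.005):
    """Penalty that is zero exactly when c equals the identity matrix."""
    d = len(c)
    on_diag = sum((1.0 - c[i][i]) ** 2 for i in range(d))   # pull diagonal to 1
    off_diag = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)  # push rest to 0
    return on_diag + lam * off_diag

def denoising_aux_loss(z1, z2, lam=0.005):
    """Hypothetical auxiliary loss for two noise-distorted views z1, z2 of the same
    utterances: the cross term aligns the views, while the self terms (whose diagonal
    is already ~1 after standardization) decorrelate dimensions within each view."""
    cross = redundancy_reduction_loss(correlation(z1, z2), lam)
    self1 = redundancy_reduction_loss(correlation(z1, z1), lam)
    self2 = redundancy_reduction_loss(correlation(z2, z2), lam)
    return cross + self1 + self2

# Two decorrelated features; identical views give (near) zero loss,
# while sign-flipped views give C_cross = -I and a large penalty.
z = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
print(denoising_aux_loss(z, z))
print(denoising_aux_loss(z, [[-x for x in row] for row in z]))
```

Standardizing per dimension before taking the product keeps every entry of C in [-1, 1], so the target "identity matrix" means the same thing regardless of the embedding scale. In a real model z1 and z2 would be frame-level encoder outputs for the same audio corrupted by two different noise samples.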

research
01/22/2022

A Noise-Robust Self-supervised Pre-training Model Based Speech Representation Learning for Automatic Speech Recognition

Wav2vec2.0 is a popular self-supervised pre-training framework for learn...
research
05/26/2022

Joint Training of Speech Enhancement and Self-supervised Model for Noise-robust ASR

Speech enhancement (SE) is usually required as a front end to improve th...
research
01/03/2023

Supervised Acoustic Embeddings And Their Transferability Across Languages

In speech recognition, it is essential to model the phonetic content of ...
research
07/02/2023

Don't Stop Self-Supervision: Accent Adaptation of Speech Representations via Residual Adapters

Speech representations learned in a self-supervised fashion from massive...
research
09/14/2022

I2CR: Improving Noise Robustness on Keyword Spotting Using Inter-Intra Contrastive Regularization

Noise robustness in keyword spotting remains a challenge as many models ...
research
11/11/2020

Towards Semi-Supervised Semantics Understanding from Speech

Much recent work on Spoken Language Understanding (SLU) falls short in a...
research
07/15/2013

Modified SPLICE and its Extension to Non-Stereo Data for Noise Robust Speech Recognition

In this paper, a modification to the training process of the popular SPL...