Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning

10/27/2022
by Qiu-Shi Zhu, et al.

Self-supervised pre-training methods based on contrastive learning or regression tasks can leverage more unlabeled data to improve the performance of automatic speech recognition (ASR). However, the impact on robustness of combining the two pre-training tasks, and of constructing different negative samples for contrastive learning, remains unclear. In this paper, we propose a noise-robust data2vec for self-supervised speech representation learning that jointly optimizes the contrastive learning and regression tasks in the pre-training stage. Furthermore, we present two improved methods to facilitate contrastive learning. First, we construct patch-based non-semantic negative samples to boost the noise robustness of the pre-trained model, by dividing the features into patches of different sizes that serve as negative samples. Second, guided by the distribution of positive and negative samples, we remove the easily distinguishable negative samples to strengthen the discriminative capacity of the pre-trained model. Experimental results on the CHiME-4 dataset show that our method improves the performance of the pre-trained model in noisy scenarios. We also find that jointly training the contrastive learning and regression tasks mitigates model collapse to some extent compared with training the regression task alone.
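For intuition only, the sketch below shows how such a joint objective could look in a PyTorch-style setup, assuming per-utterance student and teacher features of shape (T, D). All function names, the pooling-based patch construction, the cosine-similarity filtering rule, and the MSE regression term are illustrative assumptions on our part, not the paper's released implementation:

```python
import torch
import torch.nn.functional as F

def patch_negatives(features, patch_sizes=(2, 4, 8), num_negatives=50):
    """Hypothetical patch-based negatives: average-pool the feature
    sequence into patches of several sizes (non-semantic chunks),
    then sample among them. features: (T, D) frame-level features."""
    patches = []
    for p in patch_sizes:
        t = features.size(0) // p * p          # trim so T divides by p
        patches.append(features[:t].view(-1, p, features.size(1)).mean(dim=1))
    pool = torch.cat(patches, dim=0)           # (N, D) candidate negatives
    idx = torch.randint(pool.size(0), (num_negatives,))
    return pool[idx]                           # (K, D)

def filter_easy_negatives(anchor, negatives, keep_ratio=0.5):
    """Drop negatives that are trivially distinguishable from the anchor
    (lowest cosine similarity), keeping only the hardest fraction."""
    sim = F.cosine_similarity(anchor.unsqueeze(0), negatives, dim=-1)
    k = max(1, int(keep_ratio * negatives.size(0)))
    return negatives[sim.topk(k).indices]

def joint_loss(student_out, teacher_out, t=0.1, alpha=1.0):
    """Regression term (MSE stands in for the data2vec target loss)
    plus an InfoNCE-style contrastive term per time step."""
    reg = F.mse_loss(student_out, teacher_out)
    contrastive = 0.0
    for i in range(student_out.size(0)):
        anchor, positive = student_out[i], teacher_out[i]
        negs = filter_easy_negatives(anchor, patch_negatives(teacher_out))
        logits = torch.cat([
            F.cosine_similarity(anchor, positive, dim=0).view(1),
            F.cosine_similarity(anchor.unsqueeze(0), negs, dim=-1),
        ]) / t
        # positive sits at index 0 of the logits
        contrastive = contrastive + F.cross_entropy(
            logits.unsqueeze(0), torch.zeros(1, dtype=torch.long))
    return reg + alpha * contrastive / student_out.size(0)
```

Under these assumptions, the regression term anchors the student to the teacher targets (which helps avoid collapse), while the contrastive term, computed against pooled, hardness-filtered patch negatives, supplies the discriminative pressure described in the abstract.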

