Data Augmentation based Consistency Contrastive Pre-training for Automatic Speech Recognition

12/23/2021
by Changfeng Gao, et al.

Self-supervised acoustic pre-training has achieved impressive results on the automatic speech recognition (ASR) task. Most successful acoustic pre-training methods use contrastive learning to learn acoustic representations by distinguishing representations from different time steps, ignoring robustness to speaker and environment variation. As a result, the pre-trained model can perform poorly when it encounters out-of-domain data during fine-tuning. In this letter, we design a novel consistency contrastive learning (CCL) method that exploits data augmentation for acoustic pre-training. Different kinds of augmentation are applied to the original audio, and the augmented audios are then fed into an encoder. The encoder should not only contrast the representations within one audio but also maximize the agreement between representations of the same audio under different augmentations. In this way, the pre-trained model learns a text-related representation that is more robust to changes of speaker and environment. Experiments show that applying CCL to Wav2Vec2.0 yields better results on both in-domain and out-of-domain data. On noisy out-of-domain data in particular, a relative improvement of more than 15% is achieved.
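
The abstract ships no code, so the PyTorch sketch below is only a rough illustration of the two-term objective it describes: a within-utterance contrastive term plus a cross-view consistency term, with one encoder shared by two augmented views of each utterance. Every name and hyperparameter in it (TinyEncoder, augment, ccl_loss, the temperature, the weight alpha) is an illustrative assumption, not the authors' released implementation or the exact wav2vec 2.0 loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyEncoder(nn.Module):
    """Stand-in for the wav2vec 2.0 feature encoder: maps raw waveforms
    (batch, samples) to frame-level representations (batch, frames, dim)."""

    def __init__(self, dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, dim, kernel_size=10, stride=5), nn.GELU(),
            nn.Conv1d(dim, dim, kernel_size=8, stride=4), nn.GELU(),
        )

    def forward(self, wave):
        # (batch, samples) -> (batch, 1, samples) -> (batch, frames, dim)
        return self.conv(wave.unsqueeze(1)).transpose(1, 2)


def augment(wave):
    """Toy augmentation (random gain plus additive noise), standing in for
    the speaker/environment perturbations the paper varies, such as noise
    or reverberation."""
    gain = torch.empty(wave.size(0), 1).uniform_(0.8, 1.2)
    return gain * wave + 0.01 * torch.randn_like(wave)


def ccl_loss(z_a, z_b, temperature=0.1, alpha=1.0):
    """z_a, z_b: (batch, frames, dim) encodings of two augmented views.

    Contrastive term: frame t of view A must match frame t of view B
    against distractors drawn from other time steps of the same utterance.
    Consistency term: aligned frames of the two views are pulled together
    directly, encouraging augmentation invariance.
    """
    b, t, _ = z_a.shape
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)

    # Cosine similarities between all frame pairs of the two views.
    logits = torch.einsum("btd,bsd->bts", z_a, z_b) / temperature  # (b, t, t)
    targets = torch.arange(t, device=z_a.device).expand(b, t).reshape(-1)
    contrastive = F.cross_entropy(logits.reshape(b * t, t), targets)

    # Mean cosine distance between time-aligned frames across views.
    consistency = 1.0 - (z_a * z_b).sum(-1).mean()
    return contrastive + alpha * consistency


# Usage: both views of the same batch go through one shared encoder.
encoder = TinyEncoder()
wave = torch.randn(4, 16000)  # 4 utterances, 1 s each at 16 kHz
loss = ccl_loss(encoder(augment(wave)), encoder(augment(wave)))
loss.backward()
```

Note the design choice in the sketch: the negatives for each frame come from other time steps of the same utterance, mirroring the abstract's "distinguish the representations from different time steps", while the consistency term rewards agreement across the two augmented views.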

Related research:

- Measuring the Impact of Individual Domain Factors in Self-Supervised Pre-Training (03/01/2022). Human speech data comprises a rich set of domain factors such as accent, ...
- Guided contrastive self-supervised pre-training for automatic speech recognition (10/22/2022). Contrastive Predictive Coding (CPC) is a representation learning method ...
- TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech (07/12/2020). We introduce a self-supervised speech pre-training method called TERA, ...
- The THUEE System Description for the IARPA OpenASR21 Challenge (06/29/2022). This paper describes the THUEE team's speech recognition system for the ...
- Adaptive Speech Quality Aware Complex Neural Network for Acoustic Echo Cancellation with Supervised Contrastive Learning (10/30/2022). Acoustic echo cancellation (AEC) is designed to remove echoes, ...
- CAPT: Contrastive Pre-Training for Learning Denoised Sequence Representations (10/13/2020). Pre-trained self-supervised models such as BERT have achieved striking ...
- CLEVE: Contrastive Pre-training for Event Extraction (05/30/2021). Event extraction (EE) has considerably benefited from pre-trained ...
