Boosting Cross-Domain Speech Recognition with Self-Supervision

06/20/2022
by Han Zhu, et al.

The cross-domain performance of automatic speech recognition (ASR) can be severely hampered by the mismatch between training and testing distributions. Since the target domain usually lacks labeled data, and domain shifts exist at both the acoustic and linguistic levels, it is challenging to perform unsupervised domain adaptation (UDA) for ASR. Previous work has shown that self-supervised learning (SSL) or pseudo-labeling (PL) is effective for UDA because both exploit self-supervision from unlabeled data. However, these self-supervised signals themselves degrade under mismatched domain distributions, a problem previous work fails to address. This work presents a systematic UDA framework that fully utilizes unlabeled data with self-supervision in the pre-training and fine-tuning paradigm. On the one hand, we apply continued pre-training and data replay techniques to mitigate the domain mismatch of the SSL pre-trained model. On the other hand, we propose a domain-adaptive fine-tuning approach based on the PL technique with three unique modifications: Firstly, we design a dual-branch PL method to decrease sensitivity to erroneous pseudo-labels; Secondly, we devise an uncertainty-aware confidence filtering strategy to improve pseudo-label correctness; Thirdly, we introduce a two-step PL approach to incorporate target-domain linguistic knowledge, thus generating more accurate target-domain pseudo-labels. Experimental results on various cross-domain scenarios demonstrate that the proposed approach effectively boosts cross-domain performance and significantly outperforms previous approaches.
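The confidence-filtering idea in the abstract can be illustrated with a minimal sketch: keep a pseudo-labeled utterance only when the model's average per-token confidence clears a threshold. The function name, data layout, and the use of the geometric mean of token probabilities as the confidence score are illustrative assumptions, not the paper's exact uncertainty-aware rule.

```python
import math

def filter_pseudo_labels(utterances, threshold=0.7):
    """Keep pseudo-labels whose sequence-level confidence exceeds `threshold`.

    Each utterance is a (pseudo_label, token_probs) pair, where token_probs
    are the model's probabilities for the tokens it emitted. Confidence is
    scored as the geometric mean of the token probabilities (an illustrative
    choice, not the paper's exact criterion).
    """
    kept = []
    for label, probs in utterances:
        # Average log-probability, mapped back to a [0, 1] confidence score.
        avg_logp = sum(math.log(p) for p in probs) / len(probs)
        confidence = math.exp(avg_logp)
        if confidence >= threshold:
            kept.append(label)
    return kept

# Hypothetical decoded utterances with per-token probabilities.
batch = [
    ("hello world", [0.9, 0.95, 0.85]),  # confident decode
    ("noisy guess", [0.4, 0.5, 0.3]),    # low-confidence decode
]
print(filter_pseudo_labels(batch))  # → ['hello world']
```

Lowering the threshold trades pseudo-label precision for coverage; the paper's uncertainty-aware strategy is a more principled version of this precision/coverage trade-off.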

Related research

09/24/2020 · Feature Adaptation of Pre-Trained Language Models across Languages and Domains for Text Classification
Adapting pre-trained language models (PrLMs) (e.g., BERT) to new domains...

04/15/2021 · Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching
End-to-end automatic speech recognition (ASR) can achieve promising perf...

12/20/2022 · Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data
Self-training has been shown to be helpful in addressing data scarcity f...

03/01/2022 · Measuring the Impact of Individual Domain Factors in Self-Supervised Pre-Training
Human speech data comprises a rich set of domain factors such as accent,...

03/19/2018 · Acoustic feature learning using cross-domain articulatory measurements
Previous work has shown that it is possible to improve speech recognitio...

06/01/2023 · Automatic Data Augmentation for Domain Adapted Fine-Tuning of Self-Supervised Speech Representations
Self-Supervised Learning (SSL) has allowed leveraging large amounts of u...
