Improving low-resource ASR performance with untranscribed out-of-domain data

06/02/2021
by Jayadev Billa, et al.

Semi-supervised training (SST) is a common approach to leveraging untranscribed/unlabeled speech data to improve automatic speech recognition (ASR) performance in low-resource languages. However, if the available unlabeled speech is mismatched to the target domain, SST is not as effective, and in many cases performs worse than the original system. In this paper, we address the issue of low-resource ASR when only untranscribed out-of-domain speech data is readily available in the target language. Specifically, we look to improve performance on conversational/telephony speech (the target domain) using web resources, in particular YouTube data, which more closely resembles news/topical broadcast data. Leveraging SST, we show that while in some cases simply pooling the out-of-domain data with the training data lowers word error rate (WER), in all cases we see improvements if we train first with the out-of-domain data and then fine-tune the resulting model with the original training data. Using 2000 hours of speed-perturbed YouTube audio in each target language, with semi-supervised transcripts, we show improvements on multiple languages/data sets of up to 16.3% relative to baseline systems and up to 7.4% relative to a system that simply pools the out-of-domain data with the training data.
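The abstract describes a two-stage recipe: decode the out-of-domain audio with a seed system to obtain semi-supervised (pseudo) transcripts, train on that pseudo-labeled data, and then fine-tune the resulting model on the original in-domain training data, rather than simply pooling the two corpora. The sketch below only illustrates the control flow of the two variants; ASRModel, sst_pool, and sst_then_finetune are hypothetical placeholders, not the authors' tooling or any particular ASR toolkit.

from dataclasses import dataclass, field


@dataclass
class ASRModel:
    # Toy stand-in for an ASR system; `history` records which corpora it saw.
    history: list = field(default_factory=list)

    def train(self, corpus: str) -> "ASRModel":
        # Placeholder for a full training or fine-tuning pass on `corpus`.
        self.history.append(corpus)
        return self

    def decode(self, corpus: str) -> str:
        # Placeholder for decoding audio into semi-supervised transcripts.
        return f"pseudo-transcripts({corpus})"


def sst_pool(seed: ASRModel, in_domain: str, out_of_domain: str) -> ASRModel:
    # SST baseline: pool pseudo-labeled out-of-domain data with the
    # transcribed in-domain data and train a new model once.
    pseudo = seed.decode(out_of_domain)
    return ASRModel().train(f"{in_domain} + {out_of_domain} [{pseudo}]")


def sst_then_finetune(seed: ASRModel, in_domain: str, out_of_domain: str) -> ASRModel:
    # Recipe the abstract reports as consistently better: train first on the
    # pseudo-labeled out-of-domain data, then fine-tune on the in-domain data.
    pseudo = seed.decode(out_of_domain)
    model = ASRModel().train(f"{out_of_domain} [{pseudo}]")
    return model.train(in_domain)


if __name__ == "__main__":
    in_domain = "transcribed conversational/telephony speech"
    out_of_domain = "2000h speed-perturbed YouTube audio"
    seed = ASRModel().train(in_domain)
    pooled = sst_pool(seed, in_domain, out_of_domain)
    staged = sst_then_finetune(seed, in_domain, out_of_domain)
    print("pooled stages:", pooled.history)
    print("staged stages:", staged.history)

In the paper's terms, the two functions correspond to pooling versus sequential training followed by fine-tuning; per the abstract, only the latter improved WER in all cases.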


Related research

08/10/2023 - A Novel Self-training Approach for Low-resource Speech Recognition
In this paper, we propose a self-training approach for automatic speech ...

07/01/2022 - Improving Low-Resource Speech Recognition with Pretrained Speech Models: Continued Pretraining vs. Semi-Supervised Training
Self-supervised Transformer based models, such as wav2vec 2.0 and HuBERT...

06/14/2021 - Overcoming Domain Mismatch in Low Resource Sequence-to-Sequence ASR Models using Hybrid Generated Pseudotranscripts
Sequence-to-sequence (seq2seq) models are competitive with hybrid models...

04/17/2019 - Hard Sample Mining for the Improved Retraining of Automatic Speech Recognition
It is an effective way that improves the performance of the existing Aut...

09/26/2019 - DARTS: Dialectal Arabic Transcription System
We present the speech to text transcription system, called DARTS, for lo...

06/25/2018 - Robust Feature Clustering for Unsupervised Speech Activity Detection
In certain applications such as zero-resource speech processing or very-...

11/09/2022 - Improving Noisy Student Training on Non-target Domain Data for Automatic Speech Recognition
Noisy Student Training (NST) has recently demonstrated extremely strong ...
