Improving Low-Resource Speech Recognition with Pretrained Speech Models: Continued Pretraining vs. Semi-Supervised Training

07/01/2022
by Mitchell DeHaven, et al.

Self-supervised Transformer-based models, such as wav2vec 2.0 and HuBERT, have produced significant improvements over existing approaches to automatic speech recognition (ASR). This is evident in the performance of the wav2vec 2.0-based pretrained XLSR-53 model across many languages when fine-tuned with available labeled data. However, the performance of fine-tuning these models can depend on the amount of in-language or similar-to-in-language data included in the pretraining dataset. In this paper we investigate continued pretraining (CoPT) with unlabeled in-language audio data on the XLSR-53 pretrained model in several low-resource languages. CoPT is more computationally efficient than semi-supervised training (SST), the standard approach to utilizing unlabeled data in ASR, since it omits pseudo-labeling of the unlabeled data. We show that CoPT results in word error rates (WERs) equal to or slightly better than those obtained with SST. In addition, we show that using the CoPT model for pseudo-labeling, and using these labels in SST, yields further WER improvements.
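For context, the sketch below illustrates the SST pseudo-labeling step that CoPT avoids, using the Hugging Face transformers wav2vec 2.0 API. This is a minimal illustration under assumptions, not the paper's actual pipeline; the checkpoint name "your-org/xlsr53-finetuned-lowresource" is a hypothetical placeholder for an XLSR-53 model already fine-tuned on the available labeled data.

import torch
import soundfile as sf
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Hypothetical fine-tuned XLSR-53 checkpoint (placeholder name, not from the paper).
MODEL_NAME = "your-org/xlsr53-finetuned-lowresource"
processor = Wav2Vec2Processor.from_pretrained(MODEL_NAME)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_NAME).eval()

def pseudo_label(wav_path: str) -> str:
    """Transcribe one unlabeled utterance to produce a pseudo-label."""
    speech, sample_rate = sf.read(wav_path)
    inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    pred_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(pred_ids)[0]

# In SST, the resulting (audio, pseudo-label) pairs are added to the labeled
# set and the model is fine-tuned again. CoPT skips this decoding step and
# instead continues self-supervised pretraining directly on the unlabeled audio.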


