Analyzing the factors affecting usefulness of Self-Supervised Pre-trained Representations for Speech Recognition

Self-supervised learning (SSL) of high-level speech representations has become a popular approach to building Automatic Speech Recognition (ASR) systems in low-resource settings. However, the literature commonly assumes that a considerable amount of unlabeled data from the same domain or language is available for SSL pre-training, an assumption that often does not hold in real-world settings. In this paper, as part of the Interspeech Gram Vaani ASR challenge, we study how the domain, language, dataset size, and other aspects of the upstream SSL pre-training data affect final performance on a low-resource downstream ASR task. We also build on the continued pre-training paradigm to study the effect of the prior knowledge possessed by models trained using SSL. Extensive experiments reveal that the performance of ASR systems is sensitive to the data used for SSL pre-training: performance improves as the similarity and volume of the pre-training data increase. We believe our work will help the speech community build better ASR systems in low-resource settings and steer research towards improving generalization in SSL-based pre-training for speech systems.
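The pipeline the abstract describes starts from an upstream SSL checkpoint and adapts it to the low-resource downstream ASR task. Below is a minimal sketch of that downstream step, assuming the HuggingFace transformers API and a wav2vec 2.0 checkpoint; the checkpoint name, dummy data, and hyperparameters are illustrative assumptions, not the paper's actual setup or toolkit.

# Minimal sketch (not the authors' code): adapt an SSL-pre-trained
# wav2vec 2.0 checkpoint to a downstream ASR task with a CTC head.
# Checkpoint name and hyperparameters are illustrative assumptions.
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

checkpoint = "facebook/wav2vec2-base-960h"  # stand-in for an upstream SSL model
processor = Wav2Vec2Processor.from_pretrained(checkpoint)
model = Wav2Vec2ForCTC.from_pretrained(checkpoint)

# Freeze the convolutional feature encoder, a common choice when the
# labeled downstream data is scarce.
model.freeze_feature_encoder()

# One training step on a dummy labeled utterance (1 s of 16 kHz audio).
waveform = torch.randn(16000)
inputs = processor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")
labels = processor.tokenizer("HELLO WORLD", return_tensors="pt").input_ids

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
loss = model(input_values=inputs.input_values, labels=labels).loss
loss.backward()
optimizer.step()

In the paper's setting, the choice of which upstream checkpoint to load here (its pre-training domain, language, and data volume) is exactly the factor under study; continued pre-training would insert an additional unsupervised adaptation stage on target-domain audio before this supervised step.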


