Training Autoregressive Speech Recognition Models with Limited in-domain Supervision

10/27/2022
by Chak-Fai Li, et al.

Advances in self-supervised learning have significantly reduced the amount of transcribed audio required for training. However, the majority of work in this area is focused on read speech. We explore limited supervision in the domain of conversational speech. While we assume the amount of in-domain data is limited, we augment the model with open-source read speech data. The XLS-R model has been shown to perform well with limited adaptation data and serves as a strong baseline. We use untranscribed data for self-supervised learning and semi-supervised training in an autoregressive encoder-decoder model. We demonstrate that by using the XLS-R model for pseudotranscription, a much smaller autoregressive model can outperform a finetuned XLS-R model when transcribed in-domain data is limited, reducing WER by as much as 8
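The abstract reports its gains in word error rate (WER). The standard definition is WER = (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions in a minimum-edit alignment of the hypothesis against a reference of N words. The Python below is a minimal sketch of that definition using word-level Levenshtein distance. The function name is hypothetical, and the sketch assumes plain whitespace tokenization with no text normalization, so it is not necessarily the scoring setup the authors used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumes whitespace tokenization and no normalization (casing,
    punctuation, disfluency handling are left to the caller).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr

    # Total edits divided by reference length; can exceed 1.0 with many insertions.
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Running the script prints `0.16666666666666666` (one deletion over six reference words). Because the denominator is the reference length, insertions can push WER above 1.0, which is why it is reported as an error rate rather than an accuracy.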
