Self-Training for End-to-End Speech Recognition

09/19/2019
by Jacob Kahn, et al.

We revisit self-training in the context of end-to-end speech recognition. We demonstrate that training with pseudo-labels can substantially improve the accuracy of a baseline model by leveraging unlabelled data. Key to our approach are a strong baseline acoustic and language model used to generate the pseudo-labels, a robust and stable beam-search decoder, and a novel ensemble approach used to increase pseudo-label diversity. Experiments on the LibriSpeech corpus show that self-training with a single model can yield a 21% relative WER improvement on clean data over a baseline trained on 100 hours of labelled data. We also evaluate label filtering approaches to increase pseudo-label quality. With an ensemble of six models in conjunction with label filtering, self-training yields a 26% relative WER improvement, further closing the gap between the baseline and an oracle model trained with all of the labels.
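
The abstract describes a generic self-training loop: decode unlabelled audio with the baseline acoustic and language model via beam search, filter the resulting pseudo-labels, and retrain on the union of labelled and pseudo-labelled data. The sketch below illustrates one plausible reading of that loop in Python; the function names, the PseudoLabel fields, and the filtering thresholds are illustrative assumptions, not the paper's actual implementation or reported settings.

```python
# Hypothetical sketch of a self-training round for end-to-end ASR:
# (1) decode unlabelled audio with a baseline acoustic model + language model,
# (2) filter low-quality pseudo-labels, (3) retrain on labelled + pseudo-labelled data.
# All names and thresholds here are placeholders, not the paper's API or numbers.

from dataclasses import dataclass

@dataclass
class PseudoLabel:
    utterance_id: str
    transcript: str
    score: float        # length-normalized log-probability from the beam-search decoder
    num_frames: int     # number of acoustic frames in the utterance

def keep_pseudo_label(label: PseudoLabel,
                      min_score: float = -1.0,
                      max_chars_per_second: float = 25.0,
                      frame_rate_hz: float = 100.0) -> bool:
    """Simple confidence- and heuristic-based pseudo-label filtering (illustrative)."""
    duration_s = label.num_frames / frame_rate_hz
    if duration_s == 0 or not label.transcript.strip():
        return False
    # Reject low-confidence hypotheses.
    if label.score < min_score:
        return False
    # Reject hypotheses that are implausibly long for the audio duration,
    # a common failure mode of sequence-to-sequence decoding (looping output).
    if len(label.transcript) / duration_s > max_chars_per_second:
        return False
    return True

def self_training_round(labelled, unlabelled, decode_fn, train_fn):
    """One round of self-training.

    `decode_fn(utterance)` is assumed to return a PseudoLabel produced by
    beam-search decoding with the baseline acoustic + language model;
    `train_fn(dataset)` retrains the acoustic model. Both are supplied by
    the caller and are not defined by the paper here.
    """
    pseudo = [decode_fn(utt) for utt in unlabelled]
    kept = [p for p in pseudo if keep_pseudo_label(p)]
    # Combine ground-truth transcripts with the filtered pseudo-labels.
    combined = list(labelled) + [(p.utterance_id, p.transcript) for p in kept]
    return train_fn(combined)
```

In the ensemble variant mentioned in the abstract, pseudo-labels would come from several independently trained models (for example, by running the decode step with different decoders and mixing their outputs) to increase pseudo-label diversity; the single-model loop above is only the simplest case.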

Related research

10/08/2021 - Improving Pseudo-label Training For End-to-end Speech Recognition Using Gradient Mask
In the recent trend of semi-supervised speech recognition, both self-sup...

06/03/2020 - Self-Training for End-to-End Speech Translation
One of the main challenges for end-to-end speech translation is data sca...

01/24/2020 - Semi-supervised ASR by End-to-end Self-training
While deep learning based end-to-end automatic speech recognition (ASR) ...

08/14/2023 - O-1: Self-training with Oracle and 1-best Hypothesis
We introduce O-1, a new self-training objective to reduce training bias ...

03/29/2022 - Improving Mispronunciation Detection with Wav2vec2-based Momentum Pseudo-Labeling for Accentedness and Intelligibility Assessment
Current leading mispronunciation detection and diagnosis (MDD) systems a...

05/29/2020 - Improving Unsupervised Sparsespeech Acoustic Models with Categorical Reparameterization
The Sparsespeech model is an unsupervised acoustic model that can genera...

12/20/2022 - Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data
Self-training has been shown to be helpful in addressing data scarcity f...
