Filter and evolve: progressive pseudo label refining for semi-supervised automatic speech recognition

10/28/2022
by Zezhong Jin, et al.

Fine-tuning self-supervised pretrained models using pseudo-labels can effectively improve speech recognition performance. However, low-quality pseudo-labels can misguide decision boundaries and degrade performance. We propose a simple yet effective strategy to filter out low-quality pseudo-labels to alleviate this problem. Specifically, pseudo-labels are produced over the entire training set and filtered via average probability scores calculated from the model output. Subsequently, an optimal percentage of utterances with high probability scores is treated as reliable training data with trustworthy labels. The model is then iteratively updated to correct the unreliable pseudo-labels and minimize the effect of noisy labels. This process is repeated until the unreliable pseudo-labels have been adequately corrected. Extensive experiments on LibriSpeech show that these filtered samples enable the refined model to yield more correct predictions, leading to better ASR performance under various experimental settings.
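
The filter-and-refine loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `PseudoLabeled`, `average_probability`, `split_by_confidence`, and `refine` are hypothetical, as are the `label_fn` and `train_fn` callbacks, the default `keep_fraction`, and the fixed number of rounds (the abstract only says the process repeats until the unreliable labels are adequately corrected, and does not state how the optimal percentage is chosen).

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class PseudoLabeled:
    utt_id: str
    text: str                 # pseudo-label produced by the current model
    token_probs: List[float]  # per-token probabilities from the model output


def average_probability(token_probs: Sequence[float]) -> float:
    """Mean per-token probability; an empty hypothesis scores 0.0."""
    if not token_probs:
        return 0.0
    return sum(token_probs) / len(token_probs)


def split_by_confidence(
    items: Sequence[PseudoLabeled], keep_fraction: float
) -> Tuple[List[PseudoLabeled], List[PseudoLabeled]]:
    """Rank utterances by average probability and keep the top fraction."""
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must lie in [0, 1]")
    ranked = sorted(
        items, key=lambda u: average_probability(u.token_probs), reverse=True
    )
    n_keep = int(len(ranked) * keep_fraction)
    return ranked[:n_keep], ranked[n_keep:]


def refine(
    model,
    unlabeled_ids: Sequence[str],
    label_fn: Callable,   # hypothetical: (model, ids) -> list of PseudoLabeled
    train_fn: Callable,   # hypothetical: (model, reliable items) -> new model
    keep_fraction: float = 0.5,  # assumed value, not taken from the paper
    rounds: int = 3,             # assumed stopping rule: fixed round count
):
    """Relabel the whole set each round, then train only on the reliable part."""
    for _ in range(rounds):
        labeled = label_fn(model, unlabeled_ids)
        reliable, _unreliable = split_by_confidence(labeled, keep_fraction)
        model = train_fn(model, reliable)
    return model
```

Because every round relabels the entire set with the updated model, utterances rejected in one round can re-enter the reliable subset later if their scores improve. In practice `keep_fraction` would be tuned on held-out data.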

research
06/16/2021

Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition

Pseudo-labeling (PL) has been shown to be effective in semi-supervised a...
research
10/30/2021

Pseudo-Labeling for Massively Multilingual Speech Recognition

Semi-supervised learning through pseudo-labeling has become a staple of ...
research
03/22/2022

Pseudo Label Is Better Than Human Label

State-of-the-art automatic speech recognition (ASR) systems are trained ...
research
09/18/2023

Towards Self-Adaptive Pseudo-Label Filtering for Semi-Supervised Learning

Recent semi-supervised learning (SSL) methods typically include a filter...
research
03/29/2022

Improving Mispronunciation Detection with Wav2vec2-based Momentum Pseudo-Labeling for Accentedness and Intelligibility Assessment

Current leading mispronunciation detection and diagnosis (MDD) systems a...
research
08/12/2023

Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition

When labeled data is insufficient, semi-supervised learning with the pse...
research
11/02/2022

InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss

This paper presents InterMPL, a semi-supervised learning method of end-t...
