Improving Mispronunciation Detection with Wav2vec2-based Momentum Pseudo-Labeling for Accentedness and Intelligibility Assessment

03/29/2022
by Mu Yang, et al.

Current leading mispronunciation detection and diagnosis (MDD) systems achieve promising performance via end-to-end phoneme recognition. One challenge of such end-to-end solutions is the scarcity of human-annotated phonemes on natural L2 speech. In this work, we leverage unlabeled L2 speech via a pseudo-labeling (PL) procedure and extend the fine-tuning approach based on pre-trained self-supervised learning (SSL) models. Specifically, we use Wav2vec 2.0 as our SSL model and fine-tune it on the original labeled L2 speech samples plus the created pseudo-labeled L2 speech samples. Our pseudo labels are dynamic: they are produced on-the-fly by an ensemble of the online model, which makes our model robust to pseudo-label noise. We show that fine-tuning with pseudo labels achieves a 5.35% phoneme error rate reduction and a 2.48% MDD F1 score improvement over a labeled-samples-only fine-tuning baseline. The proposed PL method is also shown to outperform conventional offline PL methods. Compared to state-of-the-art MDD systems, our MDD solution achieves a more accurate and consistent phonetic error diagnosis. In addition, we conduct an open test on a separate UTD-4Accents dataset, where our system's recognition outputs show a strong correlation with human perception of accentedness and intelligibility.
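To make the "ensemble of the online model" idea concrete, here is a minimal sketch of one common way momentum pseudo-labeling is realized: a second (offline) copy of the weights tracks an exponential moving average of the online weights and is the copy used to generate pseudo labels. The abstract does not state the update rule, so the function name `ema_update`, the momentum value `alpha`, and the toy weight dictionaries are all assumptions for illustration only.

```python
def ema_update(offline, online, alpha=0.999):
    """Return new offline weights as an exponential moving average.

    offline, online: dicts mapping a parameter name to a float
    (hypothetical stand-ins for real model parameters).
    alpha: momentum; values near 1 make the offline copy change slowly.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return {
        name: alpha * w + (1.0 - alpha) * online[name]
        for name, w in offline.items()
    }


# Toy run: the online weights move by 1.0 each step (a stand-in for a
# gradient update), and the offline weights follow with a lag.
online = {"w": 0.0, "b": 1.0}
offline = dict(online)  # the offline copy starts from the online weights
for step in range(3):
    online = {k: v + 1.0 for k, v in online.items()}
    offline = ema_update(offline, online, alpha=0.5)

print(offline)
```

Because the offline copy averages over many past states of the online model, it acts as a temporal ensemble, which is one plausible reading of why labels generated this way would be less sensitive to noise in any single training step.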


research
10/30/2021

Pseudo-Labeling for Massively Multilingual Speech Recognition

Semi-supervised learning through pseudo-labeling has become a staple of ...
research
04/26/2023

Improving Conversational Passage Re-ranking with View Ensemble

This paper presents ConvRerank, a conversational passage re-ranker that ...
research
06/03/2020

Self-Training for End-to-End Speech Translation

One of the main challenges for end-to-end speech translation is data sca...
research
03/10/2023

UNFUSED: UNsupervised Finetuning Using SElf supervised Distillation

In this paper, we introduce UnFuSeD, a novel approach to leverage self-s...
research
10/08/2021

Improving Pseudo-label Training For End-to-end Speech Recognition Using Gradient Mask

In the recent trend of semi-supervised speech recognition, both self-sup...
research
09/19/2019

Self-Training for End-to-End Speech Recognition

We revisit self-training in the context of end-to-end speech recognition...
research
10/28/2022

Filter and evolve: progressive pseudo label refining for semi-supervised automatic speech recognition

Fine tuning self supervised pretrained models using pseudo labels can ef...
