Weakly-supervised word-level pronunciation error detection in non-native English speech

06/07/2021
by Daniel Korzekwa, et al.

We propose a weakly-supervised model for word-level mispronunciation detection in non-native (L2) English speech. Training this model does not require phonetically transcribed L2 speech; we only need to mark mispronounced words. Without phonetic transcriptions of L2 speech, the model has to learn from a weak signal of word-level mispronunciations alone. Because of that, and because the amount of mispronounced L2 speech is limited, the model is more likely to overfit. To limit this risk, we train it in a multi-task setup. In the first task, we estimate the probabilities of word-level mispronunciation. For the second task, we use a phoneme recognizer trained on phonetically transcribed L1 speech that is easily accessible and can be automatically annotated. Compared to state-of-the-art approaches, we improve the accuracy of detecting word-level pronunciation errors in terms of AUC by 30% on the GUT Isle Corpus of L2 Polish speakers, and by 21.5% on the Isle Corpus of L2 German and Italian speakers.
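
The multi-task setup described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the loss functions or how the two tasks are weighted, so the per-phoneme cross-entropy, the weight `alpha`, and all function names below are assumptions made for illustration only.

```python
import math

EPS = 1e-12  # guards against log(0)


def word_error_loss(p_error, is_mispronounced):
    """Task 1 (assumed form): binary cross-entropy for one word,
    given the predicted probability that it is mispronounced."""
    p = min(max(p_error, EPS), 1.0 - EPS)
    return -math.log(p) if is_mispronounced else -math.log(1.0 - p)


def phoneme_loss(phoneme_probs, target_index):
    """Task 2 (assumed form): cross-entropy for one phoneme,
    given a predicted distribution over the phoneme inventory."""
    return -math.log(max(phoneme_probs[target_index], EPS))


def multi_task_loss(word_preds, word_labels,
                    phoneme_preds, phoneme_targets, alpha=0.5):
    """Weighted sum of the two task losses.

    word_preds / word_labels: weakly labelled L2 words (error or no error).
    phoneme_preds / phoneme_targets: phonetically transcribed L1 speech.
    alpha: hypothetical weight of the auxiliary phoneme task.
    """
    word_term = sum(
        word_error_loss(p, y) for p, y in zip(word_preds, word_labels)
    ) / len(word_preds)
    phoneme_term = sum(
        phoneme_loss(p, t) for p, t in zip(phoneme_preds, phoneme_targets)
    ) / len(phoneme_preds)
    return word_term + alpha * phoneme_term


if __name__ == "__main__":
    loss = multi_task_loss(
        word_preds=[0.9, 0.2],
        word_labels=[True, False],
        phoneme_preds=[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
        phoneme_targets=[0, 1],
    )
    print(f"combined loss: {loss:.4f}")
```

In this sketch the auxiliary phoneme term acts as a regularizer: part of the training signal comes from the plentiful L1 transcriptions rather than only from the sparse word-level labels.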

