Computer-assisted Pronunciation Training – Speech synthesis is almost all you need

07/02/2022
by Daniel Korzekwa, et al.

The research community has long studied computer-assisted pronunciation training (CAPT) methods in non-native speech. Researchers have focused on various model architectures, such as Bayesian networks and deep learning methods, as well as on the analysis of different representations of the speech signal. Despite significant progress in recent years, existing CAPT methods are not able to detect pronunciation errors with high accuracy (only 60% precision at 40%-80% recall). One of the key problems is the low availability of mispronounced speech that is needed for the reliable training of pronunciation error detection models. If we had a generative model that could mimic non-native speech and produce any amount of training data, then the task of detecting pronunciation errors would be much easier. We present three innovative techniques based on phoneme-to-phoneme (P2P), text-to-speech (T2S), and speech-to-speech (S2S) conversion to generate correctly pronounced and mispronounced synthetic speech. We show that these techniques not only improve the accuracy of three machine learning models for detecting pronunciation errors but also help establish a new state of the art in the field. Earlier studies have used simple speech generation techniques such as P2P conversion, but only as an additional mechanism to improve the accuracy of pronunciation error detection. We, on the other hand, consider speech generation to be a first-class method of detecting pronunciation errors. The effectiveness of these techniques is assessed in the tasks of detecting pronunciation and lexical stress errors. Non-native English speech corpora of German, Italian, and Polish speakers are used in the evaluations. The best proposed S2S technique improves the accuracy of detecting pronunciation errors, measured by AUC, by 41% (from 0.528 to 0.749) compared to the state-of-the-art approach.
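As a rough illustration of the P2P idea mentioned in the abstract, the sketch below perturbs a canonical phoneme sequence to produce a "mispronounced" variant together with per-phoneme error labels. The abstract does not describe the authors' actual procedure, so everything here is an assumption: the confusion table, the function name `perturb_phonemes`, and the `error_rate` parameter are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical confusion table: substitutions a learner might make.
# These pairs are illustrative assumptions, not taken from the paper.
CONFUSIONS = {
    "TH": ["S", "T"],
    "W": ["V"],
    "IH": ["IY"],
}


def perturb_phonemes(phonemes, error_rate, rng):
    """Return (perturbed phonemes, labels), where label 1 marks a substitution."""
    perturbed, labels = [], []
    for p in phonemes:
        # Only phonemes listed in the table can be replaced.
        if p in CONFUSIONS and rng.random() < error_rate:
            perturbed.append(rng.choice(CONFUSIONS[p]))
            labels.append(1)
        else:
            perturbed.append(p)
            labels.append(0)
    return perturbed, labels


# Canonical phonemes for the word "think" (ARPAbet-style symbols).
canonical = ["TH", "IH", "NG", "K"]
mispronounced, labels = perturb_phonemes(canonical, 0.5, random.Random(0))
print(mispronounced, labels)
```

Pairs of perturbed sequences and labels like these could serve as synthetic training examples for an error detector. The T2S and S2S techniques named in the abstract work on the speech signal itself and are not covered by this sketch.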


