A Full Text-Dependent End to End Mispronunciation Detection and Diagnosis with Easy Data Augmentation Techniques

04/17/2021
by Kaiqi Fu, et al.

Recently, end-to-end mispronunciation detection and diagnosis (MDD) systems have become a popular alternative to conventional hybrid DNN-HMM systems, greatly simplifying the model-building process by replacing complicated modules with a single deep network architecture. In this paper, to utilize the prior text in the end-to-end structure, we present a novel text-dependent model. Unlike SED-MDD, the model achieves a fully end-to-end system by aligning the audio with the phoneme sequence of the prior text inside the model through an attention mechanism. However, using the prior text as input introduces an imbalance between positive and negative samples in the phoneme sequence. To alleviate this problem, we propose three simple data augmentation methods, which effectively improve the model's ability to capture mispronounced phonemes. We conduct experiments on L2-ARCTIC, and our best performance improved from 49.29 … CNN-RNN-CTC model.
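The abstract does not spell out the three augmentation methods. As a rough illustration only, the sketch below shows one generic way to counter the imbalance it describes: randomly substituting phonemes in the prior-text input while leaving the recognition target unchanged, so that each substituted position becomes an artificial text/audio mismatch. This is an assumption, not the authors' method; the function `augment_prior_text`, its `rate` parameter, and the small `PHONES` inventory are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical phoneme inventory: a small subset of ARPAbet-style symbols.
PHONES = ["AA", "AE", "AH", "B", "D", "EH", "IY", "K", "L", "N", "S", "T"]


def augment_prior_text(
    canonical: Sequence[str],
    rate: float,
    rng: random.Random,
    inventory: Sequence[str] = PHONES,
) -> Tuple[List[str], List[int]]:
    """Hypothetical sketch, not the paper's augmentation methods.

    Randomly substitutes phonemes in the prior-text (canonical) input.
    Returns the perturbed sequence and a 0/1 mask marking changed positions.
    The target phonemes (what was actually said) are not touched, so every
    changed position becomes an artificial mismatch between text and audio.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    perturbed: List[str] = []
    changed: List[int] = []
    for ph in canonical:
        if rng.random() < rate:
            # Choose a different phoneme so the substitution is a real change.
            choices = [p for p in inventory if p != ph]
            perturbed.append(rng.choice(choices))
            changed.append(1)
        else:
            perturbed.append(ph)
            changed.append(0)
    return perturbed, changed
```

For example, `augment_prior_text(["K", "AE", "T"], rate=0.3, rng=random.Random(0))` returns a perturbed copy of the sequence together with its change mask. Passing an explicit `random.Random` instance keeps the augmentation reproducible; the mask could serve as extra mismatch labels during training.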

research
06/15/2022

Text-Aware End-to-end Mispronunciation Detection and Diagnosis

Mispronunciation detection and diagnosis (MDD) technology is a key compo...
research
09/12/2021

Good-Enough Example Extrapolation

This paper asks whether extrapolating the hidden space distribution of t...
research
05/18/2020

An Effective End-to-End Modeling Approach for Mispronunciation Detection

Recently, end-to-end (E2E) automatic speech recognition (ASR) systems ha...
research
06/26/2022

Data Augmentation for Dementia Detection in Spoken Language

Dementia is a growing problem as our society ages, and detection methods...
research
11/16/2018

AclNet: efficient end-to-end audio classification CNN

We propose an efficient end-to-end convolutional neural network architec...
research
10/21/2020

Controllable Text Simplification with Explicit Paraphrasing

Text Simplification improves the readability of sentences through severa...
research
10/17/2021

Improving End-To-End Modeling for Mispronunciation Detection with Effective Augmentation Mechanisms

Recently, end-to-end (E2E) models, which allow to take spectral vector s...