An Effective End-to-End Modeling Approach for Mispronunciation Detection

05/18/2020
by Tien-Hong Lo, et al.

Recently, end-to-end (E2E) automatic speech recognition (ASR) systems have garnered tremendous attention because of their great success and unified modeling paradigm compared with conventional hybrid DNN-HMM ASR systems. Despite the widespread adoption of E2E modeling frameworks for ASR, there is still a dearth of work investigating E2E frameworks for computer-assisted pronunciation training (CAPT), particularly for mispronunciation detection (MD). In response, we first present a novel use of the hybrid CTC-Attention approach for the MD task, which takes advantage of the strengths of both CTC and the attention-based model while avoiding the need for phone-level forced alignment. Second, we perform input augmentation with text prompt information to make the resulting E2E model better tailored to the MD task. In addition, we adopt two MD decision methods that fit the proposed framework: 1) decision-making based on a recognition confidence measure, or 2) decision-making based simply on the speech recognition results. A series of Mandarin MD experiments demonstrates that our approach not only simplifies the processing pipeline of existing hybrid DNN-HMM systems but also brings systematic and substantial performance improvements. Furthermore, input augmentation with text prompts appears to hold excellent promise for the E2E-based MD approach.
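For reference, hybrid CTC-Attention models are usually trained with a multi-task objective that interpolates the CTC loss and the attention-decoder loss. The abstract does not state the interpolation weight used in this work, so the expression below is the common general form rather than the paper's exact setting. Here X denotes the acoustic feature sequence, C the output label sequence, and λ a tunable weight.

```latex
\mathcal{L}_{\text{MTL}} = \lambda \, \mathcal{L}_{\text{CTC}} + (1 - \lambda) \, \mathcal{L}_{\text{att}},
\qquad
\mathcal{L}_{\text{CTC}} = -\log p_{\text{ctc}}(C \mid X),
\quad
\mathcal{L}_{\text{att}} = -\log p_{\text{att}}(C \mid X),
\quad
0 \le \lambda \le 1
```

Because CTC marginalizes over all monotonic alignments between X and C, neither term requires a phone-level forced alignment as training input.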

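The two MD decision methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `GAP` symbol, the 0.5 threshold, the use of a per-phone posterior as the confidence measure, and the Levenshtein alignment of recognized phones against the canonical (text-prompt) phones are all illustrative assumptions. The pinyin-style phone labels are chosen only for the example.

```python
from typing import List, Tuple

GAP = "-"  # hypothetical placeholder for an alignment gap


def align(canonical: List[str], recognized: List[str]) -> List[Tuple[str, str]]:
    """Levenshtein-align canonical (prompt) phones with recognized phones.

    Returns (canonical, recognized) pairs; GAP marks a deletion or insertion.
    """
    n, m = len(canonical), len(recognized)
    # dp[i][j] = edit distance between canonical[:i] and recognized[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if canonical[i - 1] == recognized[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,        # canonical phone deleted
                dp[i][j - 1] + 1,        # extra phone inserted
                dp[i - 1][j - 1] + sub,  # match or substitution
            )
    # Backtrace from the bottom-right cell to recover one optimal alignment.
    pairs: List[Tuple[str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if canonical[i - 1] == recognized[j - 1] else 1
            if dp[i][j] == dp[i - 1][j - 1] + sub:
                pairs.append((canonical[i - 1], recognized[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            pairs.append((canonical[i - 1], GAP))
            i -= 1
        else:
            pairs.append((GAP, recognized[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def detect_by_recognition(canonical: List[str], recognized: List[str]) -> List[bool]:
    """Recognition-based decision: flag a canonical phone when the recognizer's
    output at the aligned position differs (substitution or deletion).
    Inserted phones have no canonical slot and are skipped."""
    return [c != r for c, r in align(canonical, recognized) if c != GAP]


def detect_by_confidence(posteriors: List[float], threshold: float = 0.5) -> List[bool]:
    """Confidence-based decision: flag a canonical phone when its confidence
    (assumed here to be a per-phone posterior in [0, 1]) is below threshold."""
    return [p < threshold for p in posteriors]


if __name__ == "__main__":
    prompt = ["n", "i", "h", "ao"]  # canonical phones of the text prompt
    print(detect_by_recognition(prompt, ["n", "i", "h", "ou"]))
    print(detect_by_recognition(prompt, ["n", "i", "ao"]))
    print(detect_by_confidence([0.92, 0.85, 0.97, 0.31]))
```

With the prompt `n i h ao`, the first call flags only the final `ao` (recognized as `ou`), the second flags the deleted `h`, and the third flags the low-confidence last phone. The recognition-based rule needs no threshold but inherits every recognition error, while the confidence-based rule lets the threshold trade false alarms against missed mispronunciations.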

10/31/2018
Towards End-to-End Code-Switching Speech Recognition
Code-switching speech recognition has attracted an increasing interest r...

12/05/2021
Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI
Recently, End-to-End (E2E) frameworks have achieved remarkable results o...

05/14/2020
You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation
Data augmentation is one of the most effective ways to make end-to-end a...

05/25/2020
An End-to-End Mispronunciation Detection System for L2 English Speech Leveraging Novel Anti-Phone Modeling
Mispronunciation detection and diagnosis (MDD) is a core component of co...

04/17/2021
A Full Text-Dependent End to End Mispronunciation Detection and Diagnosis with Easy Data Augmentation Techniques
Recently, end-to-end mispronunciation detection and diagnosis (MD&D) s...

05/08/2019
RWTH ASR Systems for LibriSpeech: Hybrid vs Attention - w/o Data Augmentation
We present state-of-the-art automatic speech recognition (ASR) systems e...

10/17/2021
Improving End-To-End Modeling for Mispronunciation Detection with Effective Augmentation Mechanisms
Recently, end-to-end (E2E) models, which allow to take spectral vector s...