An End-to-End Mispronunciation Detection System for L2 English Speech Leveraging Novel Anti-Phone Modeling

by Bi-Cheng Yan, et al.

Mispronunciation detection and diagnosis (MDD) is a core component of computer-assisted pronunciation training (CAPT). Most existing MDD approaches focus on categorical errors (viz. one canonical phone is substituted by another, aside from mispronunciations caused by deletions or insertions). However, accurate detection and diagnosis of non-categorical or distortion errors (viz. approximating L2 phones with L1 (first-language) phones, or erroneous pronunciations in between) still seems out of reach. In view of this, we propose to conduct MDD with a novel end-to-end automatic speech recognition (E2E-based ASR) approach. In particular, we expand the original L2 phone set with a corresponding anti-phone set, giving the E2E-based MDD approach a better capability to handle both categorical and non-categorical mispronunciations, with the aim of providing better mispronunciation detection and diagnosis feedback. Furthermore, a novel transfer-learning paradigm is devised to obtain the initial model estimate of the E2E-based MDD system without recourse to any phonological rules. Extensive experimental results on the L2-ARCTIC dataset show that our best system can outperform the existing E2E baseline system and the pronunciation scoring based method (GOP) in terms of the F1-score, by 11.05
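As an illustration of the phone-set expansion described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that each phone receives exactly one anti-phone, written here with a hypothetical `#` prefix, and that a distorted realization of a canonical phone is relabeled with that phone's anti-phone. The helper names (`expand_phone_set`, `relabel_distortions`, `f1_score`) and the toy phone symbols are all assumptions made for this example. The F1-score helper uses the standard definition, the harmonic mean of precision and recall.

```python
ANTI_PREFIX = "#"  # hypothetical marker for an anti-phone; not taken from the paper


def expand_phone_set(phones):
    """Return the original phones followed by one anti-phone per phone."""
    phones = list(phones)
    return phones + [ANTI_PREFIX + p for p in phones]


def relabel_distortions(canonical, distorted):
    """Swap each canonical phone flagged as distorted for its anti-phone.

    canonical: list of canonical phone symbols for an utterance.
    distorted: list of booleans of the same length, True where the learner's
               realization is a non-categorical (distortion) error.
    """
    if len(canonical) != len(distorted):
        raise ValueError("canonical and distorted must have the same length")
    return [ANTI_PREFIX + p if flag else p for p, flag in zip(canonical, distorted)]


def f1_score(true_pos, false_pos, false_neg):
    """Standard F1: harmonic mean of precision and recall, 0.0 if undefined."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    vocab = expand_phone_set(["k", "ae", "t"])
    print(vocab)
    print(relabel_distortions(["k", "ae", "t"], [False, True, False]))
    print(f1_score(true_pos=8, false_pos=2, false_neg=2))
```

Under these assumptions the output vocabulary is twice the size of the original phone set, so a recognizer trained on the relabeled targets can emit an anti-phone wherever a pronunciation falls outside the canonical phone categories.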





Deep segmental phonetic posterior-grams based discovery of non-categories in L2 English speech

Second language (L2) speech is often labeled with the native, phone cate...

Speech Corpus of Ainu Folklore and End-to-end Speech Recognition for Ainu Language

Ainu is an unwritten language that has been spoken by Ainu people who ar...

Improving End-to-end Speech Recognition with Pronunciation-assisted Sub-word Modeling

In recent years, end-to-end models have become popular for application i...

An Effective End-to-End Modeling Approach for Mispronunciation Detection

Recently, end-to-end (E2E) automatic speech recognition (ASR) systems ha...

Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition

The high cost of data acquisition makes Automatic Speech Recognition (AS...

Towards Robust Mispronunciation Detection and Diagnosis for L2 English Learners with Accent-Modulating Methods

With the acceleration of globalization, more and more people are willing...

A transfer learning based approach for pronunciation scoring

Phone-level pronunciation scoring is a challenging task, with performanc...