Context-aware Goodness of Pronunciation for Computer-Assisted Pronunciation Training

by Jiatong Shi, et al.

Mispronunciation detection is an essential component of Computer-Assisted Pronunciation Training (CAPT) systems. State-of-the-art mispronunciation detection models use Deep Neural Networks (DNN) for acoustic modeling and a Goodness of Pronunciation (GOP) based algorithm for pronunciation scoring. However, GOP-based scoring models have two major limitations: (i) they depend on forced alignment, which splits the speech into phonetic segments that are then scored independently, neglecting the transitions between phonemes within a segment; (ii) they focus only on phonetic segments and so fail to consider context effects across phonemes (such as liaison, omission, and incomplete plosive sounds). In this work, we propose the Context-aware Goodness of Pronunciation (CaGOP) scoring model. In particular, two factors, the transition factor and the duration factor, are injected into CaGOP scoring. The transition factor identifies the transitions between phonemes and applies them to weight the frame-wise GOP. Moreover, self-attention based phonetic duration modeling is proposed to introduce the duration factor into the scoring model. The proposed scoring model significantly outperforms baselines, achieving 20 and 12 sentence-level mispronunciation detection respectively.
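To make the idea of weighting frame-wise GOP concrete, the following is a minimal sketch, not the CaGOP implementation. It assumes the common DNN-based GOP formulation in which a segment score is the average log posterior of the canonical phone over the aligned frames, and it takes the per-frame weights as a plain input. How the transition factor actually derives those weights is not specified in the abstract, so the function names, the weight values, and the normalization by the weight sum are all illustrative assumptions.

```python
import math
from typing import Sequence


def segment_gop(posteriors: Sequence[Sequence[float]], phone: int) -> float:
    """Unweighted GOP: mean log posterior of the canonical phone over a segment.

    posteriors[t][p] is the acoustic model posterior of phone p at frame t.
    """
    logs = [math.log(frame[phone]) for frame in posteriors]
    return sum(logs) / len(logs)


def weighted_segment_gop(
    posteriors: Sequence[Sequence[float]],
    phone: int,
    weights: Sequence[float],
) -> float:
    """Frame-weighted GOP (hypothetical helper, not the paper's formula).

    Each frame's log posterior is scaled by a non-negative weight, e.g. a
    small weight for frames that look like phoneme transitions. The result
    is normalized by the total weight so that uniform weights reproduce
    the unweighted score.
    """
    if len(weights) != len(posteriors):
        raise ValueError("need exactly one weight per frame")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(w * math.log(f[phone]) for w, f in zip(weights, posteriors))
    return weighted / total


# Toy segment: 4 frames, 2 phones; the boundary frames are ambiguous.
frames = [[0.4, 0.6], [0.9, 0.1], [0.9, 0.1], [0.5, 0.5]]
plain = segment_gop(frames, phone=0)
# Assumed weights that down-weight the two boundary (transition-like) frames.
context_aware = weighted_segment_gop(frames, phone=0, weights=[0.2, 1.0, 1.0, 0.2])
print(plain, context_aware)
```

In this toy segment the ambiguous boundary frames pull the unweighted score down, while the weighted score is dominated by the stable middle frames. That is the kind of effect a transition-aware weighting is meant to capture.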





Selective Attention for Context-aware Neural Machine Translation

Despite the progress made in sentence-level NMT, current systems still f...

Towards Robust Mispronunciation Detection and Diagnosis for L2 English Learners with Accent-Modulating Methods

With the acceleration of globalization, more and more people are willing...

Context-Aware Learning to Rank with Self-Attention

In learning to rank, one is interested in optimising the global ordering...

Analysis of Multivariate Scoring Functions for Automatic Unbiased Learning to Rank

Leveraging biased click data for optimizing learning to rank systems has...

Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling

Sentence scoring aims at measuring the likelihood score of a sentence an...

CASE: Context-Aware Semantic Expansion

In this paper, we define and study a new task called Context-Aware Seman...

Exploring Non-Autoregressive End-To-End Neural Modeling For English Mispronunciation Detection And Diagnosis

End-to-end (E2E) neural modeling has emerged as one predominant school o...