Leveraging phone-level linguistic-acoustic similarity for utterance-level pronunciation scoring

02/21/2023
by   Wei Liu, et al.
0

Recent studies on pronunciation scoring have explored the effect of introducing phone embeddings as reference pronunciation, but mostly in an implicit manner, i.e., addition or concatenation of reference phone embedding and actual pronunciation of the target phone as the phone-level pronunciation quality representation. In this paper, we propose to use linguistic-acoustic similarity to explicitly measure the deviation of non-native production from its native reference for pronunciation assessment. Specifically, the deviation is first estimated by the cosine similarity between reference phone embedding and corresponding acoustic embedding. Next, a phone-level Goodness of pronunciation (GOP) pre-training stage is introduced to guide this similarity-based learning for better initialization of the aforementioned two embeddings. Finally, a transformer-based hierarchical pronunciation scorer is used to map a sequence of phone embeddings, acoustic embeddings along with their similarity measures to predict the final utterance-level score. Experimental results on the non-native databases suggest that the proposed system significantly outperforms the baselines, where the acoustic and phone embeddings are simply added or concatenated. A further examination shows that the phone embeddings learned in the proposed approach are able to capture linguistic-acoustic attributes of native pronunciation as reference.

READ FULL TEXT
research
09/05/2020

A multi-view approach for Mandarin non-native mispronunciation verification

Traditionally, the performance of non-native mispronunciation verificati...
research
04/07/2022

Linguistic-Acoustic Similarity Based Accent Shift for Accent Recognition

General accent recognition (AR) models tend to directly extract low-leve...
research
03/01/2022

Improving Non-native Word-level Pronunciation Scoring with Phone-level Mixup Data Augmentation and Multi-source Information

Deep learning-based pronunciation scoring models highly rely on the avai...
research
10/14/2021

An Approach to Mispronunciation Detection and Diagnosis with Acoustic, Phonetic and Linguistic (APL) Embeddings

Many mispronunciation detection and diagnosis (MD D) research approach...
research
05/19/2020

Improving Accent Conversion with Reference Encoder and End-To-End Text-To-Speech

Accent conversion (AC) transforms a non-native speaker's accent into a n...
research
10/16/2019

Contextual Joint Factor Acoustic Embeddings

Embedding acoustic information into fixed length representations is of i...
research
05/07/2019

Transparent pronunciation scoring using articulatorily weighted phoneme edit distance

For researching effects of gamification in foreign language learning for...

Please sign up or login with your details

Forgot password? Click here to reset