A multi-view approach for Mandarin non-native mispronunciation verification

09/05/2020
by Zhenyu Wang, et al.

Traditionally, the performance of non-native mispronunciation verification systems has relied on effective phone-level labelling of non-native corpora. In this study, a multi-view approach is proposed that incorporates discriminative feature representations and requires less annotation for non-native mispronunciation verification of Mandarin. Here, models are jointly learned to embed acoustic sequences and multi-source information for speech attributes and bottleneck features. Bidirectional LSTM embedding models with contrastive losses are used to map acoustic sequences and multi-source information into fixed-dimensional embeddings. The distance between acoustic embeddings is taken as the similarity between phones. Accordingly, examples of mispronounced phones are expected to have a small similarity score with their canonical pronunciations. The approach shows an improvement of +11.23 over a GOP-based approach on a mispronunciation verification task.
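The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the embeddings are assumed to have already been produced by the embedding models, cosine distance is an assumed choice of distance, the margin-based hinge is only one common form of contrastive loss (the abstract does not state which form is used), and the function names, margin, and threshold are all hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(u, v) for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def margin_contrastive_loss(acoustic, matched, mismatched, margin=0.5):
    """Hinge loss over one triplet of embeddings (assumed loss form).

    The loss is zero once the acoustic embedding is closer to its matched
    second-view embedding than to a mismatched one by at least `margin`.
    """
    d_pos = cosine_distance(acoustic, matched)
    d_neg = cosine_distance(acoustic, mismatched)
    return max(0.0, margin + d_pos - d_neg)


def is_mispronounced(acoustic, canonical, threshold=0.5):
    """Flag a phone when its distance to the canonical embedding is large.

    A large distance corresponds to a small similarity score. The threshold
    is a hypothetical value that would be tuned on held-out data.
    """
    return cosine_distance(acoustic, canonical) > threshold


if __name__ == "__main__":
    canonical = [1.0, 0.0]
    close_phone = [0.9, 0.1]   # toy embedding near the canonical phone
    far_phone = [0.0, 1.0]     # toy embedding far from the canonical phone
    print(is_mispronounced(close_phone, canonical))
    print(is_mispronounced(far_phone, canonical))
    print(margin_contrastive_loss(canonical, close_phone, far_phone))
```

In this sketch the verification decision depends only on the distance between two fixed-dimensional vectors, which is why the approach needs embeddings of the canonical phones rather than a phone-level annotation of every non-native utterance at test time.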
