End-to-End Word-Level Pronunciation Assessment with MASK Pre-training

06/05/2023
by   Yukang Liang, et al.
0

Pronunciation assessment is a major challenge in the computer-aided pronunciation training system, especially at the word (phoneme)-level. To obtain word (phoneme)-level scores, current methods usually rely on aligning components to obtain acoustic features of each word (phoneme), which limits the performance of assessment to the accuracy of alignments. Therefore, to address this problem, we propose a simple yet effective method, namely Masked pre-training for Pronunciation Assessment (MPA). Specifically, by incorporating a mask-predict strategy, our MPA supports end-to-end training without leveraging any aligning components and can solve misalignment issues to a large extent during prediction. Furthermore, we design two evaluation strategies to enable our model to conduct assessments in both unsupervised and supervised settings. Experimental results on SpeechOcean762 dataset demonstrate that MPA could achieve better performance than previous methods, without any explicit alignment. In spite of this, MPA still has some limitations, such as requiring more inference time and reference text. They expect to be addressed in future work.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/02/2022

Masked Contrastive Pre-Training for Efficient Video-Text Retrieval

We present a simple yet effective end-to-end Video-language Pre-training...
research
09/17/2019

Bridging the Gap between Pre-Training and Fine-Tuning for End-to-End Speech Translation

End-to-end speech translation, a hot topic in recent years, aims to tran...
research
02/02/2019

Using multi-task learning to improve the performance of acoustic-to-word and conventional hybrid models

Acoustic-to-word (A2W) models that allow direct mapping from acoustic si...
research
07/01/2020

Whole-Word Segmental Speech Recognition with Acoustic Word Embeddings

Segmental models are sequence prediction models in which scores of hypot...
research
12/09/2020

SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint

Automatic song writing aims to compose a song (lyric and/or melody) by m...
research
06/09/2023

WSPAlign: Word Alignment Pre-training via Large-Scale Weakly Supervised Span Prediction

Most existing word alignment methods rely on manual alignment datasets o...

Please sign up or login with your details

Forgot password? Click here to reset