Mispronunciation detection using self-supervised speech representations

07/30/2023
by Jazmín Vidal, et al.

In recent years, self-supervised learning (SSL) models have produced promising results in a variety of speech-processing tasks, especially when data are scarce. In this paper, we study the use of SSL models for mispronunciation detection for second-language learners. We compare two downstream approaches: 1) training a model for phone recognition (PR) using native English data, and 2) training a model directly for the target task using non-native English data. We compare the performance of these two approaches for various SSL representations as well as a representation extracted from a traditional DNN-based speech recognition model. We evaluate the models on L2Arctic and EpaDB, two datasets of non-native speech annotated with pronunciation labels at the phone level. Overall, we find that a downstream model trained for the target task gives the best performance and that most upstream models perform similarly on the task.
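
To make the contrast between the two downstream approaches concrete, here is a minimal sketch of how each could produce a phone-level score. The abstract does not say how scores are computed, so everything here is an assumption for illustration: the function names, the average log-posterior rule for the phone-recognition route, the averaged-logit rule for the directly trained detector, the toy numbers, and the thresholds. It is not the authors' implementation.

```python
import math

# Each alignment entry is (target_phone_id, start_frame, end_frame_exclusive).
# The alignment is assumed to be given, e.g. by a forced aligner.

def scores_from_phone_recognizer(frame_posteriors, alignment):
    """Approach 1 (assumed scoring rule): a phone recognizer trained on
    native speech yields per-frame phone posteriors; the score of a phone
    is the mean log-posterior of its target label over its frames."""
    scores = []
    for phone_id, start, end in alignment:
        log_probs = [math.log(frame_posteriors[t][phone_id]) for t in range(start, end)]
        scores.append(sum(log_probs) / len(log_probs))
    return scores

def scores_from_direct_detector(frame_logits, alignment):
    """Approach 2 (assumed scoring rule): a model trained on non-native
    speech yields a per-frame 'correctly pronounced' logit for each phone;
    the score is the sigmoid of the mean logit over the phone's frames."""
    scores = []
    for phone_id, start, end in alignment:
        mean_logit = sum(frame_logits[t][phone_id] for t in range(start, end)) / (end - start)
        scores.append(1.0 / (1.0 + math.exp(-mean_logit)))
    return scores

def flag_mispronounced(scores, threshold):
    """A phone is flagged as mispronounced when its score is below the threshold."""
    return [score < threshold for score in scores]

# Toy example: 4 frames, 2 phone classes, target phone 0 spoken twice.
alignment = [(0, 0, 2), (0, 2, 4)]
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6]]
logits = [[2.0, -2.0], [2.0, -2.0], [-1.0, 1.0], [-1.0, 1.0]]

flags_pr = flag_mispronounced(scores_from_phone_recognizer(posteriors, alignment), -0.5)
flags_direct = flag_mispronounced(scores_from_direct_detector(logits, alignment), 0.5)
```

In both cases the upstream representation (SSL or DNN-based) would feed the model that produces the per-frame outputs. Only the training data and the training objective of the downstream model differ.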


