Zero-Shot Automatic Pronunciation Assessment

05/31/2023
by Hongfu Liu, et al.

Automatic Pronunciation Assessment (APA) is vital for computer-assisted language learning. Prior methods rely on annotated speech-text data to train Automatic Speech Recognition (ASR) models, or on speech-score data to train regression models. In this work, we propose a novel zero-shot APA method based on the pre-trained acoustic model HuBERT. Our method encodes the speech input and corrupts the encoded representation via a masking module. We then employ the Transformer encoder and apply k-means clustering to obtain token sequences. Finally, a scoring module measures the number of wrongly recovered tokens. Experimental results on speechocean762 demonstrate that the proposed method achieves performance comparable to supervised regression baselines and outperforms non-regression baselines in terms of Pearson Correlation Coefficient (PCC). Additionally, we analyze how masking strategies affect the performance of APA.
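The token and scoring steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`assign_tokens`, `mismatch_count`, `pearson`) are hypothetical, the features and centroids are toy arrays standing in for HuBERT outputs and k-means centroids, and counting mismatches only at masked positions is an assumption, since the abstract does not specify the exact scoring rule.

```python
import numpy as np

def assign_tokens(features, centroids):
    """Map each frame (row of `features`) to the index of its nearest centroid."""
    # Squared Euclidean distances, shape (num_frames, num_clusters).
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def mismatch_count(clean_tokens, recovered_tokens, masked):
    """Count masked positions whose recovered token differs from the clean token.

    Restricting the count to masked positions is an assumption of this sketch.
    """
    clean_tokens = np.asarray(clean_tokens)
    recovered_tokens = np.asarray(recovered_tokens)
    masked = np.asarray(masked, dtype=bool)
    return int(np.sum((clean_tokens != recovered_tokens) & masked))

def pearson(x, y):
    """Pearson Correlation Coefficient between two 1-D sequences."""
    return float(np.corrcoef(x, y)[0, 1])

# Toy stand-ins for k-means centroids and frame-level encoder outputs.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
clean_features = np.array([[1.0, 1.0], [9.0, 9.0], [0.0, 2.0]])
recovered_features = np.array([[1.0, 0.0], [2.0, 1.0], [0.0, 1.0]])
masked = [False, True, True]

clean_tokens = assign_tokens(clean_features, centroids)
recovered_tokens = assign_tokens(recovered_features, centroids)
errors = mismatch_count(clean_tokens, recovered_tokens, masked)
```

In this toy case only the second frame is recovered as a different token, so `errors` is 1. In an evaluation setting, a score derived from such counts would be compared against human ratings using `pearson`.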


