Predicting within and across language phoneme recognition performance of self-supervised learning speech pre-trained models

06/24/2022
by Hang Ji, et al.

In this work, we analyzed and compared speech representations extracted from different frozen self-supervised learning (SSL) speech pre-trained models, examining how well they capture articulatory feature (AF) information and how well that ability predicts phone recognition performance in within-language and cross-language scenarios. Specifically, we compared CPC, wav2vec 2.0, and HuBERT. First, frame-level AF probing tasks were implemented. Subsequently, phone-level end-to-end ASR systems were built for phoneme recognition, and performance on the frame-level AF probing tasks was correlated with phone accuracy. Compared with the conventional MFCC speech representation, all SSL pre-trained speech representations captured more AF information and achieved better phoneme recognition performance both within and across languages, with HuBERT performing best. The frame-level AF probing task is a good predictor of phoneme recognition performance, which shows the importance of capturing AF information in speech representations. Compared with MFCC, in the within-language scenario, the performance of these SSL speech pre-trained models on AF probing tasks achieved a maximum relative increase of 34.4% [...] lowest phone error rate (PER) of 10.2% [...] increase of 26.7% [...]
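The two quantities the abstract relies on, a relative increase over a baseline and a correlation between probing performance and phone accuracy, can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not say which correlation coefficient was used, so Pearson's r is an assumption here, and the function names and all numbers below are hypothetical placeholders rather than results from the paper.

```python
from math import sqrt


def relative_increase(baseline: float, value: float) -> float:
    """Relative increase of `value` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline * 100.0


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Placeholder scores, one entry per speech representation.
# These are NOT values from the paper; they only exercise the functions.
probe_accuracy = [0.60, 0.70, 0.75, 0.80]  # hypothetical AF probing accuracy
phone_accuracy = [0.70, 0.80, 0.85, 0.90]  # hypothetical phone accuracy

r = pearson_r(probe_accuracy, phone_accuracy)
gain = relative_increase(baseline=0.5, value=0.6)
print(f"Pearson r = {r:.3f}, relative increase = {gain:.1f}%")
```

A coefficient close to 1 would indicate that representations scoring higher on the probing task also tend to score higher on phone recognition, which is the sense in which the probing task acts as a predictor.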

