Unsupervised speech intelligibility assessment with utterance level alignment distance between teacher and learner Wav2Vec-2.0 representations

06/15/2023
by Nayan Anand, et al.

Speech intelligibility is crucial in language learning for effective communication. Developing computer-assisted language learning systems therefore requires automatic speech intelligibility detection (SID). Most prior work assesses intelligibility in a supervised manner using manual annotations, which are costly and time-consuming to collect and therefore limit scalability. To overcome these limitations, this work proposes an unsupervised approach to SID. The proposed approach uses the alignment distance, computed with dynamic time warping (DTW) between the teacher and learner representation sequences, as a measure to separate intelligible from non-intelligible speech. The feature sequences are obtained from Wav2Vec-2.0, a current state-of-the-art self-supervised representation. We obtain detection accuracies of 90.37%, 92.57% and 96.58% with three alignment distance measures, respectively: mean absolute error, mean squared error and cosine distance (equal to one minus cosine similarity).
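The alignment distance described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the teacher and learner utterances have already been converted to frame-level feature sequences (for example Wav2Vec-2.0 outputs, represented here as plain lists of floats), and it assumes the accumulated DTW cost is normalised by the warping-path length to give an utterance-level score, a choice the abstract does not specify. The function names `frame_distance` and `dtw_alignment_distance` are hypothetical.

```python
import math


def frame_distance(u, v, metric):
    """Distance between two feature frames under one of the three measures."""
    if metric == "mae":
        return sum(abs(a - b) for a, b in zip(u, v)) / len(u)
    if metric == "mse":
        return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)
    if metric == "cosine":
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        # Cosine distance = one minus cosine similarity; zero vectors get 1.0.
        return 1.0 - dot / norm if norm > 0 else 1.0
    raise ValueError(f"unknown metric: {metric}")


def dtw_alignment_distance(teacher, learner, metric="cosine"):
    """Utterance-level DTW distance between two frame sequences.

    Assumption: the accumulated cost along the best warping path is divided
    by the path length so that utterances of different durations compare.
    """
    n, m = len(teacher), len(learner)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]   # accumulated cost
    steps = [[0] * (m + 1) for _ in range(n + 1)]   # path length so far
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Standard step pattern: diagonal match, or stretch either sequence.
            best_cost, best_steps = min(
                (acc[i - 1][j - 1], steps[i - 1][j - 1]),
                (acc[i - 1][j], steps[i - 1][j]),
                (acc[i][j - 1], steps[i][j - 1]),
            )
            cost = frame_distance(teacher[i - 1], learner[j - 1], metric)
            acc[i][j] = cost + best_cost
            steps[i][j] = best_steps + 1
    return acc[n][m] / steps[n][m]


# Toy stand-ins for real feature sequences (hypothetical values).
teacher = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
slow_learner = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
score = dtw_alignment_distance(teacher, slow_learner, metric="cosine")
```

In this toy case the learner sequence is a time-stretched copy of the teacher, so DTW absorbs the difference in speaking rate and the score is close to zero. An unsupervised detector could then label an utterance as intelligible when its score falls below a threshold; how that threshold is chosen is not stated in the abstract.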

