Acoustic-to-articulatory inversion for dysarthric speech: Are pre-trained self-supervised representations favorable?

09/03/2023, by Sarthak Kumar Maharana, et al.

Acoustic-to-articulatory inversion (AAI) involves mapping from the acoustic space to the articulatory space. Signal-processing features, such as MFCCs, have been widely used for the AAI task. For subjects with dysarthric speech, AAI is challenging because of imprecise and indistinct pronunciation. In this work, we perform AAI for dysarthric speech using representations from pre-trained self-supervised learning (SSL) models. We demonstrate the impact of different pre-trained features on this challenging AAI task under low-resource conditions. In addition, we condition the extracted SSL features on x-vectors to train a BLSTM network. In the seen case, we experiment with three AAI training schemes (subject-specific, pooled, and fine-tuned). The results, consistent across training schemes, reveal that DeCoAR, in the fine-tuned scheme, achieves a relative improvement in the Pearson correlation coefficient (CC) of ∼1.81% and ∼4.56% for healthy controls and patients, respectively, over MFCCs. In the unseen case, we observe similar average trends for the different SSL features. Overall, SSL networks like wav2vec, APC, and DeCoAR, which are trained with feature-reconstruction or future-timestep-prediction tasks, perform well in predicting dysarthric articulatory trajectories.
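The evaluation metric reported above is the Pearson correlation coefficient (CC) between a predicted and a ground-truth articulatory trajectory. As a minimal illustration (not the authors' code; the function name and toy data are our own), CC for a single trajectory can be computed in pure Python as:

```python
import math

def pearson_cc(pred, target):
    """Pearson correlation coefficient between a predicted and a
    ground-truth articulatory trajectory (two equal-length sequences)."""
    n = len(pred)
    assert n == len(target) and n > 1, "need two equal-length sequences"
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    # Covariance numerator and the two standard-deviation terms.
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in target))
    return cov / (sd_p * sd_t)

# Two perfectly linearly related trajectories give CC = 1.0.
print(round(pearson_cc([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]), 6))
```

In the paper's setting, a CC like this would be computed per articulator trajectory and averaged across articulators and test utterances; the relative improvements quoted above compare these averaged CC values between SSL features and MFCCs.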

Related research

10/30/2022
Improved acoustic-to-articulatory inversion using representations from pretrained self-supervised learning models
In this work, we investigate the effectiveness of pretrained Self-Superv...

11/10/2019
Effectiveness of self-supervised pre-training for speech recognition
We present pre-training approaches for self-supervised representation le...

10/24/2022
Self-supervised Rewiring of Pre-trained Speech Encoders: Towards Faster Fine-tuning with Less Labels in Speech Processing
Pre-trained speech Transformers have facilitated great success across va...

05/17/2022
HelixADMET: a robust and endpoint extensible ADMET system incorporating self-supervised knowledge transfer
Accurate ADMET (an abbreviation for "absorption, distribution, metabolis...

09/17/2023
Improving Speech Inversion Through Self-Supervised Embeddings and Enhanced Tract Variables
The performance of deep learning models depends significantly on their c...

04/05/2022
Repeat after me: Self-supervised learning of acoustic-to-articulatory mapping by vocal imitation
We propose a computational model of speech production combining a pre-tr...

01/28/2021
BENDR: using transformers and a contrastive self-supervised learning task to learn from massive amounts of EEG data
Deep neural networks (DNNs) used for brain-computer-interface (BCI) clas...
