Improving Speech Inversion Through Self-Supervised Embeddings and Enhanced Tract Variables

09/17/2023
by Ahmed Adel Attia, et al.

The performance of deep learning models depends significantly on their capacity to encode input features efficiently and decode them into meaningful outputs. Better input and output representations can improve a model's performance and generalization. In the context of acoustic-to-articulatory speech inversion (SI) systems, we study the impact of using speech representations acquired via self-supervised learning (SSL) models, such as HuBERT, compared to conventional acoustic features. Additionally, we investigate the incorporation of novel tract variables (TVs) through an improved geometric transformation model. By combining these two approaches, we improve the Pearson product-moment correlation (PPMC) scores, which evaluate the accuracy of the SI system's TV estimation, from 0.7452 to 0.8141, an absolute gain of about 0.069 (6.9 points). These results point to the effect of feature representations from SSL models, and of improved geometric transformations for the target TVs, on the performance of SI systems.
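As a rough illustration of the metric named in the abstract, the sketch below computes a PPMC score between an estimated and a reference trajectory, then works through the arithmetic behind the reported scores. The function names `ppmc` and `mean_ppmc` are hypothetical, and averaging the score across tract variables is an assumption: the abstract does not say how per-TV scores are aggregated.

```python
import math


def ppmc(estimated, reference):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(estimated) != len(reference) or len(estimated) < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    n = len(estimated)
    mean_e = sum(estimated) / n
    mean_r = sum(reference) / n
    # Unnormalized covariance and variances; the 1/n factors cancel in the ratio.
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(estimated, reference))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_e == 0 or var_r == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_e * var_r)


def mean_ppmc(estimated_tvs, reference_tvs):
    """Average PPMC over tract variables.

    Assumption: each argument maps a TV name to its trajectory, and the
    overall score is the unweighted mean across TVs.
    """
    scores = [ppmc(estimated_tvs[name], reference_tvs[name]) for name in reference_tvs]
    return sum(scores) / len(scores)


# Scores reported in the abstract.
baseline, improved = 0.7452, 0.8141
absolute_gain = improved - baseline        # difference in correlation
relative_gain = absolute_gain / baseline   # gain relative to the baseline score
print(f"absolute gain: {absolute_gain:.4f}, relative gain: {relative_gain:.1%}")
```

The gap between the two reported scores is 0.0689, which matches the 6.9 figure in the abstract when read as an absolute difference; relative to the baseline, the same gain is roughly 9.2%.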


research
10/30/2022

Improved acoustic-to-articulatory inversion using representations from pretrained self-supervised learning models

In this work, we investigate the effectiveness of pretrained Self-Superv...
research
01/03/2023

Supervised Acoustic Embeddings And Their Transferability Across Languages

In speech recognition, it is essential to model the phonetic content of ...
research
09/03/2023

Acoustic-to-articulatory inversion for dysarthric speech: Are pre-trained self-supervised representations favorable?

Acoustic-to-articulatory inversion (AAI) involves mapping from the acous...
research
04/24/2023

Zero-shot text-to-speech synthesis conditioned using self-supervised speech representation model

This paper proposes a zero-shot text-to-speech (TTS) conditioned by a se...
research
06/16/2023

Evaluation of Speech Representations for MOS prediction

In this paper, we evaluate feature extraction models for predicting spee...
research
08/02/2023

SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis

While FastSpeech2 aims to integrate aspects of speech such as pitch, ene...
research
10/29/2022

Learning to Compute the Articulatory Representations of Speech with the MIRRORNET

Most organisms including humans function by coordinating and integrating...
