Acoustic To Articulatory Speech Inversion Using Multi-Resolution Spectro-Temporal Representations Of Speech Signals

03/11/2022
by Rahil Parikh, et al.

Multi-resolution spectro-temporal features of a speech signal model how the brain perceives sounds, with cortical cells tuned to different spectral and temporal modulations. These features produce a higher-dimensional representation of the speech signal. The purpose of this paper is to evaluate how well this auditory cortex representation of speech signals contributes to estimating the articulatory features of the corresponding signals. Since obtaining articulatory features from the acoustic features of speech signals has been a challenging topic of interest for different speech communities, we investigate the possibility of using this multi-resolution representation as the acoustic features. We used the University of Wisconsin X-ray Microbeam (XRMB) database of clean speech signals to train a feed-forward deep neural network (DNN) to estimate the articulatory trajectories of six tract variables. The optimal set of multi-resolution spectro-temporal features for training the model was chosen by selecting the scale and rate vector parameters that gave the best-performing model. Experiments achieved a correlation of 0.675 with the ground-truth tract variables. We compared the performance of this speech inversion system with prior experiments conducted using Mel Frequency Cepstral Coefficients (MFCCs).
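The abstract reports a single correlation score against ground-truth tract variables but does not say how it is computed. A minimal sketch, assuming the score is a Pearson correlation computed per tract variable over time and then averaged across the six variables, might look like the following. The function names, array shapes, and synthetic data are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np


def pearson_per_variable(pred, target):
    """Pearson correlation for each column of two (frames, variables) arrays."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    # Center each trajectory over time (axis 0).
    pred_c = pred - pred.mean(axis=0)
    target_c = target - target.mean(axis=0)
    num = (pred_c * target_c).sum(axis=0)
    den = np.sqrt((pred_c ** 2).sum(axis=0) * (target_c ** 2).sum(axis=0))
    return num / den


def mean_correlation(pred, target):
    """Average the per-variable correlations into one score (assumed metric)."""
    return float(pearson_per_variable(pred, target).mean())


# Synthetic stand-in data: 500 frames of six tract variables (not XRMB data).
rng = np.random.default_rng(0)
target = rng.standard_normal((500, 6))
pred = target + 0.5 * rng.standard_normal((500, 6))  # noisy "estimate"
score = mean_correlation(pred, target)
print(f"mean correlation over 6 variables: {score:.3f}")
```

Averaging per-variable correlations keeps one poorly estimated tract variable visible in `pearson_per_variable` while still yielding a single summary number comparable across feature sets.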


research
10/29/2022

The Secret Source : Incorporating Source Features to Improve Acoustic-to-Articulatory Speech Inversion

In this work, we incorporated acoustically derived source features, aper...
research
04/08/2022

Exploiting Hidden Representations from a DNN-based Speech Recogniser for Speech Intelligibility Prediction in Hearing-impaired Listeners

An accurate objective speech intelligibility prediction algorithms is of...
research
10/29/2022

Learning to Compute the Articulatory Representations of Speech with the MIRRORNET

Most organisms including humans function by coordinating and integrating...
research
06/12/2021

A Multi-Implicit Neural Representation for Fonts

Fonts are ubiquitous across documents and come in a variety of styles. T...
research
10/30/2022

Improved acoustic-to-articulatory inversion using representations from pretrained self-supervised learning models

In this work, we investigate the effectiveness of pretrained Self-Superv...
research
07/26/2023

Exploring the Interactions between Target Positive and Negative Information for Acoustic Echo Cancellation

Acoustic echo cancellation (AEC) aims to remove interference signals whi...
research
11/24/2021

Non-Intrusive Binaural Speech Intelligibility Prediction from Discrete Latent Representations

Non-intrusive speech intelligibility (SI) prediction from binaural signa...
