A comparative study of estimating articulatory movements from phoneme sequences and acoustic features

10/31/2019
by Abhayjeet Singh, et al.

Unlike phoneme sequences, the movements of speech articulators (lips, tongue, jaw, velum) and the resulting acoustic signal are known to encode not only the linguistic message but also para-linguistic information. While several works estimate articulatory movements from acoustic signals, little is known about the extent to which articulatory movements can be predicted from linguistic information alone, i.e., from the phoneme sequence. In this work, we estimate articulatory movements from three different input representations: R1) acoustic signal, R2) phoneme sequence, and R3) phoneme sequence with timing information. An attention network is used for estimating articulatory movements in the case of R2, while a BLSTM network is used for R1 and R3. Experiments with ten subjects' acoustic-articulatory data reveal that the estimation techniques achieve average correlation coefficients of 0.85, 0.81, and 0.81 for R1, R2, and R3, respectively. This indicates that the attention network, although it uses only the phoneme sequence (R2) without any timing information, achieves an estimation performance similar to that obtained with the rich acoustic signal (R1), suggesting that articulatory motion is primarily driven by the linguistic message. The correlation coefficient further improves to 0.88 when R1 and R3 are used together for estimating articulatory movements.
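The abstract reports results as an average correlation coefficient between estimated and measured articulatory movements. As a minimal sketch of how such a score might be computed, the snippet below assumes a Pearson correlation taken per articulatory channel and then averaged across channels; the abstract does not spell out the averaging procedure, so this is an assumption. The function names and the toy trajectories are hypothetical and are not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def mean_channel_correlation(true_traj, pred_traj):
    """Average the per-channel Pearson correlation (assumed averaging scheme).

    Each trajectory is a list of frames; each frame is a list holding one
    value per articulatory channel (e.g. a sensor coordinate).
    """
    n_channels = len(true_traj[0])
    scores = []
    for c in range(n_channels):
        true_channel = [frame[c] for frame in true_traj]
        pred_channel = [frame[c] for frame in pred_traj]
        scores.append(pearson(true_channel, pred_channel))
    return sum(scores) / len(scores)


# Toy data (hypothetical): 4 frames, 2 articulatory channels.
true_traj = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
pred_traj = [[0.1, 0.9], [0.9, 0.2], [2.2, 0.8], [2.8, 0.1]]
score = mean_channel_correlation(true_traj, pred_traj)
print(f"average correlation coefficient: {score:.2f}")
```

Under this scheme a score near 1 means the estimated trajectories track the shape of the measured ones closely. The correlation is insensitive to a constant offset or positive scaling, so it is a measure of trajectory shape and not of absolute error.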


