Towards Automatic Speech Identification from Vocal Tract Shape Dynamics in Real-time MRI

07/29/2018
by   Pramit Saha, et al.
0

Vocal tract configurations play a vital role in generating distinguishable speech sounds, by modulating the airflow and creating different resonant cavities in speech production. They contain abundant information that can be utilized to better understand the underlying speech production mechanism. As a step towards automatic mapping of vocal tract shape geometry to acoustics, this paper employs effective video action recognition techniques, like Long-term Recurrent Convolutional Networks (LRCN) models, to identify different vowel-consonant-vowel (VCV) sequences from dynamic shaping of the vocal tract. Such a model typically combines a CNN based deep hierarchical visual feature extractor with Recurrent Networks, that ideally makes the network spatio-temporally deep enough to learn the sequential dynamics of a short video clip for video classification tasks. We use a database consisting of 2D real-time MRI of vocal tract shaping during VCV utterances by 17 speakers. The comparative performances of this class of algorithms under various parameter settings and for various classification tasks are discussed. Interestingly, the results show a marked difference in the model performance in the context of speech classification with respect to generic sequence or video classification tasks.

READ FULL TEXT
research
06/16/2021

Silent Speech and Emotion Recognition from Vocal Tract Shape Dynamics in Real-Time MRI

Speech sounds of spoken language are obtained by varying configuration o...
research
12/17/2018

Persian phonemes recognition using PPNet

In this paper a new approach for recognition of Persian phonemes on the ...
research
11/17/2014

Long-term Recurrent Convolutional Networks for Visual Recognition and Description

Models based on deep convolutional networks have dominated recent image ...
research
08/03/2020

Speaker dependent articulatory-to-acoustic mapping using real-time MRI of the vocal tract

Articulatory-to-acoustic (forward) mapping is a technique to predict spe...
research
11/04/2020

Incremental Machine Speech Chain Towards Enabling Listening while Speaking in Real-time

Inspired by a human speech chain mechanism, a machine speech chain frame...
research
05/21/2020

Large scale evaluation of importance maps in automatic speech recognition

In this paper, we propose a metric that we call the structured saliency ...
research
11/16/2017

Speech Dereverberation with Context-aware Recurrent Neural Networks

In this paper, we propose a model to perform speech dereverberation by e...

Please sign up or login with your details

Forgot password? Click here to reset