Speaker dependent articulatory-to-acoustic mapping using real-time MRI of the vocal tract

08/03/2020
by Tamás Gábor Csapó, et al.

Articulatory-to-acoustic (forward) mapping is a technique to predict speech from articulatory data recorded with various acquisition techniques (e.g. ultrasound tongue imaging, lip video). Real-time MRI (rtMRI) of the vocal tract has not been used before for this purpose. The advantage of MRI is its high 'relative' spatial resolution: it can capture not only lingual, labial and jaw motion, but also the velum and the pharyngeal region, which is typically not possible with other techniques. In the current paper, we train various DNNs (fully connected, convolutional and recurrent neural networks) for articulatory-to-speech conversion in a speaker-specific way, using rtMRI as input. We use two male and two female speakers of the USC-TIMIT articulatory database, each of them uttering 460 sentences. We evaluate the results with objective measures (Normalized MSE and MCD) and a subjective measure (perceptual test), and show that CNN-LSTM networks, which take multiple images as input, are preferred and achieve MCD scores between 2.8 and 4.5 dB. In the experiments, we find that the predictions for speaker 'm1' are significantly weaker than those for the other speakers. We show that this is caused by the fact that 74
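
The two objective measures named in the abstract can be illustrated with a minimal sketch. The MCD below follows the commonly used mel-cepstral distortion formula; the abstract does not state the exact configuration, so the function names, the choice to skip the 0th (energy) coefficient, and the normalization of the MSE by the variance of the reference are all assumptions made for illustration, not the authors' implementation. Both functions expect time-aligned arrays of shape (frames, coefficients).

```python
import numpy as np


def mel_cepstral_distortion(ref, pred, skip_c0=True):
    """Mean mel-cepstral distortion in dB over time-aligned frames.

    ref, pred: arrays of shape (frames, coefficients).
    skip_c0: drop the 0th (energy) coefficient; an assumption, since
    the abstract does not say which coefficients are included.
    """
    ref = np.asarray(ref, dtype=float)
    pred = np.asarray(pred, dtype=float)
    if ref.shape != pred.shape:
        raise ValueError("ref and pred must be time-aligned with equal shape")
    if skip_c0:
        ref, pred = ref[:, 1:], pred[:, 1:]
    diff = ref - pred
    # Per-frame distortion: (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2)
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(per_frame))


def normalized_mse(ref, pred):
    """MSE divided by the variance of the reference.

    This normalization is an assumed definition: a value of 1.0 then
    corresponds to always predicting the mean of the reference.
    """
    ref = np.asarray(ref, dtype=float)
    pred = np.asarray(pred, dtype=float)
    if ref.shape != pred.shape:
        raise ValueError("ref and pred must have equal shape")
    return float(np.mean((ref - pred) ** 2) / np.var(ref))
```

With these definitions, identical inputs give an MCD of 0 dB, and lower values of either measure indicate predictions closer to the reference spectral features.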
