Extending Text-to-Speech Synthesis with Articulatory Movement Prediction using Ultrasound Tongue Imaging

07/12/2021
by Tamás Gábor Csapó, et al.

In this paper, we present our first experiments in text-to-articulation prediction, using ultrasound tongue images as targets. We extend a traditional (vocoder-based) DNN-TTS framework to also predict PCA-compressed ultrasound images, from which continuous tongue motion can be reconstructed in synchrony with the synthesized speech. Using data from eight speakers, we train fully connected and recurrent neural networks and show that, when training data is limited, FC-DNNs are more suitable than LSTMs for predicting sequential data. Objective experiments and visualized predictions show that the proposed solution is feasible and that the generated ultrasound videos are close to natural tongue movement. Articulatory movement prediction from text input can be useful for audiovisual speech synthesis or computer-assisted pronunciation training.
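To make the PCA-compressed ultrasound targets more concrete, the following is a minimal sketch of compressing image frames to a few PCA coefficients and reconstructing them again. It is not the authors' implementation: the frame size, the number of components, and the synthetic data standing in for real ultrasound frames are all assumptions chosen for illustration, and the helper names `fit_pca`, `compress`, and `reconstruct` are hypothetical.

```python
import numpy as np


def fit_pca(frames, n_components):
    """Fit PCA on flattened frames of shape (n_frames, n_pixels)."""
    mean = frames.mean(axis=0)
    # SVD of the centered data; the rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
    return mean, vt[:n_components]


def compress(frames, mean, components):
    """Project frames onto the principal axes (one code vector per frame)."""
    return (frames - mean) @ components.T


def reconstruct(codes, mean, components):
    """Map code vectors back to pixel space."""
    return codes @ components + mean


# Synthetic stand-in for ultrasound data (assumption): 200 frames of
# 64x32 pixels with low-rank structure plus a little noise.
rng = np.random.default_rng(0)
n_frames, height, width, rank = 200, 64, 32, 5
basis = rng.normal(size=(rank, height * width))
weights = rng.normal(size=(n_frames, rank))
frames = weights @ basis + 0.01 * rng.normal(size=(n_frames, height * width))

# The component count of 20 is an illustrative choice, not a value from the paper.
mean, components = fit_pca(frames, n_components=20)
codes = compress(frames, mean, components)

# In a text-to-articulation setup, a network would predict `codes` frame by
# frame; here the true codes are reconstructed to show the round trip.
recon = reconstruct(codes, mean, components).reshape(n_frames, height, width)
mse = float(np.mean((recon.reshape(n_frames, -1) - frames) ** 2))
print(codes.shape, recon.shape)
```

Predicting a short coefficient vector per frame instead of every pixel keeps the network's output layer small, and the reconstructed frames can then be played back as a video alongside the synthesized audio.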
