DNN-based Acoustic-to-Articulatory Inversion using Ultrasound Tongue Imaging

04/12/2019
by Dagoberto Porras, et al.

Speech sounds are produced by the coordinated movement of the speech organs. Several methods are available for modeling the relation between articulatory movements and the resulting speech signal; the reverse problem is usually called acoustic-to-articulatory inversion (AAI). In this paper we implement several Deep Neural Networks (DNNs) to estimate articulatory information from the acoustic signal. Most previous work on this task uses ElectroMagnetic Articulography (EMA) to track the articulatory movement. Compared to EMA, Ultrasound Tongue Imaging (UTI) offers a better cost-benefit ratio when equipment cost, portability, safety and the visualized structures are taken into account. Our goal is therefore to train a DNN to produce ultrasound tongue (UT) images from speech as input. We also test two ways of representing the articulatory information: 1) the EigenTongue space and 2) the raw ultrasound image. As objective quality measures for the reconstructed UT images we use MSE, the Structural Similarity Index (SSIM) and Complex-Wavelet SSIM (CW-SSIM). Our experimental results show that CW-SSIM is the most useful error measure in the UTI context. We tested three system configurations: a) a simple DNN with 2 hidden layers whose target is the 64x64 pixels of an ultrasound image; b) the same simple DNN, but with the ultrasound images projected to the EigenTongue space as the target; and c) a more complex DNN with 5 hidden layers, again with ultrasound images projected to the EigenTongue space as the target. In a subjective experiment, the participants found the networks with two hidden layers more suitable for this inversion task.
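
To make configuration a) concrete, the following is a minimal sketch in Keras of a feed-forward DNN with two hidden layers that regresses from per-frame acoustic features to the 64x64 pixels of an ultrasound frame. The MFCC input features and the hidden-layer widths are illustrative assumptions, not the paper's exact settings.

    # Minimal sketch of configuration a): two hidden layers mapping acoustic
    # features to the 64x64 pixels of one ultrasound frame.
    # N_MFCC and the hidden-layer width (1000) are assumptions for illustration.
    from tensorflow import keras
    from tensorflow.keras import layers

    N_MFCC = 13            # assumed acoustic feature dimension per frame
    IMG_H, IMG_W = 64, 64  # ultrasound image size stated in the abstract

    model = keras.Sequential([
        layers.Input(shape=(N_MFCC,)),
        layers.Dense(1000, activation="relu"),             # hidden layer 1
        layers.Dense(1000, activation="relu"),             # hidden layer 2
        layers.Dense(IMG_H * IMG_W, activation="linear"),  # one output per pixel
    ])
    model.compile(optimizer="adam", loss="mse")            # pixel-wise MSE loss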
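
For configurations b) and c), the target is the ultrasound image projected to the EigenTongue space, which is commonly obtained by learning principal components over the training frames. The sketch below, using scikit-learn's PCA and scikit-image's SSIM, shows how such a projection and the MSE/SSIM quality measures could be computed; the number of components, the placeholder data and the library choices are assumptions for illustration, and CW-SSIM would require a separate implementation.

    # Sketch of the EigenTongue-style representation and objective evaluation.
    # The component count (128) and the random placeholder frames are assumed.
    import numpy as np
    from sklearn.decomposition import PCA
    from skimage.metrics import structural_similarity as ssim

    frames = np.random.rand(5000, 64 * 64)  # placeholder for flattened UT frames
    pca = PCA(n_components=128)             # "EigenTongues" = principal components
    coeffs = pca.fit_transform(frames)      # DNN targets for configurations b/c

    # Reconstruct one frame from its coefficients and score it against the
    # original; CW-SSIM is not in scikit-image and needs a separate package.
    recon = pca.inverse_transform(coeffs[:1]).reshape(64, 64)
    truth = frames[0].reshape(64, 64)
    print("MSE :", float(np.mean((recon - truth) ** 2)))
    print("SSIM:", ssim(truth, recon, data_range=truth.max() - truth.min()))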

Related research

07/12/2021
Extending Text-to-Speech Synthesis with Articulatory Movement Prediction using Ultrasound Tongue Imaging
In this paper, we present our first experiments in text-to-articulation ...

08/04/2020
Speaker dependent acoustic-to-articulatory inversion using real-time MRI of the vocal tract
Acoustic-to-articulatory inversion (AAI) methods estimate articulatory m...

08/06/2020
Quantification of Transducer Misalignment in Ultrasound Tongue Imaging
In speech production research, different imaging modalities have been em...

06/26/2022
Improved Processing of Ultrasound Tongue Videos by Combining ConvLSTM and 3D Convolutional Networks
Silent Speech Interfaces aim to reconstruct the acoustic signal from a s...

06/29/2020
Ultra2Speech – A Deep Learning Framework for Formant Frequency Estimation and Tracking from Ultrasound Tongue Images
Thousands of individuals need surgical removal of their larynx due to cr...

08/26/2023
A small vocabulary database of ultrasound image sequences of vocal tract dynamics
This paper presents a new database consisting of concurrent articulatory...

04/10/2019
Autoencoder-Based Articulatory-to-Acoustic Mapping for Ultrasound Silent Speech Interfaces
When using ultrasound video as input, Deep Neural Network-based Silent S...
