Ultra2Speech – A Deep Learning Framework for Formant Frequency Estimation and Tracking from Ultrasound Tongue Images

06/29/2020
by   Pramit Saha, et al.

Every year, thousands of individuals undergo surgical removal of the larynx due to critical diseases and therefore require an alternative form of communication to articulate speech sounds after losing their voice box. This work addresses the articulatory-to-acoustic mapping problem based on ultrasound (US) tongue images, with the aim of developing a silent-speech interface (SSI) that can assist them in their daily interactions. Our approach automatically extracts tongue movement information by selecting an optimal feature set from US images and mapping these features to the acoustic space. We use a novel deep learning architecture, which we call Ultrasound2Formant (U2F) Net, to map US tongue images from a US probe placed beneath a subject's chin to formants. It uses hybrid spatio-temporal 3D convolutions followed by feature shuffling for the estimation and tracking of vowel formants from US images. The formant values are then used to synthesize continuous time-varying vowel trajectories via a Klatt synthesizer. Our best model achieves an R-squared (R^2) measure of 99.96% on this task. Our network lays the foundation for an SSI, as it successfully tracks the tongue contour automatically as an internal representation without any explicit annotation.
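For readers unfamiliar with the reported metric, the following is a minimal sketch of how an R-squared score can be computed for predicted formant values. It is not the authors' evaluation code: the function name `r_squared` and the first-formant (F1) values in Hz are hypothetical and chosen only for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: variance of the targets around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical F1 values in Hz (illustrative only, not data from the paper).
f1_true = [300.0, 450.0, 600.0, 750.0]
f1_pred = [310.0, 440.0, 605.0, 745.0]

score = r_squared(f1_true, f1_pred)
print(f"R^2 = {score:.4f} ({100 * score:.2f}%)")
```

A score of 1.0 (100%) means the predictions match the targets exactly, while a score of 0 means they do no better than always predicting the mean target value.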


research
07/24/2019

A CNN-based tool for automatic tongue contour tracking in ultrasound images

For speech research, ultrasound tongue imaging provides a non-invasive m...
research
08/06/2020

Ultrasound-based Articulatory-to-Acoustic Mapping with WaveGlow Speech Synthesis

For articulatory-to-acoustic mapping using deep neural networks, typical...
research
06/24/2019

Ultrasound-based Silent Speech Interface Built on a Continuous Vocoder

Recently it was shown that within the Silent Speech Interface (SSI) fiel...
research
04/12/2019

DNN-based Acoustic-to-Articulatory Inversion using Ultrasound Tongue Imaging

Speech sounds are produced as the coordinated movement of the speaking o...
research
06/10/2018

IVUS-Net: An Intravascular Ultrasound Segmentation Network

IntraVascular UltraSound (IVUS) is one of the most effective imaging mod...
research
01/24/2023

WhisperWand: Simultaneous Voice and Gesture Tracking Interface

This paper presents the design and implementation of WhisperWand, a comp...
research
07/19/2018

EchoFusion: Tracking and Reconstruction of Objects in 4D Freehand Ultrasound Imaging without External Trackers

Ultrasound (US) is the most widely used fetal imaging technique. However...
