Improved Processing of Ultrasound Tongue Videos by Combining ConvLSTM and 3D Convolutional Networks

06/26/2022
by   Amin Honarmandi Shandiz, et al.
0

Silent Speech Interfaces aim to reconstruct the acoustic signal from a sequence of ultrasound tongue images that records the articulatory movement. The extraction of information about the tongue movement requires us to efficiently process the whole sequence of images, not just as a single image. Several approaches have been suggested to process such a sequential image data. The classic neural network structure combines two-dimensional convolutional (2D-CNN) layers that process the images separately with recurrent layers (eg. an LSTM) on top of them to fuse the information along time. More recently, it was shown that one may also apply a 3D-CNN network that can extract information along both the spatial and the temporal axes in parallel, achieving a similar accuracy while being less time consuming. A third option is to apply the less well-known ConvLSTM layer type, which combines the advantages of LSTM and CNN layers by replacing matrix multiplication with the convolution operation. In this paper, we experimentally compared various combinations of the above mentions layer types for a silent speech interface task, and we obtained the best result with a hybrid model that consists of a combination of 3D-CNN and ConvLSTM layers. This hybrid network is slightly faster, smaller and more accurate than our previous 3D-CNN model.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/23/2021

3D Convolutional Neural Networks for Ultrasound-Based Silent Speech Interfaces

Silent speech interfaces (SSI) aim to reconstruct the speech signal from...
research
05/28/2021

Voice Activity Detection for Ultrasound-based Silent Speech Interfaces using Convolutional Neural Networks

Voice Activity Detection (VAD) is not easy task when the input audio sig...
research
04/12/2019

DNN-based Acoustic-to-Articulatory Inversion using Ultrasound Tongue Imaging

Speech sounds are produced as the coordinated movement of the speaking o...
research
06/24/2019

Ultrasound-based Silent Speech Interface Built on a Continuous Vocoder

Recently it was shown that within the Silent Speech Interface (SSI) fiel...
research
01/10/2017

Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition

In this paper, a novel architecture for a deep recurrent neural network,...
research
04/06/2016

Advances in Very Deep Convolutional Neural Networks for LVCSR

Very deep CNNs with small 3x3 kernels have recently been shown to achiev...
research
04/10/2019

Autoencoder-Based Articulatory-to-Acoustic Mapping for Ultrasound Silent Speech Interfaces

When using ultrasound video as input, Deep Neural Network-based Silent S...

Please sign up or login with your details

Forgot password? Click here to reset