3D Convolutional Neural Networks for Ultrasound-Based Silent Speech Interfaces

04/23/2021
by   László Tóth, et al.

Silent speech interfaces (SSI) aim to reconstruct the speech signal from a recording of the articulatory movement, such as an ultrasound video of the tongue. Currently, deep neural networks are the most successful technology for this task. An efficient solution requires methods that do not simply process single images, but are able to extract the tongue movement information from a sequence of video frames. One option for this is to apply recurrent neural structures such as the long short-term memory network (LSTM) in combination with 2D convolutional neural networks (CNNs). Here, we experiment with another approach that extends the CNN to perform 3D convolution, where the extra dimension corresponds to time. In particular, we apply the spatial and temporal convolutions in a decomposed form, which has recently proved very successful in video action recognition. We find experimentally that our 3D network outperforms the CNN+LSTM model, indicating that 3D CNNs may be a feasible alternative to CNN+LSTM networks in SSI systems.
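
To illustrate what applying the spatial and temporal convolutions "in a decomposed form" can mean, here is a minimal pure-Python sketch: a 2D convolution is applied to each video frame, and a 1D convolution is then applied along the time axis. This is not the network from the paper. The function names, the single-channel nested-list layout, the kernel values, and the absence of any nonlinearity or learned weights are all assumptions made for illustration only.

```python
def conv2d_valid(frame, kernel):
    """Valid-mode 2D cross-correlation of one frame (list of rows) with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(frame) - kh + 1
    out_w = len(frame[0]) - kw + 1
    return [
        [
            sum(frame[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv1d_time_valid(frames, kernel_t):
    """Valid-mode 1D cross-correlation along the time axis, applied per pixel."""
    kt = len(kernel_t)
    out_t = len(frames) - kt + 1
    h, w = len(frames[0]), len(frames[0][0])
    return [
        [
            [sum(frames[t + c][i][j] * kernel_t[c] for c in range(kt))
             for j in range(w)]
            for i in range(h)
        ]
        for t in range(out_t)
    ]


def decomposed_conv3d(video, kernel_s, kernel_t):
    """Hypothetical helper: spatial 2D step on every frame, then temporal 1D step."""
    spatial = [conv2d_valid(frame, kernel_s) for frame in video]
    return conv1d_time_valid(spatial, kernel_t)


# Toy "video": 5 frames of 4x4 pixels whose intensity grows by 1 per frame.
video = [[[t + i + j for j in range(4)] for i in range(4)] for t in range(5)]
kernel_s = [[1, 1], [1, 1]]   # assumed spatial kernel: 2x2 neighbourhood sum
kernel_t = [-1, 1]            # assumed temporal kernel: frame-to-frame difference

out = decomposed_conv3d(video, kernel_s, kernel_t)
# out has 4 time steps of 3x3 values; each value is 4, because every pixel
# rises by 1 per frame and the spatial kernel sums 4 pixels.
```

In this toy setting, a full 3D kernel covering the same 2x2x2 neighbourhood would have 8 weights, while the decomposed pair has 4 + 2 = 6. A practical implementation would typically use multi-channel layers from a deep learning framework and place a nonlinearity between the two steps.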

research
06/26/2022

Improved Processing of Ultrasound Tongue Videos by Combining ConvLSTM and 3D Convolutional Networks

Silent Speech Interfaces aim to reconstruct the acoustic signal from a s...
research
02/19/2019

Predicting tongue motion in unlabeled ultrasound videos using convolutional LSTM neural network

A challenge in speech production research is to predict future tongue mo...
research
05/28/2021

Voice Activity Detection for Ultrasound-based Silent Speech Interfaces using Convolutional Neural Networks

Voice Activity Detection (VAD) is not an easy task when the input audio sig...
research
06/09/2021

Eight Reasons Why Cybersecurity on Novel Generations of Brain-Computer Interfaces Must Be Prioritized

This article presents eight neural cyberattacks affecting spontaneous ne...
research
09/13/2017

AJILE Movement Prediction: Multimodal Deep Learning for Natural Human Neural Recordings and Video

Developing useful interfaces between brains and machines is a grand chal...
research
12/14/2019

Efficient Convolutional Neural Networks for Diacritic Restoration

Diacritic restoration has gained importance with the growing need for ma...
research
06/02/2017

Automating Carotid Intima-Media Thickness Video Interpretation with Convolutional Neural Networks

Cardiovascular disease (CVD) is the leading cause of mortality yet large...
