Voice Activity Detection for Ultrasound-based Silent Speech Interfaces using Convolutional Neural Networks

05/28/2021
by   Amin Honarmandi Shandiz, et al.

Voice Activity Detection (VAD) is not an easy task when the input audio signal is noisy, and it is even more complicated when the input is not an audio recording at all. This is the case with Silent Speech Interfaces (SSI), where we record the movement of the articulatory organs during speech and aim to reconstruct the speech signal from this recording. Our SSI system synthesizes speech from ultrasound videos of the tongue movement, and the quality of the resulting speech signals is evaluated by metrics such as the mean squared error loss of the underlying neural network and the Mel-Cepstral Distortion (MCD) of the reconstructed speech compared to the original. Here, we first demonstrate that the amount of silence in the training data can influence both the MCD evaluation metric and the performance of the neural network model. Then, we train a convolutional neural network classifier to separate silent and speech-containing ultrasound tongue images, using a conventional VAD algorithm to create the training labels from the corresponding speech signal. In the experiments, our ultrasound-based speech/silence separator achieved a classification accuracy of about 85% and an AUC score of around 86%.
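
The label-creation step can be illustrated with a minimal sketch. The abstract does not say which conventional VAD algorithm was used, so the energy-threshold detector below is only an assumed stand-in, and the names frame_energy_db and vad_labels, the frame and hop sizes, and the -40 dB threshold are all hypothetical choices for illustration. In a setup like the one described, the hop would presumably be chosen so that each audio frame lines up with one ultrasound image, and the resulting 0/1 labels would serve as training targets for the image classifier.

```python
import math


def frame_energy_db(samples, frame_len, hop_len):
    """Return the log-energy (in dB) of each frame of a mono signal."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frame = samples[start:start + frame_len]
        mean_sq = sum(x * x for x in frame) / frame_len
        # A small floor avoids log10(0) on digitally silent frames.
        energies.append(10.0 * math.log10(mean_sq + 1e-12))
    return energies


def vad_labels(samples, frame_len=400, hop_len=160, threshold_db=-40.0):
    """Label each frame 1 (speech) or 0 (silence) by thresholding its energy.

    This is an assumed, simplified detector, not the one used in the paper.
    """
    energies = frame_energy_db(samples, frame_len, hop_len)
    return [1 if e > threshold_db else 0 for e in energies]


# Synthetic example: 0.1 s of silence followed by 0.1 s of a 200 Hz tone,
# sampled at 16 kHz (all values chosen only for illustration).
sample_rate = 16000
silence = [0.0] * 1600
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(1600)]
labels = vad_labels(silence + tone)
print(labels)
```

A fixed threshold like this only behaves well on clean recordings; a practical pipeline would more likely use an adaptive or model-based detector, which is one reason the choice of VAD matters for the quality of the training labels.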

research
07/01/2019

Ultrasound tongue imaging for diarization and alignment of child speech therapy sessions

We investigate the automatic processing of child speech therapy sessions...
research
03/03/2023

SottoVoce: An Ultrasound Imaging-Based Silent Speech Interaction Using Deep Neural Networks

The availability of digital devices operated by voice is expanding rapid...
research
04/23/2021

Reconstructing Speech from Real-Time Articulatory MRI Using Neural Vocoders

Several approaches exist for the recording of articulatory movements, su...
research
05/30/2023

Adaptation of Tongue Ultrasound-Based Silent Speech Interfaces Using Spatial Transformer Networks

Thanks to the latest deep learning algorithms, silent speech interfaces ...
research
06/26/2022

Improved Processing of Ultrasound Tongue Videos by Combining ConvLSTM and 3D Convolutional Networks

Silent Speech Interfaces aim to reconstruct the acoustic signal from a s...
research
04/23/2021

3D Convolutional Neural Networks for Ultrasound-Based Silent Speech Interfaces

Silent speech interfaces (SSI) aim to reconstruct the speech signal from...
research
04/23/2021

Improving Neural Silent Speech Interface Models by Adversarial Training

Besides the well-known classification task, these days neural networks a...
