Reconstructing Speech from Real-Time Articulatory MRI Using Neural Vocoders

04/23/2021
by Yide Yu, et al.

Several approaches exist for recording articulatory movements, such as electromagnetic and permanent magnetic articulography, ultrasound tongue imaging and surface electromyography. Although magnetic resonance imaging (MRI) is more costly than these approaches, recent developments in this area now allow the recording of real-time MRI videos of the articulators at an acceptable resolution. Here, we experiment with reconstructing the speech signal from a real-time MRI recording using deep neural networks. Instead of estimating speech directly, our networks are trained to output a spectral vector, from which we reconstruct the speech signal using the WaveGlow neural vocoder. We compare the performance of three deep neural architectures for the estimation task, combining convolutional (CNN) and recurrence-based (LSTM) neural layers. Besides the mean absolute error (MAE) of our networks, we also evaluate our models by comparing the resulting speech signals using several objective speech quality metrics: the mean cepstral distortion (MCD), Short-Time Objective Intelligibility (STOI), Perceptual Evaluation of Speech Quality (PESQ) and Signal-to-Distortion Ratio (SDR). The results indicate that our approach can successfully reconstruct the gross spectral shape, but further improvements are needed to reproduce the fine spectral details.
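As a rough illustration of two of the frame-level measures named in the abstract, the following sketch computes MAE and a cepstral distortion between a target and a predicted feature sequence. It is not the authors' evaluation code. The function names are hypothetical, the MCD expression is the commonly used mel-cepstral distortion formula (assumed here, since the abstract does not state one), and both inputs are assumed to be time-aligned arrays of shape (frames, coefficients).

```python
import numpy as np


def mean_absolute_error(target, predicted):
    """Mean absolute error over all frames and coefficients."""
    target = np.asarray(target, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if target.shape != predicted.shape:
        raise ValueError("target and predicted must have the same shape")
    return float(np.mean(np.abs(target - predicted)))


def cepstral_distortion(target, predicted, skip_c0=True):
    """Frame-averaged cepstral distortion in dB.

    Assumption: uses the common formula
        (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2)
    per frame, optionally skipping the 0th (energy) coefficient.
    """
    target = np.asarray(target, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if target.shape != predicted.shape:
        raise ValueError("target and predicted must have the same shape")
    start = 1 if skip_c0 else 0
    diff = target[:, start:] - predicted[:, start:]
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(per_frame))


if __name__ == "__main__":
    # Synthetic stand-in data: 100 frames of 25 cepstral coefficients.
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(100, 25))
    estimate = reference + 0.1 * rng.normal(size=(100, 25))
    print("MAE:", mean_absolute_error(reference, estimate))
    print("MCD (dB):", cepstral_distortion(reference, estimate))
```

STOI, PESQ and SDR operate on the reconstructed waveforms rather than on feature vectors, so they are not covered by this sketch.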


