Lip2AudSpec: Speech reconstruction from silent lip movements video

10/26/2017
by Hassan Akbari, et al.

In this study, we propose a deep neural network for reconstructing intelligible speech from silent lip movement videos. We use the auditory spectrogram as the spectral representation of speech, together with its corresponding sound generation method, which results in more natural-sounding reconstructed speech. Our proposed network consists of an autoencoder that extracts bottleneck features from the auditory spectrogram; these features are then used as the target for our main lip reading network, which comprises CNN, LSTM, and fully connected layers. Our experiments show that the autoencoder is able to reconstruct the original auditory spectrogram with 98% correlation and also improves the quality of the reconstructed speech from the main lip reading network. Our model, trained jointly on different speakers, is able to extract individual speaker characteristics and gives promising results in reconstructing intelligible speech with superior word recognition accuracy.
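
As a rough illustration of how agreement between an original and a reconstructed spectrogram can be scored, the sketch below computes a Pearson correlation over all time-frequency bins. The function name `spectrogram_correlation` and the toy values are hypothetical, and the paper's exact evaluation procedure is not given here, so treat this as an assumption rather than the authors' method.

```python
import math


def spectrogram_correlation(original, reconstructed):
    """Pearson correlation between two 2D spectrograms (lists of rows).

    Hypothetical helper for illustration only; not taken from the paper.
    """
    # Flatten both time-frequency grids into 1D sequences of bins.
    x = [v for row in original for v in row]
    y = [v for row in reconstructed for v in row]
    if not x or len(x) != len(y):
        raise ValueError("spectrograms must be non-empty with equal bin counts")

    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)

    # Covariance and variances (unnormalized; the 1/n factors cancel).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant spectrogram")

    return cov / math.sqrt(var_x * var_y)


# Toy 2x3 "spectrograms" with made-up values (not data from the paper).
original = [[0.1, 0.4, 0.9], [0.2, 0.5, 0.7]]
reconstructed = [[0.2, 0.8, 1.8], [0.4, 1.0, 1.4]]  # original scaled by 2
score = spectrogram_correlation(original, reconstructed)
```

Because correlation ignores overall scale and offset, a reconstruction that differs from the original only by a constant gain still scores close to 1; a reported value such as 98 percent therefore describes how well the spectro-temporal pattern is preserved, not absolute energy.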

research
08/01/2017

Improved Speech Reconstruction from Silent Video

Speechreading is the task of inferring phonetic information from visuall...
research
05/02/2022

A Novel Speech-Driven Lip-Sync Model with CNN and LSTM

Generating synchronized and natural lip movement with speech is one of t...
research
02/18/2022

Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation

Dysarthric speech reconstruction (DSR), which aims to improve the qualit...
research
01/02/2017

Vid2speech: Speech Reconstruction from Silent Video

Speechreading is a notoriously difficult task for humans to perform. In ...
research
06/29/2022

DDKtor: Automatic Diadochokinetic Speech Analysis

Diadochokinetic speech tasks (DDK), in which participants repeatedly pro...
research
11/26/2019

Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers

Lip reading has witnessed unparalleled development in recent years thank...
research
07/02/2018

Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed

Speechreading or lipreading is the technique of understanding and gettin...
