Improved Speech Reconstruction from Silent Video

08/01/2017
by Ariel Ephrat, et al.

Speechreading is the task of inferring phonetic information from visually observed articulatory facial movements, and is notoriously difficult for humans to perform. In this paper we present an end-to-end model based on a convolutional neural network (CNN) for generating an intelligible and natural-sounding acoustic speech signal from silent video frames of a speaking person. We train our model on speakers from the GRID and TCD-TIMIT datasets, and evaluate the quality and intelligibility of the reconstructed speech using common objective measurements. We show that speech predictions from the proposed model attain scores that indicate significantly improved quality over existing models. In addition, we show promising results towards reconstructing speech from an unconstrained dictionary.

