Visual Speech Enhancement

11/23/2017
by Aviv Gabbay, et al.

When video is shot in a noisy environment, the voice of a speaker seen in the video can be enhanced using the visible mouth movements, reducing background noise. While most existing methods use audio-only inputs, our visual speech enhancement, based on an audio-visual neural network, obtains improved performance. We add to the training data videos with synthetic background noise taken from the voice of the target speaker. Because the audio input alone is not sufficient to separate a speaker's voice from other speech by that same speaker, the trained model better exploits the visual input and generalizes well to different noise types. The proposed model outperforms prior audio-visual methods on two public lipreading datasets. It is also the first to be demonstrated on a dataset not designed for lipreading, such as the weekly addresses of Barack Obama.
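
The training-data idea in the abstract, mixing a speaker's clean speech with noise taken from that same speaker's voice, can be illustrated with a minimal sketch. This is not the authors' code: the function names `mix_at_snr` and `same_speaker_mixture`, the use of NumPy arrays as mono waveforms, and the SNR-based mixing rule are all assumptions made for illustration, and the abstract does not state how the noise level is chosen.

```python
import numpy as np


def mix_at_snr(target, noise, snr_db):
    """Mix `noise` into `target` at the requested signal-to-noise ratio (dB).

    Hypothetical helper: the abstract does not specify a mixing rule.
    """
    # Loop or trim the noise so it matches the target length.
    if len(noise) < len(target):
        reps = int(np.ceil(len(target) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[:len(target)]

    target_power = np.mean(target ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        return target.copy()

    # Scale the noise so target_power / scaled_noise_power == 10^(snr_db / 10).
    scale = np.sqrt(target_power / (noise_power * 10 ** (snr_db / 10)))
    return target + scale * noise


def same_speaker_mixture(utterances, target_idx, rng, snr_db=0.0):
    """Return (noisy, clean), where the noise is a different utterance
    by the same speaker (assumed reading of the abstract's training trick)."""
    candidates = [i for i in range(len(utterances)) if i != target_idx]
    noise_idx = rng.choice(candidates)
    clean = utterances[target_idx]
    noisy = mix_at_snr(clean, utterances[noise_idx], snr_db)
    return noisy, clean
```

In such a setup, the pair `(noisy, clean)` would serve as the audio input and the training target, alongside the video frames for the clean utterance. Since both signals in the mixture come from one voice, an audio-only cue cannot tell them apart, which is the stated motivation for the model leaning on the visual stream.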

research
11/23/2017

Visual Speech Enhancement using Noise-Invariant Training

Visual speech enhancement is used on videos shot in noisy environments t...
research
07/11/2019

My lips are concealed: Audio-visual speech enhancement through obstructions

Our objective is an audio-visual model for separating a single speaker f...
research
08/22/2017

Seeing Through Noise: Visually Driven Speaker Separation and Enhancement

Isolating the voice of a specific person while filtering out other voice...
research
11/08/2022

Cross-Attention is all you need: Real-Time Streaming Transformers for Personalised Speech Enhancement

Personalised speech enhancement (PSE), which extracts only the speech of...
research
09/06/2021

Machine Learning: Challenges, Limitations, and Compatibility for Audio Restoration Processes

In this paper machine learning networks are explored for their use in re...
research
04/11/2018

The Conversation: Deep Audio-Visual Speech Enhancement

Our goal is to isolate individual speakers from multi-talker simultaneou...
research
04/05/2022

VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices

In this paper, we address the problem of lip-voice synchronisation in vi...
