Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert

03/29/2023
by Jiadong Wang, et al.

Talking face generation, also known as speech-to-lip generation, reconstructs the facial motions of the lip region from coherent speech input. Previous studies have revealed the importance of lip-speech synchronization and visual quality. Despite much progress, they have paid little attention to the content of the lip movements, i.e., the visual intelligibility of the spoken words, which is an important aspect of generation quality. To address this problem, we propose using a lip-reading expert to improve the intelligibility of the generated lip regions by penalizing incorrect generation results. Moreover, to compensate for data scarcity, we train the lip-reading expert in an audio-visual self-supervised manner. With the lip-reading expert, we propose a novel contrastive learning scheme to enhance lip-speech synchronization, and a transformer that encodes audio in synchrony with video while considering the global temporal dependency of the audio. For evaluation, we propose a new strategy that uses two different lip-reading experts to measure the intelligibility of the generated videos. Rigorous experiments show that our proposal is superior to other state-of-the-art (SOTA) methods, such as Wav2Lip, in reading intelligibility, i.e., over 38 … dataset. We also achieve SOTA performance in lip-speech synchronization and comparable performance in visual quality.
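The abstract mentions a contrastive objective for lip-speech synchronization but does not give its form. As a rough illustration only, the sketch below implements the generic InfoNCE loss over a batch of paired embeddings: audio clip `i` and lip clip `i` are treated as the synchronized (positive) pair, and every other lip clip in the batch acts as a negative. This is an assumption, not the paper's formulation; the helper names `cosine` and `info_nce_sync_loss`, the cosine similarity, and the temperature of 0.1 are all hypothetical choices.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_sync_loss(
    audio: List[List[float]],
    lips: List[List[float]],
    temperature: float = 0.1,  # hypothetical value, not taken from the paper
) -> float:
    """Generic (audio-to-lip) InfoNCE loss, averaged over the batch.

    Assumption for illustration: audio[i] and lips[i] come from the same
    synchronized clip (positive pair); lips[j] for j != i are negatives.
    """
    if not audio or len(audio) != len(lips):
        raise ValueError("audio and lips must be non-empty and equally long")
    total = 0.0
    for i, a in enumerate(audio):
        # Similarity of this audio embedding to every lip embedding in the batch.
        logits = [cosine(a, v) / temperature for v in lips]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-softmax probability assigned to the positive pair.
        total += log_denominator - logits[i]
    return total / len(audio)


aligned = info_nce_sync_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce_sync_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(f"aligned={aligned:.5f} shuffled={shuffled:.5f}")
```

With matching embeddings the loss is close to zero, while swapping the lip clips (so each audio clip is paired with the wrong lips) drives it to roughly 10; minimizing such a loss pulls synchronized audio and lip embeddings together and pushes mismatched ones apart.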

research
06/14/2019

Realistic Speech-Driven Facial Animation with GANs

Speech-driven facial animation is the process that automatically synthes...
research
09/06/2018

Deep Audio-Visual Speech Recognition

The goal of this work is to recognise phrases and sentences being spoken...
research
09/12/2020

DualLip: A System for Joint Lip Reading and Generation

Lip reading aims to recognize text from talking lip, while lip generatio...
research
08/31/2021

SimulLR: Simultaneous Lip Reading Transducer with Attention-Guided Adaptive Memory

Lip reading, aiming to recognize spoken sentences according to the given...
research
07/18/2023

Plug the Leaks: Advancing Audio-driven Talking Face Generation by Preventing Unintended Information Flow

Audio-driven talking face generation is the task of creating a lip-synch...
research
08/23/2020

A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild

In this work, we investigate the problem of lip-syncing a talking face v...
research
03/06/2020

Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition

Recent advances in deep learning have heightened interest among research...
