Plug the Leaks: Advancing Audio-driven Talking Face Generation by Preventing Unintended Information Flow

07/18/2023
by Dogucan Yaman, et al.

Audio-driven talking face generation is the task of creating a lip-synchronized, realistic face video from given audio and reference frames. This involves two major challenges: the overall visual quality of the generated images on the one hand, and the audio-visual synchronization of the mouth region on the other. In this paper, we start by identifying several problematic aspects of synchronization methods in recent audio-driven talking face generation approaches. Specifically, this involves unintended flow of lip and pose information from the reference to the generated image, as well as instabilities during model training. Subsequently, we propose various techniques for obviating these issues: First, a silent-lip reference image generator prevents leaking of lips from the reference to the generated image. Second, an adaptive triplet loss handles the pose leaking problem. Finally, we propose a stabilized formulation of the synchronization loss, circumventing the aforementioned training instabilities while further alleviating the lip leaking issue. Combining the individual improvements, we achieve state-of-the-art performance on LRS2 and LRW in both synchronization and visual quality. We further validate our design in various ablation experiments, confirming the individual contributions as well as their complementary effects.
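The abstract mentions an adaptive triplet loss for the pose-leaking problem but does not spell out its form. As a rough, hedged illustration of the general mechanism such a loss builds on, the sketch below shows a plain triplet margin loss over embedding vectors (the function name, distance choice, and fixed margin are illustrative assumptions, not the paper's actual formulation):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Generic triplet margin loss over embeddings (illustrative sketch).

    Pulls the anchor embedding toward the positive sample and pushes it
    away from the negative sample by at least `margin`. An *adaptive*
    variant, as hinted at in the abstract, would adjust `margin` (or the
    triplet selection) per sample; that scheme is not reproduced here.
    """
    d_pos = np.linalg.norm(anchor - positive)  # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative)  # anchor-negative distance
    return max(0.0, d_pos - d_neg + margin)
```

The loss is zero whenever the negative is already farther from the anchor than the positive by more than the margin, so only violating triplets produce a gradient signal.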


