Attention-Based Lip Audio-Visual Synthesis for Talking Face Generation in the Wild

03/08/2022
by Ganglai Wang, et al.

Talking face generation, which has great practical significance, has attracted increasing attention in recent audio-visual studies. Achieving accurate lip synchronization remains a long-standing challenge that requires further investigation. Motivated by xxx, this paper proposes an AttnWav2Lip model that incorporates a spatial attention module and a channel attention module into the lip-syncing strategy. Rather than focusing on unimportant regions of the face image, the proposed AttnWav2Lip model is able to pay more attention to the reconstruction of the lip region. To the best of our knowledge, this is the first attempt to introduce an attention mechanism into talking face generation. Extensive experiments have been conducted to evaluate the effectiveness of the proposed model. Measured by the LSE-D and LSE-C metrics, it outperforms the baseline on the benchmark lip synthesis datasets LRW, LRS2 and LRS3.

