Audio-visual video face hallucination with frequency supervision and cross modality support by speech based lip reading loss

11/20/2022
by Shailza Sharma, et al.

Recently, there have been numerous breakthroughs in face hallucination tasks. However, the task remains considerably more challenging for videos than for images because of inherent consistency issues. The extra temporal dimension in video face hallucination makes it non-trivial to learn the facial motion throughout the sequence. To learn these fine spatio-temporal motion details, we propose a novel cross-modal audio-visual Video Face Hallucination Generative Adversarial Network (VFH-GAN). The architecture exploits the semantic correlation between the movement of the facial structure and the associated speech signal. Another major issue in present video-based approaches is blurriness around key facial regions such as the mouth and lips, where spatial displacement is much higher than in other areas. The proposed approach explicitly defines a lip reading loss to learn the fine-grained motion in these facial areas. During training, GANs tend to fit frequencies from low to high, which causes them to miss the hard-to-synthesize frequencies. We therefore add a frequency-based loss function so that the network captures salient frequency features. Visual and quantitative comparisons with the state of the art show a significant improvement in performance and efficacy.
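
The abstract does not give the form of the frequency-based loss, so the following is only a minimal sketch of the general idea: compare a generated frame with its ground-truth frame in the Fourier domain rather than pixel by pixel. The function name `frequency_loss`, the use of NumPy, and the unweighted mean absolute spectral difference are all assumptions made for illustration, not the formulation used in VFH-GAN.

```python
import numpy as np


def frequency_loss(generated, target):
    """Mean absolute difference between the 2D Fourier spectra of two frames.

    Illustrative only: the name and the unweighted L1 form are assumptions,
    not the loss defined in the paper.
    """
    generated = np.asarray(generated, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if generated.shape != target.shape:
        raise ValueError("frames must have the same shape")
    # Orthonormal 2D FFT over the last two (height, width) axes.
    spec_gen = np.fft.fft2(generated, norm="ortho")
    spec_tgt = np.fft.fft2(target, norm="ortho")
    # The magnitude of the complex difference penalizes both amplitude
    # and phase errors at every frequency.
    return float(np.mean(np.abs(spec_gen - spec_tgt)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sharp = rng.random((8, 8))
    # Crude blur: average each pixel with a horizontal neighbour (wraps around).
    blurred = 0.5 * (sharp + np.roll(sharp, 1, axis=1))
    print("identical frames:", frequency_loss(sharp, sharp))
    print("blurred vs sharp:", frequency_loss(blurred, sharp))
```

In a training setup, a term of this kind would typically be added to the adversarial and pixel losses with a weighting factor; how the paper weights or combines it is not stated in the abstract.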

research
04/16/2021

MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement

This paper presents a generic method for generating full facial 3D anima...
research
07/08/2023

FTFDNet: Learning to Detect Talking Face Video Manipulation with Tri-Modality Interaction

DeepFake based digital facial forgery is threatening public media securi...
research
04/25/2023

AVFace: Towards Detailed Audio-Visual 4D Face Reconstruction

In this work, we present a multimodal solution to the problem of 4D face...
research
04/13/2018

Talking Face Generation by Conditional Recurrent Adversarial Network

Given an arbitrary face image and an arbitrary speech clip, the proposed...
research
05/09/2019

Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss

We devise a cascade GAN approach to generate talking face video, which i...
research
07/10/2021

Speech2Video: Cross-Modal Distillation for Speech to Video Generation

This paper investigates a novel task of talking face video generation so...
research
03/06/2020

Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition

Recent advances in deep learning have heightened interest among research...
