DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video

03/07/2023
by   Zhimeng Zhang, et al.

Photo-realistic face visually dubbing on high-resolution videos remains a critical challenge in the few-shot setting, and previous works fail to generate high-fidelity dubbing results. To address this problem, this paper proposes a Deformation Inpainting Network (DINet) for high-resolution face visually dubbing. Unlike previous works, which rely on multiple up-sampling layers to generate pixels directly from latent embeddings, DINet performs spatial deformation on the feature maps of reference images to better preserve high-frequency textural details. Specifically, DINet consists of a deformation part and an inpainting part. In the first part, the feature maps of five reference facial images are adaptively deformed in space to create deformed feature maps that encode the mouth shape at each frame, aligned with the input driving audio and with the head poses of the input source images. In the second part, a feature decoder adaptively combines mouth movements from the deformed feature maps with other attributes (i.e., head pose and upper facial expression) from the source feature maps to produce the dubbed face. As a result, DINet achieves face visually dubbing with rich textural details. We conduct qualitative and quantitative comparisons to validate DINet on high-resolution videos, and the experimental results show that our method outperforms state-of-the-art works.
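The two-part design described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not say which deformation operator or fusion scheme DINet uses, so the per-channel affine warp, the channel-wise concatenation, and the names `affine_warp` and `fuse` are all assumptions made for illustration. In the real network the warp parameters would be predicted from the driving audio and the source head pose. Here they are set by hand.

```python
import numpy as np


def affine_warp(feat, theta):
    """Warp each channel of `feat` (C, H, W) with its own 2x3 affine matrix.

    `theta` has shape (C, 2, 3) and maps output coordinates to input
    coordinates, both normalised to [-1, 1]. Sampling is bilinear, and
    positions outside the input contribute zero.
    Hypothetical stand-in for the deformation part.
    """
    c, h, w = feat.shape
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    grid = np.stack([xs, ys, np.ones_like(xs)], axis=0).reshape(3, -1)  # (3, H*W)
    src = theta @ grid                                                  # (C, 2, H*W)

    # Convert normalised source coordinates back to pixel coordinates.
    px = (src[:, 0] + 1) * (w - 1) / 2
    py = (src[:, 1] + 1) * (h - 1) / 2
    x0 = np.floor(px).astype(int)
    y0 = np.floor(py).astype(int)
    wx = px - x0
    wy = py - y0

    out = np.zeros((c, h * w))
    ch = np.arange(c)[:, None]
    corners = (
        (0, 0, (1 - wy) * (1 - wx)),
        (0, 1, (1 - wy) * wx),
        (1, 0, wy * (1 - wx)),
        (1, 1, wy * wx),
    )
    for dy, dx, weight in corners:
        yi = y0 + dy
        xi = x0 + dx
        valid = (yi >= 0) & (yi < h) & (xi >= 0) & (xi < w)
        vals = feat[ch, np.clip(yi, 0, h - 1), np.clip(xi, 0, w - 1)]
        out += weight * vals * valid
    return out.reshape(c, h, w)


def fuse(source_feat, deformed_feat):
    """Stack source and deformed feature maps along the channel axis.

    Assumed input format for a feature decoder (the inpainting part);
    the decoder itself is omitted.
    """
    return np.concatenate([source_feat, deformed_feat], axis=0)


# Toy reference feature map: 2 channels, 3x4 spatial grid.
reference_feat = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
shift_one_px = np.array([[1.0, 0.0, 2 / 3], [0.0, 1.0, 0.0]])  # 2/(W-1) = one pixel
theta = np.stack([identity, shift_one_px])

deformed_feat = affine_warp(reference_feat, theta)
decoder_input = fuse(reference_feat, deformed_feat)
```

Because every output value is interpolated from existing feature values rather than synthesised from a latent code, a warp of this kind moves texture around without regenerating it, which is the intuition behind preserving high-frequency detail.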

research
04/07/2021

Facial Attribute Transformers for Precise and Robust Makeup Transfer

In this paper, we address the problem of makeup transfer, which aims at ...
research
11/05/2021

Spatial-Temporal Residual Aggregation for High Resolution Video Inpainting

Recent learning-based inpainting algorithms have achieved compelling res...
research
10/19/2022

FaceDancer: Pose- and Occlusion-Aware High Fidelity Face Swapping

In this work, we present a new single-stage method for subject agnostic ...
research
03/03/2021

ID-Unet: Iterative Soft and Hard Deformation for View Synthesis

View synthesis is usually done by an autoencoder, in which the encoder m...
research
08/16/2023

OmniZoomer: Learning to Move and Zoom in on Sphere at High-Resolution

Omnidirectional images (ODIs) have become increasingly popular, as their...
research
12/15/2021

Detail-aware Deep Clothing Animations Infused with Multi-source Attributes

This paper presents a novel learning-based clothing deformation method t...
research
08/23/2020

Geometry-guided Dense Perspective Network for Speech-Driven Facial Animation

Realistic speech-driven 3D facial animation is a challenging problem due...
