StyleFaceV: Face Video Generation via Decomposing and Recomposing Pretrained StyleGAN3

08/16/2022
by   Haonan Qiu, et al.

Realistic generative face video synthesis has long been a goal of both the computer vision and graphics communities. However, existing face video generation methods tend to produce low-quality frames with drifting facial identities and unnatural movements. To tackle these challenges, we propose a principled framework named StyleFaceV, which produces high-fidelity, identity-preserving face videos with vivid movements. Our core insight is to decompose appearance and pose information and recompose them in the latent space of StyleGAN3 to produce stable and dynamic results. Specifically, StyleGAN3 provides strong priors for high-fidelity facial image generation, but its latent space is intrinsically entangled. By carefully examining its latent properties, we propose decomposition and recomposition designs that allow facial appearance and movements to be combined in a disentangled way. Moreover, a temporally dependent model is built on the decomposed latent features; it samples plausible motion sequences, from which realistic and temporally coherent face videos can be generated. In particular, our pipeline is trained jointly on static images and high-quality video data, which improves data efficiency. Extensive experiments demonstrate that our framework achieves state-of-the-art face video generation results both qualitatively and quantitatively. Notably, StyleFaceV can generate realistic 1024×1024 face videos even without high-resolution training videos.

