StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator

05/09/2023
by Jiazhi Guan, et al.

Despite recent advances in synchronizing lip movements with arbitrary audio, current methods still struggle to balance generation quality against generalization ability. Previous studies either require long-term data for training or produce similar, low-quality movement patterns on all subjects. In this paper, we propose StyleSync, an effective framework that enables high-fidelity lip synchronization. We find that a style-based generator is sufficient to achieve this in both one-shot and few-shot scenarios. Specifically, we design a mask-guided spatial information encoding module that preserves the details of the given face, while the mouth shapes are accurately modified by audio through modulated convolutions. Moreover, our design enables personalized lip sync by introducing style space and generator refinement on only a limited number of frames, so that the identity and talking style of a target person can be accurately preserved. Extensive experiments demonstrate the effectiveness of our method in producing high-fidelity results on a variety of scenes. Resources can be found at https://hangz-nju-cuhk.github.io/projects/StyleSync.
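
The abstract does not spell out how the modulated convolutions are formulated. As a rough illustration only, the sketch below assumes the StyleGAN2-style scheme, in which a style vector scales each input channel of a convolution kernel and each output filter is then rescaled to unit norm. The function name `modulate_weights` and the idea of treating `style` as an audio-derived vector are assumptions made for this example, not details taken from the paper.

```python
import math

def modulate_weights(weight, style, eps=1e-8):
    """Scale conv weights per input channel by `style`, then demodulate.

    weight: nested list indexed as weight[out_ch][in_ch][ky][kx]
    style:  list with one scale per input channel (hypothetically
            derived from audio features in a lip-sync setting)
    """
    result = []
    for filt in weight:
        # Modulation: multiply every weight of input channel i by style[i].
        scaled = [
            [[w * style[i] for w in row] for row in filt[i]]
            for i in range(len(filt))
        ]
        # Demodulation: rescale the whole output filter to unit L2 norm.
        norm = math.sqrt(
            sum(w * w for ch in scaled for row in ch for w in row) + eps
        )
        result.append([[[w / norm for w in row] for row in ch] for ch in scaled])
    return result

# One output filter, two input channels, 1x1 kernels.
weights = [[[[1.0]], [[1.0]]]]
modulated = modulate_weights(weights, style=[3.0, 4.0])
```

In this toy case the style vector changes the relative contribution of the two input channels while the filter's overall magnitude stays normalized, which is the property that lets a conditioning signal reshape features without changing their scale.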


