StyleLipSync: Style-based Personalized Lip-sync Video Generation

04/30/2023
by Taekyung Ki, et al.

In this paper, we present StyleLipSync, a style-based personalized lip-sync video generation model that can produce identity-agnostic lip-synced videos from arbitrary audio. To generate videos of arbitrary identities, we leverage an expressive lip prior from the semantically rich latent space of a pre-trained StyleGAN, in which video consistency can also be enforced with a linear transformation. In contrast to previous lip-sync methods, we introduce pose-aware masking, which dynamically locates the mask frame by frame using a 3D parametric mesh predictor, improving naturalness across frames. Moreover, we propose a few-shot lip-sync adaptation method for arbitrary persons by introducing a sync regularizer that preserves lip-sync generalization while enhancing person-specific visual information. Extensive experiments demonstrate that our model generates accurate lip-sync videos even in the zero-shot setting and can capture the characteristics of an unseen face from a few seconds of target video through the proposed adaptation method. Please refer to our project page.
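
The abstract does not spell out the adaptation objective, but the idea of a sync regularizer can be illustrated with a minimal PyTorch sketch. Everything below is a hypothetical stand-in, not the authors' implementation: TinyGenerator and TinySyncEncoder are toy stubs for the StyleGAN-based generator and a SyncNet-style sync encoder, and adaptation_loss and the weight lam are invented names. The sketch shows one plausible form of the objective: a reconstruction term that fits the few seconds of target video, plus a regularizer that keeps the adapted generator's sync embeddings close to those of a frozen pre-trained copy.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGenerator(nn.Module):
    """Toy stand-in for the audio-conditioned face generator (hypothetical)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(4, 3, 3, padding=1)

    def forward(self, masked_frames, audio):
        # Broadcast a scalar per-frame audio feature as a 4th input channel.
        a = audio.view(-1, 1, 1, 1).expand(-1, 1, *masked_frames.shape[-2:])
        return torch.tanh(self.conv(torch.cat([masked_frames, a], dim=1)))

class TinySyncEncoder(nn.Module):
    """Toy stand-in for a frozen SyncNet-style sync encoder (hypothetical)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )

    def forward(self, frames):
        return self.net(frames)

def adaptation_loss(G_theta, G_0, sync_net, masked_frames, frames, audio, lam=10.0):
    # Person-specific term: reconstruct the few seconds of target video.
    pred = G_theta(masked_frames, audio)
    loss_rec = F.l1_loss(pred, frames)
    # Sync regularizer: keep the adapted generator's sync embedding close to
    # the frozen pre-trained generator's output, so lip-sync generalization
    # learned at scale is preserved during few-shot fine-tuning.
    with torch.no_grad():
        ref_feat = sync_net(G_0(masked_frames, audio))
    loss_sync = F.mse_loss(sync_net(pred), ref_feat)
    return loss_rec + lam * loss_sync

# Usage sketch on random tensors.
G_theta, G_0, sync_net = TinyGenerator(), TinyGenerator(), TinySyncEncoder()
G_0.load_state_dict(G_theta.state_dict())           # frozen pre-trained copy
for p in list(G_0.parameters()) + list(sync_net.parameters()):
    p.requires_grad_(False)

frames = torch.rand(2, 3, 64, 64)
masked = frames.clone()
masked[:, :, 32:, :] = 0                            # crude lower-face mask
audio = torch.rand(2)
loss = adaptation_loss(G_theta, G_0, sync_net, masked, frames, audio)
loss.backward()

The key design point this sketch captures is that the regularizer is computed against a frozen copy of the pre-trained generator, so fine-tuning on a few seconds of target video can sharpen person-specific detail without the sync behavior drifting toward the tiny adaptation set.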

Related Research

05/09/2023 · StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator
Despite recent advances in syncing lip movements with any audio waves, c...

05/23/2023 · Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models
This paper presents a controllable text-to-video (T2V) diffusion model, ...

08/09/2023 · VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer
Current talking face generation methods mainly focus on speech-lip synch...

01/13/2022 · MetaDance: Few-shot Dancing Video Retargeting via Temporal-aware Meta-learning
Dancing video retargeting aims to synthesize a video that transfers the ...

05/09/2023 · Zero-shot personalized lip-to-speech synthesis with face image based voice control
Lip-to-Speech (Lip2Speech) synthesis, which predicts corresponding speec...

05/05/2022 · Parametric Reshaping of Portraits in Videos
Sharing short personalized videos to various social media networks has b...

05/23/2023 · Large Language Models are Frame-level Directors for Zero-shot Text-to-Video Generation
In the paradigm of AI-generated content (AIGC), there has been increasin...
