MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions

07/19/2023
by   Yunfei Liu, et al.

Audio-driven portrait animation aims to synthesize portrait videos conditioned on a given audio signal. Animating high-fidelity, multimodal video portraits has a wide range of applications. Previous methods attempt to capture different motion modes and generate high-fidelity portrait videos by training separate models or by sampling signals from reference videos. However, the lack of correlation learning between lip-sync and other movements (e.g., head pose and eye blinking) often leads to unnatural results. In this paper, we propose a unified system for multi-person, diverse, and high-fidelity talking-portrait generation. Our method consists of three stages: 1) a Mapping-Once network with Dual Attentions (MODA) generates a talking representation from the given audio; within MODA, we design a dual-attention module to encode accurate mouth movements and diverse motion modalities; 2) a facial composer network generates dense, detailed face landmarks; and 3) a temporal-guided renderer synthesizes stable videos. Extensive evaluations demonstrate that the proposed system produces more natural and realistic video portraits than previous methods.
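The three-stage pipeline described above can be sketched as a simple dataflow. This is a minimal illustration only: the function bodies, feature dimensions, and landmark count below are placeholder assumptions, not the paper's actual networks, and each stage is stubbed out to show the audio-to-representation-to-landmarks-to-frames flow.

```python
import numpy as np

# Hypothetical dimensions, chosen purely for illustration.
AUDIO_DIM = 80      # e.g., mel-spectrogram features per frame
REPR_DIM = 64       # size of the compact talking representation
N_LANDMARKS = 478   # dense facial landmarks per frame
FRAME_HW = 256      # rendered frame height/width

def moda(audio_feats: np.ndarray) -> np.ndarray:
    """Stage 1 (stub): map per-frame audio features to a talking
    representation covering lip motion, head pose, and eye blinks.
    Stands in for the dual-attention mapping-once network."""
    num_frames = audio_feats.shape[0]
    return np.zeros((num_frames, REPR_DIM))

def facial_composer(talking_repr: np.ndarray) -> np.ndarray:
    """Stage 2 (stub): expand the compact representation into dense,
    detailed 2D face landmarks for each frame."""
    num_frames = talking_repr.shape[0]
    return np.zeros((num_frames, N_LANDMARKS, 2))

def temporal_renderer(landmarks: np.ndarray) -> np.ndarray:
    """Stage 3 (stub): render each landmark frame into an RGB image;
    the real renderer conditions on neighboring frames for stability."""
    num_frames = landmarks.shape[0]
    return np.zeros((num_frames, FRAME_HW, FRAME_HW, 3), dtype=np.uint8)

# 100 frames of audio features in, 100 video frames out.
audio = np.zeros((100, AUDIO_DIM))
video = temporal_renderer(facial_composer(moda(audio)))
print(video.shape)  # (100, 256, 256, 3)
```

The key design point the sketch reflects is that audio is mapped once into a shared representation, so lip-sync and the other motion modes are produced jointly rather than by separate per-mode models.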


Related research

- GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis (01/31/2023)
- AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis (03/20/2021)
- Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation (01/19/2022)
- Learning Audio-Driven Viseme Dynamics for 3D Face Animation (01/15/2023)
- LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization (06/08/2021)
- Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis (07/18/2023)
- Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition (11/22/2022)
