Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis

07/18/2023
by Jiahe Li, et al.

This paper presents ER-NeRF, a novel conditional Neural Radiance Fields (NeRF) based architecture for talking portrait synthesis that concurrently achieves fast convergence, real-time rendering, and state-of-the-art performance with a small model size. Our idea is to explicitly exploit the unequal contribution of spatial regions to guide talking portrait modeling. Specifically, to improve the accuracy of dynamic head reconstruction, we introduce a compact and expressive NeRF-based Tri-Plane Hash Representation that prunes empty spatial regions with three planar hash encoders. For speech audio, we propose a Region Attention Module that generates a region-aware condition feature via an attention mechanism. Unlike existing methods that use an MLP-based encoder to learn the cross-modal relation implicitly, the attention mechanism builds an explicit connection between audio features and spatial regions to capture the priors of local motions. Moreover, a direct and fast Adaptive Pose Encoding is introduced to address the head-torso separation problem by mapping the complex transformation of the head pose into spatial coordinates. Extensive experiments demonstrate that, compared to previous methods, our method renders higher-fidelity, audio-lip synchronized talking portrait videos with realistic details and high efficiency.
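
To make the idea of "three planar hash encoders" concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes each 3D sample point is projected onto the XY, YZ, and XZ planes, that each projection is looked up in its own 2D hash grid with bilinear interpolation, and that the three feature vectors are concatenated. The class `PlaneHashEncoder`, the function `triplane_hash_encode`, the single-resolution grid, the hash function, and all sizes are hypothetical choices made for illustration only.

```python
import numpy as np


class PlaneHashEncoder:
    """Toy single-resolution 2D hash-grid encoder (hypothetical, for illustration)."""

    def __init__(self, resolution=64, table_size=2**12, feat_dim=2, seed=0):
        rng = np.random.default_rng(seed)
        self.resolution = resolution  # grid cells per axis (assumed >= 2)
        self.table_size = table_size
        # Stand-in for a learnable feature table; randomly initialised here.
        self.table = rng.uniform(-1e-4, 1e-4, size=(table_size, feat_dim))

    def _hash(self, ij):
        # Spatial hash of integer 2D grid coordinates into table indices.
        ij = ij.astype(np.uint64)
        h = ij[..., 0] ^ (ij[..., 1] * np.uint64(2654435761))
        return (h % np.uint64(self.table_size)).astype(np.int64)

    def __call__(self, uv):
        # uv: (N, 2) plane coordinates, assumed normalised to [0, 1].
        x = np.clip(uv, 0.0, 1.0) * (self.resolution - 1)
        x0 = np.minimum(np.floor(x).astype(np.int64), self.resolution - 2)
        w = x - x0  # bilinear interpolation weights, shape (N, 2)
        out = np.zeros((uv.shape[0], self.table.shape[1]))
        for di in (0, 1):
            for dj in (0, 1):
                corner = x0 + np.array([di, dj])
                wi = w[:, 0] if di else 1.0 - w[:, 0]
                wj = w[:, 1] if dj else 1.0 - w[:, 1]
                out += (wi * wj)[:, None] * self.table[self._hash(corner)]
        return out


def triplane_hash_encode(xyz, encoders):
    """Encode (N, 3) points with three planar encoders and concatenate the results."""
    xy = xyz[:, [0, 1]]
    yz = xyz[:, [1, 2]]
    xz = xyz[:, [0, 2]]
    return np.concatenate(
        [encoders[0](xy), encoders[1](yz), encoders[2](xz)], axis=-1
    )


encoders = [PlaneHashEncoder(seed=s) for s in range(3)]
points = np.random.default_rng(42).uniform(0.0, 1.0, size=(5, 3))
features = triplane_hash_encode(points, encoders)
print(features.shape)  # (5, 6)
```

In this sketch each plane only sees two of the three coordinates, so a hash table never has to store entries for a full 3D volume. This is one plausible reading of how a planar factorization keeps the representation compact. A practical encoder would typically use several grid resolutions and train the tables jointly with the rest of the network.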

research
11/22/2022

Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition

While dynamic Neural Radiance Fields (NeRF) have shown success in high-f...
research
09/14/2023

DT-NeRF: Decomposed Triplane-Hash Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis

In this paper, we present the decomposed triplane-hash neural radiance f...
research
07/19/2023

MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions

Audio-driven portrait animation aims to synthesize portrait videos that ...
research
01/19/2022

Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation

Animating high-fidelity video portrait with speech audio is crucial for ...
research
09/22/2021

Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation

To the best of our knowledge, we first present a live system that genera...
research
06/28/2023

Enhanced Neural Beamformer with Spatial Information for Target Speech Extraction

Recently, deep learning-based beamforming algorithms have shown promisin...
research
03/08/2022

Attention-Based Lip Audio-Visual Synthesis for Talking Face Generation in the Wild

Talking face generation with great practical significance has attracted ...
