Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement

09/03/2022
by Siddarth Ravichandran, et al.

Over the last few decades, many aspects of human life have been enhanced with virtual domains, from the advent of digital assistants such as Amazon's Alexa and Apple's Siri to the latest metaverse efforts of the rebranded Meta. These trends underscore the importance of generating photorealistic visual depictions of humans, and they have driven the rapid growth of so-called deepfake and talking-head generation methods in recent years. Despite their impressive results and popularity, these methods usually fall short in qualitative aspects such as texture quality, lip synchronization, or resolution, and in practical aspects such as the ability to run in real time. To enable virtual human avatars to be used in practical scenarios, we propose an end-to-end framework for synthesizing high-quality virtual human faces capable of speech, with a special emphasis on performance. We introduce a novel network that uses visemes as an intermediate audio representation, together with a novel data augmentation strategy that employs a hierarchical image synthesis approach to disentangle the different modalities used to control the global head motion. Our method runs in real time and delivers superior results compared with the current state of the art.