Capture, Learning, and Synthesis of 3D Speaking Styles

05/08/2019
by Daniel Cudeiro, et al.

Audio-driven 3D facial animation has been widely explored, but achieving realistic, human-like performance remains unsolved. This is due to the lack of available 3D datasets, models, and standard evaluation metrics. To address this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio from 12 speakers. We then train a neural network on our dataset that factors identity from facial motion. The learned model, VOCA (Voice Operated Character Animation), takes any speech signal as input, even speech in languages other than English, and realistically animates a wide range of adult faces. Conditioning on subject labels during training allows the model to learn a variety of realistic speaking styles. VOCA also provides animator controls to alter speaking style, identity-dependent facial shape, and pose (i.e., head, jaw, and eyeball rotations) during animation. To our knowledge, VOCA is the only realistic 3D facial animation model that is readily applicable to unseen subjects without retargeting. This makes VOCA suitable for tasks like in-game video, virtual reality avatars, or any scenario in which the speaker, speech, or language is not known in advance. We make the dataset and model available for research purposes at http://voca.is.tue.mpg.de.
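
As a rough illustration of the two ideas in the abstract (conditioning on a subject label, and keeping identity separate from motion), the sketch below concatenates a one-hot subject label with audio features and adds the predicted per-vertex offsets to a neutral template mesh. This is not the authors' architecture: the single linear layer, the feature and mesh sizes, the random weights, and the names `one_hot` and `animate_frame` are all assumptions made for illustration. Only the count of 12 speakers comes from the abstract.

```python
import numpy as np

NUM_SUBJECTS = 12    # number of speakers stated in the abstract
AUDIO_DIM = 16       # hypothetical audio feature size
NUM_VERTICES = 100   # hypothetical mesh size


def one_hot(subject_id, num_subjects=NUM_SUBJECTS):
    """Encode a subject index as a one-hot style label."""
    label = np.zeros(num_subjects)
    label[subject_id] = 1.0
    return label


def animate_frame(template, audio_features, subject_id, weights):
    """Predict one animated frame.

    template:       (V, 3) neutral face mesh, carries the identity
    audio_features: (AUDIO_DIM,) features for one audio window
    subject_id:     index selecting a speaking style
    weights:        (V * 3, AUDIO_DIM + NUM_SUBJECTS) stand-in for a trained network
    """
    # Condition the audio features on the speaking style.
    conditioned = np.concatenate([audio_features, one_hot(subject_id)])
    # Motion is expressed as offsets from the template, so the same
    # offsets can be applied to a different identity's template.
    offsets = (weights @ conditioned).reshape(template.shape)
    return template + offsets


rng = np.random.default_rng(0)
weights = rng.normal(scale=0.01, size=(NUM_VERTICES * 3, AUDIO_DIM + NUM_SUBJECTS))
template = rng.normal(size=(NUM_VERTICES, 3))
audio = rng.normal(size=AUDIO_DIM)

frame_style_a = animate_frame(template, audio, 0, weights)
frame_style_b = animate_frame(template, audio, 5, weights)
```

In this toy setup, changing `subject_id` at inference time changes the predicted offsets for the same audio, while swapping `template` changes the face shape without touching the motion model.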

research
12/30/2022

Imitator: Personalized Speech-driven 3D Facial Animation

Speech-driven 3D facial animation has been widely explored, with applica...
research
10/30/2021

Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis

People talk with diversified styles. For one piece of speech, different ...
research
02/01/2021

Universal Neural Vocoding with Parallel WaveNet

We present a universal neural vocoder based on Parallel WaveNet, with an...
research
04/15/2019

Synthesising 3D Facial Motion from "In-the-Wild" Speech

Synthesising 3D facial motion from speech is a crucial problem manifesti...
research
07/18/2023

FACTS: Facial Animation Creation using the Transfer of Styles

The ability to accurately capture and express emotions is a critical asp...
research
08/11/2020

Audio- and Gaze-driven Facial Animation of Codec Avatars

Codec Avatars are a recent class of learned, photorealistic face models ...
research
10/02/2017

End-to-end Learning for 3D Facial Animation from Raw Waveforms of Speech

We present a deep learning framework for real-time speech-driven 3D faci...
