End-to-end Learning for 3D Facial Animation from Raw Waveforms of Speech

10/02/2017 · by Hai X. Pham, et al.

We present a deep learning framework for real-time speech-driven 3D facial animation from just raw waveforms. Our deep neural network directly maps an input sequence of speech audio to a series of micro facial action unit activations and head rotations to drive a 3D blendshape face model. In particular, our deep model is able to learn the latent representations of time-varying contextual information and affective states within the speech. Hence, our model not only activates appropriate facial action units at inference to depict different utterance generating actions, in the form of lip movements, but also, without any assumption, automatically estimates emotional intensity of the speaker and reproduces her ever-changing affective states by adjusting strength of facial unit activations. For example, in a happy speech, the mouth opens wider than normal, while other facial units are relaxed; or in a surprised state, both eyebrows raise higher. Experiments on a diverse audiovisual corpus of different actors across a wide range of emotional states show interesting and promising results of our approach. Being speaker-independent, our generalized model is readily applicable to various tasks in human-machine interaction and animation.




I Introduction

Face synthesis is essential to many applications, such as computer games, animated movies, teleconferencing, and talking agents. Traditional facial capture approaches have achieved tremendous success, reconstructing a high level of realism. Yet active face capture rigs utilizing motion sensors/markers are expensive and time-consuming to use. Alternatively, passive techniques that capture facial transformations from cameras, although less accurate, have achieved very impressive performance.

One problem remains with vision-based facial capture approaches, however: part of the face may be occluded, e.g. when a person is wearing a mixed reality visor, or, in the extreme situation, the entire visual appearance may be unavailable. In such cases, other input modalities, such as audio, may be exploited to infer facial actions. Indeed, research on speech-driven face synthesis has recently regained the community's attention. Recent works [16, 21, 30, 31] employ deep neural networks to model the highly non-linear mapping from the input speech domain, either audio or phonemes, to visual facial features. In particular, the approaches of Karras et al. [16] and Pham et al. [21] also take the reconstruction of facial emotion into account to generate fully transformed 3D facial shapes. The method in [16] explicitly specifies the emotional state as an additional input besides waveforms, whereas [21] implicitly infers affective states from acoustic features and represents emotions via blendshape weights [6].

In this work, we further improve the approach of [21] in several ways, in order to recreate a better 3D talking avatar that can naturally rotate and perform micro facial actions to represent the time-varying contextual information and emotional intensity of speech in real-time. Firstly, we forgo handcrafted, high-level acoustic features such as the chromagram or mel-frequency cepstral coefficients (MFCC), which, as the authors conjectured, may lose information important for identifying some specific emotions, e.g. happiness. Instead, we directly use the Fourier-transformed spectrogram as input to our neural network. Secondly, we employ convolutional neural networks (CNN) to learn meaningful acoustic feature representations, taking advantage of the locality and shift invariance in the frequency domain of the audio signal. Thirdly, we combine these convolutional layers with a recurrent layer in an end-to-end framework, which learns the temporal transitions of facial movements, as well as spontaneous actions and varying emotional states, from speech sequences alone. Experiments on the RAVDESS audiovisual corpus [19] demonstrate promising results of our approach in real-time speech-driven 3D facial animation.

The organization of the paper is as follows. Section II summarizes other studies related to our work. Our approach is explained in detail in Section III. Experiments are described in Section IV, before Section V concludes our work.

II Related Work

"Talking head" is a research topic where an avatar is animated to imitate human talking. Various approaches have been developed to synthesize a face model driven by either speech audio [13, 37, 28] or transcripts [33, 9]. Essentially, every talking head animation technique develops a mapping from input speech to visual features, and can be formulated as either a classification or a regression task. Classification approaches usually identify phonetic units (phonemes) in speech and map them to visual units (visemes) based on specific rules; animation is then generated by morphing these key images. Regression approaches, on the other hand, can directly generate visual parameters and their trajectories from input features. Early research on talking heads used Hidden Markov Models (HMMs) with some success [34, 35], despite certain limitations of the HMM framework, such as oversmoothed trajectories.

In recent years, deep neural networks have been successfully applied to speech synthesis [24, 38] and facial animation [11, 39, 13] with superior performance. This is because deep neural networks (DNN) are able to learn the correlation of high-dimensional input data and, in the case of recurrent neural networks (RNN), long-term relations, as well as the highly non-linear mapping between input and output features. Taylor et al. [31] propose a system using a DNN to estimate active appearance model (AAM) coefficients from input phonemes, which generalizes well to different speeches and languages; the resulting face shapes can be retargeted to drive 3D face models. Suwajanakorn et al. [30] use a long short-term memory (LSTM) RNN to predict 2D lip landmarks from input acoustic features, which are used to synthesize lip movements. Fan et al. [13] use both acoustic and text features to estimate AAM coefficients of the mouth area, which are then grafted onto an actual image to produce a photo-realistic talking head. Karras et al. [16] propose a deep convolutional neural network (CNN) that jointly takes audio autocorrelation coefficients and an emotional state to output an entire 3D face shape.

In terms of the underlying face model, these approaches can be categorized into image-based [5, 9, 12, 34, 37, 13] and model-based [4, 3, 29, 36, 11, 7] approaches. Image-based approaches compose photo-realistic output by concatenating short clips, or by stitching together different regions from a sample database. However, their performance and quality are limited by the number of samples in the database; generalizing to a large corpus of speech would require a tremendous number of image samples to cover all possible facial appearances. In contrast, although lacking in photo-realism, model-based approaches enjoy the flexibility of a deformable model, which is controlled by only a small set of parameters, and more straightforward modeling. Pham et al. [21] propose a mapping from acoustic features to the blending weights of a blendshape model [6]. This face model allows an emotional representation that can be inferred from speech, without explicitly defining the emotion as input or artificially adding emotion to the face model in postprocessing. Our approach also enjoys the flexibility of the blendshape model in 3D face reconstruction from speech.

CNN-based speech modeling. Convolutional neural networks [18] have achieved great success in many vision tasks, e.g. image classification and segmentation. Their efficient filter design allows deeper networks and enables learning features directly from data while remaining robust to noise and small shifts, thus usually outperforming prior modeling techniques. In recent years, CNNs have also been employed in speech recognition tasks that directly model raw waveforms, taking advantage of the locality and translation invariance in the time [32, 20, 15] and frequency domains [10, 2, 1, 25, 26, 27]. In this work, we also employ convolutions in the time-frequency domain, and formulate an end-to-end deep neural network that directly maps input waveforms to blendshape weights.

III Deep End-to-End Learning for 3D Face Synthesis from Speech

III-A Face Representation

Our work makes use of the 3D blendshape face model from the FaceWarehouse database [6], which has been utilized successfully in visual 3D face tracking tasks [23, 22]. An arbitrary fully transformed facial shape S can be composed as:

S = R (B_0 + Σ_i e_i B_i),

where R and the e_i are rotation and expression blending parameters, respectively, B_i are personalized expression blendshape bases of a particular person, and the e_i are constrained within [0, 1]. B_0 is the neutral posed blendshape.

Similar to [21], our deep model also generates θ = (r, e), where the rotation r is represented by the three free parameters of a quaternion, and e is a vector of 46 expression blending weights (matching the 49-unit output layer in Table I: 3 rotation parameters plus 46 weights). We use the 3D face tracker in [23] to extract these parameters from training videos.
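The shape composition above can be sketched in a few lines of NumPy. This is an illustrative version only: the vertex count and random bases are toy placeholders, and recovering the quaternion's real part as w = sqrt(1 - |v|²) from the three free parameters is our assumption, not a detail stated in the text.

```python
import numpy as np

def quat_to_rot(v):
    # 3 free quaternion parameters -> rotation matrix; recovering the real
    # part as w = sqrt(1 - |v|^2) is an assumption for this sketch.
    w = np.sqrt(max(0.0, 1.0 - float(v @ v)))
    x, y, z = v
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def compose_shape(B0, B, e, r):
    # S = R (B0 + sum_i e_i B_i), with each e_i in [0, 1]
    S = B0 + np.tensordot(e, B, axes=1)       # (V, 3)
    return S @ quat_to_rot(r).T

V = 100                                       # toy vertex count
rng = np.random.default_rng(0)
B0 = rng.standard_normal((V, 3))              # neutral blendshape (toy data)
B = 0.1 * rng.standard_normal((46, V, 3))     # expression bases (toy data)
e = rng.uniform(0.0, 1.0, 46)                 # blending weights in [0, 1]
S = compose_shape(B0, B, e, np.zeros(3))      # zero vector -> identity rotation
```

With e = 0 and an identity rotation, the composition reduces to the neutral shape B_0, which is a convenient sanity check for any implementation.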

Fig. 1: A few samples from the RAVDESS database, where a 3D facial blendshape (right) is aligned to the face of the actor (left) in the corresponding frame. Red dots indicate 3D landmarks of the model projected to the image plane.

III-B Framework Architecture

Fig. 2: The proposed end-to-end speech-driven 3D facial animation framework. The input spectrogram is first convolved over the frequency axis (F-convolution), then over the time axis (T-convolution). The detailed network architecture is described in Table I.

Our end-to-end deep neural network is illustrated in Figure 2. The input to our model is the raw time-frequency spectrogram of the audio signal. Specifically, each spectrogram contains 128 frequency power bands across 32 time frames, in the form of a 2D (frequency-time) array suitable for a CNN. We apply convolutions to frequency and time separately, similarly to [16, 27], as this practice has been empirically shown to reduce overfitting; furthermore, using smaller filters requires less computation, which speeds up training and inference. The network architecture is detailed in Table I. Specifically, the input spectrogram is first convolved and pooled along the frequency axis with a downsampling factor of two, until the frequency dimension is reduced to one. Then, convolution and pooling are applied along the time axis. A dense layer is placed on top of the CNN, which feeds into a unidirectional recurrent layer. In this work, we report the model performance where the recurrent layer utilizes either LSTM [14] or gated recurrent unit (GRU) [8] cells. The output of the RNN is passed to another dense layer, whose purpose is to reduce the oversmoothing tendency of the RNN and to allow more spontaneous changes in facial unit activations. Every convolutional layer is followed by a batch normalization layer, except the last one (Conv8 in Table I).
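For reference, a single GRU step follows the standard formulation of Cho et al. / Chung et al. [8]. The sketch below uses toy random weights (the dictionary `P` and the 256-unit sizes drawn from Table I are placeholders, not trained parameters):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, P):
    # One GRU step: update gate z, reset gate r, candidate state h_cand.
    z = sigmoid(P["Wz"] @ x + P["Uz"] @ h_prev + P["bz"])
    r = sigmoid(P["Wr"] @ x + P["Ur"] @ h_prev + P["br"])
    h_cand = np.tanh(P["Wh"] @ x + P["Uh"] @ (r * h_prev) + P["bh"])
    return (1.0 - z) * h_prev + z * h_cand

d_in, d_hid = 256, 256                       # layer sizes from Table I
rng = np.random.default_rng(0)
P = {k: 0.01 * rng.standard_normal((d_hid, d_in)) for k in ("Wz", "Wr", "Wh")}
P.update({k: 0.01 * rng.standard_normal((d_hid, d_hid)) for k in ("Uz", "Ur", "Uh")})
P.update({k: np.zeros(d_hid) for k in ("bz", "br", "bh")})
h = gru_step(rng.standard_normal(d_in), np.zeros(d_hid), P)
```

Compared with the LSTM, the GRU has no separate cell state and one fewer gate, hence fewer parameters, which is relevant to the overfitting discussion in Section IV.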

Name Filter Size Stride Hidden Layer Size Activation
Conv1 (3,1) (2,1) ReLU
Pool1 (2,1) (2,1)
Conv2 (3,1) (2,1) ReLU
Pool2 (2,1) (2,1)
Conv3 (3,1) (2,1) ReLU
Conv4 (3,1) (2,1) ReLU
Conv5 (2,1) (2,1) ReLU
Pool5 (1,2) (1,2)
Conv6 (1,3) (1,2) ReLU
Conv7 (1,3) (1,2) ReLU
Conv8 (1,4) (1,4) ReLU
Dense1 256 tanh
RNN 256
Dense2 256 tanh
Output 49
TABLE I: Configuration of our hidden neural layers.
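The strides in Table I reduce the 128×32 spectrogram to a single time-frequency cell. A quick sanity check of that arithmetic follows, under the assumption of "same"-style padding so that each stride-s layer divides its axis length by s (the paper does not state the padding scheme, so this is a sketch, not the exact computation):

```python
# Stride schedule from Table I; "same" padding assumed (ceil division per layer).
def out_len(n, stride):
    return -(-n // stride)   # ceil(n / stride)

freq, time_steps = 128, 32
# Frequency path: Conv1, Pool1, Conv2, Pool2, Conv3, Conv4, Conv5 (stride 2 each)
for layer in ("Conv1", "Pool1", "Conv2", "Pool2", "Conv3", "Conv4", "Conv5"):
    freq = out_len(freq, 2)                  # 128 -> 64 -> 32 -> ... -> 1
# Time path: Pool5, Conv6, Conv7 (stride 2), then Conv8 (stride 4)
for layer, s in (("Pool5", 2), ("Conv6", 2), ("Conv7", 2), ("Conv8", 4)):
    time_steps = out_len(time_steps, s)      # 32 -> 16 -> 8 -> 4 -> 1
print(freq, time_steps)                      # 1 1
```

Seven halvings collapse the 128 frequency bands, and three halvings plus one stride-4 layer collapse the 32 time frames, leaving a single feature vector for the dense layer.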

III-C Training Details

Audio processing. For each video frame in the corpus, we extract a 96ms audio frame sampled at 44.1kHz, comprising the acoustic data of the current frame and the previous frames. With the intended application being real-time animation, we do not allow any delay to gather future data, i.e. audio samples of subsequent frames, as they are unknown in a live streaming scenario. Instead, temporal transitions are modeled by the recurrent layer. We apply an FFT with a window size of 256 and a hop length of 128 to recover a power spectrogram of 128 frequency bands and 32 time frames.
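A rough sketch of this step is below. Only the window size, hop length, sampling rate and output dimensions come from the text; the Hann window and the choice of dropping the DC bin (an FFT of size 256 yields 129 real bins) are our assumptions to reach exactly 128 bands:

```python
import numpy as np

def power_spectrogram(wave, n_fft=256, hop=128):
    frames = []
    for start in range(0, len(wave) - n_fft + 1, hop):
        windowed = wave[start:start + n_fft] * np.hanning(n_fft)  # window choice is an assumption
        power = np.abs(np.fft.rfft(windowed)) ** 2                # 129 bins for n_fft = 256
        frames.append(power[1:])                                  # drop DC bin -> 128 bands (assumption)
    return np.array(frames).T                                     # (frequency, time)

sr = 44100
wave = np.random.default_rng(0).standard_normal(int(0.096 * sr))  # 96 ms -> 4233 samples
S = power_spectrogram(wave)                                       # shape (128, 32)
```

With a 96ms frame at 44.1kHz, a window of 256 and a hop of 128 indeed yield 32 frames, matching the input dimensions given above.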

Loss function. Our framework maps an input sequence of spectrograms X = (x_1, …, x_T) to an output sequence of shape parameter vectors Θ = (θ_1, …, θ_T), where T is the number of video frames. Thus, at any given time t, the deep model estimates θ_t = (r_t, e_t) from an input spectrogram x_t. Similar to [21], we split the output into two separate layers, r for rotation and e for expression weights. r has a tanh activation to constrain its values within [-1, 1], whereas e uses a sigmoid activation to constrain its values within [0, 1].

We train the model by minimizing the squared error:

E = Σ_{t=1}^{T} ||θ_t − θ_t*||²,

where θ_t* is the expected output, which we extract from training videos. We use the CNTK deep learning toolkit to implement our neural network models. Training hyperparameters are chosen as follows: minibatch size is 300, epoch size is 150,000, and learning rate is 0.0001. The network parameters are learned by the Adam optimizer [17] over 300 epochs.
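A minimal NumPy sketch of the output split and the training objective follows; the 3 + 46 partition of the 49 output units and the placeholder targets are illustrative assumptions consistent with the face representation above, not code from the paper:

```python
import numpy as np

def split_output(a):
    # a: raw pre-activations of the 49-unit output layer (Table I);
    # the 3 + 46 rotation/expression split is our reading of the text.
    r = np.tanh(a[:3])                    # rotation params, constrained to [-1, 1]
    e = 1.0 / (1.0 + np.exp(-a[3:]))      # expression weights, constrained to [0, 1]
    return r, e

def squared_error(r, e, r_true, e_true):
    # per-frame term of the loss E = sum_t ||theta_t - theta_t*||^2
    return np.sum((r - r_true) ** 2) + np.sum((e - e_true) ** 2)

rng = np.random.default_rng(0)
r, e = split_output(rng.standard_normal(49))
loss = squared_error(r, e, np.zeros(3), np.full(46, 0.5))  # toy targets
```

The point of the split is that the two activation ranges match the parameter constraints: quaternion components may be negative, while blending weights may not.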

IV Experiments

Mean Neutral Calm Happy Sad Angry Fearful Disgu. Surpri. Actor21 Actor22 Actor23 Actor24
LSTM [21] 1.039 1.059 1.039 1.082 1.010 1.033 1.016 1.049 1.038 1.007 0.941 1.056 1.139
CNN-static 0.741 0.725 0.715 0.760 0.728 0.746 0.746 0.781 0.746 0.723 0.697 0.719 0.817
CNN+LSTM 1.042 1.077 1.029 1.092 1.029 1.013 1.031 1.035 1.054 1.026 0.946 1.021 1.162
CNN+GRU 1.022 1.034 0.995 1.081 0.999 1.012 1.008 1.023 1.045 0.998 0.952 0.985 1.139
TABLE II: RMSE of 3D landmarks in millimeter, categorized by types of emotions of test sequences, and by actors.
Mean Neutral Calm Happy Sad Angry Fearful Disgu. Surpri. Actor21 Actor22 Actor23 Actor24
LSTM [21] 0.065 0.067 0.066 0.069 0.068 0.058 0.062 0.071 0.063 0.040 0.061 0.070 0.088
CNN-static 0.018 0.016 0.016 0.018 0.019 0.016 0.018 0.022 0.019 0.012 0.016 0.019 0.022
CNN+LSTM 0.074 0.072 0.074 0.083 0.074 0.063 0.073 0.074 0.079 0.052 0.061 0.075 0.106
CNN+GRU 0.067 0.065 0.065 0.075 0.069 0.059 0.065 0.073 0.070 0.041 0.061 0.066 0.100
TABLE III: Mean squared error of expression blending weights, categorized by types of emotions of test sequences, and by actors.
LSTM [21] CNN-static CNN+LSTM CNN+GRU
0.59 0.94 0.54 0.52
TABLE IV: Training errors of the four models after 300 epochs (listed in the same order as in Tables II and III).

IV-A Dataset

We use the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) [19] for training and evaluation. The database consists of 24 professional actors speaking and singing with various emotions: neutral, calm, happy, sad, angry, fearful, disgust and surprised. We use the video sequences of the first 20 actors for training, around 250,000 frames in total, which translates to about 2 hours of audio, and evaluate the model on the data of the four remaining actors.

IV-B Experimental Settings

We train our deep neural network in two configurations, CNN+LSTM and CNN+GRU, in which the recurrent layer uses LSTM and GRU cells, respectively. As a baseline model, we drop the recurrent layer from our proposed neural network and denote the result CNN-static. This model cannot model temporal transitions smoothly; it estimates facial parameters on a frame-by-frame basis. We also compare our proposed models with the method described in [21], which uses engineered features as input.

We compare these models on two metrics: the RMSE of 3D landmark errors and the mean squared error of facial parameters, specifically, expression blending weights with respect to the ground truth recovered by the visual tracker [23]. Landmark errors are calculated as the distances from the inner landmarks (shown in Figure 1) of the reconstructed 3D face shape to those of the ground-truth 3D shape. We ignore error metrics for head rotation, since it is difficult to infer head pose correctly from speech alone; we include head pose estimation primarily to generate plausible rigid motions that augment the realism of the talking avatar. Based on our observations, however, these error metrics do not fully reflect the performance of our deep generative model.
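The two metrics can be sketched as follows; the landmark count and the toy data are placeholders, as the exact set of inner landmarks is defined by the tracker [23]:

```python
import numpy as np

def landmark_rmse(P, Q):
    # RMSE over per-landmark Euclidean distances between predicted (P)
    # and ground-truth (Q) 3D landmarks, each of shape (L, 3)
    d = np.linalg.norm(P - Q, axis=1)
    return float(np.sqrt(np.mean(d ** 2)))

def weight_mse(e_pred, e_true):
    # mean squared error over the expression blending weight vectors
    return float(np.mean((e_pred - e_true) ** 2))

rng = np.random.default_rng(0)
Q = rng.standard_normal((40, 3))               # toy inner-landmark count (assumption)
P = Q + 0.001 * rng.standard_normal((40, 3))   # near-perfect toy prediction
err_lmk = landmark_rmse(P, Q)
err_w = weight_mse(rng.uniform(0, 1, 46), rng.uniform(0, 1, 46))
```

Note that the landmark metric is computed in 3D model space (millimeters in Tables II), while the weight metric is unitless, which is why the two tables report values on very different scales.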

IV-C Evaluation

Tables II and III show the aforementioned error metrics of the four models, organized into categories corresponding to different emotions and testers. The proposed model with GRU cells slightly outperforms [21] as well as the proposed model with LSTM cells in terms of landmark errors. On the other hand, CNN+GRU has blendshape coefficient errors similar to [21], whereas CNN+LSTM has the highest errors.

Interestingly, on both metrics, CNN-static outperforms all the models with recurrent layers. As shown in Table II, the RMSE of CNN-static is about 0.7mm, consistently across the different categories and testers, and lower than the errors of the other models. CNN-static also scores lower parameter estimation errors. These results suggest that the baseline actually generalizes well to test data in terms of facial action estimation, although visualization of the 3D face reconstruction shows that the baseline model inherently generates non-smooth sequences of facial actions, as shown in the supplementary video. From these test results, combined with the training errors listed in Table IV, we hypothesize that our proposed models, CNN+LSTM and CNN+GRU, overfit the training data, and that CNN+GRU, being simpler, generalizes somewhat better than CNN+LSTM. The baseline model CNN-static, being the simplest of the four, i.e. having the fewest parameters, generalizes well and achieves the best performance on the test set in terms of both error metrics and emotional facial reconstruction from speech, as demonstrated in Figure 3. These results once again demonstrate the robustness, generalization and adaptability of the CNN architecture, and suggest that using a CNN to model facial actions from raw waveforms is the right direction, although there remain limitations and deficiencies in our current approach that need to be addressed.

Fig. 3: A few reconstruction samples. On the left are the true face appearances. From the second to the last column are the reconstruction results of Pham et al. [21], CNN-static, CNN+LSTM and CNN+GRU, respectively. We use a generic 3D face model animated with the parameters generated by each model. The reconstructions by CNN-static depict the speakers' emotions reasonably well; however, it cannot generate smooth transitions between frames.

V Conclusion and Future Work

This paper introduces a deep learning framework for speech-driven 3D facial animation from raw waveforms. Our proposed deep neural network learns a mapping from the audio signal to the temporally varying context of the speech, as well as to the emotional states of the speaker, represented implicitly by the blending weights of a 3D face model. Experiments demonstrate that our approach can reasonably estimate lip movements together with the emotional intensity of the speaker. However, there are certain limitations in our network architecture that prevent the model from reflecting the emotion in the speech perfectly. In the future, we will improve the generalization of our deep neural network and explore other generative models to increase the facial reconstruction quality.


  • [1] O. Abdel-Hamid, A.-R. Mohamed, H. Jiang, L. Deng, G. Penn, and D. Yu. Convolutional neural networks for speech recognition. IEEE Transaction on Audio, Speech, and Language Processing, 22(10), October 2014.
  • [2] O. Abdel-Hamid, A.-R. Mohamed, H. Jiang, and G. Penn. Applying convolutional neural networks concepts to hybrid nn-hmm model for speech recognition. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2012.
  • [3] V. Blanz, C. Basso, T. Poggio, and T. Vetter. Reanimating faces in images and video. In SIGGRAPH, pages 187–194, 1999.
  • [4] V. Blanz and T. Vetter. A morphable model for the synthesis of 3d faces. In Eurographics, pages 641–650, 2003.
  • [5] C. Bregler, M. Covell, and M. Slaney. Video rewrite: driving visual speech with audio. In SIGGRAPH, pages 353–360, 1997.
  • [6] C. Cao, Y. Weng, S. Zhou, Y. Tong, and K. Zhou. FaceWarehouse: A 3D Facial Expression Database for Visual Computing. IEEE Transactions on Visualization and Computer Graphics, 20(3):413–425, March 2014.
  • [7] Y. Cao, W. C. Tien, P. Faloutsos, and F. Pighin. Expressive speech-driven facial animation. ACM Transactions on Graphics, 24(4):1283–1302, 2005.
  • [8] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. In NIPS 2014 Deep Learning and Representation Learning Workshop, 2014.
  • [9] E. Cosatto, J. Ostermann, H. P. Graf, and J. Schroeter. Lifelike talking faces for interactive services. Proc IEEE, 91(9):1406–1429, 2003.
  • [10] L. Deng, O. Abdel-Hamid, and D. Yu. A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2013.
  • [11] C. Ding, L. Xie, and P. Zhu. Head motion synthesis from speech using deep neural network. Multimed Tools Appl, 74:9871–9888, 2015.
  • [12] T. Ezzat, G. Geiger, and T. Poggio. Trainable videorealistic speech animation. In SIGGRAPH, pages 388–397, 2002.
  • [13] B. Fan, L. Xie, S. Yang, L. Wang, and F. K. Soong. A deep bidirectional lstm approach for video-realistic talking head. Multimed Tools Appl, 75:5287–5309, 2016.
  • [14] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Comput, 9(8):1735–1780, 1997.
  • [15] Y. Hoshen, R. J. Weiss, and K. W. Wilson. Speech acoustic modeling from raw multichannel waveforms. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2015.
  • [16] T. Karras, T. Aila, S. Laine, A. Herva, and J. Lehtinen. Audio-driven facial animation by joint end-to-end learning of pose and emotion. In SIGGRAPH, 2017.
  • [17] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In 3rd International Conference for Learning Representations, 2015.
  • [18] Y. LeCun and Y. Bengio. Convolutional networks for images, speech, and time-series. pages 255–258, 1998.
  • [19] S. R. Livingstone, K. Peck, and F. A. Russo. Ravdess: The ryerson audio-visual database of emotional speech and song. In 22nd Annual Meeting of the Canadian Society for Brain, Behaviour and Cognitive Science (CSBBCS), 2012.
  • [20] D. Palaz, R. Collobert, and M. Magimai-Doss. Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks. In Interspeech, 2013.
  • [21] H. X. Pham, S. Cheung, and V. Pavlovic. Speech-driven 3d facial animation with implicit emotional awareness: a deep learning approach. In The 1st DALCOM workshop, CVPR, 2017.
  • [22] H. X. Pham and V. Pavlovic. Robust real-time 3d face tracking from rgbd videos under extreme pose, depth, and expression variations. In 3DV, 2016.
  • [23] H. X. Pham, V. Pavlovic, J. Cai, and T. jen Cham. Robust real-time performance-driven 3d face tracking. In ICPR, 2016.
  • [24] Y. Qian, Y. Fan, and F. K. Soong. On the training aspects of deep neural network (dnn) for parametric tts synthesis. In ICASSP, pages 3829–3833, 2014.
  • [25] T. N. Sainath, B. Kingsbury, G. Saon, H. Soltau, A. rahman Mohamed, G. Dahl, and B. Ramabhadran. Deep convolutional neural networks for large-scale speech tasks. Neural Network, 64:39–48, 2015.
  • [26] T. N. Sainath, O. Vinyals, A. Senior, and H. Sak. Convolutional, long short-term memory, fully connected deep neural networks. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2015.
  • [27] T. N. Sainath, R. J. Weiss, A. Senior, K. W. Wilson, and O. Vinyals. Learning the speech front-end with raw waveforms cldnns. In Interspeech, 2015.
  • [28] S. Sako, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura. Hmm-based text-to-audio-visual speech synthesis. In ICSLP, pages 25–28, 2000.
  • [29] G. Salvi, J. Beskow, S. Moubayed, and B. Granstrom. Synface: speech-driven facial animation for virtual speech-reading support. EURASIP Journal on Audio, Speech, and Music Processing, 2009.
  • [30] S. Suwajanakorn, S. M. Seitz, and I. Kemelmacher-Schlizerman. Synthesizing obama: learning lip sync from audio. In SIGGRAPH, 2017.
  • [31] S. Taylor, T. Kim, Y. Yue, M. Mahler, J. Krahe, A. G. Rodriguez, J. Hodgins, and I. Matthews. A deep learning approach for generalized speech animation. In SIGGRAPH, 2017.
  • [32] G. Trigeorgis, F. Ringeval, R. Brueckner, E. Marchi, M. A. Nicolaou, B. Schuller, and S. Zafeiriou. Adieu features? end-to-end speech emotion recognition using a deep convolutional recurrent network. In Interspeech, 2016.
  • [33] A. Wang, M. Emmi, and P. Faloutsos. Assembling an expressive facial animation system. ACM SIGGRAPH Video Game Symposium (Sandbox), pages 21–26, 2007.
  • [34] L. Wang, X. Qian, W. Han, and F. K. Soong. Synthesizing photo-real talking head via trajectory-guided sample selection. In Interspeech, pages 446–449, 2010.
  • [35] L. Wang, X. Qian, F. K. Soong, and Q. Huo. Text driven 3d photo-realistic talking head. In Interspeech, pages 3307–3310, 2011.
  • [36] Z. Wu, S. Zhang, L. Cai, and H. Meng. Real-time synthesis of chinese visual speech and facial expressions using mpeg-4 fap features in a three-dimensional avatar. In Interspeech, pages 1802–1805, 2006.
  • [37] L. Xie and Z. Liu. Realistic mouth-synching for speech-driven talking face using articulatory modeling. IEEE Trans Multimed, 9(23):500–510, 2007.
  • [38] H. Zen, A. Senior, and M. Schuster. Statistical parametric speech synthesis using deep neural networks. In ICASSP, pages 7962–7966, 2013.
  • [39] X. Zhang, L. Wang, G. Li, F. Seide, and F. K. Soong. A new language independent, photo realistic talking head driven by voice only. In Interspeech, 2013.