VisemeNet: Audio-Driven Animator-Centric Speech Animation

05/24/2018
by Yang Zhou, et al.

We present a novel deep-learning-based approach to producing animator-centric speech motion curves that drive a JALI or standard FACS-based production face-rig, directly from input audio. Our three-stage Long Short-Term Memory (LSTM) network architecture is motivated by psycho-linguistic insights: segmenting speech audio into a stream of phonetic groups is sufficient for viseme construction; speech styles like mumbling or shouting are strongly correlated with the motion of facial landmarks; and animator style is encoded in viseme motion curve profiles. Our contribution is an automatic, real-time solution for lip-synchronization from audio that integrates seamlessly into existing animation pipelines. We evaluate our results by cross-validation against ground-truth data; animator critique and edits; visual comparison to recent deep-learning lip-synchronization solutions; and showing our approach to be resilient to diversity in speaker and language.

research
05/27/2019

Audio2Face: Generating Speech/Face Animation from Single Audio with Attention-Based Bidirectional LSTM Networks

We propose an end to end deep learning approach for generating real-time...
research
07/31/2018

DNN driven Speaker Independent Audio-Visual Mask Estimation for Speech Separation

Human auditory cortex excels at selectively suppressing background noise...
research
04/10/2019

Audio-noise Power Spectral Density Estimation Using Long Short-term Memory

We propose a method using a long short-term memory (LSTM) network to est...
research
06/02/2023

Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation

This paper presents a novel approach for generating 3D talking heads fro...
research
01/15/2023

Learning Audio-Driven Viseme Dynamics for 3D Face Animation

We present a novel audio-driven facial animation approach that can gener...
research
07/31/2018

Lip-Reading Driven Deep Learning Approach for Speech Enhancement

This paper proposes a novel lip-reading driven deep learning framework f...
