Listen to Your Face: Inferring Facial Action Units from Audio Channel

06/23/2017
by Zibo Meng, et al.

Extensive efforts have been devoted to recognizing facial action units (AUs). However, recognizing AUs from spontaneous facial displays remains challenging, especially when they are accompanied by speech. Unlike all prior work, which relies on visual observations for facial AU recognition, this paper presents a novel approach that recognizes speech-related AUs exclusively from audio signals, exploiting the fact that facial activities are highly correlated with voice during speech. Specifically, the dynamic and physiological relationships between AUs and phonemes are modeled by a continuous time Bayesian network (CTBN); AU recognition is then performed by probabilistic inference over the CTBN model. A pilot audiovisual AU-coded database has been constructed to evaluate the proposed audio-based AU recognition framework. The database consists of a "clean" subset with frontal, neutral faces and a challenging subset collected under large head movements and occlusions. Experimental results on this database show that the proposed CTBN model achieves promising recognition performance for seven speech-related AUs and outperforms state-of-the-art visual-based methods, especially for AUs that are activated at low intensities or are hardly visible in the visual channel. Furthermore, the advantage of the CTBN model is even more pronounced on the challenging subset, where visual-based approaches degrade significantly.
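
To make the CTBN idea concrete, the sketch below shows forward inference for a single binary AU node whose transition rates are conditioned on an observed phoneme parent. This is a minimal illustration, not the paper's implementation: the phoneme set, the intensity values, and the assumption that the audio has already been segmented into phoneme-labelled intervals are all hypothetical.

```python
# Minimal sketch of CTBN-style forward inference for one binary AU node.
# Assumes the phoneme sequence has already been segmented from the audio.
# Phoneme labels and rate values are illustrative, not from the paper.
import numpy as np
from scipy.linalg import expm

# Conditional intensity matrices (CIMs): one 2x2 generator per parent
# (phoneme) state. Row i holds the rates of leaving AU state i, so each
# row sums to zero. State 0 = AU inactive, state 1 = AU active.
CIMS = {
    "aa": np.array([[-4.0,  4.0],    # open vowel: strongly drives the AU on
                    [ 0.5, -0.5]]),
    "m":  np.array([[-0.2,  0.2],    # bilabial closure: rarely activates it
                    [ 3.0, -3.0]]),
    "sil": np.array([[-0.1,  0.1],   # silence: slow decay toward inactive
                     [ 1.0, -1.0]]),
}

def forward_marginals(segments, p0=(1.0, 0.0)):
    """Propagate P(AU state) through phoneme-labelled audio segments.

    segments: list of (phoneme, duration_seconds) pairs.
    Returns P(AU active) at the end of each segment, using the
    continuous-time update p(t + d) = p(t) @ expm(Q_phoneme * d).
    """
    p = np.asarray(p0, dtype=float)
    marginals = []
    for phoneme, duration in segments:
        p = p @ expm(CIMS[phoneme] * duration)
        marginals.append(p[1])
    return marginals

# Toy usage: the AU activation probability rises during "aa" and decays
# again during the bilabial closure "m".
print(forward_marginals([("sil", 0.3), ("aa", 0.2), ("m", 0.15)]))
```

In a full CTBN, each speech-related AU would carry its own conditional intensity matrices, and inference would couple the AU nodes through the shared phoneme dynamics rather than treating each node independently as above.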
