Modeling Spatial and Temporal Cues for Multi-label Facial Action Unit Detection

08/02/2016
by   Wen-Sheng Chu, et al.

Facial action units (AUs) are essential for decoding human facial expressions. Researchers have focused on training AU detectors with a variety of features and classifiers. However, three issues remain: spatial representation, temporal modeling, and AU correlation. Unlike most studies, which tackle these issues separately, we propose a hybrid network architecture that addresses them jointly. Specifically, spatial representations are extracted by a Convolutional Neural Network (CNN), which, as analyzed in this paper, is able to reduce person-specific biases caused by hand-crafted features (e.g., SIFT and Gabor). To model temporal dependencies, Long Short-Term Memory networks (LSTMs) are stacked on top of these representations and operate regardless of the length of the input video. The outputs of the CNNs and LSTMs are further aggregated in a fusion network to produce per-frame predictions of 12 AUs. Our network naturally addresses the three issues and leads to superior performance compared with existing methods that consider these issues independently. Extensive experiments were conducted on two large spontaneous datasets, GFT and BP4D, containing more than 400,000 frames coded with 12 AUs. On both datasets, we report significant improvement over a standard multi-label CNN and feature-based state-of-the-art methods. Finally, we provide visualizations of the learned AU models, which, to the best of our knowledge, reveal for the first time how machines see facial AUs.
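
The data flow described in the abstract (a per-frame spatial stream, a temporal stream that handles videos of any length, and a fusion step producing multi-label predictions for 12 AUs) can be illustrated with a toy sketch. This is not the authors' network: the linear scorer stands in for the CNN, an exponential running average stands in for the LSTMs, and a fixed weighted average stands in for the learned fusion network. All function names, `FEATURE_DIM`, `decay`, `alpha`, and `threshold` are hypothetical choices made for illustration.

```python
import math
import random

NUM_AUS = 12      # the abstract reports per-frame predictions of 12 AUs
FEATURE_DIM = 8   # hypothetical feature size, chosen only for this sketch


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def spatial_scores(frame_features, weights):
    """Stand-in for the CNN stream: one linear score per AU for one frame."""
    return [sum(w * f for w, f in zip(row, frame_features)) for row in weights]


def temporal_scores(per_frame_scores, decay=0.5):
    """Stand-in for the LSTM stream: a running average carried across frames.

    Like a recurrent model, it consumes a sequence of any length and emits
    one state per frame.
    """
    state = [0.0] * NUM_AUS
    outputs = []
    for scores in per_frame_scores:
        state = [decay * s + (1.0 - decay) * x for s, x in zip(state, scores)]
        outputs.append(state)
    return outputs


def fuse(spatial, temporal, alpha=0.5, threshold=0.5):
    """Stand-in for the fusion network: weighted average of both streams.

    Each AU gets its own sigmoid, so several AUs can be active in the same
    frame (multi-label rather than single-class output).
    """
    predictions = []
    for s_row, t_row in zip(spatial, temporal):
        probs = [sigmoid(alpha * s + (1.0 - alpha) * t)
                 for s, t in zip(s_row, t_row)]
        predictions.append([int(p >= threshold) for p in probs])
    return predictions


def detect_aus(video, weights):
    """Run both streams over a video (a list of per-frame feature vectors)."""
    spatial = [spatial_scores(frame, weights) for frame in video]
    temporal = temporal_scores(spatial)
    return fuse(spatial, temporal)


# Random weights and a random 5-frame "video", purely as placeholder data.
random.seed(0)
weights = [[random.uniform(-1, 1) for _ in range(FEATURE_DIM)]
           for _ in range(NUM_AUS)]
video = [[random.uniform(-1, 1) for _ in range(FEATURE_DIM)]
         for _ in range(5)]
predictions = detect_aus(video, weights)
```

Here `predictions` holds one list of 12 binary AU decisions per frame. In the paper's setting each stand-in would be a trained network; the sketch only shows how per-frame and sequence-level scores can be combined into per-frame multi-label output.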

research
04/07/2015

Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification

Classifying videos according to content semantics is an important proble...
research
05/29/2020

A Hierarchical Deep Convolutional Neural Network and Gated Recurrent Unit Framework for Structural Damage Detection

Structural damage detection has become an interdisciplinary area of inte...
research
04/10/2017

Action Unit Detection with Region Adaptation, Multi-labeling Learning and Optimal Temporal Fusing

Action Unit (AU) detection becomes essential for facial analysis. Many p...
research
01/08/2018

Long-term Multi-granularity Deep Framework for Driver Drowsiness Detection

For real-world driver drowsiness detection from videos, the variation of...
research
08/17/2020

Multi-label Learning with Missing Values using Combined Facial Action Unit Datasets

Facial action units allow an objective, standardized description of faci...
research
12/14/2018

AU R-CNN: Encoding Expert Prior Knowledge into R-CNN for Action Unit Detection

Modeling action units (AUs) on human faces is challenging because variou...
research
03/01/2023

Learning Person-specific Network Representation for Apparent Personality Traits Recognition

Recent studies show that apparent personality traits can be reflected fr...
