Leveraging Semantic Scene Characteristics and Multi-Stream Convolutional Architectures in a Contextual Approach for Video-Based Visual Emotion Recognition in the Wild

05/16/2021
by Ioannis Pikoulis, et al.

In this work we tackle the task of video-based visual emotion recognition in the wild. Standard methodologies that rely solely on the extraction of bodily and facial features often fall short of accurate emotion prediction in cases where these sources of affective information are inaccessible due to head/body orientation, low resolution or poor illumination. We aim to alleviate this problem by leveraging visual context, in the form of scene characteristics and attributes, as part of a broader emotion recognition framework. Temporal Segment Networks (TSN) constitute the backbone of our proposed model. Apart from the RGB input modality, we make use of dense Optical Flow, following an intuitive multi-stream approach for a more effective encoding of motion. Furthermore, we shift our attention towards skeleton-based learning and leverage action-centric data as a means of pre-training a Spatial-Temporal Graph Convolutional Network (ST-GCN) for the task of emotion recognition. Our extensive experiments on the challenging Body Language Dataset (BoLD) verify the superiority of our methods over existing approaches, while by properly incorporating all of the aforementioned modules in a network ensemble, we surpass the previous best published recognition scores by a large margin.
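The abstract describes fusing several streams (RGB, Optical Flow, skeleton) in a network ensemble. As a minimal sketch of one common late-fusion scheme — weighted averaging of per-stream softmax scores — the snippet below illustrates the idea; the stream names, logit values, and uniform weighting are illustrative assumptions, not the paper's exact fusion rule.

```python
import numpy as np

def ensemble_scores(stream_logits, weights=None):
    """Late-fuse per-stream class logits by (weighted) averaging
    of their softmax probabilities.

    stream_logits: list of (num_classes,) logit arrays, one per stream
    weights: optional per-stream weights (defaults to uniform)
    returns: (num_classes,) fused probability vector
    """
    probs = []
    for logits in stream_logits:
        e = np.exp(logits - np.max(logits))  # numerically stable softmax
        probs.append(e / e.sum())
    probs = np.stack(probs)                  # (num_streams, num_classes)
    if weights is None:
        weights = np.ones(len(probs))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()        # normalize stream weights
    return weights @ probs                   # weighted average over streams

# Hypothetical per-stream logits for 4 emotion classes
rgb      = np.array([2.0, 0.5, 0.1, -1.0])
flow     = np.array([1.5, 1.0, 0.0, -0.5])
skeleton = np.array([0.8, 1.2, 0.3,  0.0])

fused = ensemble_scores([rgb, flow, skeleton])
pred = int(np.argmax(fused))  # index of the fused top-scoring class
```

Because each stream's probabilities are normalized before averaging, the fused vector is itself a valid distribution, and per-stream weights can be tuned on a validation set to reflect stream reliability.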


