EMERSK – Explainable Multimodal Emotion Recognition with Situational Knowledge

06/14/2023
by Mijanur Palash, et al.

Automatic emotion recognition has recently gained significant attention due to the growing popularity of deep learning algorithms. One of the primary challenges in emotion recognition is effectively utilizing the various cues (modalities) available in the data. Another challenge is providing a proper explanation of the outcome of the learning. To address these challenges, we present Explainable Multimodal Emotion Recognition with Situational Knowledge (EMERSK), a generalized and modular system for human emotion recognition and explanation using visual information. Our system can handle multiple modalities, including facial expressions, posture, and gait, in a flexible and modular manner. The network consists of different modules that can be added or removed depending on the available data. We utilize a two-stream network architecture with convolutional neural networks (CNNs) and encoder-decoder style attention mechanisms to extract deep features from face images. Similarly, CNNs and recurrent neural networks (RNNs) with Long Short-Term Memory (LSTM) units are employed to extract features from posture and gait data. We also incorporate deep features from the background as contextual information for the learning process. The deep features from each module are fused using an early fusion network. Furthermore, we leverage situational knowledge derived from the location type and adjective-noun pairs (ANPs) extracted from the scene, as well as the spatio-temporal average distribution of emotions, to generate explanations. Ablation studies demonstrate that each sub-network can independently perform emotion recognition, and that combining them in a multimodal approach significantly improves overall recognition performance. Extensive experiments conducted on various benchmark datasets, including GroupWalk, validate the superior performance of our approach compared to other state-of-the-art methods.
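The modular early-fusion design described above can be illustrated with a toy sketch: each available modality branch produces a deep feature vector, the vectors are concatenated, and a classifier maps the fused representation to emotion logits. This is a minimal illustration under assumed layer sizes, not the paper's actual architecture; the 34-dimensional pose input (e.g. 17 joints x 2 coordinates), feature width, and number of emotion classes are all hypothetical.

```python
import torch
import torch.nn as nn

class ModalityFusionSketch(nn.Module):
    """Toy sketch of EMERSK-style early fusion: modality features are
    concatenated and classified. All sizes are illustrative assumptions."""

    def __init__(self, feat_dim=64, n_emotions=4):
        super().__init__()
        # Face branch: a small CNN standing in for the paper's
        # two-stream attention network over face images.
        self.face_cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, feat_dim),
        )
        # Gait branch: an LSTM over per-frame pose vectors
        # (assumed 34-D, e.g. 17 joints x 2 coordinates).
        self.gait_lstm = nn.LSTM(input_size=34, hidden_size=feat_dim,
                                 batch_first=True)
        # Early fusion: concatenate modality features, then classify.
        self.classifier = nn.Linear(2 * feat_dim, n_emotions)

    def forward(self, face_img, gait_seq):
        f_face = self.face_cnn(face_img)          # (B, feat_dim)
        _, (h_n, _) = self.gait_lstm(gait_seq)    # h_n: (1, B, feat_dim)
        f_gait = h_n[-1]                          # last-layer hidden state
        fused = torch.cat([f_face, f_gait], dim=1)  # early fusion
        return self.classifier(fused)             # emotion logits

model = ModalityFusionSketch()
logits = model(torch.randn(2, 3, 48, 48),   # batch of 2 face crops
               torch.randn(2, 30, 34))      # 30-frame pose sequences
print(logits.shape)  # torch.Size([2, 4])
```

Because each branch is a separate submodule, a modality can be dropped (or another, such as a context branch, added) by adjusting the fusion input width, mirroring the add-or-remove modularity the abstract describes.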


