
Temporal aggregation of audio-visual modalities for emotion recognition

by Andreea Birhala, et al.
Politehnica University of Bucharest

Emotion recognition plays a pivotal role in affective computing and in human-computer interaction. Current technological developments have increased the possibilities for collecting data about a person's emotional state. In general, human perception of the emotion conveyed by a subject is based on vocal and visual information collected in the first seconds of interaction with that subject. As a consequence, the integration of verbal (i.e., speech) and non-verbal (i.e., image) information seems to be the preferred choice in most current approaches to emotion recognition. In this paper, we propose a multimodal fusion technique for emotion recognition that combines audio-visual modalities from a temporal window, using a different temporal offset for each modality. We show that our proposed method outperforms other methods from the literature as well as the human accuracy rating. The experiments are conducted on the open-access multimodal dataset CREMA-D.
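
The idea of taking a temporal window from each modality at its own offset can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the features, the window length, the offsets, or the fusion operator. The function names (`window`, `mean_pool`, `fuse`), the mean pooling, and the fusion by concatenation are all assumptions chosen only to make the idea concrete, and the sketch further assumes that both modalities have already been converted to per-frame feature vectors at a common frame rate.

```python
from typing import List, Sequence

Frame = Sequence[float]


def window(frames: Sequence[Frame], offset: int, length: int) -> List[Frame]:
    """Return up to `length` consecutive frames starting at index `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return list(frames[offset:offset + length])


def mean_pool(frames: Sequence[Frame]) -> List[float]:
    """Average the frames of a window into a single feature vector."""
    if not frames:
        raise ValueError("window is empty; check offset and length")
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def fuse(audio: Sequence[Frame], video: Sequence[Frame],
         audio_offset: int, video_offset: int, length: int) -> List[float]:
    """Pool one window per modality, each at its own offset, then concatenate.

    Concatenation is an assumed fusion operator, used here for illustration.
    """
    audio_vec = mean_pool(window(audio, audio_offset, length))
    video_vec = mean_pool(window(video, video_offset, length))
    return audio_vec + video_vec


# Toy per-frame features (hypothetical values): 2-D audio, 1-D video.
audio = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
video = [[10.0], [20.0], [30.0], [40.0]]

# The audio window starts at frame 0, the video window at frame 2.
fused = fuse(audio, video, audio_offset=0, video_offset=2, length=2)
print(fused)  # [1.0, 2.0, 35.0]
```

In a real system the fused vector would be passed to a classifier, and the per-modality offsets would be treated as hyperparameters to tune. Both of those steps are outside what this sketch covers.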


Contrastive Regularization for Multimodal Emotion Recognition Using Audio and Text

Speech emotion recognition is a challenge and an important step towards ...

ICANet: A Method of Short Video Emotion Recognition Driven by Multimodal Data

With the fast development of artificial intelligence and short videos, e...

Key-Sparse Transformer with Cascaded Cross-Attention Block for Multimodal Speech Emotion Recognition

Speech emotion recognition is a challenging and important research topic...

Variants of BERT, Random Forests and SVM approach for Multimodal Emotion-Target Sub-challenge

Emotion recognition has become a major problem in computer vision in rec...

Multimodal Local-Global Ranking Fusion for Emotion Recognition

Emotion recognition is a core research area at the intersection of artif...

Hybrid Fusion Based Interpretable Multimodal Emotion Recognition with Insufficient Labelled Data

This paper proposes a multimodal emotion recognition system, VIsual Spok...

Multimodal Emotion Recognition with Modality-Pairwise Unsupervised Contrastive Loss

Emotion recognition is involved in several real-world applications. With...