Disentanglement for audio-visual emotion recognition using multitask setup

by   Raghuveer Peri, et al.

Deep learning models trained on audio-visual data have been successfully used to achieve state-of-the-art performance for emotion recognition. In particular, models trained with multitask learning have shown additional performance improvements. However, such multitask models entangle information between the tasks, encoding the mutual dependencies present in label distributions in the real world data used for training. This work explores the disentanglement of multimodal signal representations for the primary task of emotion recognition and a secondary person identification task. In particular, we developed a multitask framework to extract low-dimensional embeddings that aim to capture emotion specific information, while containing minimal information related to person identity. We evaluate three different techniques for disentanglement and report results of up to 13 recognition performance.



There are no comments yet.


page 4


Learning Alignment for Multimodal Emotion Recognition from Speech

Speech emotion recognition is a challenging problem because human convey...

Temporal aggregation of audio-visual modalities for emotion recognition

Emotion recognition has a pivotal role in affective computing and in hum...

Why Should I Trust a Model is Private? Using Shifts in Model Explanation for Evaluating Privacy-Preserving Emotion Recognition Model

Privacy preservation is a crucial component of any real-world applicatio...

Multimodal Speech Emotion Recognition and Ambiguity Resolution

Identifying emotion from speech is a non-trivial task pertaining to the ...

Accounting for Variations in Speech Emotion Recognition with Nonparametric Hierarchical Neural Network

In recent years, deep-learning-based speech emotion recognition models h...

Learning Functions to Study the Benefit of Multitask Learning

We study and quantify the generalization patterns of multitask learning ...

Neural Simile Recognition with Cyclic Multitask Learning and Local Attention

Simile recognition is to detect simile sentences and to extract simile c...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.