
Multimodal Continuous Emotion Recognition using Deep Multi-Task Learning with Correlation Loss

by Berkay Köprü, et al.

In this study, we focus on continuous emotion recognition using body motion and speech signals to estimate Activation, Valence, and Dominance (AVD) attributes. A semi-end-to-end network architecture is proposed in which both extracted features and raw signals are fed to the network, and this network is trained using multi-task learning (MTL) rather than the state-of-the-art single-task learning (STL). Furthermore, two correlation losses, the Concordance Correlation Coefficient (CCC) and the Pearson Correlation Coefficient (PCC), are used as optimization objectives during training. Experiments are conducted on the CreativeIT and RECOLA databases, and evaluations are performed using the CCC metric. To highlight the effects of MTL, correlation losses, and multi-modality, we compare MTL against STL, CCC loss against mean square error (MSE) loss and PCC loss, and multi-modality against single modality. We observe significant performance improvements with MTL training over STL, especially for the estimation of valence. Furthermore, the CCC loss achieves more than 7 RECOLA against MSE loss.
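The two correlation measures named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`pcc`, `ccc`, `ccc_loss`) are hypothetical, population (biased) statistics are assumed, and turning CCC into a loss as `1 - CCC` is a common convention assumed here rather than something the abstract specifies.

```python
import numpy as np


def pcc(pred, gold):
    """Pearson Correlation Coefficient between two 1-D sequences."""
    pred = np.asarray(pred, dtype=float)
    gold = np.asarray(gold, dtype=float)
    pc = pred - pred.mean()
    gc = gold - gold.mean()
    return float((pc * gc).sum() / np.sqrt((pc ** 2).sum() * (gc ** 2).sum()))


def ccc(pred, gold):
    """Concordance Correlation Coefficient (population statistics assumed)."""
    pred = np.asarray(pred, dtype=float)
    gold = np.asarray(gold, dtype=float)
    pm, gm = pred.mean(), gold.mean()
    cov = ((pred - pm) * (gold - gm)).mean()
    # Unlike PCC, the denominator penalizes differences in mean and scale.
    return float(2.0 * cov / (pred.var() + gold.var() + (pm - gm) ** 2))


def ccc_loss(pred, gold):
    """Assumed loss form: 1 - CCC, so perfect agreement gives zero loss."""
    return 1.0 - ccc(pred, gold)


# Hypothetical annotation trace and a prediction with a constant offset.
gold = np.array([0.10, 0.40, 0.35, 0.80])
shifted = gold + 0.5

print("PCC:", pcc(shifted, gold))       # offset does not affect PCC
print("CCC:", ccc(shifted, gold))       # offset lowers CCC
print("CCC loss:", ccc_loss(shifted, gold))
```

A prediction that tracks the annotation perfectly but with a constant offset keeps a PCC of 1 while its CCC drops, which is one reason CCC is a stricter agreement measure than PCC.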