Multimodal Continuous Emotion Recognition using Deep Multi-Task Learning with Correlation Loss

11/02/2020
by Berkay Köprü, et al.

In this study, we focus on continuous emotion recognition using body motion and speech signals to estimate Activation, Valence, and Dominance (AVD) attributes. A semi-end-to-end network architecture is proposed in which both extracted features and raw signals are fed to the network, and this network is trained using multi-task learning (MTL) rather than the state-of-the-art single-task learning (STL). Furthermore, two correlation losses, the Concordance Correlation Coefficient (CCC) and the Pearson Correlation Coefficient (PCC), are used as optimization objectives during training. Experiments are conducted on the CreativeIT and RECOLA databases, and evaluations are performed using the CCC metric. To highlight the effects of MTL, correlation losses, and multi-modality, we compare, respectively, the performance of MTL against STL, of CCC loss against mean square error (MSE) loss and PCC loss, and of multi-modality against single modality. We observe significant performance improvements with MTL training over STL, especially for the estimation of valence. Furthermore, the CCC loss achieves more than 7 RECOLA against MSE loss.
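
To make the two correlation objectives concrete, the following is a minimal NumPy sketch of PCC and CCC on a pair of one-dimensional sequences. It is an illustration only, not the authors' implementation: the function names `pcc`, `ccc`, and `ccc_loss` are hypothetical, and defining the loss as `1 - CCC` is a common convention assumed here rather than something stated in the abstract.

```python
import numpy as np


def pcc(pred, gold):
    """Pearson correlation coefficient between two 1-D sequences."""
    pred = np.asarray(pred, dtype=float)
    gold = np.asarray(gold, dtype=float)
    # Population covariance of the two sequences.
    cov = ((pred - pred.mean()) * (gold - gold.mean())).mean()
    return float(cov / (pred.std() * gold.std()))


def ccc(pred, gold):
    """Concordance correlation coefficient between two 1-D sequences."""
    pred = np.asarray(pred, dtype=float)
    gold = np.asarray(gold, dtype=float)
    cov = ((pred - pred.mean()) * (gold - gold.mean())).mean()
    # Unlike PCC, the denominator penalizes differences in mean and variance.
    denom = pred.var() + gold.var() + (pred.mean() - gold.mean()) ** 2
    return float(2.0 * cov / denom)


def ccc_loss(pred, gold):
    """Assumed loss form: 1 - CCC, so perfect agreement gives zero loss."""
    return 1.0 - ccc(pred, gold)


if __name__ == "__main__":
    gold = np.array([0.0, 1.0, 2.0, 3.0])
    shifted = gold + 1.0  # perfectly correlated, but offset by a constant
    print("PCC:", pcc(shifted, gold))
    print("CCC:", ccc(shifted, gold))
    print("CCC loss:", ccc_loss(shifted, gold))
```

In this example the shifted prediction is perfectly linearly related to the target, so PCC stays at its maximum while CCC drops because of the constant offset. That sensitivity to bias and scale is the usual motivation for evaluating, and optimizing, with CCC in continuous emotion recognition.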

