Emotion Recognition Using Fusion of Audio and Video Features

06/25/2019
by Juan D. S. Ortega, et al.

In this paper, we propose a fusion approach to continuous emotion recognition that combines the visual and auditory modalities in their representation spaces to predict arousal and valence levels. The approach uses a pre-trained convolutional neural network and transfer learning to extract features that capture emotional content from video frames. For the auditory content, a minimalistic set of parameters, such as prosodic, excitation, vocal tract, and spectral descriptors, is used as features. The two modalities are fused either at the feature level, before training a single support vector regressor (SVR), or at the prediction level, after training one SVR per modality. The approach also includes preprocessing and post-processing techniques that improve the concordance correlation coefficient (CCC). Experimental results for predicting spontaneous and natural emotions on the RECOLA dataset show that the approach takes advantage of the complementary information in the visual and auditory modalities and achieves CCCs of 0.749 and 0.565 for arousal and valence, respectively.
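
The abstract names two fusion points and one evaluation metric, which the minimal sketch below illustrates in plain Python. It is not the authors' implementation. The function names (`ccc`, `feature_level_fusion`, `prediction_level_fusion`) and the `video_weight` parameter are hypothetical, and the weighted average used for prediction-level fusion is an assumption, since the abstract does not say how the per-modality SVR outputs are combined. The CCC follows the standard definition, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), with population statistics.

```python
from statistics import fmean


def ccc(y_true, y_pred):
    """Concordance correlation coefficient of two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_t, mean_p = fmean(y_true), fmean(y_pred)
    # Population (divide-by-n) variance and covariance.
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    denom = var_t + var_p + (mean_t - mean_p) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for two identical constant signals")
    return 2 * cov / denom


def feature_level_fusion(video_feats, audio_feats):
    """Concatenate per-frame video and audio feature vectors.

    The fused vectors would then be fed to a single regressor.
    """
    if len(video_feats) != len(audio_feats):
        raise ValueError("modalities must be aligned frame by frame")
    return [list(v) + list(a) for v, a in zip(video_feats, audio_feats)]


def prediction_level_fusion(pred_video, pred_audio, video_weight=0.5):
    """Combine per-modality predictions with a weighted average.

    Assumption: the averaging rule is illustrative only; the abstract
    does not specify how the two regressors' outputs are merged.
    """
    if len(pred_video) != len(pred_audio):
        raise ValueError("prediction sequences must have equal length")
    w = video_weight
    return [w * pv + (1 - w) * pa for pv, pa in zip(pred_video, pred_audio)]
```

In a fuller pipeline, the output of `feature_level_fusion` would be the training input of one regressor, whereas `prediction_level_fusion` would take the outputs of two regressors trained separately on video and audio features; either result could then be scored against the annotated arousal or valence trace with `ccc`.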

research
01/31/2020

Continuous Emotion Recognition via Deep Convolutional Autoencoder and Support Vector Regressor

Automatic facial expression recognition is an important research area in...
research
07/06/2019

Multimodal Fusion with Deep Neural Networks for Audio-Video Emotion Recognition

This paper presents a novel deep neural network (DNN) for multimodal fus...
research
01/15/2019

Deep Fusion: An Attention Guided Factorized Bilinear Pooling for Audio-video Emotion Recognition

Automatic emotion recognition (AER) is a challenging task due to the abs...
research
06/17/2023

Enhancing the Prediction of Emotional Experience in Movies using Deep Neural Networks: The Significance of Audio and Language

Our paper focuses on making use of deep neural network models to accurat...
research
04/02/2019

The Verbal and Non Verbal Signals of Depression -- Combining Acoustics, Text and Visuals for Estimating Depression Level

Depression is a serious medical condition that is suffered by a large nu...
research
06/06/2019

Feature-level and Model-level Audiovisual Fusion for Emotion Recognition in the Wild

Emotion recognition plays an important role in human-computer interactio...