Multi-modal Residual Perceptron Network for Audio-Video Emotion Recognition

by Xin Chang, et al.
Politechnika Warszawska (Warsaw University of Technology)

Audio-Video Emotion Recognition is now commonly approached with Deep Neural Network modeling tools. Published papers, as a rule, report only cases where multi-modality is superior to audio-only or video-only modality. However, cases of uni-modal superiority can also be found. In our research, we hypothesize that for fuzzy categories of emotional events, the within-modal and inter-modal noisy information represented indirectly in the parameters of the modeling neural network impedes better performance in the existing late-fusion and end-to-end multi-modal network training strategies. To exploit the advantages and overcome the deficiencies of both solutions, we define a Multi-modal Residual Perceptron Network that performs end-to-end learning from multi-modal network branches and generalizes to a better multi-modal feature representation. With the proposed Multi-modal Residual Perceptron Network and a novel time augmentation for streaming digital movies, the state-of-the-art average recognition rate was improved to 91.4% on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset and to 83.15% on the Crowd-sourced Emotional Multimodal Actors Dataset (CREMA-D). Moreover, the Multi-modal Residual Perceptron Network concept shows potential for multi-modal applications with signal sources beyond the optical and acoustical types.
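To make the contrast between fusion strategies concrete, here is a minimal NumPy sketch comparing a late-fusion baseline with a residual-style fusion of two modality branches. All dimensions, weights, and layer choices are illustrative placeholders, not the paper's actual architecture; the only point is the structural idea of adding a perceptron's correction back onto the joint features before classification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative audio and video branch embeddings (e.g. from pretrained
# encoders), 8 dimensions each; 4 hypothetical emotion classes.
audio = rng.normal(size=8)
video = rng.normal(size=8)

# Late-fusion baseline: average the per-branch class logits.
Wa = rng.normal(size=(4, 8))
Wv = rng.normal(size=(4, 8))
late_fusion_logits = 0.5 * (Wa @ audio + Wv @ video)

# Residual-perceptron fusion (sketch): a small perceptron computes a
# correction over the concatenated branch features, and that correction
# is added back residually before the final classifier.
x = np.concatenate([audio, video])        # joint features, shape (16,)
W1 = rng.normal(size=(16, 16)) * 0.1
W2 = rng.normal(size=(16, 16)) * 0.1
residual = np.maximum(x @ W1, 0.0) @ W2   # perceptron with ReLU hidden layer
fused = x + residual                      # residual connection
Wc = rng.normal(size=(4, 16))
mrpn_logits = Wc @ fused

print(late_fusion_logits.shape, mrpn_logits.shape)
```

The residual connection lets the fused representation fall back on the raw branch features when the cross-modal correction is uninformative, which is one way to read the paper's claim of tolerating noisy within-modal and inter-modal information.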




