Interpretable Multimodal Emotion Recognition using Hybrid Fusion of Speech and Image Data

08/25/2022
by   Puneet Kumar, et al.

This paper proposes a multimodal emotion recognition system based on hybrid fusion that classifies the emotions depicted by speech utterances and corresponding images into discrete classes. A new interpretability technique has been developed to identify the important speech and image features leading to the prediction of particular emotion classes. The proposed system's architecture has been determined through intensive ablation studies. It fuses the speech and image features and then combines the speech, image, and intermediate fusion outputs. The proposed interpretability technique incorporates a divide-and-conquer approach to compute Shapley values denoting each speech and image feature's importance. We have also constructed a large-scale dataset (the IIT-R SIER dataset) consisting of speech utterances, corresponding images, and class labels, i.e., 'anger,' 'happy,' 'hate,' and 'sad.' The proposed system has achieved 83.29% emotion recognition accuracy; the results advocate the importance of utilizing complementary information from multiple modalities for emotion recognition.
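The hybrid fusion described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature names, dimensions, and the use of plain concatenation are assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unimodal embeddings (sizes are illustrative, not from the paper).
speech_feat = rng.normal(size=128)
image_feat = rng.normal(size=128)

def intermediate_fusion(speech, image):
    # Intermediate fusion: combine the two modality features
    # (here by simple concatenation) into a joint representation.
    return np.concatenate([speech, image])

def hybrid_fusion(speech, image):
    # Hybrid fusion: the intermediate fusion output is combined again
    # with the raw speech and image features, so a downstream classifier
    # sees both unimodal and fused information.
    fused = intermediate_fusion(speech, image)
    return np.concatenate([speech, image, fused])

combined = hybrid_fusion(speech_feat, image_feat)
print(combined.shape)  # (512,) = 128 + 128 + 256
```

In the actual system the classifier head would operate on `combined`; the sketch only shows how the unimodal and intermediate-fusion outputs are brought together.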


