Attention Driven Fusion for Multi-Modal Emotion Recognition

09/23/2020
by Darshana Priyasad, et al.

Deep learning has emerged as a powerful alternative to hand-crafted methods for emotion recognition on combined acoustic and text modalities. Baseline systems model emotion information in the text and acoustic modalities independently using Deep Convolutional Neural Networks (DCNN) and Recurrent Neural Networks (RNN), then apply attention, fusion, and classification. In this paper, we present a deep learning-based approach to exploit and fuse text and acoustic data for emotion classification. We use a SincNet layer, based on parameterized sinc functions that implement band-pass filters, to extract acoustic features from raw audio, followed by a DCNN. This approach learns filter banks tuned for emotion recognition and provides more effective features than applying convolutions directly to the raw speech signal. For text processing, we use two parallel branches (a DCNN, and a bidirectional RNN followed by a DCNN), with cross-attention introduced to infer N-gram level correlations on the hidden representations received from the Bi-RNN. Following existing state-of-the-art work, we evaluate the proposed system on the IEMOCAP dataset. Experimental results indicate that the proposed system outperforms existing methods, achieving 3.5
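To make the "parameterized sinc functions with band-pass filters" idea concrete, the following is a minimal NumPy sketch of a single sinc-based band-pass kernel, built as the difference of two windowed low-pass sinc filters. It is an illustration under stated assumptions, not the authors' implementation: the function name `sinc_bandpass`, the kernel length of 251 taps, the 16 kHz sample rate, and the Hamming window are all choices made here for the example. In a SincNet-style layer the two cutoff frequencies would be learned parameters; in this sketch they are fixed inputs.

```python
import numpy as np


def sinc_bandpass(f_low_hz, f_high_hz, kernel_size=251, sample_rate=16000):
    """Return a band-pass FIR kernel defined only by its two cutoff frequencies.

    Hypothetical helper for illustration; all defaults are assumptions.
    """
    # Symmetric time axis in seconds, centred on zero.
    t = (np.arange(kernel_size) - (kernel_size - 1) / 2) / sample_rate
    # np.sinc(x) = sin(pi*x) / (pi*x), so 2*f*sinc(2*f*t) is an ideal
    # low-pass impulse response with cutoff f.
    low_pass_high = 2 * f_high_hz * np.sinc(2 * f_high_hz * t)
    low_pass_low = 2 * f_low_hz * np.sinc(2 * f_low_hz * t)
    # Band-pass = wide low-pass minus narrow low-pass, smoothed by a window
    # to reduce ripple caused by truncating the infinite sinc.
    kernel = (low_pass_high - low_pass_low) * np.hamming(kernel_size)
    # Divide by the sample rate so the discrete pass-band gain is close to 1.
    return kernel / sample_rate


# Usage: build one filter and apply it to a (random) raw waveform.
kernel = sinc_bandpass(500.0, 1500.0)
waveform = np.random.default_rng(0).standard_normal(16000)
filtered = np.convolve(waveform, kernel, mode="same")
```

Because each filter is described by two numbers rather than by every tap of the kernel, a bank of such filters has far fewer free parameters than an unconstrained first convolutional layer over raw audio.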

