Exploring Attention Mechanisms for Multimodal Emotion Recognition in an Emergency Call Center Corpus

06/12/2023
by Théo Deschamps-Berger, et al.

Emotion detection technology that supports human decision-making is an important research topic for real-world applications, but real-life emotion datasets are relatively rare and small. The experiments in this paper use the CEMO corpus, which was collected in a French emergency call center. Two pre-trained models, one based on speech and one on text, were fine-tuned for speech emotion recognition. Using pre-trained Transformer encoders mitigates the limited and sparse nature of our data. This paper explores different fusion strategies for these modality-specific models. In particular, fusions with and without cross-attention mechanisms were tested to gather the most relevant information from both the speech and text encoders. We show that multimodal fusion brings an absolute gain of 4-9%. The symmetric multi-headed cross-attention mechanism performed better than classical late fusion approaches. Our experiments also suggest that, for the real-life CEMO corpus, the audio component encodes more emotive information than the textual one.
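To make the contrast between fusion with and without cross-attention concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It uses a single attention head with no learned projections, takes "fusion without cross-attention" to mean pooling each modality and concatenating, and runs on made-up toy embeddings. All function names are hypothetical.

```python
import math

def softmax(row):
    # Numerically stable softmax over a list of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, keys_values):
    # Scaled dot-product attention: every query vector attends over the
    # other modality's sequence. Assumption: identity Q/K/V projections
    # and one head, for brevity.
    d = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        attended.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                         for j in range(d)])
    return attended

def mean_pool(seq):
    # Average a sequence of vectors into one vector.
    return [sum(v[j] for v in seq) / len(seq) for j in range(len(seq[0]))]

def concat_fusion(audio, text):
    # Baseline without cross-attention: pool each modality, then concatenate.
    return mean_pool(audio) + mean_pool(text)

def symmetric_cross_attention_fusion(audio, text):
    # Symmetric: audio frames query text tokens AND text tokens query audio.
    audio_ctx = cross_attend(audio, text)
    text_ctx = cross_attend(text, audio)
    return mean_pool(audio_ctx) + mean_pool(text_ctx)

# Toy encoder outputs (hypothetical values): 3 audio frames, 2 text tokens.
audio = [[0.1, 0.3, -0.2, 0.5], [0.4, -0.1, 0.2, 0.0], [0.0, 0.2, 0.1, -0.3]]
text = [[0.2, 0.1, 0.0, -0.4], [-0.3, 0.5, 0.2, 0.1]]

fused = symmetric_cross_attention_fusion(audio, text)
baseline = concat_fusion(audio, text)
```

In a real system the fused vector would feed a classification head, and each direction would use several heads with learned query, key, and value projections.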


