Key-Sparse Transformer with Cascaded Cross-Attention Block for Multimodal Speech Emotion Recognition

06/22/2021
by Weidong Chen, et al.
South China University of Technology International Student Union

Speech emotion recognition (SER) is a challenging and important research topic that plays a critical role in human-computer interaction. Multimodal inputs can improve performance because more emotional information is available for recognition. However, existing studies learn from all the information in a sample, although only a small portion of it relates to emotion. Moreover, under the multimodal framework, the interaction between different modalities is shallow and insufficient. In this paper, a key-sparse Transformer is proposed for efficient SER by focusing only on emotion-related information. Furthermore, a cascaded cross-attention block, specially designed for the multimodal framework, is introduced to achieve deep interaction between different modalities. The proposed method is evaluated on the IEMOCAP corpus, and the experimental results show that it outperforms state-of-the-art approaches.
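
The abstract does not say how the key-sparse Transformer decides which keys to keep. As a rough illustration of the general idea of attending to only a subset of keys, the sketch below assumes top-k selection by scaled dot-product score for a single query. The function name `key_sparse_attention` and the parameter `k` are hypothetical and are not taken from the paper.

```python
import math

def key_sparse_attention(query, keys, values, k):
    """Single-query attention over only the k highest-scoring keys.

    Top-k selection is an assumption made for illustration; the paper's
    actual selection rule is not described in the abstract.
    """
    k = max(1, min(k, len(keys)))  # clamp k to a valid range
    d = len(query)
    # Scaled dot-product score between the query and every key.
    scores = [sum(q * x for q, x in zip(query, key)) / math.sqrt(d) for key in keys]
    # Indices of the k largest scores; every other key is dropped.
    kept = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the kept scores only (max subtracted for stability).
    top = max(scores[i] for i in kept)
    exp = {i: math.exp(scores[i] - top) for i in kept}
    total = sum(exp.values())
    weights = [exp[i] / total if i in exp else 0.0 for i in range(len(keys))]
    # Weighted sum of the value vectors.
    out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
    return out, weights

# Usage: three keys, keep the two that best match the query.
out, w = key_sparse_attention(
    [1.0, 0.0],
    [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]],
    k=2,
)
```

In this sketch the dropped key receives exactly zero weight, so its value vector cannot influence the output, which is the sense in which the attention is sparse over keys.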

07/08/2020

Temporal aggregation of audio-visual modalities for emotion recognition

Emotion recognition has a pivotal role in affective computing and in hum...
05/17/2018

Convolutional Attention Networks for Multimodal Emotion Recognition from Speech and Text Data

Emotion recognition has become a popular topic of interest, especially i...
01/17/2022

Group Gated Fusion on Attention-based Bidirectional Alignment for Multimodal Emotion Recognition

Emotion recognition is a challenging and actively-studied research area ...
03/03/2023

DWFormer: Dynamic Window transFormer for Speech Emotion Recognition

Speech emotion recognition is crucial to human-computer interaction. The...
07/08/2019

Attending to Emotional Narratives

Attention mechanisms in deep neural networks have achieved excellent per...
03/08/2022

SpeechFormer: A Hierarchical Efficient Framework Incorporating the Characteristics of Speech

Transformer has obtained promising results on cognitive speech signal pr...
10/12/2017

Multimodal Observation and Interpretation of Subjects Engaged in Problem Solving

In this paper we present the first results of a pilot experiment in the ...