Key-Sparse Transformer with Cascaded Cross-Attention Block for Multimodal Speech Emotion Recognition

by Weidong Chen, et al.
South China University of Technology International Student Union

Speech emotion recognition (SER) is a challenging and important research topic that plays a critical role in human-computer interaction. Multimodal inputs can improve performance because more emotional information is available for recognition. However, existing studies learn from all the information in a sample, while only a small portion of it relates to emotion. Moreover, under the multimodal framework, the interaction between different modalities is shallow and insufficient. In this paper, a key-sparse Transformer is proposed for efficient SER by focusing only on emotion-related information. Furthermore, a cascaded cross-attention block, specially designed for the multimodal framework, is introduced to achieve deep interaction between different modalities. The proposed method is evaluated on the IEMOCAP corpus, and the experimental results show that it outperforms state-of-the-art approaches.
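
The abstract does not spell out how the key-sparse attention selects emotion-related information, so the following is only an illustrative sketch of one plausible reading: compute ordinary scaled dot-product attention, rank keys by the total attention they receive across all queries, keep the top `k` keys, and renormalize. The function name `key_sparse_attention`, the ranking rule, the renormalization step, and the parameter `k` are all assumptions made for illustration, not details confirmed by the abstract.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def key_sparse_attention(queries, keys, values, k):
    """Attention restricted to the k keys that receive the most total weight.

    Hypothetical sketch: the selection rule and renormalization are assumptions.
    Returns (outputs, kept), where kept lists the retained key indices.
    """
    dim = len(keys[0])
    # Dense scaled dot-product scores, one row per query.
    scores = [
        [sum(a * b for a, b in zip(q, key)) / math.sqrt(dim) for key in keys]
        for q in queries
    ]
    weights = [softmax(row) for row in scores]

    # Total attention mass each key receives across all queries.
    key_mass = [sum(row[j] for row in weights) for j in range(len(keys))]
    ranked = sorted(range(len(keys)), key=lambda j: key_mass[j], reverse=True)
    kept = sorted(ranked[:k])

    outputs = []
    for row in weights:
        # Drop the discarded keys and renormalize the remaining weights.
        norm = sum(row[j] for j in kept)
        outputs.append([
            sum(row[j] / norm * values[j][c] for j in kept)
            for c in range(len(values[0]))
        ])
    return outputs, kept
```

With `k` equal to the number of keys, this reduces to ordinary dense attention; smaller values of `k` make every query attend to the same reduced set of keys.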



Temporal aggregation of audio-visual modalities for emotion recognition

Emotion recognition has a pivotal role in affective computing and in hum...

Convolutional Attention Networks for Multimodal Emotion Recognition from Speech and Text Data

Emotion recognition has become a popular topic of interest, especially i...

Group Gated Fusion on Attention-based Bidirectional Alignment for Multimodal Emotion Recognition

Emotion recognition is a challenging and actively-studied research area ...

DWFormer: Dynamic Window transFormer for Speech Emotion Recognition

Speech emotion recognition is crucial to human-computer interaction. The...

Attending to Emotional Narratives

Attention mechanisms in deep neural networks have achieved excellent per...

SpeechFormer: A Hierarchical Efficient Framework Incorporating the Characteristics of Speech

Transformer has obtained promising results on cognitive speech signal pr...

Multimodal Observation and Interpretation of Subjects Engaged in Problem Solving

In this paper we present the first results of a pilot experiment in the ...