DWFormer: Dynamic Window transFormer for Speech Emotion Recognition

03/03/2023
by Shuaiqi Chen, et al.

Speech emotion recognition is crucial to human-computer interaction. The temporal regions that carry different emotions are scattered locally across different parts of the speech. Moreover, the temporal scale of the important information may vary over a large range, both within and across speech segments. Although transformer-based models have made progress in this field, existing models cannot precisely locate important regions at different temporal scales. To address this issue, we propose the Dynamic Window transFormer (DWFormer), a new architecture that leverages temporal importance by dynamically splitting samples into windows. A self-attention mechanism is applied within each window to capture temporally important information locally in a fine-grained way. Cross-window information interaction is also taken into account for global communication. DWFormer is evaluated on both the IEMOCAP and the MELD datasets. Experimental results show that the proposed model achieves better performance than the previous state-of-the-art methods.
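
The abstract names three ideas: splitting a sample into windows according to temporal importance, applying self-attention inside each window, and letting windows exchange information. The following is a minimal sketch of those ideas in plain Python, not the authors' implementation. The function names, the fixed `threshold` rule for placing window boundaries, the identity query/key/value projections, and the mean-pooled window tokens are all assumptions made for illustration; the abstract does not specify how importance is computed or how cross-window interaction is realized.

```python
import math


def split_dynamic_windows(importance, threshold):
    """Group consecutive frames whose importance lies on the same side of
    `threshold` (assumed rule). Returns (start, end) pairs, end exclusive."""
    windows = []
    start = 0
    for i in range(1, len(importance)):
        if (importance[i] > threshold) != (importance[i - 1] > threshold):
            windows.append((start, i))
            start = i
    if importance:
        windows.append((start, len(importance)))
    return windows


def self_attention(frames):
    """Single-head scaled dot-product attention with identity Q/K/V
    projections (a simplification; a real model learns these projections)."""
    dim = len(frames[0])
    out = []
    for q in frames:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in frames]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, frames)) / total
                    for j in range(dim)])
    return out


def windowed_attention(frames, windows):
    """Apply self-attention independently inside each window."""
    out = []
    for start, end in windows:
        out.extend(self_attention(frames[start:end]))
    return out


def cross_window_attention(frames, windows):
    """Mean-pool each window into one token, attend across the tokens, and
    add each mixed token back to every frame of its window (assumed scheme)."""
    dim = len(frames[0])
    tokens = [[sum(f[j] for f in frames[s:e]) / (e - s) for j in range(dim)]
              for s, e in windows]
    mixed = self_attention(tokens)
    out = [list(f) for f in frames]
    for (s, e), token in zip(windows, mixed):
        for i in range(s, e):
            out[i] = [a + b for a, b in zip(out[i], token)]
    return out


# Hypothetical per-frame importance scores and toy 2-dimensional features.
importance = [0.1, 0.2, 0.9, 0.8, 0.1, 0.7]
frames = [[float(i), 1.0] for i in range(6)]
windows = split_dynamic_windows(importance, threshold=0.5)
local = windowed_attention(frames, windows)
mixed = cross_window_attention(local, windows)
```

In this sketch the window lengths follow the importance profile rather than a fixed stride, so a short burst of high-importance frames gets its own small window while a long low-importance stretch is handled as one larger window.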

