DST: Deformable Speech Transformer for Emotion Recognition

02/27/2023
by   Weidong Chen, et al.
0

Enabled by multi-head self-attention, Transformer has exhibited remarkable results in speech emotion recognition (SER). Compared to the original full attention mechanism, window-based attention is more effective in learning fine-grained features while greatly reducing model redundancy. However, emotional cues are present in a multi-granularity manner such that the pre-defined fixed window can severely degrade the model flexibility. In addition, it is difficult to obtain the optimal window settings manually. In this paper, we propose a Deformable Speech Transformer, named DST, for SER task. DST determines the usage of window sizes conditioned on input speech via a light-weight decision network. Meanwhile, data-dependent offsets derived from acoustic features are utilized to adjust the positions of the attention windows, allowing DST to adaptively discover and attend to the valuable information embedded in the speech. Extensive experiments on IEMOCAP and MELD demonstrate the superiority of DST.

READ FULL TEXT
research
03/03/2023

DWFormer: Dynamic Window transFormer for Speech Emotion Recognition

Speech emotion recognition is crucial to human-computer interaction. The...
research
10/19/2020

Multi-Window Data Augmentation Approach for Speech Emotion Recognition

We present a novel, Multi-Window Data Augmentation (MWA-SER) approach fo...
research
02/27/2023

SpeechFormer++: A Hierarchical Efficient Framework for Paralinguistic Speech Processing

Paralinguistic speech processing is important in addressing many issues,...
research
11/11/2018

Improving speech emotion recognition via Transformer-based Predictive Coding through transfer learning

Speech emotion recognition is an important aspect of human-computer inte...
research
08/21/2022

Improving Speech Emotion Recognition Through Focus and Calibration Attention Mechanisms

Attention has become one of the most commonly used mechanisms in deep le...
research
03/08/2022

SpeechFormer: A Hierarchical Efficient Framework Incorporating the Characteristics of Speech

Transformer has obtained promising results on cognitive speech signal pr...
research
09/19/2023

Multi-spectral Entropy Constrained Neural Compression of Solar Imagery

Missions studying the dynamic behaviour of the Sun are defined to captur...

Please sign up or login with your details

Forgot password? Click here to reset