SpeechFormer: A Hierarchical Efficient Framework Incorporating the Characteristics of Speech

03/08/2022
by   Weidong Chen, et al.

The Transformer has obtained promising results in the field of cognitive speech signal processing, which is of interest in applications ranging from emotion analysis to neurocognitive disorder analysis. However, most works treat the speech signal as a whole, neglecting the pronunciation structure that is unique to speech and reflects the cognitive process. Meanwhile, the Transformer carries a heavy computational burden due to its full attention operation. In this paper, a hierarchical efficient framework called SpeechFormer, which considers the structural characteristics of speech, is proposed; it can serve as a general-purpose backbone for cognitive speech signal processing. SpeechFormer consists of frame, phoneme, word and utterance stages in succession, each performing neighboring attention according to the structural pattern of speech with high computational efficiency. SpeechFormer is evaluated on speech emotion recognition (IEMOCAP and MELD) and neurocognitive disorder detection (Pitt and DAIC-WOZ) tasks, and the results show that it outperforms the standard Transformer-based framework while greatly reducing the computational cost. Furthermore, SpeechFormer achieves results comparable to state-of-the-art approaches.
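The idea of neighboring attention mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `neighboring_attention`, the single-head form without learned projections, and the window size are all assumptions chosen for illustration. Each position attends only to positions within a fixed window around it, rather than to the whole sequence as in full attention.

```python
import numpy as np

def neighboring_attention(x, window):
    """Illustrative banded self-attention (hypothetical helper, not from the paper).

    x:      array of shape (T, d), one feature vector per time step.
    window: total window width; each step attends to steps within
            window // 2 positions on either side of itself.
    Returns the attended features (T, d) and the attention weights (T, T).
    """
    T, d = x.shape
    # Scaled dot-product scores; queries, keys and values are all x here
    # (an assumption: no learned projections in this sketch).
    scores = x @ x.T / np.sqrt(d)
    idx = np.arange(T)
    # Boolean band mask: True where |i - j| <= window // 2.
    mask = np.abs(idx[:, None] - idx[None, :]) <= window // 2
    scores = np.where(mask, scores, -np.inf)
    # Numerically stable softmax over each row; masked entries become 0.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x, weights

# Usage: 10 time steps, 4-dimensional features, window of 3 steps.
rng = np.random.default_rng(0)
features = rng.normal(size=(10, 4))
attended, attn = neighboring_attention(features, window=3)
```

For clarity, this sketch still builds the full T x T score matrix and then masks it, so it shows the attention pattern but not the computational saving; an efficient version would compute scores only inside the band.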


research
02/27/2023

SpeechFormer++: A Hierarchical Efficient Framework for Paralinguistic Speech Processing

Paralinguistic speech processing is important in addressing many issues,...
research
08/05/2020

Compact Graph Architecture for Speech Emotion Recognition

We propose a deep graph approach to address the task of speech emotion r...
research
06/02/2023

Learning Local to Global Feature Aggregation for Speech Emotion Recognition

Transformer has emerged in speech emotion recognition (SER) at present. ...
research
11/09/2018

Integrating Recurrence Dynamics for Speech Emotion Recognition

We investigate the performance of features that can capture nonlinear re...
research
02/27/2023

DST: Deformable Speech Transformer for Emotion Recognition

Enabled by multi-head self-attention, Transformer has exhibited remarkab...
research
10/08/2021

Cognitive Coding of Speech

We propose an approach for cognitive coding of speech by unsupervised ex...
research
10/22/2021

Signal-Envelope: A C++ library with Python bindings for temporal envelope estimation

Signals can be interpreted as composed of a rapidly varying component mo...
