Deep scattering network for speech emotion recognition

05/11/2021
by Premjeet Singh, et al.

This paper introduces the scattering transform for speech emotion recognition (SER). The scattering transform generates feature representations that remain stable to deformations and shifts in time and frequency without much loss of information. In speech, emotion cues are spread across time and localised in frequency. The time and frequency invariance of scattering coefficients provides a representation that is robust to emotion-irrelevant variations (e.g., different speakers, languages, and genders) while preserving the variations caused by emotion cues. Hence, such a representation captures emotion information from speech more efficiently. We perform experiments comparing scattering coefficients with standard mel-frequency cepstral coefficients (MFCCs) over different databases. We observe that frequency scattering performs better than time-domain scattering and MFCCs. We also investigate layer-wise scattering coefficients to analyse the importance of time-shift- and deformation-stable scalogram and modulation spectrum coefficients for SER. We observe that layer-wise coefficients taken independently also perform better than MFCCs.
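To make the layer-wise terminology concrete, the following is a minimal sketch of a two-layer time scattering transform in NumPy. It is an illustration only, not the authors' implementation: the function names (`bandpass_bank`, `lowpass`, `scattering`), the Gaussian band-pass filters used in place of proper Morlet wavelets, and all parameter values are assumptions chosen for brevity. The first layer averages the wavelet modulus (a time-shift-stable scalogram); the second layer applies the filterbank again to each scalogram band before averaging, which yields modulation-spectrum-like coefficients. Frequency scattering, which additionally filters along the log-frequency axis, is not covered here.

```python
import numpy as np


def bandpass_bank(n, num_filters=6):
    """Gaussian band-pass filters in the frequency domain (assumed stand-in
    for Morlet wavelets), with octave-spaced centres and constant Q."""
    freqs = np.fft.fftfreq(n)  # cycles per sample
    centres = 0.25 * 2.0 ** (-np.arange(num_filters))
    # Bandwidth proportional to the centre frequency (constant-Q design).
    return np.stack(
        [np.exp(-0.5 * ((freqs - fc) / (fc / 4.0)) ** 2) for fc in centres]
    )


def lowpass(n, width=0.01):
    """Gaussian low-pass filter; a smaller width means a longer averaging window."""
    freqs = np.fft.fftfreq(n)
    return np.exp(-0.5 * (freqs / width) ** 2)


def scattering(x, num_filters=6, width=0.01):
    """Return zeroth-, first- and second-order scattering coefficients of x."""
    n = len(x)
    psi = bandpass_bank(n, num_filters)
    phi = lowpass(n, width)
    X = np.fft.fft(x)

    # Order 0: local average of the signal itself.
    s0 = np.real(np.fft.ifft(X * phi))

    # Order 1: wavelet modulus (scalogram), then local averaging.
    u1 = np.abs(np.fft.ifft(X[None, :] * psi, axis=1))
    s1 = np.real(np.fft.ifft(np.fft.fft(u1, axis=1) * phi[None, :], axis=1))

    # Order 2: filter each scalogram band with lower-frequency filters,
    # take the modulus, and average again.
    s2 = []
    for i in range(num_filters):
        U1 = np.fft.fft(u1[i])
        for j in range(i + 1, num_filters):
            u2 = np.abs(np.fft.ifft(U1 * psi[j]))
            s2.append(np.real(np.fft.ifft(np.fft.fft(u2) * phi)))
    return s0, s1, np.array(s2)
```

In this sketch, shifting the input by a few samples changes the averaged first-order coefficients far less than it changes the raw waveform, which is the kind of shift stability the abstract refers to. The outputs of each order could be passed separately to a classifier to mimic a layer-wise comparison, though the paper's actual feature pipeline is not specified here.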


