Pitch-Synchronous Single Frequency Filtering Spectrogram for Speech Emotion Recognition

08/07/2019
by Shruti Gupta, et al.

Convolutional neural networks (CNNs) are widely used for speech emotion recognition (SER). In such systems, the short-time Fourier transform (STFT) spectrogram is the most popular input representation of speech. However, the uncertainty principle prevents the STFT from achieving high time resolution and high frequency resolution simultaneously. The recently proposed single frequency filtering (SFF) spectrogram promises to be a better alternative because it provides good time and frequency resolution simultaneously. In this work, we explore the SFF spectrogram as an alternative representation of speech for SER. We modify the SFF spectrogram by averaging the amplitudes of all samples between two successive glottal closure instants (GCIs). Because the interval between two successive GCIs is the pitch period, we call the modified representation the pitch-synchronous SFF spectrogram. The GCI locations were detected using the zero-frequency filtering (ZFF) approach. The proposed pitch-synchronous SFF spectrogram produced accuracy values of 63.95 […]. These correspond to an improvement of +7.35 […] over the state-of-the-art result obtained with the STFT spectrogram and a CNN. In particular, the proposed method recognized 22.7 […], whereas this number was 0 […] promise a much wider use of the proposed pitch-synchronous SFF spectrogram for other speech-based applications.
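The abstract describes a three-step pipeline: locate GCIs with zero-frequency filtering, compute SFF amplitude envelopes, and average each envelope between successive GCIs. The Python sketch below illustrates those steps under stated assumptions; it is not the authors' implementation. The function names `zff_gcis`, `sff_envelopes` and `pitch_synchronous_sff`, the pole radius `r = 0.995`, the trend-removal window of about one average pitch period on each side, the three trend-removal passes, and the 100 Hz frequency grid are all hypothetical choices. The SFF step follows our reading of the published formulation (shift each frequency of interest to fs/2, then apply a single-pole filter with its pole at z = -r). It depends on NumPy and SciPy.

```python
import numpy as np
from scipy.signal import lfilter


def zff_gcis(s, fs, avg_pitch_ms=5.0, passes=3):
    """Hypothetical ZFF-based GCI detector (sketch, not the paper's code)."""
    s = np.asarray(s, dtype=float)
    # Assumed average pitch period sets the trend-removal half-window.
    n_half = max(1, int(round(avg_pitch_ms * fs / 1000.0)))
    if len(s) < 2 * n_half + 1:
        return np.array([], dtype=int)
    x = np.diff(s, prepend=0.0)  # differencing removes any DC offset
    y = x
    for _ in range(2):
        # Zero-frequency resonator: double pole at z = 1 (double integration).
        y = lfilter([1.0], [1.0, -2.0, 1.0], y)
    win = np.ones(2 * n_half + 1) / (2 * n_half + 1)
    for _ in range(passes):
        # Trend removal: subtract the local mean over 2*n_half + 1 samples.
        y = y - np.convolve(y, win, mode="same")
    # Negative-to-positive zero crossings are taken as GCIs; this assumes the
    # recording's excitation polarity (flip the sign of s if it is inverted).
    idx = np.where((y[:-1] < 0) & (y[1:] >= 0))[0] + 1
    # Near the ends the averaging window is truncated, so drop those samples.
    margin = passes * n_half
    return idx[(idx > margin) & (idx < len(y) - margin)]


def sff_envelopes(s, fs, freqs_hz, r=0.995):
    """SFF amplitude envelopes, one row per frequency (sketch)."""
    s = np.asarray(s, dtype=float)
    n = np.arange(len(s))
    omega = 2 * np.pi * np.asarray(freqs_hz, dtype=float) / fs
    shift = np.pi - omega  # moves each frequency of interest to fs/2
    xk = s[None, :] * np.exp(1j * shift[:, None] * n[None, :])
    # Single-pole filter H(z) = 1 / (1 + r z^-1), pole at z = -r (i.e. at fs/2).
    yk = lfilter([1.0], [1.0, r], xk, axis=1)
    return np.abs(yk)


def pitch_synchronous_sff(env, gcis):
    """Average each envelope over every interval between successive GCIs."""
    cols = [env[:, a:b].mean(axis=1) for a, b in zip(gcis[:-1], gcis[1:]) if b > a]
    if not cols:
        return np.empty((env.shape[0], 0))
    return np.stack(cols, axis=1)  # shape: (num_freqs, num_pitch_periods)


if __name__ == "__main__":
    fs = 16000
    s = np.zeros(fs // 2)
    s[40::80] = -1.0  # synthetic 200 Hz train of negative "glottal" pulses
    gcis = zff_gcis(s, fs, avg_pitch_ms=5.0)
    env = sff_envelopes(s, fs, np.arange(100, 8000, 100))
    spec = pitch_synchronous_sff(env, gcis)
    print(len(gcis), spec.shape)
```

On the synthetic pulse train, the detected GCIs are spaced one pitch period (80 samples) apart, so each output column summarizes exactly one pitch period. Because the number of columns follows the number of detected periods rather than a fixed frame rate, a fixed-size CNN input would need some padding or resizing step.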
