Real-time Speech Emotion Recognition Based on Syllable-Level Feature Extraction

04/25/2022
by Abdul Rehman, et al.

Speech emotion recognition systems suffer from high prediction latency, due to the heavy computational requirements of deep learning models, and from low generalizability, mainly because of the poor reliability of emotional measurements across multiple corpora. To address these problems, we present a speech emotion recognition system based on a reductionist approach that decomposes and analyzes syllable-level features. The mel-spectrogram of an audio stream is decomposed into syllable-level components, which are then analyzed to extract statistical features. The proposed method uses formant attention, noise-gate filtering, and rolling normalization contexts to increase feature-processing speed and tolerance to adversity. A set of syllable-level formant features is extracted and fed into a single-hidden-layer neural network that makes a prediction for each syllable, as opposed to the conventional approach of using a sophisticated deep learner to make sentence-wide predictions. The syllable-level predictions help to achieve real-time latency and lower the aggregated error in utterance-level cross-corpus predictions. Experiments on the IEMOCAP (IE), MSP-Improv (MI), and RAVDESS (RA) databases show that the method achieves real-time latency while predicting with state-of-the-art cross-corpus unweighted accuracy of 47.6%.
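The pipeline described in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the gate threshold, rolling-window size, fixed-length "syllable" segmentation, statistical feature set, and network dimensions (`n_feat`, `n_hidden`, `n_emotions`) are all assumptions for illustration, and the real system detects syllable boundaries from the spectrogram rather than slicing fixed windows.

```python
import numpy as np

rng = np.random.default_rng(0)

def noise_gate(frames, threshold_db=-40.0):
    """Keep only frames whose log-energy exceeds a floor (assumed gate)."""
    energy_db = 10 * np.log10(frames.mean(axis=1) + 1e-10)
    return frames[energy_db > threshold_db]

def rolling_normalize(frames, window=50):
    """Normalize each frame by the mean/std of a trailing context window,
    so the features adapt to the stream instead of a fixed corpus."""
    out = np.empty_like(frames)
    for i in range(len(frames)):
        ctx = frames[max(0, i - window):i + 1]
        out[i] = (frames[i] - ctx.mean()) / (ctx.std() + 1e-8)
    return out

def syllable_features(syllable):
    """Per-syllable statistical features over mel bands (assumed set:
    per-band mean and standard deviation)."""
    return np.concatenate([syllable.mean(axis=0), syllable.std(axis=0)])

def predict(features, W1, b1, W2, b2):
    """Single-hidden-layer network: one emotion posterior per syllable."""
    h = np.maximum(0.0, features @ W1 + b1)   # ReLU hidden layer
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max())
    return e / e.sum()                        # softmax over emotion classes

# Toy mel-spectrogram: 200 frames x 26 mel bands
mel = np.abs(rng.normal(size=(200, 26)))
gated = noise_gate(mel)
normed = rolling_normalize(gated)

# Crude fixed-length segmentation standing in for syllable detection
syllables = [normed[i:i + 20] for i in range(0, len(normed) - 19, 20)]

n_feat, n_hidden, n_emotions = 52, 32, 4      # assumed dimensions
W1 = rng.normal(scale=0.1, size=(n_feat, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_emotions)); b2 = np.zeros(n_emotions)

# One prediction per syllable; the utterance-level label can then be
# aggregated, e.g. by averaging the syllable posteriors
posteriors = np.array([predict(syllable_features(s), W1, b1, W2, b2)
                       for s in syllables])
utterance = posteriors.mean(axis=0)
print(utterance.shape)  # (4,)
```

Because each syllable yields its own lightweight prediction, output is available as soon as a syllable completes, which is the source of the real-time latency claim.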

Related research:

10/19/2020 - Multi-Window Data Augmentation Approach for Speech Emotion Recognition
"We present a novel, Multi-Window Data Augmentation (MWA-SER) approach fo..."

11/24/2021 - How Speech is Recognized to Be Emotional - A Study Based on Information Decomposition
"The way that humans encode their emotion into speech signals is complex...."

01/16/2020 - Speech Emotion Recognition Based on Multi-feature and Multi-lingual Fusion
"A speech emotion recognition algorithm based on multi-feature and Multi-..."

11/09/2018 - Integrating Recurrence Dynamics for Speech Emotion Recognition
"We investigate the performance of features that can capture nonlinear re..."

10/03/2019 - Exploiting multi-CNN features in CNN-RNN based Dimensional Emotion Recognition on the OMG in-the-wild Dataset
"This paper presents a novel CNN-RNN based approach, which exploits multi..."

01/04/2018 - A pairwise discriminative task for speech emotion recognition
"Speech emotion recognition is an important task in human-machine interac..."

04/03/2018 - EmoRL: Continuous Acoustic Emotion Classification using Deep Reinforcement Learning
"Acoustically expressed emotions can make communication with a robot more..."
