Multi-Window Data Augmentation Approach for Speech Emotion Recognition

10/19/2020
by Sarala Padi, et al.

We present a novel Multi-Window Data Augmentation (MWA-SER) approach for speech emotion recognition. MWA-SER is a unimodal approach that focuses on two key concepts: designing a speech augmentation method that generates additional data samples, and building deep learning models that recognize the underlying emotion of an audio signal. The multi-window augmentation method extracts more audio features from the speech signal by employing multiple window sizes in the audio feature extraction process. We show that our proposed augmentation method, combined with a deep learning model, improves speech emotion recognition performance. We evaluate our MWA-SER approach on the IEMOCAP corpus and show that the proposed method achieves state-of-the-art results. Furthermore, the proposed system demonstrated 70 … while recognizing the emotions in the SAVEE and RAVDESS datasets, respectively.
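The abstract describes the augmentation only at a high level: the same utterance is analysed with several window sizes, and each analysis yields an additional training sample. The sketch below illustrates that idea under assumptions that are not taken from the paper: log-magnitude spectrogram features (the exact feature set is not stated here), hypothetical window sizes of 25, 50, 75 and 100 ms, a 50% hop, and a fixed 2048-point FFT so that every variant shares the same frequency axis. The helper names `frame_signal`, `log_spectrogram` and `multi_window_augment` are illustrative, not the authors' code.

```python
import numpy as np


def frame_signal(signal, frame_len, hop_len):
    """Split a 1-D signal into overlapping frames, zero-padding the tail."""
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + int(np.ceil((len(signal) - frame_len) / hop_len))
    total = (n_frames - 1) * hop_len + frame_len
    signal = np.pad(signal, (0, total - len(signal)))
    # Row i holds samples [i * hop_len, i * hop_len + frame_len).
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx]


def log_spectrogram(signal, sr, win_ms, hop_ratio=0.5, n_fft=2048):
    """Log-magnitude spectrogram for one analysis window size (assumed feature)."""
    frame_len = int(sr * win_ms / 1000)
    if frame_len > n_fft:
        raise ValueError("n_fft must be at least the window length in samples")
    hop_len = max(1, int(frame_len * hop_ratio))
    frames = frame_signal(signal, frame_len, hop_len) * np.hanning(frame_len)
    # Zero-pad every frame to n_fft so all window sizes give n_fft // 2 + 1 bins.
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    return np.log(mag + 1e-8)


def multi_window_augment(signal, sr, label, window_sizes_ms=(25, 50, 75, 100)):
    """Return one (features, label) pair per window size; sizes are hypothetical."""
    return [(log_spectrogram(signal, sr, w), label) for w in window_sizes_ms]
```

Each variant keeps the utterance's emotion label, so one recording becomes four training samples. Because longer windows produce fewer frames, a model would still need to pad, crop, or pool along the time axis before batching the variants together.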
