Hybrid Data Augmentation and Deep Attention-based Dilated Convolutional-Recurrent Neural Networks for Speech Emotion Recognition

09/18/2021
by   Nhat Truong Pham, et al.
8

Speech emotion recognition (SER) has been one of the significant tasks in Human-Computer Interaction (HCI) applications. However, it is hard to choose the optimal features and deal with imbalance labeled data. In this article, we investigate hybrid data augmentation (HDA) methods to generate and balance data based on traditional and generative adversarial networks (GAN) methods. To evaluate the effectiveness of HDA methods, a deep learning framework namely (ADCRNN) is designed by integrating deep dilated convolutional-recurrent neural networks with an attention mechanism. Besides, we choose 3D log Mel-spectrogram (MelSpec) features as the inputs for the deep learning framework. Furthermore, we reconfigure a loss function by combining a softmax loss and a center loss to classify the emotions. For validating our proposed methods, we use the EmoDB dataset that consists of several emotions with imbalanced samples. Experimental results prove that the proposed methods achieve better accuracy than the state-of-the-art methods on the EmoDB with 87.12 traditional and GAN-based methods, respectively.

READ FULL TEXT

page 1

page 4

page 7

page 8

page 9

research
10/19/2020

Multi-Window Data Augmentation Approach for Speech Emotion Recognition

We present a novel, Multi-Window Data Augmentation (MWA-SER) approach fo...
research
05/07/2023

Learning Robust Self-attention Features for Speech Emotion Recognition with Label-adaptive Mixup

Speech Emotion Recognition (SER) is to recognize human emotions in a nat...
research
11/02/2017

Data Augmentation in Emotion Classification Using Generative Adversarial Networks

It is a difficult task to classify images with multiple class labels usi...
research
05/18/2020

Augmenting Generative Adversarial Networks for Speech Emotion Recognition

Generative adversarial networks (GANs) have shown potential in learning ...
research
02/15/2018

Speech Emotion Recognition with Data Augmentation and Layer-wise Learning Rate Adjustment

In this work, we design a neural network for recognizing emotions in spe...
research
07/12/2017

A breakthrough in Speech emotion recognition using Deep Retinal Convolution Neural Networks

Speech emotion recognition (SER) is to study the formation and change of...
research
08/09/2022

Generative Data Augmentation Guided by Triplet Loss for Speech Emotion Recognition

Speech Emotion Recognition (SER) is crucial for human-computer interacti...

Please sign up or login with your details

Forgot password? Click here to reset