A Comparative Study of Data Augmentation Techniques for Deep Learning Based Emotion Recognition

11/09/2022
by   Ravi Shankar, et al.
0

Automated emotion recognition in speech is a long-standing problem. While early work on emotion recognition relied on hand-crafted features and simple classifiers, the field has now embraced end-to-end feature learning and classification using deep neural networks. In parallel to these models, researchers have proposed several data augmentation techniques to increase the size and variability of existing labeled datasets. Despite many seminal contributions in the field, we still have a poor understanding of the interplay between the network architecture and the choice of data augmentation. Moreover, only a handful of studies demonstrate the generalizability of a particular model across multiple datasets, which is a prerequisite for robust real-world performance. In this paper, we conduct a comprehensive evaluation of popular deep learning approaches for emotion recognition. To eliminate bias, we fix the model architectures and optimization hyperparameters using the VESUS dataset and then use repeated 5-fold cross validation to evaluate the performance on the IEMOCAP and CREMA-D datasets. Our results demonstrate that long-range dependencies in the speech signal are critical for emotion recognition and that speed/rate augmentation offers the most robust performance gain across models.

READ FULL TEXT
research
10/19/2020

Multi-Window Data Augmentation Approach for Speech Emotion Recognition

We present a novel, Multi-Window Data Augmentation (MWA-SER) approach fo...
research
07/12/2023

Can Large Language Models Aid in Annotating Speech Emotional Data? Uncovering New Frontiers

Despite recent advancements in speech emotion recognition (SER) models, ...
research
08/02/2018

Normalization Before Shaking Toward Learning Symmetrically Distributed Representation Without Margin in Speech Emotion Recognition

Regularization is crucial to the success of many practical deep learning...
research
05/18/2020

Deep Architecture Enhancing Robustness to Noise, Adversarial Attacks, and Cross-corpus Setting for Speech Emotion Recognition

Speech emotion recognition systems (SER) can achieve high accuracy when ...
research
04/12/2019

Multimodal Speech Emotion Recognition and Ambiguity Resolution

Identifying emotion from speech is a non-trivial task pertaining to the ...
research
02/15/2018

Speech Emotion Recognition with Data Augmentation and Layer-wise Learning Rate Adjustment

In this work, we design a neural network for recognizing emotions in spe...
research
04/06/2018

On the Robustness of Speech Emotion Recognition for Human-Robot Interaction with Deep Neural Networks

Speech emotion recognition (SER) is an important aspect of effective hum...

Please sign up or login with your details

Forgot password? Click here to reset