Speech Emotion Recognition with Data Augmentation and Layer-wise Learning Rate Adjustment

02/15/2018
by Caroline Etienne, et al.

In this work, we design a neural network for recognizing emotions in speech, using the standard IEMOCAP dataset. Following the latest advances in audio analysis, we use an architecture that combines convolutional layers, which extract high-level features from raw spectrograms, with recurrent layers, which aggregate long-term dependencies. Applying data augmentation, layer-wise learning rate adjustment, and batch normalization, we obtain highly competitive results, reaching 64.5% weighted accuracy on four emotions. Moreover, we show that model performance is strongly correlated with labeling confidence, which highlights a fundamental difficulty in emotion recognition.
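To make the idea of layer-wise learning rate adjustment concrete, here is a minimal sketch in plain Python. The abstract does not state which rule the authors use, so the rule below is an assumption: each layer's rate is the base rate scaled by the ratio of its weight norm to its gradient norm. The names `layerwise_sgd_step`, `base_lr`, and the toy layers `conv1` and `lstm` are hypothetical and chosen only for illustration.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def layerwise_sgd_step(layers, grads, base_lr=0.01, eps=1e-8):
    """One SGD step in which every layer gets its own learning rate.

    Assumed rule (not taken from the paper):
        local_lr = base_lr * ||w|| / (||g|| + eps)
    so each layer moves by roughly base_lr times its own weight norm,
    regardless of how large or small its gradient is.
    """
    updated = {}
    for name, weights in layers.items():
        g = grads[name]
        local_lr = base_lr * l2_norm(weights) / (l2_norm(g) + eps)
        updated[name] = [w - local_lr * gi for w, gi in zip(weights, g)]
    return updated


# Toy parameters: a layer with a large gradient and one with a tiny gradient.
layers = {"conv1": [0.5, -0.5], "lstm": [2.0, 0.0]}
grads = {"conv1": [10.0, 0.0], "lstm": [0.0, 0.1]}

new_layers = layerwise_sgd_step(layers, grads)
for name, weights in new_layers.items():
    print(name, weights)
```

With a single global rate, `conv1` would take a step 100 times larger than `lstm` in this toy case; under the assumed rule both layers change by about 1% of their weight norm.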

research
10/19/2020

Multi-Window Data Augmentation Approach for Speech Emotion Recognition

We present a novel, Multi-Window Data Augmentation (MWA-SER) approach fo...
research
11/09/2022

A Comparative Study of Data Augmentation Techniques for Deep Learning Based Emotion Recognition

Automated emotion recognition in speech is a long-standing problem. Whil...
research
11/14/2022

Describing emotions with acoustic property prompts for speech emotion recognition

Emotions lie on a broad continuum and treating emotions as a discrete nu...
research
09/18/2021

Hybrid Data Augmentation and Deep Attention-based Dilated Convolutional-Recurrent Neural Networks for Speech Emotion Recognition

Speech emotion recognition (SER) has been one of the significant tasks i...
research
10/26/2022

Pretrained audio neural networks for Speech emotion recognition in Portuguese

The goal of speech emotion recognition (SER) is to identify the emotiona...
research
12/10/2021

An Ensemble 1D-CNN-LSTM-GRU Model with Data Augmentation for Speech Emotion Recognition

In this paper, we propose an ensemble of deep neural networks along with...
research
07/12/2017

A breakthrough in Speech emotion recognition using Deep Retinal Convolution Neural Networks

Speech emotion recognition (SER) is to study the formation and change of...
