Speech Emotion Recognition with Data Augmentation and Layer-wise Learning Rate Adjustment

02/15/2018

In this work, we design a neural network for recognizing emotions in speech, using the standard IEMOCAP dataset. Following the latest advances in audio analysis, we use an architecture that combines convolutional layers, which extract high-level features from raw spectrograms, with recurrent layers, which aggregate long-term dependencies. By applying data augmentation, layer-wise learning rate adjustment and batch normalization, we obtain highly competitive results, with 64.5 on four emotions. Moreover, we show that the model performance is strongly correlated with the labeling confidence, which highlights a fundamental difficulty in emotion recognition.
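
The abstract names layer-wise learning rate adjustment without giving the rule. As a rough illustration only, the sketch below gives each layer its own step size, scaled by the ratio of its weight norm to its gradient norm (a LARS-style rule). That rule, the function names, the layer names `conv` and `lstm`, and the `base_lr` and `trust` values are all assumptions made for this example, not details taken from the paper.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def layerwise_sgd_step(params, grads, base_lr=0.1, trust=0.01, eps=1e-12):
    """One SGD step with a separate learning rate per layer.

    params, grads: dicts mapping a layer name to a flat list of floats.
    Each layer's rate is base_lr * trust * ||w|| / ||g||, a LARS-style
    rule assumed here for illustration (not necessarily the paper's rule).
    """
    new_params, rates = {}, {}
    for name, w in params.items():
        g = grads[name]
        w_norm, g_norm = l2_norm(w), l2_norm(g)
        if w_norm > 0 and g_norm > 0:
            lr = base_lr * trust * w_norm / (g_norm + eps)
        else:
            # Fall back to the base rate when either norm is zero.
            lr = base_lr
        rates[name] = lr
        new_params[name] = [wi - lr * gi for wi, gi in zip(w, g)]
    return new_params, rates


# Hypothetical two-layer example: same gradient, different weight scales.
params = {"conv": [3.0, 4.0], "lstm": [0.3, 0.4]}
grads = {"conv": [0.0, 1.0], "lstm": [0.0, 1.0]}
new_params, rates = layerwise_sgd_step(params, grads)
```

With identical gradients, the layer with the larger weights takes the larger step, so layers whose weights and gradients sit at different scales are updated at comparable relative speed.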
