Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation

08/05/2021
by Sarala Padi, et al.

Automatic speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction. One of the main challenges in SER is data scarcity, i.e., insufficient amounts of carefully labeled data to build and fully explore complex deep learning models for emotion classification. This paper aims to address this challenge using a transfer learning strategy combined with spectrogram augmentation. Specifically, we propose a transfer learning approach that leverages a residual network (ResNet) model, including a statistics pooling layer, pre-trained for speaker recognition on large amounts of speaker-labeled data. The statistics pooling layer enables the model to efficiently process variable-length input, thereby eliminating the need for sequence truncation, which is commonly used in SER systems. In addition, we adopt a spectrogram augmentation technique that generates additional training samples by applying random time-frequency masks to log-mel spectrograms, in order to mitigate overfitting and improve the generalization of emotion recognition models. We evaluate the effectiveness of our proposed approach on the interactive emotional dyadic motion capture (IEMOCAP) dataset. Experimental results indicate that transfer learning and spectrogram augmentation each improve SER performance and, when combined, achieve state-of-the-art results.
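The two mechanisms named in the abstract, random time-frequency masking and statistics pooling, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the mask-width limits, the choice of one mask per axis, and filling masked regions with the spectrogram mean are all assumptions made for illustration, and statistics pooling is shown in its common form (mean and standard deviation over time).

```python
import numpy as np


def mask_spectrogram(log_mel, max_freq_width=8, max_time_width=20, rng=None):
    """Return a copy of a (n_mels, n_frames) log-mel spectrogram with one
    random frequency band and one random time span masked out.

    Assumption: masked cells are filled with the spectrogram mean; the
    mask widths are hypothetical defaults, not values from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = log_mel.copy()
    n_mels, n_frames = out.shape
    fill = log_mel.mean()

    # Frequency mask: a band of f consecutive mel bins starting at f0.
    f = int(rng.integers(0, min(max_freq_width, n_mels) + 1))
    f0 = int(rng.integers(0, n_mels - f + 1))
    out[f0:f0 + f, :] = fill

    # Time mask: a span of t consecutive frames starting at t0.
    t = int(rng.integers(0, min(max_time_width, n_frames) + 1))
    t0 = int(rng.integers(0, n_frames - t + 1))
    out[:, t0:t0 + t] = fill
    return out


def statistics_pooling(frame_features):
    """Map (feat_dim, n_frames) frame-level features to a fixed-size
    (2 * feat_dim,) vector of per-dimension mean and standard deviation."""
    mean = frame_features.mean(axis=1)
    std = frame_features.std(axis=1)
    return np.concatenate([mean, std])
```

Because the pooled vector has length `2 * feat_dim` regardless of `n_frames`, utterances of different durations produce same-sized inputs for the classifier, which is why truncation to a fixed length is not required.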


research
02/16/2022

Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models

Automatic emotion recognition plays a key role in computer-human interac...
research
10/27/2020

CopyPaste: An Augmentation Method for Speech Emotion Recognition

Data augmentation is a widely used strategy for training robust machine ...
research
11/11/2018

Improving speech emotion recognition via Transformer-based Predictive Coding through transfer learning

Speech emotion recognition is an important aspect of human-computer inte...
research
11/07/2022

Hi,KIA: A Speech Emotion Recognition Dataset for Wake-Up Words

Wake-up words (WUW) is a short sentence used to activate a speech recogn...
research
09/15/2023

Foundation Model Assisted Automatic Speech Emotion Recognition: Transcribing, Annotating, and Augmenting

Significant advances are being made in speech emotion recognition (SER) ...
research
03/24/2020

Joint Deep Cross-Domain Transfer Learning for Emotion Recognition

Deep learning has been applied to achieve significant progress in emotio...
research
02/06/2019

Transfer Learning From Sound Representations For Anger Detection in Speech

In this work, we train fully convolutional networks to detect anger in s...