Improving speech emotion recognition via Transformer-based Predictive Coding through transfer learning

by Zheng Lian, et al.

Speech emotion recognition is an important aspect of human-computer interaction. Prior works propose various transfer learning approaches to deal with the limited samples available for speech emotion recognition. However, these approaches require labeled data for the source task, which takes considerable effort to collect. To solve this problem, we focus on an unsupervised task, predictive coding, for which nearly unlimited data can be utilized in most domains. In this paper, we use a multi-layer Transformer model for predictive coding, followed by transfer learning approaches that share the knowledge of the pre-trained predictive model with speech emotion recognition. We conduct experiments on IEMOCAP, and the experimental results reveal the advantages of the proposed method. Our method reaches 65.03 currently advanced approaches.


Unsupervised Representation Learning with Future Observation Prediction for Speech Emotion Recognition

Prior works on speech emotion recognition utilize various unsupervised l...

Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings

Emotion recognition datasets are relatively small, making the use of the...

Acted vs. Improvised: Domain Adaptation for Elicitation Approaches in Audio-Visual Emotion Recognition

Key challenges in developing generalized automatic emotion recognition s...

Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation

Automatic speech emotion recognition (SER) is a challenging task that pl...

Speech Emotion Recognition via Contrastive Loss under Siamese Networks

Speech emotion recognition is an important aspect of human-computer inte...

Knowledge Transfer For On-Device Speech Emotion Recognition with Neural Structured Learning

Speech emotion recognition (SER) has been a popular research topic in hu...

DeepEMO: Deep Learning for Speech Emotion Recognition

We proposed the industry level deep learning approach for speech emotion...