Towards adversarial learning of speaker-invariant representation for speech emotion recognition

03/22/2019
by Ming Tu, et al.

Speech emotion recognition (SER) has attracted great attention in recent years due to the high demand for emotionally intelligent speech interfaces. Deriving speaker-invariant representations for SER is crucial. In this paper, we propose to apply adversarial training to SER to learn speaker-invariant representations. Our model consists of three parts: a representation learning sub-network built from a time-delay neural network (TDNN) and an LSTM with statistical pooling, an emotion classification network, and a speaker classification network. Both the emotion and speaker classification networks take the output of the representation learning network as input. Two training strategies are employed: one based on domain adversarial training (DAT) and the other based on cross-gradient training (CGT). Besides the conventional IEMOCAP data set, we also evaluate our proposed models on a much larger publicly available emotion data set with 250 speakers. Evaluation results show that on IEMOCAP, DAT and CGT provide 5.6 […] system without speaker-invariant representation learning on 5-fold cross validation. On the larger emotion data set, while CGT fails to yield better results than the baseline, DAT can still provide 9.8 […] standalone test set.
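To make the DAT setup concrete, here is a minimal sketch in pure Python. It is not the authors' implementation and rests on several stated assumptions: "statistical pooling" is taken to mean concatenating the per-dimension mean and standard deviation over time (as in x-vector systems); the TDNN+LSTM encoder is replaced by a toy linear map; both heads are binary logistic regressions (the real speaker head would classify among many speakers); and DAT is realised with a gradient reversal layer, the common way DAT is implemented. The names `statistics_pooling`, `GradientReversal`, `dat_step`, `lam` and `lr` are hypothetical. CGT is not illustrated.

```python
import math


def statistics_pooling(frames):
    """Map a variable-length sequence of D-dim frames to a fixed 2*D vector.

    Assumed pooling: per-dimension mean followed by per-dimension
    (population) standard deviation over the time axis.
    """
    T, D = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / T for d in range(D)]
    std = [math.sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / T)
           for d in range(D)]
    return mean + std


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class GradientReversal:
    """Identity on the forward pass; scales gradients by -lam on the backward pass."""

    def __init__(self, lam):
        self.lam = lam

    def forward(self, h):
        return h

    def backward(self, grad):
        return [-self.lam * g for g in grad]


def dat_step(W, v_emo, v_spk, x, y_emo, y_spk, lam=0.5, lr=0.1):
    """One SGD step of domain adversarial training on a toy linear model.

    W      -- shared encoder weights (list of rows); h = W x
    v_emo  -- logistic-regression emotion head (binary label y_emo)
    v_spk  -- logistic-regression speaker head (binary label y_spk)
    Returns updated (W, v_emo, v_spk).
    """
    # Shared representation (stand-in for the TDNN+LSTM output).
    h = [sum(w * xj for w, xj in zip(row, x)) for row in W]

    # Emotion head: for sigmoid + binary cross-entropy, dL/dlogit = p - y.
    d_emo = sigmoid(sum(a * b for a, b in zip(v_emo, h))) - y_emo

    # Speaker head receives h through the gradient reversal layer.
    grl = GradientReversal(lam)
    h_spk = grl.forward(h)
    d_spk = sigmoid(sum(a * b for a, b in zip(v_spk, h_spk))) - y_spk

    # Each head descends its own loss (no reversal for head weights).
    g_v_emo = [d_emo * hi for hi in h]
    g_v_spk = [d_spk * hi for hi in h_spk]

    # Gradient reaching h: normal emotion path + reversed speaker path.
    g_h_emo = [d_emo * a for a in v_emo]
    g_h_spk = grl.backward([d_spk * a for a in v_spk])
    g_h = [a + b for a, b in zip(g_h_emo, g_h_spk)]

    # Encoder gradient: dL/dW[i][j] = g_h[i] * x[j].
    new_W = [[w - lr * g_h[i] * xj for w, xj in zip(row, x)]
             for i, row in enumerate(W)]
    new_v_emo = [v - lr * g for v, g in zip(v_emo, g_v_emo)]
    new_v_spk = [v - lr * g for v, g in zip(v_spk, g_v_spk)]
    return new_W, new_v_emo, new_v_spk


if __name__ == "__main__":
    print(statistics_pooling([[1.0, 2.0], [3.0, 2.0]]))  # [2.0, 2.0, 1.0, 0.0]
    W, v_emo, v_spk = dat_step([[0.1, 0.2]], [0.3], [1.0], [1.0, 1.0], y_emo=1, y_spk=1)
```

The key design point is where the sign flip happens: the speaker head still minimises its own loss, but the shared encoder receives the negated (scaled by `lam`) speaker gradient. The encoder is therefore pushed toward representations from which emotion remains predictable while the speaker becomes harder to predict, which is the sense in which the learned representation is speaker-invariant.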


