Speaker-invariant Affective Representation Learning via Adversarial Training

11/04/2019
by Haoqi Li, et al.

Representation learning for speech emotion recognition is challenging due to the sparsity of labeled data and the lack of gold-standard references. In addition, there is considerable variability arising from the input speech signals, human subjective perception of those signals, and emotion label ambiguity. In this paper, we propose a machine learning framework that obtains speech emotion representations by limiting the effect of speaker variability in the speech signals. Specifically, we propose to disentangle speaker characteristics from emotion through an adversarial training network in order to better represent emotion. Our method combines the gradient reversal technique with an entropy loss function to remove such speaker information. We evaluate our approach on both the IEMOCAP and CMU-MOSEI datasets and show that it improves speech emotion classification and generalizes better to unseen speakers.
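
The two ingredients named in the abstract, gradient reversal and an entropy loss, can be illustrated with a minimal, framework-free sketch. This is not the authors' implementation: the class name `GradientReversal`, the scaling factor `lam`, and the helper functions are hypothetical, and using the entropy of a speaker classifier's posterior as the quantity of interest is an assumption about how such a loss might be applied. The sketch shows only the general idea: the layer is the identity in the forward pass and flips (and scales) the gradient in the backward pass, and the entropy is highest when the speaker posterior is uniform, that is, when the features reveal nothing about the speaker.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


class GradientReversal:
    """Identity in the forward pass; multiplies gradients by -lam in the backward pass.

    `lam` is a hypothetical trade-off weight, not a value taken from the paper.
    """

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, features):
        # Features pass through unchanged to the (assumed) speaker classifier.
        return list(features)

    def backward(self, grad_output):
        # The gradient flowing back to the encoder is sign-flipped and scaled,
        # so the encoder is pushed to make speaker prediction harder.
        return [-self.lam * g for g in grad_output]


grl = GradientReversal(lam=0.5)
features = [0.2, -1.0, 3.0]
passed = grl.forward(features)
reversed_grad = grl.backward([1.0, -2.0, 0.0])

# Entropy of an (assumed) 4-speaker posterior: uniform vs. confident.
uniform_entropy = entropy(softmax([0.0, 0.0, 0.0, 0.0]))
peaked_entropy = entropy(softmax([10.0, 0.0, 0.0, 0.0]))
```

In an automatic differentiation framework the same layer would typically be written as a custom function with user-defined forward and backward passes; the manual version above is only meant to make the sign flip explicit.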

research
03/22/2019

Towards adversarial learning of speaker-invariant representation for speech emotion recognition

Speech emotion recognition (SER) has attracted great attention in recent...
research
02/02/2022

Speaker Normalization for Self-supervised Speech Emotion Recognition

Large speech emotion recognition datasets are hard to obtain, and small ...
research
10/25/2019

Learning Domain Invariant Representations for Child-Adult Classification from Speech

Diagnostic procedures for ASD (autism spectrum disorder) involve semi-na...
research
04/05/2021

Acted vs. Improvised: Domain Adaptation for Elicitation Approaches in Audio-Visual Emotion Recognition

Key challenges in developing generalized automatic emotion recognition s...
research
01/02/2020

Deep Representation Learning in Speech Processing: Challenges, Recent Advances, and Future Trends

Research on speech processing has traditionally considered the task of d...
research
06/05/2021

An Attribute-Aligned Strategy for Learning Speech Representation

Advancement in speech technology has brought convenience to our life. Ho...
research
10/26/2022

Effect of different splitting criteria on the performance of speech emotion recognition

Traditional speech emotion recognition (SER) evaluations have been perfo...
