Generative Data Augmentation Guided by Triplet Loss for Speech Emotion Recognition

08/09/2022
by Shijun Wang, et al.

Speech Emotion Recognition (SER) is crucial for human-computer interaction but remains a challenging problem because of two major obstacles: data scarcity and class imbalance. Many SER datasets are substantially imbalanced, with utterances of one class (most often Neutral) far more frequent than those of the other classes. Furthermore, only a few data resources are available for many spoken languages. To address these problems, we exploit a GAN-based augmentation model guided by a triplet network to improve SER performance when training data are imbalanced and insufficient. We conduct experiments and demonstrate that: 1) on a highly imbalanced dataset, our augmentation strategy significantly improves SER performance (+8 score compared with the baseline); and 2) in a cross-lingual benchmark, where we train a model with ample source-language utterances but very few target-language utterances (around 50 in our experiments), our augmentation strategy improves SER performance for all three target languages.
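The abstract does not spell out how the triplet network scores samples, so the following is only a minimal sketch of the standard triplet margin loss, which is the usual objective behind such a network. The function names, the Euclidean distance, and the default margin of 1.0 are assumptions for illustration, not details taken from the paper. The idea is that an anchor embedding should sit closer to a positive (same emotion class) than to a negative (different class) by at least a margin.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,  # assumed value; the abstract gives no margin
) -> float:
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings: the positive coincides with the anchor,
# and the negative is far away, so the constraint is already satisfied.
satisfied = triplet_loss([0.0, 0.0], [0.0, 0.0], [3.0, 4.0])

# Swapping the roles violates the constraint and yields a positive loss.
violated = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 0.0])
```

In a setup like the one the abstract describes, a loss of this kind could be used to push generated utterances toward the embedding region of their intended emotion class; how the authors actually combine it with the GAN objective is not stated in the abstract.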


research
12/15/2018

Cross Lingual Speech Emotion Recognition: Urdu vs. Western Languages

Cross-lingual speech emotion recognition is an important task for practi...
research
02/21/2023

Advancing Stuttering Detection via Data Augmentation, Class-Balanced Loss and Multi-Contextual Deep Learning

Stuttering is a neuro-developmental speech impairment characterized by u...
research
07/14/2022

Semi-supervised cross-lingual speech emotion recognition

Speech emotion recognition (SER) on a single language has achieved remar...
research
06/09/2023

Learning Emotional Representations from Imbalanced Speech Data for Speech Emotion Recognition and Emotional Text-to-Speech

Effective speech emotional representations play a key role in Speech Emo...
research
07/13/2019

Unsupervised Adversarial Domain Adaptation for Cross-Lingual Speech Emotion Recognition

Cross-lingual speech emotion recognition (SER) is a crucial task for man...
research
09/18/2021

Hybrid Data Augmentation and Deep Attention-based Dilated Convolutional-Recurrent Neural Networks for Speech Emotion Recognition

Speech emotion recognition (SER) has been one of the significant tasks i...
research
08/10/2022

Data Augmentation for Improving Emotion Recognition in Software Engineering Communication

Emotions (e.g., Joy, Anger) are prevalent in daily software engineering ...
