Learning Emotional Representations from Imbalanced Speech Data for Speech Emotion Recognition and Emotional Text-to-Speech

06/09/2023
by Shijun Wang, et al.

Effective emotional speech representations play a key role in Speech Emotion Recognition (SER) and Emotional Text-to-Speech (TTS) tasks. However, emotional speech samples are more difficult and expensive to acquire than Neutral-style speech, which leads to an issue that most related work neglects: imbalanced datasets. Models may overfit to the majority Neutral class and fail to produce robust and effective emotional representations. In this paper, we propose an Emotion Extractor to address this issue. We use augmentation approaches to train the model, enabling it to extract effective and generalizable emotional representations from imbalanced datasets. Our empirical results show that (1) on the SER task, the proposed Emotion Extractor surpasses the state-of-the-art baseline on three imbalanced datasets; and (2) the representations produced by our Emotion Extractor benefit the TTS model, enabling it to synthesize more expressive speech.


