Generative Emotional AI for Speech Emotion Recognition: The Case for Synthetic Emotional Speech Augmentation

01/10/2023
by   Abdullah Shahid, et al.
0

Despite advances in deep learning, current state-of-the-art speech emotion recognition (SER) systems still have poor performance due to a lack of speech emotion datasets. This paper proposes augmenting SER systems with synthetic emotional speech generated by an end-to-end text-to-speech (TTS) system based on an extended Tacotron architecture. The proposed TTS system includes encoders for speaker and emotion embeddings, a sequence-to-sequence text generator for creating Mel-spectrograms, and a WaveRNN to generate audio from the Mel-spectrograms. Extensive experiments show that the quality of the generated emotional speech can significantly improve SER performance on multiple datasets, as demonstrated by a higher mean opinion score (MOS) compared to the baseline. The generated samples were also effective at augmenting SER performance.

READ FULL TEXT

page 6

page 7

page 8

research
06/09/2023

Learning Emotional Representations from Imbalanced Speech Data for Speech Emotion Recognition and Emotional Text-to-Speech

Effective speech emotional representations play a key role in Speech Emo...
research
09/19/2023

Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition

In this paper, we explored how to boost speech emotion recognition (SER)...
research
09/15/2023

Foundation Model Assisted Automatic Speech Emotion Recognition: Transcribing, Annotating, and Augmenting

Significant advances are being made in speech emotion recognition (SER) ...
research
05/19/2023

A Preliminary Study on Augmenting Speech Emotion Recognition using a Diffusion Model

In this paper, we propose to utilise diffusion models for data augmentat...
research
07/12/2023

Can Large Language Models Aid in Annotating Speech Emotional Data? Uncovering New Frontiers

Despite recent advancements in speech emotion recognition (SER) models, ...
research
11/15/2017

Emotional End-to-End Neural Speech Synthesizer

In this paper, we introduce an emotional speech synthesizer based on the...
research
04/03/2021

Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability

Emotional text-to-speech synthesis (ETTS) has seen much progress in rece...

Please sign up or login with your details

Forgot password? Click here to reset