Analysis of Speech Separation Performance Degradation on Emotional Speech Mixtures

09/14/2023
by Jia Qi Yip, et al.

Despite recent strides in speech separation, most models are trained on datasets of emotionally neutral speech. Emotional speech is known to degrade model performance across a variety of speech tasks, which reduces the effectiveness of these models when they are deployed in real-world scenarios. In this paper, we analyze this performance degradation to distinguish the effect of emotion in speech from the effect of out-of-domain inference. We measure this using a carefully designed test dataset, Emo2Mix, which contains balanced data across all combinations of emotions. We show that even models with strong out-of-domain performance, such as Sepformer, can still suffer a significant degradation of up to 5.1 dB in SI-SDR improvement (SI-SDRi) on mixtures with strong emotions. This demonstrates the importance of accounting for emotion in real-world speech separation applications.
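The degradation above is reported in SI-SDRi: the scale-invariant signal-to-distortion ratio (SI-SDR) of a separated estimate minus the SI-SDR of the unprocessed mixture, both measured against the clean reference source. The sketch below implements the commonly used SI-SDR definition for single-channel, equal-length NumPy arrays. It is an illustration only, not the authors' evaluation code; the function names `si_sdr` and `si_sdr_improvement` and the `eps` stabilizer are assumptions introduced here.

```python
import numpy as np


def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SDR in dB between a 1-D estimate and its reference."""
    # Remove the mean of both signals so that DC offsets do not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference; the optimal scale alpha
    # makes the metric invariant to the estimate's overall gain.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    # Everything not explained by the scaled reference counts as distortion.
    distortion = estimate - target
    return float(10 * np.log10((np.dot(target, target) + eps)
                               / (np.dot(distortion, distortion) + eps)))


def si_sdr_improvement(estimate: np.ndarray, mixture: np.ndarray,
                       reference: np.ndarray) -> float:
    """SI-SDRi: gain of the separated estimate over the raw mixture, in dB."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


if __name__ == "__main__":
    # Two synthetic, independent "speakers" (1 s at 16 kHz); purely illustrative.
    rng = np.random.default_rng(0)
    s1 = rng.standard_normal(16000)
    s2 = rng.standard_normal(16000)
    mixture = s1 + s2
    # A hypothetical separator output that leaves 10% of the interferer behind.
    estimate = s1 + 0.1 * s2
    print(f"SI-SDR(mixture)  = {si_sdr(mixture, s1):.2f} dB")
    print(f"SI-SDR(estimate) = {si_sdr(estimate, s1):.2f} dB")
    print(f"SI-SDRi          = {si_sdr_improvement(estimate, mixture, s1):.2f} dB")
```

In this toy setup the mixture scores near 0 dB against `s1` (equal-energy interferer) and the estimate scores close to 20 dB (interferer attenuated by a factor of 10 in amplitude), so the SI-SDRi is roughly 20 dB. Because of the projection step, rescaling the estimate leaves the score unchanged, which is why SI-SDR is preferred over plain SNR for comparing separation models.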



