Speech Emotion Diarization: Which Emotion Appears When?

06/22/2023
by Yingzhi Wang, et al.

Speech Emotion Recognition (SER) typically relies on utterance-level solutions. However, emotions conveyed through speech should be considered discrete speech events with definite temporal boundaries, rather than attributes of the entire utterance. To reflect the fine-grained nature of speech emotions, we propose a new task: Speech Emotion Diarization (SED). Just as Speaker Diarization answers the question "Who speaks when?", Speech Emotion Diarization answers the question "Which emotion appears when?". To facilitate performance evaluation and establish a common benchmark for researchers, we introduce the Zaion Emotion Dataset (ZED), an openly accessible speech emotion dataset that contains non-acted emotions recorded in real-life conditions, along with manually annotated boundaries of the emotion segments within each utterance. We provide competitive baselines and open-source the code and the pre-trained models.
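To make the "Which emotion appears when?" framing concrete, the output of an SED system can be pictured as a list of timed emotion segments rather than a single utterance label. One simple way to obtain such segments is to run a frame-level classifier and merge consecutive frames that share a label. The sketch below is a hypothetical illustration of that post-processing step only, not the authors' released baseline: the `EmotionSegment` type, the `frames_to_segments` function, and the 20 ms frame shift (the output stride of wav2vec 2.0-style encoders) are all assumptions.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Sequence


@dataclass(frozen=True)
class EmotionSegment:
    """One contiguous emotion event spanning [start, end) seconds (hypothetical type)."""
    start: float
    end: float
    label: str


def frames_to_segments(
    frame_labels: Sequence[str], frame_shift: float = 0.02
) -> List[EmotionSegment]:
    """Merge runs of identical frame labels into timed segments.

    frame_shift is an assumed hop between frames, in seconds; 0.02 s matches
    the stride of wav2vec 2.0-style encoders but is only an illustration.
    """
    segments: List[EmotionSegment] = []
    index = 0  # index of the first frame in the current run
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        start = round(index * frame_shift, 6)  # round away float noise
        end = round((index + length) * frame_shift, 6)
        segments.append(EmotionSegment(start, end, label))
        index += length
    return segments


# Example: 1 s neutral, 0.6 s happy, 0.4 s neutral at 20 ms per frame.
labels = ["neutral"] * 50 + ["happy"] * 30 + ["neutral"] * 20
for seg in frames_to_segments(labels):
    print(f"{seg.start:.2f}-{seg.end:.2f}s  {seg.label}")
```

Merging runs with `itertools.groupby` keeps the step linear in the number of frames; a real system would typically also smooth the frame predictions first so that isolated misclassified frames do not produce spurious short segments.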
