EigenEmo: Spectral Utterance Representation Using Dynamic Mode Decomposition for Speech Emotion Classification

08/15/2020
by   Shuiyang Mao, et al.
0

Human emotional speech is, by its very nature, a variant signal. This results in dynamics intrinsic to automatic emotion classification based on speech. In this work, we explore a spectral decomposition method stemming from fluid-dynamics, known as Dynamic Mode Decomposition (DMD), to computationally represent and analyze the global utterance-level dynamics of emotional speech. Specifically, segment-level emotion-specific representations are first learned through an Emotion Distillation process. This forms a multi-dimensional signal of emotion flow for each utterance, called Emotion Profiles (EPs). The DMD algorithm is then applied to the resultant EPs to capture the eigenfrequencies, and hence the fundamental transition dynamics of the emotion flow. Evaluation experiments using the proposed approach, which we call EigenEmo, show promising results. Moreover, due to the positive combination of their complementary properties, concatenating the utterance representations generated by EigenEmo with simple EPs averaging yields noticeable gains.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/12/2020

Emotion Profile Refinery for Speech Emotion Classification

Human emotions are inherently ambiguous and impure. When designing syste...
research
06/02/2023

In-the-wild Speech Emotion Conversion Using Disentangled Self-Supervised Representations and Neural Vocoder-based Resynthesis

Speech emotion conversion aims to convert the expressed emotion of a spo...
research
08/15/2020

Advancing Multiple Instance Learning with Attention Modeling for Categorical Speech Emotion Recognition

Categorical speech emotion recognition is typically performed as a seque...
research
03/30/2021

Enhancing Segment-Based Speech Emotion Recognition by Deep Self-Learning

Despite the widespread utilization of deep neural networks (DNNs) for sp...
research
11/15/2015

Learning Representations of Affect from Speech

There has been a lot of prior work on representation learning for speech...
research
03/14/2020

Perception of prosodic variation for speech synthesis using an unsupervised discrete representation of F0

In English, prosody adds a broad range of information to segment sequenc...
research
03/01/2021

Emotion Dynamics in Movie Dialogues

Emotion dynamics is a framework for measuring how an individual's emotio...

Please sign up or login with your details

Forgot password? Click here to reset