Advancing Multiple Instance Learning with Attention Modeling for Categorical Speech Emotion Recognition

08/15/2020
by Shuiyang Mao, et al.

Categorical speech emotion recognition is typically performed as a sequence-to-label problem, i.e., determining the discrete emotion label of the input utterance as a whole. One of the main challenges in practice is that most existing emotion corpora do not provide ground-truth labels for each segment; only utterance-level labels are available. To extract segment-level emotional information from such weakly labeled corpora, we propose using multiple instance learning (MIL) to learn segment embeddings in a weakly supervised manner. Moreover, in a sufficiently long utterance, not all segments contain relevant emotional information. We therefore apply three attention-based neural network models to the learned segment embeddings to attend to the most salient parts of a speech utterance. Experiments on the CASIA corpus and the IEMOCAP database show results that are better than or highly competitive with other state-of-the-art approaches.
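To make the idea of attending over segment embeddings concrete, here is a minimal sketch of generic attention pooling in NumPy. It is not one of the three attention models from the paper, whose details are not given in the abstract. The function name `attention_pool`, the scoring vector `w`, and the toy dimensions are all assumptions made for illustration, and the random inputs stand in for embeddings that a trained MIL model would produce.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()

def attention_pool(segment_embeddings, w):
    """Pool segment embeddings of shape (T, D) into one utterance vector (D,).

    `w` is a hypothetical scoring vector of shape (D,); in a real model it
    would be learned jointly with the utterance-level classifier.
    """
    scores = segment_embeddings @ w            # one relevance score per segment, shape (T,)
    weights = softmax(scores)                  # non-negative attention weights summing to 1
    utterance = weights @ segment_embeddings   # weighted average of the segments, shape (D,)
    return utterance, weights

# Toy data: 5 segments with 4-dimensional embeddings (random, not real features).
rng = np.random.default_rng(0)
segments = rng.normal(size=(5, 4))
w = rng.normal(size=4)                         # stand-in for a trained parameter

utterance, weights = attention_pool(segments, w)
```

Segments with higher scores contribute more to the utterance vector, which is how an attention layer can down-weight segments that carry little emotional information. With an all-zero `w`, every segment gets the same weight and the pooling reduces to a plain average.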

research
03/30/2021

Enhancing Segment-Based Speech Emotion Recognition by Deep Self-Learning

Despite the widespread utilization of deep neural networks (DNNs) for sp...
research
08/12/2020

Emotion Profile Refinery for Speech Emotion Classification

Human emotions are inherently ambiguous and impure. When designing syste...
research
11/24/2021

How Speech is Recognized to Be Emotional - A Study Based on Information Decomposition

The way that humans encode their emotion into speech signals is complex....
research
06/02/2023

Learning Local to Global Feature Aggregation for Speech Emotion Recognition

Transformer has emerged in speech emotion recognition (SER) at present. ...
research
09/30/2022

End-to-End Label Uncertainty Modeling in Speech Emotion Recognition using Bayesian Neural Networks and Label Distribution Learning

To train machine learning algorithms to predict emotional expressions in...
research
06/30/2023

Empirical Interpretation of the Relationship Between Speech Acoustic Context and Emotion Recognition

Speech emotion recognition (SER) is vital for obtaining emotional intell...
research
08/15/2020

EigenEmo: Spectral Utterance Representation Using Dynamic Mode Decomposition for Speech Emotion Classification

Human emotional speech is, by its very nature, a variant signal. This re...
