Two-stage dimensional emotion recognition by fusing predictions of acoustic and text networks using SVM

10/26/2022
by   Bagus Tris Atmaja, et al.

Automatic speech emotion recognition (SER) by a computer is a critical component of more natural human-machine interaction. As in human-human interaction, the ability to perceive emotion correctly is essential for deciding how to act in a particular situation. One open issue in SER is whether acoustic features need to be combined with other data such as facial expressions, text, and motion capture. This research proposes combining acoustic and text information through a two-step late-fusion approach. First, separate deep learning systems are trained on acoustic and text features. Second, the predictions of these deep learning systems are fed into a support vector machine (SVM) that predicts the final regression score. The task addressed is dimensional emotion modeling, because it enables a deeper analysis of affective states. Experimental results show that this two-stage, late-fusion approach achieves higher performance than any one-stage processing, with a linear correlation between one-stage and two-stage processing. The late-fusion approach also improves on previous early-fusion results, as measured by concordance correlation coefficient (CCC) scores.
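Because the results are reported as concordance correlation coefficients, a short sketch of that metric may help. The code below uses the standard definition, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), with population (divide-by-n) statistics. The function name `ccc` and the sample values are illustrative assumptions, not taken from the paper's implementation.

```python
from statistics import fmean


def ccc(y_true, y_pred):
    """Concordance correlation coefficient of two equal-length sequences.

    Hypothetical helper for illustration; uses population (divide-by-n)
    variance and covariance.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    mean_t = fmean(y_true)
    mean_p = fmean(y_pred)

    # Deviations from each mean, reused for variance and covariance.
    dev_t = [t - mean_t for t in y_true]
    dev_p = [p - mean_p for p in y_pred]

    var_t = fmean([d * d for d in dev_t])
    var_p = fmean([d * d for d in dev_p])
    cov = fmean([a * b for a, b in zip(dev_t, dev_p)])

    denom = var_t + var_p + (mean_t - mean_p) ** 2
    if denom == 0:
        # Both sequences are constant and equal; CCC is undefined here.
        raise ValueError("CCC is undefined for identical constant inputs")
    return 2 * cov / denom


if __name__ == "__main__":
    labels = [1.0, 2.0, 3.0]          # made-up example scores
    shifted = [2.0, 3.0, 4.0]         # same trend, offset by one
    print(ccc(labels, labels))        # perfect agreement
    print(ccc(labels, shifted))       # penalized for the offset
```

Unlike the Pearson correlation, which would be 1 for both pairs above, CCC also penalizes differences in mean and scale, so the shifted predictions score 4/7 rather than 1.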


