Detecting Emotion Carriers by Combining Acoustic and Lexical Representations

12/13/2021
by Sebastian P. Bayerl, et al.

Personal narratives (PN), spoken or written, are recollections of facts, people, events, and thoughts from one's own experience. Emotion recognition and sentiment analysis tasks are usually defined at the utterance or document level. In this work, however, we focus on Emotion Carriers (EC), defined as the segments (speech or text) that best explain the emotional state of the narrator (e.g., "loss of father", "made me choose"). Once extracted, ECs can provide a richer representation of the user state to improve natural language understanding and dialogue modeling. Previous work has shown that ECs can be identified using lexical features. Spoken narratives, however, should provide a richer description of the context and of the user's emotional state. In this paper, we leverage word-based acoustic and textual embeddings, as well as early and late fusion techniques, to detect ECs in spoken narratives. For the acoustic word-level representations, we use Residual Neural Networks (ResNet) pretrained on separate speech emotion corpora and fine-tuned to detect ECs. Experiments with different fusion and system combination strategies show that late fusion leads to significant improvements for this task.
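
The abstract does not spell out how the two fusion strategies are implemented, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that early fusion means concatenating the acoustic and lexical embedding of a word before classification, and that late fusion means combining the per-word EC probabilities produced by two separate single-modality models. The function names, the weighted average, and the default `weight` and `threshold` values are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def early_fusion(acoustic: Sequence[float], lexical: Sequence[float]) -> List[float]:
    """Early fusion (assumed form): concatenate both word-level embeddings
    into a single feature vector that one classifier would consume."""
    return list(acoustic) + list(lexical)


def late_fusion(p_acoustic: float, p_lexical: float, weight: float = 0.5) -> float:
    """Late fusion (assumed form): weighted average of the EC probabilities
    that two independent models assign to the same word."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * p_acoustic + (1.0 - weight) * p_lexical


def tag_words(
    p_acoustic: Sequence[float],
    p_lexical: Sequence[float],
    weight: float = 0.5,
    threshold: float = 0.5,
) -> List[bool]:
    """Mark each word as part of an emotion carrier when the fused
    probability reaches the (hypothetical) decision threshold."""
    if len(p_acoustic) != len(p_lexical):
        raise ValueError("both models must score the same number of words")
    return [
        late_fusion(a, l, weight) >= threshold
        for a, l in zip(p_acoustic, p_lexical)
    ]
```

With late fusion each modality keeps its own model, so the acoustic and lexical systems can be trained and tuned separately and only their word-level scores need to be aligned.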

research
02/27/2020

Annotation of Emotion Carriers in Personal Narratives

We are interested in the problem of understanding personal narratives (P...
research
12/07/2022

Analysis and Utilization of Entrainment on Acoustic and Emotion Features in User-agent Dialogue

Entrainment is the phenomenon by which an interlocutor adapts their spea...
research
04/13/2021

Detecting Escalation Level from Speech with Transfer Learning and Acoustic-Lexical Information Fusion

Textual escalation detection has been widely applied to e-commerce compa...
research
02/20/2019

Audio-Linguistic Embeddings for Spoken Sentences

We propose spoken sentence embeddings which capture both acoustic and li...
research
05/04/2023

End-to-end spoken language understanding using joint CTC loss and self-supervised, pretrained acoustic encoders

It is challenging to extract semantic meanings directly from audio signa...
research
06/02/2023

In-the-wild Speech Emotion Conversion Using Disentangled Self-Supervised Representations and Neural Vocoder-based Resynthesis

Speech emotion conversion aims to convert the expressed emotion of a spo...
research
11/13/2020

Multi-Modal Emotion Detection with Transfer Learning

Automated emotion detection in speech is a challenging task due to the c...
