
A Multimodal LSTM for Predicting Listener Empathic Responses Over Time

by   Zong Xuan Tan, et al.

People naturally understand, and often empathize with, the emotions of those around them. In this paper, we predict the emotional valence of an empathic listener over time as they listen to a speaker narrating a life story. We use the dataset provided by the OMG-Empathy Prediction Challenge, a workshop held in conjunction with IEEE FG 2019. We present a multimodal LSTM model with feature-level fusion and local attention that predicts empathic responses from audio, text, and visual features. Our best-performing model, which used only the audio and text features, achieved concordance correlation coefficients (CCC) of 0.29 and 0.32 on the Validation set for the Generalized and Personalized tracks, respectively, and CCCs of 0.14 on both tracks on the held-out Test set. We discuss the difficulties faced and the lessons learnt in tackling this challenge.
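The challenge scores predictions with the concordance correlation coefficient (CCC), which rewards both correlation with and agreement in scale/location against the ground-truth valence trace. As a minimal NumPy sketch (the function name `ccc` is ours, not taken from the authors' code):

```python
import numpy as np

def ccc(y_true, y_pred):
    """Concordance correlation coefficient between two 1-D sequences.

    CCC = 2*cov(t, p) / (var(t) + var(p) + (mean(t) - mean(p))**2),
    using population (biased) variance and covariance.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = ((y_true - mean_t) * (y_pred - mean_p)).mean()
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)
```

A perfect prediction yields CCC = 1, while a constant (uninformative) prediction yields CCC = 0 even if its mean matches the target, which is why CCC is preferred over plain Pearson correlation for continuous affect prediction.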





Code Repositories


Repository for the A*AI Team's submission to the OMG-Empathy Challenge 2019
