AttendAffectNet: Self-Attention based Networks for Predicting Affective Responses from Movies

10/21/2020
by Ha Thi Phuong Thao, et al.

In this work, we propose different variants of a self-attention based network for emotion prediction from movies, which we call AttendAffectNet. We take both audio and video into account and model the relations among multiple modalities by applying the self-attention mechanism in a novel manner to the extracted features for emotion prediction. We compare this to the typical temporal integration of self-attention based models, which in our case captures the relations among temporal representations of a movie while accounting for the sequential dependencies of emotion responses. We demonstrate the effectiveness of our proposed architectures on the extended COGNIMUSE dataset [1], [2] and the MediaEval 2016 Emotional Impact of Movies Task [3], both of which consist of movies with emotion annotations. Our results show that applying the self-attention mechanism across the different audio-visual features, rather than in the time domain, is more effective for emotion prediction. Our approach also outperforms many state-of-the-art models for emotion prediction. The code to reproduce our results, together with the models' implementation, is available at: https://github.com/ivyha010/AttendAffectNet.
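The core idea of attending over modality features rather than over time steps can be sketched with plain scaled dot-product self-attention, where each row of the input is one modality's feature vector (e.g. audio, video). This is a minimal illustrative sketch, not the paper's implementation: the identity Q/K/V projections, the feature dimension, and the modality names are assumptions made here for clarity; a trained model would learn separate projection matrices.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(features):
    """Scaled dot-product self-attention over a set of feature vectors.

    features: (n, d) array, one row per modality (a sketch assumption;
    the actual model uses learned Q/K/V projections).
    Returns an (n, d) array of attended features.
    """
    n, d = features.shape
    # Identity projections for Q, K, V, purely for illustration.
    Q = K = V = features
    scores = Q @ K.T / np.sqrt(d)       # (n, n) pairwise relations among modalities
    weights = softmax(scores, axis=-1)  # attention distribution over modalities
    return weights @ V

# Three hypothetical modality feature vectors (e.g. audio, video, flow).
rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 8))
out = self_attention(feats)
print(out.shape)
```

Attending over the modality axis means the attention matrix is n_modalities x n_modalities, so each modality's output is a weighted combination of all modalities' features; the temporal variant would instead attend over an n_timesteps x n_timesteps matrix.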


