An End-to-End Visual-Audio Attention Network for Emotion Recognition in User-Generated Videos

02/12/2020
by Sicheng Zhao, et al.

Emotion recognition in user-generated videos plays an important role in human-centered computing. Existing methods mainly employ a traditional two-stage shallow pipeline, i.e., extracting visual and/or audio features and then training classifiers. In this paper, we propose to recognize video emotions in an end-to-end manner based on convolutional neural networks (CNNs). Specifically, we develop a deep Visual-Audio Attention Network (VAANet), a novel architecture that integrates spatial, channel-wise, and temporal attention into a visual 3D CNN and temporal attention into an audio 2D CNN. Further, we design a special classification loss, the polarity-consistent cross-entropy loss, based on the polarity-emotion hierarchy constraint to guide the attention generation. Extensive experiments conducted on the challenging VideoEmotion-8 and Ekman-6 datasets demonstrate that the proposed VAANet outperforms state-of-the-art approaches for video emotion recognition. Our source code is released at: https://github.com/maysonma/VAANet.
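
The abstract names a polarity-consistent cross-entropy loss but does not give its formula. The following is a minimal sketch of one plausible reading, not the authors' implementation: ordinary cross-entropy, scaled up for a sample whenever the predicted emotion falls in a different polarity than the true emotion. The emotion list, the `POLARITY` mapping, the function name `pcce_loss`, and the weight `lam` are all assumptions made for illustration.

```python
import math

# Hypothetical label set and emotion-to-polarity mapping (0 = negative,
# 1 = positive). These are illustrative assumptions, not taken from the paper.
EMOTIONS = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]
POLARITY = [0, 0, 0, 1, 0, 1]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pcce_loss(batch_logits, labels, lam=0.5):
    """Cross-entropy with an extra penalty for polarity-inconsistent predictions.

    For each sample the loss is -(1 + lam * g) * log p[y], where g is 1 if the
    arg-max prediction has a different polarity than the true label y, else 0.
    `lam` is an assumed penalty weight; lam = 0 gives plain cross-entropy.
    """
    total = 0.0
    for logits, y in zip(batch_logits, labels):
        probs = softmax(logits)
        pred = max(range(len(probs)), key=probs.__getitem__)
        g = 1.0 if POLARITY[pred] != POLARITY[y] else 0.0
        total += -(1.0 + lam * g) * math.log(probs[y])
    return total / len(labels)


if __name__ == "__main__":
    # True label is "anger" (index 0) in both cases, with the same p[anger].
    wrong_polarity = [[1.0, 0.0, 0.0, 3.0, 0.0, 0.0]]  # predicts "joy"
    same_polarity = [[1.0, 3.0, 0.0, 0.0, 0.0, 0.0]]   # predicts "disgust"
    print(pcce_loss(wrong_polarity, [0]), pcce_loss(same_polarity, [0]))
```

In this sketch, mistaking a negative emotion for a positive one costs more than confusing two negative emotions, which is one way a polarity-emotion hierarchy could steer training.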

research
09/11/2019

PDANet: Polarity-consistent Deep Attention Network for Fine-grained Visual Emotion Regression

Existing methods on visual emotion analysis mainly focus on coarse-grain...
research
08/08/2020

Speech Driven Talking Face Generation from a Single Image and an Emotion Condition

Visual emotion expression plays an important role in audiovisual speech ...
research
09/19/2022

Multi-Task Vision Transformer for Semi-Supervised Driver Distraction Detection

Driver distraction detection is an important computer vision problem tha...
research
08/06/2023

StyleEDL: Style-Guided High-order Attention Network for Image Emotion Distribution Learning

Emotion distribution learning has gained increasing attention with the t...
research
08/22/2020

A Efficient Multimodal Framework for Large Scale Emotion Recognition by Fusing Music and Electrodermal Activity Signals

Considerable attention has been paid for physiological signal-based emot...
research
03/20/2016

Modelling Temporal Information Using Discrete Fourier Transform for Recognizing Emotions in User-generated Videos

With the widespread of user-generated Internet videos, emotion recogniti...
research
10/26/2021

Emotion recognition in talking-face videos using persistent entropy and neural networks

The automatic recognition of a person's emotional state has become a ver...
