Audio Visual Emotion Recognition with Temporal Alignment and Perception Attention

03/28/2016
by Linlin Chao, et al.

This paper addresses two key problems in audio-visual emotion recognition from video. The first is the temporal alignment of the audio and visual streams for feature-level fusion. The second is locating and re-weighting the perception attentions across the whole audio-visual stream for better recognition. A Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) is employed as the main classification architecture. First, a soft attention mechanism aligns the audio and visual streams. Second, seven emotion embedding vectors, one for each emotion class, are added to locate the perception attentions. The locating and re-weighting process is also based on the soft attention mechanism. Experimental results on the EmotiW2015 dataset and a qualitative analysis show the effectiveness of the two proposed techniques.
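The two attention steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not state the scoring function, so the bilinear score `visual @ W @ audio.T`, the dot-product score against the emotion embeddings, all dimensions, and the names `align_audio_to_visual` and `emotion_attention` are assumptions made for illustration. The LSTM itself is omitted; `hidden` stands in for whatever per-frame states the recurrent network would produce.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def align_audio_to_visual(visual, audio, W):
    """Soft-align audio frames to visual frames (assumed bilinear scoring).

    visual: (Tv, Dv) visual frame features
    audio:  (Ta, Da) audio frame features
    W:      (Dv, Da) hypothetical learned scoring matrix
    Returns the fused features (Tv, Dv + Da) and the weights (Tv, Ta).
    """
    scores = visual @ W @ audio.T            # (Tv, Ta) relevance of each audio frame
    weights = softmax(scores, axis=1)        # each visual frame attends over audio frames
    aligned_audio = weights @ audio          # (Tv, Da) audio summary per visual frame
    fused = np.concatenate([visual, aligned_audio], axis=1)
    return fused, weights


def emotion_attention(hidden, emotion_emb):
    """Re-weight time steps with one embedding per emotion class (assumed dot-product scoring).

    hidden:      (T, H) per-frame states, e.g. LSTM outputs
    emotion_emb: (7, H) one hypothetical learned vector per emotion class
    Returns the pooled features (7, H) and the weights (7, T).
    """
    scores = emotion_emb @ hidden.T          # (7, T) relevance of each frame to each emotion
    weights = softmax(scores, axis=1)        # each emotion attends over time
    pooled = weights @ hidden                # (7, H) emotion-specific summaries
    return pooled, weights


# Usage with random stand-in features (all sizes are arbitrary).
rng = np.random.default_rng(0)
visual = rng.normal(size=(10, 16))           # 10 visual frames
audio = rng.normal(size=(25, 8))             # 25 audio frames
W = rng.normal(size=(16, 8))
fused, align_w = align_audio_to_visual(visual, audio, W)

emotion_emb = rng.normal(size=(7, fused.shape[1]))
pooled, emo_w = emotion_attention(fused, emotion_emb)
print(fused.shape, pooled.shape)
```

In this sketch the alignment weights let streams with different frame rates be fused at the feature level without hard resampling, and the rows of `emo_w` indicate which frames each emotion embedding attends to most.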

research
11/17/2021

Information Fusion in Attention Networks Using Adaptive and Multi-level Factorized Bilinear Pooling for Audio-visual Emotion Recognition

Multimodal emotion recognition is a challenging task in emotion computin...
research
01/15/2019

Deep Fusion: An Attention Guided Factorized Bilinear Pooling for Audio-video Emotion Recognition

Automatic emotion recognition (AER) is a challenging task due to the abs...
research
04/03/2017

Spatiotemporal Networks for Video Emotion Recognition

Our experiment adapts several popular deep learning methods as well as s...
research
02/20/2020

Audio-video Emotion Recognition in the Wild using Deep Hybrid Networks

This paper presents an audiovisual-based emotion recognition hybrid netw...
research
04/28/2020

Deep Auto-Encoders with Sequential Learning for Multimodal Dimensional Emotion Recognition

Multimodal dimensional emotion recognition has drawn a great attention f...
research
10/27/2017

Advanced LSTM: A Study about Better Time Dependency Modeling in Emotion Recognition

Long short-term memory (LSTM) is normally used in recurrent neural netwo...
research
09/02/2022

TB or not TB? Acoustic cough analysis for tuberculosis classification

In this work, we explore recurrent neural network architectures for tube...
