Framewise approach in multimodal emotion recognition in OMG challenge

05/03/2018
by Grigoriy Sterling, et al.

In this report we describe our approach, which achieves 53% unweighted accuracy over 7 emotions and mean squared errors of 0.05 and 0.09 for arousal and valence in the OMG emotion recognition challenge. Our results were obtained with an ensemble of single-modality models trained separately on voice and face data from video. We treat each stream as a sequence of frames, extract features from each frame, and process the resulting feature sequence with a recurrent neural network. An audio frame is a short 0.4-second spectrogram interval. To extract features from face images we used our own ResNet neural network pretrained on the AffectNet database. Each short spectrogram was treated as an image and likewise processed by a convolutional network. As the base audio model we used a ResNet pretrained on a speaker recognition task. Predictions from both modalities were fused at the decision level, which improved on the single-channel approaches by a few percent.
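The framing and decision-level fusion steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say whether audio frames overlap or how the two modalities are weighted, so the non-overlapping 0.4-second frames, the weighted average of class probabilities, and the names `split_into_frames`, `fuse_decisions`, and `audio_weight` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def split_into_frames(
    num_samples: int, sample_rate: int, frame_sec: float = 0.4
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of consecutive audio frames.

    Assumption: frames do not overlap and a trailing partial frame is dropped.
    """
    frame_len = int(round(frame_sec * sample_rate))
    if frame_len <= 0:
        raise ValueError("frame length must be positive")
    last_start = num_samples - frame_len
    return [(s, s + frame_len) for s in range(0, last_start + 1, frame_len)]


def fuse_decisions(
    audio_probs: Sequence[float],
    face_probs: Sequence[float],
    audio_weight: float = 0.5,
) -> List[float]:
    """Fuse per-class probabilities from two single-modality models.

    Assumption: decision-level fusion is a weighted average of the two
    probability vectors, renormalised to sum to one.
    """
    if len(audio_probs) != len(face_probs):
        raise ValueError("both models must predict the same classes")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    fused = [
        audio_weight * a + (1.0 - audio_weight) * f
        for a, f in zip(audio_probs, face_probs)
    ]
    total = sum(fused)
    return [p / total for p in fused]


def predict_label(probs: Sequence[float]) -> int:
    """Return the index of the most probable class."""
    return max(range(len(probs)), key=lambda i: probs[i])


# One second of 16 kHz audio yields two full 0.4 s frames.
frames = split_into_frames(num_samples=16000, sample_rate=16000)

# Hypothetical two-class outputs from the audio and face models.
fused = fuse_decisions([0.6, 0.4], [0.2, 0.8])
label = predict_label(fused)
```

In a full pipeline each frame would be turned into a spectrogram, passed through the convolutional feature extractor, and the per-frame features fed to the recurrent network before fusion; those stages are omitted here because the abstract gives no architectural details.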

research
08/24/2022

ICANet: A Method of Short Video Emotion Recognition Driven by Multimodal Data

With the fast development of artificial intelligence and short videos, e...
research
11/13/2017

Convolutional neural networks pretrained on large face recognition datasets for emotion classification from video

In this paper we describe a solution to our entry for the emotion recogn...
research
03/05/2015

EmoNets: Multimodal deep learning approaches for emotion recognition in video

The task of the emotion recognition in the wild (EmotiW) Challenge is to...
research
11/10/2021

Multimodal End-to-End Group Emotion Recognition using Cross-Modal Attention

Classifying group-level emotions is a challenging task due to complexity...
research
11/17/2021

Information Fusion in Attention Networks Using Adaptive and Multi-level Factorized Bilinear Pooling for Audio-visual Emotion Recognition

Multimodal emotion recognition is a challenging task in emotion computin...
research
05/03/2018

Dimensional emotion recognition using visual and textual cues

This paper addresses the problem of automatic emotion recognition in the...
research
07/14/2020

DeepMSRF: A novel Deep Multimodal Speaker Recognition framework with Feature selection

For recognizing speakers in video streams, significant research studies ...
