An Occam's Razor View on Learning Audiovisual Emotion Recognition with Small Training Sets

08/08/2018
by Valentin Vielzeuf, et al.

This paper presents a lightweight and accurate deep neural model for audiovisual emotion recognition. To design this model, the authors followed a philosophy of simplicity, drastically limiting the number of parameters to learn from the target datasets and always choosing the simplest learning methods: i) transfer learning and low-dimensional space embedding reduce the dimensionality of the representations; ii) the visual temporal information is handled by a simple score-per-frame selection process, averaged across time; iii) a simple frame selection mechanism is also proposed to weight the images of a sequence; iv) the fusion of the different modalities is performed at prediction level (late fusion). We also highlight the inherent challenges of the AFEW dataset and the difficulty of model selection with as few as 383 validation sequences. The proposed real-time emotion classifier achieved a state-of-the-art accuracy of 60.64% in the Emotion in the Wild 2018 challenge.
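The abstract names two simple mechanisms: per-frame scores averaged across time with a weight per frame, and fusion of the modalities at prediction level. The sketch below illustrates those two ideas only; it is not the authors' implementation. The function names, the three toy classes, the frame weights, and the modality weights are all hypothetical values chosen for illustration.

```python
def weighted_temporal_average(frame_scores, frame_weights=None):
    """Average per-frame class scores across time.

    frame_scores: list of per-frame score lists (one score per class).
    frame_weights: optional list of non-negative weights, one per frame
                   (hypothetical stand-in for a frame selection mechanism).
    """
    if frame_weights is None:
        frame_weights = [1.0] * len(frame_scores)  # plain average
    total = sum(frame_weights)
    num_classes = len(frame_scores[0])
    return [
        sum(w * frame[c] for w, frame in zip(frame_weights, frame_scores)) / total
        for c in range(num_classes)
    ]


def late_fusion(modality_scores, modality_weights):
    """Combine per-modality class scores with a weighted average."""
    total = sum(modality_weights)
    num_classes = len(modality_scores[0])
    return [
        sum(w * scores[c] for w, scores in zip(modality_weights, modality_scores)) / total
        for c in range(num_classes)
    ]


# Toy example with three hypothetical classes and three video frames.
visual_frames = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.5, 0.4, 0.1],
]
frame_weights = [1.0, 0.0, 1.0]   # assumed: the second frame is discarded
audio_scores = [0.2, 0.5, 0.3]    # assumed output of an audio model

visual_scores = weighted_temporal_average(visual_frames, frame_weights)
fused_scores = late_fusion([visual_scores, audio_scores], [0.7, 0.3])
predicted_class = max(range(len(fused_scores)), key=fused_scores.__getitem__)
```

Because both steps are weighted averages, they add very few parameters to learn (at most one weight per frame and one per modality), which is in keeping with the small-training-set setting the abstract describes.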


