EEV Dataset: Predicting Expressions Evoked by Diverse Videos

01/15/2020
by Jennifer J. Sun, et al.

When we watch videos, the visual and auditory information we experience can evoke a range of affective responses. The ability to automatically predict evoked affect from videos can help recommendation systems and social machines better interact with their users. Here, we introduce the Evoked Expressions in Videos (EEV) dataset, a large-scale dataset for studying viewer responses to videos based on their facial expressions. The dataset consists of 4.8 million annotations of viewer facial reactions to 18,541 videos. We use a publicly available video corpus to obtain a diverse set of video content. The training split is fully machine-annotated, while the validation and test splits have both human and machine annotations. We verify the quality of our machine annotations against human raters, obtaining an average precision of 73.3%. We establish baseline performance on the EEV dataset using an existing multimodal recurrent model. Our results show that affective information can be learned from EEV, but with a mean average precision (mAP) of 20.32% there remains substantial room for improvement. This gap motivates the need for new approaches to understanding affective content. Our transfer learning experiments show an improvement in performance on the LIRIS-ACCEDE video dataset when models are pre-trained on EEV. We hope that the size and diversity of the EEV dataset will encourage further exploration in video understanding and affective computing.
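As a concrete illustration of the kind of baseline described above, the sketch below shows what a simple multimodal recurrent model for frame-level evoked-expression prediction could look like. The abstract only states that an existing multimodal recurrent model is used, so the architecture, feature dimensions, and number of expression classes here are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class MultimodalRecurrentBaseline(nn.Module):
    """Hypothetical sketch: a GRU over fused per-frame image and audio features."""

    def __init__(self, image_dim=1024, audio_dim=128, hidden_dim=256, num_expressions=15):
        super().__init__()
        # Feature dimensions and class count are placeholders, not values from the paper.
        self.gru = nn.GRU(image_dim + audio_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_expressions)

    def forward(self, image_feats, audio_feats):
        # image_feats: (batch, time, image_dim); audio_feats: (batch, time, audio_dim)
        fused = torch.cat([image_feats, audio_feats], dim=-1)
        hidden, _ = self.gru(fused)
        # Independent sigmoid score per expression class at every timestep,
        # matching a multi-label setup evaluated with average precision.
        return torch.sigmoid(self.head(hidden))

if __name__ == "__main__":
    model = MultimodalRecurrentBaseline()
    image_feats = torch.randn(2, 60, 1024)  # e.g. 60 frames of image embeddings
    audio_feats = torch.randn(2, 60, 128)   # matching audio embeddings
    scores = model(image_feats, audio_feats)
    print(scores.shape)  # torch.Size([2, 60, 15])

Per-frame scores of this form can then be compared against viewer-reaction annotations using average precision per expression class, averaged into the mAP figure reported in the abstract.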

