A Multi-Modal Approach to Infer Image Affect

03/13/2018
by Ashok Sundaresan, et al.

The group affect or emotion in an image of people can be inferred by extracting features about both the people in the picture and the overall makeup of the scene. The state of the art on this problem investigates a combination of facial features, scene extraction, and even audio tonality. This paper combines three additional modalities, namely human pose, text-based tagging, and CNN-extracted features/predictions. To the best of our knowledge, this is the first time all of these modalities have been extracted using deep neural networks. We evaluate the performance of our approach against baselines and report insights throughout the paper.
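To make the multi-modal idea concrete, the sketch below shows one common way such per-modality features could be fused for group-affect classification. This is a minimal, hypothetical illustration, not the authors' actual architecture: the class LateFusionAffectClassifier, the 128-dimensional projection size, the three affect classes, and all feature dimensions are assumptions made for the example, under the assumption that each modality (face, scene, pose, text tags, CNN predictions) has already been encoded into a fixed-length vector by its own network.

```python
# Hypothetical sketch of late fusion for group-affect classification.
# Not the paper's implementation; all names and dimensions are illustrative.
import torch
import torch.nn as nn

class LateFusionAffectClassifier(nn.Module):
    def __init__(self, modality_dims, num_classes=3):
        # modality_dims: one feature size per modality
        super().__init__()
        # Project each modality into a common 128-d embedding space
        self.projections = nn.ModuleList(
            nn.Linear(d, 128) for d in modality_dims
        )
        # Classify the concatenated embeddings into affect classes
        # (e.g. negative / neutral / positive group emotion)
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(128 * len(modality_dims), num_classes),
        )

    def forward(self, features):
        # features: list of (batch, dim) tensors, one per modality
        embedded = [proj(f) for proj, f in zip(self.projections, features)]
        return self.classifier(torch.cat(embedded, dim=1))

# Example with made-up feature sizes for five modalities
dims = [512, 2048, 34, 300, 1000]
model = LateFusionAffectClassifier(dims)
dummy = [torch.randn(4, d) for d in dims]
logits = model(dummy)  # (4, 3) class scores
```

Late fusion of this kind keeps each modality's encoder independent, so a modality can be added or dropped without retraining the others; attention-based or early-fusion schemes are common alternatives.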

Related research

Analyzing the Affect of a Group of People Using Multi-modal Framework (10/12/2016)
Millions of images on the web enable us to explore images from social ev...

Learning to Take Good Pictures of People with a Robot Photographer (04/11/2019)
We present a robotic system capable of navigating autonomously by follow...

Multi-Modal Continuous Valence And Arousal Prediction in the Wild Using Deep 3D Features and Sequence Modeling (02/26/2020)
Continuous affect prediction in the wild is a very interesting problem a...

Affective computing using speech and eye gaze: a review and bimodal system proposal for continuous affect prediction (05/17/2018)
Speech has been a widely used modality in the field of affective computi...

Attention Driven Fusion for Multi-Modal Emotion Recognition (09/23/2020)
Deep learning has emerged as a powerful alternative to hand-crafted meth...

Dynamic Deep Multi-modal Fusion for Image Privacy Prediction (02/27/2019)
With millions of images that are shared online on social networking site...
