Comparing Visual Reasoning in Humans and AI

04/29/2021
by Shravan Murlidaran, et al.

Recent advances in natural language processing and computer vision have led to AI models that interpret simple scenes at human levels. Yet we do not have a complete understanding of how humans and AI models differ in their interpretation of more complex scenes. We created a dataset of complex scenes that contained human behaviors and social interactions. AI models and humans had to describe each scene with a sentence. We used a quantitative metric of similarity between each AI or human scene description and a ground truth consisting of five other human descriptions of the same scene. Results show that machine/human agreement in scene descriptions is much lower than human/human agreement for our complex scenes. Using an experimental manipulation that occludes different spatial regions of the scenes, we assessed how machines and humans vary in utilizing regions of images to understand the scenes. Together, our results are a first step toward understanding how machines fall short of human visual reasoning with complex scenes depicting human behaviors.
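
The agreement measure described in the abstract can be illustrated with a minimal sketch. The abstract does not name the similarity metric, so the bag-of-words cosine similarity below is an assumption chosen only for illustration, and the function names (`tokenize`, `cosine_similarity`, `agreement`) and the example sentences are hypothetical. The sketch scores one description by its mean similarity to a set of reference descriptions of the same scene.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two sentences using word-count vectors.

    Assumption: a stand-in for the (unspecified) metric in the abstract.
    """
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def agreement(candidate, references):
    """Mean similarity of one description to the reference descriptions."""
    return sum(cosine_similarity(candidate, r) for r in references) / len(references)


if __name__ == "__main__":
    # Hypothetical descriptions of a single scene (not taken from the dataset).
    references = [
        "a man proposes to a woman in a restaurant",
        "a man is proposing to his girlfriend at dinner",
        "a couple gets engaged at a restaurant",
        "a man kneels and offers a ring to a woman",
        "a marriage proposal during a dinner date",
    ]
    human = "a man asks a woman to marry him at a restaurant"
    machine = "a group of people sitting at a table"
    print("human/human agreement:  ", round(agreement(human, references), 3))
    print("machine/human agreement:", round(agreement(machine, references), 3))
```

Averaging over the five references keeps any single idiosyncratic description from dominating the score. A word-overlap measure like this one ignores synonyms and paraphrase, so a semantic similarity metric would likely behave differently.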


research
12/14/2015

We Are Humor Beings: Understanding and Predicting Visual Humor

Humor is an integral part of human lives. Despite being tremendously imp...
research
05/23/2015

Text to 3D Scene Generation with Rich Lexical Grounding

The ability to map descriptions of scenes to 3D geometric representation...
research
01/27/2023

Diffusion Models as Artists: Are we Closing the Gap between Humans and Machines?

An important milestone for AI is the development of algorithms that can ...
research
09/04/2018

Text2Scene: Generating Abstract Scenes from Textual Descriptions

In this paper, we propose an end-to-end model that learns to interpret n...
research
09/24/2022

Deep Neural Networks for Visual Reasoning

Visual perception and language understanding are fundamental component...
research
03/13/2023

Contextually-rich human affect perception using multimodal scene information

The process of human affect understanding involves the ability to infer ...
research
09/23/2022

Semantic scene descriptions as an objective of human vision

Interpreting the meaning of a visual scene requires not only identificat...
