Understanding the computational demands underlying visual reasoning

08/08/2021
by Mohit Vaishnav, et al.

Visual understanding requires comprehending complex visual relations between objects within a scene. Here, we seek to characterize the computational demands of abstract visual reasoning. We do this by systematically assessing the ability of modern deep convolutional neural networks (CNNs) to learn to solve the Synthetic Visual Reasoning Test (SVRT) challenge, a collection of twenty-three visual reasoning problems. Our analysis leads to a novel taxonomy of visual reasoning tasks, which is primarily explained by two factors: the type of relations (same-different vs. spatial-relation judgments) and the number of relations used to compose the underlying rules. Prior cognitive neuroscience work suggests that attention plays a key role in humans' visual reasoning ability. To test this, we extended the CNNs with spatial and feature-based attention mechanisms. In a second series of experiments, we evaluated the ability of these attention networks to learn to solve the SVRT challenge and found the resulting architectures to be much more efficient at solving the hardest of these visual reasoning tasks. Most importantly, the corresponding improvements on individual tasks partially explained the taxonomy. Overall, this work advances our understanding of visual reasoning and yields testable neuroscience predictions regarding the need for feature-based vs. spatial attention in visual reasoning.
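To make the distinction between feature-based and spatial attention concrete, here is a minimal NumPy sketch. It assumes generic modules, not the ones used in this work: a squeeze-and-excitation-style feature-based gate (one weight per channel) and a simplified CBAM-style spatial gate (one weight per location, with scalar weights standing in for CBAM's convolution). The function names `feature_attention` and `spatial_attention`, the weight shapes, and the random (unlearned) weights are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def feature_attention(fmap, w1, w2):
    """Hypothetical feature-based (channel) gate, squeeze-and-excitation style.

    fmap: (C, H, W) feature map; w1: (C_hidden, C); w2: (C, C_hidden).
    """
    squeezed = fmap.mean(axis=(1, 2))        # (C,) global average per channel
    hidden = np.maximum(w1 @ squeezed, 0.0)  # ReLU bottleneck
    gates = sigmoid(w2 @ hidden)             # (C,) one weight per channel
    return fmap * gates[:, None, None], gates


def spatial_attention(fmap, a, b):
    """Hypothetical spatial gate: one weight per (h, w), shared by all channels.

    Combines channel-wise mean and max with scalar weights a and b
    (a simplification; CBAM uses a learned convolution at this step).
    """
    pooled_mean = fmap.mean(axis=0)                     # (H, W)
    pooled_max = fmap.max(axis=0)                       # (H, W)
    gates = sigmoid(a * pooled_mean + b * pooled_max)   # (H, W)
    return fmap * gates[None, :, :], gates


# Usage on a random feature map (weights are random, not trained).
rng = np.random.default_rng(0)
C, H, W = 8, 16, 16
fmap = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((2, C))
w2 = rng.standard_normal((C, 2))

out_f, g_f = feature_attention(fmap, w1, w2)
out_s, g_s = spatial_attention(fmap, 1.0, 1.0)
print(g_f.shape, g_s.shape)  # (8,) (16, 16)
```

The design difference lies in where the gate lives: the feature-based gate rescales whole channels regardless of position, while the spatial gate rescales positions across all channels at once.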

06/26/2023

PhD Thesis: Exploring the role of (self-)attention in cognitive and computer vision architecture

We investigate the role of attention and memory in complex reasoning tas...
02/09/2018

Not-So-CLEVR: Visual Relations Strain Feedforward Neural Networks

The robust and efficient recognition of visual relations in images is a ...
06/11/2022

A Benchmark for Compositional Visual Reasoning

A fundamental component of human vision is our ability to parse complex ...
06/10/2022

GAMR: A Guided Attention Model for (visual) Reasoning

Humans continue to outperform modern AI systems in their ability to flex...
11/14/2019

Attention on Abstract Visual Reasoning

Attention mechanisms have been boosting the performance of deep learning...
04/14/2023

The role of object-centric representations, guided attention, and external memory on generalizing visual relations

Visual reasoning is a long-term goal of vision research. In the last dec...
12/07/2017

Broadcasting Convolutional Network

While convolutional neural networks (CNNs) are widely used for handling ...
