Not-So-CLEVR: Visual Relations Strain Feedforward Neural Networks

02/09/2018
by Matthew Ricci, et al.

The robust and efficient recognition of visual relations in images is a hallmark of biological vision. Here, we argue that, despite recent progress in visual recognition, modern machine vision algorithms are severely limited in their ability to learn visual relations. Through controlled experiments, we demonstrate that visual-relation problems strain convolutional neural networks (CNNs). The networks eventually break altogether when rote memorization becomes impossible, such as when the intra-class variability exceeds their capacity. We further show that another type of feedforward network, called a relational network (RN), which was shown to successfully solve seemingly difficult visual question answering (VQA) problems on the CLEVR datasets, suffers from similar limitations. Motivated by the comparable success of biological vision, we argue that feedback mechanisms, including working memory and attention, are the key computational components underlying abstract visual reasoning.

research
08/08/2021

Understanding the computational demands underlying visual reasoning

Visual understanding requires comprehending complex visual relations bet...
research
06/05/2017

A simple neural network module for relational reasoning

Relational reasoning is a central component of generally intelligent beh...
research
03/29/2019

Relation-aware Graph Attention Network for Visual Question Answering

In order to answer semantically-complicated questions about an image, a ...
research
03/07/2019

RAVEN: A Dataset for Relational and Analogical Visual rEasoNing

Dramatic progress has been witnessed in basic vision tasks involving low...
research
07/31/2017

Capacity limitations of visual search in deep convolutional neural network

Deep convolutional neural networks follow roughly the architecture of bi...
research
04/14/2023

The role of object-centric representations, guided attention, and external memory on generalizing visual relations

Visual reasoning is a long-term goal of vision research. In the last dec...
research
12/07/2017

Broadcasting Convolutional Network

While convolutional neural networks (CNNs) are widely used for handling ...
