Learning Visual Reasoning Without Strong Priors

07/10/2017
by Ethan Perez, et al.

Achieving artificial visual reasoning - the ability to answer image-related questions which require a multi-step, high-level process - is an important step towards artificial general intelligence. This multi-modal task requires learning a question-dependent, structured reasoning process over images from language. Standard deep learning approaches tend to exploit biases in the data rather than learn this underlying structure, while leading methods learn to visually reason successfully but are hand-crafted for reasoning. We show that a general-purpose, Conditional Batch Normalization approach achieves state-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4% error rate. We outperform the next best end-to-end method (4.5%) and even methods that use extra supervision (3.1%). We probe our model to shed light on how it reasons, showing it has learned a question-dependent, multi-step process. Previous work has operated under the assumption that visual reasoning calls for a specialized architecture, but we show that a general architecture with proper conditioning can learn to visually reason effectively.
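
To make the conditioning idea concrete, here is a minimal NumPy sketch of Conditional Batch Normalization in general: feature maps are normalized per channel, then scaled and shifted by parameters computed from a question embedding. It is an illustration under stated assumptions, not the authors' implementation. The function names, the single linear projection, and the "1 + delta" form for the scale are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def predict_gamma_beta(q, W, b):
    """Hypothetical linear map from a question embedding to scale/shift.

    q: (N, D) question embeddings; W: (D, 2C); b: (2C,).
    Returns gamma, beta, each of shape (N, C).
    """
    out = q @ W + b                  # (N, 2C)
    C = out.shape[1] // 2
    gamma = 1.0 + out[:, :C]         # assumption: predict a delta around 1
    beta = out[:, C:]
    return gamma, beta


def conditional_batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize x per channel, then apply per-example scale and shift.

    x: (N, C, H, W) image feature maps; gamma, beta: (N, C).
    """
    mean = x.mean(axis=(0, 2, 3), keepdims=True)   # batch statistics per channel
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Broadcast the question-dependent parameters over spatial positions.
    return gamma[:, :, None, None] * x_hat + beta[:, :, None, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 3, 5, 5))      # toy feature maps
    q = rng.normal(size=(4, 8))            # toy question embeddings
    W = rng.normal(scale=0.1, size=(8, 6))
    b = np.zeros(6)
    gamma, beta = predict_gamma_beta(q, W, b)
    y = conditional_batch_norm(x, gamma, beta)
    print(y.shape)
```

Because the scale and shift depend on the question, the same image features are modulated differently for different questions. This is the sense in which a general architecture is conditioned on language.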

