Neural Module Networks

11/09/2015
by Jacob Andreas, et al.

Visual question answering is fundamentally compositional in nature: a question like "where is the dog?" shares substructure with questions like "what color is the dog?" and "where is the cat?" This paper seeks to exploit both the representational capacity of deep networks and the compositional linguistic structure of questions. We describe a procedure for constructing and learning *neural module networks*, which compose collections of jointly trained neural "modules" into deep networks for question answering. Our approach decomposes questions into their linguistic substructures and uses these structures to dynamically instantiate modular networks (with reusable components for recognizing dogs, classifying colors, etc.). The resulting compound networks are jointly trained. We evaluate our approach on two challenging datasets for visual question answering, achieving state-of-the-art results on both the VQA natural image dataset and a new dataset of complex questions about abstract shapes.
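The idea of instantiating a network per question from shared components can be illustrated with a toy sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: the module names `find` and `describe`, the hand-written `LAYOUTS` table standing in for a linguistic parser, and the symbolic "image" are all hypothetical, and the modules are plain functions instead of trained neural networks.

```python
from typing import Callable, Dict, List, Tuple

# Toy "image": a list of objects with symbolic attributes (an assumption;
# a real system would operate on pixel or feature-map inputs).
Image = List[Dict[str, str]]
Attention = List[float]  # one weight per object in the image


def make_find(kind: str) -> Callable[[Image], Attention]:
    """Hypothetical find[kind] module: attends to objects of one kind."""
    def find(image: Image) -> Attention:
        return [1.0 if obj["kind"] == kind else 0.0 for obj in image]
    return find


def make_describe(attribute: str) -> Callable[[Image, Attention], str]:
    """Hypothetical describe[attribute] module: reads one attribute
    from the most strongly attended object."""
    def describe(image: Image, attention: Attention) -> str:
        best = max(range(len(image)), key=lambda i: attention[i])
        return image[best][attribute]
    return describe


FACTORIES = {"find": make_find, "describe": make_describe}

# Shared module instances, keyed by (module type, argument), so that
# different questions reuse the same component.
MODULES: Dict[Tuple[str, str], Callable] = {}


def get_module(name: str, arg: str) -> Callable:
    key = (name, arg)
    if key not in MODULES:
        MODULES[key] = FACTORIES[name](arg)
    return MODULES[key]


# Stand-in for a parser: maps each question to (attribute, object kind).
LAYOUTS = {
    "where is the dog?": ("position", "dog"),
    "what color is the dog?": ("color", "dog"),
    "where is the cat?": ("position", "cat"),
}


def assemble(question: str) -> Callable[[Image], str]:
    """Build a question-specific network by composing shared modules."""
    attribute, kind = LAYOUTS[question]
    find = get_module("find", kind)
    describe = get_module("describe", attribute)
    return lambda image: describe(image, find(image))


image = [
    {"kind": "dog", "color": "brown", "position": "left"},
    {"kind": "cat", "color": "black", "position": "right"},
]
answers = {q: assemble(q)(image) for q in LAYOUTS}
```

In this sketch the three questions are answered by only four module instances, because "where is the dog?" and "what color is the dog?" share `find[dog]`, while "where is the dog?" and "where is the cat?" share `describe[position]`. In a trained system, that sharing is what would let gradients from many different questions update the same component.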


