Interpretable Neural Computation for Real-World Compositional Visual Question Answering

10/10/2020
by Ruixue Tang, et al.

There are two main lines of research on visual question answering (VQA): compositional models with explicit multi-hop reasoning, and monolithic networks with implicit reasoning in the latent feature space. The former excel in interpretability and compositionality but fail on real-world images, while the latter usually achieve better performance due to model flexibility and parameter efficiency. We aim to combine the two to build an interpretable framework for real-world compositional VQA. In our framework, images and questions are disentangled into scene graphs and programs, and a symbolic program executor runs on them with full transparency to select the attention regions, which are then iteratively passed to a pre-trained visual-linguistic encoder to predict answers. Experiments on the GQA benchmark demonstrate that our framework outperforms prior compositional methods and achieves accuracy competitive with monolithic ones. On the validity, plausibility, and distribution metrics, our framework surpasses others by a considerable margin.
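The abstract does not specify the executor's operations, so the following is only a minimal sketch, assuming a toy scene graph (objects with a name, a set of attributes, and outgoing relations) and a hypothetical program format of `(op, arg)` steps loosely modeled on GQA-style programs. The names `SceneObject`, `execute`, and the ops `select`, `filter`, `relate`, `query`, and `exist` are illustrative assumptions, not the authors' code. The point it shows is the transparency claim: every step yields an explicit set of selected objects, which here stands in for the attention regions that the framework would pass to the pre-trained encoder (the encoder itself is not modeled).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SceneObject:
    """Hypothetical scene-graph node: one detected object."""
    obj_id: int
    name: str
    attributes: set[str] = field(default_factory=set)
    # Outgoing edges as (relation_name, target_obj_id) pairs.
    relations: list[tuple[str, int]] = field(default_factory=list)


def execute(program, objects):
    """Run a hypothetical symbolic program over a scene graph.

    Returns (answer, trace), where trace holds the set of selected
    object ids after every step; these sets play the role of the
    attention regions in this sketch.
    """
    by_id = {o.obj_id: o for o in objects}
    current: set[int] = set()
    trace: list[set[int]] = []
    answer = None
    for op, arg in program:
        if op == "select":
            # Start from all objects with the given name.
            current = {o.obj_id for o in objects if o.name == arg}
        elif op == "filter":
            # Keep only objects that carry the given attribute.
            current = {i for i in current if arg in by_id[i].attributes}
        elif op == "relate":
            # Follow relation `arg` from every currently selected object.
            current = {t for i in current for (r, t) in by_id[i].relations if r == arg}
        elif op == "query":
            # Assumption: only the object's name can be queried; the
            # answer is defined only when exactly one object is selected.
            names = sorted({by_id[i].name for i in current})
            answer = names[0] if len(names) == 1 else None
        elif op == "exist":
            answer = "yes" if current else "no"
        else:
            raise ValueError(f"unknown op: {op}")
        trace.append(set(current))
    return answer, trace


# Usage: "What is the red cup on?"
scene = [
    SceneObject(0, "cup", {"red"}, [("on", 1)]),
    SceneObject(1, "table", {"wooden"}),
    SceneObject(2, "cup", {"blue"}, [("on", 1)]),
]
program = [("select", "cup"), ("filter", "red"), ("relate", "on"), ("query", "name")]
answer, trace = execute(program, scene)
print(answer)  # table
print(trace)   # [{0, 2}, {0}, {1}, {1}]
```

Keeping the per-step selections as plain sets makes each intermediate result inspectable, which is the property a transparent executor offers over implicit reasoning in a latent feature space.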

research
06/20/2020

Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning"

Visual reasoning tasks such as visual question answering (VQA) require a...
research
10/08/2019

Meta Module Network for Compositional Visual Reasoning

There are two main lines of research on visual reasoning: neural module ...
research
10/08/2019

Modulated Self-attention Convolutional Network for VQA

As new data-sets for real-world visual reasoning and compositional quest...
research
06/28/2021

Adventurer's Treasure Hunt: A Transparent System for Visually Grounded Compositional Visual Question Answering based on Scene Graphs

With the expressed goal of improving system transparency and visual grou...
research
02/25/2019

GQA: a new dataset for compositional question answering over real-world images

We introduce GQA, a new dataset for real-world visual reasoning and comp...
research
10/03/2022

A Hybrid Compositional Reasoning Approach for Interactive Robot Manipulation

In this paper we present a neuro-symbolic (hybrid) compositional reasoni...
research
02/21/2019

Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering

We propose a new class of probabilistic neural-symbolic models, that hav...