Interpretable Visual Question Answering by Reasoning on Dependency Trees

09/06/2018
by Qingxing Cao, et al.

Collaborative reasoning for understanding each image-question pair is critical yet underexplored in interpretable visual question answering systems. Although recent works have attempted to use explicit compositional processes to assemble the multiple subtasks embedded in a question, their models rely heavily on annotations or handcrafted rules to obtain valid reasoning processes, leading to either heavy workloads or poor performance on compositional reasoning. In this paper, to better align the image and language domains in diverse and unrestricted cases, we propose a novel neural network model that performs global reasoning on a dependency tree parsed from the question; we thus term our model the parse-tree-guided reasoning network (PTGRN). This network consists of three collaborative modules: i) an attention module that exploits the local visual evidence for each word parsed from the question, ii) a gated residual composition module that composes the previously mined evidence, and iii) a parse-tree-guided propagation module that passes the mined evidence along the parse tree. Our PTGRN is thus capable of building an interpretable VQA system that gradually derives image cues along a question-driven parse-tree reasoning route. Experiments on relational datasets demonstrate the superiority of PTGRN over current state-of-the-art VQA methods, and visualization results highlight the explanatory capability of our reasoning system.
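The three modules described above can be illustrated with a minimal toy sketch. This is not the authors' PTGRN implementation (which uses learned neural parameters); all function names, the scalar gate, and the toy vectors below are illustrative assumptions. It only shows the control flow: per-word attention over image regions, gated residual composition of child evidence, and post-order propagation from the leaves of the dependency tree to its root.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def attend(word_vec, regions):
    # i) attention module: pool image-region features weighted by word relevance
    weights = softmax([dot(word_vec, r) for r in regions])
    dim = len(regions[0])
    return [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]

def compose(local, child_msgs):
    # ii) gated residual composition: mix a node's local evidence with its
    # children's (a scalar gate here; PTGRN would learn this gating)
    if not child_msgs:
        return local
    dim = len(local)
    pooled = [sum(m[i] for m in child_msgs) / len(child_msgs) for i in range(dim)]
    gate = sigmoid(dot(local, pooled))
    return [local[i] + gate * pooled[i] for i in range(dim)]

def reason(node, children, word_vecs, regions):
    # iii) propagation module: post-order walk passes evidence leaves -> root
    msgs = [reason(c, children, word_vecs, regions)
            for c in children.get(node, [])]
    return compose(attend(word_vecs[node], regions), msgs)

# Toy run for "what color is the dog", with dependency root "color"
word_vecs = {"color": [1.0, 0.0], "dog": [0.0, 1.0], "what": [0.5, 0.5]}
children = {"color": ["what", "dog"]}           # parent -> child words
regions = [[0.9, 0.1], [0.2, 0.8]]              # two fake image-region features
answer_feature = reason("color", children, word_vecs, regions)
print(answer_feature)                            # composed evidence at the root
```

In the full model, the root's composed evidence would feed an answer classifier, and the per-node attention maps are what make the intermediate reasoning steps inspectable.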

Related research

03/31/2018 — Visual Question Reasoning on General Dependency Tree
03/23/2020 — Linguistically Driven Graph Capsule Network for Visual Question Reasoning
09/07/2023 — Interpretable Visual Question Answering via Reasoning Supervision
05/15/2021 — Show Why the Answer is Correct! Towards Explainable AI using Compositional Temporal Attention
09/17/2023 — Syntax Tree Constrained Graph Network for Visual Question Answering
06/06/2018 — Progressive Reasoning by Module Composition
03/14/2018 — Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning
