Learning Conditioned Graph Structures for Interpretable Visual Question Answering

06/19/2018
by   Will Norcliffe-Brown, et al.

Visual Question Answering is a challenging problem requiring a combination of concepts from Computer Vision and Natural Language Processing. Most existing approaches use a two-stream strategy, computing image and question features that are subsequently merged using a variety of techniques. Nonetheless, very few rely on higher-level image representations, which can capture semantic and spatial relationships. In this paper, we propose a novel graph-based approach for Visual Question Answering. Our method combines a graph learner module, which learns a question-specific graph representation of the input image, with the recent concept of graph convolutions, aiming to learn image representations that capture question-specific interactions. We test our approach on the VQA v2 dataset using a simple baseline architecture enhanced by the proposed graph learner module. We obtain state-of-the-art results with 65.77% accuracy and demonstrate the interpretability of the proposed method.
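To make the idea of a question-conditioned graph concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the operators, so the linear joint embedding, the dot-product edge scores, the top-k neighbor selection, the softmax normalization, and the neighbor-averaging convolution are all illustrative assumptions, as are the names learn_graph, graph_conv, W_joint, and W_out. Weights are random rather than trained.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def learn_graph(regions, question, W_joint, k):
    """Build a sparse, row-normalized adjacency conditioned on the question.

    Assumption: each region feature is concatenated with the question
    embedding and passed through one hypothetical linear layer + ReLU.
    """
    joint = [[max(0.0, v) for v in matvec(W_joint, r + question)] for r in regions]
    n = len(regions)
    adjacency = []
    for i in range(n):
        # Edge score = dot product of joint embeddings (an assumed choice).
        scores = [sum(a * b for a, b in zip(joint[i], joint[j])) for j in range(n)]
        # Keep only the k highest-scoring neighbors of node i.
        top = sorted(range(n), key=lambda j: scores[j], reverse=True)[:k]
        # Softmax over the kept neighbors so the weights sum to 1.
        peak = max(scores[j] for j in top)
        exps = {j: math.exp(scores[j] - peak) for j in top}
        total = sum(exps.values())
        adjacency.append({j: e / total for j, e in exps.items()})
    return adjacency


def graph_conv(regions, adjacency, W_out):
    """One simple graph convolution: weighted neighbor average, linear map, ReLU."""
    dim = len(regions[0])
    out = []
    for neighbors in adjacency:
        agg = [sum(w * regions[j][c] for j, w in neighbors.items()) for c in range(dim)]
        out.append([max(0.0, v) for v in matvec(W_out, agg)])
    return out


# Toy sizes (hypothetical): 5 image regions, 4-d region features,
# 3-d question embedding, 6-d hidden size, 2 neighbors per node.
random.seed(0)
N, D, DQ, DH, K = 5, 4, 3, 6, 2
regions = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
question = [random.gauss(0, 1) for _ in range(DQ)]
W_joint = [[random.gauss(0, 1) for _ in range(D + DQ)] for _ in range(DH)]
W_out = [[random.gauss(0, 1) for _ in range(D)] for _ in range(DH)]

adjacency = learn_graph(regions, question, W_joint, K)
features = graph_conv(regions, adjacency, W_out)
```

Because the edge weights depend on the question embedding, a different question yields a different graph over the same image regions. Inspecting which neighbors each node keeps is one way such a model could be read for interpretability.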


