REXUP: I REason, I EXtract, I UPdate with Structured Compositional Reasoning for Visual Question Answering

07/27/2020
by Siwen Luo, et al.

Visual question answering (VQA) is a challenging multi-modal task that requires not only a semantic understanding of both images and questions, but also a sound, step-by-step reasoning process that leads to the correct answer. So far, most successful approaches to VQA have focused on only one aspect: either the interaction between the visual pixel features of images and the word features of questions, or the reasoning process for answering questions about images containing simple objects. In this paper, we propose a deep reasoning VQA model with explicit visual structure-aware textual information, which captures the step-by-step reasoning process well and detects complex object relationships in photo-realistic images. The REXUP network consists of two branches, one image object-oriented and one scene graph-oriented, which work jointly with a super-diagonal fusion compositional attention network. We evaluate REXUP quantitatively and qualitatively on the GQA dataset and conduct extensive ablation studies to explore the reasons behind its effectiveness. Our best model significantly outperforms the previous state-of-the-art, which delivers 92.7
