Relation-aware Graph Attention Network for Visual Question Answering

03/29/2019
by   Linjie Li, et al.

In order to answer semantically complicated questions about an image, a Visual Question Answering (VQA) model needs to fully understand the visual scene in the image, especially the interactive dynamics between different objects. We propose a Relation-aware Graph Attention Network (ReGAT), which encodes each image into a graph and models multi-type inter-object relations via a graph attention mechanism, to learn question-adaptive relation representations. Two types of visual object relations are explored: (i) Explicit Relations that represent geometric positions and semantic interactions between objects; and (ii) Implicit Relations that capture the hidden dynamics between image regions. Experiments demonstrate that ReGAT outperforms prior state-of-the-art approaches on both the VQA 2.0 and VQA-CP v2 datasets. We further show that ReGAT is compatible with existing VQA architectures and can be used as a generic relation encoder to boost model performance on VQA.
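The core idea of question-adaptive graph attention can be illustrated with a minimal NumPy sketch: condition each object-region node on the question embedding, score every pair of nodes, and aggregate neighbour features by the resulting attention weights. This is an assumption-laden simplification, not the paper's implementation; the function name, the concatenation-based conditioning, and all projection shapes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def question_adaptive_graph_attention(V, q, Wq, Wk, Wv):
    """One graph-attention layer over object regions, conditioned on the question.

    V  : (N, d) region features (graph nodes)
    q  : (d,)   question embedding, appended to every node
    Wq, Wk : (2d, h) projections for attention scores (hypothetical shapes)
    Wv : (d, h) value projection (hypothetical shape)
    Returns (N, h) relation-aware node features.
    """
    N, d = V.shape
    # Make the relation scores question-adaptive by concatenating q onto each node
    Vq = np.concatenate([V, np.tile(q, (N, 1))], axis=1)  # (N, 2d)
    Q = Vq @ Wq                                           # (N, h)
    K = Vq @ Wk                                           # (N, h)
    scores = Q @ K.T / np.sqrt(Q.shape[1])                # (N, N) pairwise relations
    alpha = softmax(scores, axis=1)                       # attend over all other regions
    return alpha @ (V @ Wv)                               # aggregate neighbour features
```

In the full model, separate heads of this form would handle explicit (geometric/semantic) and implicit relations, with explicit heads additionally masking or biasing `scores` by relation labels; the sketch shows only the shared attention mechanics.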
