REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering

06/02/2022
by   Yuanze Lin, et al.

This paper revisits visual representation in knowledge-based visual question answering (VQA) and demonstrates that using regional information in a better way can significantly improve the performance. While visual representation is extensively studied in traditional VQA, it is under-explored in knowledge-based VQA even though the two tasks share a common spirit, i.e., both rely on visual input to answer the question. Specifically, we observe that in most state-of-the-art knowledge-based VQA methods: 1) visual features are extracted either from the whole image or in a sliding-window manner for retrieving knowledge, and the important relationships within/among object regions are neglected; 2) visual features are not well utilized in the final answering model, which is counter-intuitive to some extent. Based on these observations, we propose a new knowledge-based VQA method, REVIVE, which tries to utilize the explicit information of object regions not only in the knowledge retrieval stage but also in the answering model. The key motivation is that object regions and their inherent relationships are important for knowledge-based VQA. We perform extensive experiments on the standard OK-VQA dataset and achieve new state-of-the-art performance, i.e., 58.0% accuracy, surpassing the previous state-of-the-art method by a large margin (+3.6%). We also conduct detailed analysis and show the necessity of regional information in different framework components for knowledge-based VQA.
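The abstract contrasts retrieval driven by a single whole-image feature with retrieval driven by per-region features. A minimal sketch of that idea, assuming a detector that yields one feature vector per object region (all names, shapes, and the random features here are illustrative, not the paper's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)


def detect_regions(image, num_regions=4, dim=8):
    """Stand-in for an object detector: returns one feature vector per region.
    A real system would use detector outputs (e.g., region embeddings)."""
    return rng.normal(size=(num_regions, dim))


def retrieve_knowledge(region_feats, knowledge_feats, top_k=2):
    """Score each knowledge entry by cosine similarity against its
    best-matching object region, instead of a single whole-image vector."""
    r = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    k = knowledge_feats / np.linalg.norm(knowledge_feats, axis=1, keepdims=True)
    scores = (r @ k.T).max(axis=0)  # best region match per knowledge entry
    return np.argsort(-scores)[:top_k]


regions = detect_regions(image=None)               # (4, 8) region features
knowledge = rng.normal(size=(10, 8))               # 10 candidate entries
top = retrieve_knowledge(regions, knowledge)
print(top)  # indices of the top-2 retrieved knowledge entries
```

The design point is the `max` over regions: a knowledge entry is kept if it matches *any* object region well, so small but question-relevant objects are not averaged away as they would be in a single global image feature.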



research
03/01/2019

Answer Them All! Toward Universal Visual Question Answering Models

Visual Question Answering (VQA) research is split into two camps: the fi...
research
10/18/2022

Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering

Most Outside-Knowledge Visual Question Answering (OK-VQA) systems employ...
research
10/29/2019

Learning Rich Image Region Representation for Visual Question Answering

We propose to boost VQA by leveraging more powerful feature extractors b...
research
09/17/2019

Inverse Visual Question Answering with Multi-Level Attentions

In this paper, we propose a novel deep multi-level attention model to ad...
research
11/11/2022

MF2-MVQA: A Multi-stage Feature Fusion method for Medical Visual Question Answering

There is a key problem in the medical visual question answering task tha...
research
03/09/2023

VQA-based Robotic State Recognition Optimized with Genetic Algorithm

State recognition of objects and environment in robots has been conducte...
research
06/04/2022

From Pixels to Objects: Cubic Visual Attention for Visual Question Answering

Recently, attention-based Visual Question Answering (VQA) has achieved g...
