Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering

10/18/2022
by Jialin Wu, et al.

Most Outside-Knowledge Visual Question Answering (OK-VQA) systems employ a two-stage framework that first retrieves external knowledge given the visual question and then predicts the answer from the retrieved content. However, the retrieved knowledge is often inadequate: retrievals are frequently too general and fail to cover the specific knowledge needed to answer the question. In addition, the naturally available supervision (whether a passage contains the correct answer) is weak and does not guarantee that the passage is relevant to the question. To address these issues, we propose an Entity-Focused Retrieval (EnFoRe) model that provides stronger supervision during training and recognizes question-relevant entities to help retrieve more specific knowledge. Experiments show that our EnFoRe model achieves superior retrieval performance on OK-VQA, currently the largest outside-knowledge VQA dataset. We also combine the retrieved knowledge with state-of-the-art VQA models and achieve new state-of-the-art performance on OK-VQA.
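To make the weak-supervision issue concrete, here is a minimal Python sketch of the "passage contains the correct answer" labeling rule mentioned above. The function name `weak_label`, the substring-matching rule, and the toy passages are assumptions made for illustration. They are not taken from the paper and do not represent the EnFoRe training procedure.

```python
def weak_label(passage: str, answers: list[str]) -> bool:
    """Hypothetical weak label: positive if the passage contains any gold answer string."""
    text = passage.lower()
    return any(answer.lower() in text for answer in answers)


# Toy data (assumed): imagine the question "What fruit is the monkey eating?"
# with the gold answer "banana".
answers = ["banana"]
passages = [
    "The banana is a fruit that monkeys and other primates readily eat.",
    "The city hosted a banana import trade fair last spring.",
    "Early tennis rackets were made of wood.",
]

labels = [weak_label(passage, answers) for passage in passages]
for passage, label in zip(passages, labels):
    print(label, "|", passage)
```

The second passage is labeled positive because it contains the answer string, even though it says nothing useful about the question. This is the sense in which answer-containment supervision does not guarantee question relevance.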
