Retrieval Augmented Visual Question Answering with Outside Knowledge

10/07/2022
by Weizhe Lin, et al.

Outside-Knowledge Visual Question Answering (OK-VQA) is a challenging VQA task that requires retrieving external knowledge to answer questions about images. Recent OK-VQA systems use Dense Passage Retrieval (DPR) to retrieve documents from external knowledge bases such as Wikipedia, but they train DPR separately from answer generation, which potentially limits overall system performance. Instead, we propose a joint training scheme that integrates a differentiable DPR with answer generation so that the system can be trained end to end. Our experiments show that our scheme outperforms recent OK-VQA systems that use strong DPR for retrieval. We also introduce new diagnostic metrics to analyze how retrieval and generation interact. The strong retrieval ability of our model significantly reduces the number of retrieved documents needed in training, which improves answer quality and lowers the computation required for training.
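
To make the idea of training retrieval and answer generation together more concrete, the following is a minimal sketch of one common way to couple the two: a marginal likelihood over retrieved documents, in which retrieval scores are inner products between query and document embeddings. This is an illustrative assumption, not necessarily the objective used in the paper; the function names (`softmax`, `dot`, `joint_nll`) and the toy vectors are hypothetical, and the per-document answer log-probabilities stand in for the output of an answer generator.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product, used here as the dense retrieval score."""
    return sum(a * b for a, b in zip(u, v))


def joint_nll(query_vec, doc_vecs, answer_logprobs):
    """Negative log of sum_k p(doc_k | query) * p(answer | query, doc_k).

    Assumption: answer_logprobs[k] is the generator's log-probability of
    the gold answer when conditioned on document k. Because the retrieval
    distribution appears inside the loss, a gradient-based trainer could
    update the retriever and the generator from the same signal.
    """
    retrieval_probs = softmax([dot(query_vec, d) for d in doc_vecs])
    marginal = sum(
        p * math.exp(lp) for p, lp in zip(retrieval_probs, answer_logprobs)
    )
    return -math.log(marginal)


# Toy example: document 0 supports the answer, document 1 does not.
docs = [[2.0, 0.0], [0.0, 2.0]]
answer_logprobs = [math.log(0.9), math.log(0.1)]

# A query embedding aligned with the helpful document gives a lower loss
# than one aligned with the unhelpful document.
loss_good = joint_nll([1.0, 0.0], docs, answer_logprobs)
loss_bad = joint_nll([0.0, 1.0], docs, answer_logprobs)
print(loss_good < loss_bad)
```

In this sketch the loss falls when the retriever puts more probability on documents that help the generator produce the answer, which is the kind of interaction between retrieval and generation that the abstract describes.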

research
10/18/2022

Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering

Most Outside-Knowledge Visual Question Answering (OK-VQA) systems employ...
research
06/28/2023

Pre-Training Multi-Modal Dense Retrievers for Outside-Knowledge Visual Question Answering

This paper studies a category of visual question answering tasks, in whi...
research
03/03/2023

Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering

Knowledge-based visual question answering (VQA) requires external knowle...
research
09/10/2021

An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA

Knowledge-based visual question answering (VQA) involves answering quest...
research
04/17/2021

Joint Passage Ranking for Diverse Multi-Answer Retrieval

We study multi-answer retrieval, an under-explored problem that requires...
research
04/26/2023

A Symmetric Dual Encoding Dense Retrieval Framework for Knowledge-Intensive Visual Question Answering

Knowledge-Intensive Visual Question Answering (KI-VQA) refers to answeri...
research
12/16/2016

The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions

One of the most intriguing features of the Visual Question Answering (VQ...
