Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering

03/03/2023
by Zhenwei Shao, et al.

Knowledge-based visual question answering (VQA) requires external knowledge beyond the image to answer the question. Early studies retrieve the required knowledge from explicit knowledge bases (KBs), which often introduces information irrelevant to the question and thus restricts model performance. Recent works have sought to use a large language model (e.g., GPT-3) as an implicit knowledge engine to acquire the knowledge needed for answering. Despite the encouraging results achieved by these methods, we argue that they have not fully activated the capacity of GPT-3 because the provided input information is insufficient. In this paper, we present Prophet, a conceptually simple framework designed to prompt GPT-3 with answer heuristics for knowledge-based VQA. Specifically, we first train a vanilla VQA model on a specific knowledge-based VQA dataset without external knowledge. After that, we extract two types of complementary answer heuristics from the model: answer candidates and answer-aware examples. Finally, the two types of answer heuristics are encoded into the prompts to enable GPT-3 to better comprehend the task, thus enhancing its capacity. Prophet significantly outperforms all existing state-of-the-art methods on two challenging knowledge-based VQA datasets, OK-VQA and A-OKVQA, delivering 61.1 [...] testing sets, respectively.
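The pipeline described in the abstract (candidates from a trained VQA model, answer-aware examples, then a prompt that encodes both) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the `Example` type, the use of a text caption to stand in for the image, the choice of cosine similarity over model feature vectors to pick examples, and the prompt template itself. The abstract does not specify any of these details.

```python
import math
from dataclasses import dataclass


@dataclass
class Example:
    """One VQA sample. `caption` is a hypothetical text stand-in for the image."""
    caption: str
    question: str
    candidates: list      # [(answer, confidence), ...] from the VQA model
    answer: str = ""      # ground truth; left empty for the test input
    feature: tuple = ()   # hypothetical feature vector from the VQA model


def top_k_candidates(scores, k):
    """Softmax raw answer scores {answer: logit} and keep the k most confident."""
    peak = max(scores.values())
    exp = {a: math.exp(s - peak) for a, s in scores.items()}
    total = sum(exp.values())
    ranked = sorted(exp.items(), key=lambda kv: kv[1], reverse=True)
    return [(a, round(e / total, 2)) for a, e in ranked[:k]]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_examples(test, pool, n):
    """Pick the n training samples whose features are closest to the test input.

    Assumption: similarity in the VQA model's feature space is used as the
    notion of "answer-aware"; the abstract does not define the criterion.
    """
    return sorted(pool, key=lambda ex: cosine(test.feature, ex.feature),
                  reverse=True)[:n]


def format_block(ex):
    """Render one sample; the test input ends with a bare 'Answer:' to complete."""
    cands = ", ".join(f"{a} ({c:.2f})" for a, c in ex.candidates)
    return (f"Context: {ex.caption}\n"
            f"Question: {ex.question}\n"
            f"Candidates: {cands}\n"
            f"Answer: {ex.answer}").rstrip()


def build_prompt(test, pool, n_examples=2):
    """Assemble header, selected in-context examples, and the test input."""
    header = ("Please answer the question according to the context "
              "and the candidate answers.")
    blocks = [format_block(ex) for ex in select_examples(test, pool, n_examples)]
    blocks.append(format_block(test))
    return header + "\n\n" + "\n\n".join(blocks)


# Toy data, invented purely to exercise the functions above.
pool = [
    Example("a man riding a wave", "What sport is this?",
            [("surfing", 0.91), ("skiing", 0.05)], "surfing", (1.0, 0.0, 0.0)),
    Example("a plate of pasta", "What cuisine is this?",
            [("italian", 0.80), ("french", 0.10)], "italian", (0.0, 1.0, 0.0)),
    Example("a person on a snowboard", "What sport is this?",
            [("snowboarding", 0.77), ("skiing", 0.20)], "snowboarding",
            (0.8, 0.2, 0.1)),
]
test = Example("a woman holding a racket", "What sport is this?",
               top_k_candidates({"tennis": 3.0, "badminton": 1.5, "golf": 0.0}, 2),
               feature=(0.9, 0.1, 0.0))
prompt = build_prompt(test, pool)
print(prompt)
```

The resulting string would then be sent to a language model as a completion prompt. Showing confidence scores next to each candidate is one possible design choice: it lets the language model weigh the VQA model's guesses rather than treat them as equally likely.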

research
05/31/2019

OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge

Visual Question Answering (VQA) in its ideal form lets us study reasonin...
research
08/30/2023

Prompting Vision Language Model with Knowledge from Large Language Model for Knowledge-Based VQA

Knowledge-based visual question answering is a very challenging and wide...
research
09/10/2021

An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA

Knowledge-based visual question answering (VQA) involves answering quest...
research
06/30/2022

A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA

Knowledge-based Visual Question Answering (VQA) expects models to rely o...
research
10/18/2022

Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering

Most Outside-Knowledge Visual Question Answering (OK-VQA) systems employ...
research
10/07/2022

Retrieval Augmented Visual Question Answering with Outside Knowledge

Outside-Knowledge Visual Question Answering (OK-VQA) is a challenging VQ...
research
05/30/2023

Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge

The open-ended Visual Question Answering (VQA) task requires AI models t...
