Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering

06/16/2023
by Rabiul Awal, et al.

Visual question answering (VQA) is a challenging task that requires the ability to comprehend and reason with visual information. While recent vision-language models have made strides, they continue to struggle with zero-shot VQA, particularly with complex compositional questions and with adapting to new domains, such as those requiring knowledge-based reasoning. This paper explores the use of various prompting strategies, focusing on the BLIP2 model, to enhance zero-shot VQA performance. We conduct a comprehensive investigation across several VQA datasets, examining the effectiveness of different question templates, the role of few-shot exemplars, the impact of chain-of-thought (CoT) reasoning, and the benefits of incorporating image captions as additional visual cues. Although outcomes vary across settings, our findings show that carefully designed question templates and additional visual cues, such as image captions, can improve VQA performance, especially when combined with few-shot exemplars. However, we also identify a limitation of chain-of-thought rationalization, which reduces VQA accuracy. Our study thus provides critical insights into the potential of prompting for improving zero-shot VQA performance.
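The prompting strategies studied here can be made concrete with a small example. The following is a minimal sketch, not the paper's implementation: it assumes the Hugging Face transformers BLIP-2 checkpoint Salesforce/blip2-flan-t5-xl, and the question template, caption prefix, and few-shot exemplars are illustrative placeholders rather than the exact prompts evaluated in the paper.

```python
# Sketch of caption-augmented, few-shot prompting for zero-shot VQA with BLIP-2.
# The template wording and exemplars are illustrative assumptions.
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

MODEL_NAME = "Salesforce/blip2-flan-t5-xl"  # assumed checkpoint
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

processor = Blip2Processor.from_pretrained(MODEL_NAME)
model = Blip2ForConditionalGeneration.from_pretrained(
    MODEL_NAME, torch_dtype=dtype
).to(device)

def build_prompt(question, caption=None, exemplars=()):
    """Compose a VQA prompt from few-shot exemplars, an optional image
    caption (extra visual cue), and a short-answer question template."""
    parts = []
    for ex_q, ex_a in exemplars:                          # few-shot exemplars
        parts.append(f"Question: {ex_q} Short answer: {ex_a}")
    if caption:                                           # caption as visual cue
        parts.append(f"Image caption: {caption}")
    parts.append(f"Question: {question} Short answer:")   # question template
    return "\n".join(parts)

def answer(image: Image.Image, question, caption=None, exemplars=()):
    prompt = build_prompt(question, caption, exemplars)
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(device, dtype)
    out = model.generate(**inputs, max_new_tokens=10)
    return processor.batch_decode(out, skip_special_tokens=True)[0].strip()

# Hypothetical usage:
# img = Image.open("kitchen.jpg")
# print(answer(img, "What is on the counter?",
#              caption="A kitchen counter with a bowl of fruit.",
#              exemplars=[("What color is the wall?", "white")]))
```

Swapping the short-answer template, dropping the caption line, or varying the number of exemplars reproduces the kinds of prompt ablations the paper compares.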
