Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions?

02/23/2023
by Yang Chen, et al.

Large language models have demonstrated an emergent capability to answer knowledge-intensive questions. With recent progress on web-scale visual and language pre-training, do these models also understand how to answer visual information-seeking questions? To answer this question, we present InfoSeek, a Visual Question Answering dataset that focuses on information-seeking questions that cannot be answered with common sense knowledge. We perform a multi-stage human annotation to collect a natural distribution of high-quality visual information-seeking question-answer pairs. We also construct a large-scale, automatically collected dataset by combining existing visual entity recognition datasets and Wikidata, which provides over one million examples for model fine-tuning and validation. Using InfoSeek, we analyze various pre-trained Visual QA systems to gain insights into the characteristics of different pre-trained models. Our analysis shows that answering visual information-seeking questions is challenging for state-of-the-art multi-modal pre-trained models, but that this capability improves through fine-tuning on the automated InfoSeek dataset. We hope our analysis paves the way toward understanding and developing the next generation of multi-modal pre-training.


