Zero-Shot Transfer VQA Dataset

11/02/2018
by Yuanpeng Li, et al.

Acquiring a large vocabulary is an important aspect of human intelligence. One common way humans expand their vocabulary is to learn words while reading or listening, and then use them in writing or speaking. This ability to transfer from input to output is natural for humans, but difficult for machines. Humans spontaneously perform this knowledge transfer in complex multimodal tasks such as Visual Question Answering (VQA). To approach human-level artificial intelligence, we hope to equip machines with this ability. To accelerate this research, we therefore propose a new zero-shot transfer VQA (ZST-VQA) dataset, built by reorganizing the existing VQA v1.0 dataset so that during training, some words appear only in one module (i.e., questions) but not in the other (i.e., answers). In this setting, an intelligent model should understand and learn concepts from one module (questions) and, at test time, transfer them to the other (predict those concepts as answers). We evaluate three existing state-of-the-art VQA neural models on this new dataset. Experimental results show a significant drop in performance, indicating that existing methods do not address the zero-shot transfer problem. Our analysis further suggests that this may be caused by implicit bias learned during training.
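The reorganization described above can be sketched in a few lines. The following is an illustrative sketch, not the authors' actual split procedure: given question-answer pairs and a chosen set of held-out words, pairs whose answer contains a held-out word go to the test split, so those words can appear in training questions but never in training answers. The function name `zero_shot_split`, the word-overlap criterion, and the toy examples are assumptions for illustration.

```python
def zero_shot_split(qa_pairs, held_out_words):
    """Split (question, answer) pairs so that held-out words may appear
    in training questions but only ever in test answers.

    qa_pairs: list of (question, answer) strings.
    Returns (train, test) lists of pairs.
    """
    held_out = {w.lower() for w in held_out_words}
    train, test = [], []
    for question, answer in qa_pairs:
        answer_words = set(answer.lower().split())
        if answer_words & held_out:
            # Answer uses a held-out word: model must produce it zero-shot.
            test.append((question, answer))
        else:
            # Question text may still mention held-out words; that is the point.
            train.append((question, answer))
    return train, test

pairs = [
    ("What color is the banana?", "yellow"),
    ("Is the yellow car parked?", "yes"),
    ("What animal is shown?", "dog"),
]
train, test = zero_shot_split(pairs, {"yellow"})
# "yellow" appears in a training question but only in a test answer.
```

A model trained on `train` sees "yellow" as a question word only; succeeding on `test` requires transferring that concept from the question vocabulary to the answer vocabulary.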

