A Survey on VQA: Datasets and Approaches

05/02/2021
by Yeyun Zou, et al.

Visual question answering (VQA) is a task that combines techniques from computer vision and natural language processing. It requires models to answer a text-based question based on the information contained in an image. In recent years, the VQA research field has expanded: work examining reasoning ability and VQA on scientific diagrams has received growing attention, and more multimodal feature fusion mechanisms have been proposed. This paper reviews and analyzes existing datasets, metrics, and models proposed for the VQA task.
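To make the task concrete, a common baseline formulation fuses an image feature vector with a question embedding and treats answering as classification over a fixed answer vocabulary. The sketch below illustrates one simple fusion mechanism (element-wise product of projected features); the dimensions, weights, and function names are illustrative assumptions, not taken from the surveyed paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 2048-d image feature (e.g. from a CNN),
# 512-d question embedding, 1000 candidate answers.
IMG_DIM, Q_DIM, HID, N_ANSWERS = 2048, 512, 256, 1000

# Random projections stand in for learned parameters.
W_img = rng.standard_normal((IMG_DIM, HID)) * 0.01
W_q = rng.standard_normal((Q_DIM, HID)) * 0.01
W_out = rng.standard_normal((HID, N_ANSWERS)) * 0.01

def fuse_and_classify(img_feat, q_feat):
    """Element-wise-product fusion followed by a linear answer classifier."""
    h = np.tanh(img_feat @ W_img) * np.tanh(q_feat @ W_q)  # joint embedding
    logits = h @ W_out
    return int(np.argmax(logits))  # index of the predicted answer

img_feat = rng.standard_normal(IMG_DIM)   # stand-in image feature
q_feat = rng.standard_normal(Q_DIM)       # stand-in question embedding
answer_id = fuse_and_classify(img_feat, q_feat)
```

More elaborate fusion mechanisms (bilinear pooling, co-attention, transformer-based joint encoders) replace the element-wise product, but the overall shape of the pipeline, encode each modality, fuse, classify, is the same.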


Related research

10/05/2016 | Visual Question Answering: Datasets, Algorithms, and Future Challenges
Visual Question Answering (VQA) is a recent problem in computer vision a...

06/03/2018 | On the Flip Side: Identifying Counterexamples in Visual Question Answering
Visual question answering (VQA) models respond to open-ended natural lan...

03/01/2019 | Answer Them All! Toward Universal Visual Question Answering Models
Visual Question Answering (VQA) research is split into two camps: the fi...

01/15/2021 | Recent Advances in Video Question Answering: A Review of Datasets and Methods
Video Question Answering (VQA) is a recent emerging challenging task in ...

02/12/2020 | Component Analysis for Visual Question Answering Architectures
Recent research advances in Computer Vision and Natural Language Process...

11/29/2018 | Visual Question Answering as Reading Comprehension
Visual question answering (VQA) demands simultaneous comprehension of bo...

09/24/2021 | How to find a good image-text embedding for remote sensing visual question answering?
Visual question answering (VQA) has recently been introduced to remote s...
