Survey of Recent Advances in Visual Question Answering

09/24/2017
by   Supriya Pandhre, et al.

Visual Question Answering (VQA) presents a unique challenge because it requires understanding and encoding multi-modal inputs, which draws on both image processing and natural language processing. The algorithm must also learn to reason over this multi-modal representation in order to answer questions correctly. This paper surveys the approaches proposed to solve the VQA problem. We also describe the current state-of-the-art model in a later part of the paper. In particular, the paper describes how various algorithms extract image features and text features, and how these features are used to predict answers. We also briefly discuss the experiments performed to evaluate VQA models and report their performance on diverse datasets, including the newly released VQA2.0 [8].


research
10/05/2016

Visual Question Answering: Datasets, Algorithms, and Future Challenges

Visual Question Answering (VQA) is a recent problem in computer vision a...
research
01/29/2018

Object-based reasoning in VQA

Visual Question Answering (VQA) is a novel problem domain where multi-mo...
research
08/10/2022

CLEVR-Math: A Dataset for Compositional Language, Visual and Mathematical Reasoning

We introduce CLEVR-Math, a multi-modal math word problems dataset consis...
research
04/11/2017

Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering

This paper presents a new baseline for visual question answering task. G...
research
06/01/2023

Evaluating the Capabilities of Multi-modal Reasoning Models with Synthetic Task Data

The impressive advances and applications of large language and joint lan...
research
03/14/2022

ScienceWorld: Is your Agent Smarter than a 5th Grader?

This paper presents a new benchmark, ScienceWorld, to test agents' scien...
research
05/18/2023

Visual Question Answering: A Survey on Techniques and Common Trends in Recent Literature

Visual Question Answering (VQA) is an emerging area of interest for rese...
