Object-based reasoning in VQA

01/29/2018
by Mikyas T. Desta, et al.

Visual Question Answering (VQA) is a novel problem domain in which multi-modal inputs must be processed to solve a task posed in natural language. Because solutions inherently require combining visual and natural language processing with abstract reasoning, the problem is considered AI-complete. Recent advances indicate that using high-level, abstract facts extracted from the inputs might facilitate reasoning. Following that direction, we developed a solution that combines state-of-the-art object detection and reasoning modules. The results, achieved on the well-balanced CLEVR dataset, confirm this promise and show a significant accuracy improvement of a few percent on the complex "counting" task.

research
09/24/2017

Survey of Recent Advances in Visual Question Answering

Visual Question Answering (VQA) presents a unique challenge as it requir...
research
06/01/2023

Evaluating the Capabilities of Multi-modal Reasoning Models with Synthetic Task Data

The impressive advances and applications of large language and joint lan...
research
05/23/2023

Image Manipulation via Multi-Hop Instructions – A New Dataset and Weakly-Supervised Neuro-Symbolic Approach

We are interested in image manipulation via natural language text – a ta...
research
04/18/2018

Object Ordering with Bidirectional Matchings for Visual Reasoning

Visual reasoning with compositional natural language instructions, e.g.,...
research
11/16/2015

Yin and Yang: Balancing and Answering Binary Visual Questions

The complex compositional structure of language makes problems at the in...
research
11/29/2022

Abstract Visual Reasoning with Tangram Shapes

We introduce KiloGram, a resource for studying abstract visual reasoning...
research
10/29/2018

TallyQA: Answering Complex Counting Questions

Most counting questions in visual question answering (VQA) datasets are ...
