Understand, Compose and Respond - Answering Visual Questions by a Composition of Abstract Procedures

10/25/2018
by   Ben Zion Vatashsky, et al.

An image-related question defines a specific visual task that must be performed to produce an appropriate answer. The answer may depend on a minor detail in the image and may require complex reasoning and the use of prior knowledge. Humans perform this task in a flexible and robust manner, modularly integrating any novel visual capability with diverse options for elaborating the task. In contrast, current machine approaches cast the problem as end-to-end learning, which lacks such abilities. We present a different approach, inspired by these human capabilities and based on the compositional structure of the question. The underlying idea is that a question has an abstract representation based on its structure, which is compositional in nature. The question can consequently be answered by a composition of procedures corresponding to its substructures. The basic elements of the representation are logical patterns, which are put together to represent the question. These patterns include a parametric representation for object classes, properties and relations. Each basic pattern is mapped into a basic procedure that includes meaningful visual tasks, and the patterns are composed to produce the overall answering procedure. The UnCoRd (Understand, Compose and Respond) system, based on this approach, integrates existing detection and classification schemes for a set of object classes, properties and relations. These schemes are incorporated in a modular manner, providing elaborated answers and corrections for negative answers. In addition, an external knowledge base is queried for required common knowledge. We performed a qualitative analysis of the system, which demonstrates its representation capabilities and provides suggestions for future developments.
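
To make the idea of composing basic procedures from a question pattern more concrete, here is a minimal Python sketch. It is not the UnCoRd implementation: the names `ObjectPattern`, `find_class`, `filter_color` and `answer_existence` are hypothetical, and the scene is assumed to be a plain list of dictionaries standing in for the output of real detectors and classifiers. It only illustrates how a parametric pattern (object class plus property) might map to a chain of procedures that yields an elaborated answer, including a correction when the answer is negative.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: a scene is a list of dicts such as {"class": "car", "color": "blue"}.
# A real system would obtain this information from visual estimators.
Scene = list[dict]


@dataclass
class ObjectPattern:
    """Hypothetical basic pattern: an object class with an optional color property."""
    obj_class: str
    color: Optional[str] = None


def find_class(scene: Scene, obj_class: str) -> list[dict]:
    """Basic procedure: keep the objects that belong to the requested class."""
    return [obj for obj in scene if obj["class"] == obj_class]


def filter_color(objects: list[dict], color: str) -> list[dict]:
    """Basic procedure: keep the objects that have the requested color."""
    return [obj for obj in objects if obj.get("color") == color]


def answer_existence(scene: Scene, pattern: ObjectPattern) -> str:
    """Compose the basic procedures to answer 'Is there a <color> <class>?'."""
    candidates = find_class(scene, pattern.obj_class)
    if not candidates:
        return f"No, there is no {pattern.obj_class}."
    if pattern.color is None:
        return f"Yes, there is a {pattern.obj_class}."
    if filter_color(candidates, pattern.color):
        return f"Yes, there is a {pattern.color} {pattern.obj_class}."
    # Negative answer: report the colors that were actually found, if any.
    seen = sorted({obj["color"] for obj in candidates if "color" in obj})
    if seen:
        return f"No, the {pattern.obj_class} is {' and '.join(seen)}, not {pattern.color}."
    return f"No, there is no {pattern.color} {pattern.obj_class}."


if __name__ == "__main__":
    scene = [{"class": "car", "color": "blue"}, {"class": "dog"}]
    print(answer_existence(scene, ObjectPattern("car", "red")))
```

Because each procedure is a separate function, a new property or relation check could be added in this sketch without changing the others, which loosely mirrors the modularity the abstract describes.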


