Visual Question Answering based on Formal Logic

Visual question answering (VQA) has gained considerable traction in the machine learning community in recent years because of the challenges posed by understanding information from multiple modalities (i.e., images and language). In VQA, a series of questions is posed about a set of images, and the task is to arrive at the answer. To achieve this, we take a symbolic-reasoning-based approach using the framework of formal logic. The image and the questions are converted into symbolic representations on which explicit reasoning is performed. We propose a formal logic framework in which (i) images are converted to logical background facts with the help of scene graphs, (ii) questions are translated to first-order predicate logic clauses using a transformer-based deep learning model, and (iii) satisfiability checks are performed, using the background knowledge and the grounding of predicate clauses, to obtain the answer. Our proposed method is highly interpretable, and each step in the pipeline can be easily analyzed by a human. We validate our approach on the CLEVR and GQA datasets. We achieve near-perfect accuracy of 99.6 that formal logic is a viable tool to tackle visual question answering. Our model is also data efficient, achieving 99.1 trained on just 10
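
The three steps in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the facts, the predicate names (`red`, `cube`, `left_of`), the object identifiers, and the helpers `groundings`, `exists`, and `count` are all hypothetical, and the query clauses are written by hand where the paper uses a transformer-based model. The satisfiability check here is a brute-force search over groundings, chosen only for clarity.

```python
from itertools import product

# Hypothetical background facts, as a scene graph might yield them.
# Each fact is a tuple: (predicate, arg1) or (predicate, arg1, arg2).
FACTS = {
    ("cube", "o1"), ("red", "o1"),
    ("sphere", "o2"), ("blue", "o2"),
    ("cylinder", "o3"), ("red", "o3"),
    ("left_of", "o1", "o2"),
    ("left_of", "o2", "o3"),
}
OBJECTS = sorted({arg for fact in FACTS for arg in fact[1:]})


def is_variable(term):
    # Convention for this sketch only: variables are upper-case strings.
    return term.isupper()


def groundings(clause):
    """Yield each variable binding under which every literal of the
    conjunctive clause is present in FACTS."""
    variables = sorted({t for lit in clause for t in lit[1:] if is_variable(t)})
    for values in product(OBJECTS, repeat=len(variables)):
        binding = dict(zip(variables, values))
        grounded = [
            (lit[0],) + tuple(binding.get(t, t) for t in lit[1:])
            for lit in clause
        ]
        if all(g in FACTS for g in grounded):
            yield binding


def exists(clause):
    # The clause is satisfiable if at least one grounding exists.
    return next(groundings(clause), None) is not None


def count(clause, target):
    # Number of distinct objects the target variable can be bound to.
    return len({binding[target] for binding in groundings(clause)})


# "Is there a red cube to the left of a sphere?"
q_exists = [("red", "X"), ("cube", "X"), ("left_of", "X", "Y"), ("sphere", "Y")]
# "How many red things are there?"
q_count = [("red", "X")]

print(exists(q_exists))
print(count(q_count, "X"))
```

Because every answer corresponds to an explicit binding of variables to objects, a reader can inspect which facts supported it, which is the kind of step-by-step interpretability the abstract describes.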

research
03/23/2018

Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering

Many vision and language tasks require commonsense reasoning beyond data...
research
05/16/2022

A Neuro-Symbolic ASP Pipeline for Visual Question Answering

We present a neuro-symbolic visual question answering (VQA) pipeline for...
research
02/19/2020

VQA-LOL: Visual Question Answering under the Lens of Logic

Logical connectives and their implications on the meaning of a natural l...
research
05/24/2023

Interpretable by Design Visual Question Answering

Model interpretability has long been a hard problem for the AI community...
research
05/01/2022

Deep Learning with Logical Constraints

In recent years, there has been an increasing interest in exploiting log...
research
12/21/2020

Object-Centric Diagnosis of Visual Reasoning

When answering questions about an image, it not only needs knowing what ...
research
02/18/2023

Bridge Damage Cause Estimation Using Multiple Images Based on Visual Question Answering

In this paper, a bridge member damage cause estimation framework is prop...
