Interpretable by Design Visual Question Answering

05/24/2023
by   XingYu Fu, et al.
0

Model interpretability has long been a hard problem for the AI community especially in the multimodal setting, where vision and language need to be aligned and reasoned at the same time. In this paper, we specifically focus on the problem of Visual Question Answering (VQA). While previous researches try to probe into the network structures of black-box multimodal models, we propose to tackle the problem from a different angle – to treat interpretability as an explicit additional goal. Given an image and question, we argue that an interpretable VQA model should be able to tell what conclusions it can get from which part of the image, and show how each statement help to arrive at an answer. We introduce InterVQA: Interpretable-by-design VQA, where we design an explicit intermediate dynamic reasoning structure for VQA problems and enforce symbolic reasoning that only use the structure for final answer prediction to take place. InterVQA produces high-quality explicit intermediate reasoning steps, while maintaining similar to the state-of-the-art (sota) end-task performance.

READ FULL TEXT
research
11/09/2022

Towards Reasoning-Aware Explainable VQA

The domain of joint vision-language understanding, especially in the con...
research
12/16/2021

KAT: A Knowledge Augmented Transformer for Vision-and-Language

The primary focus of recent work with largescale transformers has been o...
research
03/14/2018

Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning

Visual question answering requires high-order reasoning about an image, ...
research
02/21/2019

Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering

We propose a new class of probabilistic neural-symbolic models, that hav...
research
04/02/2017

Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks

An important goal of computer vision is to build systems that learn visu...
research
11/08/2021

Visual Question Answering based on Formal Logic

Visual question answering (VQA) has been gaining a lot of traction in th...
research
03/27/2023

Curriculum Learning for Compositional Visual Reasoning

Visual Question Answering (VQA) is a complex task requiring large datase...

Please sign up or login with your details

Forgot password? Click here to reset