Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions

10/24/2020
by Radhika Dua, et al.

Visual Question Answering (VQA) is a multi-modal task that aims to measure high-level visual understanding. Contemporary VQA models are restrictive in that answers are obtained via classification over a limited vocabulary (in open-ended VQA) or over a fixed set of multiple-choice answers. In this work, we present a completely generative formulation in which a multi-word answer is generated for a visual query. Taking this a step further, we introduce a new task, ViQAR (Visual Question Answering and Reasoning), in which a model must generate both the complete answer and a rationale that justifies it. We propose an end-to-end architecture to solve this task and describe how to evaluate it. We show, through qualitative and quantitative evaluation as well as a human Turing Test, that our model generates strong answers and rationales.
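To make the contrast in the abstract concrete, the following is a minimal Python sketch of the two answer formulations: picking one answer from a fixed candidate set versus emitting a multi-word answer and a rationale token by token. It is an illustration only, not the authors' architecture. All names here (`classify_answer`, `generate_tokens`, `scripted_decoder`, `viqar_predict`, the `<eos>` marker) are hypothetical, and the "decoder" simply replays a fixed script in place of a trained model.

```python
from typing import Callable, Dict, List

EOS = "<eos>"  # assumed end-of-sequence marker, not taken from the paper


def classify_answer(scores: Dict[str, float]) -> str:
    """Classification-style VQA: choose one answer from a fixed candidate set."""
    return max(scores, key=scores.get)


def generate_tokens(next_token: Callable[[List[str]], str],
                    max_len: int = 20) -> List[str]:
    """Generative formulation: emit tokens one at a time until EOS or max_len."""
    tokens: List[str] = []
    for _ in range(max_len):
        tok = next_token(tokens)
        if tok == EOS:
            break
        tokens.append(tok)
    return tokens


def scripted_decoder(script: List[str]) -> Callable[[List[str]], str]:
    """Hypothetical stand-in for a trained decoder: replays a fixed token script."""
    def step(prefix: List[str]) -> str:
        # Return the next scripted token, or EOS once the script is exhausted.
        return script[len(prefix)] if len(prefix) < len(script) else EOS
    return step


def viqar_predict(answer_step: Callable[[List[str]], str],
                  rationale_step: Callable[[List[str]], str]) -> Dict[str, str]:
    """ViQAR-style output: a free-form answer plus a rationale for it."""
    answer = generate_tokens(answer_step)
    rationale = generate_tokens(rationale_step)
    return {"answer": " ".join(answer), "rationale": " ".join(rationale)}
```

With classification, the output is limited to whichever candidate keys are supplied to `classify_answer`. In the generative sketch, `viqar_predict` returns two strings of arbitrary length (bounded only by `max_len`), which is the shape of output the ViQAR task asks for.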


