TAB-VCR: Tags and Attributes based VCR Baselines

10/31/2019
by Jingxiang Lin, et al.

Reasoning is an important ability that we learn from a very early age. Yet reasoning is extremely hard for algorithms. Despite impressive recent progress on tasks that necessitate reasoning, such as visual question answering and visual dialog, models often exploit biases in datasets. To develop models with better reasoning abilities, the visual commonsense reasoning (VCR) task was recently introduced. Not only do models have to answer questions, they also have to provide a reason for the given answer. The proposed baseline achieved compelling results, leveraging a meticulously designed model composed of LSTM modules and attention nets. Here we show that a much simpler model, obtained by ablating and pruning the existing intricate baseline, can perform better with half the number of trainable parameters. By associating visual features with attribute information and by better text-to-image grounding, we obtain further improvements for our simpler, effective baseline, TAB-VCR. We show that this approach results in a 5.3%, 4.4% and 6.5% absolute improvement over the previous state-of-the-art on question answering, answer justification and holistic VCR.
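The three numbers above refer to the three VCR evaluation settings: question answering (Q→A), answer justification (QA→R), and holistic VCR (Q→AR). In the holistic setting, an example counts as correct only when both the chosen answer and the chosen rationale are right. The following is a minimal sketch of how these accuracies can be computed from per-example multiple-choice predictions. It makes a few assumptions that do not come from the paper. The rationale predictions are assumed to be made conditioned on the ground-truth answer, as in the QA→R setting. The function name `vcr_accuracies` and its argument layout are hypothetical.

```python
from typing import Sequence, Tuple


def vcr_accuracies(
    pred_answers: Sequence[int],
    gold_answers: Sequence[int],
    pred_rationales: Sequence[int],
    gold_rationales: Sequence[int],
) -> Tuple[float, float, float]:
    """Return (Q->A, QA->R, Q->AR) accuracies as fractions in [0, 1].

    Hypothetical helper: each entry is the index of the chosen option
    (VCR questions are multiple choice). pred_rationales are assumed to be
    predicted given the ground-truth answer, as in the QA->R setting.
    """
    n = len(gold_answers)
    if n == 0 or not (
        len(pred_answers) == len(pred_rationales) == len(gold_rationales) == n
    ):
        raise ValueError("all inputs must be non-empty and of equal length")

    # Q->A: the answer choice alone is correct.
    qa_correct = [p == g for p, g in zip(pred_answers, gold_answers)]
    # QA->R: the rationale choice alone is correct.
    r_correct = [p == g for p, g in zip(pred_rationales, gold_rationales)]
    # Q->AR: holistic accuracy, both answer and rationale must be correct.
    joint_correct = [a and r for a, r in zip(qa_correct, r_correct)]

    return sum(qa_correct) / n, sum(r_correct) / n, sum(joint_correct) / n


# Usage: four toy examples.
print(vcr_accuracies([0, 1, 2, 3], [0, 1, 0, 3], [1, 1, 2, 0], [1, 0, 2, 0]))
```

Because the holistic score requires both sub-decisions to be correct, it can never exceed the smaller of the other two accuracies. This is why Q→AR numbers are typically much lower than the Q→A and QA→R numbers.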

research
12/20/2016

CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

When building artificial intelligence systems that can reason and answer...
research
09/07/2023

Interpretable Visual Question Answering via Reasoning Supervision

Transformer-based architectures have recently demonstrated remarkable pe...
research
12/21/2020

Object-Centric Diagnosis of Visual Reasoning

When answering questions about an image, it not only needs knowing what ...
research
02/24/2022

Measuring CLEVRness: Blackbox testing of Visual Reasoning Models

How can we measure the reasoning capabilities of intelligence systems? V...
research
11/12/2018

Blindfold Baselines for Embodied QA

We explore blindfold (question-only) baselines for Embodied Question Ans...
research
10/21/2019

Enforcing Reasoning in Visual Commonsense Reasoning

The task of Visual Commonsense Reasoning is extremely challenging in the...
research
11/17/2016

Answering Image Riddles using Vision and Reasoning through Probabilistic Soft Logic

In this work, we explore a genre of puzzles ("image riddles") which invo...