Shifting the Baseline: Single Modality Performance on Visual Navigation & QA

11/01/2018
by   Jesse Thomason, et al.

Language-and-vision navigation and question answering (QA) are exciting AI tasks situated at the intersection of natural language understanding, computer vision, and robotics. Researchers from all of these fields have begun creating datasets and model architectures for these domains. It is not always clear, however, whether strong performance is due to advances in multimodal reasoning or to models learning to exploit biases and artifacts of the data. We present single-modality models and explore the linguistic, visual, and structural biases of these benchmarks. We find that single-modality models often outperform the published baselines that accompany multimodal task datasets, suggesting a need for change in community best practices. In light of this, we recommend presenting single-modality baselines alongside new multimodal models, so that the information gained from multimodal input can be compared fairly against what dataset biases alone provide.
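As a rough illustration of the recommendation above, the sketch below reports a model's accuracy with full input next to its accuracy when one modality is replaced by zeros. It is a minimal sketch under stated assumptions, not the authors' procedure: the `Example` layout, the `Model` signature, and the function names are all hypothetical, and it covers evaluation only. A single-modality baseline in the sense of the abstract would normally also be trained without the ablated modality.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types for illustration: an example is (vision, language, label),
# with each modality given as a flat list of floats.
Example = Tuple[List[float], List[float], int]
Model = Callable[[List[float], List[float]], int]


def zeros_like(features: Sequence[float]) -> List[float]:
    """Return an all-zero vector of the same length, standing in for a missing modality."""
    return [0.0] * len(features)


def accuracy(model: Model, data: Sequence[Example], ablate: str = "none") -> float:
    """Accuracy of `model` on `data`, optionally zeroing out one modality."""
    correct = 0
    for vision, language, label in data:
        if ablate == "vision":
            vision = zeros_like(vision)
        elif ablate == "language":
            language = zeros_like(language)
        correct += int(model(vision, language) == label)
    return correct / len(data)


def baseline_report(model: Model, data: Sequence[Example]) -> Dict[str, float]:
    """Report full-input accuracy alongside both single-modality ablations."""
    return {
        "full": accuracy(model, data),
        "language_only": accuracy(model, data, ablate="vision"),
        "vision_only": accuracy(model, data, ablate="language"),
    }
```

If the `language_only` or `vision_only` score comes close to the `full` score, that suggests the benchmark can be solved largely from one modality's biases, and the multimodal model's gain should be read against that baseline.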


research
07/07/2020

What Gives the Answer Away? Question Answering Bias Analysis on Video QA Datasets

Question answering biases in video QA datasets can mislead multimodal mo...
research
07/25/2023

MAEA: Multimodal Attribution for Embodied AI

Understanding multimodal perception for embodied AI is an open question ...
research
09/11/2018

The Visual QA Devil in the Details: The Impact of Early Fusion and Batch Norm on CLEVR

Visual QA is a pivotal challenge for higher-level reasoning, requiring u...
research
09/01/2021

WebQA: Multihop and Multimodal QA

Web search is fundamentally multimodal and multihop. Often, even before ...
research
01/22/2020

ManyModalQA: Modality Disambiguation and QA over Diverse Inputs

We present a new multimodal question answering challenge, ManyModalQA, i...
research
12/18/2020

On Modality Bias in the TVQA Dataset

TVQA is a large scale video question answering (video-QA) dataset based ...
research
04/04/2022

On Explaining Multimodal Hateful Meme Detection Models

Hateful meme detection is a new multimodal task that has gained signific...
