Diverse Visuo-Linguistic Question Answering (DVLQA) Challenge

by Shailaja Sampat, et al.

Existing question answering datasets mostly contain homogeneous contexts, based on either textual or visual information alone. Digitalization, however, has changed the nature of reading, which now often involves integrating information across multiple heterogeneous sources. To bridge the gap between the two, we compile a Diverse Visuo-Linguistic Question Answering (DVLQA) challenge corpus, where the task is to perform joint inference over a given image-text pair in a question answering setting. Each dataset item consists of an image and a reading passage, and the questions are designed to combine visual and textual information, i.e., ignoring either modality would make the question unanswerable. We first explore combinations of the best existing deep learning architectures for visual question answering and machine comprehension to solve DVLQA subsets and show that they are unable to reason well on the joint task. We then develop a modular method that demonstrates slightly better baseline performance and offers more transparency for interpreting intermediate outputs. However, it still falls far short of human performance; we therefore believe DVLQA will be a challenging benchmark for question answering that involves reasoning over visuo-linguistic context. The dataset, code, and public leaderboard will be made available at https://github.com/shailaja183/DVLQA.




