CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations

04/05/2022
by Leonard Salewski, et al.

Providing explanations in the context of Visual Question Answering (VQA) presents a fundamental problem in machine learning. To obtain detailed insights into the process of generating natural language explanations for VQA, we introduce the large-scale CLEVR-X dataset that extends the CLEVR dataset with natural language explanations. For each image-question pair in the CLEVR dataset, CLEVR-X contains multiple structured textual explanations which are derived from the original scene graphs. By construction, the CLEVR-X explanations are correct and describe the reasoning and visual information that is necessary to answer a given question. We conducted a user study to confirm that the ground-truth explanations in our proposed dataset are indeed complete and relevant. We present baseline results for generating natural language explanations in the context of VQA using two state-of-the-art frameworks on the CLEVR-X dataset. Furthermore, we provide a detailed analysis of the explanation generation quality for different question and answer types. Additionally, we study the influence of using different numbers of ground-truth explanations on the convergence of natural language generation (NLG) metrics. The CLEVR-X dataset is publicly available at <https://explainableml.github.io/CLEVR-X/>.
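Since each CLEVR-X record pairs an image-question pair with multiple ground-truth explanations, working with the dataset amounts to iterating such records. The sketch below is a minimal illustration of that structure; the field names (`image_filename`, `question`, `answer`, `explanations`) and the JSON layout are assumptions for illustration and may differ from the released files.

```python
import json

# Hypothetical record layout (the released CLEVR-X files may differ):
# each entry pairs a CLEVR question with its answer and a list of
# ground-truth natural language explanations derived from the scene graph.
sample = json.loads("""
[
  {
    "image_filename": "CLEVR_train_000000.png",
    "question": "What color is the large metal sphere?",
    "answer": "gray",
    "explanations": [
      "There is a large gray metal sphere, so the answer is gray.",
      "The only large metal sphere in the scene is gray."
    ]
  }
]
""")

def explanations_for(records, image_filename):
    """Collect all ground-truth explanations attached to one image."""
    return [
        exp
        for rec in records
        if rec["image_filename"] == image_filename
        for exp in rec["explanations"]
    ]

print(len(explanations_for(sample, "CLEVR_train_000000.png")))  # prints 2
```

Having several reference explanations per question is what allows the paper's study of how the number of ground-truth explanations affects the convergence of NLG metrics.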

Related research

- 02/15/2019: Generating Natural Language Explanations for Visual Question Answering using Scene Graphs and Visual Attention
  In this paper, we present a novel approach for the task of eXplainable Q...

- 12/08/2022: Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations
  Natural language explanations promise to offer intuitively understandabl...

- 02/09/2023: Explanation Selection Using Unlabeled Data for In-Context Learning
  Recent work has addressed textual reasoning tasks by prompting large lan...

- 10/04/2022: Affection: Learning Affective Explanations for Real-World Visual Data
  In this work, we explore the emotional reactions that real-world images ...

- 08/17/2023: Uni-NLX: Unifying Textual Explanations for Vision and Vision-Language Tasks
  Natural Language Explanations (NLE) aim at supplementing the prediction ...

- 03/08/2023: Interpretable Visual Question Answering Referring to Outside Knowledge
  We present a novel multimodal interpretable VQA model that can answer th...

- 10/07/2019: Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations
  To increase trust in artificial intelligence systems, a growing amount o...
