e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks

05/08/2021
by Maxime Kayser, et al.

Recently, an increasing number of works have introduced models capable of generating natural language explanations (NLEs) for their predictions on vision-language (VL) tasks. Such models are appealing because they can provide human-friendly and comprehensive explanations. However, there is still a lack of unified evaluation approaches for the explanations generated by these models. Moreover, there are currently only a few datasets of NLEs for VL tasks. In this work, we introduce e-ViL, a benchmark for explainable vision-language tasks that establishes a unified evaluation framework and provides the first comprehensive comparison of existing approaches that generate NLEs for VL tasks. e-ViL spans four models and three datasets. Both automatic metrics and human evaluation are used to assess model-generated explanations. We also introduce e-SNLI-VE, the largest existing VL dataset with NLEs (over 430k instances). Finally, we propose a new model that combines UNITER, which learns joint embeddings of images and text, and GPT-2, a pre-trained language model that is well suited for text generation. It surpasses the previous state of the art by a large margin across all datasets.


research
06/25/2021

Rationale-Inspired Natural Language Explanations with Commonsense

Explainable machine learning models primarily justify predicted labels u...
research
08/17/2023

Uni-NLX: Unifying Textual Explanations for Vision and Vision-Language Tasks

Natural Language Explanations (NLE) aim at supplementing the prediction ...
research
10/07/2019

Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations

To increase trust in artificial intelligence systems, a growing amount o...
research
10/08/2020

Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language?

Data collection for natural language (NL) understanding tasks has increa...
research
12/24/2020

To what extent do human explanations of model behavior align with actual model behavior?

Given the increasingly prominent role NLP models (will) play in our live...
research
12/21/2022

Tracing and Removing Data Errors in Natural Language Generation Datasets

Recent work has identified noisy and misannotated data as a core cause o...
research
09/09/2021

SPECTRA: Sparse Structured Text Rationalization

Selective rationalization aims to produce decisions along with rationale...
