DeepAI
Log In Sign Up

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks

05/08/2021
by   Maxime Kayser, et al.
4

Recently, an increasing number of works have introduced models capable of generating natural language explanations (NLEs) for their predictions on vision-language (VL) tasks. Such models are appealing because they can provide human-friendly and comprehensive explanations. However, there is still a lack of unified evaluation approaches for the explanations generated by these models. Moreover, there are currently only few datasets of NLEs for VL tasks. In this work, we introduce e-ViL, a benchmark for explainable vision-language tasks that establishes a unified evaluation framework and provides the first comprehensive comparison of existing approaches that generate NLEs for VL tasks. e-ViL spans four models and three datasets. Both automatic metrics and human evaluation are used to assess model-generated explanations. We also introduce e-SNLI-VE, the largest existing VL dataset with NLEs (over 430k instances). Finally, we propose a new model that combines UNITER, which learns joint embeddings of images and text, and GPT-2, a pre-trained language model that is well-suited for text generation. It surpasses the previous state-of-the-art by a large margin across all datasets.

READ FULL TEXT

page 2

page 3

page 8

page 14

page 15

page 18

page 23

page 24

06/25/2021

Rationale-Inspired Natural Language Explanations with Commonsense

Explainable machine learning models primarily justify predicted labels u...
10/07/2019

Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations

To increase trust in artificial intelligence systems, a growing amount o...
03/15/2021

A Study of Automatic Metrics for the Evaluation of Natural Language Explanations

As transparency becomes key for robotics and AI, it will be necessary to...
10/08/2020

Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language?

Data collection for natural language (NL) understanding tasks has increa...
11/11/2021

SynthBio: A Case Study in Human-AI Collaborative Curation of Text Datasets

NLP researchers need more, higher-quality text datasets. Human-labeled d...
12/21/2022

Tracing and Removing Data Errors in Natural Language Generation Datasets

Recent work has identified noisy and misannotated data as a core cause o...
09/09/2021

SPECTRA: Sparse Structured Text Rationalization

Selective rationalization aims to produce decisions along with rationale...

Code Repositories