
A Study of Automatic Metrics for the Evaluation of Natural Language Explanations

03/15/2021
by Miruna Clinciu, et al.

As transparency becomes key for robotics and AI, it will be necessary to evaluate the methods through which transparency is provided, including automatically generated natural language (NL) explanations. Here, we explore parallels between the generation of such explanations and the much-studied field of evaluation of Natural Language Generation (NLG). Specifically, we investigate which NLG evaluation measures map well to explanations. We present the ExBAN corpus: a crowd-sourced corpus of NL explanations for Bayesian Networks. We compute correlations between subjective human ratings and automatic NLG measures. We find that embedding-based automatic NLG evaluation methods, such as BERTScore and BLEURT, correlate more strongly with human ratings than word-overlap metrics, such as BLEU and ROUGE. This work has implications for Explainable AI and for transparent robotic and autonomous systems.
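
The comparison described in the abstract, scoring each explanation with an automatic metric and then correlating those scores with human ratings, can be sketched as follows. This is a minimal illustration and not the authors' pipeline. The unigram-overlap F1 is a simplified stand-in for a word-overlap metric such as ROUGE. Spearman rank correlation is assumed here because the abstract does not name the correlation coefficient used. The reference, candidate explanations, and ratings are hypothetical toy values and are not taken from the ExBAN corpus.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 on lowercase whitespace tokens (simplified, ROUGE-1-like)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped count of shared tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical toy data: one reference explanation, three candidates,
    # and made-up human ratings on a 1-7 scale.
    reference = "the alarm rings because a burglary is likely"
    candidates = [
        "the alarm rings because a burglary is likely",
        "a burglary probably caused the alarm to ring",
        "the weather is nice today",
    ]
    human_ratings = [7, 6, 1]

    metric_scores = [unigram_f1(c, reference) for c in candidates]
    print("metric scores:", metric_scores)
    print("Spearman correlation with human ratings:",
          spearman(metric_scores, human_ratings))
```

An embedding-based metric such as BERTScore or BLEURT would take the place of `unigram_f1` in the same loop; the correlation step stays unchanged. With only a handful of items the coefficient is not meaningful, so a real comparison needs a full set of rated explanations.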

08/18/2021

I don't understand! Evaluation Methods for Natural Language Explanations

Explainability of intelligent systems is key for future adoption. While ...
05/08/2021

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks

Recently, an increasing number of works have introduced models capable o...
09/16/2019

Communication-based Evaluation for Natural Language Generation

Natural language generation (NLG) systems are commonly evaluated using n...
04/12/2021

Estimating Subjective Crowd-Evaluations as an Additional Objective to Improve Natural Language Generation

Human ratings are one of the most prevalent methods to evaluate the perf...
03/15/2018

RankME: Reliable Human Ratings for Natural Language Generation

Human evaluation for natural language generation (NLG) often suffers fro...
10/08/2020

Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language?

Data collection for natural language (NL) understanding tasks has increa...
10/02/2020

AI pptX: Robust Continuous Learning for Document Generation with AI Insights

Business analysts create billions of slide decks, reports and documents ...