
How can I choose an explainer? An Application-grounded Evaluation of Post-hoc Explanations

01/21/2021
by Sérgio Jesus, et al.

Several research works have proposed new Explainable AI (XAI) methods designed to generate model explanations with specific properties, or desiderata, such as fidelity, robustness, or human-interpretability. However, explanations are seldom evaluated based on their true practical impact on decision-making tasks. Without that assessment, explanations might be chosen that, in fact, hurt the overall performance of the combined system of ML model + end-users. This study aims to bridge this gap by proposing XAI Test, an application-grounded evaluation methodology tailored to isolate the impact of providing the end-user with different levels of information. We conducted an experiment following XAI Test to evaluate three popular post-hoc explanation methods, LIME, SHAP, and TreeInterpreter, on a real-world fraud detection task, with real data, a deployed ML model, and fraud analysts. During the experiment, we gradually increased the information provided to the fraud analysts in three stages: Data Only, i.e., just the transaction data, without access to the model score or explanations; Data + ML Model Score; and Data + ML Model Score + Explanations. Using strong statistical analysis, we show that, in general, these popular explainers have a worse impact than desired. Key findings include: i) showing Data Only results in the highest decision accuracy and the slowest decision time among all variants tested; ii) all the explainers improve accuracy over the Data + ML Model Score variant, but still result in lower accuracy than Data Only; iii) LIME was the least preferred by users, probably due to the substantially lower variability of its explanations from case to case.
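For context, the three explainers compared in the study are all available as open-source Python packages. The sketch below shows, at a high level, how per-prediction feature attributions might be produced with each of them for a tabular classifier. It is an illustration only: the random forest, synthetic data, feature names, and class labels are stand-ins, not the deployed fraud model or proprietary data used in the paper.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.datasets import make_classification
    from lime.lime_tabular import LimeTabularExplainer
    import shap
    from treeinterpreter import treeinterpreter as ti

    # Synthetic stand-in for the (proprietary) transaction data used in the paper.
    X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
    feature_names = [f"feature_{i}" for i in range(X.shape[1])]
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    instance = X[:1]  # a single "transaction" to explain

    # LIME: fits a local surrogate model around the instance and reports feature weights.
    lime_explainer = LimeTabularExplainer(
        X, feature_names=feature_names, class_names=["legit", "fraud"], mode="classification"
    )
    lime_exp = lime_explainer.explain_instance(instance[0], model.predict_proba, num_features=5)
    print("LIME:", lime_exp.as_list())

    # SHAP (TreeExplainer): Shapley-value attributions computed efficiently for tree ensembles.
    shap_explainer = shap.TreeExplainer(model)
    shap_values = shap_explainer.shap_values(instance)
    print("SHAP:", shap_values)

    # TreeInterpreter: decomposes each prediction into a bias term plus per-feature contributions.
    pred, bias, contributions = ti.predict(model, instance)
    print("TreeInterpreter:", contributions)

All three produce per-feature attributions for a single prediction, which is the kind of explanation shown to the fraud analysts in the Data + ML Model Score + Explanations stage of the experiment.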


Related research

A Human-Grounded Evaluation of SHAP for Alert Processing (07/07/2019)
In the past years, many new explanation methods have been proposed to ac...

The Intriguing Properties of Model Explanations (01/30/2018)
Linear approximations to the decision boundary of a complex model have b...

Challenging common interpretability assumptions in feature attribution explanations (12/04/2020)
As machine learning and algorithmic decision making systems are increasi...

Subgoal-Based Explanations for Unreliable Intelligent Decision Support Systems (01/11/2022)
Intelligent decision support (IDS) systems leverage artificial intellige...

Just in Time: Personal Temporal Insights for Altering Model Decisions (07/08/2020)
The interpretability of complex Machine Learning models is coming to be ...

On the Importance of Application-Grounded Experimental Design for Evaluating Explainable ML Methods (06/24/2022)
Machine Learning (ML) models now inform a wide range of human decisions,...