Log In Sign Up

How can I choose an explainer? An Application-grounded Evaluation of Post-hoc Explanations

by   Sérgio Jesus, et al.

There have been several research works proposing new Explainable AI (XAI) methods designed to generate model explanations having specific properties, or desiderata, such as fidelity, robustness, or human-interpretability. However, explanations are seldom evaluated based on their true practical impact on decision-making tasks. Without that assessment, explanations might be chosen that, in fact, hurt the overall performance of the combined system of ML model + end-users. This study aims to bridge this gap by proposing XAI Test, an application-grounded evaluation methodology tailored to isolate the impact of providing the end-user with different levels of information. We conducted an experiment following XAI Test to evaluate three popular post-hoc explanation methods – LIME, SHAP, and TreeInterpreter – on a real-world fraud detection task, with real data, a deployed ML model, and fraud analysts. During the experiment, we gradually increased the information provided to the fraud analysts in three stages: Data Only, i.e., just transaction data without access to model score nor explanations, Data + ML Model Score, and Data + ML Model Score + Explanations. Using strong statistical analysis, we show that, in general, these popular explainers have a worse impact than desired. Some of the conclusion highlights include: i) showing Data Only results in the highest decision accuracy and the slowest decision time among all variants tested, ii) all the explainers improve accuracy over the Data + ML Model Score variant but still result in lower accuracy when compared with Data Only; iii) LIME was the least preferred by users, probably due to its substantially lower variability of explanations from case to case.


page 1

page 2

page 3

page 4


A Human-Grounded Evaluation of SHAP for Alert Processing

In the past years, many new explanation methods have been proposed to ac...

The Intriguing Properties of Model Explanations

Linear approximations to the decision boundary of a complex model have b...

Challenging common interpretability assumptions in feature attribution explanations

As machine learning and algorithmic decision making systems are increasi...

Subgoal-Based Explanations for Unreliable Intelligent Decision Support Systems

Intelligent decision support (IDS) systems leverage artificial intellige...

Just in Time: Personal Temporal Insights for Altering Model Decisions

The interpretability of complex Machine Learning models is coming to be ...

On the Importance of Application-Grounded Experimental Design for Evaluating Explainable ML Methods

Machine Learning (ML) models now inform a wide range of human decisions,...