Evaluation Criteria for Instance-based Explanation

by Kazuaki Hanawa, et al.

Explaining the predictions of complex machine learning models helps users understand and accept the predicted outputs with confidence. Instance-based explanation provides such help by identifying relevant training instances as evidence supporting a model's prediction. Several relevance metrics have been proposed for finding such instances. In this study, we ask the following research question: "Do these metrics actually work in practice?" To address it, we propose two sanity-check criteria that any valid metric should pass, along with two additional criteria for evaluating the practical utility of the metrics. All criteria are designed around whether a metric can pick out instances with the desirable properties that users expect in practice. Through experiments, we obtained two insights. First, some popular relevance metrics fail the sanity-check criteria. Second, metrics based on cosine similarity outperform the others, making them the recommended choices in practice. We also analyze why some metrics succeed while others do not. We expect our insights to inform further research, such as developing better explanation methods or designing new evaluation criteria.
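As a minimal illustration of the kind of metric the abstract discusses, the sketch below ranks training instances by the cosine similarity of their feature representations to a test instance's representation. This is a generic example, not the paper's exact setup: the function name `cosine_relevance` and the toy vectors are assumptions, and in practice the representations would come from a model's hidden layer or gradients.

```python
import numpy as np

def cosine_relevance(test_repr, train_reprs):
    """Rank training instances by cosine similarity to a test
    instance's representation (e.g. a hidden-layer embedding).

    Returns indices sorted from most to least relevant, plus scores.
    """
    test_unit = test_repr / np.linalg.norm(test_repr)
    train_units = train_reprs / np.linalg.norm(train_reprs, axis=1, keepdims=True)
    scores = train_units @ test_unit          # cosine similarity per instance
    return np.argsort(-scores), scores        # most relevant first

# Toy example: 4 training instances with 3-dimensional representations
train = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.9, 0.1, 0.0],
                  [0.0, 0.0, 1.0]])
test = np.array([1.0, 0.05, 0.0])
order, scores = cosine_relevance(test, train)
```

Here the first two instances returned are the ones nearly parallel to the test representation, which matches the intuition that cosine-based metrics surface training examples that the model represents similarly to the input being explained.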



