Evaluation of FEM and MLFEM AI-explainers in Image Classification tasks with reference-based and no-reference metrics

12/02/2022
by A. Zhukov, et al.

The most popular AI methods and algorithms are, for the most part, black boxes. Black boxes may be acceptable for low-stakes problems, but for the rest this opacity is a serious flaw, and explanation tools for them have therefore been developed rapidly. Evaluating the quality of these explanations remains an open research question. In this technical report, we revisit the recently proposed post-hoc explainers FEM and MLFEM, which were designed to explain CNNs in image and video classification tasks, and we propose their evaluation with reference-based and no-reference metrics. The reference-based metrics are the Pearson Correlation Coefficient and Similarity, computed between the explanation maps and the ground truth, represented by Gaze Fixation Density Maps obtained from a psycho-visual experiment. As a no-reference metric we use the "stability" metric proposed by Alvarez-Melis and Jaakkola. We study its behaviour and its consensus with the reference-based metrics, and show that under several kinds of degradation of the input images this metric agrees with the reference-based ones. It can therefore be used to evaluate the quality of explainers when ground truth is not available.
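
The following is a minimal sketch of how such metrics are commonly computed for saliency-style explanation maps; it is not the authors' implementation, and the names (explanation_map, gfdm, explain_fn) are illustrative placeholders. The "stability" function is an empirical local-Lipschitz estimate in the spirit of Alvarez-Melis and Jaakkola, with perturbation settings chosen here purely for illustration.

```python
import numpy as np


def pearson_cc(explanation_map: np.ndarray, gfdm: np.ndarray) -> float:
    """Pearson Correlation Coefficient between an explanation map and a
    Gaze Fixation Density Map (two 2-D arrays of the same shape)."""
    x = explanation_map.ravel().astype(np.float64)
    y = gfdm.ravel().astype(np.float64)
    x = (x - x.mean()) / (x.std() + 1e-12)
    y = (y - y.mean()) / (y.std() + 1e-12)
    return float(np.mean(x * y))


def similarity(explanation_map: np.ndarray, gfdm: np.ndarray) -> float:
    """SIM (histogram intersection): both maps are normalised to sum to 1,
    then the pixel-wise minimum is accumulated (1 = identical, 0 = disjoint)."""
    x = explanation_map.astype(np.float64)
    y = gfdm.astype(np.float64)
    x = x / (x.sum() + 1e-12)
    y = y / (y.sum() + 1e-12)
    return float(np.minimum(x, y).sum())


def stability(explain_fn, image: np.ndarray, n_samples: int = 16,
              noise_std: float = 0.01, seed: int = 0) -> float:
    """No-reference stability: the largest ratio of explanation change to
    input change over small random perturbations of the input image
    (an empirical local Lipschitz estimate). Lower means more stable."""
    rng = np.random.default_rng(seed)
    base = explain_fn(image)
    worst = 0.0
    for _ in range(n_samples):
        perturbed = image + rng.normal(0.0, noise_std, size=image.shape)
        ratio = (np.linalg.norm(explain_fn(perturbed) - base)
                 / (np.linalg.norm(perturbed - image) + 1e-12))
        worst = max(worst, float(ratio))
    return worst
```

In this sketch, explain_fn would wrap an explainer such as FEM or MLFEM applied to a fixed CNN, returning an explanation map for a given input image; the reference-based scores require a GFDM aligned to the same image resolution, while stability needs no reference at all.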
