Fixing confirmation bias in feature attribution methods via semantic match

07/03/2023
by Giovanni Cinà et al.

Feature attribution methods have become a staple tool for disentangling the complex behavior of black-box models. Despite their success, some scholars have argued that such methods suffer from a serious flaw: they do not allow a reliable interpretation in terms of human concepts. Simply put, visualizing an array of feature contributions is not enough for humans to draw conclusions about a model's internal representations, and confirmation bias can trick users into false beliefs about model behavior. We argue that a structured approach is required to test whether our hypotheses about the model are confirmed by the feature attributions. This is what we call the "semantic match" between human concepts and (sub-symbolic) explanations. Building on the conceptual framework put forward in Cinà et al. [2023], we propose a structured approach to evaluate semantic match in practice. We showcase the procedure in a suite of experiments spanning tabular and image data, and show how the assessment of semantic match can give insight into both desirable (e.g., focusing on an object relevant for prediction) and undesirable model behaviors (e.g., focusing on a spurious correlation). We couple our experimental results with an analysis of the metrics used to measure semantic match, and argue that this approach constitutes the first step towards resolving the issue of confirmation bias in XAI.
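To make the idea of semantic match concrete, here is a minimal sketch of how a hypothesis about model behavior could be tested against feature attributions rather than eyeballed. It assumes the hypothesis is encoded as a binary concept mask (e.g., the pixels covering an object deemed relevant for prediction) and quantifies alignment with the attributions via ROC-AUC; the function name, the metric choice, and the toy data are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def semantic_match_score(attributions, concept_mask):
    """Quantify how well feature attributions align with a human concept.

    attributions : array of per-feature attribution scores (any shape)
    concept_mask : binary array of the same shape, 1 where the hypothesized
                   human concept (e.g., the object of interest) is located

    Returns the ROC-AUC of |attribution| as a ranker for the concept mask:
    ~1.0 means attributions concentrate on the concept,
    ~0.5 means attributions are unrelated to the concept.
    """
    scores = np.abs(np.asarray(attributions)).ravel()
    labels = np.asarray(concept_mask).ravel().astype(int)
    return roc_auc_score(labels, scores)

# Toy example: a saliency map on a 4x4 "image" where the hypothesized
# concept occupies the top-left 2x2 patch.
saliency = np.array([[0.9, 0.8, 0.1, 0.0],
                     [0.7, 0.6, 0.0, 0.1],
                     [0.1, 0.0, 0.2, 0.0],
                     [0.0, 0.1, 0.0, 0.1]])
mask = np.zeros((4, 4), dtype=int)
mask[:2, :2] = 1  # hypothesis: the model attends to this region

print(f"semantic match (AUC): {semantic_match_score(saliency, mask):.2f}")
```

A score near 1.0 would support the hypothesis that the attributions track the concept; a score near 0.5 would suggest they do not, which a visual inspection prone to confirmation bias might miss.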

Related research

Semantic match: Debugging feature attribution methods in XAI for healthcare (01/05/2023)
The recent spike in certified Artificial Intelligence (AI) tools for hea...

Making Sense of Dependence: Efficient Black-box Explanations Using Dependence Measure (06/13/2022)
This paper presents a new efficient black-box attribution method based o...

Evaluating Attribution Methods using White-Box LSTMs (10/16/2020)
Interpretability methods for neural networks are difficult to evaluate b...

Towards Semantic Interpretation of Thoracic Disease and COVID-19 Diagnosis Models (04/04/2021)
Convolutional neural networks are showing promise in the automatic diagn...

Right for the Right Concept: Revising Neuro-Symbolic Concepts by Interacting with their Explanations (11/25/2020)
Most explanation methods in deep learning map importance estimates for a...

Learning Explainable Models Using Attribution Priors (06/25/2019)
Two important topics in deep learning both involve incorporating humans ...
