A Learning-Theoretic Framework for Certified Auditing of Machine Learning Models

06/09/2022
by Chhavi Yadav, et al.

Responsible use of machine learning requires that models be audited for undesirable properties. However, how to perform principled auditing in a general setting remains poorly understood. In this paper, we propose a formal learning-theoretic framework for auditing. We give algorithms for auditing linear classifiers for feature sensitivity using label queries as well as different kinds of explanations, and provide performance guarantees. Our results illustrate that while counterfactual explanations can be extremely helpful for auditing, anchor explanations may not be as beneficial in the worst case.
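To make the setting concrete, below is a minimal, hypothetical sketch (not the paper's actual algorithm) of auditing a linear classifier for feature sensitivity with a simulated counterfactual-explanation oracle. The weights, the `predict` function, and the `counterfactual` oracle are illustrative assumptions: for a linear model, the nearest counterfactual moves the input along the weight vector, so a single counterfactual query reveals which features the model actually uses.

```python
import numpy as np

# Toy linear classifier f(x) = sign(w.x + b); the auditor wants to know
# whether the model is sensitive to feature j, i.e. whether changing
# feature j alone can change the prediction (equivalently, w[j] != 0).

rng = np.random.default_rng(0)
d = 5
w = np.array([1.5, 0.0, -2.0, 0.7, 0.0])  # the model secretly ignores features 1 and 4
b = 0.3

def predict(x):
    return int(np.sign(w @ x + b))

def counterfactual(x):
    # Oracle returning (approximately) the closest point with the opposite
    # label: for a linear model this is the orthogonal projection of x onto
    # the hyperplane w.x + b = 0, nudged slightly across it.
    margin = (w @ x + b) / (w @ w)
    return x - (1.0 + 1e-6) * margin * w

def audit_feature_sensitivity(j, x):
    # If w[j] != 0, the counterfactual moves x along w, so coordinate j
    # changes; otherwise coordinate j stays fixed.
    x_cf = counterfactual(x)
    return not np.isclose(x_cf[j], x[j])

x0 = rng.normal(size=d)
for j in range(d):
    print(f"feature {j}: sensitive = {audit_feature_sensitivity(j, x0)}")
```

With label queries alone, by contrast, the auditor would have to search for an input pair differing only in feature j that receives different labels, which can require many more queries; this gap is the kind of contrast the paper's guarantees formalize.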


Related research

08/02/2019 - Efficient computation of counterfactual explanations of LVQ models
  With the increasing use of machine learning in practice and because of l...

03/25/2023 - Learning with Explanation Constraints
  While supervised learning assumes the presence of labeled data, we may h...

07/13/2023 - On the Connection between Game-Theoretic Feature Attributions and Counterfactual Explanations
  Explainable Artificial Intelligence (XAI) has received widespread intere...

07/13/2018 - Model Reconstruction from Model Explanations
  We show through theory and experiment that gradient-based explanations o...

07/08/2022 - String Diagrams for Layered Explanations
  We propose a categorical framework to reason about scientific explanatio...

06/01/2023 - The Risks of Recourse in Binary Classification
  Algorithmic recourse provides explanations that help users overturn an u...

09/18/2020 - On the Tractability of SHAP Explanations
  SHAP explanations are a popular feature-attribution mechanism for explai...
