Regularizing Black-box Models for Improved Interpretability (HILL 2019 Version)

05/31/2019
by Gregory Plumb, et al.

Most of the work on interpretable machine learning has focused on designing either inherently interpretable models, which typically trade off accuracy for interpretability, or post-hoc explanation systems, which lack guarantees about their explanation quality. We propose an alternative to these approaches by directly regularizing a black-box model for interpretability at training time. Our approach explicitly connects three key aspects of interpretable machine learning: (i) the model's innate explainability, (ii) the explanation system used at test time, and (iii) the metrics that measure explanation quality. Our regularization results in substantial improvement in terms of the explanation fidelity and stability metrics across a range of datasets and black-box explanation systems while slightly improving accuracy. Further, if the resulting model is still not sufficiently interpretable, the weight of the regularization term can be adjusted to achieve the desired trade-off between accuracy and interpretability. Finally, we justify theoretically that the benefits of explanation-based regularization generalize to unseen points.
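To make the idea concrete, a minimal sketch of an explanation-fidelity regularizer is shown below: for each training point, sample a Gaussian neighborhood, fit a local linear surrogate (as a post-hoc explainer like LIME would), and penalize the surrogate's error. The function names, defaults (`sigma`, `n_samples`, `lam`), and the plain NumPy least-squares fit are illustrative assumptions, not the paper's implementation; in practice the penalty would be computed with differentiable operations so it can be minimized jointly with the prediction loss.

```python
import numpy as np

def neighborhood_fidelity(predict, x, sigma=0.1, n_samples=50, rng=None):
    """Point-wise fidelity penalty (hypothetical helper): how poorly a
    local linear surrogate fits `predict` in a Gaussian neighborhood
    of x. Lower is better; zero means locally exactly linear."""
    rng = np.random.default_rng(rng)
    # Sample a neighborhood around x.
    X = x + sigma * rng.standard_normal((n_samples, x.shape[0]))
    y = np.array([predict(xi) for xi in X])
    # Fit a local linear surrogate by least squares (with intercept).
    A = np.hstack([X, np.ones((n_samples, 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    # Penalty: the surrogate's mean squared error on the neighborhood.
    return float(np.mean((A @ coef - y) ** 2))

def regularized_loss(predict, X_train, y_train, lam=0.1):
    """Prediction loss plus lam-weighted mean fidelity penalty.
    Increasing `lam` trades accuracy for interpretability."""
    preds = np.array([predict(xi) for xi in X_train])
    mse = float(np.mean((preds - y_train) ** 2))
    reg = float(np.mean([neighborhood_fidelity(predict, xi, rng=0)
                         for xi in X_train]))
    return mse + lam * reg
```

For a model that is already linear, the fidelity penalty is (numerically) zero, so only the prediction loss remains; for a highly non-linear model the penalty grows, pushing training toward functions that local linear explanations can describe faithfully.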


