Explaining Neural Networks without Access to Training Data

06/10/2022
by Sascha Marton et al.

We consider generating explanations for neural networks in cases where the network's training data is not accessible, for instance due to privacy or safety concerns. Recently, ℐ-Nets have been proposed as a sample-free approach to post-hoc, global model interpretability that does not require access to training data. They formulate interpretation as a machine learning task that maps network representations (parameters) to a representation of an interpretable function. In this paper, we extend the ℐ-Net framework to standard and soft decision trees as surrogate models. We propose a suitable decision tree representation and a corresponding design of the ℐ-Net output layers. Furthermore, we make ℐ-Nets applicable to real-world tasks by considering more realistic distributions when generating the ℐ-Net's training data. We empirically evaluate our approach against traditional global, post-hoc interpretability approaches and show that it achieves superior results when the training data is not accessible.
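The core mapping described in the abstract — from a flattened vector of network parameters to the parameters of an interpretable surrogate — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the heap-indexed gate layout, sigmoid routing, and function names are assumptions chosen to show how an ℐ-Net output vector could be decoded into a soft decision tree.

```python
import numpy as np

def flatten_params(weights):
    """Concatenate a network's weight arrays into one vector.

    This vector would serve as the *input* to an I-Net.
    (Hypothetical helper, not the paper's exact encoding.)
    """
    return np.concatenate([w.ravel() for w in weights])

def decode_soft_tree(theta, depth, n_features):
    """Split an I-Net *output* vector into soft-tree parameters.

    A complete soft tree of the given depth has 2**depth - 1 internal
    nodes, each with a linear gate (n_features weights + 1 bias),
    followed by 2**depth scalar leaf values.
    """
    n_internal = 2 ** depth - 1
    gate_size = n_features + 1
    split = n_internal * gate_size
    gates = theta[:split].reshape(n_internal, gate_size)
    leaves = theta[split: split + 2 ** depth]
    return gates, leaves

def soft_tree_predict(x, gates, leaves, depth):
    """Soft routing: each gate emits P(go right) via a sigmoid, and the
    prediction is the probability-weighted average of the leaf values."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    n_leaves = len(leaves)
    probs = np.empty(n_leaves)
    for leaf in range(n_leaves):
        node, p = 0, 1.0           # heap indexing: children of i are 2i+1, 2i+2
        for level in range(depth):
            bit = (leaf >> (depth - 1 - level)) & 1   # 0 = left, 1 = right
            p_right = sigmoid(gates[node, :-1] @ x + gates[node, -1])
            p *= p_right if bit else (1.0 - p_right)
            node = 2 * node + 1 + bit
        probs[leaf] = p
    return float(probs @ leaves)
```

In this reading, the ℐ-Net itself would be an ordinary regression network trained to map `flatten_params(target_net)` to a vector `theta` that `decode_soft_tree` can interpret; the routing probabilities over the leaves always sum to one, so constant leaf values yield a constant prediction.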


Related research

Provably efficient, succinct, and precise explanations (11/01/2021)
Fast Machine Unlearning Without Retraining Through Selective Synaptic Dampening (08/15/2023)
Distilling a Neural Network Into a Soft Decision Tree (11/27/2017)
Sampling Based On Natural Image Statistics Improves Local Surrogate Explainers (08/08/2022)
Probabilistic Dataset Reconstruction from Interpretable Models (08/29/2023)
Inverse Classification for Comparison-based Interpretability in Machine Learning (12/22/2017)
Posthoc Interpretability of Learning to Rank Models using Secondary Training Data (06/29/2018)
