Merlin-Arthur Classifiers: Formal Interpretability with Interactive Black Boxes

06/01/2022
by Stephan Wäldchen, et al.
We present a new theoretical framework for making black box classifiers such as Neural Networks interpretable, basing our work on clear assumptions and guarantees. In our setting, which is inspired by the Merlin-Arthur protocol from Interactive Proof Systems, two functions cooperate to achieve a classification together: the prover selects a small set of features as a certificate and presents it to the classifier. Including a second, adversarial prover allows us to connect a game-theoretic equilibrium to information-theoretic guarantees on the exchanged features. We define notions of completeness and soundness that enable us to lower bound the mutual information between features and class. To demonstrate good agreement between theory and practice, we support our framework by providing numerical experiments for Neural Network classifiers, explicitly calculating the mutual information of features with respect to the class.
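The prover/classifier interaction described above can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy dataset, the single-feature certificate format, the names `arthur`, `merlin`, and `morgana`, and the simplified reading of completeness (how often the classifier is correct on the cooperative prover's certificate) and soundness (how rarely the adversarial prover can push the classifier to a wrong answer, with "don't know" counted as safe).

```python
from itertools import product

# Toy data (assumed): all binary vectors of length 3; the class is feature 0.
DATA = list(product([0, 1], repeat=3))


def true_class(x):
    return x[0]


def arthur(certificate):
    """Classify from a single (index, value) feature; None means "don't know"."""
    index, value = certificate
    return value if index == 0 else None


def merlin(x):
    """Cooperative prover: always reveals the informative feature."""
    return (0, x[0])


def morgana(x):
    """Adversarial prover: searches the features of x for one that fools Arthur."""
    for index, value in enumerate(x):
        answer = arthur((index, value))
        if answer is not None and answer != true_class(x):
            return (index, value)
    # No feature of x fools Arthur; send an arbitrary one.
    return (len(x) - 1, x[-1])


def completeness(data):
    """Fraction of inputs Arthur classifies correctly from Merlin's certificate."""
    return sum(arthur(merlin(x)) == true_class(x) for x in data) / len(data)


def soundness_error(data):
    """Fraction of inputs on which Morgana makes Arthur output the wrong class."""
    fooled = 0
    for x in data:
        answer = arthur(morgana(x))
        if answer is not None and answer != true_class(x):
            fooled += 1
    return fooled / len(data)
```

In this toy setting Arthur only trusts the one feature that determines the class, so Merlin always convinces him and Morgana, who can only send features actually present in the input, never fools him. A classifier that also acted on uninformative features would give Morgana room to succeed, which is the failure mode the soundness condition is meant to rule out.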

research
09/21/2022

Mutual Information Learned Classifiers: an Information-theoretic Viewpoint of Training Deep Learning Classification Systems

Deep learning systems have been reported to achieve state-of-the-art per...
research
06/07/2023

Hardness of Deceptive Certificate Selection

Recent progress towards theoretical interpretability guarantees for AI h...
research
09/23/2020

Information-Theoretic Visual Explanation for Black-Box Classifiers

In this work, we attempt to explain the prediction of any black-box clas...
research
02/19/2021

Sequential- and Parallel- Constrained Max-value Entropy Search via Information Lower Bound

Recently, several Bayesian optimization (BO) methods have been extended ...
research
10/07/2019

Softmax Is Not an Artificial Trick: An Information-Theoretic View of Softmax in Neural Networks

Despite great popularity of applying softmax to map the non-normalised o...
research
03/22/2020

Invariant Rationalization

Selective rationalization improves neural network interpretability by id...
research
09/07/2020

Mutual Information for Explainable Deep Learning of Multiscale Systems

Timely completion of design cycles for multiscale and multiphysics syste...
