Layerwise Knowledge Extraction from Deep Convolutional Networks

by Simon Odense, et al.
City, University of London

Knowledge extraction is used to convert neural networks into symbolic descriptions with the objective of producing more comprehensible learning models. The central challenge is to find an explanation which is more comprehensible than the original model while still representing that model faithfully. The distributed nature of deep networks has led many to believe that the hidden features of a neural network cannot be explained by logical descriptions simple enough to be comprehensible. In this paper, we propose a novel layerwise knowledge extraction method using M-of-N rules which seeks to obtain the best trade-off between the complexity and accuracy of rules describing the hidden features of a deep network. We show empirically that this approach produces rules close to an optimal complexity-error tradeoff. We apply this method to a variety of deep networks and find that in the internal layers we often cannot find rules with a satisfactory complexity and accuracy, suggesting that rule extraction as a general purpose method for explaining the internal logic of a neural network may be impossible. However, we also find that the softmax layer in Convolutional Neural Networks and Autoencoders using either tanh or relu activation functions is highly explainable by rule extraction, with compact rules consisting of as few as 3 units out of 128 often reaching over 99% accuracy. This shows that rule extraction can be a useful component for explaining parts (or modules) of a deep neural network.




1 Introduction

Recently there has been an increase in interest in explainable Artificial Intelligence (AI). Although in the past decade there have been major advances in the performance of neural network models, these models tend not to be explainable Wilson et al. (2017). In large part, this is due to the use of very large networks, specifically deep networks, which can contain thousands or even millions of hidden neurons. In contrast with symbolic AI, in which specific features are often hand picked for a problem, or symbolic Machine Learning (ML), which takes a localist approach Richardson and Domingos (2006), the hidden neurons in a deep neural network do not necessarily correlate with obviously identifiable features of the data that a human would recognise.

Knowledge extraction seeks to increase the explainability of neural networks by attempting to uncover the knowledge that a neural network has learned implicitly in its weights. One way of doing this is to translate trained neural networks into a set of symbolic rules or decision trees similar to the ones found in symbolic AI, ML and logic programming Russell and Norvig (2010). Over the years, many rule extraction techniques have been developed Towell and Shavlik (1993), Murphy and Pazzani (1991), Craven (1996), Tran and d’Avila Garcez (2016), d’Avila Garcez et al. (2001), but none have been able to completely solve the black box problem for neural networks. The main barrier to comprehensible rule extraction is the complexity of the extracted rules. Even if it is possible to find a symbolic system which exactly describes a neural network, it may contain too many rules to be understandable.

Perhaps the main reason this has proved to be a difficult problem is that reasoning in neural networks takes place in a distributed fashion LeCun et al. (2015). It has been argued that one of the fundamental properties of neural networks is that any abstract concepts they use are represented in a distributed way, that is, as patterns of activations across many hidden neurons rather than with a single hidden neuron Smolensky (1988).

The distributed nature of neural networks has led many to conclude that attempting to explain the hidden neurons of large neural networks using symbolic knowledge extraction is a dead end Frosst and Hinton (2017). Instead, alternative approaches to explanation have grown in popularity (see Guidotti et al. (2018) for a survey). Such approaches are so varied that four distinct explainability problems have been identified: global explanations, which attempt to give an explanation of a black box, local explanations, which attempt to give an explanation for a particular output of a black box, visualization, which gives a visual explanation of a latent feature or output, and transparent box design, which seeks to create new models which have some inherent explainability.

Recent trends have favoured model-agnostic methods which opt to use the input-output relationship of a model to generate an explanation rather than assigning any meaning to hidden variables. From the point of view of transparency this may be adequate, but understanding the exact reasoning that a neural network uses with respect to its representation could shine new light into the kinds of knowledge that a deep neural network learns and how it uses that knowledge Garcez et al. (2008). This has the potential to accelerate the development of more robust models by illuminating any deficiencies that exist in current models and their learning algorithms.

In this paper, we develop a rule extraction method that can control for the complexity of a rule via the scaling of an objective function. We do this by performing a parallel search through the space of M-of-N rules Towell and Shavlik (1993) and measuring the error and complexity of each rule. By restricting our search space and using parallel techniques we are able to apply our algorithm to much larger networks than more exhaustive search techniques. We evaluate our algorithm against an optimal search technique (CORELS Angelino et al. (2017)) on a series of small networks before applying it to the layers of deep convolutional networks. By selecting various error/complexity trade-offs, we are able to map out a rule extraction landscape which shows the relationship between how complex the extracted rules are allowed to be and how accurately they capture the behaviour of a network. We find that the relative explainability between layers differs greatly and that changes to the network such as activation function can affect whether or not rule extraction will be useful in certain layers.

In Section 2, we provide an overview of previous algorithms used for knowledge extraction. In Section 3, we give definitions of accuracy and complexity for M-of-N rules and present the extraction algorithm. In Section 4, experimental results are reported and discussed. Section 5 concludes and discusses directions for future work.

2 Background and Related Work

Approaches to rule extraction can, in general, be identified as decompositional, in which the parameters of the network are used to generate rules; pedagogical, in which the behaviour of the network is used to generate rules; or eclectic, which are techniques with both decompositional and pedagogical components Andrews et al. (1995). One of the first attempts at knowledge extraction used a decompositional approach applied to feedforward networks, in particular the Knowledge-based Artificial Neural Networks (KBANN) Towell and Shavlik (1994). This algorithm used the weights of a hidden variable to extract symbolic rules of the form IF M out of a set of N neurons (or concepts) are activated (or hold) THEN a given neuron (concept) is activated (holds), called M-of-N rules Towell and Shavlik (1993). This was followed by more sophisticated algorithms which generate binary trees in which each node is an M-of-N rule Murphy and Pazzani (1991), Craven (1996) (notice that these binary trees can be reduced to IF-THEN propositional logic sentences as before). These more recent algorithms are pedagogical in that they select an M-of-N rule using the input units as the concepts (called literals in logic), based on the maximum information gain with respect to the output. Other algorithms extract rules in the form of decision sets, which are another rule-based structure equivalent to decision trees. Two-level decision sets have been used to generate both local explanations Lakkaraju et al. (2016) and global explanations Lakkaraju et al. (2017), Angelino et al. (2017), but this has only been done in a model-agnostic way with no attempt to explain the internal variables of a model such as the hidden neurons in a deep network.

Other methods abandon the knowledge extraction paradigm and opt for alternative techniques. In the context of computer vision, the use of visual importance methods might be preferred Ribeiro et al. (2016), Samek et al. (2017). Another approach is to design models which are explainable by design Karpathy et al. (2015), Frosst and Hinton (2017), Ribeiro et al. (2018), Courbariaux and Bengio (2016). In the last example, we note the similarity of the restricted model to M-of-N rules: each hidden neuron in this case can be thought of as an M-of-N rule.

Most decompositional rule extraction has been applied only to shallow networks. The multiple hidden layers in a deep network mean that in order to explain an arbitrary hidden feature in terms of the input, a decompositional technique has to produce a hierarchy of rules (see Tran and d’Avila Garcez (2016) for an example of hierarchical rule extraction). With many hidden layers, the extracted rules can quickly grow far too complex for a human to understand, unless each constituent of the rule hierarchy is exceedingly simple. Thus, the use of decompositional techniques to explain the features of a deep network end-to-end seems impractical, as argued in Frosst and Hinton (2017). Nevertheless, experiments reported in this paper show that some layers of a deep network are associated with highly explainable rules opening up the possibility of rule extraction being used as a component in a modular explanation of network models.

3 Layerwise Knowledge Extraction

3.1 M-of-N Rules for Knowledge Representation

In logic programming, a logical rule is an implication of the form $A \leftarrow B$, called $A$ if $B$. The literal $A$ is called the head of the rule and $B$ stands for a conjunction of literals, called the body of the rule. Disjunctions in the body can be modelled simply as multiple rules having the same head. Most logic programs adopt a negation by failure approach whereby $A$ is true if and only if $B$ is true Fitting (2002). When using rules to explain a neural network, the literals will refer to the states of neurons. For example, if a neuron $x$ takes binary values {0,1} then we define the literal $X$ by $X = \text{True}$ if $x = 1$, and $X = \text{False}$ if $x = 0$. For neurons with continuous activation values, we can define a literal by including a threshold $a$ such that $X = \text{True}$ if $x > a$, and $X = \text{False}$ otherwise. In other words, the literal $X$ is shorthand for the statement $x > a$.

In neural networks, a hidden neuron is usually poorly described by a single conjunctive rule since there are many different input configurations which will activate a neuron. Rather than simply adding a rule for each input pattern that activates a neuron (which essentially turns the network into a large lookup table), we look for M-of-N rules, which have been commonly used in rule extraction starting with Towell and Shavlik (1993). M-of-N rules soften the conjunctive constraint on the body of logical rules by requiring only $M$ of the $N$ variables in the body to be true for some specific value of $M \leq N$ (notice that when $M = N$ we are left with a conjunction). For example, the rule $H \leftarrow 2\text{-of-}\{X_1, X_2, \neg X_3\}$ is equivalent to $H \leftarrow X_1 \wedge X_2$ or $H \leftarrow X_1 \wedge \neg X_3$ or $H \leftarrow X_2 \wedge \neg X_3$, where $\neg$ stands for negation by failure.
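The semantics just described can be sketched in a few lines of Python. This is only an illustration: it assumes the usual reading of an M-of-N rule as "at least M of the N body literals hold", and the function names and the three-literal body are hypothetical rather than taken from the paper.

```python
from itertools import combinations, product

def m_of_n(m, literals):
    """True when at least m of the boolean body literals are true."""
    return sum(bool(v) for v in literals) >= m

def dnf_terms(m, n):
    """Index tuples of the conjunctions in the DNF expansion of an m-of-n rule."""
    return list(combinations(range(n), m))

def dnf_holds(terms, literals):
    """Evaluate the DNF: some conjunction has all of its literals true."""
    return any(all(literals[i] for i in term) for term in terms)

# Hypothetical body with three literals; a negated literal is passed in already negated.
terms = dnf_terms(2, 3)
# Check on every truth assignment that the rule and its DNF expansion agree.
agree = all(m_of_n(2, lits) == dnf_holds(terms, lits)
            for lits in product([False, True], repeat=3))
print(terms, agree)
```

The expansion has one conjunction per choice of M literals out of N, which is why the DNF grows quickly with N even though the M-of-N form stays compact.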

M-of-N rules are an attractive candidate for rule extraction because they share a structural similarity with neural networks. Indeed every M-of-N rule can be thought of as a simple perceptron with binary weights and a threshold $M$. M-of-N rules were used in the early days of knowledge extraction but have since been forgotten. This paper brings M-of-N rules to the forefront of the debate on explainability again.
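The correspondence with a threshold unit can be checked directly. The sketch below is an assumption-laden illustration, not the paper's code: it encodes a positive literal as weight +1 and a negated literal as weight -1 (inputs outside the body would get weight 0), and uses the fact that each negated literal lowers the effective threshold by one. The names `rule_value`, `perceptron_value` and `signs` are hypothetical.

```python
from itertools import product

def rule_value(m, signs, x):
    """Evaluate an M-of-N rule on binary inputs x; signs[i] is +1 for X_i, -1 for not-X_i."""
    true_literals = sum(xi if s > 0 else 1 - xi for s, xi in zip(signs, x))
    return true_literals >= m

def perceptron_value(m, signs, x):
    """Threshold unit with weights in {+1, -1}; each negated literal lowers the threshold by one."""
    threshold = m - sum(1 for s in signs if s < 0)
    return sum(s * xi for s, xi in zip(signs, x)) >= threshold

signs = (1, 1, -1)  # hypothetical body: X1, X2, not-X3
# Compare both formulations for every threshold and every binary input.
same = all(rule_value(m, signs, x) == perceptron_value(m, signs, x)
           for m in range(0, 5)
           for x in product([0, 1], repeat=3))
print(same)
```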

When networks have continuous activation values, in order to define the literals to use for rule extraction we must choose a splitting value $a$ for each neuron $x$ which will lead to a literal of the form $x > a$. In order to choose such values for continuous neurons we use information gain Quinlan (1986), MacKay (2002). Given a target neuron that we wish to explain, we generate a literal for the target neuron by selecting a split based on the information gain with respect to the output labels of the network. That is, given a set of test examples, we choose the value of the target neuron which splits the examples in such a way as to result in the maximum decrease in entropy of the network outputs on the test examples.

The input literals are then generated from the inputs to the target neuron by choosing splits for each input which maximize the information gain with respect to the target literal generated in the previous step. In practice this means that each target literal in a layer will have its own set of input literals, each corresponding to the same set of input neurons but with different splits.
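The split selection described above can be sketched as follows. This is a minimal illustration under stated assumptions: candidate thresholds are taken to be the midpoints between consecutive distinct activation values (the paper does not say how candidates are enumerated), and `entropy` and `best_split` are hypothetical names.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(activations, labels):
    """Return (threshold, gain) maximizing the information gain of `activation > threshold`.

    Returns (None, 0.0) when no threshold yields a positive gain.
    """
    base = entropy(labels)
    n = len(labels)
    best = (None, 0.0)
    values = sorted(set(activations))
    # Assumed candidate set: midpoints between consecutive distinct activation values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        above = [y for a, y in zip(activations, labels) if a > t]
        below = [y for a, y in zip(activations, labels) if a <= t]
        gain = base - (len(above) / n) * entropy(above) - (len(below) / n) * entropy(below)
        if gain > best[1]:
            best = (t, gain)
    return best
```

For the target neuron, `labels` would be the network's predicted classes; for an input neuron, `labels` would be the truth values of the target literal obtained in the first step.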

In the case that the layer is convolutional, each feature map corresponds to a group of neurons, each with a different input patch. Rather than test every single neuron in the feature map we only test the one whose optimal split has the maximum information gain with respect to the network output. This gives us a single rule for each feature map rather than a collection of rules.

  Generate a split, $a$, for $h$ by choosing the value which maximizes the information gain with respect to the network output. Use this to define the literal $H$
  for each neuron $x_i$ which is an input of $h$ do
     Generate a split $a_i$ for $x_i$ by choosing the value which maximizes the information gain with respect to $H$. Use this value to define the literal $X_i$ if the connection between $x_i$ and $h$ is positive, and use it to define $\neg X_i$ otherwise
  end for
  Order the input literals by the magnitude of their weights
  for $N = 1$ to number of inputs do
     for $M = 1$ to $N$ do
        Create an M-of-N rule, $R$, whose body consists of the first $N$ literals. Then compute $L(R)$;
     end for
  end for
  Compute $L$ for the trivial rules $0$-of-$N$ and $(N+1)$-of-$N$;
  return the rule with the lowest value of $L$.
Algorithm 1 Search procedure for finding M-of-N rules to explain a hidden feature
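A sequential Python sketch of the search in Algorithm 1 is given below. It is not the authors' implementation: it assumes the literals have already been thresholded, takes the loss to be error plus a penalty weight `beta` times a normalized complexity of the form log(M·C(N,M)) over its maximum, and assigns complexity 0 to the trivial rules. All function and parameter names are hypothetical.

```python
import math
from itertools import product

def normalized_complexity(m, n_body, n_inputs):
    """log(M * C(N, M)) divided by its maximum over rules on n_inputs literals.

    Assumption: trivial rules (m < 1 or m > n_body) are assigned complexity 0.
    """
    if m < 1 or m > n_body or n_inputs < 2:
        return 0.0
    k = math.ceil((n_inputs + 1) / 2)
    return math.log(m * math.comb(n_body, m)) / math.log(k * math.comb(n_inputs, k))

def search_m_of_n(weights, literal_rows, target, beta):
    """Return (m, n_body, error, loss) of the lowest-loss rule found.

    weights      -- connection weight from each input neuron to the target neuron
    literal_rows -- one list of booleans per example: the thresholded inputs X_i
    target       -- one boolean per example: the literal H computed by the network
    beta         -- weight of the complexity penalty
    """
    n = len(weights)
    # Order inputs by decreasing |weight| and negate literals on negative connections.
    order = sorted(range(n), key=lambda i: -abs(weights[i]))
    rows = [[row[i] if weights[i] >= 0 else not row[i] for i in order]
            for row in literal_rows]
    total = len(target)

    def error(m, n_body):
        # Fraction of examples where "at least m of the first n_body literals" differs from H.
        wrong = sum((sum(r[:n_body]) >= m) != h for r, h in zip(rows, target))
        return wrong / total

    # Trivial rules first: 0-of-n is always true, (n+1)-of-n is always false.
    candidates = [(0, n), (n + 1, n)]
    candidates += [(m, n_body) for n_body in range(1, n + 1) for m in range(1, n_body + 1)]

    best = None
    for m, n_body in candidates:
        err = error(m, n_body)
        loss = err + beta * normalized_complexity(m, n_body, n)
        if best is None or loss < best[3]:
            best = (m, n_body, err, loss)
    return best

# Toy usage: the target is exactly 2-of-{X0, not-X1, X2}; the fourth input is irrelevant.
rows = [list(r) for r in product([False, True], repeat=4)]
weights = [0.9, -0.8, 0.7, 0.01]
target = [(r[0] + (not r[1]) + r[2]) >= 2 for r in rows]
exact = search_m_of_n(weights, rows, target, beta=0.0)
simple = search_m_of_n(weights, rows, target, beta=10.0)
print(exact, simple)
```

With no penalty the search recovers the generating rule, while a heavy penalty falls back to a zero-complexity rule at the cost of a higher error.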

3.2 Soundness and Complexity Trade-off

The two metrics we are concerned with in rule extraction are comprehensibility and accuracy. For a given rule we can define accuracy in terms of a soundness measure. This is simply the expected difference between the predictions made by the rules and the network. More concretely, given a neuron $h$ in a neural network with input neurons $x_1, \dots, x_n$, we can use the network to compute the state of $h$ from the state $x$ of the input neurons, which then determines the truth of the literal $H$. Thus we can use the network to determine the truth of $H$; call this $N(x)$. Furthermore, if we have some rule $R$ relating the variables $H$ and $X_1, \dots, X_n$, we can use the state of the input $x$ to determine the value of the variables $X_i$, and then use $R$ to determine the value of $H$; call this $R(x)$. Given a set $I$ of input configurations to test (not necessarily from the test set of the network) we can measure the discrepancy between the output of the rules and the network as

$$S(R) = \frac{1}{|I|} \sum_{x \in I} \left| N(x) - R(x) \right| \qquad (1)$$

where $N(x)$ and $R(x)$ take the value 1 for true and 0 for false. In other words, we measure the average error of the rules when trying to predict the output of the network over a test set.

Comprehensibility is more difficult to define as there is a degree of subjectivity. The approach we take is to look at the complexity of a rule. Here, we think of complexity analogously to the Kolmogorov complexity, which is determined by a minimal description. Thus we determine the complexity of a rule by the length of its body when expressed by a (minimal) rule in disjunctive normal form (DNF). For an M-of-N rule, the complexity is simply $M \binom{N}{M}$, where $\binom{N}{M}$ denotes the binomial coefficient. For our experiments we measure complexity in relative terms by normalizing w.r.t. a maximum complexity. Given $n$ possible input variables, the maximum complexity is $\lceil \frac{n+1}{2} \rceil \binom{n}{\lceil (n+1)/2 \rceil}$, where $\lceil \cdot \rceil$ denotes the ceiling function (rounding to the next highest integer). Finally, in order to control for growth we take the logarithm, giving the following normalized complexity measure.

$$C(R) = \frac{\log\left( M \binom{N}{M} \right)}{\log\left( \left\lceil \frac{n+1}{2} \right\rceil \binom{n}{\lceil (n+1)/2 \rceil} \right)} \qquad (2)$$

As an example, suppose we have a simple perceptron with two binary visible units with weights

and and whose output has a bias of . Then consider the rule -of-. Over the entire input space we see that only when and giving us an error of . Furthermore, a rule is the most complex rule possible for variables as it has the longest DNF of any M-of-N rule giving us a complexity of .
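A worked instance of this kind of calculation is sketched below. The weights and bias are hypothetical values chosen for illustration (they are not the paper's), the perceptron is assumed to fire when its pre-activation is positive, and complexity is assumed to be log(M·C(N,M)) normalized by its maximum over rules on the same number of inputs.

```python
import math
from itertools import product

# Hypothetical perceptron with two binary inputs (values are assumptions, not the paper's).
w, b = (1.0, 0.4), -0.5

def network(x):
    """Truth of the output literal as computed by the perceptron."""
    return w[0] * x[0] + w[1] * x[1] + b > 0

def rule(x):
    """The rule 1-of-{X1, X2}: at least one of the two inputs is on."""
    return (x[0] + x[1]) >= 1

# Soundness error: fraction of the input space where rule and network disagree.
inputs = list(product([0, 1], repeat=2))
disagreements = [x for x in inputs if network(x) != rule(x)]
soundness_error = len(disagreements) / len(inputs)

def normalized_complexity(m, n_body, n_inputs):
    """log(M * C(N, M)) over its maximum for n_inputs >= 2 literals."""
    k = math.ceil((n_inputs + 1) / 2)
    return math.log(m * math.comb(n_body, m)) / math.log(k * math.comb(n_inputs, k))

complexity = normalized_complexity(1, 2, 2)
print(disagreements, soundness_error, complexity)
```

Here the rule and the network disagree on a single one of the four input configurations, and the 1-of-2 rule attains the maximum normalized complexity for two variables.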

Using Eqs. (1) and (2) we define a loss function for a rule $R$ as a weighted sum in which a parameter $\beta \geq 0$ determines the trade-off between soundness and complexity,

$$L(R) = S(R) + \beta\, C(R) \qquad (3)$$

where $S(R)$ is the soundness error and $C(R)$ the normalized complexity of $R$. By using a brute force search with various values of $\beta$ we are able to explicitly determine the relationship between the allowed complexity of a rule and its maximum accuracy. For $\beta = 0$ the rule with the minimum loss will simply be the rule with minimum error regardless of complexity, and for $\beta$ large enough the rule with the minimum loss will be a rule with $0$ complexity, either a $1$-of-$1$ rule or one of the trivial rules which either always predicts true or always predicts false (these can be represented as M-of-N rules by $0$-of-$N$ and $(N+1)$-of-$N$ respectively).
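The effect of the trade-off parameter can be illustrated with a small sweep. The three candidate rules and their (error, complexity) pairs below are invented for illustration only, and the loss is assumed to be error plus the penalty weight times complexity.

```python
# Hypothetical (error, normalized complexity) pairs for three candidate rules.
candidates = {
    "complex": (0.02, 0.60),
    "medium": (0.05, 0.20),
    "simple": (0.15, 0.00),
}

def best_rule(beta):
    """Name of the candidate minimising error + beta * complexity."""
    return min(candidates,
               key=lambda name: candidates[name][0] + beta * candidates[name][1])

# Sweeping the penalty traces out which rule is selected at each trade-off.
landscape = {beta: best_rule(beta) for beta in (0.0, 0.1, 1.0)}
print(landscape)
```

Repeating such a sweep over a finer grid of penalty values is what maps out the complexity/error landscape for a given neuron or layer.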

3.3 Layerwise M-of-N Rule Extraction Algorithm

Given a neuron $h$ with input neurons $x_1, \dots, x_n$, we generate splits for each neuron using the technique just described to obtain a set of literals $H$ and $X_1, \dots, X_n$. Then, we negate the literals corresponding to neurons which have a negative weight to $h$. Using these we search through M-of-N rules with the variables $X_i$ in the body and $H$ in the head, looking for those which minimize the loss $L(R)$ defined above. To do this, as a heuristic, we reorder the variables according to the magnitude of the weight $w_i$ connecting $x_i$ to $h$ (such that we have $|w_1| \geq |w_2| \geq \dots \geq |w_n|$). Then we consider the rule $H \leftarrow M\text{-of-}\{X_1, \dots, X_N\}$ for each $1 \leq N \leq n$ and each $1 \leq M \leq N$. The search procedure only relies on the ordering of the variables $X_i$. By ordering the literals according to the magnitude of their weights we reduce an exponential search space to a polynomial one of $n(n+1)/2$ rules. In the ideal case the set of possible input values to a hidden neuron is $V^n$ (where $V$ is the set of values that each input neuron can possibly take), and it can be easily proved that the weight-ordering will find an optimal solution. In practice, however, certain inputs may be highly correlated. When this is the case there is no guarantee that the weight-ordering will find the optimal M-of-N rule. Thus in the general case the search procedure is heuristic. Because each rule in the ordered search can be evaluated independently, this heuristic allows us to run our search in parallel. We do this by using Spark in IBM Watson Studio.
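The independence of the candidate rules is what makes the search easy to parallelize. The sketch below substitutes Python's standard `concurrent.futures` thread pool for Spark purely to show the structure of the computation; it is not the authors' setup, the names are hypothetical, and in CPython threads illustrate the decomposition rather than deliver a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def rule_error(job):
    """Error of the rule 'at least m of the first n_body literals' on one dataset."""
    m, n_body, rows, target = job
    wrong = sum((sum(r[:n_body]) >= m) != h for r, h in zip(rows, target))
    return (m, n_body, wrong / len(target))

def parallel_errors(rows, target, max_workers=4):
    """Evaluate every M-of-N candidate over the weight-ordered literals concurrently."""
    n = len(rows[0])
    jobs = [(m, n_body, rows, target)
            for n_body in range(1, n + 1)
            for m in range(1, n_body + 1)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map preserves the order of the submitted jobs.
        return list(pool.map(rule_error, jobs))

# Toy data: literals are assumed already ordered by |weight| and negated where needed.
rows = [list(r) for r in product([False, True], repeat=3)]
target = [sum(r) >= 2 for r in rows]
results = parallel_errors(rows, target)
print(results)
```

Each job depends only on the fixed ordering of the literals, so the same decomposition carries over to a distributed map over the list of (M, N) pairs.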

To illustrate the entire process, let us examine rule extraction from the first hidden layer in the CNN trained on the fashion MNIST data set. First we randomly select a set of examples and use them to compute the activations of each neuron in the CNN as well as the predicted labels of the network. With padding there are

neurons per feature in the first hidden layer, each corresponding to a different patch of the input image. We then find the optimal splitting value of each neuron by computing the information gain of each splitting choice with respect to the network’s predicted labels. We find that the neuron with the maximum information gain is neuron which has an information gain of when split on the value . This neuron corresponds to the image patch centered at . With this split we define the variable as iff .

Using this variable we define the input splits by choosing the values which result in the maximum information gain with respect to . We then search through the M-of-N rules whose bodies consist of the input variables defined by the splits to determine an optimal M-of-N rule explaining for various error-complexity tradeoffs. As we increase the complexity, three different rules are extracted which can be visualized in Figure 1. As can be seen, many of the weights are filtered out by the rules. The most complex rule is a -of- rule which has a error. A mild complexity penalty changes the optimal rule to the much simpler -of- rule, but raises the error to . And a heavy complexity penalty produces a -of- rule which has the significantly higher error of .

Figure 1: The leftmost image represents the weights of the neuron being explained. The next three images are obtained from rules of decreasing complexity extracted from the CNN explaining that neuron. If a literal is true (resp. false) it is shown in white (resp. black). Grey indicates that the input feature is not present in the M-of-N rule. Notice how a rule can be seen as a discretization of the network into a three-valued logic, similar to what is proposed by binarized networks Courbariaux and Bengio (2016) but without constraining the network training a priori.

4 Experimental Results

4.1 Small Fully Connected Networks

In order to compare our search procedure with CORELS as an optimal baseline Angelino et al. (2017), we evaluate both methods on a series of small fully connected networks. The first is a deep neural network with fully connected layers of and hidden neurons, respectively, with a rectified linear (ReLu) activation function, on the car evaluation dataset Bohanec and Rajkovic (1988). The second is a single layer network with hidden neurons with ReLU activations trained on the E. Coli dataset Horton and Nakai (1996). The final network is a single layer hidden unit network trained on the DNA promoter dataset Towell et al. (1990). Because the DNA promoter dataset is quite small, we produce synthetic examples to evaluate our rule extraction methods on the final network. We simply use the entire dataset for the other two networks.

CORELS produces optimal rules for a given set of parameters (maximum cardinality, minimum support and a regularization parameter), also seeking to penalize complexity. Maximum cardinality refers to the maximum number of literals in the body of a rule; minimum support refers to the minimum number of training examples an antecedent must capture to be considered in the search; and the regularization parameter is a scalar penalty on the complexity, equivalent to the trade-off parameter used in our M-of-N search.

Because our extraction algorithm uses an ordering on the literals, each rule can be evaluated independently so that the search procedure can run in parallel. This greatly speeds up the search compared to CORELS, which requires a sequential search. This faster search will allow us to apply the extraction algorithm to larger networks and to use more test examples. However, since we only search over M-of-N rules we are not guaranteed to find an optimal solution. For this reason we compare our layerwise results with CORELS to see how far from optimal our search method is. Since CORELS has multiple parameters to penalize complexity we run CORELS multiple times with different parameters to generate a set of rules with higher complexity and one with lower complexity and then compare these rules to rules of similar complexity found by our parallel search.

In Table 1 we can see that rules found via our M-of-N search are only marginally worse than a set of optimal rules with similar complexity found by CORELS, and that CORELS can become quite slow when using too broad a search on a dataset with many inputs. When applied to the DNA promoter network, CORELS runs out of memory and we were unable to produce a result, showing that even for this relatively small network CORELS is too computationally demanding. Notice also that in the car evaluation network the second hidden layer is much more explainable than the first, cf. the large difference in accuracy between layers.

Method                   Comp    Acc      Network                Time
CORELS (1/0.01/0.01)     n/a     n/a      DNA promoter Layer 1   n/a
Parallel M-of-N          0.239   89%      DNA promoter Layer 1   700s
CORELS (1/0.01/0.01)     0.124   93.4%    Cars Layer 1
CORELS (2/0.05/0.05)     0.04    87.3%    Cars Layer 1
Parallel M-of-N          0.131   90.3%    Cars Layer 1
Parallel M-of-N          0.031   85.4%    Cars Layer 1
CORELS (1/0.01/0.01)     0.053   99.05%   Cars Layer 2
CORELS (3/0.02/0.02)     0.079   99.42%   Cars Layer 2
Parallel M-of-N          0.057   98.4%    Cars Layer 2
Parallel M-of-N          0.069   98.6%    Cars Layer 2
CORELS (1/0.01/0.01)     0.165   91.6%    E. Coli Layer 1
CORELS (2/0.005/0.001)   0.287   92.6%    E. Coli Layer 1
Parallel M-of-N          0.132   89.4%    E. Coli Layer 1
Parallel M-of-N          0.189   90.2%    E. Coli Layer 1
Table 1: Comparison of rules extracted from different layers of networks trained on various datasets using (sequential) CORELS with different values for cardinality/support/regularization and our Parallel M-of-N extraction using different values for the complexity penalty $\beta$. At a similar level of complexity (Comp), rules extracted by CORELS are only marginally more accurate (cf. Acc) than M-of-N rules, despite CORELS searching over a much larger sequential rule space; refer to computation time (Time). In the table, n/a is used when CORELS exits without completing the search.

Finally, the rate of accuracy decrease vs. complexity of Parallel M-of-N seems to be lower than that of CORELS; this deserves further investigation. In summary, the above results show that a parallel M-of-N search can provide a good approximation of the complexity/error trade-off for the rules describing the network. Next, we apply Parallel M-of-N to much larger networks for which sequential or exhaustive methods become intractable.

Figure 2: The Complexity/Error relationship for rules extracted from each layer of three different deep networks trained on MNIST. From left to right: (a) CNN-Relu, a CNN with ReLU activations trained end-to-end; (b) CNN-Tanh, a CNN with Tanh activations trained end-to-end; (c) CNN-AE, a CNN with ReLU activations trained as an autoencoder.

4.2 Deep Convolutional Networks

In order to evaluate the capability of compact M-of-N rules at explaining hidden features, we now apply the extraction algorithm to the hidden layers of three different networks trained on MNIST and compare results. Since applying extraction hierarchically can cause an accumulation of errors from previous layers, we use the network to compute the values of the inputs to the hidden layer that we wish to extract rules from. Hence, the errors from the extracted rules correspond to rule extraction at that layer. This allows us to examine the relative explainability at each layer. In practice, one could extract a hierarchical set of rules by choosing a single splitting value for each neuron.

Our three networks are identical save for the activation function and training procedure. The network architecture consists of two convolutional layers with and filters respectively, each with a

convolutional window and using max pooling. This is followed by a

-unit densely connected layer with linear activation followed by a softmax layer. The first network uses ReLu units in the first two layers and is trained end-to-end. The second network is trained identically to the first but uses the hyperbolic tangent (Tanh) activation function in the first two layers. The third network uses an autoencoder to train the first three layers unsupervised before training the final softmax layer separately. We evaluate the rules using examples from the test set

Comparing the network using ReLu to the one using Tanh shows that in both cases the minimum error for each layer remains approximately the same. However, the explainability in the Tanh network is greatly increased in the first three layers, rules extracted from the Tanh network can be made much less complex without significantly increasing the error. This applies not only to the first two layers but also to layer 3 which uses a linear activation in both cases. In both cases the third layer is much less explainable than the first two and the only layer which we are truly able to produce an acceptably accurate and comprehensible explanation is the final one in which we see rules with an average complexity of achieving an average error of .

In the third layer we believe that the higher minimum error is mainly the result of the number of input units. In these layers there appear to be a lot of input units which are not relevant enough alone to be included in an M-of-N rule, but collectively they add enough noise to have a significant effect on the output. Because our search procedure is heuristic, it’s possible that a more thorough search could produce rules which are simpler and more accurate but our results at least tentatively back up the idea that the distributed nature of neural networks makes rule extraction from the hidden layers impractical if not infeasible. We hypothesize that the difference in complexity between rules extracted from the Tanh network and the Relu network is due to the saturating effect of the tanh function. A hidden neuron in the tanh network may have fewer ‘marginally relevant’ features than in the Relu network. This would explain the steep decline in accuracy found in the Tanh network and the more gradual decline found in the Relu network.

The autoencoder has hidden features which are in general more explainable than either of the two previous networks. Compared to the ReLu network, the error of the extracted rules in the second layer is lower at every level of complexity. Compared to the Tanh network, the autoencoder has more accurate rules at medium levels of complexity ( error at complexity vs. error at complexity). However, as complexity is reduced the extracted rules in the Tanh network remain accurate for longer ( error at complexity vs. at complexity). Interestingly, in the autoencoder the second layer is slightly less explainable than the first. The third layer is more explainable than it is in the other two networks with significant increases in error only being seen with rules of average complexity less than . In the softmax layer trained on top of the autoencoder we see that one cannot extract accurate rules of any complexity. This points to something fundamentally different from the previous two networks in the way that softmax uses the representations from the final layer to predict the output. This is the subject of further investigation.

Our results indicate that, at least when it comes to extracting M-of-N rules with an assumption of weight-ordering, there are hard limitations to representing hidden units that cannot be overcome with any level of complexity. These limitations seem to be the result of the internal representations determined by the training procedure. Whether these limitations can be overcome by refining rule extraction methods or whether they are a fundamental part of the network is to be determined. However, we also find that the final layer of a CNN may be a promising target for rule extraction. We verify this by training more 4-layer CNNs on the Olivetti faces and Fashion MNIST datasets. The network trained on the Olivetti faces dataset consists of two convolutional layers with and filters respectively, each with a window and followed by max pooling. This is followed by a unit fully connected hidden layer with a linear activation and then the softmax layer. The Fashion MNIST network is larger. It has two convolutional layers with and filters with a window followed by max pooling, then a unit fully connected layer followed by the softmax. Olivetti faces is evaluated using the entire dataset and Fashion MNIST is evaluated with samples.

In Table 2 we can see that the Olivetti Faces dataset had the most accurate and interpretable rules of all; this is probably at least partially due to the smaller size of the dataset. In all cases one can see a large drop in the complexity with only a penalty of resulting in a less than decrease in accuracy. This suggests that in the softmax layer, relatively few of the input neurons are being used to determine the output. This shows that rule extraction, and in particular M-of-N rule extraction, can be an effective component in a multi-pronged approach to explainability. By extracting M-of-N rules from the final layer and using importance methods to explain the relevant hidden units, one should be able to reason about a network’s structure in ways that cannot be achieved with a strictly model-agnostic approach. Such a hybrid approach is expected to create explanations which are accurate and yet less complex.
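To make the idea of a weight-ordered M-of-N rule concrete, the following is a minimal sketch rather than the implementation used in the experiments. It assumes binary inputs, takes the n inputs with the largest absolute weight as candidate literals (negated when the weight is negative), and searches over n and m for the lowest value of error plus a penalty times complexity. The function names, the toy weights, and the use of n divided by the number of inputs as the complexity measure are all assumptions made for illustration.

```python
from itertools import product

def m_of_n_fires(x, weights, n, m):
    """Return True when at least m of the n highest-|weight| literals hold for x."""
    # Weight ordering: only the n inputs with the largest |weight| are candidates.
    order = sorted(range(len(weights)), key=lambda i: -abs(weights[i]))[:n]
    # Positive weight -> literal x_i; negative weight -> literal NOT x_i.
    true_literals = sum(x[i] if weights[i] > 0 else 1 - x[i] for i in order)
    return true_literals >= m

def best_rule(samples, targets, weights, penalty):
    """Search all (n, m) pairs, minimising error + penalty * n / n_inputs."""
    n_inputs = len(weights)
    best = None
    for n in range(1, n_inputs + 1):
        for m in range(1, n + 1):
            wrong = sum(m_of_n_fires(x, weights, n, m) != bool(t)
                        for x, t in zip(samples, targets))
            error = wrong / len(samples)
            # Assumed stand-in for complexity: fraction of inputs used by the rule.
            score = error + penalty * n / n_inputs
            if best is None or score < best["score"]:
                best = {"score": score, "n": n, "m": m, "error": error}
    return best

# Toy unit (hypothetical): behaves exactly like "2 of (x0, x1, NOT x2)".
weights = [3.0, 2.0, -2.5, 0.1, 0.05]
samples = list(product([0, 1], repeat=5))
targets = [int(x[0] + x[1] + (1 - x[2]) >= 2) for x in samples]
rule = best_rule(samples, targets, weights, penalty=0.01)
print(rule["n"], rule["m"], rule["error"])
```

On this toy unit the search recovers a 2-of-3 rule over the three highest-weight inputs with zero error, ignoring the two low-weight inputs. Raising the penalty pushes the search toward rules with fewer literals at the cost of higher error, which is the trade-off the text describes.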

Dataset          Comp. ()   Acc. ()   Comp. ()   Acc. ()
Olivetti Faces   0.03       100%      0.024      99.9%
MNIST            0.7        99.6%     0.06       98.7%
Fashion MNIST    0.28       99.3%     0.06       98.8%

Table 2: Comparison of the complexity (Comp.) and accuracy (Acc.) of rules extracted from the final layer of three CNNs trained on different datasets. Repeated for complexity penalties of and
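The trade-off reported in Table 2 can be checked with a few lines of arithmetic. The sketch below simply copies the table values and computes, for each dataset, the factor by which complexity shrinks between the first and second pair of columns and the corresponding drop in accuracy in percentage points. The dictionary layout and the function name are illustrative choices only.

```python
# Values copied from Table 2: (complexity, accuracy %) for the first and
# second pair of columns respectively.
table2 = {
    "Olivetti Faces": ((0.03, 100.0), (0.024, 99.9)),
    "MNIST": ((0.7, 99.6), (0.06, 98.7)),
    "Fashion MNIST": ((0.28, 99.3), (0.06, 98.8)),
}

def tradeoff(row):
    """Return (complexity reduction factor, accuracy drop in points)."""
    (c1, a1), (c2, a2) = row
    return round(c1 / c2, 2), round(a1 - a2, 2)

for name, row in table2.items():
    ratio, drop = tradeoff(row)
    print(f"{name}: complexity reduced {ratio}x, accuracy drop {drop} points")
```

For all three datasets the accuracy drop stays below one percentage point, while the complexity reduction is largest for MNIST.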

5 Conclusion and Future Work

The black box problem has been an issue for neural networks since their creation. As neural networks become more integrated into society, explainability has attracted considerably more attention. The success of knowledge extraction in this endeavor has overall been mixed, with most large networks today remaining difficult to interpret and explain. Traditionally, rule extraction has been a commonly used paradigm and it has been applied to various tasks. Critics, however, point out that the distributed nature of neural networks makes the specific method of decompositional rule extraction infeasible, as individual latent features are unlikely to represent anything of significance. We test this claim by applying a novel search method of M-of-N rule extraction to generate explanations of varying complexity for hidden neurons in a deep network. We find that the complexity of neural representations does provide a barrier to comprehensible rule extraction from deep networks. However, we also find that within the softmax layer rule extraction can be both highly accurate and simple to understand. This shows that rule extraction, including M-of-N rule extraction, can be a useful tool to help explain parts of a deep network. As future work, softmax layer rule extraction will be combined with local explainability techniques. Additionally, our preliminary experiments suggest that replacing the output layer of a network with M-of-N rules may make it more robust to certain adversarial attacks. Out of adversarial examples generated using FGSM (Goodfellow et al., 2014) for the CNN trained on MNIST, were classified correctly by the M-of-N rules with maximum complexity by contrast with none classified correctly by the CNN. This is to be investigated next in comparison with various other defense methods.
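For readers unfamiliar with FGSM, the attack perturbs an input by a fixed step in the direction of the sign of the loss gradient with respect to that input. The sketch below illustrates this on a tiny logistic model, which stands in for the CNN purely for illustration; it is not the experimental setup used above. The weights, the input, the step size eps, and the clipping to the range [0, 1] are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """Probability of class 1 under a logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(x, y, w, b, eps):
    """One FGSM step: x + eps * sign(d loss / d x), clipped to [0, 1]."""
    p = predict_proba(x, w, b)
    # For cross-entropy loss on a logistic model, d loss / d x_i = (p - y) * w_i.
    grad = [(p - y) * wi for wi in w]
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad)]

# Hypothetical two-feature model and input, for illustration only.
w, b = [4.0, -4.0], 0.0
x, y = [0.6, 0.4], 1
x_adv = fgsm(x, y, w, b, eps=0.25)
print(predict_proba(x, w, b) > 0.5, predict_proba(x_adv, w, b) > 0.5)
```

In this toy case the original input is classified as class 1 and the perturbed input is not, even though each feature moved by only eps. A discrete rule layer can only change its output when a perturbation flips one of its literals, which is one possible reading of the robustness observation above, though that remains to be tested.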


References

  • [1] R. Andrews, J. Diederich, and A. B. Tickle (1995) Survey and critique of techniques for extracting rules from trained artificial neural networks. Knowledge-Based Systems 8 (6), pp. 373–389.
  • [2] E. Angelino, N. Larus-Stone, D. Alabi, M. Seltzer, and C. Rudin (2017) Learning certifiably optimal rule lists for categorical data. arXiv e-prints, arXiv:1704.01701.
  • [3] M. Bohanec and V. Rajkovic (1988) Knowledge acquisition and explanation for multi-attribute decision making. In 8th International Workshop “Expert Systems and Their Applications”.
  • [4] M. Courbariaux and Y. Bengio (2016) BinaryNet: training deep neural networks with weights and activations constrained to +1 or -1. CoRR abs/1602.02830.
  • [5] M. W. Craven (1996) Extracting comprehensible models from trained neural networks. Ph.D. Thesis, The University of Wisconsin - Madison. ISBN 0-591-14495-6.
  • [6] A. d’Avila Garcez, K. Broda, and D. Gabbay (2001) Symbolic knowledge extraction from trained neural networks: a sound approach. Artificial Intelligence 125 (1), pp. 155–207. ISSN 0004-3702.
  • [7] M. Fitting (2002) Fixpoint semantics for logic programming: a survey. Theoretical Computer Science 278 (1), pp. 25–51. Mathematical Foundations of Programming Semantics 1996. ISSN 0304-3975.
  • [8] N. Frosst and G. Hinton (2017) Distilling a neural network into a soft decision tree. In Proceedings of the First International Workshop on Comprehensibility and Explanation in AI and ML, pp. 879–888.
  • [9] A. S. d. Garcez, L. C. Lamb, and D. M. Gabbay (2008) Neural-symbolic cognitive reasoning. 1st edition, Springer. ISBN 3540732454, 9783540732457.
  • [10] I. J. Goodfellow, J. Shlens, and C. Szegedy (2014) Explaining and harnessing adversarial examples. arXiv:1412.6572.
  • [11] R. Guidotti, A. Monreale, F. Turini, D. Pedreschi, and F. Giannotti (2018) A survey of methods for explaining black box models. CoRR abs/1802.01933.
  • [12] P. Horton and K. Nakai (1996) A probabilistic classification system for predicting the cellular localization sites of proteins. In Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology, pp. 109–115. ISBN 1-57735-002-2.
  • [13] A. Karpathy, J. Johnson, and F. Li (2015) Visualizing and understanding recurrent networks. CoRR abs/1506.02078.
  • [14] H. Lakkaraju, S. H. Bach, and J. Leskovec (2016) Interpretable decision sets: a joint framework for description and prediction. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, New York, NY, USA, pp. 1675–1684. ISBN 978-1-4503-4232-2.
  • [15] H. Lakkaraju, E. Kamar, R. Caruana, and J. Leskovec (2017) Interpretable & explorable approximations of black box models. CoRR abs/1707.01154.
  • [16] Y. LeCun, Y. Bengio, and G. Hinton (2015) Deep learning. Nature 521 (2), pp. 436–444.
  • [17] D. J. C. MacKay (2002) Information theory, inference & learning algorithms. Cambridge University Press, New York, NY, USA. ISBN 0521642981.
  • [18] P. M. Murphy and M. J. Pazzani (1991) ID2-of-3: constructive induction of m-of-n concepts for discriminators in decision trees. In Proceedings of the Eighth International Workshop on Machine Learning, pp. 183–187.
  • [19] J. R. Quinlan (1986) Induction of decision trees. Machine Learning 1 (1), pp. 81–106. ISSN 1573-0565.
  • [20] M. T. Ribeiro, S. Singh, and C. Guestrin (2016) "Why should I trust you?": explaining the predictions of any classifier. CoRR abs/1602.04938.
  • [21] M. T. Ribeiro, S. Singh, and C. Guestrin (2018) Anchors: high-precision model-agnostic explanations. In AAAI Conference on Artificial Intelligence (AAAI).
  • [22] M. Richardson and P. Domingos (2006) Markov logic networks. Machine Learning 62 (1-2), pp. 107–136. ISSN 0885-6125.
  • [23] S. Russell and P. Norvig (2010) Artificial intelligence: a modern approach. 3rd edition, Pearson.
  • [24] W. Samek, A. Binder, G. Montavon, S. Lapuschkin, and K. Müller (2017) Evaluating the visualization of what a deep neural network has learned. IEEE Transactions on Neural Networks and Learning Systems 28, pp. 2660–2673.
  • [25] P. Smolensky (1988) On the proper treatment of connectionism. Behavioral and Brain Sciences 11 (1), pp. 1–23.
  • [26] G. G. Towell, J. W. Shavlik, and M. O. Noordewier (1990) Refinement of approximate domain theories by knowledge-based neural networks. In Proceedings of the Eighth National Conference on Artificial Intelligence - Volume 2, AAAI’90, pp. 861–866. ISBN 0-262-51057-X.
  • [27] G. G. Towell and J. W. Shavlik (1993) Extracting refined rules from knowledge-based neural networks. Machine Learning 13 (1), pp. 71–101.
  • [28] G. G. Towell and J. W. Shavlik (1994) Knowledge-based artificial neural networks. Artificial Intelligence 70, pp. 119–165.
  • [29] S. Tran and A. d’Avila Garcez (2016) Deep logic networks. IEEE Transactions on Neural Networks and Learning Systems 29 (2), pp. 246–258.
  • [30] A. G. Wilson, J. Yosinski, P. Simard, R. Caruana, and W. Herlands (Eds.) (2017) Proceedings of NIPS 2017 Symposium on Interpretable Machine Learning.