Recent progress in artificial intelligence and the increased deployment of AI systems have highlighted the need for explaining the decisions made by such systems; see, e.g., [1, 31, 33, 25, 35, 22, 10].¹ For example, one may want to explain why a classifier decided to turn down a loan application, or rejected an applicant for an academic program, or recommended surgery for a patient. Answering such “why?” questions is particularly central to assigning blame and responsibility, which lies at the heart of legal systems, and may be required in certain contexts.² The formal verification of AI systems has also come into focus recently, particularly when such systems are deployed in safety-critical applications.
¹ It is now recognized that opacity, or lack of explainability, is “one of the biggest obstacles to widespread adoption of artificial intelligence” (The Wall Street Journal, August 10, 2017).
² Take, for example, the European Union General Data Protection Regulation, which has a provision relating to explainability, http://www.privacy-regulation.eu/en/r71.htm.
We propose a knowledge compilation approach for explaining and verifying the behavior of a neural network classifier. Knowledge compilation is a sub-field of AI that studies, in part, tractable Boolean circuits and the trade-offs between succinctness and tractability [34, 5, 11, 16]. By enforcing different properties on the structure of a Boolean circuit, one can obtain greater tractability (the ability to perform certain queries and transformations in polytime) at the possible expense of succinctness (the size of the resulting circuits). Our goal is to compile the Boolean function specified by a neural network into a tractable Boolean circuit that facilitates explanation and verification.
We consider neural networks whose inputs are binary (0 or 1) and that use step activations. Such a network has real-valued parameters, but the network itself induces a purely Boolean function. We seek a tractable Boolean circuit that represents this function, which we obtain in two steps. First, note that a neuron with a step activation and binary inputs produces a binary output, so each neuron induces its own Boolean function. Using, e.g., the algorithm of , we can obtain a tractable circuit for a given neuron’s Boolean function. The neural network then induces a Boolean circuit, although it may not be tractable. Second, we compile this circuit into a tractable one by enforcing additional properties on the circuit until certain operations become tractable, as done in the field of knowledge compilation. We then explain the decisions and verify the properties of this circuit, as done in [35, 36]; cf. .
Our approach follows a recent trend in analyzing machine learning models using symbolic approaches such as satisfiability and satisfiability modulo theories; see, e.g., [23, 24, 28, 35, 21, 22, 37]. While machine learning and statistical methods are key for learning classifiers, it is evident that symbolic and logical approaches, which are independent of any of the model’s parameters, are key for analyzing and reasoning about them. Our approach, based on compilation into a tractable Boolean circuit, can go beyond queries based on (for example) satisfiability, as we shall show.
This paper is organized as follows. In Section 2 we review relevant background material. In Section 3 we show how to reduce neural networks to Boolean circuits by compiling each neuron into a Boolean circuit. In Section 4 we discuss how to obtain tractable circuits, via knowledge compilation. In Section 5, we show how a tractable circuit enables one to reason about the robustness of a neural network. We provide a case study in Section 6, empirically evaluate our proposed compiler in Section 7, and finally conclude in Section 8.
2 Technical Preliminaries
respectively. A ReLU activation outputs $0$ if $x < 0$ and outputs $x$ otherwise.
A feedforward neural network is a directed acyclic graph (DAG); see Figure 1(a). The roots of the DAG are the neural network inputs, call them $X_1, \ldots, X_n$. The leaves of the DAG are the neural network outputs, call them $Y_1, \ldots, Y_m$. Each node in the DAG is called a neuron and contains an activation function $\sigma$; see Figure 1(b). Each edge in the DAG has a weight attached to it. The weights of a neural network are its parameters, which are learned from data.
In this paper, we assume that the network inputs are either 0 or 1. We further assume step activation functions:
$$\sigma(x) = \begin{cases} 1 & \text{if } x \geq 0 \\ 0 & \text{otherwise} \end{cases}$$
A neuron with a step activation function has outputs that are also 0 or 1. If the network inputs are also 0 or 1, then this means that the inputs to all neurons are 0 or 1. Moreover, the output of the neural network is also 0 or 1. Hence, each neuron and the network itself can be viewed as a function mapping binary inputs to a binary output, i.e., a Boolean function. For each neuron, we shall simply refer to this function as the neuron’s Boolean function. When there is a single output $Y$, we will simply refer to the corresponding function as the network’s Boolean function.
3 From Neural Networks to Boolean Circuits
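Consider a neuron with step activation function $\sigma$, inputs $I_1, \ldots, I_n$, weights $w_1, \ldots, w_n$ and bias $b$. The output of this neuron is simply
$$\sigma\left(\sum_{i=1}^{n} w_i \cdot I_i + b\right)$$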
As an example, consider a neuron with 3 inputs and with weights and and a bias of . This neuron outputs 1 iff:
Treating a value of 1 as true and a value of 0 as false, we can view this neuron as a Boolean function whose output matches that of the neuron, on inputs $A$, $B$ and $C$. Figure 2 highlights two logically equivalent representations of this neuron’s Boolean function. Figure 2(a) highlights an Ordered Binary Decision Diagram (OBDD) representation³ and Figure 2(b) highlights a circuit representation. These functions are equivalent to the sentence:
$$(A \wedge B \wedge C) \vee (\neg A \wedge (B \vee C)),$$
i.e., if $A$ is 0 then $B$ or $C$ must be 1 to meet or surpass the threshold, and if $A$ is 1 then both $B$ and $C$ must be 1.
³ An Ordered Binary Decision Diagram (OBDD) is a rooted DAG with two sinks: a 1-sink and a 0-sink. An OBDD is a graphical representation of a Boolean function $f$ on variables $X_1, \ldots, X_n$. Every OBDD node (but the sinks) is labeled with a variable $X_i$ and has two labeled outgoing edges: a 1-edge and a 0-edge. The labeling of the OBDD nodes respects a global ordering of the variables: if there is an edge from a node labeled $X_i$ to a node labeled $X_j$, then $X_i$ must come before $X_j$ in the ordering. To evaluate the OBDD on an instance $\mathbf{x}$, start at the root node of the OBDD and let $x_i$ be the value of variable $X_i$ that labels the current node. Repeatedly follow the $x_i$-edge of the current node, until a sink node is reached. Reaching the 1-sink means $\mathbf{x}$ is evaluated to 1 and reaching the 0-sink means $\mathbf{x}$ is evaluated to 0 by the OBDD.
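The correspondence between a step-activated neuron and a Boolean sentence can be checked by brute force on small inputs. The sketch below assumes hypothetical parameters (weights −1, 1, 1 on inputs A, B, C and bias −1, chosen by us because they induce the behavior described above; they are not the parameters of the original example) and tabulates the neuron over all 2³ inputs.

```python
from itertools import product

def step(z):
    """Step activation: 1 if the weighted sum meets the threshold (z >= 0), else 0."""
    return 1 if z >= 0 else 0

def neuron(inputs, weights, bias):
    """Output of a step-activated neuron on 0/1 inputs."""
    return step(sum(w * i for w, i in zip(weights, inputs)) + bias)

# Hypothetical parameters for inputs (A, B, C); not the original example's values.
WEIGHTS = (-1, 1, 1)
BIAS = -1

def sentence(a, b, c):
    """(A and B and C) or (not A and (B or C)), returned as 0/1."""
    return int(bool((a and b and c) or (not a and (b or c))))

# The neuron's Boolean function, tabulated over all 2^3 inputs.
TRUTH_TABLE = {x: neuron(x, WEIGHTS, BIAS) for x in product((0, 1), repeat=3)}

for x, y in sorted(TRUTH_TABLE.items()):
    print(x, y)
```

Exhaustive enumeration is only viable for a handful of inputs; the point of compiling to an OBDD or circuit is to avoid it.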
OBDDs, as in Figure 2(a), are tractable representations: they support many operations in time polynomial (and typically linear) in the size of the OBDD [4, 26, 40]. Circuits, as in Figure 2(b), are not in general as tractable as OBDDs, although we will later seek to obtain tractable circuits through knowledge compilation, a subject which we will revisit in more depth in Section 4. Note further that OBDDs are also circuits that are notated more compactly.⁴
⁴ An OBDD node labeled by variable $X$ and with children $f_1$ (via its 1-edge) and $f_0$ (via its 0-edge) is equivalent to the circuit fragment $(X \wedge f_1) \vee (\neg X \wedge f_0)$.
Our goal now is to obtain a tractable circuit representation of a given neuron. First, consider the following class of threshold-based linear classifiers.
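Let $\mathbf{X}$ be a set of binary features where each feature $X \in \mathbf{X}$ has a value $x \in \{0, 1\}$. Let $\mathbf{x}$ denote an instantiation of variables $\mathbf{X}$. Consider functions $f$ that map instantiations $\mathbf{x}$ to a value in $\{0, 1\}$. We call $f$ a linear classifier if it has the following form:
$$f(\mathbf{x}) = \begin{cases} 1 & \text{if } \sum_{X \in \mathbf{X}} w_{x} \geq T \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$
where $T$ is a threshold, $x$ is the value of variable $X$ in instantiation $\mathbf{x}$, and where $w_x$ is the real-valued weight associated with value $x$ of variable $X$.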
Note that such classifiers are also Boolean functions. The following result, due to , gives us a way of obtaining a tractable circuit representing the Boolean function of such classifiers.
A linear classifier in the form of Equation 2 can be represented by an OBDD of size nodes, which can be computed in time.
, although much more efficiently than what the bounds suggest. It was originally proposed for compiling naive Bayes classifiers to Ordered Decision Diagrams (ODDs). However, this algorithm applies to any classifier of the form given by Equation 2, which includes naive Bayes classifiers, but also logistic regression classifiers, as well as neurons with step activation functions.
Compiling a linear classifier such as a neuron or a naive Bayes classifier is NP-hard; hence algorithms, such as the one from , are unlikely to have much tighter bounds. However, we can significantly tighten this bound if we make additional assumptions about the classifier’s parameters.
Consider a linear classifier in the form of Equation 2, where the weights and threshold are integers. Such a classifier can be represented by an OBDD of size nodes, and compiled in time, where is a sum of absolute values.
While this result is known, Appendix A provides a construction for completeness.⁵ This result also follows as a special case of the work of , which showed how to compile tree-augmented naive Bayes classifiers into OBDDs, where a naive Bayes classifier is a special case. Note that the integrality assumption of Theorem 2 can be applied to classifiers with real-valued weights by multiplying the parameters by a constant and then truncating (i.e., the parameters have fixed precision).
⁵ This result appears, for example, as an exercise in https://www.cs.ox.ac.uk/people/james.worrell/lectures.html.
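A minimal sketch of one standard way to obtain the pseudo-polynomial bound, assuming integer weights given as pairs $(w_{i,0}, w_{i,1})$ for the two values of each $X_i$ and a fixed variable order; it is not necessarily the construction of Appendix A. The idea is to memoize on (level, partial sum): with integer weights, each level can only see as many distinct partial sums as the range of possible sums allows. A unique table merges isomorphic nodes and redundant tests are skipped, so the result is a reduced OBDD. All function names are hypothetical.

```python
from functools import lru_cache

def compile_linear_classifier(weights, threshold):
    """Compile f(x) = [sum_i w_i[x_i] >= threshold] into a reduced OBDD.

    weights[i] = (w_i0, w_i1) are the integer weights of X_i = 0 and X_i = 1;
    variables are tested in the order X_0, X_1, ... . Returns (root, table),
    where ids 0 and 1 are the sinks and table maps every other id to
    (var, low_child, high_child).
    """
    n = len(weights)
    unique = {}  # (var, low, high) -> id; merges isomorphic nodes

    def make_node(var, low, high):
        if low == high:  # the test on var is redundant
            return low
        key = (var, low, high)
        if key not in unique:
            unique[key] = len(unique) + 2
        return unique[key]

    @lru_cache(maxsize=None)
    def build(i, partial_sum):
        # One call per (level, partial sum): with integer weights there are at
        # most (sum of |weights| + 1) distinct partial sums at any level.
        if i == n:
            return 1 if partial_sum >= threshold else 0
        w0, w1 = weights[i]
        return make_node(i, build(i + 1, partial_sum + w0),
                         build(i + 1, partial_sum + w1))

    root = build(0, 0)
    table = {node_id: key for key, node_id in unique.items()}
    return root, table

def evaluate(obdd, x):
    """Follow the x_i-edge from the root until a sink is reached."""
    node, table = obdd
    while node not in (0, 1):
        var, low, high = table[node]
        node = high if x[var] else low
    return node

# Hypothetical neuron over (A, B, C): weights -1, 1, 1 on value 1 and bias -1,
# rewritten in the form of Equation 2 with threshold T = -bias = 1.
OBDD = compile_linear_classifier([(0, -1), (0, 1), (0, 1)], threshold=1)
print(len(OBDD[1]), "internal nodes")
```

For this hypothetical neuron the sketch produces four internal nodes: one test on A, two on B (one for B∨C, one for B∧C), and a single shared test on C.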
As we shall show in our experiments in Section 7, this pseudo-polynomial time algorithm enables the compilation of neurons, and ultimately neural networks, with hundreds of features, in contrast to the preliminary work of , which scaled only to dozens of features, using the algorithm of .
Now that we can compile each neuron into a (tractable) Boolean circuit, the whole neural network induces a Boolean circuit, as illustrated in Figure 3. That is, for the given neural network in Figure 3(a), each neuron is compiled into a Boolean circuit as in Figure 3(b). The circuits for neurons are then connected according to the neural network structure, leading to the Boolean circuit in Figure 3(c), where the circuit of each neuron is portrayed as a block.
Using the algorithm of , the Boolean circuit that we obtain from a neuron is tractable. However, the network’s Boolean circuit, which we construct from the Boolean circuits of the neurons, may not be tractable. To use the explanation and verification techniques proposed in [35, 36], we require a tractable circuit; cf. . We next show how to obtain such a circuit using tools from the field of knowledge compilation.
4 Tractability via Knowledge Compilation
In this section, we provide a short introduction to the domain of knowledge compilation, and then show how we can compile a neural network into a tractable Boolean circuit.
We follow , which considers tractable representations of Boolean circuits, and the trade-offs between succinctness and tractability. In particular, they consider Boolean circuits of and-gates, or-gates and inverters, but where inverters only appear at the inputs (hence the inputs of the circuit are variables or their negations). This sub-class of circuits is called Negation Normal Form (NNF) circuits. Any circuit with and-gates, or-gates and inverters can be efficiently converted into an NNF circuit while at most doubling its size.
By imposing properties on the structure of NNF circuits, one can obtain greater tractability (the ability to perform certain operations in polytime) at the possible expense of succinctness (the size of the resulting circuit). To help motivate this trade-off, consider Figure 4, which highlights the containment relationship between four complexity classes. The “easiest” class is NP, and the “hardest” class is $\text{PP}^{\text{PP}}$. The canonical problems that are complete for each class all correspond to queries on Boolean expressions. One popular computational paradigm for solving problems in these classes is to reduce them to the canonical problem for that class, and to compile the resulting Boolean expressions to circuits with the appropriate properties.⁶ ⁷ For example,  shows how to solve $\text{PP}^{\text{PP}}$-complete problems by reduction to MajMajSAT queries on a specific tractable class of Boolean circuits.
⁶ For more on this paradigm, see http://beyondnp.org.
⁷ For a video tutorial on this paradigm, “On the role of logic in probabilistic inference and machine learning,” see https://www.youtube.com/watch?v=xRxP2Wj4kuA.
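To make the four canonical queries concrete, the brute-force sketch below states each one directly on a Python predicate. It is exponential by design and only illustrates what is being asked; we assume "majority" means strictly more than half of the assignments, and the split of variables into a first block $\mathbf{x}$ and a second block $\mathbf{y}$ for E-MajSAT and MajMajSAT follows the usual textbook definitions. The function names and the example formula are hypothetical.

```python
from itertools import product

def count_models(f, n):
    """Number of assignments in {0,1}^n that satisfy predicate f."""
    return sum(1 for x in product((0, 1), repeat=n) if f(x))

def sat(f, n):                  # NP-complete: is f satisfiable?
    return count_models(f, n) > 0

def majsat(f, n):               # PP-complete: do more than half of the assignments satisfy f?
    return 2 * count_models(f, n) > 2 ** n

def e_majsat(f, n, m):          # NP^PP-complete: is there an x with MajSAT(f(x, .))?
    return any(majsat(lambda y: f(x, y), m) for x in product((0, 1), repeat=n))

def majmajsat(f, n, m):         # PP^PP-complete: do a majority of x satisfy MajSAT(f(x, .))?
    good = sum(1 for x in product((0, 1), repeat=n) if majsat(lambda y: f(x, y), m))
    return 2 * good > 2 ** n

# Hypothetical formula over x (1 variable) and y (2 variables): x0 or (y0 and y1).
def g(x, y):
    return x[0] or (y[0] and y[1])

print(e_majsat(g, 1, 2), majmajsat(g, 1, 2))
```

For `g`, setting x0 = 1 makes every y a model, so E-MajSAT holds; only one of the two x assignments has that property, so MajMajSAT does not.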
Consider now a property on NNF circuits called decomposability. This property asserts that the sub-circuits feeding into an and-gate cannot share variables. An NNF circuit that is decomposable is said to be in Decomposable Negation Normal Form (DNNF). In a DNNF circuit, testing whether the circuit is satisfiable can be done in time linear in the size of the circuit. Another such property is determinism. This property asserts that for each or-gate, if the or-gate outputs 1 then exactly one of its inputs is 1. A DNNF circuit that is also deterministic is called a d-DNNF. The circuit in Figure 2(b) is an example of a d-DNNF circuit. In a d-DNNF circuit, counting the number of assignments that satisfy the circuit can be done in time linear in the size of the circuit, assuming the circuit also satisfies smoothness.⁸ Hence, with these first two properties, we can solve the canonical problems in the two “easiest” classes illustrated in Figure 4.
⁸ Counting how many assignments satisfy a given circuit allows us to tell whether a majority of them satisfy the circuit (MajSAT).
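The two linear-time queries can be sketched as single passes over the gates, assuming a hypothetical encoding of a circuit as a list of gates in topological order. Decomposability is what makes "and" of satisfiable children satisfiable and lets counts multiply; determinism and smoothness are what let counts add at or-gates. The example circuit is our own smooth d-DNNF for $(A \wedge B \wedge C) \vee (\neg A \wedge (B \vee C))$, not a reproduction of Figure 2(b).

```python
# Circuit as a list of gates in topological order (gate i only refers to j < i);
# the last gate is the output. Gate forms (a hypothetical encoding):
#   ("lit", var, positive)   ("and", [j, ...])   ("or", [j, ...])

def is_satisfiable(circuit):
    """One pass over the gates; correct when the circuit is decomposable (DNNF)."""
    sat = []
    for gate in circuit:
        if gate[0] == "lit":
            sat.append(True)
        elif gate[0] == "and":    # children share no variables
            sat.append(all(sat[j] for j in gate[1]))
        else:
            sat.append(any(sat[j] for j in gate[1]))
    return sat[-1]

def model_count(circuit):
    """One pass over the gates; correct for smooth d-DNNF circuits that mention
    every variable."""
    count = []
    for gate in circuit:
        if gate[0] == "lit":
            count.append(1)
        elif gate[0] == "and":    # decomposable: multiply
            c = 1
            for j in gate[1]:
                c *= count[j]
            count.append(c)
        else:                     # deterministic and smooth: add
            count.append(sum(count[j] for j in gate[1]))
    return count[-1]

# Smooth d-DNNF for (A and B and C) or (not A and (B or C)).
CIRCUIT = [
    ("lit", "A", True), ("lit", "A", False),     # 0, 1
    ("lit", "B", True), ("lit", "B", False),     # 2, 3
    ("lit", "C", True), ("lit", "C", False),     # 4, 5
    ("and", [0, 2, 4]),                          # 6: A and B and C
    ("or", [4, 5]),                              # 7: C or not C (smoothing)
    ("and", [2, 7]),                             # 8: B and (C or not C)
    ("and", [3, 4]),                             # 9: not B and C
    ("or", [8, 9]),                              # 10: B or C, made deterministic
    ("and", [1, 10]),                            # 11: not A and (B or C)
    ("or", [6, 11]),                             # 12: output
]
print(is_satisfiable(CIRCUIT), model_count(CIRCUIT))
```

Without the smoothing gate 7, gate 10 would combine children over different variable sets and the sum would undercount.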
A more recently proposed class of circuits is the Sentential Decision Diagram (SDD) [15, 41, 7]. SDDs are a subclass of d-DNNF circuits that assert a stronger form of decomposability and a stronger form of determinism. SDDs subsume OBDDs and are exponentially more succinct. SDDs support polytime conjunction and disjunction. That is, given two SDDs $f$ and $g$, there is a polytime algorithm to construct another SDD that represents $f \wedge g$ or $f \vee g$.⁹ Further, SDDs can be negated in linear time.¹⁰
⁹ If $s$ and $t$ are the sizes of the input SDDs, then conjoining or disjoining the SDDs takes $O(s \cdot t)$ time, although the resulting SDD may not be compressed.
¹⁰ In our case study in Section 6, we used the open-source SDD package available at http://reasoning.cs.ucla.edu/sdd/.
These polytime operations allow a simple algorithm for compiling a Boolean circuit with and-gates, or-gates and inverters into an SDD. We first obtain an SDD for each circuit input. We then traverse the circuit bottom-up, compiling the output of each visited gate into an SDD by applying the corresponding operation to the SDDs of the gate’s inputs.
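The bottom-up compiler can be sketched as follows. We do not reproduce the SDD package’s API here; instead we swap in reduced OBDDs, which also support polytime apply (conjunction, disjunction) and negation, so the traversal is the same. The `BDD` class, the gate encoding, and the example circuit are all hypothetical illustrations.

```python
import operator

class BDD:
    """Minimal reduced-OBDD manager: ids 0 and 1 are the sinks, and variables
    are tested in increasing index order (a stand-in for an SDD manager)."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        self.nodes = {}     # id -> (var, low, high)
        self.unique = {}    # (var, low, high) -> id
        self.cache = {}     # memo table for apply

    def var_of(self, u):
        return self.num_vars if u in (0, 1) else self.nodes[u][0]

    def mk(self, var, low, high):
        if low == high:                 # redundant test
            return low
        key = (var, low, high)
        if key not in self.unique:      # share isomorphic nodes
            self.unique[key] = len(self.unique) + 2
            self.nodes[self.unique[key]] = key
        return self.unique[key]

    def literal(self, var):
        return self.mk(var, 0, 1)

    def cofactors(self, u, var):
        if self.var_of(u) != var:
            return u, u
        _, low, high = self.nodes[u]
        return low, high

    def apply(self, op, u, v):
        """Combine two OBDDs under a binary operator; O(|u| * |v|) with memoization."""
        if u in (0, 1) and v in (0, 1):
            return int(op(u, v))
        key = (op, u, v)
        if key not in self.cache:
            var = min(self.var_of(u), self.var_of(v))
            u0, u1 = self.cofactors(u, var)
            v0, v1 = self.cofactors(v, var)
            self.cache[key] = self.mk(var, self.apply(op, u0, v0), self.apply(op, u1, v1))
        return self.cache[key]

    def negate(self, u):
        return self.apply(operator.xor, u, 1)

    def evaluate(self, u, x):
        while u not in (0, 1):
            var, low, high = self.nodes[u]
            u = high if x[var] else low
        return u

def compile_circuit(bdd, gates):
    """Compile gates listed bottom-up (topological order):
    ("input", var), ("not", j), ("and", j, k), ("or", j, k)."""
    compiled = []
    for gate in gates:
        kind = gate[0]
        if kind == "input":
            compiled.append(bdd.literal(gate[1]))
        elif kind == "not":
            compiled.append(bdd.negate(compiled[gate[1]]))
        else:
            op = operator.and_ if kind == "and" else operator.or_
            compiled.append(bdd.apply(op, compiled[gate[1]], compiled[gate[2]]))
    return compiled[-1]

# Hypothetical circuit over A=0, B=1, C=2: (A and B) or not (B or C).
bdd = BDD(num_vars=3)
GATES = [("input", 0), ("input", 1), ("input", 2),
         ("and", 0, 1), ("or", 1, 2), ("not", 4), ("or", 3, 5)]
ROOT = compile_circuit(bdd, GATES)
```

Because every gate is compiled into the same manager with a unique table, logically equivalent sub-circuits end up as the same node, which is the canonicity that makes later queries cheap.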
SAT and MajSAT can be solved in linear time on SDDs. Further properties on SDDs allow the problems E-MajSAT and MajMajSAT, the two hardest problems illustrated in Figure 4, to also be solved in time linear in the size of the SDD. In our experiments, we compiled the Boolean circuits of neural networks into standard SDDs, as this was sufficient for efficiently supporting the explanation and verification queries we are interested in.
5 On the Robustness of Classifiers
While neural networks have become ubiquitous in machine learning and artificial intelligence, it is also increasingly apparent that neural networks learned in practice can be fragile. In particular, they can be susceptible to misclassifying an instance after small perturbations have been applied to it [38, 18, 27]. Next, we show how compiling a neural network into a tractable circuit can give one the ability to analyze the robustness of a neural network’s decisions.
We consider first the robustness of a binary classifier’s decision to label a given instance 0 or 1. We consider the following question: how many features do we need to flip, from 0 to 1 or from 1 to 0, before the classifier’s decision flips? That is, we measure the robustness of a given instance by its Hamming distance to the closest instance of the opposite label.
Definition 2 (Instance-Based Robustness).
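Consider a Boolean classification function $f$ and a given instance $\mathbf{x}$. The robustness of the classification of $\mathbf{x}$ by $f$, denoted by $r_f(\mathbf{x})$, is defined as follows. If $f$ is a trivial function (true or false), then $r_f(\mathbf{x}) = \infty$. Otherwise,
$$r_f(\mathbf{x}) = \min_{\mathbf{x}' : f(\mathbf{x}') \neq f(\mathbf{x})} d(\mathbf{x}, \mathbf{x}'),$$
where $d(\mathbf{x}, \mathbf{x}')$ denotes the Hamming distance between $\mathbf{x}$ and $\mathbf{x}'$, i.e., the number of variables on which $\mathbf{x}$ and $\mathbf{x}'$ differ.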
Given a classification function $f$, we refer to an instance $\mathbf{x}$ as being $k$-robust if $r_f(\mathbf{x}) = k$, i.e., it takes $k$ flips of the features to flip the classification. In general, it is intractable to compute the robustness of a classification, unless P=NP. Consider the following decision problem:
D-ROBUST: Given function $f$, instance $\mathbf{x}$, and integer $k$, is $r_f(\mathbf{x}) \geq k$?
D-ROBUST is coNP-complete.
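Given a tractable circuit (in particular, an OBDD), this question can in fact be answered in time linear in the size of the circuit.¹¹ We employ the algorithm given by  in our case studies in Section 6.¹²
¹¹ The robustness of an instance can be computed by the following recurrence, which recurses on the structure of an OBDD. For a positive instance $\mathbf{x}$ and an OBDD node $N$ labeled by variable $X$, let $N_{x}$ be the child reached by following the edge of the value $x$ that $X$ takes in $\mathbf{x}$, and let $N_{\bar{x}}$ be the other child. Then
$$r(N) = \min\left(r(N_{x}),\; 1 + r(N_{\bar{x}})\right),$$
where $r(N) = 0$ if $N$ is the 0-sink and $r(N) = \infty$ if $N$ is the 1-sink; $r_f(\mathbf{x})$ is the value of $r$ at the root.
¹² Since computing robustness is coNP-complete, we can also apply SAT solvers to this task. For a given $k$, we can determine whether $r_f(\mathbf{x}) \leq k$ by first encoding the set of instances within a distance of $k$ away from $\mathbf{x}$ as a CNF formula $\Delta$, using a standard encoding. We then encode the opposite label $\neg f(\mathbf{x})$ and the neural network’s classification function as a CNF formula $\Gamma$, using a technique similar to that by . The formula $\Delta \wedge \Gamma$ is then satisfiable iff $r_f(\mathbf{x}) \leq k$. Finally, we iterate over all possible values of $k$ (or perform binary search), so the instance-based robustness is just the smallest value of $k$ such that $r_f(\mathbf{x}) \leq k$ (i.e., $\Delta \wedge \Gamma$ is SAT).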
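One way to implement the OBDD recurrence for instance-based robustness is sketched below, assuming an OBDD stored as a dictionary of internal nodes (a hypothetical encoding) for the example function $(A \wedge B \wedge C) \vee (\neg A \wedge (B \vee C))$. We generalize the base case to either label: the sink of the opposite label costs 0 flips. Variables skipped by an edge keep their value in $\mathbf{x}$ at no cost. A brute-force version of Definition 2 is included to cross-check.

```python
import math
from functools import lru_cache
from itertools import product

# Reduced OBDD for (A and B and C) or (not A and (B or C)) with order A, B, C:
# internal ids map to (var, low, high); ids 0 and 1 are the sinks.
TABLE = {2: (2, 0, 1),   # C
         3: (1, 2, 1),   # B or C
         4: (1, 0, 2),   # B and C
         5: (0, 3, 4)}   # root, tests A
ROOT = 5

def evaluate(table, node, x):
    while node not in (0, 1):
        var, low, high = table[node]
        node = high if x[var] else low
    return node

def robustness(table, root, x):
    """r_f(x) via the OBDD recurrence; each node is visited once (memoized)."""
    label = evaluate(table, root, x)

    @lru_cache(maxsize=None)
    def r(node):
        if node in (0, 1):               # opposite-label sink costs 0 flips
            return 0 if node != label else math.inf
        var, low, high = table[node]
        keep, flip = (high, low) if x[var] else (low, high)
        return min(r(keep), 1 + r(flip))

    return r(root)

def brute_force_robustness(table, root, x):
    """Definition 2 directly: Hamming distance to the nearest opposite label."""
    label = evaluate(table, root, x)
    return min((sum(a != b for a, b in zip(x, y))
                for y in product((0, 1), repeat=len(x))
                if evaluate(table, root, y) != label), default=math.inf)

print(robustness(TABLE, ROOT, (0, 1, 1)))
```

For the instance (A, B, C) = (0, 1, 1), no single flip changes the label, but flipping both B and C does, so the sketch reports a robustness of 2.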
Next, rather than consider the robustness of just one classification, we can consider the average robustness of a classification function, over all possible inputs. In other words, we consider the expected robustness of a classifier, under a uniform distribution of its inputs.
Definition 3 (Model-based Robustness).
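Consider a Boolean classification function $f$ over $n$ binary variables. The model robustness of $f$ is defined as:
$$mr(f) = \frac{1}{2^n} \sum_{\mathbf{x}} r_f(\mathbf{x}),$$
where $r_f(\mathbf{x})$ is the instance-based robustness of Definition 2.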
Let $f$ be the classification function whose robustness we want to assess. If $f(\mathbf{x}) = 1$ we refer to $\mathbf{x}$ as a positive instance; otherwise $f(\mathbf{x}) = 0$ and we refer to $\mathbf{x}$ as a negative instance. We propose Algorithm 1 for computing the model robustness of a classifier over all positive instances (the robustness of negative instances can be computed by invoking Algorithm 1 on function $\neg f$).¹³
¹³ Recently,  proposed an approach for estimating robustness using approximate model counting, with PAC-style guarantees. Their approach scaled to digits datasets; our exact approach scales to digits datasets in Section 6.
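Our algorithm is based on computing the sequence of functions $f_1, f_2, \ldots$, where $f_k$ is the Boolean function representing all positive instances that have robustness $k$ or higher. That is, $f_k$ represents all instances $\mathbf{x}$ where $f(\mathbf{x}) = 1$ and where $r_f(\mathbf{x}) \geq k$. First, $f_1 = f$. For $k = 2$ we have:
$$f_2 = \bigwedge_{X \in \mathbf{X}} \left( f|_{x} \wedge f|_{\bar{x}} \right),$$
where $f|_{x}$ denotes the conditioning of $f$ on value $x$, i.e., the function that we would obtain by setting $X$ to true (replace every occurrence of $X$ with true, and in the case of $f|_{\bar{x}}$ replace $X$ with false). Say that $\mathbf{x}$ is an instance of $f|_{x} \wedge f|_{\bar{x}}$; then $f(\mathbf{x}) = 1$, and $f$ remains 1 no matter how we set $X$. By taking the conjunction across all variables $X$, we obtain all instances whose output would not flip after flipping any single feature $X$. Next, consider the robustness of the instances of $f_2$. Some of these instances will be 2-robust with respect to $f_2$. These instances are in turn 3-robust with respect to the original function $f$. More generally, we can compute $f_{k+1}$ from $f_k$:
$$f_{k+1} = \bigwedge_{X \in \mathbf{X}} \left( f_k|_{x} \wedge f_k|_{\bar{x}} \right).$$
We can now compute the functions $h_k$ representing all of the $k$-robust positive instances of $f$, via:
$$h_k = f_k \wedge \neg f_{k+1}.$$
The model count of a Boolean function $g$, denoted by $MC(g)$, is the number of instances satisfying $g$. We can then compute the model robustness of $f$ (with $n = |\mathbf{X}|$) by:
$$mr(f) = \frac{1}{2^n} \left( \sum_{k=1}^{n} k \cdot MC(h_k) + \sum_{k=1}^{n} k \cdot MC(\bar{h}_k) \right),$$
where the functions $\bar{h}_k$ are obtained by running Algorithm 1 on $\neg f$.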
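A brute-force sketch of Algorithm 1 is given below, assuming Boolean functions are represented explicitly as sets of satisfying instances. This representation is exponential and only stands in for SDDs; the operations are the same ones the algorithm needs (conditioning, conjunction, negation as set complement, and model counting as set size). All names and the example function are hypothetical.

```python
from itertools import product

def all_instances(n):
    return list(product((0, 1), repeat=n))

def condition(g, i, value, n):
    """g|x_i as a set over all n variables: instances y with y[X_i := value] in g."""
    return frozenset(y for y in all_instances(n) if y[:i] + (value,) + y[i + 1:] in g)

def algorithm_1(f, n):
    """Map each k to h_k = f_k AND NOT f_{k+1}: positive instances with robustness exactly k."""
    assert 0 < len(f) < 2 ** n, "f must be non-trivial"
    levels, f_k, k = {}, frozenset(f), 1          # f_1 = f
    while f_k:                                    # stop once f_k is false
        f_next = frozenset(all_instances(n))
        for i in range(n):                        # f_{k+1} = AND_X (f_k|x AND f_k|~x)
            f_next &= condition(f_k, i, 1, n) & condition(f_k, i, 0, n)
        levels[k] = f_k - f_next                  # h_k
        f_k, k = f_next, k + 1
    return levels

def model_robustness(f, n):
    """mr(f): positives from Algorithm 1 on f, negatives from Algorithm 1 on not f."""
    negatives = frozenset(all_instances(n)) - frozenset(f)
    total = sum(k * len(h_k)
                for levels in (algorithm_1(f, n), algorithm_1(negatives, n))
                for k, h_k in levels.items())
    return total / 2 ** n

# Example: f = (A and B and C) or (not A and (B or C)) over (A, B, C).
F = frozenset(x for x in all_instances(3)
              if (x[0] and x[1] and x[2]) or (not x[0] and (x[1] or x[2])))
print(model_robustness(F, 3), max(algorithm_1(F, 3)))
```

Since each $f_{k+1}$ is contained in $f_k$, the loop stops as soon as $f_k$ becomes empty, and the largest key returned is the maximum robustness over the positive instances.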
Consider now the most robust instances of a function .
Definition 4 (Maximum Robustness).
Consider a Boolean classification function $f$ where $f$ is non-trivial. The maximum robustness of $f$ is defined as:
$$\max_{\mathbf{x}} r_f(\mathbf{x}).$$
Note that the instances of $f_{k+1}$ (the positive instances with robustness at least $k+1$) are a subset of the instances of $f_k$, as computed in Algorithm 1. Hence, the model count of $f_k$ can only decrease as we increase $k$. At a large enough $k$, $f_k$, and hence also $h_k$, will have no models and will equal false. At this point, we know that $k - 1$ is the maximum robustness of the positive instances, and we can stop Algorithm 1 early. Moreover, the largest non-false $h_k$ gives us the set of instances that are the most robust (requiring the largest number of features to flip).¹⁴
¹⁴ Note that if $f$ is a non-trivial Boolean function over $n$ variables, then $f_{n+1}$ must be false. Suppose $f_{n+1}$ were not false, and that $\mathbf{x}$ is an instance of $f_{n+1}$. This means we can flip any and all variables of $\mathbf{x}$, and it would always be an instance of $f$. This implies that $f$ must have been true, and hence a trivial function.
Finally, we observe that model-based robustness appears to be computationally more difficult than instance-based robustness. In particular, the model-robustness over positive instances can be shown to be a PP-hard problem.
D-POS-MODEL-ROBUST: Given function , and integer , is ?
D-POS-MODEL-ROBUST is PP-hard.
To compute model-based robustness using Algorithm 1, we must be able to negate, conjoin and condition Boolean functions, as well as compute their model counts. Given a circuit represented as an SDD, operations such as negation and model counting can be done in time linear in the size of the SDD. Conjoining two SDDs of sizes $s$ and $t$ takes $O(s \cdot t)$ time, although a sequence of conjoin operations, as in Algorithm 1, may still take exponential time.
6 A Case Study
We next provide a case study in explaining and verifying a convolutional neural network via knowledge compilation.
6.1 (Binary) Convolutional Neural Networks
In our case study, we consider binary convolutional neural networks (binary CNNs).¹⁵ That is, if we assume binary inputs and step activations, then the outputs of all neurons are binary, and the output of the network itself is also binary. Hence a binary CNN represents a Boolean function. We can construct a Boolean circuit representing a binary CNN, and then compile it to a tractable one as described in Section 4.
¹⁵ A number of binary variations of neural networks have been proposed in the literature. The XNOR-Networks of  are another binary variation of CNNs, which also assume binary weights. The binarized neural networks (BNNs) of  have binarized parameters and activations. In work closely related to ours,  studied the verification of BNNs using SAT solvers, as discussed in Section 1.
Our binary CNNs contain three types of layers:
convolution + step layers: a convolution layer consists of a set of filters, that can be used to detect local patterns in an input image. Typically, a ReLU unit is applied to the output of a filter. In a binary CNN, we assume step activations (whose parameters are trained first using sigmoid activations, then replacing them with step activations);
max-pooling layers: a max-pooling layer can be used to reduce the dimension of an image, helping to reduce the overall computational and statistical demands. In a binary CNN, if the inputs of a max-pooling layer are 0 or 1, then the “max” reduces to a logical “or”;
fully-connected layers: if the inputs are binary and if we use step activations, then each neuron represents a Boolean function, as in Section 3.
6.2 Experimental Setup
We consider the USPS digits dataset of handwritten digits, consisting of 16×16 pixel images, which we binarized to black and white. Here, we performed binary classification using different pairs of digits. We first trained a CNN with sigmoid activations using TensorFlow. We then replaced the sigmoid activations with step activations to obtain a binary CNN, which we compiled into a tractable circuit. In particular, we compiled the binary CNN into a Sentential Decision Diagram (SDD). We shall subsequently provide analyses of the binary CNN, via queries on the SDD.
More specifically, we created two convolution layers, each with stride size. We first swept a filter on the original image (resulting in a grid), followed by a second filter (resulting in a grid). These outputs were the inputs of a fully-connected layer with a single output. We did not use max-pooling as the dimension was reduced enough by the convolutions. Finally, we optimized a sigmoid cross-entropy loss using the Adam optimizer.
6.3 Explaining Decisions
We consider how to explain why a neural network classified a given instance positively or negatively. In particular, we consider prime-implicant explanations (PI-explanations), as proposed by ; see also [21, 10]. Say that an input image $\mathbf{x}$ is classified positively, i.e., as a digit-1. A PI-explanation returns a minimal subset $\mathbf{y}$ of the inputs in $\mathbf{x}$ that renders the remaining inputs irrelevant. That is, once you fix the pixel values $\mathbf{y}$, the values of the other pixels do not matter: the network will always classify the instance as a digit-1.¹⁶
¹⁶ Consider in contrast “Anchors,” recently proposed by [32, 33]. An anchor for an instance is a subset of the instance that is highly likely to be classified with the same label, no matter how the missing features are filled in (according to some distribution). In contrast, PI-explanations are exact.
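To illustrate the definition, the brute-force sketch below finds a shortest PI-explanation of an instance by trying feature subsets in order of increasing size and checking that fixing them forces the classifier’s decision. A smallest such subset is also subset-minimal, so it is a PI-explanation. This is exponential and only meant for tiny functions; on a compiled SDD these checks become circuit queries instead. The classifier `f` and all function names are hypothetical.

```python
from itertools import combinations, product

def f(x):
    """Hypothetical classifier: (A and B and C) or (not A and (B or C))."""
    a, b, c = x
    return int(bool((a and b and c) or (not a and (b or c))))

def is_sufficient(f, x, fixed):
    """True if fixing the features in `fixed` to their values in x forces f(x)."""
    free = [i for i in range(len(x)) if i not in fixed]
    for values in product((0, 1), repeat=len(free)):
        y = list(x)
        for i, v in zip(free, values):
            y[i] = v
        if f(tuple(y)) != f(x):
            return False
    return True

def shortest_pi_explanation(f, x):
    """Smallest set of features of x that renders the others irrelevant.
    A smallest such set is also subset-minimal, i.e., a PI-explanation."""
    for size in range(len(x) + 1):
        for fixed in combinations(range(len(x)), size):
            if is_sufficient(f, x, set(fixed)):
                return {i: x[i] for i in fixed}

print(shortest_pi_explanation(f, (1, 1, 1)))
```

For the positive instance (1, 1, 1), fixing B = 1 and C = 1 suffices: whatever value A takes, the classifier still outputs 1.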
We first trained a CNN to distinguish between digit-0 and digit-1 images, which achieved accuracy. Next, we took one correctly classified instance of each digit from the test set, shown in Figures 5(a) & 5(b). The shortest PI-explanations for these two images are displayed in Figures 5(c) & 5(d). In Figure 5(c), the PI-explanation consists of three white pixels. Once we fix these three pixels, the network will always classify the image as a digit-0, no matter how the pixels in the gray region are set. Similarly, Figure 5(d) sets three black patches of pixels to the left and right, and sets two center pixels to white, which is sufficient for the network to always classify the image as a digit-1.
The guarantees of these PI-explanations are strong: the pixels in the gray region can be manipulated in any way and the classification would still not change. In fact, these guarantees are so strong that one can easily create counterexamples to fool the network. In Figures 5(e) & 5(f), we fill in the remaining pixels in such a way that the digit-0 image looks like a digit-1, and vice versa. The network classifies these new images incorrectly because it is misled by the subset of pixels shown in the PI-explanations of Figures 5(c) & 5(d). Using this method, we can generate a large number of counterexamples very quickly. Experiments based on other pairs of digits found similar results.
6.4 Explaining Model Behavior
To explain the network’s behavior as a whole (and not just per instance), we provide two visualizations of how each pixel contributes to the classification decision: a marginal grid and a unateness grid. Figure 6(a) is a marginal grid, which highlights the marginals of the output neuron, i.e., the probability that each pixel is white given that the output of the network is digit-1. In general, it is intractable to compute such marginals, which naively entails enumerating all $2^{256}$ possible input images and then checking the network output. If we can compile a neural network’s Boolean function into a tractable circuit, like an SDD, then we can compute such marginals in time linear in the size of the circuit.
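Under uniformly distributed inputs, the marginal of a pixel $X$ given a positive output is $MC(f \wedge X) / MC(f)$, so it reduces to two model counts. The sketch below computes them on a hand-built OBDD (a simpler stand-in for an SDD) of the hypothetical function $(A \wedge B \wedge C) \vee (\neg A \wedge (B \vee C))$. It makes one memoized pass over the nodes, accounting for variables that an edge skips, which are free unless fixed by the evidence. The encoding and function names are assumptions for illustration.

```python
from functools import lru_cache

# Reduced OBDD for (A and B and C) or (not A and (B or C)), order A=0, B=1, C=2;
# internal ids map to (var, low, high), ids 0 and 1 are the sinks.
TABLE = {2: (2, 0, 1), 3: (1, 2, 1), 4: (1, 0, 2), 5: (0, 3, 4)}
ROOT, N = 5, 3

def model_count(table, root, n, evidence=None):
    """Number of models of the OBDD consistent with `evidence` (var -> value)."""
    evidence = evidence or {}

    def level(node):
        return n if node in (0, 1) else table[node][0]

    def free(lo, hi):
        # Variables lo..hi-1 skipped by an edge are unconstrained unless in evidence.
        return 2 ** sum(1 for v in range(lo, hi) if v not in evidence)

    @lru_cache(maxsize=None)
    def count(node):
        # Models over the variables level(node), ..., n - 1.
        if node in (0, 1):
            return node
        var, low, high = table[node]
        return sum(count(child) * free(var + 1, level(child))
                   for value, child in ((0, low), (1, high))
                   if evidence.get(var, value) == value)

    return count(root) * free(0, level(root))

def marginal(table, root, n, var):
    """P(X_var = 1 | f = 1) under uniformly distributed inputs."""
    return model_count(table, root, n, {var: 1}) / model_count(table, root, n)

print([marginal(TABLE, ROOT, N, v) for v in range(N)])
```

Of the four positive instances of this function, only one has A = 1, while three have B = 1 and three have C = 1, giving marginals 0.25, 0.75 and 0.75.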
In Figure 6(a), red pixels correspond to marginals greater than , and redder pixels are closer to one. Blue pixels correspond to marginals less than , and bluer pixels are closer to zero. The grid intensities have been re-scaled for clarity. Not surprisingly, we find that if the output of the network is high (indicative of a digit-1), then it is somewhat more likely that the pixels in the middle are set to white.
Figure 6(b) is a unateness grid, which identifies pixels that sway the classification in one direction only. Red pixels are positively unate (monotone), so turning them from off to on can only flip the classification from digit-0 to digit-1. Blue pixels are negatively unate, i.e., turning them from off to on can only flip the classification from digit-1 to digit-0. Black pixels are ignored by the network completely. Finally, gray pixels do not satisfy any unateness property. In general, determining whether an input of a Boolean function is unate (monotone) or unused is a computationally hard problem. In tractable circuits such as SDDs, these queries can be answered in time polynomial in the circuit size.
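The three categories of the unateness grid can be stated precisely with a brute-force check: an input is positively unate if turning it on never lowers the output, negatively unate if it never raises it, and unused if it does neither. The sketch below enumerates all inputs, so it is exponential; on an SDD the same questions become polynomial-time comparisons of the conditioned functions $f|_{\bar{x}}$ and $f|_{x}$. The classifier `f` (which ignores its fourth input) and the function names are hypothetical.

```python
from itertools import product

def input_type(f, n, i):
    """Classify input X_i of f as 'unused', 'positive', 'negative' or 'neither'."""
    can_rise = can_fall = False
    for x in product((0, 1), repeat=n):
        if x[i] == 1:
            continue
        off, on = f(x), f(x[:i] + (1,) + x[i + 1:])   # turn X_i from off to on
        can_rise |= on > off
        can_fall |= on < off
    if not can_rise and not can_fall:
        return "unused"
    if not can_fall:
        return "positive"     # positively unate (monotone)
    if not can_rise:
        return "negative"     # negatively unate
    return "neither"

def f(x):
    """Hypothetical classifier over (A, B, C, D) that ignores D."""
    a, b, c, _ = x
    return int(bool((a and b and c) or (not a and (b or c))))

print([input_type(f, 4, i) for i in range(4)])
```

Here turning A on replaces B∨C by B∧C, which can only lower the output, so A is negatively unate, while B and C are positively unate and D is unused.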
In Figure 6(b), the majority of pixels are unate (monotone), suggesting that the overall network behavior is still relatively simple. We also observe that there are many unused pixels on the right and bottom borders, which can be explained by the lack of padding (i.e., given the filter size and stride length, no filter takes any of these pixels as inputs). There is another block of unused pixels closer to the middle. On closer inspection, we find that these pixels are unique to one particular filter in the second convolution layer (no other filter depends on their values). In the tractable circuit of the output neuron, we find that the circuit does not essentially depend on the output of this filter. Thus, the output of the network does not depend on the values of any of these pixels. Note that deciding whether an input of a neuron is unused is an NP-hard problem.¹⁷ However, given a tractable circuit such as an SDD, this question can be answered in time linear in the size of the circuit.
¹⁷ This reduction is similar to the one showing that compiling a linear classifier is NP-hard.
We emphasize a few points now. First, this (visual) analysis is enabled by the tractability of the circuit, which allows marginals to be computed and unate pixels to be identified efficiently. Second, the analysis also emphasizes that the network is not learning the conceptual differences between a digit-0 and a digit-1. It is identifying subsets of the pixels that best differentiate between images of digit-0 and digit-1 from the training set with high accuracy. This perhaps explains why it is sometimes easy to “fool” neural networks, which we demonstrated in Figure 5.
6.5 Analyzing Classifier Robustness
Next, we provide a case study in analyzing CNNs based on their robustness. We consider the classification task of discriminating between a digit-1 and a digit-2. First, we trained two CNNs with the same architecture (as described earlier), but using two different parameter seeds. We achieved 98.18% (Net 1) and 96.93% (Net 2) testing accuracies. The SDD of Net 1 had 1,298 nodes and a size of 3,653. The SDD of Net 2 had 203 nodes and a size of 440.¹⁸ Net 1 obtained a model robustness of 11.77, but Net 2 only obtained a model robustness of 3.62. For Net 2, this means that on average, 3.62 pixel flips are needed to flip a digit-1 classification to digit-2, or vice versa. Moreover, the maximum robustness of Net 1 was 27, while that of Net 2 was only 13. For Net 1, this means that there is an instance that would not flip unless you flipped (the right) 27 pixels. These are two networks which are similar in terms of accuracy (differing by only 1.25%), but very different when compared by robustness.
¹⁸ The size of a decision node in an SDD is the number of its children. The size of an SDD is the aggregate size of its nodes.
Figure 7 further highlights the differences between these two networks by the level of robustness $k$. On the $x$-axis, we increase the level of robustness $k$ (up to the max of 27), and on the $y$-axis we measure the proportion of instances having robustness $k$, i.e., we plot these proportions as computed in Section 5. Clearly, the first network is able to more robustly classify a larger number of instances. Given two networks with comparable accuracies, we clearly prefer the one that is more robust, as it would be more resilient to adversarial perturbations and to noise. When we compute the average instance-based robustness of testing instances, Net 1 obtains an average of 4.47, whereas Net 2 obtains a lower average of 2.61, as expected.
Next, we consider in more depth Net 2, which had a test set accuracy of 96.93%. First, we visualize the most robust and the least robust instances of the CNN. Figures 7(a) & 7(b) depict an example of a most robust digit-1 and digit-2 from the testing set. Similarly, Figures 7(c) & 7(d) depict an example of a least robust digit-1 and digit-2, both having robustness 1. For these latter two instances, flipping a single pixel in each image suffices for the classifier to switch its label. These perturbations are given in Figures 7(e) & 7(f). Finding training examples with low robustness can help identify problematic or anomalous instances in the dataset, or otherwise indicate weaknesses of the learned classifier. Finding training examples with high robustness provides insight into which instances the classifier considers to be prototypical of the class.
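Two related tasks can be sketched by brute force on a toy classifier:

- finding the least and most robust instances;
- recovering a smallest perturbation that flips a given instance.

The neuron `toy` below (weights 2, 1, 1 and threshold 2) is hypothetical. The exhaustive search stands in for the circuit-based queries used on the actual CNN.

```python
from itertools import combinations


def minimal_flip(f, x):
    """Return a smallest tuple of input indices whose flip changes f(x).

    Assumes f is not constant, so some flip always exists.
    """
    label = f(x)
    for k in range(1, len(x) + 1):
        for flips in combinations(range(len(x)), k):
            z = tuple(1 - b if i in flips else b for i, b in enumerate(x))
            if f(z) != label:
                return flips
    raise ValueError("f is constant, so no flip changes its output")


def rank_by_robustness(f, instances):
    """Pair each instance with its robustness and sort least robust first."""
    return sorted((len(minimal_flip(f, x)), x) for x in instances)


def toy(x):
    """Hypothetical neuron: fires iff 2*x0 + x1 + x2 >= 2."""
    return int(2 * x[0] + x[1] + x[2] >= 2)


test_set = [(1, 1, 1), (0, 0, 0), (0, 1, 1)]
ranking = rank_by_robustness(toy, test_set)
print(ranking)                        # [(1, (0, 0, 0)), (1, (0, 1, 1)), (2, (1, 1, 1))]
print(minimal_flip(toy, (0, 0, 0)))   # (0,)
```

The first entries of the ranking are the least robust instances, and `minimal_flip` returns the pixels to flip for each of them. The last entries are the most robust instances.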
In this section, we empirically evaluate our approach for compiling a neural network into an SDD. First, we take a given neural network trained from data and encode it as an NNF, as described in Section 3. Next, we compile this NNF into an SDD, using the simple circuit compiler described in Section 4.¹⁹,²⁰ We consider binary CNNs learned from a binarized MNIST dataset, downsampled to 14×14 from 28×28. Experiments were run on a Linux system with a 2.40GHz Intel i7-7560U CPU and access to 16GB RAM. The CNN has 196 inputs, a convolutional filter applied with a fixed stride that leads to a layer of hidden nodes, and one fully-connected layer. We minimized a sigmoid cross-entropy loss with an Adam optimizer. To speed up training and testing, we only used 1,024 examples from each set; note that the algorithm to compile a neural network to SDD does not need to see the training set.
¹⁹ We used a few additional optimizations in our experiments. First, we labeled each NNF node by its maximum distance from the output. We then compiled the NNF nodes by depth, deepest first. This helps control the scope of the sub-functions that need to be maintained. Next, we invoked SDD minimization relatively aggressively, any time the number of live SDD nodes exceeded a fixed limit. Finally, the initial vtree used for compilation was taken from the final vtree of a previously compiled neural network.
²⁰ The circuit compiler is available open-source at https://github.com/art-ai/nnf_compiler
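One of the optimizations mentioned above labels each NNF node by its maximum distance from the output and then compiles the nodes deepest first. That ordering step can be sketched as follows. The dict-of-children NNF encoding and the node names are hypothetical, and the SDD operations performed at each node are omitted.

```python
def depth_order(children, root):
    """Label nodes by their longest distance from root; return (depth, order).

    `order` lists nodes deepest first, so every child precedes all of its parents.
    """
    post, seen = [], set()

    def visit(u):                     # depth-first search recording postorder
        seen.add(u)
        for c in children.get(u, ()):
            if c not in seen:
                visit(c)
        post.append(u)

    visit(root)
    depth = {u: 0 for u in post}
    for u in reversed(post):          # reverse postorder: parents before children
        for c in children.get(u, ()):
            depth[c] = max(depth[c], depth[u] + 1)
    order = sorted(post, key=lambda u: -depth[u])
    return depth, order


# Hypothetical NNF: gate names map to their inputs; x1 and x2 are literals.
nnf = {"out": ["a", "b"], "a": ["x1", "b"], "b": ["x2"]}
depth, order = depth_order(nnf, "out")
print(depth)  # {'x1': 2, 'x2': 3, 'b': 2, 'a': 1, 'out': 0}
print(order)  # ['x2', 'x1', 'b', 'a', 'out']
```

If v is a child of u, then depth(v) ≥ depth(u) + 1. Deepest-first is therefore always a valid bottom-up compilation order, even for shared nodes such as `b`.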
Table 1 summarizes our results across all pairs of digits. First, we observe that the neural networks we trained generally had high accuracy. Next, we see that across all pairs, the node count and edge count of the compiled SDDs were relatively small! This suggests that the neural networks trained from this dataset are not particularly complex. Further, the compilation times were within 1 minute for all pairs! This is very surprising when compared to the preliminary results of [8], which could only compile neural networks learned from smaller images than the 14×14 images that we used here. The main difference here is in the use of a pseudo-polynomial time algorithm for compiling neurons into OBDDs (here, we kept two digits of precision from the network parameters). Finally, we report the average and maximum robustness of instances in the test set. We remark that these numbers are relatively low: on average, it suffices to flip a few pixels to flip the output of the neural network. This low robustness suggests that these neural networks did not truly learn the concept of a digit, but instead found some pattern in the training data that allowed them to achieve high accuracy.
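The pseudo-polynomial compilation step works with integer weights. We read "two digits of precision" as two decimal places, which is our assumption. Under that reading, a neuron's parameters could be scaled by 100 and rounded, as sketched below with a hypothetical neuron. The sum W of the absolute integer weights bounds how many distinct thresholds the compiler must track. The agreement check shows how coarser precision can change the neuron's Boolean function.

```python
from fractions import Fraction
from itertools import product


def quantize(weights, threshold, digits=2):
    """Scale by 10**digits and round to integers (Python rounds halves to even)."""
    scale = 10 ** digits
    return [round(w * scale) for w in weights], round(threshold * scale)


def fires(weights, threshold, x):
    """Step-activation neuron: 1 iff sum_i w_i * x_i >= threshold."""
    return int(sum(w * b for w, b in zip(weights, x)) >= threshold)


def agreement(weights, threshold, digits=2):
    """Fraction of the 2^n binary inputs on which the quantized neuron agrees."""
    q_weights, q_threshold = quantize(weights, threshold, digits)
    inputs = list(product((0, 1), repeat=len(weights)))
    same = sum(fires(weights, threshold, x) == fires(q_weights, q_threshold, x)
               for x in inputs)
    return Fraction(same, len(inputs))


w, t = [0.314, -1.27, 0.5], 0.2                     # hypothetical real-valued neuron
q_w, q_t = quantize(w, t)
print(q_w, q_t, sum(abs(v) for v in q_w))           # [31, -127, 50] 20 208
print(agreement(w, t), agreement(w, t, digits=0))   # 1 7/8
```

More digits give a more faithful integer neuron but a larger W, and W drives the cost of compiling the neuron.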
In this paper, we proposed a knowledge compilation approach for explaining and verifying the behavior of a neural network. We considered in particular neural networks with 0/1 inputs and step activation functions. Such networks have neurons that correspond to Boolean functions. The network itself also corresponds to a Boolean function, which represents how a neural network labels an input feature vector with a class. We showed how to compile the Boolean function of each neuron and the network itself into a tractable circuit, and into a Sentential Decision Diagram (SDD) in particular. Further, we developed new queries and algorithms for analyzing the robustness of a Boolean function. In a case study, we explained and analyzed the robustness of binary convolutional neural networks (binary CNNs) for classifying handwritten digits. Finally, in our experiments, we showed that it is indeed feasible to compile neural networks with over a hundred features into SDDs.
Appendix A Proof of Theorem 2
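Consider a neuron with inputs $X_1, \ldots, X_n$ and integer weights $w_1, \ldots, w_n$ that outputs 1 iff $\sum_{j=1}^{n} w_j x_j \geq T$, and let $W = \sum_{j=1}^{n} |w_j|$. In this proof, we describe a sub-classifier over inputs $X_i, \ldots, X_n$ by its threshold $t$, meaning that it outputs 1 iff $\sum_{j=i}^{n} w_j x_j + t \geq 0$; the original neuron has threshold $t = -T$. Setting the inputs $X_1, \ldots, X_{i-1}$ results in a smaller sub-classifier (or sub-neuron) over inputs $X_i, \ldots, X_n$. No matter how we set the inputs $X_1, \ldots, X_{i-1}$, the resulting sub-classifier is identical (has the same weights) except for the threshold being used. Two different settings of these inputs may lead to identical sub-classifiers with the same threshold. Say we set variables from $X_1$ to $X_{i-1}$. The resulting threshold is $-T + \sum_{j<i} w_j x_j$, and since $\sum_{j<i} w_j x_j$ lies in $[-W, W]$, there are at most $2W+1$ possible valid thresholds, so there are at most $n(2W+1)$ possible sub-classifiers that we can see while setting variables.
Consider an $n \times (2W+1)$ matrix $M$, with rows indexed by $i \in \{1, \ldots, n\}$ and columns by $t \in [-T-W, -T+W]$, where cell $M[i, t]$ is associated with the sub-classifier where variable $X_i$ is about to be set, and where $t$ is the threshold being used. If $X_i$ is set to 1, we obtain the sub-classifier at $M[i+1, t + w_i]$, where $w_i$ is the (integer) weight of feature $X_i$. If $X_i$ is set to 0, we obtain the sub-classifier at $M[i+1, t]$. Every cell reachable from $M[1, -T]$ has both children inside the matrix. Each cell thus represents an OBDD node for the corresponding sub-problem, whose hi- and lo-children are known. We add an $(n+1)$-th layer where every sub-classifier with threshold below 0 is the 0-sink and where every sub-classifier with threshold at or above 0 is the 1-sink. The root of the original neuron's OBDD is then found at $M[1, -T]$, which we can extract and reduce if needed.
The size of matrix $M$ bounds the size of the OBDD to $n(2W+1) = O(nW)$ nodes. Further, it takes constant time to populate each entry, and hence $O(nW)$ time to construct the OBDD.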
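The matrix construction can be sketched in code. The sketch assumes a neuron that outputs 1 iff Σ w_i x_i ≥ T with integer weights and threshold, and it indexes variables from 0. It uses a hypothetical OBDD encoding: node ids map to `(var, lo, hi)` triples, and ids 0 and 1 are the sinks.

Cells `(i, t)` are created lazily, only when reachable from the root. A unique table merges identical nodes and skips redundant tests, which performs the reduction step as the diagram is built. This illustrates the counting argument above; it is not the authors' compiler.

```python
from itertools import product


def neuron_to_obdd(weights, threshold):
    """Compile the neuron  sum_i w_i x_i >= threshold  (integers) into a reduced OBDD.

    Variable order is x_0 < x_1 < ... < x_{n-1}. Returns (nodes, root), where nodes
    maps id -> (var, lo_id, hi_id) and ids 0 / 1 are the False / True sinks.
    """
    n = len(weights)
    nodes, unique, memo = {}, {}, {}

    def make(var, lo, hi):
        if lo == hi:                 # redundant test: both branches agree
            return lo
        key = (var, lo, hi)
        if key not in unique:        # share structurally identical nodes
            unique[key] = len(nodes) + 2
            nodes[unique[key]] = key
        return unique[key]

    def cell(i, t):
        # Matrix cell: x_i is about to be set and t = (weights set to 1 so far) - threshold.
        if i == n:                   # the (n+1)-th layer of sinks
            return 1 if t >= 0 else 0
        if (i, t) not in memo:
            memo[(i, t)] = make(i, cell(i + 1, t), cell(i + 1, t + weights[i]))
        return memo[(i, t)]

    return nodes, cell(0, -threshold)


def evaluate(nodes, root, x):
    """Follow one path from the root to a sink."""
    u = root
    while u not in (0, 1):
        var, lo, hi = nodes[u]
        u = hi if x[var] else lo
    return u


w, T = [3, -2, 5, 1], 4
nodes, root = neuron_to_obdd(w, T)
assert all(evaluate(nodes, root, x) == int(sum(a * b for a, b in zip(w, x)) >= T)
           for x in product((0, 1), repeat=len(w)))
```

The memo table holds at most one entry per reachable cell, which mirrors the $n(2W+1)$ bound on the matrix.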
- (2010) How to explain individual classification decisions. Journal of Machine Learning Research 11, pp. 1803–1831.
- (2019) Quantitative verification of neural networks and its security applications. In Proceedings of the 26th ACM Conference on Computer and Communications Security (CCS), pp. 1249–1264.
- (2016) SDDs are exponentially more succinct than OBDDs. In AAAI, pp. 929–935.
- (1986) Graph-based algorithms for Boolean function manipulation. IEEE Transactions on Computers C-35, pp. 677–691.
- (1997) A survey on knowledge compilation. AI Commun. 10 (3-4), pp. 137–150.
- (2003) Reasoning about Bayesian network classifiers. In Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI), pp. 107–115.
- (2013) Dynamic minimization of sentential decision diagrams. In Proceedings of the 27th Conference on Artificial Intelligence (AAAI).
- (2019) Compiling neural networks into tractable Boolean circuits. In AAAI Spring Symposium on Verification of Neural Networks (VNN).
- (2020) Interpretability of Bayesian network classifiers: OBDD approximation and polynomial threshold functions. In International Symposium on Artificial Intelligence and Mathematics (ISAIM).
- (2020) On the reasons behind decisions. In Proceedings of the 24th European Conference on Artificial Intelligence (ECAI).
- (2002) A knowledge compilation map. JAIR 17, pp. 229–264.
- (2001) Decomposable negation normal form. Journal of the ACM 48 (4), pp. 608–647.
- (2001) On the tractable counting of theory models and its application to truth maintenance and belief revision. Journal of Applied Non-Classical Logics 11 (1-2), pp. 11–34.
- (2003) A differential approach to inference in Bayesian networks. J. ACM 50 (3), pp. 280–305.
- (2011) SDD: a new canonical representation of propositional knowledge bases. In Proceedings of IJCAI, pp. 819–826.
- (2014) Tractable knowledge representation formalisms. In Tractability: Practical Approaches to Hard Problems, pp. 141–172.
- (1997) Boosting and naive Bayesian learning.
- (2014) Explaining and harnessing adversarial examples. CoRR abs/1412.6572.
- (2016) Binarized neural networks. In Advances in Neural Information Processing Systems (NIPS), pp. 4107–4115.
- (1994) A database for handwritten text recognition research. IEEE Transactions on Pattern Analysis and Machine Intelligence 16 (5), pp. 550–554.
- (2019) Abduction-based explanations for machine learning models. In Proceedings of the Thirty-Third Conference on Artificial Intelligence (AAAI), pp. 1511–1519.
- (2019) On relating explanations and adversarial examples. In Advances in Neural Information Processing Systems 32 (NeurIPS), pp. 15857–15867.
- (2017) Reluplex: an efficient SMT solver for verifying deep neural networks. In Computer Aided Verification (CAV), pp. 97–117.
- (2018) Automated verification of neural networks: advances, challenges and perspectives. CoRR abs/1805.09938.
- (2018) The mythos of model interpretability. Commun. ACM 61 (10), pp. 36–43.
- (1998) Algorithms and data structures in VLSI design: OBDD - foundations and applications. Springer.
- (2016) DeepFool: a simple and accurate method to fool deep neural networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2574–2582.
- (2018) Verifying properties of binarized deep neural networks. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI).
- (2016) Solving PP^PP-complete problems using knowledge compilation. In Proceedings of the 15th International Conference on Principles of Knowledge Representation and Reasoning (KR), pp. 94–103.
- (2016) XNOR-Net: ImageNet classification using binary convolutional neural networks. In Proceedings of the 14th European Conference on Computer Vision (ECCV), pp. 525–542.
- (2016) “Why should I trust you?”: explaining the predictions of any classifier. In Knowledge Discovery and Data Mining (KDD).
- (2016) Nothing else matters: model-agnostic explanations by identifying prediction invariance. In NIPS Workshop on Interpretable Machine Learning in Complex Systems.
- (2018) Anchors: high-precision model-agnostic explanations. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI).
- (1996) Knowledge compilation and theory approximation. J. ACM 43 (2), pp. 193–224.
- (2018) A symbolic approach to explaining Bayesian network classifiers. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI).
- (2018) Formal verification of Bayesian network classifiers. In Proceedings of the 9th International Conference on Probabilistic Graphical Models (PGM).
- (2019) Verifying binarized neural networks by Angluin-style learning. In SAT.
- (2013) Intriguing properties of neural networks. CoRR abs/1312.6199.
- (2015) On the role of canonicity in knowledge compilation. In AAAI.
- (2000) Branching programs and binary decision diagrams. SIAM.
- (2012) Basing decisions on sentences in decision diagrams. In AAAI, pp. 842–849.