Deep Neural Networks (DNNs) have been applied to a variety of domains and achieved great success. This reliance on DNNs' decisions makes the reliability of their behavior highly important. Recent research has shown that the safety of DNNs is threatened by their susceptibility to human-imperceptible adversarial perturbations [14, 4, 1].
To explore the adversarial robustness of neural networks, two directions have been pursued: crafting adversarial examples and automatic verification. Given an input sample, adversarial example generation techniques [13, 2, 10, 15, 6] cannot guarantee that no adversarial example exists around the given input when they fail to generate one for it. The efforts in automatic verification mainly focus on guaranteeing local robustness [9, 5, 16, 11, 3], i.e., the robustness of an input's neighborhood. These verification approaches can provide a rigorous local robustness proof if no adversarial example exists in a local region. However, local robustness only takes a small part of the input space into account, and thus cannot guarantee the reliability of the whole network for every possible input.
Some attempts have been made towards the verification and evaluation of global robustness, i.e., determining whether any adversarial example exists in the input space of a network [12, 6]. Though the SMT/SAT-based method in [6] takes global robustness into account, its definition of global robustness cannot be satisfied by inputs near the classification boundary; in other words, no network can satisfy this definition. The technique developed in [12] evaluates the local robustness of each sample in a test dataset and treats the expected value of the evaluation results as an indicator of "global robustness"; it can be considered as finding the expected maximum safe radius over the test dataset. Thus, the selection of the test dataset directly influences the estimation, and global robustness cannot be formally guaranteed in general. We can easily identify two stumbling blocks on the path to global robustness verification: the complex activation patterns and the large input space. It is computationally prohibitive to analyze all possible activation patterns or to traverse the input space to guarantee global robustness. Thus, existing testing and verification techniques cannot handle global robustness verification for DNNs.
In this paper, we develop a feasible global verification framework with three components: 1) a novel rule-based "back-propagation" which maps classification rules from the output space to the input space to find which input region is responsible for the corresponding class assignment (note that this "back-propagation" is entirely different from the typical use of back-propagation for evaluating the gradient with respect to the weight parameters in DNNs); 2) a new network design, Sliding Door Network (SDN), that enables feasible rule-based "back-propagation"; 3) a region-based global robustness verification (RGRV) approach that finds "adversarial regions". In particular, we address the two stumbling blocks by two means. Firstly, we design a new activation function, Sliding Door Activation (SDA), with which the number of possible activation patterns is dramatically reduced to circumvent the complexity issue. Secondly, instead of treating a single input as the foundational "atom" of global robustness analysis, we cluster the input space into multiple classification regions to address the input space explosion challenge. To the best of our knowledge, this is the first work that achieves formal global robustness verification with only a slight drop of classification accuracy compared with classic DNNs. We evaluate the effectiveness of our framework on the MNIST dataset. We also design a synthetic case study to show the feasibility of our global verification method.
The rest of this paper is structured as follows. We introduce the rule-based “back-propagation” in Section 2. The network design and the corresponding rule-based back-propagation method are described in Section 3. Section 4 presents the RGRV approach. We evaluate the usefulness of SDN and effectiveness of RGRV in Section 5. Section 6 summarizes our work.
For the convenience of presentation, each traditional layer ( ) is treated as two virtual layers: a pre-activation layer and an activation layer, denoted by  and , respectively. An example is shown in Figure 1, where the activation layers are  and the pre-activation ones are . The -th neurons in  and  are denoted as  and , respectively; the weights and the corresponding biases connecting  and  are represented as  and ; the activation function of  is , such as ReLU or softmax.
2 Rule-based back-propagation for DNNs
Due to the colossal input space of DNNs, exhausting all possible inputs with traditional testing methods is infeasible. Thus, we develop a family of classification rules to divide the input space into several regions, which simplify the global robustness verification significantly. The classification rules in the input space are obtained by the proposed rule-based back-propagation, as elaborated below. Before introducing the classification rules in detail, we first present a warm-up example.
Consider the one-layer network in Figure 2(a); a classification rule in the output space is (), which represents the blue region in the output space shown in Figure 2(b). The back-propagation we propose aims to map classification rules to the input space. For example, the activation pattern "all neurons are active" means that (). As (), () is equivalent to () and () is equivalent to (). Thus the mapping result of the output-space rule () to the input space is (). If we change the activation pattern to " is active and  is inactive", the equivalent condition of this activation pattern is (), because ReLU assigns 0 to . Thus () is equivalent to (); () is equivalent to (); the mapping result is (). Obviously, the activation pattern determines the mapping result. The mapping result () represents the blue region in the input space shown in Figure 2(c). We call this blue region a classification region, indicating its responsibility for the class assignment.
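To make the warm-up concrete, the following sketch maps an output-space rule through a one-layer ReLU network under a fixed activation pattern. The weight matrix `W`, bias `b`, and all names are illustrative assumptions, not the values in the paper's figure:

```python
import numpy as np

# Hypothetical one-layer network y = ReLU(W @ x + b).
W = np.array([[1.0, 2.0],
              [3.0, -1.0]])
b = np.array([0.5, -0.5])

def backmap_rule(c, pattern):
    """Map the output-space rule  c @ y > 0  to an input-space rule
    coef @ x + off > 0  under a fixed ReLU activation pattern.
    `pattern` is a boolean vector: True where a neuron is active.
    Inactive neurons output 0, so their rows are dropped."""
    A = np.where(pattern[:, None], W, 0.0)   # zero out inactive rows
    bb = np.where(pattern, b, 0.0)
    return c @ A, c @ bb

# Rule "output 1 exceeds output 2", i.e. y1 - y2 > 0, all neurons active.
coef, off = backmap_rule(np.array([1.0, -1.0]), np.array([True, True]))
# coef = [1-3, 2-(-1)] = [-2, 3], off = 0.5 - (-0.5) = 1.0,
# so the mapped rule is -2*x1 + 3*x2 + 1 > 0.
```

The full mapping additionally conjoins the pattern's own constraints (each active neuron's pre-activation > 0, each inactive one's < 0), which is what carves out the classification region.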
With the intuition from the warm-up example, we now elaborate the rule-based back-propagation layer by layer for deep neural networks.
There are many inequalities, recorded as  in . These inequalities make up the disjunctive normal form , which describes how the neural network classifies the inputs. For simplicity, if we select a  in , this  will be denoted as . We can easily provide the classification rules in the output space. For example, "the output belongs to class " is , where the s are the output values and  is the number of classes. To obtain the classification rules in the input space, we take a typical DNN (whose hidden layers use ReLU and whose output layer uses softmax) as an example and propose a back-propagating function. During the back-propagation from  to , the s should be substituted while the s and s are retained, so we apply our function to each  instead of to the conjunctive normal form. The recursive call of this function back-propagates the classification rules layer by layer to the input space of the network. Since the output layer and the hidden layers must be treated in different ways, we divide the function into two parts: the output-layer part and the hidden-layer part.
Comparison rules like  can be directly mapped to the corresponding pre-activation layer because the activation function of the output layer, e.g., softmax, is order-preserving. We replace every variable in the inequalities with the corresponding polynomial to obtain the classification rule in . Thus the output-layer part is the function MAP-OUT, and the mapping result of  is , i.e., the classification rules in .
The hidden-layer part is a function MAP-HIDDEN. Since each linear inequality in  can be simplified into the form , we select an inequality as the input of MAP-HIDDEN to show how it works. The hidden layers cannot be processed in the same way as the output layer because of the activation patterns, which determine the mapping result. We denote the set of active neurons' indexes by  and use  to represent the activation pattern. For simplicity, we record the mapping result of  under  as MAP-FIX, where  is the activation pattern of . MAP-FIX is the conjunction of some classification rules in .
The function MAP-HIDDEN is shown as follows, where  denotes all activation patterns of :
The mapping result of MAP-HIDDEN is the disjunction of all the classification rules in . As each neuron has two activation states, there are 2^n activation patterns in , where n is the number of neurons in . The time cost of the whole back-propagation is therefore exponential in the network width, which makes the above mapping infeasible in practice.
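The exponential blow-up can be seen in a direct sketch of MAP-HIDDEN for one hidden layer. Here `W`, `b`, and the function names are illustrative assumptions:

```python
from itertools import product

import numpy as np

W = np.array([[1.0, 2.0],
              [3.0, -1.0]])
b = np.array([0.5, -0.5])

def map_hidden(c, d):
    """Back-map the rule  c @ a + d > 0  (a = ReLU output of this layer)
    by enumerating ALL activation patterns. Each pattern yields one
    conjunction of linear rules over x; the result is their disjunction."""
    n = W.shape[0]
    clauses = []
    for bits in product([False, True], repeat=n):
        p = np.array(bits)
        A = np.where(p[:, None], W, 0.0)
        conj = [(c @ A, c @ np.where(p, b, 0.0) + d)]   # the mapped rule
        for i in range(n):                              # pattern constraints
            s = 1.0 if p[i] else -1.0                   # s*(W_i @ x + b_i) > 0
            conj.append((s * W[i], s * b[i]))
        clauses.append(conj)
    return clauses

dnf = map_hidden(np.array([1.0, -1.0]), 0.0)
# A layer of n neurons yields 2**n clauses; stacking k such layers
# multiplies the counts, so the cost grows exponentially with width.
```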
3 Sliding Door Network for Feasible Back-propagation
To handle the complexity issue, we present a novel network design, SDN, and the corresponding rule-based back-propagation method. SDN reduces the size of  by grouping the neurons in each layer, overcoming the infeasibility of back-propagating classification rules for DNNs.
3.1 Sliding Door Network
Compared with typical DNNs, SDN has two different components: a novel activation function SDA and the loss function design for supporting SDA.
Sliding Door Activation. SDA takes a pre-activation layer and divides its neurons evenly into several groups. For example, the layer in Figure 3 with 10 neurons is divided into 5 groups, represented as . These groups are classified by SDA into three categories: active groups in which all neurons are positive (e.g.,  and  in Figure 3), inactive groups in which all neurons are negative (e.g.,  in Figure 3), and trivial groups containing both positive and negative neurons (e.g.,  and  in Figure 3).
To reduce the complexity, we select the first active (inactive) group as the active (inactive) door of each pre-activation layer. In Figure 3, for example,  and  are the active door and the inactive door, respectively. Based on the assigned doors, we define SDA as:
To increase the network expressiveness, SDA strengthens the active door by  and assigns  to the inactive door's neurons. Other groups are passed to the corresponding activation layer directly. During training, for each pre-activation layer, the positions of the two doors may change according to the states of the groups, behaving like a sliding door, hence the name of our activation function. Figure 4 shows the entire network architecture, which replaces the ReLU in classic DNNs with the proposed SDA for each layer.
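A minimal sketch of SDA follows. The strengthening factor `alpha` and the zero assignment to the inactive door are assumptions, since the exact constants appear in the paper's elided formula:

```python
import numpy as np

def sda(z, num_groups, alpha=2.0):
    """Sketch of Sliding Door Activation. Splits the pre-activation
    vector z into equal groups, picks the first all-positive group as
    the active door (scaled by alpha) and the first all-negative group
    as the inactive door (zeroed); trivial groups pass through."""
    groups = np.split(z.copy(), num_groups)   # views into one copy of z
    active_found = inactive_found = False
    for g in groups:
        if not active_found and np.all(g > 0):
            g *= alpha                # strengthen the active door
            active_found = True
        elif not inactive_found and np.all(g < 0):
            g[:] = 0.0                # suppress the inactive door
            inactive_found = True
    return np.concatenate(groups)

z = np.array([1.0, 2.0, -1.0, -2.0, 3.0, -4.0])
out = sda(z, num_groups=3)           # groups: [1,2], [-1,-2], [3,-4]
# [1,2] -> active door, scaled by 2; [-1,-2] -> inactive door, zeroed;
# [3,-4] is trivial and passes through unchanged.
```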
Loss function design. If a pre-activation layer cannot provide an active or inactive door, the expressiveness of SDN is weakened. To avoid this issue, we design a regularization term to penalize the absence of either door. If the active (inactive) door does not appear in , we find the group  ( ) in  with the most active (inactive) neurons, and adjust the weights to make the negative (positive) neurons in  ( ) tend to be positive (negative) so as to create active (inactive) groups. Thus, besides the typical data-fitting loss, we add a regularization term to encourage the emergence of such groups, defined as:
where denotes all the weights and biases to be trained, and is the user-given penalty parameter.
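A hinge-style sketch of the door-absence penalty for one pre-activation layer is given below. The exact regularizer in the paper is elided, so this particular form and all names are assumptions:

```python
import numpy as np

def door_penalty(z, num_groups):
    """Sketch of the door-absence penalty (an assumed hinge form).
    If no all-positive (all-negative) group exists, the group with the
    most positive (negative) neurons is pushed toward becoming an
    active (inactive) door by penalizing its wrong-signed neurons."""
    groups = np.split(z, num_groups)
    penalty = 0.0
    if not any(np.all(g > 0) for g in groups):
        g = max(groups, key=lambda g: np.sum(g > 0))   # most-active group
        penalty += np.sum(np.maximum(-g, 0.0))         # pull negatives up
    if not any(np.all(g < 0) for g in groups):
        g = max(groups, key=lambda g: np.sum(g < 0))   # most-inactive group
        penalty += np.sum(np.maximum(g, 0.0))          # pull positives down
    return penalty

# Layer with groups [1,2], [-1,3], [0.5,-2]: an active door exists,
# but no inactive door, so only the second penalty fires (on [-1,3]).
z = np.array([1.0, 2.0, -1.0, 3.0, 0.5, -2.0])
p = door_penalty(z, 3)
```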
3.2 Rule-based Back-propagation for SDN
As MAP-OUT can be reused for the back-propagation of SDN's output layer, we focus on the back-propagation between hidden layers in this section. The construction process of MAP-HIDDEN for SDN is as follows.
We denote the set of neurons in the active door of layer  as , the set of neurons in the inactive door as , and the remaining neurons as . Considering the condition that a rule in  is , and that the activation pattern  is fixed, where , we record the mapping result of  under  as MAP-FIX with three components:
where we replace the neurons in  belonging to  with the corresponding polynomials (i.e., multiplied by  due to the SDA activation), replace the neurons in  with the corresponding polynomials, and remove the neurons in , to obtain .
It describes the rules that "all the corresponding pre-activation neurons of  are greater than 0", and we replace these pre-activation neurons with the corresponding polynomials.
It describes the rules that "all the corresponding pre-activation neurons in  are less than 0", and we replace these pre-activation neurons with the corresponding polynomials. The function MAP-FIX is shown as follows:
Taking all the activation patterns into account, we can obtain the function MAP-HIDDEN.
The combination of MAP-OUT and MAP-HIDDEN is the complete back-propagating function :
Thus the mapping result of all the rules in  is , which is the collection of rules for . Each  equals , where  is the number of groups in . The SDN has  activation patterns, far fewer than the number of a DNN's activation patterns, and its rule-based back-propagation becomes feasible.
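The reduction can be illustrated by counting patterns. The SDN count below assumes each layer's pattern is fully determined by its pair of door indices, giving at most g*(g-1) choices for g groups; the paper's exact count is elided, so this bound is an assumption:

```python
from math import prod

def relu_patterns(neurons_per_layer):
    """Activation patterns of a ReLU DNN: every neuron is independently
    active or inactive, so the count is the product of 2**n per layer."""
    return prod(2 ** n for n in neurons_per_layer)

def sdn_patterns(groups_per_layer):
    """Assumed upper bound on SDN activation patterns: each layer's
    pattern is determined by the (active door, inactive door) pair,
    i.e. at most g * (g - 1) choices for g groups."""
    return prod(g * (g - 1) for g in groups_per_layer)

# Two hidden layers of 64 and 48 neurons, grouped into 16 and 12 groups.
relu = relu_patterns([64, 48])   # 2**112 patterns
sdn = sdn_patterns([16, 12])     # at most 240 * 132 = 31,680 patterns
```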
maps the explicit rules but ignores the implicit rules. Implicit rules are the constraints from trivial groups guaranteeing that, compared with the other active (inactive) groups, the active (inactive) door has the minimal index. For example, in Figure 3 the implicit rules are:
The combination of explicit rules and implicit rules can be organized into the form . Each  represents a classification region like the blue region in Figure 2(c). Selecting a  in , a boundary of 's classification region is . We will show in Theorem 1 that the explicit rules are sufficient for global verification.
4 Region-based Global Robustness Verification
In this section, we define and address the global robustness verification problem via a region-based global robustness verification (RGRV) approach. Firstly, we provide the definition of global robustness:
Definition 1 (Global robustness).
Given a network N, if there is no adversarial example in its input space, N is a globally robust network.
An adversarial example exists in two types of regions: 1) a region which is isolated, and 2) a region which is connected to the correctly classified region. Taking Figure 5 as an example, it shows a binary classification task where the black dashed line is the oracle decision boundary of classes  and , and the orange solid line is the decision boundary determined by the network.
 and  are adversarial regions:  is a small-size isolated connected component and  is a protruding region connected to the correctly classified region. The adversarial examples belonging to  but classified as  tend to exist in  and . Before presenting the definitions of protruding regions and small-size isolated connected components, we first give a formal definition of the input-space classification graph.
Definition 2 (Classification graph).
Given a result of backward-mapping in the form of , we can build a classification graph as a tuple where
. In other words, each classification region can be defined as a vertex .
. Given two vertexes and , the corresponding classification regions in the input space are and . Two vertexes are adjacent iff their classification regions are adjacent in high-dimensional space, and formally,
Definition 3 (Limiting ball).
Given a set of vertexes , the stitching of their regions is , whose center of gravity is . The limiting ball of  can be defined as a ball whose center is  and whose radius is .
With the definitions of the classification graph and the limiting ball, we now formally define the two types of regions in which adversarial examples exist.
Definition 4 (Small-size isolated connected component).
Given a connected component  in the classification graph and a  ( ), let  be the limiting ball of 's vertexes. If 's radius is smaller than ,  is a small-size isolated connected component.
Definition 5 (Protruding regions).
Given a vertex  and a  ( ), let  be 's limiting ball. The classification region of  is . All points in  with the same class as  constitute a set . The volumes of  and  are recorded as  and , respectively. If  is not a small-size isolated connected component and ,  is a protruding region.
The first step in finding the "adversarial regions" is to construct the adjacency relationship between vertexes, i.e., to build  in the classification graph. The construction can be split into two phases: 1) traverse the vertexes in the classification graph and treat  ( ) as the potential adjacent vertexes; 2) find the common boundary shared by  and  if they are adjacent, and provide a proof that they are not adjacent otherwise. Intuitively, a common boundary between  and  means that they stick together on the boundary, formally defined as follows.
Definition 6 (Common boundary).
Given two vertexes and in the classification graph, the corresponding classification regions are and . A boundary () which belongs to and is defined as a common boundary iff
where is the set of points on , is the set of points on other boundaries of and , and .
As the classification graph is an undirected graph, the first phase only explores the edges  to find the potential adjacent vertexes . The following Theorem 1 implies that by traversing the  of , we can find all the  ( ) which share a common boundary with , i.e., all the adjacent vertexes.
Theorem 1. Given  and , there is a common boundary belonging to the  of  shared by them iff  and  are adjacent.
Proof: see appendix.
Finally, the global robustness verification can be achieved by the following steps:
Build the classification graph. We obtain the vertexes from the back-propagation result. Each vertex  has some boundaries. For each boundary  of , we select some points on it randomly and sample in the tiny neighborhood of these points. By feeding the samples into the SDN, we can find their activation patterns. For example, if a sample's activation pattern represents , then since this sample is in the tiny neighborhood of points on ,  belongs to . Thus  and  share the boundary , and we add the edge  to the edge set .
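The edge-construction step above can be sketched as follows. All names are illustrative; `pattern_of` stands for querying the network's activation pattern, which identifies the region a point lies in:

```python
import numpy as np

def build_edges(boundary_points, pattern_of, eps=1e-3, k=32, seed=0):
    """Sketch of the edge-construction step. For each sampled boundary
    point of region u, perturb it slightly, query the region of the
    perturbed sample, and connect u to any different region found."""
    rng = np.random.default_rng(seed)
    edges = set()
    for u, pts in boundary_points.items():   # region id -> boundary samples
        for x in pts:
            for _ in range(k):
                x_near = x + eps * rng.standard_normal(x.shape)
                v = pattern_of(x_near)       # region id of the neighbor
                if v != u:
                    edges.add(frozenset((u, v)))
    return edges

# Toy 1-D example: regions u = {x < 0} and v = {x >= 0}, boundary at 0.
pattern = lambda x: 'u' if x[0] < 0 else 'v'
edges = build_edges({'u': [np.zeros(1)]}, pattern)
# perturbations of the boundary point land on both sides, so u-v is found
```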
Find the limiting ball of classification regions and connected components. For a classification region , we can obtain a rough upper and lower bound for each dimension. Taking the inequality  as an example, if  and , the lower bound of  is . All these bounds form a "box" containing . We take  samples ( ) in this "box" and select the samples in ; their mean  is the estimated center. Denoting by  the distance between  and , the estimated radius is . Moreover, we can calculate the volume of the "box"; if there are  samples in , then . If a connected component consists of classification regions s, the estimated center is  and  is the estimated radius, where the s are the samples in .
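The box-sampling estimation can be sketched as a small Monte-Carlo routine. Names are illustrative; in practice the membership test `in_region` would come from the region's inequalities:

```python
import numpy as np

def limiting_ball(in_region, lo, hi, m=10000, seed=0):
    """Monte-Carlo sketch of the limiting-ball estimation. Sample m
    points in the bounding box [lo, hi], keep those inside the region,
    and estimate the center of gravity, the radius (max distance to the
    center), and the volume (box volume times the hit fraction)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    samples = rng.uniform(lo, hi, size=(m, len(lo)))
    hits = samples[np.array([in_region(x) for x in samples])]
    center = hits.mean(axis=0)
    radius = np.max(np.linalg.norm(hits - center, axis=1))
    volume = np.prod(hi - lo) * len(hits) / m
    return center, radius, volume

# Toy region: the unit square inside the box [0,2] x [0,2] (true volume 1,
# true center (0.5, 0.5), true radius sqrt(0.5) ~ 0.707).
inside = lambda x: bool(np.all((0 <= x) & (x <= 1)))
c, r, v = limiting_ball(inside, [0, 0], [2, 2])
```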
Find small-size isolated connected components and protruding regions. Given a connected component, by comparing the radius of its limiting ball with the  given by the user, we can determine whether it is a small-size isolated connected component. For a classification region belonging to class , we take  samples in the limiting ball. If  of them belong to , we calculate  and compare it with the user-given . We can then determine whether this region is a protruding region.
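The two checks can be sketched as follows. The threshold names (`r_max`, `tau`) and the exact form of the volume-ratio test are assumptions, since the paper's thresholds are elided:

```python
import numpy as np

def is_small_isolated(radius, r_max):
    """A connected component is small-size isolated when its
    limiting-ball radius is below the user-given threshold."""
    return radius < r_max

def is_protruding(same_class, center, radius, tau, m=5000, seed=0):
    """Sketch of the protruding-region test. Sample the limiting ball
    uniformly; if the fraction of same-class points in the ball falls
    below `tau`, the region sticks out of its class."""
    rng = np.random.default_rng(seed)
    d = len(center)
    g = rng.standard_normal((m, d))
    g /= np.linalg.norm(g, axis=1, keepdims=True)        # unit sphere
    rad = radius * rng.uniform(0, 1, (m, 1)) ** (1.0 / d)  # radial part
    pts = np.asarray(center) + g * rad                   # uniform in ball
    frac = np.mean([same_class(x) for x in pts])
    return frac < tau

# Toy: class region {x0 < 0}; a ball centered at (1, 0) with radius 1
# contains (almost) no same-class points, so the test fires.
cls = lambda x: x[0] < 0
```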
5 Evaluation
The evaluation of our work concentrates on three aspects: 1) the utility of SDN on classification tasks, 2) generating adversarial examples, and 3) the feasibility of global verification. In the first part, we evaluate our method on the MNIST dataset with typical DNNs as the baseline. In the second part, we show the adversarial examples generated from adversarial regions. In the third part, we design a synthetic case study to show the effectiveness of RGRV.
5.1 Utility on classification task
Due to the reduction of activation patterns, the expressive ability of SDN is slightly inferior to that of typical DNNs with the same architecture. We empirically show that the drop of classification accuracy is acceptable based on two groups of case studies. The first group is the comparison of typical DNNs and SDNs on the MNIST dataset. The details of the SDNs are as follows.
1) Each SDN has two hidden layers  and ; 2) these SDNs have 16, 24, 32, and 40 groups in , respectively, and 12, 18, 24, and 30 groups in , respectively, and we name them (16,12), (24,18), (32,24), and (40,30) based on their architectural features; 3) each group in  has four neurons, and each group in  has two neurons; 4) the  in these SDNs is set to 2.
Each baseline DNN has two hidden layers. The number of neurons in each layer is the same as in the corresponding SDN. Cross-entropy loss and Adam are used to train all the networks for 1500 epochs with batch size 256. The evaluation results are shown in Table 1. Compared with typical DNNs, the accuracy of SDN drops only 2.72, 2.88, 2.89, and 2.83 percent, respectively. Besides, the accuracy of the SDNs and the sat-rate increase as the number of groups in each layer increases.
5.2 Generating adversarial examples
Since our verification method aims at global verification, i.e., finding all the adversarial regions in the input space, the generation of attacks is only a by-product. As this generation is not based on local analysis or testing, it is meaningless to compare its efficiency with state-of-the-art attack generation approaches like [13, 2, 10, 15, 6]. Given parameters, we show the adversarial examples in SDN (20,20), which has 20 groups in  and , each group with three neurons; (20,20) is trained on MNIST images resized as , and all other settings are the same as for the SDNs in subsection 5.1. We also point out the corresponding adversarial regions.
The digits in the first line of Figure 6, classified as "2", are the adversarial examples of the digits below them, classified as "0, 6, 8, 4, 9" respectively. Taking  as an example, the upper  is in "[[18,1],[1,15]]", denoting the activation pattern where  and  are the active doors and  and  are the inactive doors. By inputting "[[18,1],[1,15]]", we can find a group of inequalities returned by our algorithm, and the conjunction of these inequalities represents the corresponding classification region of "[[18,1],[1,15]]". The lower , classified as , is in the activation pattern "[[4,10],[1,3]]". The classification region of "[[18,1],[1,15]]" is a protruding region found by our method which is close to the classification region of "[[4,10],[1,3]]". Moreover, we have found that there is only one connected component of class 2 in the input space. Obviously, it is easy to select a large number of adversarial examples in the adversarial regions.
5.3 Feasibility of precise global verification
As our work is the first global robustness verification work, no baseline method exists for this case study. To draw an exact conclusion on whether the results of the proposed method are correct, we train an SDN on the two-dimensional synthetic dataset shown in Figure 7(a). In Figure 7(a), the big blue region at the top right belongs to the first class, and the small blue region at the lower left, which is the "noise" in this dataset, belongs to the first class as well. The points in the white region belong to the second class. The classification results of the trained SDN are visualized in Figure 7(b), where the points in the blue regions belong to the first class and the points in the white region belong to the second class. Our method has found the adversarial regions in the orange circles. Obviously, these regions containing adversarial examples are undesirable. The verification result shows that this SDN is not globally robust.
6 Conclusion
In this paper, we present a novel global verification framework. To the best of our knowledge, this is the first work that provides a complete solution for global robustness verification of neural networks. Based on the proposed rule-based back-propagation, we analyze the relationship between activation patterns and classification rules, and accordingly design a new network, SDN. Together with a region-based global robustness verification approach, the verification can be finished in an acceptable duration, dramatically reducing the computational complexity. Our evaluation shows that SDN performs comparably to classic DNNs and, favourably, that global verification can be achieved, which was unattainable in the past. We hypothesise that SDN is suitable for safety-critical fields, especially for classification tasks which are not very complex but demand strict robustness. Developing a verification framework for large-scale networks is our future research direction.
Appendix
We assign a serial number to each activation pattern to store the mapping results in a B+ tree for the convenience of global verification:
where  is the number of layers,  represents ,  ( ) is the index of 's active (inactive) door, and  is the number of groups in . Two activation patterns  and  satisfy  iff
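A sketch of the serial-number encoding follows. The paper's exact formula is elided, so this mixed-radix variant, keyed by the per-layer door-index pairs, is an assumption:

```python
def serial_number(pattern, groups_per_layer):
    """Assumed mixed-radix serial-number encoding. `pattern` is a list
    of (active_door, inactive_door) index pairs, one per layer; with
    radix g per door index, numeric order on serials matches
    lexicographic order on patterns, as a B+ tree key requires."""
    s = 0
    for (a, i), g in zip(pattern, groups_per_layer):
        s = (s * g + a) * g + i      # fold in both door indices
    return s

# The two example patterns of SDN (20,20) from Section 5.2.
groups = [20, 20]
p1 = [(4, 10), (1, 3)]
p2 = [(18, 1), (1, 15)]
# numeric order agrees with lexicographic order on the patterns
assert (serial_number(p1, groups) < serial_number(p2, groups)) == (p1 < p2)
```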
Lemma 1. Given two vertexes  and , they are adjacent iff they share a common boundary.
Proof. Given two adjacent vertexes  and , the corresponding classification regions are  and . According to the definition of adjacency in Definition 2, there is a  satisfying
There must be a  in  on a boundary  which satisfies , where  and  are the point sets of boundary  and the s. Otherwise, we could find a series of points and corresponding balls satisfying:
is on the boundary and
, on , and
Since there are only finitely many boundaries, this yields a contradiction. Hence we can find a  in  on , which is a boundary of both  and , and a corresponding  satisfying the above condition. As  is on the boundary, it satisfies . Thus  satisfies
and  is a common boundary shared by  and .
The sufficient condition is obvious. ∎
Theorem 1. Given two adjacent vertexes  and , the shared common boundary between them comes from the  of .
Proof. The activation patterns  and  of  and  satisfy ; thus
This indicates that the first change of door happens on . The change of activation pattern leads to the "mutation" of the neurons in . However, the change of the neurons in  is continuous. According to the proof of Lemma 1, we can find a path in  which crosses  and only crosses the . Thus, when a point on this path approaches the boundary, there is at most one inactive neuron corresponding to  approaching 0 in . Here we consider where the  comes from:
comes from . The change of sign of  would not influence the activation pattern in , which contradicts "the first change of door happens on ".
comes from . If it changes the activation pattern, it contradicts "the first change of door happens on ". Otherwise, there must be another inactive neuron in  changing sign at the same time, which contradicts "there is at most one inactive neuron corresponding to  approaching 0 in ".
comes from  in . If the activation pattern does not change, we have a contradiction with "the first change of door happens on ". If the activation pattern changes, then according to the definition of , either the index of the active door or that of the inactive door becomes smaller; that is to say, the result is an activation pattern  instead of . Thus we have a contradiction.
By this process of elimination, we conclude that  comes from the  of . ∎
-  (2017) Security evaluation of pattern classifiers under attack. CoRR abs/1709.00609. Cited by: §1.
-  (2017) Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy, SP 2017, San Jose, CA, USA, May 22-26, 2017, pp. 39–57. Cited by: §1, §5.2.
-  (2018) AI2: safety and robustness certification of neural networks with abstract interpretation. In 2018 IEEE Symposium on Security and Privacy, SP 2018, Proceedings, 21-23 May 2018, San Francisco, California, USA, pp. 3–18. Cited by: §1.
-  (2015) Explaining and harnessing adversarial examples. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun (Eds.), Cited by: §1.
-  (2017) Safety verification of deep neural networks. In Computer Aided Verification - 29th International Conference, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Proceedings, Part I, pp. 3–29. Cited by: §1.
-  (2017) Reluplex: an efficient SMT solver for verifying deep neural networks. In Computer Aided Verification - 29th International Conference, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Proceedings, Part I, pp. 97–117. Cited by: §1, §1, §5.2.
-  (2015) Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, External Links: Cited by: §5.1.
-  The MNIST database of handwritten digits. Note: http://yann.lecun.com/exdb/mnist/, accessed January 4, 2020. Cited by: §1.
-  (2018) Differentiable abstract interpretation for provably robust neural networks. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pp. 3575–3583. Cited by: §1.
-  (2016) The limitations of deep learning in adversarial settings. In IEEE European Symposium on Security and Privacy, EuroS&P 2016, Saarbrücken, Germany, March 21-24, 2016, pp. 372–387. Cited by: §1, §5.2.
-  (2018) Reachability analysis of deep neural networks with provable guarantees. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden, pp. 2651–2659. Cited by: §1.
-  (2019) Global robustness evaluation of deep neural networks with provable guarantees for the hamming distance. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019, pp. 5944–5952. Cited by: §1.
-  (2019) Hybrid batch attacks: finding black-box adversarial examples with limited queries. CoRR abs/1908.07000. Cited by: §1, §5.2.
-  (2014) Intriguing properties of neural networks. In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, Cited by: §1.
-  (2018) Feature-guided black-box safety testing of deep neural networks. In Tools and Algorithms for the Construction and Analysis of Systems - 24th International Conference, TACAS 2018, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2018, Thessaloniki, Greece, April 14-20, 2018, Proceedings, Part I, pp. 408–426. Cited by: §1, §5.2.
-  (2020) A game-based approximate verification of deep neural networks with provable guarantees. Theor. Comput. Sci. 807, pp. 298–329. Cited by: §1.