1 Introduction & Related Work
Neural networks with rectified linear unit activation functions (ReLU NNs) are arguably the most popular type of neural network in deep learning. This type of network enjoys many appealing properties, including better performance than NNs with sigmoid activations Glorot, Bordes, and Bengio (2011), universal approximation ability Arora et al. (2018); Lu et al. (2017); Montufar et al. (2014); Schmidt-Hieber (2020), and fast training via scalable algorithms such as stochastic gradient descent (SGD) and its variants Zou et al. (2020).
Despite their strong predictive power, ReLU NNs have seen limited adoption in risk-sensitive settings Bunel et al. (2018). These settings require the model to make robust predictions against potential adversarial noise in the input Athalye et al. (2018); Carlini and Wagner (2017); Goodfellow, Shlens, and Szegedy (2014); Szegedy et al. (2014). Alignment between model behavior and human intuition is also desirable Liu et al. (2019): prior knowledge such as monotonicity may be incorporated into model design and training Daniels and Velikova (2010); Gupta et al. (2019); Liu et al. (2020); Sharma and Wehrheim (2020), and users and auditors of the model may require a certain degree of explanation of the model predictions Gopinath et al. (2019); Chu et al. (2018).
The requirements in risk-sensitive settings have motivated a great amount of research on verifying certain properties of ReLU NNs. These works often exploit the piecewise linear functional form of ReLU NNs. In Bastani et al. (2016), the robustness of a network is verified in a very small input region via linear programming (LP). To account for the nonlinearity of ReLU activation functions, Ehlers (2017); Katz et al. (2017); Pulina and Tacchella (2010, 2012) formulated the robustness verification problem as a satisfiability modulo theories (SMT) problem. A more popular way to model the ReLU nonlinearity is to introduce a binary variable representing the on-off pattern of each ReLU neuron. Property verification can then be solved using mixed-integer programming (MIP) Anderson et al. (2020); Fischetti and Jo (2017); Liu et al. (2020); Tjeng, Xiao, and Tedrake (2018); Weng et al. (2018).
The piecewise linear functional form of ReLU NNs also creates distinct topological structures in the input space. Previous studies have shown that a ReLU NN partitions the input space into convex polytopes and has one linear model associated with each polytope Montufar et al. (2014); Serra, Tjandraatmadja, and Ramalingam (2018); Sudjianto et al. (2020). Each polytope can be coded by a binary activation code, which reflects the on-off pattern of the ReLU neurons. The number of local polytopes is often used as a measure of the model's expressivity Hanin and Rolnick (2019); Lu et al. (2017). Building upon this framework, multiple studies Sudjianto et al. (2020); Yang, Zhang, and Sudjianto (2020); Zhao et al. (2021) tried to explain the behavior of ReLU NNs and to improve their interpretability. They viewed a ReLU NN as a collection of linear models. However, the relationship among the local polytopes and their linear models has not been fully investigated.
In this paper, we explore the topological relationship among the local polytopes created by ReLU NNs. We propose algorithms to identify the adjacency among these polytopes, based on which we develop traversing algorithms to visit all polytopes within a bounded region in the input space. Our paper has the following major contributions:

The polytope traversing algorithm provides a unified framework to examine the network behavior. Since each polytope contains a linear model whose properties are easy to verify, the full verification on a bounded domain is achieved after all the covered polytopes are visited and verified. We provide theoretical guarantees on the thoroughness of the traversing algorithm.

Property verification based on the polytope traversing algorithm can be easily customized. Identifying the adjacency among the polytopes is formulated as an LP. Within each local polytope, the user has the freedom to choose the solver most suitable for the verification subproblem. We demonstrate that many common applications can be formulated as convex problems within each polytope.

Because the polytope traversing algorithm explicitly visits all the local polytopes, it returns a full picture of the network behavior within the traversed region and improves interpretability.
Although we focus on ReLU NNs with fully connected layers throughout this paper, our polytope traversing algorithm can be naturally extended to other piecewise linear networks such as those containing convolutional and max-pooling layers.
The rest of this paper is organized as follows: Section 2 reviews how polytopes are created by ReLU NNs. Section 3 introduces two related concepts: the boundaries of a polytope and the adjacency among the polytopes. Our polytope traversing algorithm is described in Section 4. Section 5 demonstrates several cases of adapting the traversing algorithm for network property verification. The paper is concluded in Section 6.
2 The Local Polytopes in ReLU NNs
2.1 The case of one hidden layer
A ReLU NN partitions the input space into several polytopes and forms a linear model within each polytope. To see this, we first consider a simple NN with one hidden layer of $n$ neurons. It takes an input $\mathbf{x} \in \mathbb{R}^d$ and produces an output $\hat{\mathbf{y}}$ by calculating:

$$\hat{\mathbf{y}} = \mathbf{V}\,\mathrm{ReLU}(\mathbf{W}\mathbf{x} + \mathbf{b}) + \mathbf{c}, \tag{1}$$

where $\mathbf{W} \in \mathbb{R}^{n \times d}$ and $\mathbf{b} \in \mathbb{R}^{n}$ are the weights and biases of the hidden layer, $\mathbf{V}$ and $\mathbf{c}$ are those of the output layer, and $\mathrm{ReLU}(z) = \max(z, 0)$ is applied elementwise.
For problems with a binary or categorical target variable (i.e., binary or multiclass classification), a sigmoid or softmax layer, respectively, is added after (1) to convert the NN outputs to proper probabilistic predictions.
The ReLU activation function inserts nonlinearity into the model by checking a set of linear inequalities: $\mathbf{w}_i^\top \mathbf{x} + b_i > 0$, $i = 1, \dots, n$, where $\mathbf{w}_i^\top$ is the $i$th row of matrix $\mathbf{W}$ and $b_i$ is the $i$th element of $\mathbf{b}$. Each neuron in the hidden layer creates a partitioning hyperplane in the input space with the linear equation $\mathbf{w}_i^\top \mathbf{x} + b_i = 0$. The areas on the two sides of a hyperplane are two half-spaces, and the entire input space is partitioned by the $n$ hyperplanes. We define a local polytope as a set containing all points that fall on the same side of each and every hyperplane. The polytope encoding function (2) uses an elementwise indicator function to create a unique binary code for each polytope. Since the $i$th neuron is called "ON" for some $\mathbf{x}$ if $\mathbf{w}_i^\top \mathbf{x} + b_i > 0$, the code also represents the on-off pattern of the neurons. Using the results of this encoding function, we can express each polytope as an intersection of half-spaces as in (3), where the binary code $\mathbf{q}$ controls the directions of the inequalities:

$$C(\mathbf{x}) = \mathbb{1}(\mathbf{W}\mathbf{x} + \mathbf{b} > 0) \in \{0, 1\}^n, \tag{2}$$

$$Q_{\mathbf{q}} = \{\mathbf{x} : \mathbf{w}_i^\top \mathbf{x} + b_i > 0 \text{ if } q_i = 1;\ \mathbf{w}_i^\top \mathbf{x} + b_i \le 0 \text{ if } q_i = 0;\ i = 1, \dots, n\}. \tag{3}$$
Figure 1.(b) shows an example of a ReLU NN trained on a two-dimensional synthetic dataset (plotted in Figure 1.(a)). The input space is bounded and the target variable is binary. The network has one hidden layer of 20 neurons. The partitioning hyperplanes associated with these neurons are plotted as the blue dashed lines. They form in total 91 local polytopes within the bounded input space.
For a given $\mathbf{x}$, if $\mathbf{w}_i^\top \mathbf{x} + b_i > 0$, the $i$th ReLU neuron turns on and passes the value through. Otherwise, the neuron is off and suppresses the value to zero. Therefore, if we know the $i$th neuron is off, we can mask the corresponding $\mathbf{w}_i$ and $b_i$ by zeros and create $\tilde{\mathbf{W}}_{\mathbf{q}}$ and $\tilde{\mathbf{b}}_{\mathbf{q}}$ that satisfy (5). The nonlinear operation can therefore be replaced by a locally linear operation after zero-masking. Because each local polytope has a unique neuron activation pattern encoded by $\mathbf{q}$, the zero-masking process in (4) is also unique for each polytope. Here, $\mathbf{1}_d$ is a vector of 1s of length $d$ and $\odot$ denotes the elementwise product:

$$\tilde{\mathbf{W}}_{\mathbf{q}} = (\mathbf{q}\mathbf{1}_d^\top) \odot \mathbf{W}, \quad \tilde{\mathbf{b}}_{\mathbf{q}} = \mathbf{q} \odot \mathbf{b}, \tag{4}$$

$$\tilde{\mathbf{W}}_{\mathbf{q}}\mathbf{x} + \tilde{\mathbf{b}}_{\mathbf{q}} = \mathrm{ReLU}(\mathbf{W}\mathbf{x} + \mathbf{b}), \quad \forall \mathbf{x} \in Q_{\mathbf{q}}. \tag{5}$$
Within each polytope, as the nonlinearity is taken out by the zero-masking process, the input and output have a linear relationship:

$$\hat{\mathbf{y}} = \mathbf{V}(\tilde{\mathbf{W}}_{\mathbf{q}}\mathbf{x} + \tilde{\mathbf{b}}_{\mathbf{q}}) + \mathbf{c} = \mathbf{V}\tilde{\mathbf{W}}_{\mathbf{q}}\mathbf{x} + (\mathbf{V}\tilde{\mathbf{b}}_{\mathbf{q}} + \mathbf{c}). \tag{6}$$

The linear model associated with polytope $Q_{\mathbf{q}}$ has the weight matrix $\mathbf{V}\tilde{\mathbf{W}}_{\mathbf{q}}$ and the bias vector $\mathbf{V}\tilde{\mathbf{b}}_{\mathbf{q}} + \mathbf{c}$. The ReLU NN is now represented by a collection of linear models, each defined on a local polytope $Q_{\mathbf{q}}$.
In Figure 1.(b), we represent the linear model in each local polytope as a red solid line indicating where its output crosses the classification threshold. In this binary response case, the two sides of this line have the opposite class prediction. We only plot the line if it passes through its corresponding polytope. For the other polytopes, the entire polytope falls on one side of its corresponding class-separating line and the predicted class is the same within the whole polytope. The red lines all together form the decision boundary of the ReLU NN and are continuous when passing from one polytope to another. This is a direct result of a ReLU NN being a continuous model.
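The zero-masking view above is easy to check numerically. The following sketch (assuming numpy; helper names such as `polytope_code` and `local_linear_model` are ours, not from the paper) extracts the local linear model of a toy one-hidden-layer network and verifies that it agrees with the full network at a point inside the polytope:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def polytope_code(x, W, b):
    """Encoding function (2): the on/off pattern of the hidden neurons at x."""
    return (W @ x + b > 0).astype(int)

def local_linear_model(q, W, b, V, c):
    """Zero-masking (4): the linear model (6) valid on the polytope coded by q."""
    Wq = q[:, None] * W        # mask rows of W where the neuron is off
    bq = q * b                 # mask the matching bias entries
    return V @ Wq, V @ bq + c  # weight matrix and bias of the local linear model

# toy one-hidden-layer network: 2 inputs, 3 hidden neurons, 1 output
rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 2)), rng.normal(size=3)
V, c = rng.normal(size=(1, 3)), rng.normal(size=1)

x = np.array([0.3, -0.5])
q = polytope_code(x, W, b)
A, d = local_linear_model(q, W, b, V, c)

# inside the polytope, the network and its local linear model agree
assert np.allclose(A @ x + d, V @ relu(W @ x + b) + c)
```

The assertion holds for every point sharing the same code, which is exactly statement (6).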
2.2 The case of multiple layers
We can generalize the results to ReLU NNs with multiple hidden layers. A ReLU NN with $L$ hidden layers hierarchically partitions the input space and is locally linear in each and every level-$L$ polytope. Each level-$L$ polytope has a unique binary code representing the activation pattern of the neurons in all $L$ hidden layers. The corresponding partitioning hyperplanes of each level $l$, $\hat{\mathbf{W}}^{(l)}\mathbf{x} + \hat{\mathbf{b}}^{(l)} = \mathbf{0}$, $l = 1, \dots, L$, can be calculated recursively level by level, using the zero-masking procedure:

$$\hat{\mathbf{W}}^{(1)} = \mathbf{W}^{(1)}, \quad \hat{\mathbf{b}}^{(1)} = \mathbf{b}^{(1)}, \tag{7}$$

$$\tilde{\mathbf{W}}^{(l-1)} = (\mathbf{q}^{(l-1)}\mathbf{1}_d^\top) \odot \hat{\mathbf{W}}^{(l-1)}, \quad \tilde{\mathbf{b}}^{(l-1)} = \mathbf{q}^{(l-1)} \odot \hat{\mathbf{b}}^{(l-1)}, \tag{8}$$

$$\hat{\mathbf{W}}^{(l)} = \mathbf{W}^{(l)}\tilde{\mathbf{W}}^{(l-1)}, \quad \hat{\mathbf{b}}^{(l)} = \mathbf{W}^{(l)}\tilde{\mathbf{b}}^{(l-1)} + \mathbf{b}^{(l)}, \tag{9}$$

where $\mathbf{W}^{(l)}$ and $\mathbf{b}^{(l)}$ are the weights and biases of the $l$th hidden layer and $\mathbf{q}^{(l-1)}$ is the binary code at level $l-1$.
We emphasize that $\hat{\mathbf{W}}^{(l)}$, $\hat{\mathbf{b}}^{(l)}$, $\tilde{\mathbf{W}}^{(l)}$, and $\tilde{\mathbf{b}}^{(l)}$ depend on all the polytope codes up to level $l$: $\mathbf{q}^{(1)}, \dots, \mathbf{q}^{(l)}$. This dependence is dropped from the subscripts to simplify the notation.
At each level $l$, the encoding function and the polytope, expressed as an intersection of half-spaces, can be written recursively as:

$$C^{(l)}(\mathbf{x}) = \mathbb{1}(\hat{\mathbf{W}}^{(l)}\mathbf{x} + \hat{\mathbf{b}}^{(l)} > 0), \tag{10}$$

$$\mathbf{q}^{(l)} = C^{(l)}(\mathbf{x}), \quad \forall \mathbf{x} \in Q^{(l)}, \tag{11}$$

$$Q^{(1)} = \{\mathbf{x} : \hat{\mathbf{w}}_i^{(1)\top}\mathbf{x} + \hat{b}_i^{(1)} > 0 \text{ if } q_i^{(1)} = 1;\ \hat{\mathbf{w}}_i^{(1)\top}\mathbf{x} + \hat{b}_i^{(1)} \le 0 \text{ if } q_i^{(1)} = 0\}, \tag{12}$$

$$Q^{(l)} = Q^{(l-1)} \cap \{\mathbf{x} : \hat{\mathbf{w}}_i^{(l)\top}\mathbf{x} + \hat{b}_i^{(l)} > 0 \text{ if } q_i^{(l)} = 1;\ \hat{\mathbf{w}}_i^{(l)\top}\mathbf{x} + \hat{b}_i^{(l)} \le 0 \text{ if } q_i^{(l)} = 0\}. \tag{13}$$
Finally, the linear model in a level-$L$ polytope is:

$$\hat{\mathbf{y}} = \mathbf{V}\tilde{\mathbf{W}}^{(L)}\mathbf{x} + (\mathbf{V}\tilde{\mathbf{b}}^{(L)} + \mathbf{c}). \tag{14}$$
Figure 1.(c) shows an example of a ReLU NN with two hidden layers of size 10 and 5, respectively. The partitioning hyperplanes associated with the first 10 neurons are plotted as the blue dashed lines. They form 20 level-1 polytopes within the bounded input space. Within each of the level-1 polytopes, the hyperplanes associated with the 5 second-layer neurons further partition the polytope. In many cases, some of the 5 hyperplanes are outside the level-1 polytope and, therefore, do not create a new subpartition. The hyperplanes that do create new partitions are plotted as the orange dashed lines. The orange lines are only straight within a level-1 polytope but are continuous when passing from one polytope to another, which is also a result of a ReLU NN being a continuous model. In total, this ReLU NN creates 41 (level-2) local polytopes. As in Figure 1.(b), the linear model within each level-2 polytope is represented as a red solid line if class separation occurs within the polytope.
3 Polytope Boundaries and Adjacency
Beyond viewing ReLU NNs as a collection of linear models defined on local polytopes, we explore the topological relationship among these polytopes. A key concept is the boundaries of each polytope. As shown in (13), each level-$l$ polytope with corresponding binary code is an intersection of half-spaces induced by a set of inequality constraints. Two situations can arise among these inequalities. First, an arbitrary binary code may lead to conflicting inequalities and make the corresponding set empty. This situation can be common when the number of neurons is much larger than the dimension of the input space. Second, there can be redundant inequalities, meaning that removing them does not affect the set. We now show that the nonredundant inequalities are closely related to the boundaries of a polytope.
Definition 3.1
Let $S$ contain all $\mathbf{x}$ that satisfy $m$ linear inequalities: $S = \{\mathbf{x} : \boldsymbol{\alpha}_i^\top\mathbf{x} + \beta_i \ge 0,\ i = 1, \dots, m\}$. Assume that $S \ne \emptyset$. Let $S_{-j}$ contain all $\mathbf{x}$'s that satisfy the same linear inequalities except the $j$th one: $S_{-j} = \{\mathbf{x} : \boldsymbol{\alpha}_i^\top\mathbf{x} + \beta_i \ge 0,\ i \ne j\}$. Then the inequality $\boldsymbol{\alpha}_j^\top\mathbf{x} + \beta_j \ge 0$ is a redundant inequality with respect to (w.r.t.) $S$ if $S_{-j} = S$.
With redundant inequalities defined above, the following lemma provides a way to identify them. The proof of this lemma is in the Appendix.
Lemma 3.1
Given a set $S = \{\mathbf{x} : \boldsymbol{\alpha}_i^\top\mathbf{x} + \beta_i \ge 0,\ i = 1, \dots, m\}$, the inequality $\boldsymbol{\alpha}_j^\top\mathbf{x} + \beta_j \ge 0$ is a redundant inequality if the new set formed by flipping this inequality is empty: $\{\mathbf{x} : \boldsymbol{\alpha}_j^\top\mathbf{x} + \beta_j < 0;\ \boldsymbol{\alpha}_i^\top\mathbf{x} + \beta_i \ge 0,\ i \ne j\} = \emptyset$.
We can now define the boundaries of a polytope formed by a set of linear inequalities using a procedure similar to that of Lemma 3.1. The concept of polytope boundaries also leads to the definition of adjacency. Intuitively, we can move from one polytope to its adjacent polytope by crossing a boundary.
Definition 3.2
Given a nonempty set $S$ formed by $m$ linear inequalities, $S = \{\mathbf{x} : \boldsymbol{\alpha}_i^\top\mathbf{x} + \beta_i \ge 0,\ i = 1, \dots, m\}$, the hyperplane $\boldsymbol{\alpha}_j^\top\mathbf{x} + \beta_j = 0$ is a boundary of $S$ if the new set $S_j'$ formed by flipping the corresponding inequality is nonempty: $S_j' = \{\mathbf{x} : \boldsymbol{\alpha}_j^\top\mathbf{x} + \beta_j < 0;\ \boldsymbol{\alpha}_i^\top\mathbf{x} + \beta_i \ge 0,\ i \ne j\} \ne \emptyset$. Polytope $S_j'$ is called one-adjacent to $S$.
Since for each polytope the directions of its linear inequalities are reflected by the binary code, two one-adjacent polytopes must have codes that differ by exactly one bit. Figure 2.(a) demonstrates the adjacency among the local polytopes. The ReLU NN is the same as in Figure 1.(b). Using the procedure in Definition 3.2, 4 out of the 20 partitioning hyperplanes are identified as the boundaries of polytope No.0 and marked in red. The 4 one-adjacent neighbors of polytope No.0 are No.1, 2, 3, and 4; each can be reached by crossing one boundary.
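Definition 3.2 suggests a direct computational test: flip one inequality at a time and check feasibility with an LP. Below is a minimal sketch (assuming numpy and scipy; the box bound and tolerance are our own choices that keep the LP bounded and approximate the strict inequality):

```python
import numpy as np
from scipy.optimize import linprog

def feasible(W, b, signs, eps=1e-9, box=10.0):
    """Phase-I LP: is {x : signs_i * (W_i x + b_i) > 0} nonempty (within a box)?"""
    A_ub = -signs[:, None] * W
    b_ub = signs * b - eps
    res = linprog(np.zeros(W.shape[1]), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(-box, box)] * W.shape[1])
    return res.status == 0

def boundaries(W, b, q):
    """Indices of hyperplanes bounding the polytope coded by q (Definition 3.2):
    flipping inequality i keeps the set nonempty iff hyperplane i is a boundary."""
    signs = 2 * q - 1
    idx = []
    for i in range(len(q)):
        flipped = signs.copy()
        flipped[i] = -flipped[i]
        if feasible(W, b, flipped):
            idx.append(i)
    return idx

# three hyperplanes in 2-D: x0 = 0, x1 = 0, and x0 + x1 = 5
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([0.0, 0.0, -5.0])
q = np.array([1, 1, 0])    # the triangle x0 > 0, x1 > 0, x0 + x1 < 5
assert boundaries(W, b, q) == [0, 1, 2]   # all three hyperplanes are boundaries

q2 = np.array([0, 0, 0])   # the quadrant x0 < 0, x1 < 0 (x0 + x1 < 5 is redundant)
assert boundaries(W, b, q2) == [0, 1]
```

The redundant third inequality of `q2` is detected exactly as in Lemma 3.1: flipping it yields an empty set.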
As we have shown in Section 2.2, ReLU NNs create polytopes level by level. We follow the same hierarchy to define the polytope adjacency. Assume two nonempty level-$l$ polytopes, $Q_1^{(l)}$ and $Q_2^{(l)}$, are inside the same level-$(l-1)$ polytope, which means their corresponding codes $\mathbf{q}_1$ and $\mathbf{q}_2$ differ only at level $l$. We say that polytope $Q_1^{(l)}$ is a level-$l$ one-adjacent neighbor of $Q_2^{(l)}$ if $\mathbf{q}_1^{(l)}$ and $\mathbf{q}_2^{(l)}$ differ in exactly one bit.
The condition that $\mathbf{q}_1$ and $\mathbf{q}_2$ differ only at level $l$ is important. In this way, the two linear inequalities associated with each pair of bits in $\mathbf{q}_1^{(l)}$ and $\mathbf{q}_2^{(l)}$ have the same coefficients, and the difference between $\mathbf{q}_1^{(l)}$ and $\mathbf{q}_2^{(l)}$ only changes the direction of the linear inequality. On the other hand, if the two codes differ at an earlier level $l' < l$, then according to the recursive calculation in (8) and (9), the codes starting from level $l'+1$ will correspond to linear inequalities with different coefficients, leaving our Definition 3.2 of adjacency not applicable.
Figure 2.(b) demonstrates the hierarchical adjacency among the local polytopes. The ReLU NN is the same as in Figure 1.(c). Two of the level-1 polytopes are both (level-1) one-adjacent to the initial level-1 polytope. Within one of the level-1 polytopes, two level-2 polytopes are (level-2) one-adjacent to each other, and we can similarly identify the level-2 adjacency of the other two pairs. Note that in the plot, even though one can move between two particular polytopes by crossing one partitioning hyperplane, we do not define those two polytopes as adjacent, as they lie in two different level-1 polytopes.
4 Polytope Traversing
4.1 The case of one hidden layer
The adjacency defined in the previous section gives us an order in which to traverse the local polytopes: starting from an initial polytope, we visit all its one-adjacent neighbors, then all the neighbors' neighbors, and so on.
This algorithm can be viewed as breadth-first search (BFS) on a polytope graph. To create this graph, we turn each polytope created by the ReLU NN into a node. An edge is added between each pair of polytopes that are one-adjacent to each other. The BFS algorithm uses a queue to keep track of the traversing progress. At the beginning of traversing, the initial polytope is added to an empty queue and is marked as visited. In each iteration, we pop the first polytope from the queue and identify all of its one-adjacent neighbors. Among these identified polytopes, we add those that have not been visited to the back of the queue and mark them as visited. The iteration stops when the queue is empty.
The key component of the polytope traversing algorithm is identifying a polytope's one-adjacent neighbors. For a polytope coded by $\mathbf{q}$ of $n$ bits, there are at most $n$ one-adjacent neighbors, whose codes correspond to flipping one of the bits in $\mathbf{q}$. Each valid one-adjacent neighbor must be nonempty and can be reached by crossing a boundary. Therefore, we can check each linear inequality in (3) and determine whether it is a boundary or redundant. Some techniques for identifying redundant inequalities are summarized in Telgen (1983). By flipping the bits corresponding to the identified boundaries, we obtain the codes of the one-adjacent polytopes.
Equivalently, we can identify the one-adjacent neighbors by going through all candidate codes and selecting those corresponding to nonempty sets. Checking the feasibility of a set constrained by linear inequalities is often referred to as the "phase-I problem" of LP and can be solved efficiently by modern LP solvers. During the BFS iterations, we can hash the checked codes to avoid checking them repeatedly. The BFS-based polytope traversing algorithm is summarized in Algorithm 1. We now state the correctness of this algorithm, with its proof in the Appendix.
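The BFS scheme described above can be sketched as follows (assuming numpy and scipy; `nonempty` is our phase-I feasibility check, and the artificial box bound keeps each LP bounded):

```python
import numpy as np
from collections import deque
from scipy.optimize import linprog

def nonempty(W, b, q, eps=1e-9, box=10.0):
    """Phase-I LP feasibility check for the polytope coded by q."""
    s = 2 * q - 1
    res = linprog(np.zeros(W.shape[1]), A_ub=-s[:, None] * W,
                  b_ub=s * b - eps, bounds=[(-box, box)] * W.shape[1])
    return res.status == 0

def traverse(W, b, x0):
    """BFS over polytopes (a sketch of Algorithm 1): start from the polytope of
    x0, repeatedly flip one bit, and keep the feasible (one-adjacent) codes."""
    start = tuple((W @ x0 + b > 0).astype(int))
    visited, checked = {start}, {start}
    queue = deque([start])
    while queue:
        q = queue.popleft()
        for i in range(len(q)):
            cand = q[:i] + (1 - q[i],) + q[i + 1:]
            if cand in checked:
                continue                      # hash checked codes only once
            checked.add(cand)
            if nonempty(W, b, np.array(cand)):
                visited.add(cand)
                queue.append(cand)
    return visited

# two hyperplanes x0 = 0 and x1 = 0 create 4 polytopes (the quadrants)
W = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.zeros(2)
assert len(traverse(W, b, np.array([1.0, 1.0]))) == 4
```

Each dequeued code triggers at most $n$ feasibility LPs, matching the complexity discussion that follows.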
Theorem 4.1
Algorithm 1 visits all the local polytopes created by a ReLU NN within $\mathbb{R}^d$. The time complexity is exponential in the number of neurons, as all possible activation patterns are checked once in the worst-case scenario. The space complexity is also exponential in the number of neurons, as we hash all the checked activation patterns. Furthermore, for each activation pattern, we solve a phase-I problem of LP with $n$ inequalities. Traversing all local polytopes in $\mathbb{R}^d$, therefore, becomes intractable for neural networks with a large number of neurons.
Fortunately, traversing in $\mathbb{R}^d$ is usually undesirable. Firstly, a neural network may run into extrapolation issues for points outside the sample distribution. The polytopes far away from the areas covered by the samples are often considered unreliable. Secondly, many real-life applications, to be discussed in Section 5, only require traversing within small bounded regions to examine the local behavior of a model. In the next section, we introduce a technique to improve the efficiency of traversing within a bounded region.
4.2 Polytope traversing within a bounded region
We first consider a region with each dimension bounded independently: $l_j \le x_j \le u_j$, $j = 1, \dots, d$. These linear inequalities create a hypercube denoted as $B$. During the BFS-based polytope traversing, we repeatedly flip the direction of one of the inequalities to identify the one-adjacent neighbors. When the bounded region is small, it is likely that only a small number of the hyperplanes cut through the hypercube. For each of the other hyperplanes, the entire hypercube falls on only one side, and flipping to the other side of such a hyperplane would leave the bounded region. Therefore, at the very beginning of polytope traversing, we can run through the hyperplanes to identify those cutting through the hypercube. Then, in each neighbor-identifying step, we only flip these hyperplanes.
To identify the hyperplanes cutting through the hypercube, we denote the two sides of a hyperplane $\mathbf{w}^\top\mathbf{x} + b = 0$ within $B$ as $B^+$ and $B^-$: $B^+ = B \cap \{\mathbf{x} : \mathbf{w}^\top\mathbf{x} + b > 0\}$ and $B^- = B \cap \{\mathbf{x} : \mathbf{w}^\top\mathbf{x} + b \le 0\}$. If neither $B^+$ nor $B^-$ is empty, we say the hyperplane cuts through $B$. Since $B^+$ and $B^-$ are both constrained by linear inequalities, checking their feasibility can again be formulated as a phase-I problem of LP. We name this technique hyperplane prescreening and summarize it in Algorithm 2.
Hyperplane prescreening effectively reduces the complexity from $O(2^n)$ to $O(2^k)$, where $k$ is the number of hyperplanes cutting through the hypercube. The number $2^k$ corresponds to the worst-case scenario. Since the BFS-based traversing only checks nonempty polytopes and their potential one-adjacent neighbors, the number of activation patterns actually checked can be far less than $2^k$. In general, the fewer hyperplanes cut through $B$, the faster the polytope traversing finishes.
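Prescreening a single hyperplane against a hypercube amounts to two phase-I LPs, one per half-space. A sketch under the same assumptions as before (numpy and scipy, with a small tolerance standing in for the strict inequality):

```python
import numpy as np
from scipy.optimize import linprog

def cuts_through(w, b, lo, hi, eps=1e-9):
    """Does the hyperplane w.x + b = 0 cut the hypercube [lo, hi]^d?
    True iff both half-spaces B+ and B- are nonempty inside the cube."""
    d = len(w)
    bounds = [(lo, hi)] * d
    # B+: w.x + b >= eps  <=>  -w.x <= b - eps
    plus = linprog(np.zeros(d), A_ub=-w[None, :], b_ub=[b - eps], bounds=bounds)
    # B-: w.x + b <= -eps  <=>  w.x <= -b - eps
    minus = linprog(np.zeros(d), A_ub=w[None, :], b_ub=[-b - eps], bounds=bounds)
    return plus.status == 0 and minus.status == 0

# cube [0, 1]^2: the line x0 + x1 = 1 cuts it, while x0 + x1 = 5 does not
assert cuts_through(np.array([1.0, 1.0]), -1.0, 0.0, 1.0)
assert not cuts_through(np.array([1.0, 1.0]), -5.0, 0.0, 1.0)
```

Only the hyperplanes for which `cuts_through` returns True need to be flipped during the subsequent BFS.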
Figure 2.(a) shows traversing the 8 local polytopes within the bounded region. The ReLU NN is the same as in Figure 1.(b). The lines marked in red are the hyperplanes cutting through the bounded region, identified by the prescreening algorithm. The evolution of the BFS queue is shown in Figure 2.(c). The gray arrows show the traversing order. The colored arrows at the bottom indicate the one-adjacent neighbors added to the queue. When polytope No.0 is popped from the queue, its one-adjacent neighbors, No.1, 2, 3, and 4, are added to the queue. Next, when polytope No.1 is popped, its one-adjacent neighbors, No.5 and 6, are added. Polytope No.0, although a one-adjacent neighbor of No.1, is ignored since it has been visited. Similarly, when polytope No.2 is popped, only one of its one-adjacent neighbors, No.7, is added, since all others have been visited (including those in the queue). The algorithm finishes after popping polytope No.7, as no new polytopes can be added and the queue becomes empty. All 8 local polytopes in the bounded region are traversed.
Because $B$ is bounded by a set of linear inequalities, the correctness of BFS-based polytope traversing as stated in Theorem 4.1 can be easily extended to this bounded traversing case. It can be proved by showing that for any two nonempty polytopes overlapping $B$, we can move from one to the other by repeatedly finding a one-adjacent neighbor within $B$. We emphasize that the correctness of BFS-based polytope traversing can be proved for any traversing region bounded by a set of linear inequalities. This realization is critical to generalizing our results to the case of ReLU NNs with multiple hidden layers. Furthermore, as any closed convex set can be represented as the intersection of a set of (possibly infinitely many) half-spaces, the correctness of BFS-based polytope traversing holds for any closed convex $B$.
4.3 Hierarchical polytope traversing in the case of multiple hidden layers
The BFS-based polytope traversing algorithm can be generalized to ReLU NNs with multiple hidden layers. In Section 2.2, we described how a ReLU NN with $L$ hidden layers hierarchically partitions the input space into polytopes of different levels. Then, in Section 3, we showed that the adjacency of level-$l$ polytopes is conditioned on all of them belonging to the same level-$(l-1)$ polytope. Therefore, to traverse all level-$l$ polytopes, we need to traverse all level-$(l-1)$ polytopes and, within each of them, traverse the subpolytopes by following the one-adjacent neighbors.
The procedure above leads us to a recursive traversing scheme. Assume a ReLU NN with $L$ hidden layers and a closed convex traversing region $B$. Starting from a sample $\mathbf{x}_0 \in B$, we traverse all level-1 polytopes using the BFS-based algorithm. Inside each level-1 polytope, we traverse all the contained level-2 polytopes, and so on and so forth until we reach the level-$L$ polytopes. As shown in (13), each level-$l$ polytope is constrained by a set of linear inequalities, so the way to identify level-$l$ one-adjacent neighbors is largely the same as what we have described in Section 4.1. Two level-$l$ one-adjacent neighbors must share the linear inequalities corresponding to the codes of the preceding levels and have one of the level-$l$ inequalities differ in direction, so there are at most as many cases to check as there are neurons in the $l$th layer.
We can use hyperplane prescreening at each level of traversing. When traversing the level-$l$ polytopes within a level-$(l-1)$ polytope, we update the bounded traversing region by taking the intersection of $B$ and the level-$(l-1)$ polytope. We then screen the level-$l$ partitioning hyperplanes and select only those passing through this updated traversing region.
The BFS-based hierarchical polytope traversing algorithm is summarized in Algorithm 3. Its correctness can be proved based on the results in Section 4.2, which guarantee the thoroughness of traversing the level-$l$ polytopes within any level-$(l-1)$ polytope. The overall thoroughness is then guaranteed because each level of traversing is thorough. We state the result in the following theorem.
Theorem 4.2
Algorithm 3 visits all the level-$L$ local polytopes created by a ReLU NN with $L$ hidden layers within any closed convex traversing region $B$.
Figure 2.(b) shows traversing the 6 local polytopes within the bounded region. The ReLU NN is the same as in Figure 1.(c). The evolution of the hierarchical BFS queue is shown in Figure 2.(d). The level-1 BFS queue is shown vertically, while the level-2 BFS queues are shown horizontally. Starting from the initial level-1 polytope, the algorithm traverses the two level-2 polytopes inside it (line 10 in Algorithm 3). It then identifies the two (level-1) one-adjacent neighbors of the initial polytope. Every time a level-1 polytope is identified, the algorithm goes into it to traverse all the level-2 polytopes inside (line 36). At the end of the recursive call, all 6 local polytopes in the bounded region are traversed.
5 Network Property Verification Based on Polytope Traversing
The biggest advantage of the polytope traversing algorithm is its ability to be adapted to solve many different problems of practical interest. Problems such as local adversarial attacks, searching for counterfactual samples, and local monotonicity verification can be solved easily when the model is linear. As we have shown in Section 2.2, the local model within each level-$L$ polytope created by a ReLU NN is indeed linear. The polytope traversing algorithm provides a way to analyze not only the behavior of a ReLU NN in one local polytope but also the behavior within a neighborhood, and therefore enhances our understanding of the overall model behavior. In this section, we describe the details of adapting the polytope traversing algorithm to verify several properties of ReLU NNs.
5.1 Local Adversarial Attacks
We define the local adversarial attack problem as finding the perturbation within a bounded region such that the model output is changed most adversarially. Here, we assume the model output to be a scalar in $\mathbb{R}$ and consider three regression cases with different types of response variable: continuous, binary, and categorical. The perturbation region is a convex set around the original sample. For example, we can allow certain features to increase or decrease by certain amounts, or we can use a norm ($\ell_1$, $\ell_2$, $\ell_\infty$) ball centered at the original sample.
In the continuous response case, the one-dimensional output after the last linear layer of a ReLU NN is directly used as the prediction of the target variable. Denote the model function as $f$, the original sample as $\mathbf{x}_0$, and the perturbation region as $B(\mathbf{x}_0)$. The local adversarial attack problem can be written as:

$$\min_{\mathbf{x} \in B(\mathbf{x}_0)} f(\mathbf{x}) \quad \text{and} \quad \max_{\mathbf{x} \in B(\mathbf{x}_0)} f(\mathbf{x}), \tag{15}$$
which means we need to find the range of the model outputs on $B(\mathbf{x}_0)$. We can traverse all local polytopes covered by $B(\mathbf{x}_0)$, find the model output range within each intersection $B(\mathbf{x}_0) \cap Q_{\mathbf{q}}$, and then aggregate all the local results to get the final range. Finding the output range within each intersection is a convex problem with a linear objective function, so optimality is guaranteed within each polytope. Because our traversing algorithm covers all polytopes overlapping $B(\mathbf{x}_0)$, the final solution also has guaranteed optimality.
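Within one polytope, finding the output range of the local linear model over the intersection with a box region is a pair of LPs. A sketch (assuming numpy and scipy; the example hyperplane and local model are ours, chosen for illustration):

```python
import numpy as np
from scipy.optimize import linprog

def range_in_polytope(a, d, W, b, q, lo, hi):
    """Min and max of the local linear model a.x + d over B ∩ Q_q, where
    B = [lo, hi]^n is a box and Q_q is the polytope coded by q (cf. (3))."""
    s = 2 * q - 1
    A_ub, b_ub = -s[:, None] * W, s * b   # stay on the q-side of each hyperplane
    bounds = [(lo, hi)] * len(a)
    mn = linprog(a, A_ub=A_ub, b_ub=b_ub, bounds=bounds)    # minimize a.x
    mx = linprog(-a, A_ub=A_ub, b_ub=b_ub, bounds=bounds)   # maximize a.x
    return mn.fun + d, -mx.fun + d

# one hyperplane x0 = 0.5; on its positive side inside [0, 1]^2,
# the local model 2*x0 + x1 ranges over [1.0, 3.0]
a, d = np.array([2.0, 1.0]), 0.0
W, b, q = np.array([[1.0, 0.0]]), np.array([-0.5]), np.array([1])
lo_val, hi_val = range_in_polytope(a, d, W, b, q, 0.0, 1.0)
assert np.isclose(lo_val, 1.0) and np.isclose(hi_val, 3.0)
```

Aggregating the per-polytope intervals over all traversed polytopes yields the full output range on $B(\mathbf{x}_0)$.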
In the case of binary response, the one-dimensional output after the last linear layer of a ReLU NN is passed through a logistic/sigmoid function to predict the probability of a sample belonging to class 1. To conduct an adversarial attack, we minimize the predicted probability if the true response is 1, and maximize it if the true response is 0:

$$\min_{\mathbf{x} \in B(\mathbf{x}_0)} f(\mathbf{x}) \ \text{if } y_0 = 1, \qquad \max_{\mathbf{x} \in B(\mathbf{x}_0)} f(\mathbf{x}) \ \text{if } y_0 = 0. \tag{16}$$
Because of the monotonicity of the logistic function, the minimizer and maximizer of the probabilistic output are also the minimizer and maximizer of the output after the last linear layer (i.e., the predicted log odds), making this case equivalent to the case of continuous response.
In the case of categorical response with levels 1 to $K$, the output after the last linear layer of a ReLU NN is in $\mathbb{R}^K$ and is passed through a softmax layer to be converted to probabilistic predictions of a sample belonging to each class. The adversarial sample is generated to minimize the predicted probability of the sample being in its true class $k$. Within each local polytope, the linear model is given by (14), and the predicted probability of class $k$ can be minimized by finding the maximizer of the following optimization problem:

$$\max_{\mathbf{x} \in B(\mathbf{x}_0) \cap Q_{\mathbf{q}}} \ \log \sum_{j \ne k} \exp\big((\mathbf{a}_j - \mathbf{a}_k)^\top\mathbf{x} + (d_j - d_k)\big), \tag{17}$$

where $\mathbf{a}_j^\top$ is the $j$th row of the local weight matrix in (14) and $d_j$ is the $j$th element of the local bias vector. Since the objective function in (17) is convex, the optimality of local adversarial attack with polytope traversing is guaranteed.
Figure 3.(a) demonstrates a local adversarial attack in the case of regression with binary response. The ReLU NN is the same as in Figure 1.(b); it predicts the probability of a sample belonging to class 1. The predictions across the whole domain are shown as the heat map. Within the region bounded by the black box, we find the minimum and maximum predictions and mark them in red and green, respectively. Due to the nature of linear models, the minimizer and maximizer always fall on the intersections of partitioning hyperplanes and/or region boundaries.
5.2 Counterfactual sample generation
In classification problems, we are often interested in finding the smallest perturbation of a sample such that the model changes its class prediction. The magnitude of the perturbation is often measured by the $\ell_1$, $\ell_2$, or $\ell_\infty$ norm. The optimization problem can be written as:

$$\min_{\mathbf{x}} \|\mathbf{x} - \mathbf{x}_0\| \quad \text{s.t.} \quad g(\mathbf{x}) \ne g(\mathbf{x}_0), \tag{18}$$

where $\mathbf{x}_0$ is the original sample, $\|\cdot\|$ indicates a specific type of norm, and $g$ is a ReLU NN outputting class predictions.
We can adapt the polytope traversing algorithm to solve this problem. In the case of binary response, each local polytope has an associated hyperplane separating the two classes: $\mathbf{a}_{\mathbf{q}}^\top\mathbf{x} + d_{\mathbf{q}} = t$, where $\mathbf{a}_{\mathbf{q}}$ and $d_{\mathbf{q}}$ are the local weight vector and bias given in (14), and $t$ is the threshold converting predicted log odds to a class. Finding the counterfactual sample within a local polytope can be written as a convex optimization problem:

$$\min_{\mathbf{x} \in Q_{\mathbf{q}}} \|\mathbf{x} - \mathbf{x}_0\| \quad \text{s.t.} \quad (1 - 2y_0)(\mathbf{a}_{\mathbf{q}}^\top\mathbf{x} + d_{\mathbf{q}} - t) \ge 0, \tag{19}$$

where $y_0$ is the original class (0 or 1) predicted by the model.
We start the traversing algorithm from the polytope where $\mathbf{x}_0$ lies. In each polytope, we solve (19). It is possible that the entire polytope falls on one side of the class-separating hyperplane and (19) does not have any feasible solution. If a solution can be obtained, we compare it with the solutions in previously traversed polytopes and keep the one with the smallest perturbation. Furthermore, we use this perturbation magnitude to construct a new bounded traversing region around $\mathbf{x}_0$. Because no point outside this region can have a smaller distance to the original point, the algorithm can conclude once we finish traversing all the polytopes inside this region. In practice, we often construct this dynamic traversing region as $B = \{\mathbf{x} : \|\mathbf{x} - \mathbf{x}_0\| \le \delta^*\}$, where $\delta^*$ is the smallest perturbation magnitude found so far. When solving (19) in the succeeding polytopes, we add $\|\mathbf{x} - \mathbf{x}_0\| \le \delta^*$ to the constraints. $\delta^*$ is updated whenever a smaller feasible perturbation is found. Because the new traversing region is always a subset of the previous one, our BFS-based traversing algorithm covers all polytopes within the final traversing region under this dynamic setting. The final solution to (18) is guaranteed to be optimal, and the running time depends on how far the original point is from a class boundary.
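For the $\ell_\infty$ norm, the per-polytope problem (19) is itself an LP after introducing an auxiliary variable $t$ bounding each coordinate of the perturbation. A sketch for the binary case with original class 0 (assuming numpy and scipy; the function name and the threshold convention are ours):

```python
import numpy as np
from scipy.optimize import linprog

def counterfactual_inf(x0, W, b, q, w, d, thresh=0.0):
    """Smallest L-inf perturbation inside polytope Q_q that pushes the local
    linear score w.x + d from below thresh to at least thresh.
    LP variables are (x, t); minimize t subject to |x - x0| <= t elementwise."""
    n = len(x0)
    I, ones = np.eye(n), np.ones((n, 1))
    s = (2 * q - 1)[:, None]
    A_ub = np.vstack([
        np.hstack([I, -ones]),                          # x - x0 <= t
        np.hstack([-I, -ones]),                         # x0 - x <= t
        np.hstack([-s * W, np.zeros((len(b), 1))]),     # stay inside Q_q
        np.hstack([-w[None, :], np.zeros((1, 1))]),     # w.x + d >= thresh
    ])
    b_ub = np.concatenate([x0, -x0, (s * b[:, None]).ravel(), [d - thresh]])
    c = np.zeros(n + 1)
    c[-1] = 1.0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * n + [(0, None)])
    return (res.x[:n], res.x[-1]) if res.status == 0 else (None, np.inf)

# polytope x0_coord > -1 contains the origin; crossing the score threshold 0.5
# of the local model f(x) = x[0] requires an L-inf perturbation of 0.5
x_cf, t_star = counterfactual_inf(np.zeros(2), np.array([[1.0, 0.0]]),
                                  np.array([1.0]), np.array([1]),
                                  np.array([1.0, 0.0]), 0.0, 0.5)
assert np.isclose(t_star, 0.5, atol=1e-6)
```

Under the dynamic-region scheme above, each newly found `t_star` tightens the traversing region for the remaining polytopes.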
In the case of categorical response with levels 1 to $K$, the output after the last linear layer of a ReLU NN has $K$ dimensions, and the dimension with the largest value gives the predicted class. We ignore the softmax layer at the end because it does not change the rank order of the dimensions. Assuming the original sample is predicted to belong to class $k$, we generate counterfactual samples in each of the remaining classes.
We consider one of these classes at a time and denote it as $k'$. Within each of the ReLU NN's local polytopes, the linear model is given by (14). The area where a sample is predicted to be in class $k'$ is enclosed by the intersection of half-spaces:

$$\{\mathbf{x} : (\mathbf{a}_{k'} - \mathbf{a}_j)^\top\mathbf{x} + (d_{k'} - d_j) \ge 0,\ \forall j \ne k'\}, \tag{20}$$

where $\mathbf{a}_j^\top$ and $d_j$ are the $j$th row and element of the local weight matrix and bias vector in (14). Therefore, within each local polytope, we solve the convex optimization problem:

$$\min_{\mathbf{x} \in Q_{\mathbf{q}}} \|\mathbf{x} - \mathbf{x}_0\| \quad \text{s.t.} \quad (\mathbf{a}_{k'} - \mathbf{a}_j)^\top\mathbf{x} + (d_{k'} - d_j) \ge 0,\ \forall j \ne k'. \tag{21}$$
We compare all feasible solutions of (21) under the different $k'$ and keep the one counterfactual sample that is closest to $\mathbf{x}_0$. The traversing procedure and the dynamic traversing region update are the same as in the binary response case. Since (21) is convex, the final solution to (18) is guaranteed to be optimal.
Figure 3.(b) demonstrates counterfactual sample generation in the case of binary classification. The ReLU NN is the same as in Figure 1.(b), and its class decision boundaries are plotted in red. Given an original sample plotted as the black dot, we generate two counterfactual samples on the decision boundaries, each minimizing the distance to the original point under a different norm (the red and green dots).
5.3 Local monotonicity verification
We can adapt the polytope traversing algorithm to verify whether a trained ReLU NN is monotonic w.r.t. certain features. We consider the regression cases with continuous and binary response. In both cases, the output after the last linear layer is a scalar. Since the binary response case uses a logistic function at the end, which is itself monotonically increasing, we can ignore this additional function. The verification methods for the two cases, therefore, are equivalent.
To check whether the model is monotonic w.r.t. a specific feature within a bounded convex domain, we traverse the local polytopes covered by the domain. Since the model is linear within each polytope, we can easily check the monotonicity direction (increasing or decreasing) by checking the sign of the corresponding coefficients. After traversing all local polytopes covered by the domain, we check their agreement on the monotonicity direction. Since a ReLU NN produces a continuous function, if the local models are all monotonically increasing or all monotonically decreasing, the network is monotonic on the checked domain. If there is a disagreement in the direction, the network is not monotonic. The verification algorithm based on polytope traversing not only provides us the final monotonicity result but also tells us in which part of the domain monotonicity is violated.
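Per polytope, the check reduces to reading off the sign of one coefficient of the local linear model (6); the network is monotonic on the traversed region iff the signs agree across all polytopes. A sketch (assuming numpy; the two toy one-hidden-layer networks realize f(x) = x0 and f(x) = |x0|, monotonic and non-monotonic respectively):

```python
import numpy as np

def monotone_in_feature(codes, W, V, j):
    """Check monotonicity w.r.t. feature j across the given polytope codes:
    the sign of coefficient j of each local linear model (6) must agree."""
    signs = []
    for q in codes:
        A = V @ (np.asarray(q)[:, None] * W)   # local weight matrix V @ W_masked
        signs.append(np.sign(A[0, j]))
    return all(s >= 0 for s in signs) or all(s <= 0 for s in signs)

# hidden layer computes (relu(x0), relu(-x0)); codes (1,0) and (0,1)
# cover the two half-planes x0 > 0 and x0 < 0
W2 = np.array([[1.0, 0.0], [-1.0, 0.0]])
V_inc = np.array([[1.0, -1.0]])   # output relu(x0) - relu(-x0) = x0
V_abs = np.array([[1.0, 1.0]])    # output relu(x0) + relu(-x0) = |x0|

assert monotone_in_feature([(1, 0), (0, 1)], W2, V_inc, 0)
assert not monotone_in_feature([(1, 0), (0, 1)], W2, V_abs, 0)
```

A disagreeing polytope, like the left half-plane in the `V_abs` example, is exactly the kind of violating region the verification algorithm reports.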
Figure 3.(c) demonstrates local monotonicity verification in the case of regression with binary response. The ReLU NN is the same as in Figure 1.(b), which predicts the probability of a sample belonging to class 1. The predictions across the whole domain are shown as the heat map. We check if the model is monotonically increasing w.r.t. the feature on the horizontal axis. The domain to check is bounded by the black box. Among the 5 polytopes overlapping with the domain, one of them violates the monotonically increasing condition and is marked in red.
5.4 Comparison with algorithms based on mixed-integer programming
The three applications above have traditionally been solved using MIP Anderson et al. (2020); Fischetti and Jo (2017); Liu et al. (2020); Tjeng, Xiao, and Tedrake (2018); Weng et al. (2018). Our algorithms based on polytope traversing have several advantages. First, our method exploits the topological structure created by ReLU NNs and fully explains the model behavior in small neighborhoods. For the $2^n$ cases created by a ReLU NN with $n$ neurons, MIP eliminates search branches using branch-and-bound. Our method, on the other hand, eliminates search branches by checking the feasibility of the local polytopes and their adjacency. Since a small traversing region often covers a limited number of polytopes, our algorithm has a short running time when solving local problems.
Second, since our algorithm explicitly identifies and visits all the polytopes, the final results contain not only the optimal solution but also a complete picture of the model behavior, providing explainability to a model often regarded as a black box.
Third, our method requires only linear and convex programming solvers and no MIP solvers. Identifying adjacent polytopes requires only linear programming. Convex programming may be used to solve the subproblem within a local polytope. Our algorithm allows us to incorporate any convex programming solver that is most suitable for the subproblem, providing much freedom to customize.
Last, and probably most important, our algorithm is highly versatile and flexible. Within each local polytope the model is linear, which is often the simplest type of model to work with. Any analysis that one runs on a linear model can be transplanted here and wrapped inside the polytope traversing algorithm. Our algorithm therefore provides a unified framework for verifying different properties of piecewise linear networks.
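As a concrete illustration of the linearity within each polytope, the sketch below extracts the activation pattern (the polytope code) and the induced local linear model from a toy one-hidden-layer ReLU network with arbitrary weights, and confirms that the linear piece reproduces the network output; all names and weights here are our own illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# a toy one-hidden-layer ReLU network: 2 inputs, 4 hidden neurons, scalar output
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)
w2, b2 = rng.normal(size=4), rng.normal()

def relu_net(x):
    return w2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def local_linear_model(x):
    """Activation pattern (polytope code) and the linear piece active at x."""
    s = (W1 @ x + b1 > 0).astype(float)   # on/off code of the local polytope
    w = (w2 * s) @ W1                      # effective linear weights
    b = (w2 * s) @ b1 + b2                 # effective intercept
    return s, w, b

x = np.array([0.3, -0.7])
s, w, b = local_linear_model(x)
# inside the polytope, the network and its linear piece coincide
assert np.isclose(relu_net(x), w @ x + b)
```

Any linear-model analysis (coefficient signs, projections, distance to a linear boundary) can then be applied to `(w, b)` within the polytope encoded by `s`.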
6 Conclusion
We explored the unique topological structure that ReLU NNs create in the input space; identified the adjacency among the partitioned local polytopes; developed a traversing algorithm based on this adjacency; and proved the thoroughness of polytope traversing. Our polytope traversing algorithm could be extended to other piecewise linear networks such as those containing convolutional or max-pooling layers.
References
 Anderson et al. (2020) Anderson, R.; Huchette, J.; Ma, W.; Tjandraatmadja, C.; and Vielma, J. P. 2020. Strong mixed-integer programming formulations for trained neural networks. Mathematical Programming, 1–37.
 Arora et al. (2018) Arora, R.; Basu, A.; Mianjy, P.; and Mukherjee, A. 2018. Understanding Deep Neural Networks with Rectified Linear Units. In International Conference on Learning Representations.
 Athalye et al. (2018) Athalye, A.; Engstrom, L.; Ilyas, A.; and Kwok, K. 2018. Synthesizing robust adversarial examples. In International Conference on Machine Learning, 284–293. PMLR.
 Bastani et al. (2016) Bastani, O.; Ioannou, Y.; Lampropoulos, L.; Vytiniotis, D.; Nori, A.; and Criminisi, A. 2016. Measuring neural net robustness with constraints. Advances in Neural Information Processing Systems, 29: 2613–2621.
 Bunel et al. (2018) Bunel, R.; Turkaslan, I.; Torr, P. H.; Kohli, P.; and Kumar, M. P. 2018. A unified view of piecewise linear neural network verification. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, 4795–4804.
 Carlini and Wagner (2017) Carlini, N.; and Wagner, D. 2017. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), 39–57. IEEE.
 Chu et al. (2018) Chu, L.; Hu, X.; Hu, J.; Wang, L.; and Pei, J. 2018. Exact and consistent interpretation for piecewise linear neural networks: A closed form solution. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 1244–1253.
 Daniels and Velikova (2010) Daniels, H.; and Velikova, M. 2010. Monotone and partially monotone neural networks. IEEE Transactions on Neural Networks, 21(6): 906–917.
 Ehlers (2017) Ehlers, R. 2017. Formal verification of piecewise linear feedforward neural networks. In International Symposium on Automated Technology for Verification and Analysis, 269–286. Springer.
 Fischetti and Jo (2017) Fischetti, M.; and Jo, J. 2017. Deep neural networks as 0-1 mixed integer linear programs: A feasibility study. arXiv preprint arXiv:1712.06174.
 Glorot, Bordes, and Bengio (2011) Glorot, X.; Bordes, A.; and Bengio, Y. 2011. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 315–323. JMLR Workshop and Conference Proceedings.
 Goodfellow, Shlens, and Szegedy (2014) Goodfellow, I. J.; Shlens, J.; and Szegedy, C. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
 Gopinath et al. (2019) Gopinath, D.; Converse, H.; Pasareanu, C.; and Taly, A. 2019. Property inference for deep neural networks. In 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), 797–809. IEEE.
 Gupta et al. (2019) Gupta, A.; Shukla, N.; Marla, L.; Kolbeinsson, A.; and Yellepeddi, K. 2019. How to Incorporate Monotonicity in Deep Networks While Preserving Flexibility? arXiv preprint arXiv:1909.10662.
 Hanin and Rolnick (2019) Hanin, B.; and Rolnick, D. 2019. Deep ReLU Networks Have Surprisingly Few Activation Patterns. Advances in Neural Information Processing Systems, 32: 361–370.
 Katz et al. (2017) Katz, G.; Barrett, C.; Dill, D. L.; Julian, K.; and Kochenderfer, M. J. 2017. Reluplex: An efficient SMT solver for verifying deep neural networks. In International Conference on Computer Aided Verification, 97–117. Springer.
 Liu et al. (2019) Liu, C.; Arnon, T.; Lazarus, C.; Strong, C.; Barrett, C.; and Kochenderfer, M. J. 2019. Algorithms for verifying deep neural networks. arXiv preprint arXiv:1903.06758.
 Liu et al. (2020) Liu, X.; Han, X.; Zhang, N.; and Liu, Q. 2020. Certified monotonic neural networks. arXiv preprint arXiv:2011.10219.
 Lu et al. (2017) Lu, Z.; Pu, H.; Wang, F.; Hu, Z.; and Wang, L. 2017. The expressive power of neural networks: A view from the width. In Proceedings of the 31st International Conference on Neural Information Processing Systems, 6232–6240.
 Montufar et al. (2014) Montufar, G. F.; Pascanu, R.; Cho, K.; and Bengio, Y. 2014. On the Number of Linear Regions of Deep Neural Networks. Advances in Neural Information Processing Systems, 27: 2924–2932.
 Pulina and Tacchella (2010) Pulina, L.; and Tacchella, A. 2010. An abstraction-refinement approach to verification of artificial neural networks. In International Conference on Computer Aided Verification, 243–257. Springer.
 Pulina and Tacchella (2012) Pulina, L.; and Tacchella, A. 2012. Challenging SMT solvers to verify neural networks. AI Communications, 25(2): 117–135.
 Schmidt-Hieber (2020) Schmidt-Hieber, J. 2020. Nonparametric regression using deep neural networks with ReLU activation function. The Annals of Statistics, 48(4): 1875–1897.
 Serra, Tjandraatmadja, and Ramalingam (2018) Serra, T.; Tjandraatmadja, C.; and Ramalingam, S. 2018. Bounding and counting linear regions of deep neural networks. In International Conference on Machine Learning, 4558–4566. PMLR.
 Sharma and Wehrheim (2020) Sharma, A.; and Wehrheim, H. 2020. Testing monotonicity of machine learning models. arXiv preprint arXiv:2002.12278.
 Sudjianto et al. (2020) Sudjianto, A.; Knauth, W.; Singh, R.; Yang, Z.; and Zhang, A. 2020. Unwrapping The Black Box of Deep ReLU Networks: Interpretability, Diagnostics, and Simplification. arXiv preprint arXiv:2011.04041.
 Szegedy et al. (2014) Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; and Fergus, R. 2014. Intriguing properties of neural networks. In 2nd International Conference on Learning Representations, ICLR 2014.
 Telgen (1982) Telgen, J. 1982. Minimal representation of convex polyhedral sets. Journal of Optimization Theory and Applications, 38(1): 1–24.
 Telgen (1983) Telgen, J. 1983. Identifying redundant constraints and implicit equalities in systems of linear constraints. Management Science, 29(10): 1209–1222.
 Tjeng, Xiao, and Tedrake (2018) Tjeng, V.; Xiao, K. Y.; and Tedrake, R. 2018. Evaluating Robustness of Neural Networks with Mixed Integer Programming. In International Conference on Learning Representations.
 Weng et al. (2018) Weng, L.; Zhang, H.; Chen, H.; Song, Z.; Hsieh, C.-J.; Daniel, L.; Boning, D.; and Dhillon, I. 2018. Towards fast computation of certified robustness for ReLU networks. In International Conference on Machine Learning, 5276–5285. PMLR.
 Yang, Zhang, and Sudjianto (2020) Yang, Z.; Zhang, A.; and Sudjianto, A. 2020. Enhancing explainability of neural networks through architecture constraints. IEEE Transactions on Neural Networks and Learning Systems.
 Zhao et al. (2021) Zhao, W.; Singh, R.; Joshi, T.; Sudjianto, A.; and Nair, V. N. 2021. Self-interpretable Convolutional Neural Networks for Text Classification. arXiv preprint arXiv:2105.08589.
 Zou et al. (2020) Zou, D.; Cao, Y.; Zhou, D.; and Gu, Q. 2020. Gradient descent optimizes overparameterized deep ReLU networks. Machine Learning, 109(3): 467–492.
7 Acknowledgments
The authors would like to thank Lin Dong, Linwei Hu, Rahul Singh, and Han Wang from Wells Fargo, and Sihan Zeng from Georgia Institute of Technology for their valuable inputs and feedback on this project.
8 Appendix
8.1 Proof of Lemma 3.1
Lemma 8.1
Given a set $S = \{x : a_i^\top x \le b_i,\ i = 1, \dots, m\}$, the inequality $a_j^\top x \le b_j$ is a redundant inequality if the new set formed by flipping this inequality is empty: $\bar{S}_j = \{x : a_j^\top x \ge b_j;\ a_i^\top x \le b_i,\ i \ne j\} = \emptyset$.
Proof:
Let $S_{-j}$ be the set formed by removing inequality $j$: $S_{-j} = \{x : a_i^\top x \le b_i,\ i \ne j\}$. Then $S_{-j} = S \cup \bar{S}_j$. If $\bar{S}_j = \emptyset$, then $S_{-j} = S$ and the inequality satisfies Definition 3.1.
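The emptiness test in the lemma is itself a linear-programming feasibility problem. Below is a minimal sketch using `scipy.optimize.linprog`; the helper name and the toy system are our own illustration:

```python
import numpy as np
from scipy.optimize import linprog

def flipped_set_is_empty(A, b, j):
    """Test whether flipping inequality j of {x : A x <= b} yields an empty
    set, i.e. whether a_j^T x >= b_j is incompatible with the other rows."""
    A_f, b_f = A.copy(), b.copy()
    A_f[j], b_f[j] = -A[j], -b[j]          # flip: a_j^T x >= b_j
    res = linprog(np.zeros(A.shape[1]), A_ub=A_f, b_ub=b_f,
                  bounds=[(None, None)] * A.shape[1])
    return res.status == 2                  # status 2 = problem is infeasible

# toy 1-D system: x <= 1 together with x <= 2; the second row is redundant
A = np.array([[1.0], [1.0]])
b = np.array([1.0, 2.0])
```

Flipping the second row gives x >= 2, which contradicts x <= 1, so the flipped set is empty and the lemma declares x <= 2 redundant; flipping the first row leaves the nonempty interval [1, 2].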
8.2 Proof of Theorem 4.1
Theorem 8.2 (restating Theorem 4.1)
The polytope traversing algorithm based on BFS visits every nonempty polytope; that is, the traversing is thorough.
Proof:
Since each partitioning hyperplane divides the input space into two half-spaces, the $2^n$ activation patterns encoded by the $n$-bit codes cover the entire input space. We construct a graph with $2^n$ nodes, each representing a possible polytope code. Some of the nodes may correspond to an empty set due to conflicting inequalities. For each pair of nonempty polytopes that are one-adjacent to each other, we add an edge between their corresponding nodes. What is left to prove is that any pair of nonempty polytopes is connected.
W.l.o.g. assume two nodes with codes $c^{(1)}$ and $c^{(2)}$ that differ only in the first $k$ bits. Also assume the polytopes $P_{c^{(1)}}$ and $P_{c^{(2)}}$ are both nonempty. We will show that there must exist a nonempty polytope that is one-adjacent to $P_{c^{(1)}}$ with a code different from $c^{(1)}$ in one of the first $k$ bits. As a result, this polytope is one bit closer to $c^{(2)}$.
We prove the claim above by contradiction. Assume the claim is not true: then flipping any one of the first $k$ bits of $c^{(1)}$ yields an empty polytope. By Lemma 3.1 and Definition 3.1, the inequalities corresponding to the first $k$ bits must all be redundant, which means they can be removed from the set of constraints Telgen (1982, 1983):
(22) $P_{c^{(1)}} = \{x : \sigma_i(c^{(1)})(a_i^\top x - b_i) \le 0,\ i = k+1, \dots, n\}$,
where $\sigma_i(c) = +1$ if the $i$-th bit of $c$ is $0$ and $-1$ otherwise. Since $c^{(2)}$ agrees with $c^{(1)}$ on the last $n-k$ bits, the derived relationship in (22) implies $P_{c^{(2)}} \subseteq P_{c^{(1)}}$. Any point of $P_{c^{(2)}}$ then satisfies all constraints of $c^{(1)}$ together with the flipped first constraint, so it lies in the polytope obtained by flipping the first bit of $c^{(1)}$, which is assumed empty. We conclude that $P_{c^{(2)}} = \emptyset$, which contradicts the nonempty assumption.
Therefore, for any two nonempty polytopes $P_{c^{(1)}}$ and $P_{c^{(2)}}$, we can create a path from $c^{(1)}$ to $c^{(2)}$ by iteratively finding an intermediate polytope whose code is one bit closer to $c^{(2)}$. Since the polytope graph covers the entire input space and all nonempty polytopes are connected, BFS guarantees the thoroughness of the traversing.
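The construction in the proof can be sketched directly: BFS over the code graph, moving only between one-adjacent nonempty polytopes, reaches every nonempty code. The toy arrangement of three hyperplanes below is our own illustration; feasibility is checked with an LP as in Lemma 8.1:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# three illustrative partitioning hyperplanes a_i^T x = b_i in 2-D
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([0.0, 0.0, 1.0])

def nonempty(code):
    """Feasibility LP for the polytope of a code: bit 0 keeps
    a_i^T x <= b_i, bit 1 flips it to a_i^T x >= b_i."""
    signs = np.where(np.array(code) == 0, 1.0, -1.0)
    res = linprog(np.zeros(2), A_ub=signs[:, None] * A, b_ub=signs * b,
                  bounds=[(None, None)] * 2)
    return res.status == 0

def traverse(start):
    """BFS over codes, moving only between one-adjacent nonempty polytopes."""
    visited, queue = {start}, [start]
    while queue:
        code = queue.pop(0)
        for i in range(len(code)):
            nb = tuple(c ^ 1 if j == i else c for j, c in enumerate(code))
            if nb not in visited and nonempty(nb):
                visited.add(nb)
                queue.append(nb)
    return visited

reached = traverse((0, 0, 0))
all_nonempty = {c for c in itertools.product((0, 1), repeat=3)
                if nonempty(c)}
```

Three lines in general position partition the plane into 7 regions, so one of the 8 codes is empty; BFS reaches exactly the 7 nonempty ones, matching the brute-force enumeration.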