Neural Networks (NNs) are increasingly used in modern cyber-physical systems, where they are often employed as feedback controllers. Typically, however, these NNs are not designed analytically or explicitly, but rather trained from data in an end-to-end learning paradigm. Such data-trained controllers are popular in this context because they often perform quite well; but because they are obtained by an implicit design methodology, they do not come with crucial safety guarantees per se. Thus, algorithms that can independently verify the safety of a NN (controller) are a very active area of research, even though progress in this area has not kept pace with the momentum of NN controller adoption in safety-critical applications.
Despite the importance – and abundance – of such NN verification algorithms, relatively little attention has been paid to a theoretical analysis of their computational complexity. While it is known that the satisfiability of any 3-SAT formula can be encoded as a NN verification problem, this result requires the variables of the formula to be in correspondence with the input dimensions to the network (KatzReluplexEfficientSMT2017a, ). On the one hand, this makes it clear that the complexity of verifying a NN depends unfavorably on the dimension of its input space. On the other hand, this result doesn’t speak directly to the relative difficulty of verifying a NN with a fixed input dimension but an increasing number of neurons. The only results that hint at this second consideration are those that exhibit networks for which the number of affine regions grows exponentially in the number of neurons in the network – see e.g. (MontufarNumberLinearRegions2014, ). However, these kinds of results merely suggest that the verification problem is still “hard” in the number of neurons in the network (with input and output dimensions fixed). There are not, to our knowledge, any concrete complexity results that address this second question.
In this paper, we prove two such concrete complexity results that explicitly describe the computational complexity of verifying a NN as a function of its size. In particular, we consider two specific NN architectures – shallow NNs and Two-Level Lattice NNs (FerlezAReNAssuredReLU2020, ) – and prove that the complexity of verifying either type of NN grows only polynomially in the number of neurons in the network to be verified, all other aspects of the verification problem held fixed. These results appear in Section 3 as Theorem 1 and Theorem 2 for shallow NNs and TLL NNs, respectively. Our proofs for both of these complexity results are existential: that is, we propose two concrete verification algorithms, one for shallow NNs and one for TLL NNs. However, we note that although our proposed algorithms do have polynomial time complexities, the constants and exponents prevent them from being amenable to real-world problems; they are merely proof devices to establish categories of NNs for which polynomial verification is possible (if not practical).
Despite some network specific differences, the algorithms we propose for shallow NNs and TLL NNs depend on the following common observations.
The parameters of both architectures readily expose an arrangement of hyperplanes that partitions the input space into affine regions of the NN to be verified. Consequently, the regions of this arrangement constitute a partitioning of the original verification problem into a number of sub-verification problems, each of which can be solved in polynomial time directly with Linear Programs (LPs). Moreover, the number of hyperplanes in these arrangements can be chosen to depend polynomially on the size of the NN to be verified; thus, since the number of regions in such an arrangement further depends polynomially on the number of hyperplanes, this partitioning results in polynomially many sub-verification problems. Thus, for both shallow NNs and TLL NNs, it is possible to replace the original verification problem with polynomially many verification problems over affine regions of the original NN, each of which may in turn be verified in polynomial time.
Given the above, a polynomial verification algorithm is obtained by exhaustively verifying the (polynomially many) sub-problems, provided that the regions in the associated hyperplane arrangement can be enumerated in polynomial time. Thus, we also introduce a novel algorithm that does exactly that. Our algorithm is based on a long-known poset for the regions in a hyperplane arrangement (EdelmanPartialOrderRegions1984, ), but we show that successors in this poset can be obtained in polynomial time. Thus, traversing the (poset of) regions in such an arrangement is itself a polynomial operation.
Finally, we note that our results should be viewed in the context of three observations. First, by their mere existence, the complexity results we prove herein demonstrate that the NN verification problem is not per se a “hard” problem as a function of the size of the NN to be verified. Second, although our results show that the complexity of verifying a shallow or a TLL NN scales polynomially with its size, all other aspects of the problem held constant, our complexity claims do scale exponentially in the dimension of the input to the NN. Thus, our results do not contradict the known “hardness” of the verification problem as a function of the NN’s input dimension – i.e. the results in (KatzReluplexEfficientSMT2017a, ). Finally, while our results do speak directly to the complexity of the verification problem as a function of the number of neurons, they do not speak as directly to the complexity of the verification problem as a function of the expressivity of a particular network size. We consider this aspect in more detail in Section 3.
Related work: Most of the work on NN verification has focused on obtaining practical algorithms rather than theoretical complexity results, although many have noticed empirically that there is a significant complexity associated with the input dimension; (KatzReluplexEfficientSMT2017a, ) is a notable exception, since it also included an NP-completeness result based on the 3-SAT encoding mentioned above. Other examples of pragmatic NN verification approaches include: (i) SMT-based methods (katz2019marabou, ; KatzReluplexEfficientSMT2017a, ; ehlers2017formal, ); (ii) MILP-based solvers (lomuscio2017approach, ; tjeng2017evaluating, ; bastani2016measuring, ; bunel2020branch, ; fischetti2018deep, ; anderson2020strong, ; cheng2017maximum, ); (iii) reachability-based methods (xiang2017reachable, ; xiang2018output, ; gehr2018ai2, ; wang2018formal, ; tran2020nnv, ; ivanov2019verisig, ; fazlyab2019efficient, ); and (iv) convex relaxation methods (wang2018efficient, ; dvijotham2018dual, ; wong2017provable, ). By contrast, a number of works have focused on the computational complexity of various other verification-related questions for NNs ((KatzReluplexEfficientSMT2017a, ) is the exception in that it expressly considers the verification problem itself). Some NN-related complexity results include: computing the minimum adversarial disturbance to a NN is NP-hard (WengFastComputationCertified2018, ); computing the Lipschitz constant of a NN is NP-hard (VirmauxLipschitzRegularityDeep2018, ); reachability analysis is NP-hard (RuanReachabilityAnalysisDeep2018a, ).
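2. Preliminaries
2.1. Notation
Throughout this paper, $\mathbb{R}$ will refer to the real numbers. For an $(n \times m)$ matrix (or vector), $A$, we will use the notation $[A]_{[i,j]}$ to denote the element in the $i^{\text{th}}$ row and $j^{\text{th}}$ column of $A$. Analogously, the notation $[A]_{[i,:]}$ will denote the $i^{\text{th}}$ row of $A$, and $[A]_{[:,j]}$ will denote the $j^{\text{th}}$ column of $A$; when $A$ is a vector instead of a matrix, both notations will return a scalar equal to the corresponding element in the vector. Let $\mathbf{0}_{n,m}$ be an $(n \times m)$ matrix of zeros. We will use special bold parentheses $\llparenthesis \cdot \rrparenthesis$ to delineate the arguments to a function that returns a function. For example, given an $(m \times n)$ matrix, $W$, (possibly with $m = 1$) and an $(m \times 1)$ dimensional vector, $b$, we define the linear function:
$$\mathscr{L}\llparenthesis W, b \rrparenthesis : x \mapsto W x + b$$
(that is, $\mathscr{L}\llparenthesis W, b \rrparenthesis$ is itself a function). We also use the functions $\mathtt{First}$ and $\mathtt{Last}$ to return the first and last elements of an ordered list (or by overloading, a vector in $\mathbb{R}^n$). The function $\mathtt{Concat}$ concatenates two ordered lists, or by overloading, concatenates two vectors in $\mathbb{R}^n$ and $\mathbb{R}^m$ along their (common) nontrivial dimension to get a third vector in $\mathbb{R}^{n+m}$. We will use an overbar to indicate (topological) closure of a set: i.e. $\overline{A}$ is the closure of $A$. Finally, $B(x; \delta)$ will denote an open Euclidean ball centered at $x$ with radius $\delta$.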
2.2. Neural Networks
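In this paper, we will exclusively consider Rectified Linear Unit Neural Networks (ReLU NNs). A $K$-layer ReLU NN is specified by composing $K$ layer functions (or just layers). We allow two kinds of layers: linear layers and nonlinear layers. A nonlinear layer with $\mathfrak{i}$ inputs and $\mathfrak{o}$ outputs is specified by a $(\mathfrak{o} \times \mathfrak{i})$ real-valued matrix of weights, $W$, and a $(\mathfrak{o} \times 1)$ real-valued matrix of biases, $b$, as follows:
$$L_{\theta} : \mathbb{R}^{\mathfrak{i}} \to \mathbb{R}^{\mathfrak{o}}, \qquad L_{\theta} : z \mapsto \max\{ W z + b, 0 \}$$
where the $\max$ function is taken element-wise, and $\theta \triangleq (W, b)$ for brevity. A linear layer is the same as a nonlinear layer, except for the omission of the nonlinearity $\max\{\cdot, 0\}$ in the layer function; a linear layer will be indicated with a superscript “lin” as in $L^{\text{lin}}_{\theta}$.
Thus, a $K$-layer ReLU NN function as above is specified by $K$ layer functions $\{L_{\theta^{(i)}} : i = 1, \dots, K\}$ whose input and output dimensions are composable: that is, they satisfy $\mathfrak{i}_{i} = \mathfrak{o}_{i-1}$ for $i = 2, \dots, K$. We further adopt the convention that the final layer is always a linear layer; however, other layers are allowed to be linear as desired. Specifically:
$$NN(x) = \big( L^{\text{lin}}_{\theta^{(K)}} \circ L_{\theta^{(K-1)}} \circ \dots \circ L_{\theta^{(1)}} \big)(x).$$
When we wish to make the dependence on parameters explicit, we will index a ReLU function $NN$ by a list of matrices $\Theta \triangleq (\theta^{(1)}, \dots, \theta^{(K)})$ (that is, $\Theta$ is not the concatenation of the $\theta^{(i)}$ into a single large matrix, so it preserves information about the sizes of the constituent $\theta^{(i)}$); in this respect, we will often abuse notation slightly, and use $NN_{\Theta}$ and $\Theta$ interchangeably when no confusion will result.
Note that specifying the number of layers and the dimensions of the associated matrices $\theta^{(i)} = (W^{(i)}, b^{(i)})$ specifies the architecture of the ReLU NN. Therefore, we will use:
$$\text{Arch}(\Theta) \triangleq \big( (n, \mathfrak{o}_{1}), (\mathfrak{i}_{2}, \mathfrak{o}_{2}), \dots, (\mathfrak{i}_{K}, m) \big)$$
to denote the architecture of the ReLU NN $NN_{\Theta}$. Note that our definition is quite general since it allows the layers to be of different sizes, as long as $\mathfrak{o}_{i-1} = \mathfrak{i}_{i}$ for $i = 2, \dots, K$.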
Definition 1 (Shallow NN).
A shallow NN is a NN with two layers, the first of which is nonlinear and the second of which is linear.
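To make the layer convention concrete, the following is a minimal Python sketch of evaluating a ReLU NN whose final layer is linear, applied to a small shallow NN. It assumes, for simplicity, that every layer except the last is nonlinear (the text above also permits intermediate linear layers); the names `affine` and `relu_nn` and the weights in `shallow` are illustrative and do not come from the paper.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (weights W, biases b)


def affine(W: Matrix, b: Vector, z: Sequence[float]) -> Vector:
    """Return W z + b, where W is given as a list of rows."""
    return [sum(w * x for w, x in zip(row, z)) + bias for row, bias in zip(W, b)]


def relu_nn(params: Sequence[Layer], x: Sequence[float]) -> Vector:
    """Evaluate a ReLU NN: all layers but the last apply max{., 0} element-wise."""
    z = list(x)
    for W, b in params[:-1]:
        z = [max(v, 0.0) for v in affine(W, b, z)]
    W, b = params[-1]  # the final layer is linear by convention
    return affine(W, b, z)


# Hypothetical shallow NN: 2 inputs, 3 hidden neurons, 1 output.
shallow = [
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, -1.0]),  # nonlinear layer
    ([[1.0, 1.0, -2.0]], [0.5]),                               # linear layer
]

print(relu_nn(shallow, [1.0, 2.0]))
print(relu_nn(shallow, [-1.0, -1.0]))
```

The parameter list is kept as a list of per-layer pairs rather than one large matrix, which mirrors the remark above that the list preserves the sizes of its constituent layers.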
2.3. Special NN Operations
Definition 2 (Sequential (Functional) Composition).
Let and be two NNs where and for some nonnegative integers , and . Then the sequential (or functional) composition of and , i.e. , is a well defined NN, and can be represented by the parameter list .
Definition 3 (Parallel Composition).
Let and be two -layer NNs with parameter lists:
Then the parallel composition of and is a NN given by the parameter list
That is accepts an input of the same size as (both) and , but has as many outputs as and combined.
Definition 4 ($N$-element $\min$ / $\max$ NNs).
An $N$-element $\min$ network is denoted by the parameter list $\Theta_{\min_N}$. It implements a function $\mathbb{R}^{N} \to \mathbb{R}$ whose output is the minimum from among the components of its input (i.e. minimum according to the usual order relation $<$ on $\mathbb{R}$). An $N$-element $\max$ network is denoted by $\Theta_{\max_N}$, and functions analogously. These networks are defined rigorously in Appendix 7.
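As an illustration of why such networks exist, the sketch below realizes a two-input minimum with four ReLU units followed by one linear output, and then reduces over a list to obtain an element-wise minimum or maximum. This is one standard identity, offered under the assumption that it is acceptable as an illustration; it is not necessarily the construction used in the paper's appendix, and the names `min2`, `max2`, `min_n` and `max_n` are hypothetical.

```python
from functools import reduce
from typing import Sequence


def relu(v: float) -> float:
    return max(v, 0.0)


def min2(a: float, b: float) -> float:
    """min(a, b) from four ReLU units and a linear output layer.

    Uses a + b = relu(a + b) - relu(-a - b) and |a - b| = relu(a - b) + relu(b - a),
    together with min(a, b) = ((a + b) - |a - b|) / 2.
    """
    return 0.5 * (relu(a + b) - relu(-a - b) - relu(a - b) - relu(b - a))


def max2(a: float, b: float) -> float:
    """max(a, b) obtained from min2 by negating inputs and output."""
    return -min2(-a, -b)


def min_n(x: Sequence[float]) -> float:
    """Minimum of a non-empty list by repeated pairwise minima."""
    return reduce(min2, x)


def max_n(x: Sequence[float]) -> float:
    """Maximum of a non-empty list by repeated pairwise maxima."""
    return reduce(max2, x)


print(min_n([4.0, -2.0, 7.0]), max_n([4.0, -2.0, 7.0]))
```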
2.4. Two-Level-Lattice (TLL) Neural Networks
In this paper, we will be especially concerned with ReLU NNs that have a particular architecture: the Two-Level Lattice (TLL) architecture introduced as part of the AReN algorithm in (FerlezAReNAssuredReLU2020, ). We describe the TLL architecture in two separate subsections, one for scalar output TLL NNs, and one for multi-output TLL NNs.
2.4.1. Scalar TLL NNs
From (FerlezAReNAssuredReLU2020, ), a scalar-output TLL NN can be described as follows.
Definition 5 (Scalar TLL NN (FerlezAReNAssuredReLU2020, , Theorem 2)).
A NN that maps $\mathbb{R}^{n} \to \mathbb{R}$ is said to be a TLL NN of size $(N, M)$ if the size of its parameter list can be characterized entirely by integers $N$ and $M$ as follows.
In the above,
and each , has the form:
for some sequence (recall that is the identity matrix).
The linear functions implemented by the mapping for will be referred to as the local linear functions of ; we assume for simplicity that these linear functions are unique. The matrices will be referred to as the selector matrices of . Each set will be called the selector set of .
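The roles of the local linear functions and selector sets can be illustrated with a short Python sketch. It assumes the usual two-level lattice reading of the architecture, namely that a scalar TLL NN outputs the maximum, over selector sets, of the minimum of the selected local linear functions; the defining equations are not legible in the text above, so this form, along with the names `tll_scalar`, `local_fns` and `selector_sets`, should be read as an assumption rather than as the paper's exact parameterization.

```python
from typing import List, Sequence, Set, Tuple

LocalFn = Tuple[Sequence[float], float]  # (w, b) encodes x -> w . x + b


def tll_scalar(local_fns: Sequence[LocalFn],
               selector_sets: Sequence[Set[int]],
               x: Sequence[float]) -> float:
    """Evaluate max over selector sets of min over the selected local linear functions."""
    values: List[float] = [
        sum(wi * xi for wi, xi in zip(w, x)) + b for w, b in local_fns
    ]
    return max(min(values[i] for i in s) for s in selector_sets)


# Hypothetical example in one input dimension: N = 3 local linear functions
# x, -x and the constant 1, with M = 2 selector sets. The result is min(|x|, 1).
local_fns = [([1.0], 0.0), ([-1.0], 0.0), ([0.0], 1.0)]
selector_sets = [{0, 2}, {1, 2}]

for x in (0.5, -3.0, 0.0):
    print(x, tll_scalar(local_fns, selector_sets, [x]))
```

In this example the constant function appears in both selector sets but is stored only once, which is the kind of parameter reuse discussed later in the paper.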
2.4.2. Multi-output TLL NNs
For the purposes of this paper, we will define a multi-output TLL NN with range space using equally sized scalar TLL NNs. This is for two reasons. First, it will make the eventual computational complexity expressions for our algorithms more compact. Second, it will make straightforward the connection to assured architecture designs, such as the one in (FerlezAReNAssuredReLU2020, ): if local linear functions are needed in a NN with real-valued outputs, then an architecture with component TLL NNs, each with its own local linear functions, will have enough local linear functions to meet the assurance (a consequence of (FerlezAReNAssuredReLU2020, , Theorem 3)). We will likewise assume that all of the component TLLs have a common number of selector matrices.
Definition 6 (Multi-output TLL NN).
A ReLU NN that maps is said to be an -output TLL NN of size if its parameters are the parallel composition of scalar-output TLL NNs, each of size . That is:
The subnetworks will be referred to as the component (TLL) networks of .
2.5. Hyperplanes and Hyperplane Arrangements
In this section, we review mostly notation for hyperplanes and hyperplane arrangements; (StanleyIntroductionHyperplaneArrangements, ) is the primary reference for this section, although we don’t need any of the sophisticated theorems therein.
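Definition 7 (Hyperplanes and Half-spaces).
Let $\ell : \mathbb{R}^{n} \to \mathbb{R}$ be an affine map. Then define:
$$H^{q}_{\ell} \triangleq \begin{cases} \{x : \ell(x) < 0\} & q = -1 \\ \{x : \ell(x) > 0\} & q = +1 \\ \{x : \ell(x) = 0\} & q = 0. \end{cases}$$
We say that $H^{0}_{\ell}$ is the hyperplane defined by $\ell$ in dimension $n$, and $H^{-1}_{\ell}$ and $H^{+1}_{\ell}$ are the negative and positive half-spaces defined by $\ell$, respectively.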
Definition 8 (Normal Vector to a Hyperplane).
Let be a hyperplane. Then let be the unit normal vector to such that for any , .
Definition 9 (Rank of a set of Hyperplanes).
Let be a set of hyperplanes with associated affine functions . Then we define:
Definition 10 (Hyperplane Arrangement).
Let be a set of affine functions for which each is a function . Then is said to be an arrangement of hyperplanes in dimension .
Definition 11 (Region of a Hyperplane Arrangement).
Let be an arrangement of hyperplanes in dimension defined by corresponding set of affine functions, . Then a non-empty open subset is said to be an -dimensional region of if there is an indexing function such that and
When the dimension of a region is omitted, it will be assumed to be the same as the dimension of the arrangement. The set of all regions of an arrangement will be denoted by .
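The indexing function of a region can be read off from any point inside it. The Python sketch below computes the sign of each affine function at a point and reports whether the point lies in a full-dimensional region, i.e. strictly off every hyperplane. The encoding of an affine function as a pair `(w, b)` and the names `sign_vector` and `in_region` are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

AffineFn = Tuple[Sequence[float], float]  # (w, b) encodes x -> w . x + b


def sign_vector(affine_fns: Sequence[AffineFn], x: Sequence[float]) -> Tuple[int, ...]:
    """Return +1, -1 or 0 for each affine function, according to its sign at x."""
    signs: List[int] = []
    for w, b in affine_fns:
        value = sum(wi * xi for wi, xi in zip(w, x)) + b
        signs.append((value > 0) - (value < 0))
    return tuple(signs)


def in_region(affine_fns: Sequence[AffineFn], x: Sequence[float]) -> bool:
    """A point lies in a full-dimensional region iff no affine function vanishes there."""
    return all(s != 0 for s in sign_vector(affine_fns, x))


# Hypothetical arrangement in the plane: the lines x1 = 0, x2 = 0 and x1 + x2 = 1.
arrangement = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([1.0, 1.0], -1.0)]

print(sign_vector(arrangement, [2.0, 2.0]))
print(sign_vector(arrangement, [0.25, 0.25]))
print(in_region(arrangement, [0.0, 5.0]))
```

Two points share a region exactly when they produce the same all-nonzero sign vector, which is why a binary encoding of the half-spaces suffices to name a region.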
Definition 12 (Face).
Let be an arrangement of hyperplanes in dimension defined by affine functions , and let be an -dimensional region of that is specified by the indexing function . Then a closed set is an dimensional face of if is the closure of an -dimensional region contained in . That is there exists an indexing function such that
for all ; and
Note that for an -dimensional region , is an -dimensional face of . -dimensional faces are called vertices.
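Theorem 13.
Let $\mathscr{H}$ be an arrangement of $N$ hyperplanes in dimension $n$. Then the number of regions in $\mathscr{H}$ is at most $\sum_{k=0}^{n} \binom{N}{k}$.
Remark 1.
Note that for a fixed dimension, $n$, the bound $\sum_{k=0}^{n} \binom{N}{k}$ grows like $O(N^{n}/n!)$, i.e. sub-exponentially.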
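The gap between the number of regions and the number of candidate sign patterns is easy to tabulate. The Python sketch below assumes the region bound is the standard one for hyperplane arrangements, namely the sum of binomial coefficients $\binom{N}{k}$ for $k = 0, \dots, n$, and compares it with the $2^{N}$ possible assignments of half-spaces; the function name `region_bound` is illustrative.

```python
from math import comb


def region_bound(num_hyperplanes: int, dim: int) -> int:
    """Upper bound on the number of regions of N hyperplanes in dimension n."""
    return sum(comb(num_hyperplanes, k) for k in range(dim + 1))


for N in (3, 10, 20):
    # Compare the bound in dimension 2 with the number of sign patterns.
    print(N, region_bound(N, 2), 2 ** N)
```

For a fixed dimension the bound is a polynomial in the number of hyperplanes, whereas the number of sign patterns doubles with every added hyperplane.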
3. Main Results
3.1. NN Verification Problem
In this paper, we take as a starting point the following verification problem, which we refer to as the verification problem for NNs.
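Problem 1.
Let $NN : \mathbb{R}^{n} \to \mathbb{R}^{m}$ be a NN with at least two layers. Furthermore, assume that there are two convex, bounded, full-dimensional polytopes $P_X \subset \mathbb{R}^{n}$ and $P_Y \subset \mathbb{R}^{m}$ with H-representations (an H-representation of a polytope is a (possibly non-minimal) representation of that polytope as an intersection of half-spaces) given as follows:
$$P_X \triangleq \bigcap_{i=1}^{N_X} \{ x \in \mathbb{R}^{n} : \ell_{X,i}(x) \le 0 \}$$
where $\ell_{X,i} : \mathbb{R}^{n} \to \mathbb{R}$ is an affine map for each $i = 1, \dots, N_X$; and
$$P_Y \triangleq \bigcap_{i=1}^{N_Y} \{ y \in \mathbb{R}^{m} : \ell_{Y,i}(y) \le 0 \}$$
where $\ell_{Y,i} : \mathbb{R}^{m} \to \mathbb{R}$ is an affine map for each $i = 1, \dots, N_Y$.
Then the verification problem is to decide whether the following formula is true:
$$\forall x \in P_X . \big( NN(x) \in P_Y \big). \tag{16}$$
If (16) is true, then we say that the problem is SAT; otherwise, we say that the problem is UNSAT.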
3.2. Main Theorems
As mentioned above, the main results of this paper demonstrate that Problem 1 can be solved in polynomial time complexity in the number of neurons for two classes of networks. In particular, the following two Theorems are the main results of this paper.
Theorem 1.
Let be a shallow network with neurons. Now consider an instance of Problem 1 for this network: i.e. fixed dimensions and , and fixed constraint sets and . Then there is an algorithm that solves this instance of Problem 1 in polynomial time complexity in . This algorithm has a worst case runtime of
where and is the complexity of solving a linear program in dimension with constraints.
Theorem 2.
Let be a multi-output TLL network. Now consider an instance of Problem 1 for this network: i.e. fixed dimensions and , and fixed constraint sets and . Then there is an algorithm that solves this instance of Problem 1 in polynomial time complexity in and . This algorithm has a worst case runtime of
where and is the complexity of solving a linear program in dimension with constraints. Consequently, this algorithm is polynomial in the number of neurons in , since the number of neurons depends polynomially on and (see Definition 6).
In particular, Theorem 1 and Theorem 2 explicitly indicate that the difficulty in verifying their respective classes of NNs grows only polynomially in the complexity of the network, all other parameters of Problem 1 held fixed. Note also that the polynomial complexity of these algorithms depends on the existence of polynomial-time solvers for linear programs, but such solvers are well known to exist (see e.g. (NemirovskiInteriorpointMethodsOptimization2008, )).
It is important to note that Theorem 1 and Theorem 2 both identify algorithms that are exponential in the input dimension to the network. Thus, these results do not contradict the 3-SAT embedding of (KatzReluplexEfficientSMT2017a, ). Indeed, given that a TLL NN can represent any CPWA function (FerlezAReNAssuredReLU2020, ) – including the 3-SAT gadgets used in (KatzReluplexEfficientSMT2017a, ) – it follows directly that the satisfiability of any 3-SAT formula can be encoded as an instance of Problem 1 for a TLL NN. Since the input dimensions of both networks are the same, the conclusion of (KatzReluplexEfficientSMT2017a, ) is preserved, even though the TLL construction may use a different number of neurons.
Finally, it is essential to note that the results in Theorem 1 and Theorem 2 connect the difficulty of verifying a TLL NN (resp. shallow NN) to the size of the network, not the expressivity of the network. The semantics of the TLL NN in particular make this point especially salient, since each distinct affine function represented in the output of a TLL NN can be mapped directly to parameters of the TLL NN itself (see Proposition 1 in Section 6). In particular, consider the deep NNs exhibited in (MontufarNumberLinearRegions2014, , Corollary 6): this parameterized collection of networks expresses a number of unique affine functions that grows exponentially in the number of neurons in the network (i.e. as a function of the number of layers in the network). Consequently, the size of a TLL required to implement one such network would likewise grow exponentially in the number of neurons deployed in the original network. Thus, although a TLL NN may superficially seem “easy” to verify because of Theorem 2, the efficiency in verifying a TLL NN form could mask the fact that a particular TLL NN implementation is less parameter efficient than some other representation (in terms of neurons required). Ultimately, this trade-off will not necessarily be universal, though, since TLL NNs also have mechanisms for parametric efficiency: for example, a particular local linear function need only be implemented once in a TLL NN, no matter how many disjoint regions on which it is activated (this could be especially useful in the case of networks implementing interpolated zero-order-hold functions, such as in (FerlezTwoLevelLatticeNeural2020, )).
3.3. Proof Sketch of Main Theorems
3.3.1. Core Lemma: Polynomial-time Enumeration of Hyperplane Regions
The results in this paper all have the same broad proof structure:
(1) Choose a hyperplane arrangement with the following three properties:
(a) The number of hyperplanes is polynomial in the number of network neurons;
(b) The input constraint set is the union of the closures of regions from this arrangement; and
(2) Iterate over all of the regions in this arrangement, and for each region, solve Problem 1 with the input constraint set replaced by the closure of the current region.
The details of Step 1 will vary depending on the architecture of the network being verified (Theorem 1 and Theorem 2). However, no matter the details of Step 1, this proof structure depends on a polynomial time algorithm to traverse the regions in a hyperplane arrangement. Thus, the following Lemma establishes the complexity of just such a polynomial-time algorithm.
Lemma 3.
Let be a set of affine functions, , that can be accessed in time, and let be the associated hyperplane arrangement.
Then there is an algorithm to traverse all of the regions in that has runtime
where is the complexity of solving a linear program in dimension with constraints.
It is crucial to note that there is more to Lemma 3 than just the sub-exponential bound on the number of regions in a hyperplane arrangement (see Theorem 13 in Section 2). Indeed, although there are only $O(N^{n}/n!)$ regions in an arrangement of $N$ hyperplanes in dimension $n$, it must be inferred which of the $2^{N}$ possible hyperplane activations correspond to valid regions. That this is possible in polynomial time is the main contribution of Lemma 3, and hence the main facilitator of the other results in this paper.
In both cases, we note that the easiest closed convex polytope on which to solve Problem 1 is one on which the underlying NN is affine. Indeed, suppose for the moment that $NN$ is affine on the entire input constraint set $P_X$, with $NN(x) = W x + b$ on this domain. Under this assumption, solving the verification problem for a single output constraint, $\ell_{Y,i}$ (i.e. the requirement that $\ell_{Y,i}(NN(x)) \le 0$), entails solving the following linear program:
$$\zeta_i \triangleq \max_{x \in P_X} \; \ell_{Y,i}(W x + b).$$
Of course if $\zeta_i > 0$, then Problem 1 is UNSAT under these assumptions; otherwise it is SAT for the constraint $\ell_{Y,i}$ and the next output constraint needs to be considered. Given the known (polynomial) efficiency of solving linear programs, it thus makes sense to select a hyperplane arrangement for Step 1 with the property that the NN is affine on each region of the arrangement. Although this is a difficult problem for a general NN, the particular structure of shallow NNs and TLL NNs allows just such a selection to be accomplished efficiently.
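The per-region check can be sketched in Python under some loudly stated assumptions. The paper solves a linear program over an H-representation; to keep the example dependency-free, the sketch below instead assumes the bounded polytope is given by its vertices and evaluates each output constraint at those vertices, which yields the same maximum because an affine function on a bounded polytope attains its maximum at a vertex. This swap is for illustration only (vertex lists can be exponentially large), and the encoding of an output constraint as `(c, d)`, meaning $c \cdot y + d \le 0$, is likewise an assumption.

```python
from typing import Sequence, Tuple

Vector = Sequence[float]
Constraint = Tuple[Vector, float]  # (c, d) encodes the requirement c . y + d <= 0


def affine(W: Sequence[Vector], b: Vector, x: Vector) -> list:
    """Return W x + b, where W is given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bias for row, bias in zip(W, b)]


def worst_case(constraint: Constraint, W, b, vertices: Sequence[Vector]) -> float:
    """Maximum of the output constraint over the polytope spanned by the vertices."""
    c, d = constraint
    return max(
        sum(ci * yi for ci, yi in zip(c, affine(W, b, v))) + d for v in vertices
    )


def verify_affine_piece(W, b, vertices, output_constraints) -> bool:
    """True (SAT) iff every output constraint holds on the whole polytope."""
    return all(worst_case(con, W, b, vertices) <= 0.0 for con in output_constraints)


# Hypothetical instance: NN(x) = x1 + x2 on the unit square.
W, b = [[1.0, 1.0]], [0.0]
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

print(verify_affine_piece(W, b, square, [([1.0], -2.0), ([-1.0], 0.0)]))  # 0 <= y <= 2
print(verify_affine_piece(W, b, square, [([1.0], -1.5)]))                 # y <= 1.5
```

As in the text, a single violated output constraint is enough to declare the instance UNSAT, so the loop over constraints can stop early.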
To this end, we make the following definition.
Definition 4 (Switching Affine Function/Hyperplane).
Let be a NN. A set of affine functions with is said to be a set of switching affine functions for if is affine on every region of the hyperplane arrangement . is then said to be the arrangement of switching hyperplanes of .
For both shallow NNs and TLL NNs, we will show that a set of switching hyperplanes is immediately evident (i.e. in polynomial complexity) from the parameters of those architectures directly; this satisfies Step 1(b). However it also further implies that this choice of switching hyperplanes has a number of hyperplanes that is polynomial in the number of neurons in either network; this satisfies Step 1(a). Identifying the affine function represented on a region from the arrangement we propose is a different procedure for a shallow NN and a TLL NN. Nevertheless, we show subsequently that it is polynomial in the number of neurons for both types of NNs. By these results, and the procedure outlined in the previous section, we will thus obtain the conclusions of Theorem 1 and Theorem 2.
4. Polynomial-time Algorithm to Traverse Regions in a Hyperplane Arrangement
Our proof strategy for Theorem 1 and Theorem 2 is intended to exploit the sub-exponential scaling of the number of regions in a hyperplane arrangement (in a fixed dimension); see Theorem 13. However, since we ultimately seek a polynomial-time algorithm, it is essential that we exhibit an algorithm to traverse the regions in a hyperplane arrangement without exhaustively searching over all possible combinations of half-spaces associated with hyperplanes.
The core of our region traversal algorithm is a particular poset on the regions in a hyperplane arrangement, which was introduced in (EdelmanPartialOrderRegions1984, ) with other intentions. For our purposes, though, the essential feature of this poset is that given any region, we can compute its successors in polynomial time. Thus, this section first introduces the claimed poset, and then describes a polynomial algorithm to traverse it.
4.1. A Poset of Regions in a Hyperplane Arrangement
We begin by defining the poset of interest from (EdelmanPartialOrderRegions1984, ), which we dub the Region Adjacency Poset of a hyperplane arrangement.
Definition 1 (Region Adjacency Poset of a Hyperplane Arrangement (EdelmanPartialOrderRegions1984, , pp. 618)). (This poset is unnamed in (EdelmanPartialOrderRegions1984, ); we chose this name for notational convenience.)
Let be a hyperplane arrangement, and let be its regions. Furthermore, let be a specific region of , and without loss of generality assume that
Furthermore, for a region , define the notation
Then the region adjacency poset of based at is a partial order defined as follows. Let , be two regions of .
Note that the definition of the region adjacency poset in terms of the special region really is without loss of generality. For if we wish to take as a base some other region, , then we can simply redefine those hyperplanes that don’t participate in by way of “positive” half-spaces. This alteration doesn’t affect the regions of the arrangement in any way.
No matter the choice of a base region , though, for our purposes it is essential that every other region is a successor of . This is more or less trivially true, but it is stated formally in Proposition 3, by way of Proposition 2.
Proposition 2 (the poset is ranked (EdelmanPartialOrderRegions1984, , Proposition 1.1)).
The function ranks the poset .
Proposition 3.
Let be the region poset associated with the base region , as described in Definition 1. Then for any region it is the case that . Moreover, there exists a finite sequence of regions such that , and is totally ordered by .
Let with as usual. It follows from the choice of B (without loss of generality) that . The other claim follows directly from Proposition 2. ∎
Before we introduce the main facilitatory result for our algorithm, we need the following small proposition.
Proposition 4.
Let be an -dimensional face of any region , and let this face be described by the indexing function . Then the set is a singleton.
This follows immediately from Definition 12, since a set of distinct hyperplanes with non-empty intersection has rank if and only if (otherwise, the hyperplanes are all parallel to each other). ∎
Now we are in a position to state the two most important results of this section. The first, Proposition 5, states that every $(n-1)$-dimensional face of a region is associated with an adjacent region that is obtained by flipping the sign of a single hyperplane. In other words, this feature of the poset allows us to “move from one region [in a hyperplane arrangement] to another by ’walking’ through one hyperplane at a time” (EdelmanPartialOrderRegions1984, ). This result thus forms the motivation for the second critical result of this section, Proposition 6. In particular, Proposition 6 states that the $(n-1)$-dimensional faces of a region can be obtained by computing one minimal H-representation of that region. Thus, Proposition 6 forms the basis for a polynomial region traversal algorithm, since minimal H-representations can be computed in polynomial time (FukudaFrequentlyAskedQuestions, ).
Proposition 5.
Let be an -dimensional face of any region , and let this face be described by the indexing function , where is the unique affine map for which .
Then there is a unique region with indexing function that satisfies the property:
Regions are uniquely specified by their indexing functions (over the same set of hyperplanes), so if such a region exists, it is unique. Thus, we only need to show that the claimed indexing function corresponds to a valid region. But the only way that the claimed can fail to correspond to a valid region is if the intersection of the associated half-spaces is empty. Thus, we must prove that .
Let , and note that , so is itself a nonempty open set. Moreover, is nonempty, since is of dimension (the only faces with empty interior are vertices). In particular, there exists an . Hence, since is an open set, there exists a ball , and in particular, . However, since , by linearity it is clear that one of these vectors belongs to and the other to . Thus, is nonempty as claimed. ∎
Proposition 6.
Let be the face of any region with dimension , and let this face be described by the indexing function with (see Proposition 4). Then corresponds to a hyperplane in the minimal H-representation of . Likewise, every affine function in the minimal H-representation of corresponds to an -dimensional face of .
First, we show the forward direction. Suppose that such an corresponds to an -dimensional face of , but is not in the minimal H-representation of ; in particular,
But this means that
for some new indexing function . Let . Since we are taking only finite intersections and for any , we can write
Thus, we also have that
This implies that , and in particular
However, by the above argument, this implies that , which contradicts our assumption that is a face of dimension (in particular, that it is the closure of an -dimensional region). Thus, every -dimensional face of corresponds to a hyperplane in the minimal H-representation of .
Finally, let be the minimal H-representation of . It is enough to show that
for any such that if and only if . Thus, fix and suppose not. However, this assumption implies that the set must lie entirely in or , since the aforementioned set is an open, convex set. Of course this directly implies that the affine function is redundant, thus contradicting the assumption that is the minimal H-representation of . ∎
4.2. A Polynomial Time Algorithm to Traverse the Regions in a Hyperplane Arrangement
Proposition 6 is the basis for a polynomial-time algorithm that visits each region in a hyperplane arrangement, since it directly establishes that the successors of any region can be obtained by a polynomial algorithm (a minimal H-representation computation). Thus, we exhibit Algorithm 1, a polynomial-time algorithm to list the successors of a hyperplane region, , given a binary encoding of the half-spaces describing it. The correctness of Algorithm 1 follows directly from Proposition 6.
Thus, it only remains to show that we can effectively use Algorithm 1 to traverse the poset efficiently. However, this can be done by traversing the poset level-wise: recall that the region adjacency poset is ranked by the cardinality of the set (Proposition 2). Thus, we propose Algorithm 2 to traverse the poset level by level, using a heap (over the binary region encoding) to ensure that the same region isn’t traversed multiple times. The correctness of Algorithm 2 then follows from the arguments above and the correctness of Algorithm 1.
Remark 3.
Using a heap in Algorithm 2 is somewhat extravagant computationally. We employ it here only because it has a straightforward worst-case runtime.
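The level-wise traversal can be illustrated on the simplest possible case. The Python sketch below is not Algorithm 1 or Algorithm 2 from the paper: it assumes a one-dimensional arrangement, where each hyperplane is a point on the real line and the minimal H-representation of a region (an interval) can be found by inspection instead of by linear programs, and it uses a Python set in place of the heap to avoid revisiting regions. The names `bounding_hyperplanes`, `successors` and `traverse` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Signs = Tuple[int, ...]  # signs[i] = +1 encodes x > points[i]; -1 encodes x < points[i]


def bounding_hyperplanes(points: Sequence[float], signs: Signs) -> List[int]:
    """Indices of the points that bound the interval encoded by signs.

    In one dimension this plays the role of a minimal H-representation.
    """
    lo, hi = -math.inf, math.inf
    lo_idx = hi_idx = None
    for i, (t, s) in enumerate(zip(points, signs)):
        if s > 0 and t > lo:
            lo, lo_idx = t, i
        if s < 0 and t < hi:
            hi, hi_idx = t, i
    if lo >= hi:
        raise ValueError("sign vector does not describe a region")
    return [i for i in (lo_idx, hi_idx) if i is not None]


def successors(points: Sequence[float], signs: Signs, base: Signs) -> List[Signs]:
    """Adjacent regions whose separating set (relative to base) is one larger."""
    out = []
    for i in bounding_hyperplanes(points, signs):
        if signs[i] == base[i]:  # flipping hyperplane i moves away from the base
            out.append(signs[:i] + (-signs[i],) + signs[i + 1:])
    return out


def traverse(points: Sequence[float], base: Signs) -> List[Signs]:
    """Visit every region once, one rank of the poset at a time."""
    seen = {base}
    level, order = [base], []
    while level:
        order.extend(level)
        nxt = []
        for region in level:
            for succ in successors(points, region, base):
                if succ not in seen:
                    seen.add(succ)
                    nxt.append(succ)
        level = nxt
    return order


points = [0.0, 1.0, 2.5]   # three hyperplanes: x = 0, x = 1, x = 2.5
base = (1, -1, -1)         # base region: 0 < x < 1
regions = traverse(points, base)
print(regions)
```

Each region is expanded only through its bounding hyperplanes, so no sign vector that fails to describe a region is ever generated; this is the property that the general algorithm obtains from the minimal H-representation.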
Now we are in a position to state the proof of Lemma 3.
(Lemma 3) Together, the correctness of both Algorithm 1 and Algorithm 2 is an existential proof that such an algorithm is possible, so we just need to account for the claimed runtime. But this follows from the fact that there are regions in a hyperplane arrangement. For each of these regions, we need to compute one minimal H-representation, which has cost , and we need to do at most heap operations per region, each of which we assume is logarithmic in the size of the heap (in this case, the size of the heap is itself bounded by the number of regions). ∎
5. Polynomial-time Algorithm to Verify Shallow NN
This section will consist of a sequence of propositions that address the various aspects of Step 1, as described in Subsection 3.3.1. Then the section will conclude with a formal statement of the proof of Theorem 1.
Proposition 1.
Let be a shallow NN with . Then the set of affine functions
is a set of switching affine functions for , and is a set of switching hyperplanes.
A region in the arrangement exactly assigns to each neuron a status of strictly active or strictly inactive – i.e. the pre-activation of the neuron is strictly greater than zero or strictly less than zero. But forcing a particular activation status on each of the neurons forces the shallow NN to operate in an affine region. ∎
Proposition 2.
Let be a shallow NN with , and let be as in Proposition 1. Then the complexity of determining the active linear function on a region of is at most
This runtime is clearly dominated by the cost of doing the matrix multiplication . Given that , this operation has the claimed runtime. ∎
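The matrix product mentioned in the proof can be spelled out with a short Python sketch. It assumes the shallow NN is given by a hidden layer `(W1, b1)` and a linear output layer `(W2, b2)`, and that a region is described by a 0/1 activation pattern with one entry per hidden neuron; on such a region the NN equals `W2 diag(pattern) W1 x + W2 diag(pattern) b1 + b2`. The function names and the example weights are illustrative.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def activation_pattern(W1: Matrix, b1: Vector, x: Sequence[float]) -> Tuple[int, ...]:
    """1 for each hidden neuron whose pre-activation is positive at x, else 0."""
    return tuple(
        int(sum(w * xi for w, xi in zip(row, x)) + bias > 0)
        for row, bias in zip(W1, b1)
    )


def active_affine(W1: Matrix, b1: Vector, W2: Matrix, b2: Vector,
                  pattern: Sequence[int]) -> Tuple[Matrix, Vector]:
    """Return (W, b) such that the shallow NN equals W x + b on the region of pattern."""
    hidden, n = len(W1), len(W1[0])
    W = [
        [sum(W2[o][i] * pattern[i] * W1[i][j] for i in range(hidden)) for j in range(n)]
        for o in range(len(W2))
    ]
    b = [
        sum(W2[o][i] * pattern[i] * b1[i] for i in range(hidden)) + b2[o]
        for o in range(len(W2))
    ]
    return W, b


# Hypothetical shallow NN with 2 inputs, 3 hidden neurons and 1 output.
W1, b1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, -1.0]
W2, b2 = [[1.0, 1.0, -2.0]], [0.5]

pattern = activation_pattern(W1, b1, [0.25, 0.25])
print(pattern, active_affine(W1, b1, W2, b2, pattern))
```

The nested sums perform one multiply-add per (output, hidden neuron, input) triple, so the cost is dominated by the product of the two weight matrices, as the proof states.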
Proposition 3.
This follows trivially by definition of a region and the fact that we are merely adding hyperplanes to the arrangement . ∎
Now we can finally state the proof of Theorem 1.
(Theorem 1) We need to traverse the hyperplane arrangement , which has hyperplanes. By Theorem 13, this arrangement has regions, so by Lemma 3, we need at most calls to an LP solver to traverse all of these regions – i.e. we need LP calls per region to find the minimal H-representation of that region using Algorithm 1, in addition to the overhead associated with Algorithm 2. Then, on each region, we need to access the active linear function, so we add the run time described in Proposition 2 times the number of regions, i.e. . Finally, we need to run one linear program per region per output constraint. This comes at a total cost of calls to the LP solver. This explains the runtime expression claimed in the Theorem. ∎
6. Polynomial-time Algorithm to Verify TLL NN
This section will consist of a sequence of propositions that address the various aspects of Step 1, as described in Subsection 3.3.1. Then the section will conclude with a formal statement of the proof of Theorem 2.
Proposition 1.
Let be a TLL NN as described in Definition 6. Then define
and . Furthermore, define and
Then is a set of switching affine functions for , and the component of is an affine function on each region of and is exactly equal to for some .
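The idea behind this choice of switching functions can be sketched in Python. The defining equations are not legible in the statement above, so the sketch assumes, following the proof's remark that each half-space orders the outputs of two linear functions, that the switching affine functions of a scalar TLL NN are the pairwise differences of its local linear functions. The names `switching_functions` and `ordering`, and the example functions, are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

LocalFn = Tuple[Sequence[float], float]  # (w, b) encodes x -> w . x + b


def switching_functions(local_fns: Sequence[LocalFn]) -> List[Tuple[List[float], float]]:
    """All pairwise differences l_i - l_j with i < j, each again an affine function."""
    return [
        ([wi - wj for wi, wj in zip(w_i, w_j)], b_i - b_j)
        for (w_i, b_i), (w_j, b_j) in combinations(local_fns, 2)
    ]


def ordering(local_fns: Sequence[LocalFn], x: Sequence[float]) -> List[int]:
    """Indices of the local linear functions sorted by their value at x."""
    values = [sum(wi * xi for wi, xi in zip(w, x)) + b for w, b in local_fns]
    return sorted(range(len(values)), key=values.__getitem__)


# Hypothetical local linear functions in one input dimension: x, -x and 1.
local_fns = [([1.0], 0.0), ([-1.0], 0.0), ([0.0], 1.0)]

print(switching_functions(local_fns))
print(ordering(local_fns, [0.25]), ordering(local_fns, [0.75]), ordering(local_fns, [2.0]))
```

With $N$ local linear functions there are $N(N-1)/2$ such differences, and within any one region of their arrangement the ordering of the local linear functions is fixed, so every min and max in the network selects the same function throughout the region.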
Let be a region in . It is obvious that such a region is contained in exactly one region from each of the component-wise arrangements , so it suffices to show that each component TLL is linear on the regions of its corresponding arrangement.
Thus, let be a region in . We claim that is linear on . To see this, note by definition of a (-dimensional) region, there is an indexing function such that
Thus, is a unique order region by construction: each such half-space identically orders the outputs of two linear functions, and since is -dimensional it is contained in just such a half space for each and every possible pair. That is
for some sequence .
Thus, for each , there exists an index such that
That is each of the min terms in is exactly equal to one of the linear functions on . Therefore,
But applying the unique ordering property of to the above further implies that there exists an index such that