Effective Formal Verification of Neural Networks using the Geometry of Linear Regions

by   Haitham Khedr, et al.
University of California, Irvine

Neural Networks (NNs) have increasingly apparent safety implications commensurate with their proliferation in real-world applications: both unanticipated as well as adversarial misclassifications can result in fatal outcomes. As a consequence, techniques of formal verification have been recognized as crucial to the design and deployment of safe NNs. In this paper, we introduce a new approach to formally verify the most commonly considered safety specification for ReLU NNs, i.e. polytopic specifications on the input and output of the network. Like some other approaches, ours uses a relaxed convex program to mitigate the combinatorial complexity of the problem. However, unique in our approach is the way we exploit the geometry of neuronal activation regions to further prune the search space of relaxed neuron activations. In particular, conditioning on neurons from input layer to output layer, we can regard each relaxed neuron as having the simplest possible geometry for its activation region: a half-space. This paradigm can be leveraged to create a verification algorithm that is not only faster in general than competing approaches, but is also able to verify considerably more safety properties. For example, our approach completes the standard MNIST verification test bench 2.7-50 times faster than competing algorithms while still proving 14-30% more properties. We also used our framework to verify the safety of a neural network controlled autonomous robot in a structured environment, and observed a 1900 times speed up compared to existing methods.




1 Introduction

Neural Networks have become an increasingly central component of modern machine learning systems, including those that are used in safety-critical cyber-physical systems such as autonomous vehicles. The rate of this adoption has exceeded the ability to reliably verify the safe and correct functioning of these components, especially when they are integrated with other components such as controllers. Thus, there is an increasing need to verify that NNs reliably produce safe outputs, especially in the presence of malicious adversarial inputs Szegedy et al. (2013); Goodfellow et al. (2014); Kurakin et al. (2016); Song et al. (2018).

In this paper, we propose PeregriNN, an algorithm for efficiently and formally verifying ReLU NNs. In particular, we consider the problem of whether a particular set of inputs always results in NN outputs within some other (output) set. However, PeregriNN will also verify input and output constraints that are interrelated by convex inequalities: this feature distinguishes PeregriNN from other formal NN verifiers, which verify only static input/output constraints. And in particular, it makes PeregriNN uniquely well suited to the verification of NNs when they are used as state-feedback controllers for dynamical systems: in such cases, static input/output constraints are inadequate to capture the most important safety properties.

Broadly speaking, PeregriNN falls into the category of sound and complete search and optimization NN verifiers Liu et al. (2019). Like other algorithms in this category, the optimization aspect of PeregriNN is a relaxed convex program where the output of each individual neuron is assigned a slack variable that is decoupled from the actual output of the neuron; the convex solver tries to minimize the slacks in order to drive each slack to match the output of its associated neuron, thereby obtaining an actual input/output response of the network (see also Fig. 1). The search aspect of PeregriNN is a consequence of the fact that the convex solver often cannot drive all of these slacks to zero: in such a case, the neurons with non-zero slacks can be regarded as indeterminate, and a search must be conducted over all the possible combinations of activations (exponentially growing Montufar et al. (2014); Pascanu et al. (2013)) for these neurons. This is accomplished by means of conditioning on the neurons one at a time until all possible activation combinations are exhausted, usually with the benefit of a methodology for ruling out multiple combinations at once.

The main contribution of the PeregriNN algorithm is its search algorithm (and the modified convex problem that makes it possible). Uniquely, the PeregriNN algorithm searches over indeterminate-neuron activations in a way that emphasizes and exploits the geometry of their activation regions. In particular, PeregriNN leverages the following geometric observation: from the input-feature space of a network, the activation regions of neurons in the input layer (we refer to the first hidden layer as the input layer) are demarcated by hyperplanes, and so have a much simpler geometry than activation regions from deeper in the network. To recreate and exploit this advantage for arbitrary indeterminate neurons in the network, PeregriNN incorporates two different levels of prioritization in its search.

  1. Inter-layer prioritization. PeregriNN always prefers to search (i.e. condition on) neurons closest to the input layer, so that the next indeterminate neuron in the search necessarily receives its input from a sub-network operating in a linear region; hence, the next search neuron may itself be regarded as an input neuron of the complementary sub-network. This also depends on a novel convex program.

  2. Intra-layer prioritization. PeregriNN further exploits this exposed hyperplane geometry via a novel search priority within each layer. In particular, PeregriNN prioritizes the activation region with the smallest volume: this is a heuristic that balances the accuracy of over-approximation methods with the number of activation combinations they can prune from the search.

We used PeregriNN to verify the adversarial robustness of networks trained on the MNIST LeCun (1998) and CIFAR-10 Krizhevsky and Hinton (2014) datasets, as well as safety properties of a NN-controlled autonomous system. For MNIST, our experiments show that PeregriNN is on average 2.7 times faster than Neurify Wang et al. (2018a) and 50 times faster than Marabou Katz et al. (2019), which are two state-of-the-art algorithms. It also proves 2% and 80% more properties than Neurify and Marabou, respectively, on the CIFAR dataset. PeregriNN is also 1900 times faster than SMC Shoukry et al. (2018) for verifying the safety of NN-controlled autonomous systems.

Related work.

Since PeregriNN is a sound and complete verification algorithm, we restrict our comparison to other sound and complete algorithms. NN verifiers can be grouped into roughly four categories: (i) SMT-based methods, which encode the problem into a Satisfiability Modulo Theory problem Katz et al. (2019, 2017); Ehlers (2017); (ii) MILP-based solvers, which directly encode the verification problem as a Mixed Integer Linear Program Lomuscio and Maganti (2017); Tjeng et al. (2017); Bastani et al. (2016); Bunel et al. (2020); Fischetti and Jo (2018); Anderson et al. (2020); Cheng et al. (2017); (iii) reachability-based methods, which perform layer-by-layer reachability analysis to compute the reachable set Xiang et al. (2017, 2018); Gehr et al. (2018); Wang et al. (2018b); Tran et al. (2020); Ivanov et al. (2019); Fazlyab et al. (2019); and (iv) convex relaxation methods Wang et al. (2018a); Dvijotham et al. (2018); Wong and Kolter (2017). In general, (i), (ii) and (iii) suffer from poor scalability. On the other hand, convex relaxation methods depend heavily on pruning the search space of indeterminate neuron activations; thus, they generally depend on obtaining good approximate bounds for each of the neurons in order to reduce the search space (the exact bounds are computationally intensive to compute Dutta et al. (2017)). These methods are most similar to PeregriNN: for example, Wang et al. (2018b); Bunel et al. (2020); Royo et al. (2019) recursively refine the problem using input splitting, and Wang et al. (2018a) does so via neuron splitting. Other search and optimization methods include: Planet Ehlers (2017), which combines a relaxed convex optimization problem with a SAT solver to search over neurons’ phases; and Marabou Katz et al. (2019), which uses a modified simplex algorithm to handle non-convex ReLU activations.

2 Problem formulation

Neural Networks.

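In this paper, we will consider Rectified Linear Unit (ReLU) NNs. By a ($K$-layer) ReLU network, we mean a composition of ($K$) ReLU layer functions (or just layers): i.e. $\mathcal{NN} = f^{(K)} \circ f^{(K-1)} \circ \dots \circ f^{(1)}$, where the $i$-th ReLU layer function is parameterized by weights, $W^{(i)}$, and biases, $b^{(i)}$, and is defined as $f^{(i)} : z \mapsto \max\{W^{(i)} z + b^{(i)}, 0\}$.

To simplify future notation, we define the output of each layer in the computation of $\mathcal{NN}(y_0)$ as $y_i = (f^{(i)} \circ \dots \circ f^{(1)})(y_0)$ for $i = 1, \dots, K$. To refer to individual neurons, we use the notation $(y_i)_j$ to refer to the $j$-th element of the vector $y_i$. As a final notational convenience, we refer to $f^{(1)}$ as the input layer.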

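Verification Problem. Let $\mathcal{NN}$ be a $K$-layer NN as defined above. Furthermore, let $P_{y_0}$ be a convex polytope in the input space of $\mathcal{NN}$, and let $P_{y_K}$ be a convex polytope in the output space of $\mathcal{NN}$. Finally, let $h_\ell$, $\ell = 1, \dots, m$, be convex functions of the network input $y_0$ and output $y_K$. Then the verification problem is to decide whether

$$\{\, y_0 \mid y_0 \in P_{y_0},\ \ y_K = \mathcal{NN}(y_0) \in P_{y_K},\ \ h_\ell(y_0, y_K) \le 0,\ \ \ell = 1, \dots, m \,\} = \emptyset. \qquad (1)$$
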
Note that the addition of the convex inequality constraints is a unique feature of our problem formulation compared to other NN verifiers, and it significantly broadens the scope of the problem. In particular, other solvers can only verify independent constraints on the input and on the output of the network.

3 Approach

Figure 1: Neuron activation by minimizing a slack variable.

As in some other algorithms, we convert the verification problem (Section 2) into a relaxed convex program (e.g. Wang et al. (2018a, b); Bunel et al. (2020)). Convex programs of this type assign a slack variable to the output of each neuron that is purposely allowed to differ from the actual output of the neuron (as calculated from its inputs). Each slack variable is then constrained to lie above the response of its associated neuron, and the input and output sets from the verification problem are incorporated as further constraints on the relevant slack variables. The objective, then, is to minimize the sum of total slacks in the hope that each slack variable will be driven to lie exactly on the response of its respective neuron. In such a case, the solver will have found a solution that corresponds to an actual input/output pair for the network, and thus decide the verification problem. Mathematically, writing $s_i$ for the vector of slack variables of layer $i$ (with $s_0$ the network input), $W^{(i)}$ and $b^{(i)}$ for the weights and biases of layer $i$, and $P_{y_0}$, $P_{y_K}$, $h_\ell$ for the input polytope, output polytope and convex constraints of the verification problem, relaxed convex programs of this type can be written as:

$$\min_{s_0, \dots, s_K} \ \sum_{i=1}^{K} \sum_{j} (s_i)_j \quad \text{s.t.} \quad s_i \ge 0,\ \ s_i \ge W^{(i)} s_{i-1} + b^{(i)} \ \ (i = 1, \dots, K), \quad s_0 \in P_{y_0},\ \ s_K \in P_{y_K},\ \ h_\ell(s_0, s_K) \le 0 \ \ (\ell = 1, \dots, m). \qquad (2)$$

A valid input/output pair for the network is obtained when either $(s_i)_j = 0$ or $(s_i)_j = (W^{(i)} s_{i-1} + b^{(i)})_j$ for all $i, j$. An example of this situation is depicted in Fig. 1, where the black dot illustrates a choice of the slack variable that corresponds to a valid input/output pair for the illustrated neuron.

Given the efficiency of modern convex solvers, this is an extremely attractive approach when it is successful. However, all formulations of this type suffer from the same drawback: the solver may return a solution in which some of the slack variables are indeterminate – i.e. they do not lie on the response of their respective neurons. In a case like this, the solution doesn’t correspond to an actual evaluation of the network, and more work is needed to decide the verification problem. In particular, the indeterminate neurons must be conditioned – i.e. constrained to be either active or inactive – and then the convex program re-solved with these new constraints. This process is repeated as a search over all combinations of different conditionings until either a valid solution is found or the problem is shown to be infeasible.
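The notion of an indeterminate neuron can be made concrete with a short sketch. The code below is illustrative only and is not taken from the PeregriNN implementation: the function names, the list-based data layout and the tolerance `tol` are all assumptions. Given a candidate solution of the relaxed program, it reports every neuron whose slack value does not lie on its ReLU response.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def pre_activation(W: Matrix, b: Vector, x: Vector) -> Vector:
    """Affine response W x + b of one layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]


def indeterminate_neurons(
    layers: List[Tuple[Matrix, Vector]],
    slacks: List[Vector],
    tol: float = 1e-6,  # hypothetical numerical tolerance
) -> List[Tuple[int, int]]:
    """Return (layer, neuron) pairs whose slack is off the ReLU response.

    layers: [(W_1, b_1), ..., (W_K, b_K)]
    slacks: [s_0, s_1, ..., s_K], with s_0 the network input.
    """
    found = []
    for i, (W, b) in enumerate(layers, start=1):
        z = pre_activation(W, b, slacks[i - 1])
        for j, (zj, sj) in enumerate(zip(z, slacks[i])):
            # A determinate neuron satisfies s = max(z, 0) up to tolerance.
            if abs(sj - max(zj, 0.0)) > tol:
                found.append((i, j))
    return found


# One layer with two neurons; the second slack (0.7) sits above its response (0.5).
layers = [([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0])]
print(indeterminate_neurons(layers, [[1.0, 2.0], [0.0, 0.7]]))
```

An empty result means the relaxed solution is an actual evaluation of the network; a non-empty result is the list of neurons that would have to be conditioned.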

The primary novelty in our approach is thus the way in which we search over the different possible conditionings of indeterminate neurons. Most approaches treat all indeterminate neurons as roughly equivalent for the purpose of search, and use ReLU over-approximation methods to guide which neuron to consider next (and to exclude combinations when possible). Instead, we propose a new, more efficient heuristic that prioritizes neurons in a way that emphasizes the geometry of their activation regions: our approach uses this information to more effectively exclude combinations of conditionings from the search space. Our approach has two specific levels of prioritization, which we describe below.

Figure 2: (Top) Activation regions for a three-neuron NN with a 2-d input-feature space. The NN has two input neurons and one output neuron: black and red lines show activation boundaries for input neurons; the blue line shows the activation region for the output neuron. (Bottom) Illustration of the conditioning order. Indeterminate neurons are shown in blue, with those already conditioned also filled-in; all other neurons are active or inactive as specified by the convex solver. The next neuron is selected from the red rectangle.

3.1 Inter-Layer Prioritization

Consider neurons’ activation regions as expressed in the input-feature space of a network: the activation regions of input-layer neurons are always demarcated by hyperplanes, whereas the activation regions for deeper neurons are more complicated regions (see the top pane of Fig. 2 for an example). Thus, input-neuron activation regions are specifically subject to all of the geometric properties of hyperplane arrangements Stanley. In particular, the geometric properties of hyperplane arrangements govern which input-layer-neuron activation regions have non-empty intersections, and hence which combinations of activations are simultaneously possible. In the extreme case when there are more neurons in an input layer than there are inputs, the reduction in possible activation combinations can be considerable. It is known that in such cases, the number of non-empty regions formed by an arrangement of hyperplanes scales sub-exponentially in the number of hyperplanes, i.e. the number of input-layer neurons. In particular, the number of regions formed by $N$ such hyperplanes in an $n$-dimensional input space is at most $\sum_{k=0}^{n} \binom{N}{k}$.

Thus, we propose to always condition on those indeterminate neurons closest to the input layer, since this will recreate these geometric properties for all indeterminate neurons within the same layer – even if that layer is deeper in the network. Spurious conditionings can thus be eliminated by evaluating the compatibility of these indeterminate neurons according to their activation-region hyperplanes before conditioning them. This amounts to a direct pruning of conditionings from the search space. Note that the sub-exponential savings is particularly salient in shallow networks where the number of neurons typically exceeds the number of inputs by a significant factor.
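To get a feel for the size of this saving, the standard upper bound on the number of regions cut out by $N$ hyperplanes in an $n$-dimensional space, $\sum_{k=0}^{n} \binom{N}{k}$, can be compared with the naive count of $2^N$ activation combinations. The sketch below is illustrative; the function name and the example layer sizes are assumptions, not values from the paper's experiments.

```python
from math import comb


def max_regions(num_hyperplanes: int, input_dim: int) -> int:
    """Upper bound on the regions formed by hyperplanes in input_dim dimensions."""
    return sum(comb(num_hyperplanes, k) for k in range(input_dim + 1))


# Hypothetical layer: 24 neurons fed by a 2-dimensional input.
neurons, dim = 24, 2
print(max_regions(neurons, dim), "regions at most, versus", 2 ** neurons, "combinations")
```

When the input dimension is at least the number of neurons, the bound equals $2^N$ and nothing is saved, which is why the effect is most pronounced for wide layers with few inputs.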

This prioritization works because it effectively partitions the network along a layer boundary into two sub-networks: see the bottom pane of Fig. 2. The suggestively named fixed-phase sub-network necessarily operates in a (known) linear region because the phases of all of its neurons are fixed (either by the convex solver or by conditioning decisions already taken). Consequently, the relevant portion of the input constraint set can be propagated through this fixed-phase sub-network to obtain an exact polytope representation of its outputs. But the next indeterminate neuron to be conditioned can be regarded as belonging to the input layer of the un-conditioned sub-network, whose input is of course supplied by the output of the fixed-phase sub-network. Thus, we can effectively reconsider the original verification problem in terms of a new verification problem on the un-conditioned sub-network, and one where the next indeterminate neuron to be conditioned is in the input layer.
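The reason a fixed-phase sub-network is easy to handle is that, once every neuron's phase is known, the sub-network collapses to a single affine map. The sketch below composes that map explicitly; it is a minimal illustration under assumed names and a list-based data layout, not the authors' code. An active neuron keeps its affine row, and an inactive neuron contributes a zero row.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def compose_fixed_phase(
    layers: List[Tuple[Matrix, Vector]],
    phases: List[List[bool]],
) -> Tuple[Matrix, Vector]:
    """Return (A, c) with output = A x + c wherever the given phases hold.

    phases[i][j] is True if neuron j of layer i is active, False if inactive.
    """
    n = len(layers[0][0][0])  # input dimension
    A = [[1.0 if r == col else 0.0 for col in range(n)] for r in range(n)]
    c = [0.0] * n
    for (W, b), phase in zip(layers, phases):
        new_A, new_c = [], []
        for row, bj, on in zip(W, b, phase):
            if on:
                # Active neuron: compose its affine row with the map so far.
                new_A.append(
                    [sum(w * A[k][col] for k, w in enumerate(row)) for col in range(n)]
                )
                new_c.append(sum(w * ck for w, ck in zip(row, c)) + bj)
            else:
                # Inactive neuron: output is identically zero.
                new_A.append([0.0] * n)
                new_c.append(0.0)
        A, c = new_A, new_c
    return A, c


layers = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]), ([[1.0, 1.0]], [0.5])]
A, c = compose_fixed_phase(layers, [[True, False], [True]])
print(A, c)
```

With `(A, c)` in hand, a neuron with weights `w` and bias `b` in the next layer has the activation boundary `w . (A x + c) + b = 0`, which is a hyperplane in the original input space.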

There is one important implementation detail that is necessary to actually implement this layer-wise prioritization, but which we have not yet mentioned. Notably, the search procedure involves re-solving the convex program with neuron conditionings as we decide them, and in general, this may result in new indeterminate neurons in layers closer to the input layer than the last neuron we conditioned. In order to prevent this, and thus ensure that re-solving the convex program forces new indeterminate neurons to appear deeper in the network, we modify the objective function in (2) with layer-wise weights that strongly penalize slacks in earlier layers. That is, we use an objective function like:

$$\min \ \sum_{i=1}^{K} q_i \sum_{j} (s_i)_j \qquad (3)$$

where $(s_i)_j$ is the slack variable of neuron $j$ in layer $i$, and the layer-wise coefficients $q_i$ (largest for the layers closest to the input) are chosen, using the notion of prefix-ordered monotone formulas shown in Shoukry et al. (2018), so that slacks in earlier layers are driven to zero first and any remaining non-zero slacks appear as deep in the network as possible.

3.2 Intra-Layer Prioritization

One important technique to improve the conditioning search is to use ReLU over-approximations (e.g. symbolic interval analysis Zhang et al. (2018); Wang et al. (2018b)) to show that a particular conditioning cannot possibly satisfy the constraints of the verification problem. That is, if the over-approximation can’t satisfy the output constraints, then the particular choice of conditioning can’t either. Over-approximations of this type are best employed when not all of the neurons have yet been conditioned: if a particular neuron conditioning is impossible, then there is no need to check the activation combinations of the remaining indeterminate neurons. The opposite conditioning can thus be incorporated into the convex program as a constraint to aid the convex solver the next time it is run. In particular, when a specific indeterminate neuron $j$ in layer $i$ (with slack variable $(s_i)_j$ and layer weights and biases $W^{(i)}$, $b^{(i)}$) can be shown to necessarily be on or off, the inequality constraints for that neuron in (2) can be replaced by the combined constraints $(s_i)_j = (W^{(i)} s_{i-1} + b^{(i)})_j \ge 0$ or $(s_i)_j = 0 \ge (W^{(i)} s_{i-1} + b^{(i)})_j$, respectively.
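As a rough illustration of how an over-approximation can fix neuron phases, the sketch below uses plain interval arithmetic. This is a deliberately coarser stand-in for the symbolic interval analysis cited above, not that method itself, and every name in it is an assumption. A neuron whose pre-activation lower bound is non-negative must be on, one whose upper bound is non-positive must be off, and anything else stays unknown.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def interval_layer(W: Matrix, b: Vector, lo: Vector, hi: Vector) -> Tuple[Vector, Vector]:
    """Bounds of W x + b over the box lo <= x <= hi."""
    z_lo, z_hi = [], []
    for row, bj in zip(W, b):
        # A positive weight attains its minimum at lo, a negative weight at hi.
        z_lo.append(bj + sum(w * (l if w >= 0 else h) for w, l, h in zip(row, lo, hi)))
        z_hi.append(bj + sum(w * (h if w >= 0 else l) for w, l, h in zip(row, lo, hi)))
    return z_lo, z_hi


def infer_phases(
    layers: List[Tuple[Matrix, Vector]], lo: Vector, hi: Vector
) -> List[List[str]]:
    """Label each neuron 'on', 'off' or 'unknown' over the input box."""
    result = []
    for W, b in layers:
        z_lo, z_hi = interval_layer(W, b, lo, hi)
        result.append(
            ["on" if l >= 0 else "off" if h <= 0 else "unknown" for l, h in zip(z_lo, z_hi)]
        )
        # Push the bounds through the ReLU before the next layer.
        lo = [max(l, 0.0) for l in z_lo]
        hi = [max(h, 0.0) for h in z_hi]
    return result


layers = [([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]], [1.0, -1.0, 0.0])]
print(infer_phases(layers, [0.0, 0.0], [1.0, 1.0]))
```

Neurons labelled `on` or `off` can be handed to the convex program as equality constraints, so only the `unknown` ones remain as candidates for conditioning.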

However, over-approximations like this come with a drawback: the more neurons that can be both active and inactive in a region, the worse the approximation error is – and hence the less likely that that particular combination is invalidated by the over-approximation. On the other hand, the more neurons that have both phases active in a region, the more possible combinations that are eliminated if the activation region is infeasible. Thus, there would seem to be an intuitive trade-off between the number of further conditionings that are possible within an activation region, and the likelihood that those combinations can be excluded by over-approximating the network.

Given the need to use over-parameterized deep networks for effective training, it is plausible to suppose that activation regions are roughly uniformly distributed within any un-conditioned network (as defined in Fig. 2). According to this heuristic, then, the volume of an activation region is a useful proxy for the number of indeterminate neurons that can be excluded by over-approximating the network on that region. This volume heuristic is an especially convenient one given the discussion above: after all, it is considerably easier to compute the volume of activation regions when those activation regions are defined by hyperplanes, as they are for neurons in the input layer; again, refer to the top pane of Fig. 2.

Thus, we start from the observations in Section 3.1 to propose the following intra-layer conditioning priority: we condition the next indeterminate neuron (within a layer) according to the (valid) activation region that has the smallest volume. We reiterate that this heuristic is only reasonable to implement because of the geometric basis we established above. Moreover, we have found that minimum-volume priority is an effective heuristic across all of our experiments, which suggests that the aforementioned trade-off skews heavily toward the accuracy of the over-approximation, not the number of neurons that can be excluded. Nevertheless, this analysis suggests that there is a rich opportunity for future work here. In particular, it is natural to consider the possibility of a volume-based prioritization that is adaptive according to the properties of the individual neuron under consideration.
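The text does not spell out how region volumes are computed, so the following sketch should be read as one possible illustration rather than the method used in PeregriNN. It assumes the relevant input set is an axis-aligned box and estimates, by seeded Monte Carlo sampling, how much of that box lies on each side of a neuron's hyperplane; the neuron and phase with the smallest non-empty region are then selected. All names, the box assumption and the sample count are hypothetical, and compatibility with other conditionings in the layer is not checked here.

```python
import random
from typing import List, Optional, Tuple

Vector = List[float]


def region_volumes(
    w: Vector, b: float, lo: Vector, hi: Vector, samples: int = 2000, seed: int = 0
) -> Tuple[float, float]:
    """Estimate the fraction of the box [lo, hi] where w.x + b >= 0, and its complement."""
    rng = random.Random(seed)
    active = 0
    for _ in range(samples):
        x = [rng.uniform(l, h) for l, h in zip(lo, hi)]
        if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0:
            active += 1
    frac = active / samples
    return frac, 1.0 - frac


def pick_smallest_region(
    neurons: List[Tuple[Vector, float]], lo: Vector, hi: Vector
) -> Optional[Tuple[int, bool]]:
    """Return (neuron index, phase) of the smallest non-empty estimated region."""
    best = None
    for idx, (w, b) in enumerate(neurons):
        on, off = region_volumes(w, b, lo, hi)
        for phase, vol in ((True, on), (False, off)):
            if vol > 0 and (best is None or vol < best[0]):
                best = (vol, idx, phase)
    return None if best is None else (best[1], best[2])


# Neuron 0 is active only on a thin slab (x0 >= 0.9); neuron 1 splits the box in half.
neurons = [([1.0, 0.0], -0.9), ([1.0, 0.0], -0.5)]
print(pick_smallest_region(neurons, [0.0, 0.0], [1.0, 1.0]))
```

Sampling is used here only because it is short and dimension-agnostic; for regions bounded by hyperplanes, exact or analytic volume computations are also possible.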

4 Algorithm

We now describe how our novel search procedure integrates with other techniques to form the full PeregriNN algorithm. The state of our algorithm is captured by a list of conditioning choices already made. The main loop of the algorithm updates this state once per iteration in two sequential stages: a convex query stage and a state update stage. The convex query stage entails solving a single convex problem derived from the current list of conditioning choices. Based on the result of the convex query, the state update stage proceeds in one of two ways: it conditions if the convex solver returns indeterminate neurons (i.e. descends deeper, adding conditionings to the state), or it backtracks if the convex solver returns infeasible (i.e. ascends shallower, removing conditionings from the state). Before executing the convex solver, we also include an inference step, which further improves efficiency by providing the solver additional constraints that are inferred by symbolic interval analysis.

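Convex Query. The convex query step takes the current list of conditioning choices, and translates them into constraints on the slack variables as follows (with $(s_i)_j$ the slack variable of neuron $j$ in layer $i$, and $W^{(i)}$, $b^{(i)}$ the weights and biases of that layer):

$$(s_i)_j = (W^{(i)} s_{i-1} + b^{(i)})_j \ \ \text{and} \ \ (W^{(i)} s_{i-1} + b^{(i)})_j \ge 0 \quad \text{if neuron } (i, j) \text{ is conditioned active} \qquad (4)$$

$$(s_i)_j = 0 \ \ \text{and} \ \ (W^{(i)} s_{i-1} + b^{(i)})_j \le 0 \quad \text{if neuron } (i, j) \text{ is conditioned inactive} \qquad (5)$$
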
These constraints are then added to the convex program (3), and the program is passed to the convex solver. The output of the convex solver (for our purposes) is either a list of indeterminate neurons or the conclusion that the program is infeasible. By definition, any list of indeterminate neurons returned by the solver does not include any of the neurons we have already conditioned on. Moreover, any returned indeterminate neurons are at least as deep in the network as the last conditioned neuron: this is because of the choice of the layer-wise weights in (3) (see Section 3.1).

State Update. The state update stage proceeds to update the algorithm’s state by one of two mechanisms, depending on the output of the convex solver.

Conditioning. If the convex solver returns a list of indeterminate neurons, then PeregriNN’s unique prioritizations are used to update the state with a new conditioning choice (see Section 3). First, the inter-layer prioritization is applied: the list of indeterminate neurons is sorted by depth, and only those neurons in the layer closest to the input are eligible for conditioning. From these neurons, the next conditioning choice is made using the intra-layer prioritization scheme: the activation region with the smallest volume (that is also compatible with the other conditionings in the layer) is selected, and added to the state. After the state update, a new iteration of the main loop starts.

Backtracking. If the convex solver returns infeasible, then we have explored as far as possible into the network given the current conditioning choices. Thus, the algorithm must backtrack, and undo some of those choices to explore other possible activation combinations. Since each conditioning choice is in direct correspondence with a pair of convex constraints – i.e. (4) and (5) – and the convex solver returned infeasible, we know that these constraints are mutually incompatible. Thus, we can use an Irreducible Inconsistent Subsystem (IIS) of these constraints to find compatible constraints, and roll back some of the conditioning choices in the current state. Moreover, since an IIS is irreducible, this method often rolls back a considerable number of conditionings at once. After the state update, a new iteration of the main loop starts.

As an additional optimization, we perform an inference step right before executing the convex query.

Inference. Given a set of conditioning choices, it is often possible to rule out some other neuron activations directly by over-approximation methods. Thus, we use Symbolic Interval Analysis Wang et al. (2018b) to find other neuron activations that must follow from the current state of conditioning choices. These inferred conditioning choices are converted to convex constraints as in (4) and (5), and added to the convex program right before the convex query stage is executed.

procedure NN_verify(nn, problem)
    inferred ← ∅; decided ← ∅
    while True do
        inferred, undecided ← SymIntervalAnalysis(nn, problem.input_bounds)
        sol, relaxed_neurons ← CheckFeas(nn, problem, undecided)    ▷ Inter-layer prioritization
        if sol.status == INFEASIBLE then
            if decided == ∅ then return SAFE
            else
                decided ← backtrack(decided, problem)
        else if relaxed_neurons == ∅ then return UNSAFE, sol
        else
            neuron ← pick_one(relaxed_neurons)    ▷ Intra-layer prioritization
            decided ← decided ∪ {neuron}
Algorithm 1 Verification of ReLU networks
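The control flow of Algorithm 1 can be exercised on a toy problem with the sketch below. It is a simplified illustration with several loudly stated assumptions: the convex solver is replaced by a hypothetical oracle (`make_toy_oracle`), the IIS-based rollback is replaced by plain chronological backtracking, the inference step is omitted, and the backtracking routine signals an exhausted search by returning `None`. None of these helper names come from the paper.

```python
from typing import Callable, List, Optional, Set, Tuple

Decision = Tuple[int, bool, bool]  # (neuron id, phase, already flipped?)


def nn_verify(
    check_feas: Callable[[List[Decision]], Tuple[bool, List[int]]],
    pick_one: Callable[[List[int]], Decision],
    backtrack: Callable[[List[Decision]], Optional[List[Decision]]],
    max_iters: int = 10_000,
) -> str:
    """Main loop: condition on relaxed neurons, backtrack on infeasibility."""
    decided: List[Decision] = []
    for _ in range(max_iters):
        feasible, relaxed = check_feas(decided)
        if not feasible:
            rolled_back = backtrack(decided) if decided else None
            if rolled_back is None:
                return "SAFE"  # nothing left to explore
            decided = rolled_back
        elif not relaxed:
            return "UNSAFE"  # feasible and no indeterminate neurons: counterexample
        else:
            decided = decided + [pick_one(relaxed)]
    return "UNKNOWN"


def chronological_backtrack(decided: List[Decision]) -> Optional[List[Decision]]:
    """Stand-in for IIS-based rollback: flip the most recent unflipped decision."""
    trail = list(decided)
    while trail and trail[-1][2]:
        trail.pop()
    if not trail:
        return None
    neuron, phase, _ = trail.pop()
    return trail + [(neuron, not phase, True)]


def make_toy_oracle(num_neurons: int, witnesses: Set[Tuple[bool, ...]]):
    """Hypothetical solver: relaxed while neurons are undecided, exact once all are fixed."""
    def check_feas(decided: List[Decision]) -> Tuple[bool, List[int]]:
        assign = {n: p for n, p, _ in decided}
        relaxed = [n for n in range(num_neurons) if n not in assign]
        if relaxed:
            return True, relaxed
        return tuple(assign[n] for n in range(num_neurons)) in witnesses, []
    return check_feas


def pick_lowest(relaxed: List[int]) -> Decision:
    """Toy priority: lowest neuron index first, active phase first."""
    return (min(relaxed), True, False)


print(nn_verify(make_toy_oracle(2, {(True, False)}), pick_lowest, chronological_backtrack))
print(nn_verify(make_toy_oracle(2, set()), pick_lowest, chronological_backtrack))
```

Passing the solver, the priority rule and the rollback strategy as callables keeps the loop itself independent of any particular convex solver, which makes it easy to swap the toy pieces for real ones.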

5 Experiments


We implemented PeregriNN in Python and used the off-the-shelf Gurobi convex optimizer Gurobi Optimization (2013). We ran our experiments on a 24-core machine with 256 GB of memory. For a fair comparison with single-threaded implementations, we limited all algorithms (including PeregriNN) to run on a single core.

In order to evaluate the performance and effectiveness of PeregriNN, we conducted two different experiments, which can be summarized as follows.

1. We used PeregriNN to verify the adversarial robustness of NNs trained on standard datasets (MNIST and CIFAR); this experiment allowed us to compare PeregriNN against state-of-the-art NN verifiers both in terms of execution time and in terms of effectiveness at proving properties.

2. We used PeregriNN to evaluate the safety of NN controllers for a LiDAR equipped quadrotor; this experiment exercised PeregriNN’s unique ability to verify properties specified with interrelated input/output constraints (see (1)) in a practical safety verification problem.

5.1 Adversarial robustness

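In this experiment, our objective is to compare PeregriNN with other NN verifiers both in terms of performance and in terms of effectiveness at proving properties. To this end, we evaluated adversarial robustness of NN classifiers that we trained on the MNIST and CIFAR datasets. Each property we considered characterized whether an $\ell_\infty$-norm perturbation of the input could lead to a misclassification, and we parameterized these properties by the size of the input perturbation, $\epsilon$. Specifically, let $\mathcal{NN}(x)_k$ be the classifier output indicating the belief that the input $x$ belongs to the $k$-th category out of $M$ categories, and let $k^*$ be the category assigned to an input $x_0$; then checking the robustness of the NN around $x_0$ is equivalent to checking the truth of:

$$\forall x : \ \|x - x_0\|_\infty \le \epsilon \ \Longrightarrow \ \mathcal{NN}(x)_{k^*} > \mathcal{NN}(x)_k \quad \text{for all } k \ne k^*. \qquad (6)$$

Problem (6) can be proved by checking $M - 1$ instances of (1). Specifically, for each instance we check if:

$$\{\, x \mid \|x - x_0\|_\infty \le \epsilon,\ \ \mathcal{NN}(x)_k - \mathcal{NN}(x)_{k^*} \ge 0 \,\} = \emptyset \qquad (7)$$

where $k \in \{1, \dots, M\} \setminus \{k^*\}$. If any of those instances is unsatisfied (nonempty set), then the property is violated. Otherwise, the property holds.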
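The decomposition of a robustness property into separate verification instances can be sketched as follows. The example assumes a max-norm (ℓ∞) perturbation and pixel values clipped to the range [0, 1]; both the clipping range and the function names are assumptions made for illustration. Each instance pairs one competing class with the reference class, and the property holds only if every instance turns out to be empty.

```python
from typing import List, Tuple

Vector = List[float]


def linf_box(x: Vector, eps: float, lo: float = 0.0, hi: float = 1.0) -> Tuple[Vector, Vector]:
    """Input box for a max-norm ball of radius eps, clipped to [lo, hi] (assumed pixel range)."""
    lower = [max(lo, xi - eps) for xi in x]
    upper = [min(hi, xi + eps) for xi in x]
    return lower, upper


def robustness_instances(reference_class: int, num_classes: int) -> List[Tuple[int, int]]:
    """One instance per competing class k: 'output k can reach or exceed the reference output'."""
    return [(k, reference_class) for k in range(num_classes) if k != reference_class]


lower, upper = linf_box([0.0, 0.5, 1.0], eps=0.25)
print(lower, upper)
print(len(robustness_instances(reference_class=3, num_classes=10)), "instances to check")
```

The box returned by `linf_box` plays the role of the input polytope, and each pair from `robustness_instances` supplies the output constraint for one instance.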

Models # ReLUs Architecture Accuracy
MNIST_FC1 48 <784,24,24,10> 96.5%
MNIST_FC2 100 <784,50,50,10> 97.5%
MNIST_FC3 1024 <784,512,512,10> 98.2%
CIFAR_FC1 2048 <3072,1024,512,512,10> 98%
Table 1: Models used in the experiments

MNIST Results.

We evaluated the robustness of three different classifiers for the MNIST dataset using four different magnitudes of input perturbation. The architectures of these classifiers are shown in Table 1 together with the accuracy and the number of ReLUs for each of these networks. We tested our framework with 100 randomly selected images and compared it with Marabou and Neurify. We chose to compare with Neurify because, to our knowledge, Neurify is the best performing NN verifier on this dataset (and similar ones) Wang et al. (2018a); we chose to compare with Marabou since it is one of the newest verifiers. Each query consists of an input image, and the property to be checked is whether the network is robust against an $\ell_\infty$-norm perturbation. The timeout for checking each property is 20 minutes. Table 2 summarizes the performance of the three solvers on the three networks.

Each cell: Time(s) / proved cases
Epsilon setting (in listed order) Solver MNIST_FC1 MNIST_FC2 MNIST_FC3
1 Marabou 308.8/100 682.2/100 38778.5/95
1 Neurify 7.1/100 6.37/100 384/100
1 PeregriNN 10/100 7.806/100 91.6/100
2 Marabou 2198.6/100 41812.13/81 1200000/0
2 Neurify 4058.6/99 45482.1/68 105088.2/13
2 PeregriNN 595.726/100 5921/99 39477/71
3 Marabou 23329.0/94 107815.86/16 120000/0
3 Neurify 24725.5/86 48650.74/61 100306.86/17
3 PeregriNN 7580/99 80844/83 118478.55/4
4 Marabou 25039.82/89 84655.4/32 120000/0
4 Neurify 8056.5/94 25211.23/79 120000/0
4 PeregriNN 6330/98 31384.321/78 118356.89/3
Table 2: Performance (execution time, number of proved cases) of the three solvers using three different networks. For each cell, the total execution time for the 100 queries is reported together with the number of successfully proved properties. The last column shows the number of properties proved to be safe by PeregriNN.

PeregriNN outperforms Marabou and Neurify by factors of 50 and 2, respectively, in average execution time, and it proves 14-30% more properties than Marabou and Neurify respectively. The results also show that for the cases where Neurify is faster, PeregriNN can still prove more properties in almost the same amount of time. We note here that when running Neurify on MNIST_FC3 at one of the perturbation magnitudes, it gave segmentation faults due to huge memory consumption, and we counted these cases as timeouts. As shown in the table, the number of unproved cases generally increases with $\epsilon$ and with the size of the network; this is due to looser bound estimates on the neurons and the larger search space induced by larger networks.

Epsilon Marabou Neurify PeregriNN
0.005 0 89 96
0.0075 0 88 92
0.01 0 76 78
0.02 0 57 57
0.05 0 75 74
Table 3: Number of proved properties out of 100 queries using CIFAR_FC1

CIFAR Results.

We ran the same experiments on different networks trained on the CIFAR dataset. We chose the CIFAR dataset to evaluate the performance of PeregriNN on networks with large input spaces (3072 features). Table 3 shows the number of proved properties out of 100 random queries for each of the solvers. The results show that PeregriNN can prove more properties than the other solvers on networks with large input spaces and ReLU counts.

5.2 Safety of Neural Network Controlled Physical systems

In this experiment, our objective is to study two properties: (i) safety verification of a NN-controlled autonomous system and (ii) how our framework scales with the size of the trained NN to be verified. To this end, we consider the problem of verifying the safety of an autonomous quadrotor equipped with a LiDAR sensor and controlled by a NN that processes LiDAR measurements to produce control actions Sun et al. (2019). One way to verify such systems is to discretize the workspace into discrete partitions and check the feasibility of transitions from these partitions to the unsafe set (obstacles) . Let be the position of the quadrotor. As shown in Sun et al. (2019), the next position of the quadrotor is then given by where the matrices and describe the physics of the robot (e.g., mass, friction, etc.) while the affine term captures the relation between the quadrotor position and the LiDAR image. Therefore, checking the safety of the NN controller can be written as:


Indeed, the system safety property (8) can be checked by solving formulas of the form (1).
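To make the transition check between workspace partitions concrete, the sketch below uses a hypothetical closed-loop update of the form x' = A x + B NN(C x + d). The symbol names, the tiny ReLU network, and every numeric value are illustrative assumptions and are not taken from the paper. The code searches for a transition by random sampling, so it can only falsify a property; it is not the formal verification procedure that PeregriNN implements.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

# Hypothetical 2-D system; every number below is made up for illustration.
A = [[1.0, 0.0], [0.0, 1.0]]   # assumed state matrix
B = [[0.1, 0.0], [0.0, 0.1]]   # assumed input matrix
C = [[1.0, 0.0], [0.0, 1.0]]   # assumed map from position to sensor reading
d = [0.0, 0.0]                 # assumed sensor offset
W1 = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]
b1 = [0.0, 0.0, -0.5]
W2 = [[1.0, -1.0, 0.0], [0.0, 0.0, 1.0]]
b2 = [0.0, 0.0]

def controller(z):
    """One-hidden-layer ReLU network standing in for the NN controller."""
    hidden = relu(add(matvec(W1, z), b1))
    return add(matvec(W2, hidden), b2)

def step(x):
    """Assumed closed-loop update x' = A x + B * NN(C x + d)."""
    u = controller(add(matvec(C, x), d))
    return add(matvec(A, x), matvec(B, u))

def in_box(x, box):
    """True if point x lies inside the axis-aligned box [(lo, hi), ...]."""
    return all(lo <= xi <= hi for xi, (lo, hi) in zip(x, box))

def find_transition(src, dst, samples=10000, seed=0):
    """Return a sampled point of `src` whose successor lies in `dst`, else None."""
    rng = random.Random(seed)
    for _ in range(samples):
        x = [rng.uniform(lo, hi) for lo, hi in src]
        if in_box(step(x), dst):
            return x
    return None

cell = [(0.0, 0.25), (0.0, 0.25)]     # one partition of side 0.25
obstacle = [(5.0, 6.0), (5.0, 6.0)]   # a far-away unsafe partition
print(find_transition(cell, obstacle))
```

Because sampling can miss rare transitions, a `None` result here is not a safety proof; ruling a transition out requires a sound verifier such as PeregriNN.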

We use PeregriNN to verify (8) while varying the workspace discretization parameter ε and recording the execution time for 10 different NNs that share the same architecture and are all trained using imitation learning with 1143 episodes. Table 4 shows how the number of safe regions of the workspace varies with the discretization parameter ε. PeregriNN is able to verify the safety properties for all the networks and exactly identify the safe regions in the workspace. Next, we evaluate the scalability of PeregriNN by verifying property (8) for NNs with different architectures and recording the verification time; Table 5 reports the results. PeregriNN can verify single-layer networks with 100,000 ReLUs in about 69 seconds. However, increasing the depth of the network increases the difficulty of the verification problem. Note that the approach reported in Sun et al. (2019), which uses the SMC solver of Shoukry et al. (2018), can handle networks with at most 1000 ReLUs. Comparing PeregriNN to the SMC solver in Sun et al. (2019), we conclude that PeregriNN can verify networks that are 2 orders of magnitude larger than SMC with a 1900 times speedup in execution time.

          Number of safe/unsafe regions (columns are networks 1 to 10)
Epsilon   1      2      3      4      5      6      7      8      9      10
0.25      46/52  33/65  49/49  45/53  46/52  53/45  51/47  63/35  74/24  51/47
0.5       27/38  22/43  30/35  27/38  27/38  29/36  31/34  39/26  49/16  36/29
0.75      20/34  17/37  24/30  21/33  21/33  23/31  26/28  31/23  43/11  32/22
Table 4: Number of safe and unsafe regions for 10 different networks
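As a consistency check on Table 4, one would expect the safe and unsafe counts of every network to add up to the same number of regions for a given discretization parameter, since all networks are evaluated on the same workspace partition. The sketch below parses the table rows and collects the totals; the helper names are illustrative only.

```python
# Table 4: safe/unsafe region counts for networks 1..10 at each discretization.
TABLE4 = {
    0.25: "46/52 33/65 49/49 45/53 46/52 53/45 51/47 63/35 74/24 51/47",
    0.5:  "27/38 22/43 30/35 27/38 27/38 29/36 31/34 39/26 49/16 36/29",
    0.75: "20/34 17/37 24/30 21/33 21/33 23/31 26/28 31/23 43/11 32/22",
}

def parse_row(row):
    """Turn '46/52 33/65 ...' into [(46, 52), (33, 65), ...]."""
    return [tuple(int(n) for n in cell.split("/")) for cell in row.split()]

def region_totals(table):
    """Map each epsilon to the set of safe+unsafe totals seen across networks."""
    return {eps: {s + u for s, u in parse_row(row)} for eps, row in table.items()}

totals = region_totals(TABLE4)
for eps, seen in totals.items():
    print(eps, sorted(seen))
```

For the values in Table 4, each row collapses to a single total (98, 65, and 54 regions), which is consistent with a coarser discretization producing fewer regions.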
Neurons     Number of layers
per layer   1       2        3         4          5          6
20          0.025   0.0479   0.1184    0.4767     26.76      0.257
128         0.267   1.57     243.8     3394.18    2740.341   1368.55
256         0.31    0.92     6956.69   136.44     4.4352     1471.29
512         0.679   19.83    5.43      10058.13   9649.55    35783.58

Number of neurons   Time (s)
1024                3.374
4096                7.2517
20000               7.458
50000               30.189
100000              68.8614
Table 5: (Top) Execution time in seconds for checking the feasibility of a transition between a pair of regions in the workspace. We test the scalability of the solver by solving the verification problem for different architectures, varying the number of neurons per layer and the depth of the network. (Bottom) Verification time for single-layer networks of different widths.
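The observation that depth, more than raw neuron count, drives verification difficulty can be read off Table 5 with a few lines of arithmetic. The sketch below assumes one ReLU per hidden neuron, so a network with w neurons per layer and k layers has w·k ReLUs; that counting convention and the helper names are assumptions for illustration, not definitions from the paper.

```python
# Table 5, multi-layer part: seconds, keyed by neurons per layer, for 1..6 layers.
DEEP = {
    20:  [0.025, 0.0479, 0.1184, 0.4767, 26.76, 0.257],
    128: [0.267, 1.57, 243.8, 3394.18, 2740.341, 1368.55],
    256: [0.31, 0.92, 6956.69, 136.44, 4.4352, 1471.29],
    512: [0.679, 19.83, 5.43, 10058.13, 9649.55, 35783.58],
}
# Table 5, single-layer part: seconds, keyed by network width.
WIDE = {1024: 3.374, 4096: 7.2517, 20000: 7.458, 50000: 30.189, 100000: 68.8614}

def relu_count(width, depth):
    """Assumption: one ReLU per hidden neuron and `depth` hidden layers."""
    return width * depth

def slower_than_widest(deep, wide):
    """List (relus, width, depth, seconds) for deep nets slower than the widest net."""
    widest_time = wide[max(wide)]
    return sorted(
        (relu_count(w, k), w, k, t)
        for w, times in deep.items()
        for k, t in enumerate(times, start=1)
        if t > widest_time
    )

hard = slower_than_widest(DEEP, WIDE)
for relus, width, depth, seconds in hard:
    print(f"{width} x {depth} layers ({relus} ReLUs): {seconds} s")
```

On the numbers in Table 5, ten multi-layer configurations with at most 3,072 ReLUs (under the counting assumption above) take longer than the 100,000-neuron single-layer network.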

Broader Impact

New advances in AI systems have created an urgency to study safety, reliability, and potential problems that can arise and impact society through the deployment of AI-enabled systems in the real world. Mathematically based techniques for the specification, development, and verification of complex systems, also known as formal methods, hold the promise of providing appropriately rigorous analysis of the reliability and safety of such AI-enabled systems.

This work provides a new solver to formally verify whether a NN satisfies specified formal properties in a bounded model checking scheme. The basic idea of bounded model checking is to search for a counterexample that violates the formal property. Such counterexamples can then be used by NN developers to better understand the limitations of the trained NN in terms of safety, robustness, and hopefully bias. This in turn can enable the use of AI in safety-critical cyber-physical applications that are generally regarded as having positive societal influence: autonomous cars and aircraft collision avoidance systems, for example. This work can also be used to identify performance and robustness problems in NNs that are used in non-cyber-physical applications: for example, NNs that are used in criminal justice contexts or to decide creditworthiness. On the negative side, formally verified AI systems may result in a false sense of safety, since such technologies do not reason about un-modeled behaviors and side effects. Another negative effect stems from the proliferation of the technologies that it enables: for example, increased deployment of autonomous vehicles has the potential to cause job loss.


References

  • R. Anderson, J. Huchette, W. Ma, C. Tjandraatmadja, and J. P. Vielma (2020) Strong mixed-integer programming formulations for trained neural networks. Mathematical Programming, pp. 1–37.
  • O. Bastani, Y. Ioannou, L. Lampropoulos, D. Vytiniotis, A. Nori, and A. Criminisi (2016) Measuring neural net robustness with constraints. In Advances in Neural Information Processing Systems, pp. 2613–2621.
  • R. Bunel, J. Lu, I. Turkaslan, P. Kohli, P. Torr, and P. Mudigonda (2020) Branch and bound for piecewise linear neural network verification. Journal of Machine Learning Research 21 (2020).
  • C. Cheng, G. Nührenberg, and H. Ruess (2017) Maximum resilience of artificial neural networks. In International Symposium on Automated Technology for Verification and Analysis, pp. 251–268.
  • S. Dutta, S. Jha, S. Sankaranarayanan, and A. Tiwari (2017) Output range analysis for deep neural networks. arXiv preprint arXiv:1709.09130.
  • K. Dvijotham, R. Stanforth, S. Gowal, T. A. Mann, and P. Kohli (2018) A dual approach to scalable verification of deep networks. In UAI, Vol. 1, pp. 2.
  • R. Ehlers (2017) Formal verification of piece-wise linear feed-forward neural networks. In International Symposium on Automated Technology for Verification and Analysis, pp. 269–286.
  • M. Fazlyab, A. Robey, H. Hassani, M. Morari, and G. Pappas (2019) Efficient and accurate estimation of Lipschitz constants for deep neural networks. In Advances in Neural Information Processing Systems, pp. 11423–11434.
  • M. Fischetti and J. Jo (2018) Deep neural networks and mixed integer linear optimization. Constraints 23 (3), pp. 296–309.
  • T. Gehr, M. Mirman, D. Drachsler-Cohen, P. Tsankov, S. Chaudhuri, and M. Vechev (2018) AI2: safety and robustness certification of neural networks with abstract interpretation. In 2018 IEEE Symposium on Security and Privacy (SP), pp. 3–18.
  • I. J. Goodfellow, J. Shlens, and C. Szegedy (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
  • R. Ivanov, J. Weimer, R. Alur, G. J. Pappas, and I. Lee (2019) Verisig: verifying safety properties of hybrid systems with neural network controllers. In Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control, pp. 169–178.
  • G. Katz, C. Barrett, D. L. Dill, K. Julian, and M. J. Kochenderfer (2017) Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks. In Computer Aided Verification, R. Majumdar and V. Kunčak (Eds.), Lecture Notes in Computer Science, pp. 97–117. ISBN 978-3-319-63387-9.
  • G. Katz, D. A. Huang, D. Ibeling, K. Julian, C. Lazarus, R. Lim, P. Shah, S. Thakoor, H. Wu, A. Zeljić, et al. (2019) The Marabou framework for verification and analysis of deep neural networks. In International Conference on Computer Aided Verification, pp. 443–452.
  • V. Krizhevsky and G. Hinton (2014) The CIFAR-10 dataset. Note: http://www.cs.toronto.edu/kriz/cifar.html
  • A. Kurakin, I. Goodfellow, and S. Bengio (2016) Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533.
  • Y. LeCun (1998) The MNIST database of handwritten digits. Note: http://yann.lecun.com/exdb/mnist/
  • C. Liu, T. Arnon, C. Lazarus, C. Barrett, and M. J. Kochenderfer (2019) Algorithms for Verifying Deep Neural Networks. arXiv:1903.06758.
  • A. Lomuscio and L. Maganti (2017) An approach to reachability analysis for feed-forward ReLU neural networks. arXiv preprint arXiv:1706.07351.
  • G. F. Montufar, R. Pascanu, K. Cho, and Y. Bengio (2014) On the number of linear regions of deep neural networks. In Advances in Neural Information Processing Systems, pp. 2924–2932.
  • Gurobi Optimization (2013) Gurobi optimizer 5.0. http://www.gurobi.com
  • R. Pascanu, G. Montufar, and Y. Bengio (2013) On the number of response regions of deep feed forward networks with piece-wise linear activations. arXiv preprint arXiv:1312.6098.
  • V. R. Royo, R. Calandra, D. M. Stipanovic, and C. Tomlin (2019) Fast neural network verification via shadow prices. arXiv preprint arXiv:1902.07247.
  • Y. Shoukry, P. Nuzzo, A. L. Sangiovanni-Vincentelli, S. A. Seshia, G. J. Pappas, and P. Tabuada (2018) SMC: Satisfiability Modulo Convex Programming. Proceedings of the IEEE 106 (9), pp. 1655–1679. ISSN 0018-9219, 1558-2256.
  • D. Song, K. Eykholt, I. Evtimov, E. Fernandes, B. Li, A. Rahmati, F. Tramer, A. Prakash, and T. Kohno (2018) Physical adversarial examples for object detectors. In 12th USENIX Workshop on Offensive Technologies (WOOT 18).
  • R. P. Stanley. An Introduction to Hyperplane Arrangements. pp. 90.
  • X. Sun, H. Khedr, and Y. Shoukry (2019) Formal verification of neural network controlled autonomous systems. In Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control, pp. 147–156.
  • C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.
  • V. Tjeng, K. Xiao, and R. Tedrake (2017) Evaluating robustness of neural networks with mixed integer programming. arXiv preprint arXiv:1711.07356.
  • H. Tran, X. Yang, D. M. Lopez, P. Musau, L. V. Nguyen, W. Xiang, S. Bak, and T. T. Johnson (2020) NNV: the neural network verification tool for deep neural networks and learning-enabled cyber-physical systems. arXiv preprint arXiv:2004.05519.
  • S. Wang, K. Pei, J. Whitehouse, J. Yang, and S. Jana (2018a) Efficient formal safety analysis of neural networks. In Advances in Neural Information Processing Systems, pp. 6367–6377.
  • S. Wang, K. Pei, J. Whitehouse, J. Yang, and S. Jana (2018b) Formal security analysis of neural networks using symbolic intervals. In 27th USENIX Security Symposium (USENIX Security 18), pp. 1599–1614.
  • E. Wong and J. Z. Kolter (2017) Provable defenses against adversarial examples via the convex outer adversarial polytope. arXiv preprint arXiv:1711.00851.
  • W. Xiang, H. Tran, and T. T. Johnson (2017) Reachable set computation and safety verification for neural networks with ReLU activations. arXiv preprint arXiv:1712.08163.
  • W. Xiang, H. Tran, and T. T. Johnson (2018) Output reachable set estimation and verification for multilayer neural networks. IEEE Transactions on Neural Networks and Learning Systems 29 (11), pp. 5777–5783.
  • H. Zhang, T. Weng, P. Chen, C. Hsieh, and L. Daniel (2018) Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems, pp. 4939–4948.