## 1 Introduction

Deep neural networks (DNNs) have quickly become one of the most widely used tools for dealing with complex and challenging problems in numerous domains, such as image classification [krizhevsky2012imagenet, cirecsan2012multi, gatys2016image], function approximation, and natural language translation [collobert2008unified, goldberg2016primer]. Recently, DNNs have been used in safety-critical cyber-physical systems (CPS), such as autonomous vehicles [wu2017squeezedet, chen2015deepdriving, bojarski2016end] and air traffic collision avoidance systems [julian2016policy]. Although utilizing DNNs in safety-critical applications can demonstrate considerable performance benefits, assuring the safety and robustness of these systems is challenging because DNNs possess complex non-linear characteristics. Moreover, it has been demonstrated that their behavior can be unpredictable due to slight perturbations in their inputs (i.e., adversarial perturbations) [szegedy2013intriguing].

In this paper, we introduce the NNV (Neural Network Verification) tool, a software framework that performs set-based verification for DNNs and learning-enabled CPS, known colloquially as neural network control systems (NNCS), as shown in Figure 2 (see Footnote 1).
NNV provides a set of reachability algorithms that can compute both the exact and over-approximate reachable sets of DNNs and NNCSs using a variety of set representations such as polyhedra [tran2019parallel, xiang2018specification, xiang2017reachable, xiang2018reachable, xiang2018output], star sets [tran2019fm, tran2019emsoft, tran2019safe, manzanas2019arch], zonotopes [singh2018fast], and abstract domain representations [singh2019abstract].
The reachable set obtained from NNV contains all possible states of a DNN from bounded input sets or of a NNCS from sets of initial states of a plant model.
NNV declares a DNN or a NNCS to be safe if, and only if, its reachable set does not violate the safety property (i.e., it has an empty intersection with the set of states satisfying the negation of the safety property).
If a safety property is violated, NNV can construct a complete set of counter-examples demonstrating the set of all possible unsafe initial inputs and states by using the star-based exact reachability algorithm [tran2019fm, tran2019emsoft].
To speed up computation, NNV uses parallel computing, as the majority of the reachability algorithms in NNV are more efficient when executed on multi-core platforms and clusters.

Footnote 1: The source code for NNV is publicly available: https://github.com/verivital/nnv/. A CodeOcean capsule is also available: https://doi.org/10.24433/CO.1314285.v1, which will be updated with a new DOI and the latest reproducibility results if accepted. The latest version of the CodeOcean capsule with all aspects described in this paper is available at: https://codeocean.com/capsule/1314285/, which requires a username (taylor.johnson@uta.edu) and password (cav2020ae) to access. This account has read-only permission, so to rerun the results shown in the capsule, you can select Capsule then Duplicate from the menu bar, which will clone the capsule to allow rerunning and editing if desired. Detailed instructions for the artifact evaluation are available at: https://github.com/verivital/run_nnv_comparison/blob/cav2020/README_AE.md.

NNV has been successfully applied to safety verification and robustness analysis of several real-world DNNs, primarily feedforward neural networks (FFNNs) and convolutional neural networks (CNNs), as well as learning-enabled CPS. To highlight NNV’s capabilities, we present brief experimental results from two case studies. The first compares methods for safety verification of the ACAS Xu networks [julian2016policy], and the second presents safety verification of a learning-based adaptive cruise control (ACC) system.

## 2 Overview and Features

NNV is an object-oriented toolbox written in Matlab, which was chosen in part due to the prevalence of Matlab/Simulink in the design of CPS. It uses the MPT toolbox [kvasnica2004multi] for polytope-based reachability analysis and visualization [tran2019parallel], and makes use of CORA [althoff2015introduction] for zonotope-based reachability analysis of nonlinear plant models [tran2019emsoft]. NNV also utilizes the Neural Network Model Transformation Tool (NNMT) for transforming neural network models from Keras and TensorFlow into Matlab using the Open Neural Network Exchange (ONNX) format, and the Hybrid Systems Model Transformation and Translation tool (HyST) [bak2015hyst] for plant configuration.

The NNV toolbox contains two main modules: a *computation engine* and an *analyzer*, shown in Figure 1.
The computation engine module consists of four subcomponents: 1) the *FFNN constructor*, 2) the *NNCS constructor*, 3) *the reachability solvers*, and 4) *the evaluator*.
The FFNN constructor takes a network configuration file as an input and generates a FFNN object.
The NNCS constructor takes the FFNN object and the plant configuration, which describes the dynamics of a system, as inputs and then creates an NNCS object.
Depending on the application, the FFNN (or NNCS) object is fed into a reachability solver to compute the reachable set of the FFNN (or NNCS) from a given initial set of states.
Then, the obtained reachable set will be passed to the analyzer module.
The analyzer module consists of three subcomponents: 1) a *visualizer*, 2) a *safety checker*, and 3) a *falsifier*.
The visualizer can be called to plot the obtained reachable set.
Given a safety specification, the safety checker can reason about the safety of the FFNN or NNCS with respect to the specification.
When an exact (sound and complete) reachability solver is used, such as the star-based solver, the safety checker can return either “safe” or “unsafe” along with a set of counterexamples.
When an over-approximate (sound) reachability solver is used, such as the zonotope-based scheme or the approximate star-based solvers, the safety checker can return either “safe” or “*uncertain*” (unknown).
When the result is uncertain, the falsifier automatically calls the evaluator to generate simulation traces to find a counterexample.
If the falsifier can find a counterexample, then NNV returns unsafe.
Otherwise, it returns unknown.
A summary of NNV’s major features is given in Table 1.

| Feature | Exact Analysis | Over-approximate Analysis |
| --- | --- | --- |
| Components | FFNN, CNN, NNCS | FFNN, CNN, NNCS |
| Plant dynamics (for NNCS) | Linear ODE | Linear ODE, Nonlinear ODE |
| Discrete/Continuous (for NNCS) | Discrete Time | Discrete Time, Continuous Time |
| Activation functions | ReLU, Satlin | ReLU, Satlin, Sigmoid, Tanh |
| CNN Layers | MaxPool, Conv, BN, AvgPool, FC | MaxPool, Conv, BN, AvgPool, FC |
| Reachability methods | Star, Polyhedron, ImageStar | Star, Zonotope, Abstract-domain, ImageStar |
| Reachable set/Flow-pipe Visualization | Yes | Yes |
| Parallel computing | Yes | Partially supported |
| Safety verification | Yes | Yes |
| Falsification | Yes | Yes |
| Robustness verification (for FFNN/CNN) | Yes | Yes |
| Counterexample generation | Yes | Yes |

Table 1: Overview of major features available in NNV. BN refers to batch normalization layers, FC to fully-connected layers, AvgPool to average pooling layers, Conv to convolutional layers, and MaxPool to max pooling layers.
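The analyzer workflow described in this section (safety check first, falsification only when an over-approximate result is inconclusive) can be sketched as follows. This is a minimal Python illustration under simplifying assumptions, not NNV's Matlab API: reachable sets are reduced to lists of 1-D intervals, and `intersects`, `falsify`, and `check_safety` are hypothetical names introduced here.

```python
import random

def intersects(interval, unsafe):
    """True if two closed 1-D intervals (lo, hi) overlap."""
    return interval[0] <= unsafe[1] and unsafe[0] <= interval[1]

def falsify(simulate, input_set, unsafe, n_samples=1000, seed=0):
    """Look for a concrete counterexample by simulating random inputs."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        x = rng.uniform(*input_set)
        if unsafe[0] <= simulate(x) <= unsafe[1]:
            return x
    return None

def check_safety(reach_set, unsafe, exact, simulate=None, input_set=None):
    """Return (verdict, evidence) for a reachable set given as intervals."""
    hits = [r for r in reach_set if intersects(r, unsafe)]
    if not hits:
        # Sound for exact and over-approximate sets alike.
        return "safe", None
    if exact:
        # An exact set contains only truly reachable states.
        return "unsafe", hits
    # Over-approximation: the overlap may be spurious, so try to falsify.
    cex = falsify(simulate, input_set, unsafe)
    return ("unsafe", cex) if cex is not None else ("unknown", None)

# Toy system: a single ReLU applied to inputs in [-1, 1].
relu = lambda x: max(0.0, x)
print(check_safety([(0.0, 1.0)], (2.0, float("inf")), exact=True))
print(check_safety([(-0.5, 1.0)], (-1.0, -0.1), exact=False,
                   simulate=relu, input_set=(-1.0, 1.0)))
```

In the second call the over-approximate set overlaps the unsafe region, but no simulated trace reaches it, so the sketch reports "unknown" rather than "unsafe".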

## 3 Set Representations and Reachability Algorithms

NNV implements a set of reachability algorithms for *sequential* FFNNs and CNNs, as well as NNCS with FFNN controllers as shown in Figure 2.
The reachable set of a sequential FFNN is computed layer-by-layer.
The output reachable set of a layer is the input set of the next layer in the network.

### 3.1 Polyhedron [tran2019parallel]

The polyhedron reachability algorithm computes the exact polyhedron reachable set of a FFNN with ReLU activation functions. The exact reachability computation of layer $L$ in a FFNN is done as follows. First, we construct the affine mapping $\bar{I}$ of the input polyhedron set $I$, using the weight matrix $W$ and the bias vector $b$, i.e., $\bar{I} = W \times I + b$. Then, the exact reachable set of the layer, $R_L$, is constructed by executing a sequence of stepReLU operations, i.e., $R_L = stepReLU_n(stepReLU_{n-1}(\cdots(stepReLU_1(\bar{I}))))$. Since a stepReLU operation can split a polyhedron into two new polyhedra, the exact reachable set of a layer in a FFNN is usually a union of polyhedra. The polyhedron reachability algorithm is computationally expensive because computing affine mappings with polyhedra is costly. Additionally, when computing the reachable set, the polyhedron approach extensively uses the expensive conversion between the H-representation and the V-representation. These are the main drawbacks that limit the scalability of the polyhedron approach. Despite that, we extend the polyhedron reachability algorithm for NNCSs with FFNN controllers. However, the propagation of polyhedra in NNCS may lead to a large degree of conservativeness in the computed reachable set [tran2019emsoft].

### 3.2 Star Set [tran2019fm, tran2019emsoft]

The star set is an efficient set representation for simulation-based verification of large linear systems [bak2017simulation, bak2019numerical, tran2019formats] where the superposition property of a linear system can be exploited in the analysis. It has been shown in [tran2019fm] that the star set is also suitable for reachability analysis of FFNNs. In contrast to polyhedra, the affine mapping of a star set and its intersection with a half space are more easily computed. NNV implements an enhanced version of the exact and over-approximate reachability algorithms for FFNNs proposed in [tran2019fm] by minimizing the number of LP optimization problems that need to be solved in the computation. The exact algorithm that makes use of star sets is similar to the polyhedron method in that it also relies on stepReLU operations. However, it is much faster and more scalable than the polyhedron method because of the advantage that star sets have in affine mapping and intersection. The approximate algorithm obtains an over-approximation of the exact reachable set by approximating the exact reachable set after applying an activation function, e.g., ReLU, Tanh, Sigmoid. We refer readers to [tran2019fm] for a detailed discussion of star-set reachability algorithms for FFNNs.
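To make the cheap affine mapping concrete, here is a minimal Python sketch of a star set and of bounding its states without solving an LP. It rests on assumptions that are not NNV's actual data structures: the star is stored as a center (anchor) plus generator vectors, and the predicate is simplified to independent box bounds on the predicate variables. The names `Star`, `affine_map`, and `estimate_ranges` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]
Matrix = List[List[float]]

@dataclass
class Star:
    """x = center + sum_i alpha_i * basis[i], with pred_lb <= alpha <= pred_ub."""
    center: Vector
    basis: List[Vector]
    pred_lb: Vector
    pred_ub: Vector

def mat_vec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def affine_map(star: Star, W: Matrix, b: Vector) -> Star:
    """Map the star through x -> W x + b; the predicate is left untouched."""
    center = [y + bi for y, bi in zip(mat_vec(W, star.center), b)]
    basis = [mat_vec(W, v) for v in star.basis]
    return Star(center, basis, star.pred_lb, star.pred_ub)

def estimate_ranges(star: Star):
    """Bound every state using only the predicate-variable ranges (no LP)."""
    lows, highs = [], []
    for j, cj in enumerate(star.center):
        lo = hi = cj
        for v, l, u in zip(star.basis, star.pred_lb, star.pred_ub):
            lo += min(v[j] * l, v[j] * u)
            hi += max(v[j] * l, v[j] * u)
        lows.append(lo)
        highs.append(hi)
    return lows, highs

# Unit box in 2-D pushed through one affine layer.
box = Star([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [-1.0, -1.0], [1.0, 1.0])
layer_out = affine_map(box, [[1.0, 1.0], [1.0, -1.0]], [0.5, 0.0])
print(estimate_ranges(layer_out))
```

With a general linear predicate on the predicate variables, ranges computed this way are only an outer estimate; they are useful for deciding which neurons can change sign, so that an LP is needed only for the undecided ones.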

We note that NNV implements enhanced versions of earlier star-based reachability algorithms [tran2019fm]. Particularly, we minimize the number of linear programming (LP) optimization problems that must be solved in order to construct the reachable set of a FFNN by quickly estimating the ranges of all of the states in the star set using only the ranges of the predicate variables. Additionally, the extensions of the star reachability algorithms to NNCS with linear plant models can eliminate the explosion of conservativeness in the polyhedron method [tran2019emsoft, tran2019safe]. The reason behind this is that in star sets, the relationship between the plant state variables and the control inputs is preserved in the computation since they are defined by a unique set of predicate variables. We refer readers to [tran2019emsoft, tran2019safe] for a detailed discussion of the extensions of the star-based reachability algorithms for NNCSs with linear/nonlinear plant models.

### 3.3 Zonotope [singh2018fast]

NNV implements the zonotope reachability algorithms proposed in [singh2018fast] for FFNNs. Similar to the over-approximate algorithm using star sets, the zonotope algorithm computes an over-approximation of the exact reachable set of a FFNN. Although the zonotope reachability algorithm is very fast and scalable, it produces a very conservative reachable set in comparison to the star set method, as shown in [tran2019fm]. Consequently, zonotope-based reachability algorithms are usually the more efficient choice only for very small input sets. For example, they can be well suited to robustness certification.
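The following Python sketch illustrates why zonotope propagation is fast but conservative. It assumes the widely used zonotope ReLU relaxation (the one usually associated with [singh2018fast]): a neuron whose range $[l, u]$ straddles zero is replaced by $\lambda x + \mu$ plus one fresh noise symbol of magnitude $\mu$, with $\lambda = u/(u-l)$ and $\mu = -ul/(2(u-l))$. The function names are hypothetical, and this is not NNV's implementation.

```python
def bounds(center, gens):
    """Interval bounds of a zonotope given as center + generator list."""
    lows, highs = [], []
    for i, c in enumerate(center):
        radius = sum(abs(g[i]) for g in gens)
        lows.append(c - radius)
        highs.append(c + radius)
    return lows, highs

def affine(center, gens, W, b):
    """Exact image of the zonotope under x -> W x + b."""
    mv = lambda x: [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return [y + bi for y, bi in zip(mv(center), b)], [mv(g) for g in gens]

def relu(center, gens):
    """Over-approximate ReLU applied to every coordinate of the zonotope."""
    lows, highs = bounds(center, gens)
    center = list(center)
    gens = [list(g) for g in gens]
    fresh = []
    for i, (l, u) in enumerate(zip(lows, highs)):
        if l >= 0:              # always active: identity
            continue
        if u <= 0:              # always inactive: output is 0
            center[i] = 0.0
            for g in gens:
                g[i] = 0.0
            continue
        lam = u / (u - l)       # slope of the relaxation
        mu = -u * l / (2 * (u - l))
        center[i] = lam * center[i] + mu
        for g in gens:
            g[i] *= lam
        new_gen = [0.0] * len(center)
        new_gen[i] = mu         # fresh noise symbol for this neuron
        fresh.append(new_gen)
    return center, gens + fresh

# One neuron with input range [-1, 1].
c, g = relu([0.0], [[1.0]])
print(c, g, bounds(c, g))
```

The exact output range of this neuron is $[0, 1]$, while the relaxation yields $[-0.5, 1]$; such slack accumulates across layers, which matches the conservativeness noted above.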

### 3.4 Abstract Domain [singh2019abstract]

NNV implements the abstract domain reachability algorithm proposed in [singh2019abstract] for FFNNs. NNV’s abstract domain reachability algorithm specifies an abstract domain as a star set and uses a “back-tracking” approach to estimate the *over-approximate ranges* of the states. The abstract domain method is more conservative than the star set method.

### 3.5 ImageStar Set

NNV recently introduced a new set representation called the ImageStar for use in the verification of deep convolutional neural networks (CNNs). Briefly, the ImageStar is a generalization of the star set where the anchor and generator vectors are replaced by multi-channel images. The ImageStar is efficient in the analysis of convolutional layers, average pooling layers, and fully connected layers, whereas max pooling layers and ReLU layers consume most of the computation time in reachability analysis of CNNs. NNV implements exact and over-approximate reachability algorithms using the ImageStar for serial CNNs. Since the ImageStar method has not been published yet, we defer its experimental evaluation. In short, using the ImageStar, we can analyze the robustness under adversarial attacks of the real-world VGG16 and VGG19 deep perception networks [simonyan2014very], which consist of millions of parameters.

## 4 Evaluation

The experiments presented in this section were performed on a desktop with the following configuration: Intel Core i7-6700 CPU 3.4GHz 8 core Processor, 64 GB Memory, and 64-bit Ubuntu 16.04.3 LTS OS.

### 4.1 Safety verification of ACAS Xu networks

We evaluate NNV in comparison to Reluplex [katz2017reluplex], Marabou [katz2019marabou], and ReluVal [shiqi2018reluval] by verifying two safety properties of the ACAS Xu neural networks [julian2016policy] on all 45 networks (see Footnote 2).
All the experiments were run using multiple cores for the computation.
The verification results are summarized in Table 2, where SAT denotes that a network is unsafe (an input violating the property exists), UNSAT denotes that it is safe, and UNK denotes unknown.
We note that UNK may occur due to the conservativeness of the reachability analysis scheme.
Detailed verification results are presented in the appendix.
For a fast comparison with other tools, we also tested a subset of the inputs of Properties 1-4 on all 45 networks.
These results are also shown in the appendix.
We note that the polyhedron method [tran2019parallel] times out on most of the networks, and therefore we omit it from the comparison.

Footnote 2: We omitted two further properties for space and due to their long runtimes, but they can be reproduced in the artifact evaluation if desired.

| ACAS Xu, first property | SAT | UNSAT | UNK | Timeout (1h) | Timeout (2h) | Timeout (10h) | Time (s) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Reluplex | 3 | 42 | 0 | 2 | 0 | 0 | 28454 |
| Marabou | 3 | 42 | 0 | 1 | 0 | 0 | 19466 |
| Marabou DnC | 3 | 42 | 0 | 3 | 3 | 1 | 111880 |
| ReluVal | 3 | 42 | 0 | 0 | 0 | 0 | 416 |
| Zonotope | 0 | 2 | 43 | 0 | 0 | 0 | 3 |
| Abstract Domain | 0 | 10 | 35 | 0 | 0 | 0 | 72 |
| NNV Exact Star | 3 | 42 | 0 | 0 | 0 | 0 | 1371 |
| NNV Appr. Star | 0 | 29 | 16 | 0 | 0 | 0 | 52 |

| ACAS Xu, second property | SAT | UNSAT | UNK | Timeout (1h) | Timeout (2h) | Timeout (10h) | Time (s) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Reluplex | 3 | 42 | 0 | 0 | 0 | 0 | 11880 |
| Marabou | 3 | 42 | 0 | 0 | 0 | 0 | 8470 |
| Marabou DnC | 3 | 42 | 0 | 2 | 2 | 0 | 25110 |
| ReluVal | 3 | 42 | 0 | 0 | 0 | 0 | 27 |
| Zonotope | 0 | 1 | 44 | 0 | 0 | 0 | 5 |
| Abstract Domain | 0 | 0 | 45 | 0 | 0 | 0 | 7 |
| NNV Exact Star | 3 | 42 | 0 | 0 | 0 | 0 | 470 |
| NNV Appr. Star | 0 | 32 | 13 | 0 | 0 | 0 | 19 |

Table 2: Verification results for the ACAS Xu networks.

Verification time. For the first property (upper half of Table 2), our exact-star method is about 20.8× faster than Reluplex, 14.2× faster than Marabou, and 81.6× faster than Marabou-DnC (i.e., the divide and conquer method). The approximate star method is about 547× faster than Reluplex, 374× faster than Marabou, 2152× faster than Marabou-DnC, and 8× faster than ReluVal. For the second property (lower half of Table 2), our exact-star method is about 25.3× faster than Reluplex, 18.0× faster than Marabou, and 53.4× faster than Marabou-DnC, while the approximate star method is about 625× faster than Reluplex, 446× faster than Marabou, and 1322× faster than Marabou-DnC.
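The speedup factors are simply ratios of the total times in the last column of Table 2. The short Python sketch below recomputes them; the dictionary and function names are hypothetical, and the numbers are copied from the table.

```python
# Total verification times in seconds, copied from Table 2.
times_upper = {"Reluplex": 28454, "Marabou": 19466, "Marabou DnC": 111880,
               "ReluVal": 416, "NNV Exact Star": 1371, "NNV Appr. Star": 52}
times_lower = {"Reluplex": 11880, "Marabou": 8470, "Marabou DnC": 25110,
               "ReluVal": 27, "NNV Exact Star": 470, "NNV Appr. Star": 19}

def speedups(times, method):
    """How many times faster `method` is than every other tool."""
    return {other: round(t / times[method], 1)
            for other, t in times.items() if other != method}

for label, table in (("upper", times_upper), ("lower", times_lower)):
    for method in ("NNV Exact Star", "NNV Appr. Star"):
        print(label, method, speedups(table, method))
```

A ratio below 1 means the other tool was faster; for example, ReluVal is faster than the exact star method on both properties.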

Conservativeness. The approximate star method is much less conservative than the zonotope and abstract domain methods. This is evident from the fact that it can verify more networks than the zonotope and abstract domain methods, which follows from its tighter over-approximate reachable set. For the first property, the zonotope and abstract domain methods can prove the safety of 2 of the 45 networks (4.4%) and 10 of the 45 networks (22.2%), respectively, while our approximate star method can prove the safety of 29 of the 45 networks (64.4%). For the second property, the zonotope and abstract domain methods can prove the safety of 1 of the 45 networks (2.2%) and 0 of the 45 networks (0%), respectively, while the approximate star method can prove the safety of 32 of the 45 networks (71.1%).

### 4.2 Safety Verification of Adaptive Cruise Control System

To illustrate how NNV can be used to verify/falsify safety properties of learning-enabled CPS, we analyze a learning-based ACC system depicted in Figure 3, in which the ego vehicle has a radar sensor to measure the distance to the lead vehicle in the same lane as well as the relative velocity of the lead vehicle. The ego vehicle has two control modes. In speed control mode, it travels at a driver-specified set speed, and in spacing control mode, it maintains a safe distance from the lead vehicle. We train a multi-layer neural network with ReLU activation functions to control the ego vehicle at a fixed control period.

We investigate safety of the learning-based ACC system with two types of plant dynamics: 1) a discrete linear plant, and 2) a nonlinear continuous plant governed by the following differential equations:

where $x_{lead}$ ($x_{ego}$), $v_{lead}$ ($v_{ego}$), and $\gamma_{lead}$ ($\gamma_{ego}$) are the position, velocity, and acceleration of the lead (ego) vehicle, respectively, $a_{lead}$ ($a_{ego}$) is the acceleration control input applied to the lead (ego) vehicle, and $\mu$ is a friction parameter. To obtain a discrete linear model of the plant, we set the friction parameter to zero and discretize the corresponding linear continuous model using a zero-order hold on the inputs with a sample time equal to the control period.
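The zero-order-hold discretization step can be sketched in pure Python as follows. The per-vehicle dynamics used here ($\dot{x} = v$, $\dot{v} = \gamma$, $\dot{\gamma} = -2\gamma + 2a$ once the friction term is dropped) and the 0.1 s sample time are assumptions made for illustration, borrowed from the form commonly used for this ACC benchmark; they are not taken from the text above. The helper names are hypothetical, and the matrix exponential is a plain truncated Taylor series, which is adequate only for small, well-scaled matrices like this one.

```python
import math

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_exp(m, terms=30):
    """Truncated Taylor series: sum over k of m^k / k!."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def zoh_discretize(A, B, dt):
    """Return (Ad, Bd) with x[k+1] = Ad x[k] + Bd u[k] for a held input.

    Uses exp([[A, B], [0, 0]] * dt) = [[Ad, Bd], [0, 1]].
    """
    n = len(A)
    M = [[A[i][j] * dt for j in range(n)] + [B[i] * dt] for i in range(n)]
    M.append([0.0] * (n + 1))
    E = mat_exp(M)
    return [row[:n] for row in E[:n]], [row[n] for row in E[:n]]

# Assumed friction-free dynamics for one vehicle, state = (x, v, gamma).
A = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, -2.0]]
B = [0.0, 0.0, 2.0]
Ad, Bd = zoh_discretize(A, B, dt=0.1)   # dt is an assumed control period
print(Ad)
print(Bd)
```

Under these assumptions the acceleration state follows the closed form $\gamma[k+1] = e^{-2\,dt}\,\gamma[k] + (1 - e^{-2\,dt})\,a[k]$, which gives a quick sanity check on the computed matrices.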

Verification Problem. The scenario we are interested in is when the two vehicles are operating at a safe distance between them and the ego vehicle is in speed control mode. In this state, the lead vehicle driver suddenly decelerates to reduce the speed. We want to verify whether the neural network controller on the ego vehicle will also decelerate to maintain a safe distance between the two vehicles. To guarantee safety, we require that the relative distance between the two vehicles remain greater than the required safe distance. Our analysis investigates whether the safety requirement holds in the seconds after the lead vehicle decelerates. We consider the safety of the system under bounded sets of initial conditions, with the initial velocity of the lead vehicle, $v_{lead}(0)$, varied across the intervals listed in Table 3.

Verification results.
For linear dynamics, NNV can compute both the exact and over-approximate reachable sets of the ACC system in bounded time steps, while for nonlinear dynamics, NNV constructs an over-approximation of the exact reachable sets and uses it for safety verification.
The verification results for linear and nonlinear models using the over-approximate star method are presented in Table 3, which shows that the safety of the ACC system depends on the initial velocity of the lead vehicle.
When the initial velocity of the lead vehicle is smaller than 27, the ACC system with the discrete plant model is unsafe.
Using the exact star method, NNV can construct a *complete* set of counter-example inputs.
When the over-approximate star method is used, if there is a potential safety violation, NNV simulates the system with random inputs from the input set to find counterexamples.
If a counterexample is found, the system is *UNSAFE*; otherwise, NNV returns a safety result of *UNKNOWN*.
Figure 4 visualizes the reachable sets of the relative distance between the two vehicles versus the required safe distance over time for two cases of initial velocities of the lead vehicle.
We can see that in the first case, the relative distance stays above the required safe distance at all time steps, which means that the system is safe.
In the second case, the relative distance falls below the required safe distance in some control steps, which means that the system is unsafe.
NNV supports a *reachLive* method to perform analysis and reachable set visualization on-the-fly to help the user observe the behavior of the system during verification.

The verification results for the ACC system with the nonlinear model are all UNSAFE, which is surprising. Since the neural network controller of the ACC system was trained with the linear model, it works quite well for the linear model. However, when a small friction term is added to the linear model to form a nonlinear model, the neural network controller’s performance, in terms of safety, is significantly reduced. This raises an important issue for training neural network controllers using simulation data: such controllers may not work in real systems, since there is always a mismatch between the plant model in the simulation engine and the real system.

| v_lead(0) | Linear Plant | Nonlinear Plant |
| --- | --- | --- |
| [29, 30] | SAFE | UNSAFE |
| [28, 29] | SAFE | UNSAFE |
| [27, 28] | SAFE | UNSAFE |
| [26, 27] | UNSAFE | UNSAFE |
| [25, 26] | UNSAFE | UNSAFE |
| [24, 25] | UNSAFE | UNSAFE |

Table 3: Verification results for the ACC system with linear and nonlinear plant models.

Verification times. As shown in Table 3, the approximate analysis of the ACC system with the discrete linear plant model is very fast. It can be done in seconds. We note that NNV also supports exact analysis, which is computationally expensive since it constructs all reachable sets of the system. Because there are splits in the reachable sets of the neural network controller, the number of star sets in the reachable set of the plant increases quickly over time [tran2019emsoft]. In contrast, the over-approximate method computes the interval hull of all reachable star sets at each time step and maintains a single reachable set of the plant throughout the computation. Therefore, the over-approximate method is much faster than the exact method. In terms of plant models, the nonlinear model requires more computation time than the linear one: verification of the linear model using the over-approximate method is faster on average than verification of the nonlinear model.
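The interval-hull step that keeps the over-approximate analysis at a single set per time step can be illustrated with a few lines of Python. This is a sketch under a simplifying assumption: each reachable star set is represented only by its bounding box (a pair of lower and upper bound vectors), and `interval_hull` is a hypothetical name.

```python
def interval_hull(boxes):
    """Smallest axis-aligned box containing every (lows, highs) box."""
    dim = len(boxes[0][0])
    lows = [min(lo[i] for lo, _ in boxes) for i in range(dim)]
    highs = [max(hi[i] for _, hi in boxes) for i in range(dim)]
    return lows, highs

# Two controller output sets produced by a split are merged into one box.
pieces = [([0.0, 0.0], [1.0, 1.0]), ([0.5, -1.0], [2.0, 0.5])]
hull = interval_hull(pieces)
print(hull)
```

Merging trades precision for speed: the hull also covers states that lie in none of the original pieces, which is one source of the conservativeness of the over-approximate method.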

## 5 Related Work

NNV was inspired by many insightful research works in the emerging fields of neural network and machine learning verification. For the “open-loop” verification problem (verification of DNNs), many efficient techniques have been proposed, such as SMT-based methods [pulina2010abstraction, katz2017reluplex, katz2019marabou], mixed-integer linear programming methods [lomuscio2017approach, kouvaros2018formal, dutta2017output], set-based methods [xiang2018output, wang2018formal, wang2018efficient, gehr2018ai, singh2018fast, singh2019abstract, anderson2019pldi], and optimization methods [weng2018towards, zhang2018efficient]. For the “closed-loop” verification problem (NNCS verification), we note that the Verisig approach [ivanov2018verisig] is very efficient for NNCS with nonlinear plants and with Sigmoid and Tanh activation functions. Additionally, the recent regressive polynomial rule inference approach [souradeep2019] is very fast for the safety verification of NNCS with nonlinear plant models and ReLU activation functions. The satisfiability modulo convex (SMC) approach [sun2018formal] is also very promising for NNCS with discrete linear plants as it provides both soundness and completeness properties in verification. ReachNN [huang2019reachnn] is a recent approach that can efficiently control the conservativeness in the reachability analysis of NNCS with nonlinear plants and ReLU, Sigmoid, and Tanh activation functions in the controller. In other learning-enabled systems, falsification and testing-based approaches [dreossi2019cav, tuncali2018simulation, dreossi2017compositional] have shown significant promise in enhancing the safety of systems where perception components and neural network controllers interact with the physical world. Finally, there is significant related work in the domain of safe reinforcement learning [alshiekh2018aaai, verma2018icml, fulton2019tacas, zhu2019pldi], and combining guarantees from NNV with those provided by these methods would be interesting to explore.

## 6 Conclusion and Future Work

We have presented NNV, a toolbox for the verification of DNNs and learning-enabled CPS. Our tool provides a collection of reachability algorithms that can be used to verify the safety (and robustness) of real-world DNNs as well as learning-enabled CPS, such as the ACC system. Our method is comparable to existing methods such as Reluplex and Marabou when dealing with the open-loop verification problem. For closed-loop systems, NNV can compute the exact and over-approximate reachable sets of a NNCS with linear plant models. For a NNCS with a nonlinear plant, NNV can obtain an over-approximate reachable set and use it to verify safety. NNV can also automatically falsify a system, either by constructing counterexamples (using exact analysis) or by searching for them with randomized simulations (in over-approximate analysis).