1. Introduction
Datadriven control systems, especially neuralnetworkbased controllers (Mnih et al., 2015; Lillicrap et al., 2016; Pan et al., 2018), have recently become the subject of intense research and demonstrated great promises. Formally verifying the safety of these systems however still remains an open problem. A NeuralNetwork Controlled System (NNCS) is essentially a continuous system controlled by a neural network, which produces control inputs at the beginning of each control step based on the current values of the state variables and feeds them back to the continuous system. Reachability of continuous or hybrid dynamical systems with traditional controllers has been extensively studied in the last decades. It has been proven that reachability of most nonlinear systems is undecidable (Alur et al., 1995; Henzinger et al., 1998). Recent approaches mainly focus on the overapproximation of reachable sets (Dreossi et al., 2016; Lygeros et al., 1999; Frehse, 2005; Yang et al., 2016; Prajna and Jadbabaie, 2004; Huang et al., 2017a). The main difficulty impeding the direct application of these approaches to NNCS is the hardness of formally characterizing or abstracting the inputoutput mapping of a neural network.
Some recent approaches considered the problem of computing the output range of a neural network. Given a neural network along with a set of the inputs, these methods seek to compute an interval or a box (vector of intervals) that contains the set of corresponding outputs. These techniques are partly motivated by the study of robustness
(Dutta et al., 2018) of neural networks to adversarial examples (Szegedy et al., 2013). Katz et al. (2017)propose an SMTbased approach called Reluplex by extending the simplex algorithm to handle ReLU constraints.
Huang et al. (2017b) use a refinementbylayer technique to prove the absence or show the presence of adversarial examples around the neighborhood of a specific input. General neural networks with Lipschitz continuity are then considered by Ruan et al. (2018), where the authors show that a large number of neural networks are Lipschitz continuous and the Lipschitz constant can help in estimating the output range which requires solving a global optimization problem. Dutta et al. (2018)propose an efficient approach using mixed integer linear programming to compute the exact interval range of a neural network with only ReLU activation functions.
However, these existing methods cannot be directly used to analyze the reachability of dynamical systems controlled by neural networks. As the behavior of these systems is based on the interaction between the continuous dynamics and the neuralnetwork controller, we need to not only compute the output range but also describe the inputoutput mapping for the controller. More precisely, we need to compute a tractable function model whose domain is the input set of the controller and its output range contains the set of the controller’s outputs. We call such a function model a higherorder set, to highlight the distinction from intervals which are order sets. Computing a tractable function model from the original model can also be viewed as a form of knowledge distillation (Hinton et al., 2015) from the verification perspective, as the function model should be able to produce comparable results or replicate the outputs of the target neural network on specific inputs.
There have been some recent efforts on computing higherorder sets for the controllers in NNCS. Ivanov et al. (2018)
present a method to equivalently transform a system to a hybrid automaton by replacing a neuron in the controller with an ordinary differential equation (ODE). This method is however only applicable to differentiable neuralnetwork controllers – ReLU neural networks are thus excluded.
Dutta et al. (2019) use a flowpipe construction scheme to compute overapproximations for reachable set segments. A piecewise polynomial model is used to provide an approximation of the inputoutput mapping of the controller and an error bound on the approximation. This method is however limited to neural networks with ReLU activation functions. We will discuss technical features of these related works in more detail in Section 2 when we introduce the problem formally.Neural network controllers in practical applications could involve multiple types of activation functions (Beer et al., 1989; Lillicrap et al., 2016). The approaches discussed above for specific activation function may not be able to handle such cases, and a more general approach is thus needed.
In this paper, we propose a new reachability analysis approach for verifying NNCS with general neuralnetwork controllers called ReachNN based on Bernstein polynomial. More specifically, given an input space and a degree bound, we construct a polynomial approximation for a general neuralnetwork controller based on Bernstein polynomials. For the critical step of estimating the approximation error bound, inspired by the observation that most neural networks are Lipschitz continuous (Ruan et al., 2018), we present two techniques – a priori theoretical approach based on existing results on Bernstein polynomials and a posteriori approach based on adaptive sampling. By applying these two techniques together, we are able to capture the behavior of a neuralnetwork controller during verification via Bernstein polynomial with tight error bound estimation. Based on the polynomial approximation with the bounded error, we can iteratively compute an overapproximated reachable set of the neuralnetwork controlled system via flowpipes (Zhao, 1992). By the StoneWeierstrass theorem (De Branges, 1959), our Bernstein polynomial based approach can approximate most neural networks with different activation functions (e.g., ReLU, sigmoid, tanh) to arbitrary precision. Furthermore, as we will illustrate later in Section 3, the approximation error bound can be conveniently calculated.
Our paper makes the following contributions.

We proposed a Bernstein polynomial based approach to generate highorder approximations for the inputoutput mapping of general neuralnetwork controllers, which is much tighter than the interval based approaches.

We developed two techniques to analyze the approximation error bound for neural networks with different activation functions and structures based on the Lipschitz continuity of both the network and the approximation. One is based on the theory of Bernstein polynomials and provides a priori insight of the theoretical upper bound of the approximation error, while the other achieves a more accurate estimation in practice via adaptive sampling.

We demonstrated the effectiveness of our approach on multiple benchmarks, showing its capability in handling dynamical systems with various neuralnetwork controllers, including heterogeneous neural networks with multiple types of activation functions. For homogeneous networks, compared with stateoftheart approaches Sherlock and Verisig, our ReachNN approach can achieve comparable or even better approximation performance, albeit with longer computation time.
The rest of the paper is structured as follows. Section 2 introduces the system model, the reachability problem, and more details on the most relevant works. Section 3 presents our approach, including the construction of polynomial approximation and the estimation of error bound. Section 4 presents the experimental results. Section 5 provides further discussion of our approach and Section 6 concludes the paper.
2. Problem Statement
In this section, we describe the reachability of NNCS and a solution framework that computes overapproximations for reachable sets. In the paper, a set of ordered variables is collectively denoted by . For a vector , we denote its th component by .
A NNCS is illustrated in the Figure 1. The plant is the formal model of a physical system or process, defined by an ODE in the form of such that are the state variables and are the control inputs. We require that the function is Lipschitz continuous in and continuous in , in order to guarantee the existence of a unique solution of the ODE from a single initial state (see (Meiss, 2007)).
The controller in our system is implemented as a feedforward neural network, which can be defined as a function
that maps the values of to the control inputs . It consists of layers, where the first layers are referred as “hidden layers” and the th layer represents the network’s output. Specifically, we havewhere and for
are learnable parameters as linear transformations connecting two consecutive layers, which is then followed by an elementwise nonlinear activation function.
is the function mapping from the output of layer to the output layer such that is the output of layer . An illustration of a neural network is given in Fig 1.A NNCS works in the following way. Given a control time stepsize , at the time for , the neural network takes the current state as input, computes the input values for the next time step and feeds it back to the plant. More precisely, the plant ODE becomes in the time period of for . Notice that the controller does not change the state of the system but the dynamics. The formal definition of a NNCS is given as below.
Definition 2.1 (NeuralNetwork Controlled System).
A neuralnetwork controlled system (NNCS) can be denoted by a tuple , where denotes the state space whose dimension is the number of state variables, denotes the control input set whose dimension is the number of control inputs, defines the continuous dynamics , defines the input/output mapping of the neuralnetwork controller, is the control stepsize, and denotes the initial state set.
Notice that a NNCS is deterministic when the continuous dynamics function is Lipschitz continuous. The behavior of a NNCS can be defined by its flowmap. The flowmap of a system , is a function that maps an initial state to the state , which is the system state at the time from the initial state . Given an initial state , the flowmap has the following properties for all : (a) is the solution of the ODE with the initial condition in the time interval ; (b) .
We call a state reachable at time on a system , , if and only if there is some such that . Then, the set of all reachable states is called the reachable set of the system.
Definition 2.2 (Reachability Problem).
The reachability problem on a NNCS is to decide whether a given state is reachable or not at time .
In the paper, we focus on the problem of computing the reachable set for a NNCS. Since NNCSs are at least as expressive as nonlinear continuous systems, the reachability problem on NNCSs is undecidable. Although there are numerous existing techniques for analyzing the reachability of linear and nonlinear hybrid systems (Frehse et al., 2011; Chen et al., 2012; Kong et al., 2015; Duggirala et al., 2015; Althoff, 2015), none of them can be directly applied to NNCS, since equivalent transformation from NNCS to a hybrid automaton is usually very costly due to the large number of locations in the resulting automaton. Even an onthefly transformation may lead to a large hybrid automaton in general. Hence, we compute flowpipe overapproximations (or flowpipes) for the reachable sets of NNCS.
Similar to the flowpipe construction techniques for the reachability analysis of hybrid systems, we also seek to compute overapproximations for the reachable segments of NNCS. A continuous dynamics can be handled by the existing tools such as SpaceEx (Frehse et al., 2011) when it is linear, and Flow* (Chen et al., 2013) or CORA (Althoff, 2015) when it is nonlinear. The challenge here is to compute an accurate overapproximation for input/output mapping of the neuralnetwork controller in each control step, and we will do it in the following way.
Given a bounded input interval , we compute a model where such that for any , the control input belongs the set . is a function of , and denotes the box in each dimension. We provide a summary of the existing works which are close to ours.
Interval overapproximation. The methods described in (Ruan et al., 2018; Xiang and Johnson, 2018) compute intervals as neuralnetwork input/output relation and directly feed these intervals in the reachability analysis. Although they can be applied to more general neuralnetwork controllers, using interval overapproximation in the reachability analysis cannot capture the dependencies of state variables for each control step, and it is reported in (Dutta et al., 2018).
Exact neural network model. The approach presented in (Ivanov et al., 2018) equivalently transforms the neuralnetwork controller to a hybrid system, then the whole NNCS becomes a hybrid system and the existing analysis methods can be applied. The main limitations of the approach are: (a) the transformation could generate a model whose size is prohibitively large, and (b) it only works on neural networks with sigmoid and tanh activation functions.
Polynomial approximation with error bound. In (Dutta et al., 2019), the authors describe a method to produce higherorder sets for neuralnetwork outputs. It is the closest work to ours. In their paper, the approximation model is a piecewise polynomial over the state variables, and the error bound can be well limited when the degrees or pieces of the polynomials are sufficiently high. The main limitation of the method is that it only applies to the neural networks with ReLU activation functions.
3. Our Approach
Our approach exploits highorder set approximation of the output of general types neural network in NNCS reachability analysis via Bernstein polynomials and approximation error bound estimation. The reachable set of NNCS is overapproximated by a finite set of Taylor model flowpipes in our method. The main framework of flowpipe construction is presented in Algorithm 1. Given a NNCS and a bounded time horizon , the algorithm computes flowpipes, each of which is an overapproximation of the reachable set in a small time interval, and the union of the flowpipes is an overapproximation of the reachable set in the time interval of . As stated in Theorem 3.1, this fact is guaranteed by (a) the flowpipes are reachable set overapproximations for the continuous dynamics in each iteration, and (b) the set is an overapproximation of the neuralnetwork controller output. Notice that this framework is also used in (Dutta et al., 2019). However, we present a technique to compute for more general neural networks.
Let us briefly revisit the technique of computing Taylor model flowpipes for continuous dynamics.
Taylor model. Taylor models are introduced to provide higherorder enclosures for the solutions of nonlinear ODEs (see (Berz and Makino, 1998)), and then extended to overapproximate reachable sets for hybrid systems (Chen et al., 2012) and solve optimization problems (Makino and Berz, 2005). A Taylor Model (TM) is denoted by a pair where is a (vectorvalued) polynomial over a set of variables and is an (vectorvalued) interval. A continuous function can be overapproximated by a TM over a domain in the way that for all . When the approximation is close to , can be made very small.
TM flowpipe construction. The technique of TM flowpipe construction is introduced to compute overapproximations for the reachable set segments of continuous dynamics. Given an ODE , an initial set and a time horizon , the method computes a finite set of TMs such that for each , the flowpipe contains the reachable set in the time interval of , and , i.e., the union of the flowpipes is an overapproximation of the reachable set in the given time horizon. The standard TM flowpipe construction is described in (Berz and Makino, 1998), and it is adapted in (Chen, 2015; Chen and Sankaranarayanan, 2016) for better handling the dynamical systems.
The main contribution of our paper is a novel approach to compute a higherorder overapproximation for the output set of neural networks with a more general form of activation functions, including such as ReLU, sigmoid, and tanh. We show that this approach can generate accurate reachable set overapproximations for NNCSs, in combination with the TM flowpipe construction framework. Our main idea can be described as follows.
Given a neuralnetwork controller with a single output, we assume that its input/output mapping is defined by a function and its input interval is defined by a set . In the th (control) step, we seek to compute a TM such that
(1) 
Hence, the TM is an overapproximation of the neural network output. In the paper, we compute as a Bernstein polynomial with bounded degrees. Since is a continuous function that can be approximated by a Bernstein polynomial to arbitrary precision, according to the StoneWeierstrass theorem (De Branges, 1959), we can always ensure that such exists.
In our reachability computation, the set is given by a TM flowpipe. To obtain a TM for the neural network output set, we use TM arithmetic to evaluate a TM for with . Then, the polynomial part of the resulting TM can be viewed as an approximation of the mapping from the initial set to the neural network output, and the remainder contains the error. Such representation can partially keep the variable dependencies and much better limit the overestimation accumulation than the methods that purely use interval arithmetic.
Theorem 3.1 ().
Remark 1 ().
In our framework, Taylor models can also be considered as a candidate of highorder approximation for a neural network’s inputoutput mapping. However, comparing with Bernstein polynomial based approach adopted in this paper, Taylor models suffer from two main limitations: (1) The validity of Taylor models relies on the function differentiability, while ReLU neural networks are not differentiable. Thus Taylor models cannot handle a large number of neural networks; (2) There is no theoretical upper bound estimation for Taylor models, which further limits the rationality of using Taylor models.
3.1. Bernstein Polynomials for Approximation
Definition 3.2 (Bernstein Polynomials).
Let and be a function of over . The polynomials
are called the Bernstein polynomials of under the degree .
We then construct over by a series of linear transformation based on Bernstein polynomials. Assume that . Let , where
and
(2) 
It is easy to see that is defined over . For , let be the Bernstein polynomials of . We construct the polynomial approximation for as:
(3) 
When we want to compute a Bernstein polynomial over a noninterval domain , we may consider an interval enclosure of , since we only need to ensure that the polynomial is valid on the domain and it is sufficient to take its superset. Hence, in Algorithm 1, the Bernstein polynomial(s) in are computed based on an interval enclosure of .
3.2. Approximation Error Estimation
After we obtain the approximation of the neural network controller, a certain question is how to estimate a valid bound for approximation error such that Theorem 3.1 holds. Namely, from any given initial state set , the reachable set of the perturbed system , at any time is a superset of the one of the NNCS with ODE . A sufficient condition can be derived based on the theory of differential inclusive (Smirnov, 2002):
Lemma 3.3 ().
Given any state set , let be the polynomial approximation of with respect to the degree defined as Equation (3). For any time , the reachable set of the perturbed system , is a superset of the one of the NNCS with ODE from , if
(4) 
Intuitively, the approximation error interval has a significant impact on the reachable set overapproximation, namely a tighter can lead to a more accurate reachable set estimation. In this section, we will introduce two approaches to estimate , namely theoretical error estimation and samplingbased error estimation. The former gives us a priori insight of how precise the approximation is, while the latter one helps us to obtain a much tighter error estimation in practice.
Compute a Lipschitz constant. We start from computing the Lipschitz constant of a neural network, since Lipschize constant plays a key role in both of our two approaches, which we will see later.
Definition 3.4 ().
A realvalued function is called Lipschitz continuous over , if there exists a nonnegative real , such that for any :
Any such is called a Lipschitz constant of over .
Recent work has shown that a large number of neural networks are Lipschitz continuous, such as the fullyconnected neural networks with ReLU, sigmoid, and tanh activation functions and the estimation of Lipschitz constant upper bound for a neural network has been preliminary discussed in (Ruan et al., 2018; Szegedy et al., 2013).
Lemma 3.5 (Lipschitz constant for sigmoid/tanh/ReLU (Ruan et al., 2018; Szegedy et al., 2013)).
Convolutional or fully connected layers with the sigmoid activation function , hyperbolic tangent (tanh) activation function , and ReLU activation function have , , as their Lipschitz constants, respectively.
Based on Lemma 3.5, we further improve the Lipschitz constant upper bound estimation. Specifically, we consider a layer of neural network with neurons shown in Figure 2, where and denote the weight and the bias that are applied on the output of the previous layer. Input Interval and Output Interval denote the variable space before and after applied by the activation functions of this layer, respectively. Assume be the Input Interval.
We first discuss layers with sigmoid/tanh activation functions based on the following conclusion:
Lemma 3.6 ().
(Royden, 1968) Given a function , if over , then is Lipschitz continuous and is a Lipschitz constant.
Sigmoid. For a layer with sigmoid activation function with and , we have
(5)  
Hyperbolic tangent. For a layer with hyperbolic tangent activation function with and , we have
(6)  
For ReLU networks, we try to derive a Lipschitz constant directly based on its definition:
ReLU. For a layer with ReLU activation function with and , we have
(7) 
In Algorithm 2, we first do the initialization (line 13) by letting the variable denoting Lipschitz constant . Note that ) denotes the input interval and output interval of layer , as shown in Figure 2. For convenience, we let be the input space . Then we do the layerbylayer interval analysis and compute the corresponding Lipschitz constant (line 410). For each layer , the function ComputeInputInterval is first invoked to compute the (line 6). Then the Lipschitz constant of layer is evaluated by the function ComputeLipchitzConstant, namely Equation (5), (6), (7) in terms of the activation function of this layer (line 7). is updated to the current layer by multiplying (line 8). Finally, is computed by the function ComputeOutputInterval (line 9). Note that the implementation of ComputeInputInterval and ComputeOutputInterval is a typical interval analysis problem of neural networks, which have been adequately studied in (Dutta et al., 2018; Xiang and Johnson, 2018; Ruan et al., 2018). We do not go into details here due to the space limit.
Naive theoretical error (Terror) estimation. After obtaining a Lipschize constant of , we can directly leverage the existing result on Bernstein polynomials for Lipschitz continuous functions to derive the error of our polynomial approximation.
Lemma 3.7 ().
(Lorentz, 2013) Assume is a Lipschitz continuous function of over with a Lipschitz constant . Let and be the Bernstein polynomials of under the degree . Then we have
(8) 
Theorem 3.8 (TError Estimation).
Proof.
Adaptive samplingbased error (Serror) estimation. While Theorem 3.8 can help derive an approximation error bound easily, such bounds are often overconservative in practice. Thus we propose an alternative samplingbased approach to estimate the approximation error. For a given box , we perform a gridbased partition based on an integer vector . That is, we partition into a set of boxes , where is the abbreviation for , and for any ,
It is easy to see that the largest error bound of all the boxes is a valid error bound over .
Lemma 3.9 ().
Assume is a continuous function of over . Let be the polynomial approximation of that is defined as Equation (3) with the degree . Let be the box partition of in terms of , and be the approximation error bound of over the box . We then have ,
Leveraging the Lipschitz continuity of , we can estimate the local error bound for box by sampling the value of and at the box center.
Lemma 3.10 ().
Assume is a Lipschitz continuous function of over box with a Lipschitz constant . Let be the polynomial approximation of that is defined as Equation (3) with the degree . Let
(10) 
be the center of . For , we have
(11) 
Proof.
By (Brown et al., 1987), we know the Bernsteinbased approximation is also Lipschitz continuous with the Lipschitz constant . Then for , we have
Theorem 3.11 (SError Estimation).
Theorem 3.12 (Convergence of ).
Assume is a Lipschitz continuous function of over with a Lipschitz constant . Let be the polynomial approximation of that is defined as Equation (3) with the degree . Let be the exact error, we have
Proof.
Let , we have
Since , when , the theorem holds. ∎
Note that actually specifies the difference between the exact error and Serror. Thus we call it sampling error precision.
[Adaptive sampling] By Theorem 3.12, we can always make arbitrarily small by increasing . Then the error bound is mainly determined by the sample difference over . Thus, if our polynomial approximation regresses well, we can expect to obtain a tight error bound estimation. To bound the impact of , we set a hyper parameter as its upper bound. Specifically, given a box , we can adaptively change to sample fewer times while ensuring .
Proposition 0 ().
Given a box and a positive number , let
Then we have .
In practice, we can directly use as by specifying a small , since Serror is more precise. It is worthy noting that a small may lead to a large number of sampling points and thus can be time consuming. In our implementation, we mitigate this runtime overhead by computing the sampling errors at for each in parallel.
4. Experiments
We implemented a prototype tool and used it in cooperation with Flow*. In our experiments, we first compare our approach ReachNN with the interval overapproximation approach on an example with a heterogeneous neuralnetwork controller that has ReLU and tanh as activation functions. Then we compare our approach with Sherlock (Dutta et al., 2019) and Verisig (Ivanov et al., 2018) on multiple benchmarks collected from the related works. All our experiments were run on a desktop, with 12core 3.60 GHz Intel Core i7 ^{1}^{1}1 The experiments are not memory bounded..
4.1. Illustrating Example
Consider the following nonlinear control system (Gallestey and Hokayem, 2019):
where is computed from a heterogeneous neuralnetwork controller that has two hidden layers, twenty neurons in each layer, and ReLU and tanh as activation functions. Given a control stepsize , we hope to verify whether the system will reach from the initial set . Since the stateofart verification tools focus on NNCSs with singletype activation function and interval overapproximation is the only presented work that can be extended to heterogeneous neural networks, we compare our method with interval overapproximation in the illustrating example.
Figure (a)a shows the result of the reachability analysis with interval overapproximation, while ReachNN’s result is shown in Figure (b)b. Red curves denote the simulation results of the system evolution from different initial states, which are randomly sampled from the initial set. Green rectangles are the constructed flowpipes as the overapproximation of the reachable state set. The blue rectangle is the goal area. We conduct the reachability analysis from the initial set by combining our neural network approximation approach with Flow*. As shown in Figure (a)a, interval overapproximation based approach yields overloose reachable set approximation, which makes the flowpipes grow uncontrollably and quickly exceeds the tolerance ( steps). On the contrary, ReachNN can provide a much tighter estimation for all control steps and thus successfully prove the reachability property.
4.2. Benchmarks
#  ODE  V  Init  Goal  

1  , .  2  0.2  ,  , 
2  , .  2  0.2  ,  , 
3  , .  2  0.1  ,  , 
4  , ,  3  0.1  ,  , 
5  , ,  3  0.2  , ,  , 
6  , , ,  4  0.5  , , ,  , 
#  NN Controller  ReachNN  Sherlock(Dutta et al., 2019)  Verisig  
Act  layers  n  d  ifReach  time  d  ifReach  time  ifReach  time  
1  ReLU  3  20  0.001  0.0009995  Yes(35)  3184  2  Yes(35)  41  –  –  
sigmoid  3  20  0.001  Yes(35)  779  –  –  –  Unknown(22)  –  
tanh  3  20  0.005  Unknown(35)  –  –  –  –  Unknown(22)  –  
ReLU+tanh  3  20  0.01  Yes(35)  589  –  –  –  –  –  
2  ReLU  3  20  0.01  Yes(9)  128  2  Yes(9)  3  –  –  
sigmoid  3  20  0.01  Yes(9)  280  –  –  –  Unknown(7)  –  
tanh  3  20  0.01  Unknown(7)  –  –  –  –  Unknown(7)  –  
ReLU+tanh  3  20  0.001  Yes(9)  543  –  –  –  –  –  
3  ReLU  3  20  0.01  Yes(60)  982  2  Yes(60)  139  –  –  
sigmoid  3  20  0.005  Yes(60)  1467  –  –  –  Yes(60)  27  
tanh  3  20  0.01  Yes(60)  1481  –  –  –  Yes(60)  26  
ReLU+tanh  3  20  0.01  Unknown(60)  –  –  –  –  –  –  
4  ReLU  3  20  0.005  Yes(5)  396  2  Yes(5)  19  –  –  
sigmoid  3  20  0.01  Yes(10)  253  –  –  –  Yes(10)  7  
tanh  3  20 
Comments
There are no comments yet.