# Learning Density Distribution of Reachable States for Autonomous Systems

State density distribution, in contrast to worst-case reachability, can be leveraged for safety-related problems to better quantify the likelihood of the risk for potentially hazardous situations. In this work, we propose a data-driven method to compute the density distribution of reachable states for nonlinear and even black-box systems. Our semi-supervised approach learns system dynamics and the state density jointly from trajectory data, guided by the fact that the state density evolution follows the Liouville partial differential equation. With the help of neural network reachability tools, our approach can estimate the set of all possible future states as well as their density. Moreover, we could perform online safety verification with probability ranges for unsafe behaviors to occur. We use an extensive set of experiments to show that our learned solution can produce a much more accurate estimate on density distribution, and can quantify risks less conservatively and flexibly comparing with worst-case analysis.

## Authors

• 8 publications
• 10 publications
• 1 publication
• 2 publications
• 17 publications
06/04/2021

### Negotiation-Aware Reachability-Based Safety Verification for AutonomousDriving in Interactive Scenarios

Safety assurance is a critical yet challenging aspect when developing se...
11/24/2020

### Prediction-Based Reachability for Collision Avoidance in Autonomous Driving

Safety is an important topic in autonomous driving since any collision m...
03/27/2013

### Predicting the Likely Behaviors of Continuous Nonlinear Systems in Equilibrium

This paper introduces a method for predicting the likely behaviors of co...
10/28/2020

### Data-driven prediction of multistable systems from sparse measurements

We develop a data-driven method, based on semi-supervised classification...
09/17/2017

### Safe & Robust Reachability Analysis of Hybrid Systems

Hybrid systems - more precisely, their mathematical models - can exhibit...
07/30/2021

### Towards the Unification and Data-Driven Synthesis of Autonomous Vehicle Safety Concepts

As safety-critical autonomous vehicles (AVs) will soon become pervasive ...
10/29/2019

### A Hamilton-Jacobi Reachability-Based Framework for Predicting and Analyzing Human Motion for Safe Planning

Real-world autonomous systems often employ probabilistic predictive mode...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Reachability analysis has been a central topic and the key for the verification of safety-critical autonomous systems. The majority of existing reachability approaches either compute worst-case reachable sets [chen2018hamilton, devonport2020data, chen2013flow, tran2020nnv, bansal2020deepreach, liu2019algorithms], or use Monte Carlo simulation to estimate the reachable states with probabilistic guarantees [devonport2020estimating, liebenwein2018sampling, lew2020sampling, devonport2020data]. There are several obvious disadvantages of such methods. 1). Worst-case methods, especially those for nonlinear systems, often over-approximate the reachable sets and produce very conservative results (due to convex set representations [kurzhanski2000ellipsoidal, girard2005reachability, duggirala2016parsimonious, meyer2019tira], wrap effects [bak2014reducing, althoff2008reachability], low-order approximation [chen2013flow, cyranka2017lagrangian]

, etc.), not to say they are usually very computationally expensive (also known as the curse of dimensionality). 2). Existing methods do not care about reachable state concentration and produce the same reachable state estimations for different distributions of initial states if those distributions share the same support

111The probabilistic guarantees of the sampling-based methods do rely on the form of the initial states distribution. However, the final reachable sets estimate is the same for different distributions with the same support.. That is, current approaches do not compute which states are more likely to be reached. For example, in Fig. 1

, the state density distribution of a Van der Pol oscillator evolves over time and concentrates on certain states (the highlighted part in the figure), even if the initial states are uniformly distributed.

If one uses an existing worst-case reachability algorithm, most likely the results in Fig. 1 (b)(c) will show almost the entire black region inside the highlighted part is reachable, as those methods often use convex sets to represent the reachable sets.

In this paper, we aim to tackle the above problems and propose a learning-based method that can compute the density distribution of the reachable states from given initial distributions. The state density is much more powerful than worst-case reachability and can better quantify risks. Our proposed method is based on the Liouville theorem [ehrendorfer1994liouville, nakamura2019density, chen2020optimal], which is from classical Hamiltonian mechanics and asserts that the state density distribution function (which is the measurement of state concentration) is constant along the trajectories of the system. Given an autonomous system that is locally Lipschitz continuous, the evolution of the state density (i.e. the density of state at time ) is characterized by a Liouville partial differential equation (PDE). We learn the state density as a neural network (NN) while respecting the laws of physics described by the nonlinear Liouville PDE, in a way that is similar to the physics-informed NN [raissi2019physics].

Furthermore, we make two major improvements to make the learned density NN suitable for verification. 1). Instead of learning for a fixed initial distribution , we learn the density concentration function which specifies the multiplicative change of density from any . 2). We use a single NN to jointly learn both the reachable state and its corresponding density. Moreover, we use Reachable Polyhedral Marching (RPM) [vincent2020reachable]

—an exact ReLU NN reachability tool—to parse our learned NN as linear mappings from input polyhedrons to output polyhedrons. Using such parsed polyhedrons, we can perform online forward and backward reachability analysis and get the range of density bounds

for each output polyhedron. Together, our method can perform online safety verification by computing the probability of safety (instead of a single yes or no answer) under various initial conditions, even with unknown system dynamics (i.e. a black-box system where we only have access to a simulator of ). In this way, it also has the potential to collect environmental data in run-time and update its distribution for online safety verification.

We conduct experiments on 10 different benchmarks covering systems from low dimension academic examples to high dimension black-box simulators equipped with either hand-crafted or NN controllers. Surprisingly, without using any ground-truth density data in the learning process, our approach can achieve up to 99.89% of error reduction in KL divergence with respect to the ground-truth value,

when compared to sampling-based methods like kernel density estimation and Gaussian Processes

. Moreover, our learned density concentration function can also be used for reachability distribution analysis. We show that in several systems, more than 90% of the states can actually just reside in a small region (less than 10% of the volume of the convex hull for those states) in the state space, which also points out the conservativeness of the worst-case reachability analysis methods in terms of quantifying risks. We also show that different initial distributions can lead to a drastic change in the safety probability, which can help in cases when unsafe is inevitable but can be designed to happen with a very low probability.

Our major contributions are: (1)

we are the first to provide an explicit probability density function for reachable states of dynamical systems characterized as ordinary differential equations; this density function can be used for online safety analysis

, (2) we propose the first data-driven method to learn the state density evolution and give accurate state density estimation for arbitrary (bounded support) initial density conditions without retraining or fine-tuning, and (3) we use a variety of examples to show the necessity to perform reachability distribution analysis instead of pure worst-case reachability, to flexibly and less conservatively quantify safety risks.

#### Related work

Reachability analysis, especially worst-case reachability using sampling-based methods or set propagation, has been studied for decades. The literature on reachability has been extensively studied in many surveys [chen2018hamilton, liu2019algorithms, arch, agha2018survey]. Here we only discuss a few closely related works.

Hamilton Jacobian PDE has been used to derive the exact reachable sets in [mitchell2005time, chen2018hamilton, bansal2020deepreach]. However, the HJ-PDE does not provide the density information. Many data-driven approaches can compute probabilistic reachable sets using scenario optimization [devonport2020estimating, xue2020pac], convex shapes  [liebenwein2018sampling, lew2020sampling, berndt2021data]

[allen2014machine, rasmussen2017approximation], kernel embedding [thorpe2020data][chakrabarty2020active], and Gaussian process [devonport2020data]. However, the probabilistic guarantees they provide are usually in the form that with enough samples, instead of the state density distribution. [fridovich2020confidence] estimates human state distribution using a probabilistic model but requires state discretization. [majumdar2014convex] uses the Liouville equation to maximize the backward reachable set for only polynomial system dynamics. In [abate2007probabilistic]

the authors compute the stochastic reachability by discretizing stochastic hybrid systems to Markov Chains (MC), then perform probabilistic analysis on the discretized MC. The closed-form expression of the probability requires integrals over the whole state space hence is computation-heavy and cannot be used for online safety check.

The closest to ours is [matavalam2020data]

where Perron-Frobenius and Koopman operators are learned from samples of trajectories. Then, the learned operators can be used to transform the moments of the distribution over time. The distribution at time

is then recovered from the transformed moments. However, as they use moments up to a specific order to represent a distribution, even if the learned operators are perfect, the estimation error of the distribution might not be zero. Also, such a moment-based method is hard to scale to large-dimensional systems. Recently, there is also a growing interest in studying the (worst-case) reachability of NN [katz2017reluplex, katz2019marabou, xiang2018output, yang2020reachability, vincent2020reachable] or systems with NN controllers [ivanov2019verisig, tran2020nnv, fan2020reachnn, hu2020reach, everett2021efficient]. In this paper, we use [vincent2020reachable] to parse our learned NN as a set of linear mappings between polyhedrons for online reachability computation, but this step can be replaced with other NN reachability tools.

To measure the probability distribution in the reachable sets, the most naïve approach is to use histograms or kernel density estimation

[chen2020optimal], similar to the Monte-Carlo method used in dispersion analysis [spencer1996mars]. But this could lead to poor accuracy and computational scalability [niederreiter1992random]. Another approach propagates the uncertainty by approximating the solution of the probability density function (PDF) transport equation [pantano2007least], which could still be time-consuming due to the optimization process performed at each time step. Our approach finds the PDF transport equation by solving the Liouville PDE 222In the absence of process noise, this PDF transport equation reduces to stochastic Liouville equation [ehrendorfer1994liouville] using NN. Similar ideas have been explored in [nakamura2019density]. However, the density NN in [nakamura2019density] was learned solely for a fixed initial states distribution and therefore, cannot be used for online prediction. Instead, we jointly learn the reachable states and density changes, and can perform online reachability density computation for any initial states distribution.

The idea of using deep learning to solve PDEs can be traced back to the 1990s

[lee1990neural, lagaris1998artificial, uchiyama1993solving], where the solutions of the PDE on a priori fixed mesh are approximated by NN. Recently, there is a growing interest in the related sub-fields including: mesh-free methods in strong form [raissi2019physics, sirignano2018dgm, berg2018unified] and weak form [weinan2018deep, he2018relu], solving high-dimension PDEs via BSDE method [han2020algorithms], solving parametric PDE [khoo2021solving, kutyniok2021theoretical] and stochastic differential equations [zhang2019quantifying, yang2018physics], learning differential operators[li2020fourier, long2018pde, lu2019deeponet], and developing more advanced toolboxes [koryagin2019pydens, lu2021deepxde, chen2020neurodiffeq]. Our idea for solving Liouville PDE along trajectories is similar to [han2020algorithms] but without the stochastic term.

## 2 Preliminaries

We denote by , , , the sets of real numbers, non-negative real numbers,

-dimensional real vectors, and

real matrices. We consider autonomous dynamical systems of the form , where for all , is the state and is a compact set that represents the state space. We assume that is locally Lipschitz continuous. The solution of the above differential equation exists and is unique for a given initial condition . We define the flow map such that (also written as for brevity) is the state at time starting from at time . Note that system parameters can be easily incorporated as additional state variables with time derivative to be .

We analyze the evolution of the dynamical system by equipping it with a density function , which measures how states distribute in the state space at a specific time instant. A larger density means the state is more likely to reside around at time , and vice versa. The density function is completely determined by the underlying dynamics (i.e., function ) and the initial density map . Specifically, given a , the density function solves the following boundary value problem of the Liouville PDE [ehrendorfer1994liouville]:

 ∂ρ∂t+∇⋅(ρ⋅f)=0,   ρ(x,0)=ρ0(x), (1)

where is the divergence for the vector field , and takes the -th coordinate of a vector.333A general form of Liouville PDE can have a non-zero term on the right-hand side indicating how many (new) states appear or exit from the system during the run time [chen2020optimal]. In all the systems we discuss here, there is no state entering (other than the initial states) or leaving the system, so the right-hand side of Eq. (1) is zero. The total density of the systems we consider is invariant over time. Intuitively, as shown in [ehrendorfer1994liouville], Liouville PDE is analogous to the mass conservation in fluid mechanics, where the change of density at one point is balanced by the total flux traversing the surface of a small volume surrounding that point. It is hard to solve the Liouville PDE for a closed from of the density function . However, it is relatively easy to evaluate the density along a trajectory of the system. To do that, we first convert the PDE into an ODE as follows. Considering a trajectory , the density along this trajectory is an univariate function of , i.e., . If we consider the augmented system with states , from Eq. (1) we can easily get the dynamics of the augmented system [chen2020optimal]:

 [˙x˙ρ]=[f(x)−∇⋅f(x)ρ]. (2)

Therefore, to compute the density at an arbitrary point at time , one can simply proceed as follows: First, find the initial state using the inverse dynamics . Then, solve the Eq. (2) with initial condition . The solution at time just gives the desired density value. However, such a procedure only gives the density at a single point, therefore, cannot be used in reachability analysis. Instead, we need to compute the solution of Eq. (2

) for a set of initial conditions and use that to compute the reachable sets. To achieve this, we use an NN with ReLU (Rectified Linear Unit) activation functions

[nair2010rectified] to jointly approximate the flow map and the density concentration function, as will be shown in the next section.

## 3 Density learning and online reachability density computation

Let us take a closer look at Eq. (2). For an initial condition , the closed form of the solution is Interestingly, the solution of is linear in the initial condition . The gained part denoted by is a function of the initial state and time , but is independent of . This mapping is completely determined by the underlying dynamics and we call it the density concentration function. With in hand, the density at an arbitrary time and state can be quickly computed from any . However, is obviously hard to compute. Therefore, we use NN to approximate the density concentration function. In addition to , we also use NN to learn the flow map , which will be a necessity for computing the reachable set distribution shown in Sec. 3.2.

### 3.1 System dynamics and density learning framework

Let the parameterized versions of the flow map and the density concentration function be and respectively, where and are parameters. To train the neural network, we construct a dataset by randomly sampling trajectories of the system: in time steps (with time interval ): where . Then, the goal of the learning is to find parameters and satisfying

 ⎧⎨⎩Φω(xi0,kΔt)−xik=0,∀i,k,∂Gθ(xi0,kΔt)∂t+Gθ(xi0,kΔt)⋅(∇⋅f(xik))=0,∀i,k, (3)

where the first constraint is for the flow map estimation, and the second constraint enforces the Liouville equation for all the data points.

As for the implementation, we model and jointly as a fully-connected neural network with ReLU activations. To ensure numerical stability as is an exponential function, we add a nonlinear transform from the NN output to the density concentration function:

 {Gθ(x0,t)=exp(t⋅NN[0:1](x0,t))=exp(t⋅z(x0,t)),Φω(x0,t)=NN[1:n+1](x0,t), (4)

where is to choose the -th dimensions from the output of the NN, and is the intermediate density estimation from the NN. In this way, we guarantee that the density concentration function at is always 1. We optimize our NN via back propagation

with the loss function:

 L=λ⋅∑i,k[Φω(xi0,kΔt)−xik]2+∑i,k[˙Gθ(xi0,kΔt)+Gθ(xi0,kΔt)(∇⋅f(xik))]2, (5)

where the first term denotes the state estimator square error [chai2014root], the second term indicates how far (in the sense of L2-norm) the solution deviates from the Liouville Equation, and balances these two loss terms. We approximate the time derivative of the density concentration function by . Our method can also work for black-box systems if we approximate

numerically. With some tools from statistical learning theory, we can show that with a large enough number

of samples, the learned flow map and the density concentration function can be arbitrarily accurate. A formal proof is shown in Appendix A.

### 3.2 System reachable set distribution computation via NN Reachability Analaysis

In the last section, we have learned an NN that can estimate the density at a single point given the initial density. In this section, we will boost this single-point estimation to set-based estimation by analyzing the reachability of the learned NN. To compute the set of all reachable states and the corresponding density from a set of initial conditions, we use the Reachable Polyhedron Marching (RPM) method [vincent2020reachable] to further process the learned NN for and . RPM is a polyhedron-based approach for exact reachability analysis for ReLU NNs. It partitions the input space into polyhedral cells so that in each cell the ReLU activation map does not change and the NN becomes a fixed affine mapping. With the cells and the corresponding affine mapping on each cell, the exact reachable set for a set of input can be quickly evaluated. Backward reachability analysis can also be performed by computing the intersection of the pre-image of a query output set with the input polyhedron cells.

System forward reachable set with density. Recall that the input of the NN in Sec. 3.1 consists of the initial state and time . For the simplicity of comparison with other methods, we only estimate the reachable set and the density at given fixed time instances and fix the last element of the input of the NN. Thus for each time step t, the input polyhedral cells generated from the RPM will be a set of linear inequality constraints on , and those input cells together with the set of the affine mappings and output polyhedral cells can be represented as: , where in each input cell , the NN becomes an affine mapping , and thus the image of the input cell is also a polyhedron .

Recall that the first dimension of our NN output estimates the density concentration function , and the rest dimensions estimate the state , thus the output cell can be written as . By projecting it to the state space, we get the reachable set of the system, i.e., . Then, in each cell , we evaluate the lower and upper bounds of , and denote them by and . The density bound for cell is then computed as

 {ρk,min=ρ0(¯xk)⋅exp(t⋅zk,min),ρk,max=ρ0(¯xk)⋅exp(t⋅zk,max), (6)

where is the center of . Finally the system forward reachable set is a union of projected output polyhedral cells: where each cell is associated with a density bound .

System reachable set probability computation. Given an initial state distribution, we want to figure out the probability distribution of the system forward reachable sets, as well as the probability for the states land into a query set (e.g., the query set could be the unsafe region).

For an arbitrary initial probability density function (whose support is bounded), we can apply RPM to partition its support into cells444We can always further divide those input cells to make the bound tighter/ more precise, while still guaranteeing the Neural Network on each cell can be seen as an affine transformation. as in the last section. Finally, we obtain the reachable sets with bounded state densities and the probability bound in each cell is:

 {Pmink=Vol(Hk)ρmink,Pmaxk=Vol(Hk)ρmaxk, (7)

where computes the volume for a polyhedron. By computing for all input cells , we can derive the system forward reachable set and the corresponding probability bound as . The backward reachable set probability can be computed in a similar fashion, by checking the intersection between the query output region and the output cells derived by RPM, computing each intersection’s probability range by its volume and density bound and finally aggregating the probability of all intersections. Detailed computation is shown in Appendix B. 555The RPM method cannot handle systems with higher than 4 dimensional state space in our experiments. But the technique discussed in Sec. 3.2 can work with any set-based NN reachability tools with slight modification based on the set presentation of the tool. Developing a better NN reachability tool that can scale to higher dimensional systems is another topic and is out of the scope of our paper.

## 4 Experimental evaluation in simulation

Here we show the benefits of learning density concentration function from Liouville PDE in state density and reachable set distribution estimation. More experiments are provided in Appendix CH.

### 4.1 Implementation details

Datasets: We investigate 10 benchmark dynamical systems as reported in Table 1. These benchmark systems range from low dimension academic models (23 dimensions) to complex and even black-box systems (1316 dimensions) controlled by handcrafted or NN controllers. All controllers (except for the ground robot navigation example) are from the original references. More details (the model description and initial distributions) will be provided in Appendix C.

Training: For each system, we generate 10k trajectories through simulation with varied trajectory lengths from 10 to 100 time-steps, depending on different configurations of the simulation environment. We use 80% of the samples to train the NN as described in Sec. 3.1

and use the rest for validation. For the trajectory data, we collect the system states and compute for the system divergence term. For black-box systems, we use the gradient perturbation method to approximate the derivatives. We use feed-forward NN which has 3 hidden layers with 64 hidden units in each layer. We use PyTorch

[paszke2019pytorch] to train the NN and the training takes 12 hours on an RTX2080 Ti GPU.

### 4.2 Density estimation verification

We first test the density estimation accuracy of our learned NN. We compare our approach with other baselines including kernel density estimation (KDE), Sigmoidal Gaussian Process Density (SGPD) [donner2018efficient] and the histogram approach. For each simulation scenario, we first solve Eq. (2) to generate 20k 100k trajectories of (state, density) pairs, and treat this density value as the ground truth. For the KDE method, we choose an Epanechnikov kernel. We then measure the KL divergence between the density estimate of each method and the ground truth. As shown in Fig. 2, our approach has consistently outperformed KDE, SGPD and histogram approaches, with the largest reduction of 99.69% in KL divergence when compared with the histogram approach for the Kraichnan-Orszag system, while our method doesn’t use any ODE generated density values during training. Also in high-dimension systems (dimension ), the histogram approach fails to predict the density due to the curse of the dimensionality, whereas our approach can always predict the density, with a 30.13% to 99.87% decrease in KL divergence comparing to KDE. More plots will be given in Appendix D.

### 4.3 Reachable set distribution analysis

#### Forward reachable set distribution analysis.

Being confident that our approach is able to provide an accurate state density estimation, we extend our NN to do distribution analysis, which is a valuable technique in safety-related applications like autonomous driving. Here we use an existing reachability tool RPM [vincent2020reachable] to compute the forward reachable sets with probability bounds. Details about how to derive the density and probability bound for the reachable sets are presented in Sec. 3.2 and in Appendix B. Also, RPM was only able to parse the NN for Van der Pol, Double integrator, ground robot navigation, and the FACTEST car model. It fails in handling other high-dimension complex systems due to numerical issues when partitioning for the input set. Thus, we only report the results on those 4 models in this section. The main purpose is to show that for some systems the density tends to concentrate on certain states, where a small portion of the reachable sets contains the majority of states that are more likely to be reached. Therefore, our method can better quantify risks than worst-case reachability, by providing a flexible threshold for the probability of reachability.

We start with the Van der Pol oscillator. The initial states are uniformly sampled from a square region: . As illustrated in Fig. 1, the system states will gradually converge to a limit cycle. Using worst-case reachability analysis for this system will result in a very conservative over-approximation, and this over-approximation will propagate over time and lead to increasing conservativeness of the reachable set estimation. As shown in Fig. 3(a), for the worst-case methods like DryVR(red) [fan2020fast, du2020online] and GSG(Green) [everett2020robustness], the volume of their estimated reachable set relative to the volume of the convex hull of the system states keeps increasing over time, from 1.9670X to 5.3027X for the DryVR [fan2020fast] approach, and from 1.7636X to 2.8086X for GSG [everett2020robustness]. However, our method can give the probability bound for every reachable set in the state space as shown in the heatmap in Fig. 3(b), clearly identifying the region around the limit cycle in high density (bright color), and the rest space in low density (dark color).

We can also compute the relative volume of the reachable sets (comparing to the convex hull) preserving different levels of reachable probability, whose evolution over time reflects the system’s tendency for concentration. As shown in Fig. 4, we use the above 4 systems and study the volume of the reachable set with probability threshold 0.50, 0.70, 0.80, 0.90, and 0.99. As expected, the relative volume will increase as the probability threshold increases. In all cases, there exist some time instances where a small volume of the reachable set actually preserves high probability, which shows the state concentration exists in many existing systems. While as shown in Fig. 4, the worst-case reachability tools can only generate one curve which presents the (relative) volume of the reachable set that covers all possible states. Not surprisingly, the worst-case reachability tools give very conservative results. We believe using our proposed method to do reachable set distribution analysis will benefit future study for systems with uncertainty, and for systems where the failure case is inevitable but happens with a low probability. More comparisons with state-of-the-art reachability methods (Verisig [ivanov2019verisig], Sherlock [dutta2017output] and ReachNN [fan2020reachnn]) are shown in Appendix H.

#### Online safety verification under different initial state distributions.

Since our approach learns the density concentration function instead of absolute density value, it has the flexibility to estimate online reach sets distribution with any possible (bounded support) initial distributions. Consider the safety verification for the ground robot navigation problem (shown in Fig. 5(a)): The probability of colliding with a obstacle in the center of the map is determined by the initial state distribution 666How to compute the probability of a reachable set is discussed in Sec.3.2

, which is parametrized as a truncated Gaussian distribution

where and

measure the expectation and uncertainty of the initial robot state. Our method can estimate the upper and lower bounds for the probability of colliding with the obstacle, and this safety evaluation process can run in faster than 50Hz with parallel computation and heuristic searching used (see details in

Appendix E). As shown in Fig. 5(b), when the initial state uncertainty decreases from to , the upper and lower bounds for the probability of collision decrease to close to zero (from to , where other worst-case reachability analysis methods can only report a collision is inevitable, without quantifying the corresponding risks. This also shows the advantage of our approach in adapting to different density conditions in computation without retraining or fine-tuning.

#### Limitations and trade-offs of performing reachable set distribution analysis.

Experimental results show that our method can compute much less conservative probabilistic reachable sets than most worst-case reachability methods. This less conservative result benefits from RPM which can provide exact NN reachability analysis by sacrificing scalability. Therefore RPM also constrains us from performing online reachability analysis for high-dimensional systems or systems with large initial sets. Technically, we are solving a harder problem than worst-case reachability approximation, as we need not only the reachable set, but also the density over those reachable states. Like worst-case analysis, this is the reason why it can only scale to lower-dimensional systems and smaller initial sets when we want to perform accurate reachable computation. We believe that our approach can better quantify risks under different conditions, especially when unsafe is inevitable (similar to Fig. 5(a)). Our method can also give worst-case reachability by taking all output reachable cells produced by RPM regardless of their density. This worst-case reachability using RPM is less conservative than other NN reachability tools, at the cost of not scaling to high-dimensional systems.

## 5 Conclusion and discussion

In this paper, we propose a Neural Network (NN)-based probabilistic safety verification framework that can estimate state density, compute reachable sets and corresponding probability. Our Liouville-based NN can accurately estimate the state density even for high-dimension systems. Our probabilistic reachable set framework can handle nonlinear (and potentially black-box) systems with varying initial state distributions and can be used for fast online safety verification. We recognize that the task of computing probabilistic reachable sets is very useful, and our method is more helpful than worst-case reachability particularly when the system states are more likely to concentrate. One limitation of our approach is that the NN reachability tool we used (RPM) cannot handle high-dimension systems or cases where the initial set is very large, due to the numerical issues when partitioning for polyhedral cells. This limitation is due to the scalability and accuracy trade-off of NN reachability, which is an independent problem from our paper. We plan to explore other NN reachability methods and more complicated hybrid systems in real-world applications. The NASA University Leadership initiative (grant #80NSSC20M0163) and Ford Motor Company provided funds to assist the authors with their research, but this article solely reflects the opinions and conclusions of its authors and not any NASA or Ford entity.

## A Generalization error bound for the learning framework

With sufficient amount of data and a large enough neural network, we can approximate the state and density estimation at arbitrary small errors [leshno1993multilayer]. In the language of statistical learning theory, the neural network generating functions is called a hypothesis and denoted by . The set containing all the possible hypotheses is called the hypothesis class . For a hypothesis generating , we denote where is an error tolerance term which is further used to derive the probabilistic guarantee. Assume that the optimization problem in Eq.(3) is feasible, and and solve Eq. (3). Let be the hypothesis that generates . Furthermore, assume that , and denote the sample distribution (where the training sample trajectories are sampled from). Then according to Theorem 5 in [srebro2010smoothness], the following statement holds with probability at least over a training data set consisting of i.i.d. random trajectories:

 P(Eξ∼Dl(^hN,ξ)>0)≤K(log3Nγ2R2N(H)+2log(log(4Bl/γ)/δ)N) (8)

where is a universal constant, and is the Rademacher complexity for defined as:

 RN(H)=supξ1,ξ2,⋯,ξN[Eσ[suph∈H1NN∑i=1σil(h,ξi)]] (9)

where

are i.i.d. random variables with

.

Remarks: Here we reduce bounding the generalization error to bounding the Rademacher complexity , where can be further bounded as for Lipschitz parametric function classes (including neural networks) where denotes the number of learnable parameters [boffi2020learning][Theorem 4.2.]. In this way, we show that for a fixed error threshold , as the number of training samples increases, the probability that our learning framework fails to satisfy the Liouville equation or fails to estimate the system dynamics will gradually decrease to zero. We show an empirical result to support this in Figure 1. For the Van der Pol Oscillator benchmark example, we train the neural network with different numbers of training samples (from ) and report the testing error (mean square error for the state estimation and density concentration function comparing to the groundtruth) for a fixed testing set. As the number of training samples increases, the testing error gradually converges to zero.

Assume the functions on the right hand side of Eq. (2) are uniformly Lipschitz continuous in , then the function will have a unique solution according to Picard-Lindelöf theorem[coddington1955theory][Theorem I.3.1]. Then if our estimator satisfies the Liouville equation everywhere, we can recover the groundtruth density concentration function as well as the system dynamics.

## B Implementation details for system reachable set probability computation using RPM

### b.1 Online query set probability bound computation under different initial state distributions

The problem formulation is: given a query set with density concentration function constraints (the range that the density concentration function can change from the initial condition to the terminal condition; if this constraint is not specified, the default value is ), compute the probability that the system will reach this query set (with optional density constraints).

In our case, when using RPM to compute the reachable sets, we represent as a polyhedron, and since is a set of linear inequality constraints, the set is also a polyhedron. At each time step , from Sec. 3.2 we can represent the NN input cells, affine mapping and output cells at this time step as (here we omit the subscript for for the brevity in the notation) where each input cell is a polyhedron , with an affine mapping and the resulting output cell is also polyhedron . Then for each output cell , we check for the intersection between the query cell and the output cell . Next we can derive the intermediate density concentration function bound on

by solving the following Linear Programming problem (here taking

as an example; to solve we just need to change the “min” to “max” in the objective function in Eq. 10; and here denotes the first coordinate of , thus as we denote ):

 {min [y]0s.t. y∈Mqk (10)

After we derive the bound for on , the density bound for is computed as (similar to Eq.(6)):

 {ρk,min=ρ0(~xk)et⋅zk,minρk,max=ρ0(~xk)et⋅zk,max (11)

where is the center of . And the probability bound can be computed by and where the is the volume for the intersection. Finally the probability of the system reach this query set at time is bounded by . An illustrative figure is shown in Fig. 2

Remarks: This algorithm can be used for online safety verification under different initial state distributions by just representing the dangerous set in , and changing the function in (11) on the fly. Here we approximate the density distribution in using the density evaluated at which is the center of - the accuracy of this approximation will converge to 1 as the partition on gets finer.

### b.2 Backward reachable set probability computation

The problem formulation is: given a query set with density concentration function constraints (the range that the density concentration function can change from the initial condition to the terminal condition; if this constraint is not specified, the default value is ), compute for all possible initial conditions as well as probabilities that lead the system to reach the query set (with optional density constraints).

Similar to Sec. B.1, we can denote this query set as . At each time step , the NN input cells, affine mapping and output cells are where each input cell is a polyhedron , with an affine mapping and the resulting output cell is also polyhedron . Then for each output cell , we check for the intersection between the query cell and the output cell . Using the affine mapping with invertible 777In practice, is in high probability to be invertible. This is because the set of all non-invertible random matrices forms a hyper-surface with Lebesque measure zero. When is singular, we can use elimination method like Fourier-Motzkin elimination as in [vincent2020reachable] to derive the set representation in the input side., we can derive the pre-image of this intersection to be . Thus the reachable set can be computed using projection: and the corresponding probability is where is the initial state distribution function and is the center of . By performing this for all output cells and for all time steps , we derive the backward reachable set .

### b.3 Speed up the probability computation by using hyper-rectangle heuristic

The computation in both Sec. B.1 and Sec. B.2 requires checking the intersection between polyhedral and , where one approach is to check whether a feasible solution exists for the linear programming problem : . Solving this for requires time when the interior method is used. To speed up the intersection checking process, we introduce a hyper-rectangle heuristic: at the pre-processing stage, we over-approximate each polyhedron by its outer hyper-rectangle (derived by computing the range for the vertices of in each dimension). When checking for the polyhedron intersection between and , we first check whether their corresponding hyper-rectangles and will intersect. If and do not intersect, then it is guaranteed that the polyhedra and won’t intersect. Otherwise, we further check the intersection of and by using the interior method. Checking hyper-rectangles’ intersection can be implemented in , hence greatly accelerates the computation process. A detailed computation time comparison will be presented in Sec. E.

## C Simulation environments

In this section, we present the implementation details for all 10 simulation environments used in our main paper, sorted in the same order as shown in Table. 2.

### c.1 Van der Pol Oscillator

Consider the Van der Pol Oscillator problem: where the position variable is a function of and the scalar parameter indicates the strength of the system damping effect. By doing a transformation: , the original problem can be shaped to the following 2d system dynamics:

 {˙x=y˙y=μ(1−x2)y−x (12)

where the divergence term used in (2) can be computed as: . In the simulation, we set , the initial state distribution as an uniform distribution and the time step duration . We run each simulation for 50 time steps to collect the trajectories.

### c.2 Double Integrator with an NN controller

We consider a discrete double integrator system introduced in [hu2020reach]:

 (xt+1yt+1)=[1101](xtyt)+[0.51]ut (13)

where denotes the 2d state variable, and is the output of a neural network controller which is trained to mimic the behavior of an MPC controller [hu2020reach, everett2021efficient]. We convert the system to the continuous system with state and time step duration as :

 ˙(xy)=[0100](xy)+[0.51]u (14)

and here the divergence term used in (2) can be computed as: , where the is the gradient of the neural network controller output with respect to the input (and similar for and ) and can be calculated using automatic differentiation engine in PyTorch [paszke2019pytorch]. We set the initial state distribution as an uniform distribution . Similar to [hu2020reach, everett2021efficient], we run each simulation for 10 time steps to collect the trajectories.

### c.3 Kraichnan-Orszag system

The system dynamics of the Kraichnan-Orszag problem [orszag1967dynamical, nakamura2019density] is defined as:

 ⎧⎨⎩˙x1=x1x3˙x2=−x2x3˙x3=−x21+x22 (15)

and here an interesting fact is that the divergence term used in (2) is just: , which means the density along each trajectory won’t change over time, and only depends on the initial state distribution. Similar to [nakamura2019density], we set the initial state distribution as an Gaussian distribution with:

 ⎧⎪⎨⎪⎩x1(0)∼N(1,1/42)x2(0)∼N(0,1/22)x3(0)∼N(0,1/22) (16)

where we further truncate the initial state within the range . We set the time step duration and run each simulation for 80 time steps to collect the trajectories.

### c.4 Inverted pendulum

The inverted pendulum problem [chang2020neural] is defined as , where denotes the pendulum’s relative angle to the the up-right position, are pre-defined parameters and denotes the output of an LQR controller [chang2020neural] where and are scalar-valued coefficients. To test for the system performance under different coefficient settings for the LQR controller, we include , into the system state variable and study the following system dynamics:

 ⎧⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪⎩˙θ=ω˙ω=1m⋅L2(mgLsinθ−bω+u)˙k1=0˙k2=0 (17)

where . Now the divergence term used in (2) can be computed as: . Based on  [chang2020neural], we set , the time step duration . We set the initial state distribution as a uniform distribution and run each simulation for 50 time steps to collect the trajectories.

### c.5 Ground robot navigation with an NN controller

We design a ground robot navigation experiment (as shown in Fig. 3), where the objective is to reach the green region while avoiding to enter the red region . The robot is following an Dubins car model:

 ⎧⎪ ⎪ ⎪⎨⎪ ⎪ ⎪⎩˙x=vcosθ˙y=vsinθ˙θ=uw˙v=ua (18)

where represent robot’s x and y position, heading angle and velocity respectively. We use an NN controller to output control signals . The NN controller is a feedforward NN with 2 hidden layers and 32 hidden units in each layer. We use ReLU for the intermediate activation functions and use Tanh as the activation function for the last layer to make sure the control output is always bounded. During training, we use this NN controller to collect trajectory data and do back-propagation with the loss function: where . Here the divergence term used in (2) can be computed as: . We set the initial state distribution as an uniform distribution . We run each simulation for 50 time steps with time duration to collect the trajectories.

### c.6 FACTEST car tracking system

Consider a rearwheel kinematic car in 2D scenarios where the dynamics is:

 ⎡⎢⎣˙x˙y˙θ⎤⎥⎦=⎡⎢⎣cos(θ)0sin(θ)001⎤⎥⎦[vω] (19)

and the corresponding errors are measured by:

 ⎡⎢⎣exeyeθ⎤⎥⎦=⎡⎢⎣cos(θ)sin(θ)0−sin(θ)cos(θ)0001⎤⎥⎦⎡⎢ ⎢⎣xref−xyref−yθref−θ⎤⎥ ⎥⎦ (20)

with being some predefined tracking points (in this experiment, we assume the tracking points are not changing over time). With the following tracking controller defined in ( and are referenced angular velocity and velocity respectively, are the parameters controlling how fast the system will converge to the reference point) [fan2020fast]:

 {v=vrefcos(eθ)+k1exω=ωref+vref(k2ey+k3sin(eθ)) (21)

and with an uncertainty error in the dynamics of and (denoted as ), the error dynamics become:

 ⎡⎢ ⎢ ⎢ ⎢⎣˙ex˙ey˙eθ˙a⎤⎥ ⎥ ⎥ ⎥⎦=⎡⎢ ⎢ ⎢ ⎢⎣(ωref+vref(k2ey+k3sin(eθ)))ey−k1ex+aex−(ωref+vref(k2ey+k3sin(eθ)))ex+vrefsin(eθ)+aey−vref(k2ey+k3sin(eθ))0⎤⎥ ⎥ ⎥ ⎥⎦ (22)

The uncertain parameter . We will show that althought now the reachable set will be much larger than the case when , the probability that the system does not converge to the origin (zero-error) is very low.

Here the divergence term used in (2) can be computed as: . In our experiment, we set . We set the initial state distribution as an uniform distribution . We run each simulation for 50 time steps with time duration to collect the trajectories.