The complexity of today’s technological applications induces a quest for automation, leading to many black-box intelligent cyber-physical systems and thus being difficult to reason about . Many of these systems operate in safety-critical context and hence safety-critical systems themselves . Therefore, reasonable performance guarantees should be obtained before the systems are deployed.
Black-box checking, introduced by Peled at al. , is often used for verifying non-stochastic black-box systems, based on experiments that interface with them. It performs checks on the system itself. The black-box checking is a combination of model checking and testing: model checking  checks properties of a model of the system, but not the system itself. In contrary, testing is usually applied to the actual system and checks whether the system conforms with the model, further serving to improve the model. They are two complementary approaches for enhancing the reliability of black-box systems. In the black-box checking, whenever a model is created, model checking may reveal a fault in the system or show that the model was not good enough and needs to be learned further if the fault is spurious. If model checking does not reveal a fault, equivalence between the model and the black-box system is checked via testing. In case, non-equivalence is detected, then the model needs to be further learned. The checking-testing-learning repeated process is costly generally. Recently, a method combining optimization-based falsification and black-box checking was proposed to falsify specifications for black-box cyber-physical systems in .
Another technique to verification of black-box systems is statistical model checking (SMC) [35, 45]. SMC is pioneered by Younes and Simmons in the discrete case in , which is based on Sequential Probability Ratio Test . It is a compromise between verification and testing, which is based on sampling executions of the system and then deciding whether the samples provide a statistical evidence for the satisfaction or violation of the specification based on hypothesis testing . SMC is now widely accepted in various research areas such as software engineering, in particular for industrial applications , or even for solving problems originating from systems biology . There are several reasons for this success. First, SMC is very simple to understand, implement and use. Second, it does not require extra modelling or specification effort, but simply an executable system that can be simulated and checked against state-based properties. Third, it avoids the state space explosion in verification and thus can be applied to analyze systems with large state spaces. Consequently, there are variety of SMC tools such as PLASMA-Lab , Ymer , VeStA , MRMC , MC2 , UPPAAL-SMC  and so on. In order to further improve the efficiency of SMC, Bayesian SMC was proposed in [23, 48]
, which is a SMC based on Bayesian statistics. The aforementioned SMC approaches for black-box systems are free of mathematical models and perform checks on the system itself by sampling executions of the system. However, the usefulness of mathematical models is well documented. The mathematical models not only help us to understand the system, but also are instrumental to yield insight into the complex processes involved in the system by extracting the essential meaning of some hypotheses. Also, they allow to study the effects of changes in their components and/or environmental conditions on the system’s trajectories, i.e., they allow the control and optimization of the system. Thus, the introduction of mathematical models with appropriate degree of complexity into SMC would contribute a lot to the analysis of the black-box system, not only in the verification of its specifications but also in understanding the complex mechanisms underlying and thus further optimizing the system. Consequently, model learning based SMC approaches are also proposed. For example,[26, 27, 28, 1]
considered black-box systems modelled by Markov decision processes and inferred probabilistic models with the purpose of model checking. The work in combined stochastic learning and abstraction with respect to some property for analyzing black-box systems modelled by Markov decision processes. The work in 
presented an approach for black-box systems modelled by Markov decision processes to unbounded reachability analysis via SMC. The technique is based on delayed Q-learning, a form of reinforcement learning. Generally, the exact learning algorithms require checking equivalence between the model and the system, which is difficult and undecidable. Regression models were used in for finding the regions in the parameter space that lead to satisfaction or violation of given specification with probabilistic coverage guarantees based on conformal regression. Recently, learning procedure within the PAC learning framework is proposed, e.g., [19, 10, 2, 30].
In this paper we propose a novel SMC approach for finite-time safety verification of black-box continuous-time dynamical systems within the framework of PAC learning . The black-box continuous-time dynamical systems are the ones, for which no model is given but whose states changing continuously through time over finite time horizons can be observed at some discrete time instants for a given input. The proposed new model checking, also termed as PAC model checking, is built upon learned models within the framework of PAC learning. In the PAC model checking, correctness guarantees of the learned models are expressed using the terms error probability and confidence level. We show that the time-evolving trajectories of the black-box system over a specified finite time horizon fall within the range of the learned model plus a bounded interval with statistical guarantees, which is further used to characterize the satisfiability of safety requirements. Given an error probability and a confidence level, which are two fundamental parameters in PAC learning, the model together with the bounded interval is computed via scenario optimization, which is widely used for computing solutions to robust optimization problems based on finite randomization of infinite constraints . The scenario optimization, which finally boils down to a linear program in our approach, is constructed from a family of independent and identically distributed datum collected by executing the system. Three examples demonstrate the performance of our approach. Our contributions are summarized as follows.
1). We propose a novel PAC model checking approach for finite-time safety verification of black-box continuous-time dynamical systems. In this approach the trajectories of the black-box system over finite time horizons are shown to fall within the range of a model plus a bounded interval with error probabilities and confidence levels. This reachability analysis is instrumental in characterizing the satisfiability of safety requirements of the black-box system.
2). A linear programming based approach is proposed to synthesize the model and the bounded interval. The size of the linear programming problem could be independent of the one of the black-box system, thus rendering our approach suitable for large-scale systems.
As mentioned above, there are many works on verifying black-box systems. In this subsection we just discuss the closely related works to the present one.
proposed an algorithm for constructing PAC confidence sets for deep neural networks. The work in computed safe inputs for a black-box system such that the system’s final outputs fall within a safe range with PAC guarantees. In contrast, our approach focuses on analysis of continuous-time systems, and infers that the time-evolving trajectories of the black-box system over finite time horizons fall within the range of a model plus a bounded interval with PAC guarantees. The closest work in spirit to the present one is , which considered verification of sequential programs by learning models of the set of feasible paths of programs within the framework of PAC learning. The model learning algorithm in  is based on counterexample guided abstraction refinement. However, our approach considers continuous-time systems and infers an approximation to the trajectories of the system over the specified finite time horizon within the framework of PAC learning, in which linear programs are used for learning models.
In the framework of simulation-driven reachability analysis , a PAC based method was proposed for learning discrepancy functions in  for safety verification of hybrid systems with black-box modules. The problem of learning discrepancy functions is reduced to a problem of learning linear separators. Although a PAC discrepancy function is computed in , a characterization on how well the trajectories satisfy the learned discrepancy function is not given and thus a formal quantitative assessment on the satisfiability of safety properties is not presented if a valid discrepancy function is not obtained. Generally, valid discrepancy functions rather than PAC ones for black-box systems are challenging to obtain. In contrast, a formal characterization of the satisfiability of safety properties is given based on the computation of PAC models in our PAC model checking method.
When the continuous-time systems of interest are modeled by ordinary differential equations or delay differential equations, and the equations are explicitly given, there are many well-developed model-based reachability analysis techniques over finite time horizons, e.g., Taylor-model method, simulation-driven reachability method  and set-boundary reachability method , for safety verification of these systems. However, our method focuses on black-box continuous-time dynamical systems, whose mathematical abstractions are not acquired and which are only represented by a family of datum. Such systems can not be handled by existing model-based reachability analysis techniques.
The remainder of this paper is structured as follows. In Section II we formalize the concept of black-box continuous-time dynamical systems and the problem of interest in this paper. Section III elucidates our PAC model checking approach. After demonstrating the performance of our approach on three examples in Section IV, we conclude this paper in Section V.
In this section we present the concept of black-box continuous-time dynamical systems and the related problems, as well as a brief introduction on scenario optimization. The notations are used throughout this paper: denotes the set of nonnegative real values.
denotes the set of positive real values. Vectors are denoted by boldface letters.
Besides, the ground truth trajectories in all examples are obtained based on the combination of Runge-Kutta simulation methods and linear interpolation methods.
Ii-a Problem Formulation
In this paper we consider a black-box continuous-time dynamical system, whose dynamics are governed by a formula of the following form:
where is the input of the system, the set is compact, with is the time variable, is the state of the system at time , and is the system mapping which is unknown. Besides, we have the following assumptions.
1). The system (1) runs well, including the on-board sensors, and thus it can provide us any family of finite datum we need. Also, the provided datum are free of noise.
2). Suppose that the time horizon is endowed with a algebra and a probability over is assigned. Also, we assume that the set of inputs is endowed with a algebra and that a probability over is assigned. Throughout this paper, we use the uniform distribution
is assigned. Throughout this paper, we use the uniform distributionon and on to illustrate our method, although our method is not confined to this particular distribution.
Systems of the form (1) are all around us, especially nowadays. For example, many AI systems such as robotics and self-driving cars are leaving academic laboratories and entering real-world applications. Unfortunately, many of these systems can not explain their results even to their makers, let alone to end-users . They operate like black boxes, which can be viewed in terms of a family of observed datum, without any knowledge of their internal workings.
In this paper we propose a PAC model checking approach for finite-time safety verification of the system (1). The safety verification problem is widely studied in computer science, e.g.,. In our approach, the key is to obtain a model with appropriate degree of complexity, which is learned based on a family of collected datum within the framework of PAC learning and can characterize the system (1) with correctness guarantees expressed with error probabilities and confidence levels. For computing such models, we should address the problems summarized below:
What datum should we use?
How can we learn a mathematical model efficiently based on the collected datum?
What is the discrepancy between the trajectories of the learned mathematical model and the system (1)?
After computing the model, we will address the safety verification problem below.
Given a set of unsafe states, when the trajectories of the computed model are shown to avoid the set , how can we formally characterize the satisfiability of the safety property of avoiding the unsafe set for the black-box system (1) over the time horizon ?
Our method can be straightforwardly extended to vector valued mappings of the form with , but the scalar valued mappings are considered for ease of exposition.
Ii-B Scenario Optimization
This subsection gives a brief introduction on scenario optimization. It provides statistical solutions to robust optimization problems based on solving finite randomization of infinite convex constraints.
A robust optimization problem of interest is as follows:
where are continuous and convex functions over the dimensional optimization variable for every . Also, the sets and are convex and closed.
Suppose that is endowed with a algebra and that a probability P over is assigned. The scenario optimization of (2) is to obtain an approximate solution to (2) via solving the convex program (3), which is constructed by extracting independent and identically distributed samples from according to the probability distribution
according to the probability distributionP:
(3) relaxes (2) in that it only considers a finite subset of the infinitely many constraints of (2). A mathematically rigorous relation, which holds irrespective of the underlying probability P, between the solutions of the two systems can be drawn .
If (3) is feasible and attains a unique optimal solution , and
where and are respectively a user-chosen error level and confidence level, then with at least confidence, satisfies all constraints in but at most a fraction of probability measure , i.e., , where the confidence is the fold probability in , which is the set to which the extracted sample belongs.
The above conclusion still holds if the uniqueness of optimal solutions to (3) is removed , since a unique optimal solution can always be obtained according to Tie-break rule if multiple optimal solutions occur. Moreover, since appears under the sign of logarithm in (4), it can be made small, like or , without increasing significantly. Recently, scenario optimization was used to compute probably approximately safe inputs for a black-box system such that the system’s final outputs fall within a safe range in , and perform safety verification of hybrid systems in .
Iii PAC Model Checking
Iii-a Datum Extraction
In this subsection we introduce what datum to use in learning a model of the system (1) in our approach and how to obtain them, i.e., solve Problem 1.1.
We first extract a family of independent and identically distributed time instances from the time interval according to the probability distribution . Moreover, a family of independent and identically distributed inputs is also extracted from the set according to the probability distribution . The process of obtaining and does not need to run or /simulate the system (1). The numbers and rely on how accurate one wants the learned model to achieve. The relationship is elucidated in Subsection III-B.
Next we need to run the system (1) to obtain its internal datum. For each extracted input , , we feed it to the system (1) and then run it until the time . In this process, the on-board sensors will help observe and record the states of the system (1) at the time instance ,
. This is realistic for some systems nowadays, since smart sensors are taking over almost every sphere of human life. For example, RADAR, LIDAR, GPS and computer vision are widely used to work coherently for identifying the position, velocity and other states of the vehicle. We denote the family of observed states by, where denotes the state of the system (1) at time with the input , ,
So far, we obtain a family of datum . Each data is a triple , where is the input of the system (1), is the time instance and is the state of the system (1) with the input at time . The process of running the system (1) can be regarded as a testing process. However, our method goes further than testing techniques. We meanwhile collect a family of datum and then use these datum to compute models for characterizing the system (1) formally.
In our experiment, we assume that the input is noise-free and the on-board sensors work perfectly such that the observed datum are free of noise as well, i.e., is the exact state of the system (1) with the input at time , , This assumption may be too ideal in practice since input and sensor noise often exists. We would relax it in our future work.
Iii-B Safety Verification
In this section we elucidate our approach for solving Problems 1.2, 1.3 and 2 based on the family of datum obtained from the process in Subsection III-A. We first consider the system (1) with one trajectory, and then multiple trajectories and finally all trajectories from the input set .
Iii-B1 One Trajectory Verification
In this subsection, we solve Problems 1.2, 1.3 and 2 for the system (1) with a single input. Concretely, given a discrete-time trajectory of the system (1) with the input , which is represented by a family of datum with and obtained in Subsection III-A, we would compute a model with to characterize .
In computing a model, we consider a linearly-parameterized model template , such that is for a linear function in , which are unknown parameters. This model can be a polynomial function over , or a more general nonlinear function over . For instance, consider a two-dimensional system with input state variable , is a linear function in and , and is also a linear function over and . Such models can be the ones parameterized with orthonormal basis functions, which are able to represent a set of physical systems . For ease of exposition, we use to denote in the reminder of this paper. Generally, a model template of appropriate degree of complexity should be chosen in order to avoid the over-fitting issue and facilitate the reachability analysis. In practice, engineering insight and physical knowledge would facilitate the selection of model templates.
Then we construct the following linear program over for computing a mathematical model based on the family of given datum :
which is equivalent to
where is a pre-specified upper bound for , , and is a pre-specified upper bound for .
Denote the optimal solution to (6) by . Thus, we obtain a model , whose discrepancy with the system (1) is characterized by two approximation parameters: error probability and confidence level . This is formally stated in Theorem 2.
Let be an optimal solution to (6), , and
Then we have that with at least confidence,
The conclusion is easily obtained by Theorem 1. ∎
Actually, the computed mathematical model is a PAC model [38, 37] with accuracy level and confidence level . The accuracy parameter in Theorem 2 determines how far the learned model can be from the real one. This corresponds to the ”approximately correct”. A confidence parameter indicates how likely the learned model is to meet that accuracy requirement. This corresponds to the ”probably” part. Under the data access model that we are investigating, these approximations are inevitable. Since the training set is randomly generated, there may always be a small chance that it will happen to be noninformative (for example, there is always some chance that the training set will contain only one domain point, sampled over and over again). Furthermore, even when we are lucky enough to get a training sample that does faithfully represent , because it is just a finite sample, there may always be some finite details of that it fails to reflect. The accuracy parameter allows forgiving the learned model for making minor errors.
One Trajectory Verification
We first characterize the reachability of the trajectory using the mathematical model plus the computed . We denote the trajectory of the mathematical model by . From Theorem 2, we have that with confidence of at least ,
for all in but at most a fraction of probability measure , i.e., with confidence of at least , the amount of time for the trajectory staying within the neighborhood of the trajectory exceeds . A graph explanation is further presented in Fig. 2 to enhance the understanding of (9). In Fig. 2, for . According to Theorem 2, with confidence of at least .
Then we solve Problem 2 based on the formal reachability characterization given above. That is,
if does not intersect the unsafe set for , i. e., for , we have that the amount of time the system (1) with the input spends inside the unsafe set does not exceed , with confidence of at least .
If in Theorem 2 is extremely small (smaller than ), then we have a priori practical certainty that the total amount of unsafe time does not exceed . As explained in Subsection II-B, the confidence level can be made large without increasing the size of samples significantly. This framework is useful in those situations where the system (1) is able to tolerate the exposure to a deteriorating agent for a limited amount of time. For example, let us consider a solar-powered autonomous vehicle. Regions without solar exposure are considered to be unsafe, since the vehicle’s battery could be drained after a period of time. However, it would be inefficient to plan a path for the vehicle completely avoiding all these shaded regions. Instead, a more reasonable requirement would be that the amount of time the vehicle spends in the shaded regions is small.
Our approach can also be used to characterize the case that there exists such that . For this case, we need to compute a value , which is larger than or equal to the amount of time such that . Further, we have that the amount of time the system (1) with the input spends inside the unsafe set does not exceed , with confidence of at least .
In the following we use an example from a Van-der-Pol oscillator to enhance the understanding of our approach.
Consider a system with , and , whose internal dynamics are described by an ordinary differential equation which generally describes a Van-der-Pol oscillator :
We assume that the trajectory of the system (1) in this example describes the time evolution of the state in (10), i.e., for . The ground truth trajectory , is illustrated in Fig. 3. It is used to extract datum and perform comparisons. The method of constructing the ground truth trajectory is introduced in the beginning of Section II.
Let and . In this example we use and a polynomial of degree over as a mathematical model to perform computations. Since is known, is of the form . Note that the number of decision variables in (6) is and consequently according to Theorem 2.
We obtain via solving the linear program (6) with . Therefore, we have that with confidence of at least ,
for all in except at most a fraction of probability measure , where is the trajectory of the mathematical model . We also take the time step and the corresponding states on the ground truth trajectory to verify the satisfiability of (11), i.e., whether holds for . The satisfiability ratio is .
Since for , we have that the amount of time the system (1) with the input spends inside the unsafe set does not exceed , with confidence of at least .
Iii-B2 Multiple Trajectories Verification
In Subsection 3.2.1 we considered one trajectory characterization of the system (1). In this subsection we extend the method in Subsection 3.2.1 to multiple trajectories characterization. These trajectories are the ones of the system (1) with inputs .
This extension is straightforward. We just need to enrich the constraints in (6) by incorporating these discrete-time trajectories , , , consequently resulting in the following linear program:
where is a given upper bound for , , and is a given upper bound for . Denote the optimal solution to (12) by .
We denote the trajectory of the mathematical model with the input by . Similarly, we have the following theorem for the solution obtained via solving the linear program (12).
Let be an optimal solution to (12), , and
Then for each input , , we have that with at least confidence,
According to the scenario optimization in Subsection II-B, we have that with at least confidence,
for , the conclusion follows directly. ∎
for all in but at most a fraction of probability measure , i.e., with confidence of at least , each of the trajectories of the system (1) deviates from the corresponding one of the mathematical model by at most for all but at most a fraction .
If does not intersect the unsafe set for , , we have that the amount of time the system (1) with the input spends inside the unsafe set does not exceed , with confidence of at least .
It is worth remarking that the family of inputs here does not require to be extracted independently according to the probability distribution . They can be arbitrary inputs of interest in the set .
Let’s take the system in Example 1 as an instance to illustrate the case of two trajectories verification. These two trajectories, which are presented in Fig. 4, respectively describe the time evolution of the state in (10) with two different inputs and .
Let and . In this example we use and a polynomial of degree as a mathematical model, which is input-dependent and is linear in , to perform computations. The number of decision variables in (6) is and thus from Theorem 2.
We obtain via solving the linear program (12) with . Thus, for each , we have that with confidence of at least , for all except a small fraction , where is the trajectory of the mathematical model . Like Example 1, within the Monte-Carlo testing framework, we take the time step and the corresponding states on the ground truth trajectory with the input to verify whether for , where . The satisfiability ratio is for both of these two trajectories.
Since for and , we have that the amount of time the system (1) with each of the two inputs and spends inside the unsafe set does not exceed , with confidence of at least .
Iii-B3 All Trajectories Verification
In this subsection we further extend the method in Subsection 3.2.2 for multiple trajectories verification to all trajectories verification of the system (1) with the input set . Unlike in Subsection 3.2.2, the family of inputs in this situation should be extracted independently according to the probability distribution .
Let be an optimal solution to (12), , , , , and
Then we have that with at least confidence, , where
Let us fix the time instances firstly, we have that with confidence of at least ,
Let . Obviously, , . For , we can add the constraints involving to the linear program (12) and obtain the following linear program:
for . Thus, we have and consequently the conclusion follows. ∎
From Theorem 4, we have that with confidence of at least , the probability measure of the set is larger than . The set is a set of inputs such that the trajectory of the system (1) with each of them does not deviate from the corresponding one of the model by for all but at most a fraction .
If for and , we have that with confidence of at least , the probability measure of inputs in such that the amount of time the system (1) with each of them spends inside does not exceed with confidence of at least , is larger than .
Although the size of the linear program (12) for computing PAC models does not depend on the dimension of the system (1), it heavily depends on and the number of unknown parameters in a pre-specified PAC model template according to inequalities (14) and (15) in Theorem 4.
Let’s take the system in Example 1 again as an instance to illustrate the case of all trajectories characterization. The input set is assumed to be .
Let , , and . In this example we use , and a polynomial of degree as a mathematical model, which is input-independent and is linear in , to perform computations. The number of decision variables in (12) is and consequently and according to Theorem 4. The computation time for solving the resulting linear program is seconds. The reason that an input-independent model is used is to reduce the number of decision variables in (12), which further results in reduction of the size of extracted samples according to inequalities (14) and (15) and thus reduction of the size of the linear program (12). These computations were performed on an i7-7500U 2.70GHz CPU with 32G RAM running Windows 10.
We obtain via solving the linear program (12) with . Therefore, with confidence of at least , the probability measure of inputs in such that with confidence of at least ,
for all but at most a fraction , is larger than , where is the trajectory of the mathematical model . Within the Monte-Carlo testing framework, we extract inputs from independently according to the probability distribution and then obtain their corresponding ground truth trajectories for validating the above conclusion. Like Example 1, we take the time step and the states on the ground truth trajectory with the input to verify the satisfiability of (17), where . The satisfiability ratio of inputs such that
for all but at most a fraction , is .
Since for and , we have that with at least confidence, the probability measure of inputs in such that the amount of time the system (1) with each of them spends inside does not exceed with at least confidence, is larger than .