Real-Time Verification for Distributed Cyber-Physical Systems

09/19/2019 ∙ by Hoang-Dung Tran, et al.

Safety-critical distributed cyber-physical systems (CPSs) are found in a wide range of applications. Notably, they have proven useful in intelligent transportation, where autonomous vehicles communicate and cooperate with each other via a high-speed communication network. Such systems require the ability to identify, in real time, maneuvers that lead to dangerous circumstances and to ensure that the implementation always meets safety-critical requirements. In this paper, we propose a real-time decentralized reachability approach for safety verification of a distributed multi-agent CPS, under the assumption that all agents are time-synchronized with a low degree of error. In the proposed approach, each agent periodically computes its local reachable set and exchanges this reachable set with the other agents in order to verify the safety of the system. Our method, implemented in Java, takes advantage of the timing information and the reachable set information available in the exchanged messages to reason about the safety of the whole system in a decentralized manner. Any agent can also perform local safety verification tasks based on its local clock by analyzing the messages it receives. We applied the proposed method to verify, in real time, the safety properties of a group of quadcopters performing a distributed search mission.




1 Introduction

The emergence of 5G technology has inspired a wave of research and development in the era of IoT, in which communication between computing devices has become significantly faster, with lower latency and power consumption. This modern communication technology benefits all aspects of cyber-physical systems (CPSs), such as smart grids, smart homes, intelligent transportation and smart cities. In particular, the study of autonomous vehicles has become an increasingly popular research field in both academic and industrial transportation applications. Automotive crashes pose significant financial and life-threatening risks, and there is an urgent need for advanced and scalable methods that can efficiently verify a distributed system of autonomous vehicles.

Over the last two decades, many methods have been developed for reachability analysis and safety verification of CPSs, such as the approaches proposed in [le2009reachability, girard2006efficient, althoff2015introduction, henzinger1997hytech, frehse2011spaceex, chen2013flow, kong2015dreach, bak2017hylaa, bak2017simulation, tran2017order, tran2019formats]. Applying these techniques to real-time distributed CPSs nevertheless remains a major challenge, for two reasons: 1) existing techniques have intensive computation costs and are usually too slow to be used in real time, and 2) they target the safety verification of a single CPS, and therefore cannot be applied efficiently to a distributed CPS where clock mismatches and communication between agents (i.e., individual systems) are essential concerns. Since future autonomous vehicle systems will operate in a distributed fashion and rely on effective communication between agents, there is an urgent need for an approach that can provide formal guarantees of the safety of distributed CPSs in real time. More importantly, the safety information should be defined in terms of the agents' local clocks, so that the agents can perform "intelligent actions" to escape from upcoming dangerous circumstances. For example, if an agent A knows, based on its local clock, that it will collide with an agent B in the next 5 seconds, it should perform an action such as stopping or quickly finding a safe path to avoid the collision.

In this paper (an extension of [tran2019forte]), we propose a decentralized real-time reachability approach for safety verification of a distributed CPS with multiple agents. We are particularly interested in two types of safety properties. The first is a local safety property, which specifies local constraints on the operation of an agent. For example, each agent is only allowed to move within a specific region, must not hit any obstacles, and must keep its velocity within a specific range. This type of property does not require information about other agents and can be verified locally at run-time. The second is a global safety property, defined on the states of multiple agents. In particular, we consider a peer-to-peer collision-free property and a generalized property in which we verify whether all agents satisfy a set of linear constraints (on the states of all agents), e.g., two agents do not enter the same region at the same time.

Our decentralized real-time reachability approach works as follows. Each agent locally and periodically computes its reachable set over a short time horizon starting at its current local time, and then encodes and broadcasts this reachable set to the other agents via a communication network. When an agent receives a reachable set message, it immediately decodes the message to read the reachable set of the sender, and then performs peer-to-peer collision checking based on its own current state and the reachable set of the sender. A generalized global property involving the states of several agents is verified once an agent has received all the reachable sets it needs from the other agents. Additionally, the local safety property of the agent is verified simultaneously with the reachable set computation at run-time. The proposed verification approach relies on the underlying assumption that all agents are time-synchronized to some level of accuracy. This assumption is reasonable, as it can be achieved with existing time synchronization protocols such as the Network Time Protocol (NTP). Our approach has successfully verified, in real time, the local safety properties and collision occurrences for a group of quadcopters conducting a search mission.

The rest of the paper is organized as follows. Section 2 briefly presents the modeling of distributed CPSs and the associated verification problems. Section 3 details real-time reachability for a single agent and how to use it for real-time local safety verification. Section 4 addresses the use of reachable set messages for peer-to-peer collision checking. Section 5 investigates the global safety verification problem. Section 6 presents the implementation and evaluation of our approach via a distributed search application using quadcopters.

2 Problem Formulation

In this paper, we consider a distributed CPS with agents that can communicate with each other via an asynchronous communication channel.

Communication Model

The communication between agents is implemented by the actions of sending and receiving messages over an asynchronous communication channel. We formally model this channel as a single automaton that stores the set of in-flight messages, i.e., messages that have been sent but are yet to be delivered. When an agent sends a message m, it invokes a send(m) action. This action adds m to the in-flight set. At any arbitrary time, the channel automaton chooses a message in the in-flight set and either delivers it to its recipient or removes it from the set. All messages are assumed to be unique, and each message contains its sender and recipient identities. The sending and receiving of a message m by an agent are modeled as separate send and receive actions of that agent.
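The channel behaviour described above can be illustrated with a minimal Python sketch. The class name `Channel`, the `inboxes` dictionary, and the `drop_probability` parameter are illustrative assumptions, not names from the paper or its Java implementation; the random choice stands in for the non-deterministic choice made by the channel automaton.

```python
import random


class Channel:
    """Hypothetical sketch of an asynchronous channel with an in-flight set."""

    def __init__(self, seed=None):
        self.in_flight = []   # messages sent but not yet delivered
        self.inboxes = {}     # recipient id -> list of delivered messages
        self._rng = random.Random(seed)

    def send(self, sender, recipient, payload):
        # Each message carries its sender and recipient identities.
        self.in_flight.append((sender, recipient, payload))

    def step(self, drop_probability=0.0):
        """Choose one in-flight message and either deliver it or drop it."""
        if not self.in_flight:
            return None
        index = self._rng.randrange(len(self.in_flight))
        message = self.in_flight.pop(index)
        if self._rng.random() < drop_probability:
            return ("dropped", message)
        self.inboxes.setdefault(message[1], []).append(message)
        return ("delivered", message)
```

Calling `step()` repeatedly drains the in-flight set in an arbitrary order, which is the only ordering guarantee an asynchronous channel of this kind gives.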

Agent Model

The agent is modeled as a hybrid automaton [henzinger1996lics, lynch1996hybrid] defined by the tuple , where:

  1. is a set of variables consisting of the following: i) a set of continuous variables including a special variable which records the agent’s local time, and ii) a set of discrete variables including the special variable that records all sent and received messages. A valuation is a function that associates each to a value in its type. We write for the set of all possible valuations of . We abuse the notion of to denote a state of , which is a valuation of all variables in . The set is called the set of states.

  2. is a set of actions consisting of the following subsets: i) a set of send actions (i.e., output actions), ii) a set of receive actions (i.e., input actions), and iii) a set of other, ordinary actions.

  3. is called the set of transitions. For a transition , we write in short. i) If or , then all the components of and are identical except that is added to in . That is, the agent’s other states remain the same on message sends and receives. Furthermore, for every state and every receive action , there must exist a such that , i.e., the automaton must have well-defined behavior for receiving any message in any state. ii) If , then .

  4. is a collection of trajectories for . Each trajectory of is a function mapping an interval of time to , following a flow rate that specifies how a real variable evolves over time. We denote the duration of a trajectory as , which is the right end-point of the interval .

Agent Semantics

The behavior of each agent can be defined based on the concept of an execution, which is a particular run of the agent. Given an initial state , an execution of an agent is a sequence of states starting from , defined as , and for each index in the sequence, the state update from to is either a transition or a trajectory. A state is reachable if there exists an execution that ends in . We denote as the reachable set of agent .

System Model

The formal model of the complete system is a network of hybrid automata obtained by composing the agents' models and the communication channel in parallel. Informally, the agents and the communication channel are synchronized through the sending and receiving actions. When one agent sends a message to another agent, it triggers the send action. At the same time, this action is synchronized in the channel automaton by putting the message in the in-flight set. After that, the channel automaton triggers (non-deterministically) the receive action. This action is synchronized in the recipient agent by adding the message to its record of sent and received messages.

In this paper, we investigate three real-time safety verification problems for distributed cyber-physical systems as defined in the following.

Problem 1 (Local safety verification in real-time)

The real-time local safety verification problem is to compute online the reachable set of the agent and verify if it violates the local safety property, i.e., checking , where is the unsafe set of the agent.

Problem 2 (Decentralized real-time collision verification)

The decentralized real-time collision verification problem is to reason in real time whether an agent will collide with other agents from its current local time to a computable, safe time instant in the future, based on i) the clock mismatches and ii) the reachable set messages exchanged between agents. Formally, we require that , where is the distance between agents and at the time of the agent local clock, and is the allowable safe distance between agents.

Problem 3 (Decentralized real-time global safety verification)

The decentralized real-time global safety verification problem is to construct online (at each agent) the reachable set of all agents and verify if it violates the global safety property, i.e., checking , where , , is the unsafe set of the whole system.

3 Real-Time Local Safety Verification

The first step in our approach is for each agent to compute its forward reachable set over a short time horizon starting at its current local time. Since many of the variables used in the agent model are irrelevant to safety verification, we only need to compute the reachable set of the states related to the agent's physical dynamics (its motion dynamics), which are defined by a nonlinear ODE $\dot{x} = f(x, u)$, where $x$ is the state vector and $u$ is the control input vector. The agent can switch from one mode to another via discrete transitions, and the control law may differ between modes. When the agent computes its reachable set, the only information it needs is its current set of states and the current control input. Although the control law may differ among modes, the control signal is updated with the same control period in every mode. Consequently, the control input is a constant vector within each control period.

At its current local time, the agent obtains its current state from its local sensors and GPS. Because the sensors and the GPS are only accurate to some tolerance, the actual state of the agent lies in a set around the measured state. The control signal is computed from the measured state and a reference signal (e.g., a set point denoting where the agent needs to go) and is then applied to the actuator to control the motion of the agent. From the current set of states and the control signal, we can compute the forward reachable set of the agent over the look-ahead horizon. This computation must be completed within an allowable run-time that is shorter than the control period, because otherwise the control signal will already have been updated. The control period is chosen based on the agent's motion dynamics; to control an agent with fast dynamics, the control period needs to be sufficiently small. This is why the allowable run-time for the reachable set computation must be small.

To compute the reachable set of an agent in real time, we use the well-known face-lifting method [dang1998hscc, bak2014real] with a hyper-rectangle representation of the reachable set. This method is well suited to short-time reachability analysis of real-time systems. It lets users define an allowable run-time, uses no dynamic data structures or recursion, and, unlike other reachability analysis methods, does not depend on complex external libraries. More importantly, the accuracy of the reachable set computation can be iteratively improved using the remaining allowable run-time.

Input: , , , , , ,

Output: , or

procedure Initialization
     set the reach time step
     set the remaining run-time to the allowable run-time
procedure Reachability Analysis
     while run-time remains do
          set the current reachable set to the current set of states
          set the remaining reach time to the full reach time period
          while reach time remains do
               do a single face-lifting operation
               update the current reachable set
               update the remaining reach time
               check the current reachable set against the unsafe region
          update the remaining run-time
          if no run-time remains then break
          else
               reduce the reach time step
     return
Algorithm 3.1 Real-time reachability analysis for an agent.

Algorithm 3.1 describes the real-time reachability analysis for one agent. The algorithm works as follows. The time period of interest is divided into a number of steps, which defines the reach time step. Using the reach time step and the current set of states, the face-lifting method performs a single-face-lifting operation. The results of this operation are a new reachable set and a remaining reach time. The operation is repeated until the reachable set for the whole time period of interest has been constructed, i.e., until the remaining reach time is equal to zero. With the initial reach time step, the face-lifting algorithm may finish in less time than the allowable run-time specified by the user. The time left over, called the remaining run-time, can be used to re-run the face-lifting algorithm with a smaller reach time step. In this way, the conservativeness of the reachable set is iteratively reduced. The core step of the face-lifting method is the single-face-lifting operation; we refer readers to [bak2014real] for further detail.

As mentioned earlier, the local safety property of each agent can be verified at run-time simultaneously with the reachable set computation. Precisely, let be the unsafe region of the agent, the agent is said to be safe from to if . Since the reachable set is produced by the face-lifting method at run-time, the local safety verification problem for each agent can be solved at run-time. Since Algorithm 3.1 computes an over-approximation of the reachable set of each agent over a short time interval, it guarantees the soundness of the result, as described in the following lemma.

Lemma 1

[dang1998hscc, bak2014real] The real-time reachability analysis algorithm is sound, i.e., the computed reachable set contains all possible trajectories of agent from to .
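The refinement loop of Algorithm 3.1 can be illustrated with a minimal Python sketch. It is not the face-lifting method of [bak2014real]: as a stand-in, it uses a simple sound box enclosure for the toy dynamics x' = v, v' = u with constant u, which is an assumption made only to keep the example short. All function names and the injectable `clock` parameter are hypothetical. Unlike the real algorithm, a pass that starts before the deadline is allowed to finish, so the budget can be overrun.

```python
import time


def step(box, u, dt):
    """One sound enclosure step for x' = v, v' = u (u constant)."""
    (x_lo, x_hi), (v_lo, v_hi) = box
    v_lo2, v_hi2 = v_lo + u * dt, v_hi + u * dt
    # Bound the position change by the extreme velocities seen during the step.
    x_lo2 = x_lo + dt * min(v_lo, v_lo2)
    x_hi2 = x_hi + dt * max(v_hi, v_hi2)
    return ((x_lo2, x_hi2), (v_lo2, v_hi2))


def hull(a, b):
    """Smallest box containing boxes a and b."""
    return tuple((min(p[0], q[0]), max(p[1], q[1])) for p, q in zip(a, b))


def intersects(a, b):
    """True if two boxes overlap in every dimension."""
    return all(p[0] <= q[1] and q[0] <= p[1] for p, q in zip(a, b))


def reach(box, u, horizon, n_steps):
    """Box over-approximation of all states visited within the horizon."""
    dt = horizon / n_steps
    total = box
    for _ in range(n_steps):
        box = step(box, u, dt)
        total = hull(total, box)
    return total


def anytime_reach(box, u, horizon, n_steps, budget_s, clock=time.perf_counter):
    """Re-run the analysis with a halved step while run-time remains."""
    deadline = clock() + budget_s
    result = reach(box, u, horizon, n_steps)
    while clock() < deadline:
        n_steps *= 2  # halving the reach time step
        result = reach(box, u, horizon, n_steps)
    return result, n_steps
```

Each refinement halves the step and tightens the box, which mirrors how the remaining run-time is spent on reducing conservativeness. The local safety check is then a single call such as `intersects(result, unsafe_box)`.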

4 Decentralized Real-Time Collision Verification

Our collision verification scheme is based on the reachable set messages exchanged between agents. In every control period, each agent executes the real-time reachability analysis algorithm to check whether it is locally safe and to obtain its current reachable set with respect to its current control input. When the current reachable set is available, the agent encodes it in a message, broadcasts this message to its cooperating agents, and listens for incoming messages from these agents. When a reachable set message arrives, the agent immediately decodes the message to construct the current reachable set of the sender and then performs peer-to-peer collision detection. The process of computing, encoding, transferring and decoding the reachable set, along with collision checking, is illustrated in Figure 1 based on the agent's local clock.

Figure 1: Timeline for reachable set computing, encoding, transferring, decoding and collision checking.

Let , , , , and respectively be the instants that we compute, encode, transfer, decode the reachable set and do collision checking on the agent . Note that these time instants are based on the agent ’s local clock. The actual run-times are defined as follows.

Note that we do not know the exact transfer time, since it depends on two different local clocks. The transfer time formula above gives its approximate value when the mismatch between the two local clocks is neglected. The actual reachable set computation time is close to the allowable run-time chosen by the user. We will see later that the encoding and decoding times are fairly small in comparison with the transferring time. All of these run-times provide useful information for selecting an appropriate control period for an agent. However, for collision checking purposes, we only need to consider the time instants at which an agent starts computing its reachable set and starts checking for collisions.

A reachable set message contains three pieces of information: the reachable set, which is a list of intervals; the time period (based on the sender's local clock) in which this reachable set is valid, i.e., its start time and end time; and the time instant at which the message is sent. Based on the timing information of the reachable set and the time-synchronization errors, an agent can examine whether a received reachable set contains information about the future behavior of the sending agent, which is what makes it useful for collision checking. The usefulness of the reachable sets used in collision checking is defined as follows.

Figure 2: Useful reachable set.

Definition 1 (Useful reachable sets)

Let and respectively be the time-synchronization errors of agent and in comparison with the virtual global time t, i.e, and , where and are current local times of and respectively. The reachable sets and of the agent that are available at the agent at time are useful for checking collision between and if:


Assume that we are at a time instant where the agent checks whether a collision occurs. This means that the current local time is . Note that agent and are synchronized to the global time with errors and respectively. The reachable set is useful if it contains information about the future behavior of agent from the point of view of the agent based on its local clock. This can be guaranteed if we have: . Additionally, the current reachable set of agent contains information about its future behavior if as depicted in Figure 2. We can see that if , then the reachable set of contains only past information, and thus it is useless for collision checking. One interesting case is when . In this case, we do not know whether the received reachable set is useful or not.
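The three cases discussed above (useful, useless, undecidable) can be illustrated with a minimal Python sketch. The exact inequalities of Definition 1 are not reproduced here; instead the sketch assumes that every agent's clock differs from the global time by at most a known bound `max_sync_error`, so that two local clocks differ by at most twice that bound. The class `ReachSetMessage`, its field names, and both functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReachSetMessage:
    box: list        # reachable set as a list of (lo, hi) intervals
    t_start: float   # start of validity, on the sender's local clock
    t_end: float     # end of validity, on the sender's local clock
    t_sent: float    # sending instant, on the sender's local clock


def usefulness(msg, now_local, max_sync_error):
    """Classify a received reachable set as 'useful', 'useless' or 'unknown'."""
    margin = 2.0 * max_sync_error  # worst-case offset between two local clocks
    if now_local < msg.t_end - margin:
        return "useful"    # certainly still describes future behavior
    if now_local > msg.t_end + margin:
        return "useless"   # certainly describes only the past
    return "unknown"       # inside the synchronization error margin


def safe_until(msg, own_t_end, max_sync_error):
    """Latest receiver-clock time covered by both reachable sets."""
    return min(own_t_end, msg.t_end - 2.0 * max_sync_error)
```

Shrinking the sender's end time by the clock margin is a conservative choice: it may discard a usable fraction of the received set, but it never claims coverage the sender did not provide.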

Remark 1

We note that the proposed approach does not rely on Lamport's happens-before relation [lamport1978time] to compute the local reachable set of each agent. If an agent does not receive reachable set messages from the others before a requested time-stamp expires, it still calculates its local reachable set based on its current state and the state information of other agents in the messages it received previously. In other words, our method does not require the reachable set of each agent to be computed according to the ordering of the events (sending or receiving a message) in the system; it relies only on the local clock period and the time-synchronization errors between agents. Such an implementation ensures that the computation can be accomplished in real time and is not affected by message transmission delays.

Input: ,  % safe distance between agents

Output:   % collision flag and safe time interval in the future

procedure Peer-to-Peer Collision Detection
     if a new message arrives then
          decode the message
          read the current time
          read the start time of the current reachable set
          if both reachable sets are useful then   % check usefulness
               compute the possible minimum distance between the two agents
               if the minimum distance is larger than the safe distance then
                    Collision = false
               else
                    Collision = uncertain
               store the message
Algorithm 4.2 Decentralized Real-Time Collision Verification at an Agent.

The peer-to-peer collision checking procedure depicted in Algorithm 4.2 works as follows. When a new reachable set message arrives, the receiving agent decodes the message and checks the usefulness of the received reachable set and of its own current reachable set. Then, the agent combines its current reachable set and the received reachable set to compute the minimum possible distance between the two agents. If this distance is larger than the allowable safe distance, there is no collision between the two agents in some known time interval in the future.
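The distance test at the heart of this procedure can be illustrated with a minimal Python sketch. It assumes that both reachable sets are axis-aligned boxes over the position coordinates and that distance is Euclidean; the function names are hypothetical.

```python
import math


def min_distance(box_a, box_b):
    """Smallest Euclidean distance between two axis-aligned boxes."""
    gaps = []
    for (a_lo, a_hi), (b_lo, b_hi) in zip(box_a, box_b):
        # Per-dimension gap; zero when the intervals overlap.
        gaps.append(max(0.0, a_lo - b_hi, b_lo - a_hi))
    return math.sqrt(sum(g * g for g in gaps))


def check_collision(own_box, received_box, safe_distance):
    """Return 'no collision' only when the boxes are provably far apart."""
    if min_distance(own_box, received_box) > safe_distance:
        return "no collision"
    return "uncertain"
```

Because both boxes over-approximate the true positions, a result of "uncertain" does not mean that a collision will occur, only that it cannot be ruled out from these two sets.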

Lemma 2

The decentralized real-time collision verification algorithm is sound.


From Lemma 1, we know that the received reachable set contains all possible trajectories of the agent from to . Also, the current reachable set of the agent , , contains all possible trajectories of the agent from to . If those reachable sets are useful, then they contain all possible trajectories of the two agents from to some time in the future based on the agent's clock. Therefore, the minimum distance between the two agents computed from the two reachable sets is the smallest distance among all possible distances in the time interval . Consequently, the collision-free guarantee is sound in the time interval .

We have studied how to use exchanged reachable sets to perform peer-to-peer collision detection. Next, we consider how to verify online the global behavior of a distributed CPS in a decentralized manner.

5 Decentralized Real-Time Global Safety Verification

Definition 2 (Globally useful reachable set.)

Consider a distributed CPS with agents with time synchronization errors , a globally useful reachable set of the whole system under the view of agent based on its current local time clock is defined below:


For any time such that for , we have . In other words, contains all possible trajectories of all agents from the current local time of agent to the future time defined by .

It should be noted that, to construct a global reachable set, an agent needs to wait for all messages to arrive and then decode all of them. This process may be computationally expensive, especially as the number of agents increases. Since the global reachable set is only valid for an interval of time, the time available for verifying the global property may be too short for the agent to complete the global safety verification. Additional hardware that handles the receiving and decoding of messages in parallel is a good way to overcome this challenge.

Using the globally useful reachable set, the global safety verification problem is equivalent to checking whether the globally useful reachable set intersects with the global unsafe region defined by , where and is the state vector of agent . The procedure for global safety verification is summarized in Algorithm 5.3.

Input: ,   % global unsafe constraints

Output:   % global safe flag and safe time interval in the future

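procedure Initialization
     initialize the global safety flag
procedure Global Safety Verification
     if all useful messages are available then
          recheck if all messages are still useful
          construct the globally useful reach set
          if the globally useful reach set does not intersect the global unsafe region then
               report that the system is safe over the safe time interval
          else
               report that global safety is uncertain
Algorithm 5.3 Decentralized Real-Time Global Safety Verification at an Agent.
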
Lemma 3

The decentralized real-time global safety verification algorithm is sound.


Similar to Lemma 2, the soundness of the verification algorithm is guaranteed by the soundness of the globally useful reachable set, which contains all possible trajectories of all agents at any time , where .
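The intersection test between a globally useful reachable set and a linearly constrained unsafe region can be illustrated with a minimal Python sketch. It assumes that the global reachable set is a box over the stacked agent states and that the unsafe region is the set of points satisfying every row of a system of the form c · x ≤ d; that form, and all names below, are assumptions. The test is sound but incomplete: it proves disjointness row by row and otherwise reports "uncertain" rather than solving a linear program.

```python
def min_linear_over_box(coeffs, box):
    """Minimum of sum(c_k * x_k) over an axis-aligned box."""
    return sum(c * (lo if c >= 0 else hi) for c, (lo, hi) in zip(coeffs, box))


def global_check(global_box, constraints):
    """constraints: list of (coeffs, bound) rows meaning coeffs . x <= bound.

    The unsafe region is where ALL rows hold, so one row that no point of
    the box can satisfy is enough to prove the box is disjoint from it.
    """
    for coeffs, bound in constraints:
        if min_linear_over_box(coeffs, global_box) > bound:
            return "safe"
    return "uncertain"
```

As a usage example, the property "two agents are never in the region [2, 4] at the same time" on one-dimensional positions x1 and x2 has the unsafe rows -x1 ≤ -2, x1 ≤ 4, -x2 ≤ -2, x2 ≤ 4.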

6 Case study

The decentralized real-time safety verification for distributed CPSs proposed in this paper is implemented as a Java package. This package is currently integrated as a library in StarL, a platform-independent framework for programming reliable distributed robotics applications on Android [DBLP:journals/corr/LinM15]. StarL is particularly suitable for controlling a distributed network of robots over WiFi, since it provides many useful functions and sophisticated algorithms for distributed applications. In our approach, we use the reliable communication network of StarL, which is assumed to be asynchronous and peer-to-peer. There may be message dropouts and transmission delays; however, every message that an agent tries to send is eventually delivered with some time guarantees. All experimental results of our approach are reproducible and available online at:

6.1 Experiment setup

Figure 3: Distributed Search Application Using Quadcopters.

We evaluate the proposed approach via a distributed search application using quadcopters (footnote 2: a video recording is available at:), in which each quadcopter executes a search mission provided by users as a list of way-points, as depicted in Figure 3. The quadcopters follow the way-points to search for specific objects. For safety reasons, they are required to operate only in a specific region defined by users. In this case study, the quadcopters are controlled to operate at the same constant altitude. The experiments show that the proposed approach scales well, as it works for different numbers of quadcopters. In this section, we present the experimental results for the distributed search application with eight quadcopters.

The first step in our approach is locally computing the reachable set of each quadcopter using face-lifting method. The quadcopter has nonlinear motion dynamics given in Equation 3 in which , , and are the pitch, roll, and yaw angles, is the sum of the propeller forces, is the mass of the quadcopter and is the gravitational acceleration constant. As the quadcopter is set to operate on a constant altitude, we have which yields the following constraint: . Let and be the velocities of a quadcopter along with x- and y- axes. Using the constraint on the total force, the motion dynamics of the quadcopter can be rewritten as a -dimensional nonlinear ODE as depicted in Equation 4.


A PID controller is designed to control the quadcopter to move from its current position to desired way-points. Details about the controller parameters can be found in the available source code. The PID controller has a control period of milliseconds. In every control period, the control inputs pitch and roll are computed based on the current positions of the quadcopter and the current target position (i.e., the current way-point it needs to go). Using the control inputs, the current positions and velocities given from GPS and the motion dynamics of the quadcopter, the real-time reachable set computation algorithm (Algorithm 3.1) is executed inside the controller. This algorithm computes the reachable set of a quadcopter from its current local time to the next seconds. The allowable run-time for this algorithm is milliseconds. The local safety property is verified by the real-time reachable set computation algorithm at run-time. The computed reachable set is then encoded and sent to another quadcopter. When a reachable set message arrives, the quadcopter decodes the message to reconstruct the current reachable set of the sender. The GPS error is assumed to be . The time-synchronization error between the quadcopters is milliseconds.

Figure 4: A sample of events for verifying local safety property and collision occurrence.
Figure 5: One sample of the reachable sets of eight quadcopters in time interval and their interval hulls.

We want to verify in real time: 1) the local safety property of each quadcopter; 2) collision occurrence; and 3) the geospatial free property. The local safety property is defined by , i.e., the maximum allowable velocities along the x-axis of two arbitrary quadcopters are not larger than . The collision is checked using the minimum allowable distance between two arbitrary quadcopters . The geospatial free property requires that some quadcopters never enter a specific region at the same time.

6.2 Verifying local safety property and collision occurrence

Figure 4 presents a sample sequence of events in the distributed search application. One can see that each quadcopter can determine, based on its local clock, that there is no collision up to some known time in the future. In addition, the local safety property can also be verified at run-time. For example, in the figure, quadcopter 1 receives a reachable set message from quadcopter 0 whose validity interval is stated on quadcopter 0's clock. After decoding this message and taking into account the time-synchronization error, quadcopter 1 determines that the received reachable set is useful for collision checking over a known interval of its own clock. After checking for collisions, quadcopter 1 knows that it will not collide with quadcopter 0 in the next 1.645 seconds (based on its clock).

It should be noted that we can intuitively verify the collision occurrences by observing the intermediate reachable sets of all quadcopters and their interval hulls. The intermediate reachable sets of the quadcopters computed in every time interval by the real-time reachable set computation algorithm (i.e., Algorithm 3.1) are shown in Figure 5. The zoom plot within the figure presents the reachable sets of the quadcopters over a very short time interval. The intermediate reachable set of a quadcopter is represented as a list of hyper-rectangles and is used for verifying the local safety property at run-time. The reachable set that is sent to another quadcopter is the interval hull of these hyper-rectangles. The intermediate reachable set itself cannot be transferred over the network, since it is very large (i.e., hundreds of hyper-rectangles). The interval hull of all hyper-rectangles contained in the intermediate reachable set covers all possible trajectories of a quadcopter in the time interval of interest. Therefore, it can be used for safety verification. One may ask why we use the interval hull instead of the convex hull of the reachable set, since the former gives a more conservative result. The reason is that we want to perform the safety verification online, and computing the convex hull of hundreds of hyper-rectangles is time-consuming. In the real-time setting, the interval hull is therefore a suitable choice. From the figure, we can see that the interval hulls of the reachable sets of all quadcopters do not intersect with each other. Therefore, there is no collision occurrence (in the next 2 seconds of global time).
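The interval hull operation, and the reason it keeps messages small, can be illustrated with a minimal Python sketch. The binary layout used by `encode` and `decode` is a hypothetical wire format chosen for illustration; it is not the format used by StarL or by the paper's implementation.

```python
import struct


def interval_hull(boxes):
    """Smallest single box containing every hyper-rectangle in the list."""
    dims = len(boxes[0])
    return [
        (min(b[d][0] for b in boxes), max(b[d][1] for b in boxes))
        for d in range(dims)
    ]


def encode(hull, t_start, t_end, t_sent):
    """Pack the hull and its timing information into a fixed-size payload."""
    flat = [v for interval in hull for v in interval]
    return struct.pack(f"<I{len(flat) + 3}d", len(hull), *flat,
                       t_start, t_end, t_sent)


def decode(data):
    """Inverse of encode: returns (hull, t_start, t_end, t_sent)."""
    (dims,) = struct.unpack_from("<I", data, 0)
    values = struct.unpack_from(f"<{2 * dims + 3}d", data, 4)
    hull = [(values[2 * d], values[2 * d + 1]) for d in range(dims)]
    return hull, values[-3], values[-2], values[-1]
```

With this layout a two-dimensional hull occupies 60 bytes regardless of how many hyper-rectangles the intermediate reachable set contained, whereas sending the full list would grow linearly with its length.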

| Time | Quad. 1 | Quad. 2 | Quad. 3 | Quad. 4 | Quad. 5 | Quad. 6 | Quad. 7 | Quad. 8 |
|---|---|---|---|---|---|---|---|---|
| Encoding Time (ms) | 0.058 | 0.055 | 0.0553 | 0.0525 | 0.0557 | 0.0583 | 0.0584 | 0.0597 |
| Decoding Time (ms) | 0.0169 | 0.0193 | 0.0197 | 0.019 | 0.0210 | 0.0181 | 0.0177 | 0.022 |
| Transferring Time (ms) | 2.64 | 2.48 | 1.42 | 1.11 | 1.12 | 1.08 | 1.05 | 1.13 |
| Collision Checking Time (ms) | 0.04 | 0.05 | 0.07 | 0.05 | 0.03 | 0.07 | 0.07 | 0.14 |
| Total Verification Time (ms) | 28.9363 | 27.9 | 20.6232 | 18.3055 | 18.2527 | 18.235 | 18.0223 | 19.1037 |
Table 1: The average encoding, decoding, transferring, collision checking, and total verification times of the quadcopters.

Since we implement the decentralized real-time safety verification algorithm inside the quadcopter’s controller, it is important to analyze whether the verification procedure affects the control performance of the controller. To reason about this, we measure the average encoding, decoding, transferring and collision checking times for all quadcopters using samples which are presented in Table 1. We note that the transferring time is the average time for one message to be transferred from another quadcopter to the quadcopter. It can be seen that the encoding, decoding and collision checking times at each quadcopter are tiny. The total verification time is the sum of the reachable set computation, encoding, transferring, decoding and collision checking times. Note that the allowable runtime for the reachable set computation algorithm is specified by users as milliseconds. Therefore, the (average) total time for the safety verification procedure on each quadcopter is , where , and is the number of quadcopters. As shown in Table 1, the (average) total verification time for each quadcopter is small ( milliseconds), compared to the control period milliseconds. Besides, from the experiment, we observe that the computation time for the control signal of the PID controller (not presented in the table) is also small, i.e., from to milliseconds. Since milliseconds, we can conclude that the verification procedure does not affect the control performance of the controller.
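The totals in Table 1 can be cross-checked with a short calculation. The sketch below assumes that each quadcopter spends a fixed 10 ms on reachable set computation, encodes once, and then transfers, decodes and checks one message from each of the other seven quadcopters. The 10 ms budget and this exact decomposition are inferred from the tabulated values rather than quoted from the text, and the class and parameter names are ours.

```java
// Cross-check of Table 1; the 10 ms reachability budget is an inferred assumption.
final class VerificationTime {
    /** Total time in ms: reachability + encoding + one transfer/decode/check per peer. */
    static double total(double reach, double encode, double transfer,
                        double decode, double check, int agents) {
        return reach + encode + (agents - 1) * (transfer + decode + check);
    }

    public static void main(String[] args) {
        // Quad. 1 column of Table 1, with 8 quadcopters in the system.
        System.out.println(total(10.0, 0.058, 2.64, 0.0169, 0.04, 8));
        // Quad. 8 column of Table 1.
        System.out.println(total(10.0, 0.0597, 1.13, 0.022, 0.14, 8));
    }
}
```

Under these assumptions the computed values agree with the 28.9363 ms and 19.1037 ms entries in the table up to floating-point rounding, which suggests that the transferring time dominates the verification overhead.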

Interestingly, from the verification time formula, we can estimate the range of the number of agents that the decentralized real-time verification procedure can deal with. The idea is that, in each control period , after computing the control signal, the remaining time bandwidth can be used for verification. Let , , , be the maximum (minimum) encoding, transferring, decoding and collision checking times on a quadcopter, and let be the maximum (minimum) control signal computation time for each control period . Then the number of agents that the decentralized real-time safety verification procedure can deal with (assuming that the communication network works well) satisfies the following constraint:


Let us consider our case study. From Table 1, we assume that , , , , , , , milliseconds. Also, we assume that and milliseconds. We can then estimate theoretically that the number of quadcopters that our verification approach can deal with is .
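A bound of this kind can be sketched from the time-budget argument alone. The derivation below uses our own notation, which is an assumption and may differ from the authors' symbols: T_ctrl is the control period, t_u the control signal computation time, t_r the reachable set computation budget, t_e, t_tf, t_d and t_c the encoding, transferring, decoding and collision checking times, and N the number of agents.

```latex
% Notation is ours (see the lead-in); this is a sketch, not the authors' exact constraint.
\[
t_u + t_r + t_e + (N-1)\,(t_{tf} + t_d + t_c) \le T_{ctrl}
\quad\Longrightarrow\quad
N \le 1 + \frac{T_{ctrl} - t_u - t_r - t_e}{t_{tf} + t_d + t_c}.
\]
```

Evaluating the right-hand side with the maximum times gives a conservative estimate of the supported number of agents, while evaluating it with the minimum times gives an optimistic one.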

6.3 Verifying geospatial free property

To illustrate how our approach verifies the global behavior of a distributed CPS, we consider the geospatial free property, which requires that some (or all) quadcopters never enter a specific region at the same time. For simplicity, we reconsider the distributed search application with two quadcopters (quad 1 and quad 2) whose forbidden region is defined by . Figure 6 shows a sample of events in which quadcopter 2 verifies, based on its local clock, that it will not collide with quadcopter 1 and that the global geospatial free property is guaranteed in the next seconds.

Figure 6: A sample of events for verifying geospatial free property.
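One possible way to check a geospatial free property on exchanged interval hulls is sketched below. It assumes the forbidden region is an axis-aligned box and that the property is violated only when at least `limit` quadcopters may be inside the region during the shared validity window. Both the encoding of the property and all names are hypothetical.

```java
// Hypothetical sketch of a geospatial free check on interval hulls.
final class GeospatialFree {
    /** Axis-aligned overlap test; each box is given by per-dimension lower and upper bounds. */
    static boolean overlaps(double[] lo1, double[] hi1, double[] lo2, double[] hi2) {
        for (int i = 0; i < lo1.length; i++) {
            if (hi1[i] < lo2[i] || hi2[i] < lo1[i]) {
                return false;
            }
        }
        return true;
    }

    /** True when fewer than limit hulls can reach the forbidden region. */
    static boolean holds(double[][] hullLo, double[][] hullHi,
                         double[] regionLo, double[] regionHi, int limit) {
        int mayEnter = 0;
        for (int i = 0; i < hullLo.length; i++) {
            if (overlaps(hullLo[i], hullHi[i], regionLo, regionHi)) {
                mayEnter++;
            }
        }
        return mayEnter < limit;
    }

    public static void main(String[] args) {
        double[][] lo = {{0, 0}, {5, 5}};
        double[][] hi = {{2, 2}, {6, 6}};
        // Illustrative region; only the first hull can reach it.
        System.out.println(holds(lo, hi, new double[] {1, 1}, new double[] {3, 3}, 2));
    }
}
```

Because the hulls over-approximate the true reachable sets, a `true` result is a sound guarantee, whereas a `false` result only means that the property could not be established for this window.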

7 Discussion

Figure 7: Software architecture for deploying decentralized real-time safety verification approach on a real platform.

The current implementation of our approach deploys the safety verifier of each agent inside the controller, and a single thread is used to execute the control and verification tasks. The main drawback of this implementation is that it may decrease the overall performance of the controller and even cause the controller to crash. To prevent this, in practice, the controller and verifier should be implemented as two separate software components. In this case, the computational burden of safety checks in the verifier does not affect the performance of the controller. The control task and the verification task can be executed efficiently in parallel, as depicted in Figure 7. More importantly, this software architecture adopts the architecture of a fault-tolerant system [goodloe2010monitoring] to prevent the propagation of failure from one component to others. It also facilitates the use of the simplex architecture for safety control in dangerous circumstances.

As shown in Figure 7, the verifier component consists of four sub-components: the reachable set calculator, encoder, decoder, and safety checker. These sub-components should also be implemented so that they can execute in parallel. The local safety property is verified inside the reachable set calculator at runtime. As the number of reachable set messages that need to be decoded increases with the number of participating agents, it is necessary to have multiple decoders working in parallel. These decoders listen for incoming reachable set messages on different ports assigned to them by the verifier and immediately decode any message that arrives. This parallel decoding helps to reduce the decoding time significantly. The decoded reachable sets are then sent to the safety checker, which contains multiple checkers running in parallel, each responsible for checking collision between the agent and one other agent. Each checker is paired with a decoder, i.e., the checker only waits for the decoded reachable set from its corresponding decoder. Therefore, the pairwise collision detection task can be done very quickly. The safety checker also has a global checker, which is responsible for checking global properties. The global checker is only triggered when the decoder component finishes decoding all arrived reachable set messages. For this reason, having decoders working in parallel is essential to reduce the overall verification time, which must be very small to work in the real-time setting.
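The decoder and checker pairing described above can be sketched with standard Java concurrency utilities. This is a minimal illustration under several assumptions: messages are comma-separated strings holding a 2D hull as `xmin,xmax,ymin,ymax`, network ports are omitted, and the global check is reduced to a conjunction of the pairwise results. None of the names below come from the paper's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of paired decoders and checkers running in parallel.
final class ParallelVerifier {
    /** Decodes an assumed text format "xmin,xmax,ymin,ymax" into a hull array. */
    static double[] decode(String message) {
        String[] parts = message.split(",");
        double[] hull = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            hull[i] = Double.parseDouble(parts[i].trim());
        }
        return hull;
    }

    /** Pairwise check: true when the two 2D hulls cannot overlap. */
    static boolean disjoint(double[] a, double[] b) {
        return a[1] < b[0] || b[1] < a[0] || a[3] < b[2] || b[3] < a[2];
    }

    static boolean verify(double[] ownHull, List<String> messages) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, messages.size()));
        try {
            List<CompletableFuture<Boolean>> pairs = new ArrayList<>();
            for (String message : messages) {
                // Each checker waits only for the output of its own decoder.
                pairs.add(CompletableFuture
                        .supplyAsync(() -> decode(message), pool)
                        .thenApply(hull -> disjoint(ownHull, hull)));
            }
            // The global step starts only after every message has been decoded and checked.
            CompletableFuture.allOf(pairs.toArray(new CompletableFuture[0])).join();
            boolean safe = true;
            for (CompletableFuture<Boolean> pair : pairs) {
                safe &= pair.join();
            }
            return safe;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> messages = new ArrayList<>();
        messages.add("2,3,0,1");
        messages.add("0,1,2,3");
        System.out.println(verify(new double[] {0, 1, 0, 1}, messages));
    }
}
```

Chaining each check onto its own decoding future means that a slow message delays only its own pair, while the final `allOf` barrier plays the role of the trigger for the global checker.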

To analyze how fast our verification technique can be with the proposed software architecture, let and respectively be the worst case times of reachable set computation, encoding, transferring and decoding, and be the worst case times of peer-to-peer collision detection and global safety verification. For a system with agents, the total worst-case verification time is . If we do the verification sequentially, i.e., using only one port for reachable set communication and one checker for all peer-to-peer collision detection and global safety verification, the total worst-case verification time is: .
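One plausible reading of this comparison can be written down explicitly. The expressions below use our own notation and are an assumption rather than the authors' exact formulas: t_r, t_e, t_tf and t_d are the worst-case reachable set computation, encoding, transferring and decoding times, t_c and t_g are the worst-case pairwise collision detection and global verification times, and N is the number of agents.

```latex
% Assumed notation; a sketch consistent with the description, not the authors' formulas.
\[
T_{\text{par}} = t_r + t_e + t_{tf} + t_d + t_c + t_g,
\qquad
T_{\text{seq}} = t_r + t_e + (N-1)\,(t_{tf} + t_d + t_c) + t_g.
\]
```

In the parallel case each of the N-1 incoming messages is transferred, decoded and checked concurrently, so only one such term appears. In the sequential case the same work is repeated for every peer, and the total grows linearly with N.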

Scalability. From the above discussion, one can see that the software architecture plays an important role when we implement our approach on a real platform. In practice, if each participating agent has sufficiently powerful hardware for communication and computation, and the software for our approach is implemented in a parallel manner as proposed above, then the worst-case verification time does not depend on the number of agents in the system. Therefore, our decentralized real-time safety verification approach is scalable to systems with a large number of agents. Also, the proposed software architecture is especially useful when reachable set messages are lost. In this hazardous situation, the agent still has partial information with which to check whether a collision occurs, based on the available reachable set messages. Therefore, the planner can still rerun the path planning algorithm, based on the current and past information it has, to find the safest path for the agent under incomplete information.

8 Related Work

Our work is inspired by the static and dynamic analysis of timed distributed traces [duggirala2012static] and the real-time reachability analysis for verified simplex design [bak2014real]. The former proposes a sound method of constructing a global reachable set for a distributed CPS based on the recorded traces and time synchronization errors of participating agents. The global reachable set is then used to verify a global property using Z3 [de2008z3]. This method can be considered a centralized analysis in which the reachable set of the whole system is constructed and verified by one analyzer. Such a verification approach is offline, which is fundamentally different from our approach, as we deal with online verification in a decentralized manner. Our real-time verification method borrows the face-lifting technique developed in [bak2014real] and applies it to a distributed CPS.

Another interesting aspect of real-time monitoring for linear systems was recently published in [chen2017model]. In this work, the authors proposed an approach that combines offline and online computation to decide whether a given plant model has entered an uncontrollable state, i.e., a state in which no control strategy can prevent the plant from entering the unsafe region. This method is useful for a single real-time CPS, but not for a distributed CPS with multiple agents.

Additionally, there have been other significant works on verifying distributed CPS. The authors of [eidson2012distributed, tang2012unified, zhang2008reconfigurable] presented real-time software for distributed CPS but did not perform safety verification of individual components or of the whole system. The works presented in [johnson2012parametrized, bae2015designing, kumar2012hybrid] can be used to verify distributed CPS, but they do not consider the real-time aspect. An interesting work proposed in [loos2011adaptive] can formally model and verify a distributed car control system against several safety objectives, such as collision avoidance for an arbitrary number of cars. However, it does not address the verification problem of distributed CPS in a real-time manner. The novelty of our approach is that it can over-approximate, in real-time and with high precision, the reachable set of each agent whose dynamics are non-linear.

The work most closely related to our scheme was recently introduced in [liu2017provably]. The authors proposed an online verification approach using reachability analysis that can guarantee safe motion of mobile robots with respect to walking pedestrians modeled as hybrid systems. This work utilizes the CORA toolbox [althoff2015introduction] to perform reachability analysis, while our work uses a face-lifting technique. However, this work does not consider the time elapsed in encoding, transferring and decoding the reachable set messages between agents, which plays an important role in distributed systems.

9 Conclusion and Future Work

We have proposed a decentralized real-time safety verification method for distributed cyber-physical systems. By utilizing the timing information and the reachable set information from exchanged reachable set messages, a sound guarantee about the safety of the whole system is obtained for each participant based on its local time. Our method has been successfully applied to a distributed search application using quadcopters built upon the StarL framework. The main benefit of our approach is that it allows participants to take advantage of formal guarantees available locally in real-time to perform intelligent actions in dangerous situations. This work is a fundamental step in dealing with real-time safe motion/path planning for distributed robots. For future work, we seek to deploy this method on a real platform and extend it to distributed CPS with heterogeneous agents, where the agents can have different motion dynamics and thus different control periods. In addition, the scalability of the proposed method can be improved by exploiting the benefit of parallel processing, i.e., each agent handles multiple reachable set messages and checks for collision in parallel.


The material presented in this paper is based upon work supported by the Air Force Office of Scientific Research (AFOSR) through contract number FA9550-18-1-0122 and the Defense Advanced Research Projects Agency (DARPA) through contract number FA8750-18-C-0089. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of AFOSR or DARPA.