1 Introduction
The emergence of 5G technology has inspired a massive wave of research and development in science and technology in the era of IoT, where communication between computing devices has become significantly faster, with lower latency and power consumption. The power of this modern communication technology influences and benefits all aspects of Cyber-Physical Systems (CPSs) such as smart grids, smart homes, intelligent transportation and smart cities. In particular, the study of autonomous vehicles has become an increasingly popular research field in both academic and industrial transportation applications. Automotive crashes pose significant financial and life-threatening risks, and there is an urgent need for advanced and scalable methods that can efficiently verify a distributed system of autonomous vehicles.
Over the last two decades, many methods have been developed to conduct reachability analysis and safety verification of CPS, such as the approaches proposed in [le2009reachability, girard2006efficient, althoff2015introduction, henzinger1997hytech, frehse2011spaceex, chen2013flow, kong2015dreach, bak2017hylaa, bak2017simulation, tran2017order, tran2019formats]. However, applying these techniques to real-time distributed CPS remains a major challenge, for two reasons: 1) all existing techniques have intensive computation costs and are usually too slow to be used in a real-time manner, and 2) these techniques target the safety verification of a single CPS, and therefore cannot be applied efficiently to a distributed CPS where clock mismatches and communication between agents (i.e., individual systems) are essential concerns. Since future autonomous vehicle systems will work in a distributed manner that involves effective communication between agents, there is an urgent need for an approach that can provide formal guarantees of the safety of distributed CPS in real time. More importantly, the safety information should be defined based on the agents' local clocks to allow these agents to perform “intelligent actions” to escape from upcoming dangerous circumstances. For example, if an agent A knows based on its local clock that it will collide with an agent B in the next 5 seconds, it should perform an action such as stopping or quickly finding a safe path to avoid the collision.
In this paper, which is an extension of [tran2019forte], we propose a decentralized real-time reachability approach for safety verification of a distributed CPS with multiple agents. We are particularly interested in two types of safety properties. The first is a local safety property, which specifies the local constraints on the agent's operation. For example, each agent is only allowed to move within a specific region, must not hit any obstacles, and must keep its velocity within a specific range. This type of property does not require information about other agents and can be verified locally at runtime. The second is a global safety property defined on the states of multiple agents. In particular, we consider a peer-to-peer collision-free property and a generalized property where we want to verify whether all agents satisfy a set of linear constraints (on the states of all agents) defining the property, e.g., two agents do not go into the same region at the same time.
Our decentralized real-time reachability approach works as follows. Each agent locally and periodically computes the local reachable set from the current local time to the next seconds, and then encodes and broadcasts its reachable set information to the others via a communication network. When the agent receives a reachable set message, it immediately decodes the message to read the reachable set information of the sender, and then performs peer-to-peer collision checking based on its current state and the reachable set of the sender. A generalized global property involving the states of agents is verified once an agent has received all needed reachable sets from the other agents. Additionally, the local safety property of the agent is verified simultaneously with the reachable set computation process at runtime. The proposed verification approach is based on the underlying assumption that all agents are time-synchronized to some level of accuracy. This assumption is reasonable as it can be achieved by using existing time synchronization protocols such as the Network Time Protocol (NTP). Our approach has successfully verified in real time the local safety properties and collision occurrences for a group of quadcopters conducting a search mission.
The rest of the paper is organized as follows. Section 2 briefly presents the modeling of a distributed CPS and its verification problems. Section 3 gives the details of real-time reachability for a single agent and how to use it for real-time local safety verification. Section 4 addresses the use of reachable set messages for checking peer-to-peer collisions. Section 5 investigates the global safety verification problem. Section 6 presents the implementation and evaluation of our approach via a distributed search application using quadcopters.
2 Problem Formulation
In this paper, we consider a distributed CPS with agents that can communicate with each other via an asynchronous communication channel.
Communication Model
The communication between agents is implemented by the actions of sending and receiving messages over an asynchronous communication channel. We formally model this communication model as a single automaton, , which stores the set of in-flight messages that have been sent, but are yet to be delivered. When an agent sends a message , it invokes a send(m) action. This action adds to the in-flight set. At any arbitrary time, the chooses a message in the in-flight set and either delivers it to its recipient or removes it from the set. All messages are assumed to be unique and each message contains its sender and recipient identities. Let be the set of all possible messages used in communication between agents. The sending and receiving of messages by agent are denoted by and , respectively.
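The channel automaton described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class names `Channel` and `Message`, the `payload` field, and the use of a seeded `Random` to stand in for the nondeterministic choice are all hypothetical and are not taken from the paper's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Random;

/** Sketch of the channel automaton: it only stores the in-flight messages. */
final class Channel {
    private final List<Message> inFlight = new ArrayList<>();
    private final Random choice; // stands in for the nondeterministic choice

    Channel(long seed) { this.choice = new Random(seed); }

    /** send(m): the message joins the in-flight set. */
    void send(Message m) { inFlight.add(m); }

    int inFlightCount() { return inFlight.size(); }

    /**
     * One channel step: pick an arbitrary in-flight message, take it out of
     * the set, and either deliver it (returned) or drop it (empty result).
     */
    Optional<Message> step() {
        if (inFlight.isEmpty()) return Optional.empty();
        Message m = inFlight.remove(choice.nextInt(inFlight.size()));
        return choice.nextBoolean() ? Optional.of(m) : Optional.empty();
    }

    public static void main(String[] args) {
        Channel ch = new Channel(7L);
        ch.send(new Message(1L, 0, 1, "reach-set"));
        System.out.println("delivered: " + ch.step().isPresent());
    }
}

/** Hypothetical message type: unique id plus sender and recipient identities. */
final class Message {
    final long id;
    final int sender;
    final int recipient;
    final String payload;

    Message(long id, int sender, int recipient, String payload) {
        this.id = id;
        this.sender = sender;
        this.recipient = recipient;
        this.payload = payload;
    }
}
```

In this sketch every call to `step()` removes exactly one message from the in-flight set, whether it is delivered or dropped, which mirrors the two outcomes the model allows.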
Agent Model
The agent is modeled as a hybrid automaton [henzinger1996lics, lynch1996hybrid] defined by the tuple , where:

is a set of variables consisting of the following: i) a set of continuous variables including a special variable which records the agent’s local time, and ii) a set of discrete variables including the special variable that records all sent and received messages. A valuation is a function that associates each to a value in its type. We write for the set of all possible valuations of . We abuse the notation to denote a state of , which is a valuation of all variables in . The set is called the set of states.

is a set of actions consisting of the following subsets: i) a set of send actions (i.e., output actions), ii) a set of receive actions (i.e., input actions), and iii) a set of other, ordinary actions.

is called the set of transitions. For a transition , we write in short. i) If or , then all the components of and are identical except that is added to in . That is, the agent’s other states remain the same on message sends and receives. Furthermore, for every state and every receive action , there must exist a such that , i.e., the automaton must have well-defined behavior for receiving any message in any state. ii) If , then .

is a collection of trajectories for . Each trajectory of is a function mapping an interval of time to , following a flow rate that specifies how a real variable evolves over time. We denote the duration of a trajectory as , which is the right endpoint of the interval .
Agent Semantics
The behavior of each agent can be defined based on the concept of an execution, which is a particular run of the agent. Given an initial state , an execution of an agent is a sequence of states starting from , defined as , and for each index in the sequence, the state update from to is either a transition or a trajectory. A state is reachable if there exists an execution that ends in . We denote as the reachable set of agent .
System Model
The formal model of the complete system, denoted as , is a network of hybrid automata that is obtained by composing the agents' models and the communication channel in parallel. Formally, we can write, . Informally, the agent and the communication channel are synchronized through sending and receiving actions. When the agent sends a message to the agent , it triggers the action. At the same time, this action is synchronized in the automaton by putting the message in the in-flight set. After that, the will trigger (nondeterministically) the action. This action is synchronized in the agent by putting the message into the .
In this paper, we investigate three real-time safety verification problems for distributed cyber-physical systems, defined as follows.
Problem 1 (Local safety verification in real time)
The real-time local safety verification problem is to compute online the reachable set of the agent and verify if it violates the local safety property, i.e., checking , where is the unsafe set of the agent.
Problem 2 (Decentralized real-time collision verification)
The decentralized real-time collision verification problem is to reason in real time whether an agent will collide with other agents from its current local time to a computable, safe time instant in the future based on i) the clock mismatches, and ii) the reachable set messages exchanged between agents. Formally, we require that , where is the distance between agents and at the time of the agent local clock, and is the allowable safe distance between agents.
Problem 3 (Decentralized real-time global safety verification)
The decentralized real-time global safety verification problem is to construct online (at each agent) the reachable set of all agents and verify if it violates the global safety property, i.e., checking , where , , is the unsafe set of the whole system.
3 Real-Time Local Safety Verification
The first important step in our approach is that each agent computes forward its reachable set of states from the current local time to the next seconds, which is defined by . Since many variables used in the agent model are irrelevant to safety verification, we only need to compute the reachable set of the states related to the agent’s physical dynamics (the so-called motion dynamics), which are defined by a nonlinear ODE , where is the state vector and is the control input vector. The agent can switch from one mode to another via discrete transitions, and in each mode, the control law may be different. When the agent computes its reachable set, the only information it needs is its current set of states and the current control input . It should be clarified that although the control law may differ among modes, the control signal is updated with the same control period . Consequently, is a constant vector in each control period.
Assuming that the agent’s current time is , using its local sensors and GPS, we have the current state of the agent . Note that the local sensors and the provided GPS can only provide the information of interest to some accuracy; therefore, the actual state of the agent is in a set . The control signal is computed based on the state and a reference signal, e.g., a set point denoting where the agent needs to go, and the computed control signal is then applied to the actuator to control the motion of the agent. From the current set of states and the control signal , we can compute the forward reachable set of the agent for the next seconds. This reachable set computation needs to be completed within an amount of time because if , a new will be updated. The control period is chosen based on the agent’s motion dynamics, and thus to control an agent with fast dynamics, the control period needs to be sufficiently small. This is the source of the requirement that the allowable runtime for reachable set computation be small.
To compute the reachable set of an agent in real time, we use the well-known face-lifting method [dang1998hscc, bak2014real] and a hyperrectangle to represent the reachable set. This method is useful for short-time reachability analysis of real-time systems. It allows users to define an allowable runtime , uses no dynamic data structures or recursion, and does not depend on complex external libraries as other reachability analysis methods do. More importantly, the accuracy of the reachable set computation can be iteratively improved based on the remaining allowable runtime.
Algorithm 3.1 describes the real-time reachability analysis for one agent. The algorithm works as follows. The time period is divided into steps. The reach time step is defined by . Using the reach time step and the current set , the face-lifting method performs a single-face-lifting operation. The results of this step are a new reachable set and a remaining reach time . This step is iteratively called until the reachable set for the whole time period of interest is constructed completely, i.e., the remaining reach time is equal to zero. Interestingly, with the reach time step size defined above, the face-lifting algorithm may finish quickly after an amount of time which is smaller than the allowable runtime specified by the user, i.e., there is still an amount of time, called the remaining runtime , that is available for us to call the face-lifting algorithm again with a smaller reach time step size; for example, we can call the face-lifting algorithm again with a new reach time step . By doing this, the conservativeness of the reachable set can be iteratively reduced. The core step of the face-lifting method is the single-face-lifting operation. We refer the reader to [bak2014real] for further details. As mentioned earlier, the local safety property of each agent can be verified at runtime simultaneously with the reachable set computation process. Precisely, let be the unsafe region of the agent; the agent is said to be safe from to if . Since the reachable set is given by the face-lifting method at runtime, the local safety verification problem for each agent can be solved at runtime. Since Algorithm 3.1 computes an over-approximation of the reachable set of each agent in a short time interval, it guarantees the soundness of the result as described in the following lemma.
Lemma 1
[dang1998hscc, bak2014real] The real-time reachability analysis algorithm is sound, i.e., the computed reachable set contains all possible trajectories of agent from to .
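Because the reachable set is represented by hyperrectangles, the local safety check reduces to an interval overlap test per dimension. The following Java sketch illustrates this under the assumption that the unsafe region is itself a (possibly unbounded) hyperrectangle; the class and method names are hypothetical, and the sketch does not reproduce Algorithm 3.1 or the face-lifting operation itself.

```java
/** Sketch: local safety check of hyperrectangular reachable sets against an unsafe box. */
final class LocalSafetyCheck {

    /** True if the closed boxes [lo, hi] and [uLo, uHi] overlap in every dimension. */
    static boolean intersects(double[] lo, double[] hi, double[] uLo, double[] uHi) {
        for (int d = 0; d < lo.length; d++) {
            if (hi[d] < uLo[d] || uHi[d] < lo[d]) {
                return false; // the boxes are separated along axis d
            }
        }
        return true;
    }

    /** The agent is reported safe only if no intermediate box touches the unsafe box. */
    static boolean isLocallySafe(double[][] los, double[][] his, double[] uLo, double[] uHi) {
        for (int k = 0; k < los.length; k++) {
            if (intersects(los[k], his[k], uLo, uHi)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical state (position x, velocity vx); unsafe whenever vx >= 5.
        double inf = Double.POSITIVE_INFINITY;
        double[][] los = {{0.0, 0.1}, {0.1, 0.2}};
        double[][] his = {{0.2, 0.3}, {0.4, 0.6}};
        System.out.println(isLocallySafe(los, his, new double[] {-inf, 5.0}, new double[] {inf, inf}));
    }
}
```

Since the boxes over-approximate the true reachable states, a "safe" answer from this check is sound, while an "unsafe" answer may be a false alarm caused by the over-approximation.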
4 Decentralized Real-Time Collision Verification
Our collision verification scheme is performed based on the exchanged reachable set messages between agents. For every control period , each agent executes the real-time reachability analysis algorithm to check if it is locally safe and to obtain its current reachable set with respect to its current control input. When the current reachable set is available, the agent encodes the reachable set in a message and then broadcasts this message to its cooperative agents and listens to the upcoming messages sent from these agents. When a reachable set message arrives, the agent immediately decodes the message to construct the current reachable set of the sender and then performs peer-to-peer collision detection. The process of computing, encoding, transferring, decoding of the reachable set along with collision checking is illustrated in Figure 1 based on the agent’s local clock.
Let , , , , and respectively be the instants that we compute, encode, transfer, decode the reachable set and do collision checking on the agent . Note that these time instants are based on the agent ’s local clock. The actual runtimes are defined as follows.
Note that we do not know the exact transfer time since it depends on two different local clocks. The above transfer time formula describes its approximate value when neglecting the mismatch between the two local clocks. The actual reachable set computation time is close to the allowable runtime chosen by the user, i.e., . We will see later that the encoding time and decoding time are fairly small in comparison with the transferring time, i.e., . All of these runtimes provide useful information for selecting an appropriate control period for an agent. However, for collision checking purposes, we only need to consider the time instants at which an agent starts computing the reachable set and checking collision .
A reachable set message contains three pieces of information: the reachable set, which is a list of intervals; the time period (based on the local clock) in which this reachable set is valid, i.e., the start time and the end time ; and the time instant at which this message is sent. Based on the timing information of the reachable set and the time-synchronization errors, an agent can examine whether or not a received reachable set contains information about the future behavior of the sending agent which is useful for checking collision. The usefulness of the reachable sets used in collision checking is defined as follows.
Definition 1 (Useful reachable sets)
Let and respectively be the time-synchronization errors of agent and in comparison with the virtual global time t, i.e., and , where and are current local times of and respectively. The reachable sets and of the agent that are available at the agent at time are useful for checking collision between and if:
(1) 
Assume that we are at a time instant where the agent checks if a collision occurs. This means that the current local time is . Note that agent and are synchronized to the global time with errors and respectively. The reachable set is useful if it contains information about the future behavior of agent under the view of the agent based on its local clock. This can be guaranteed if we have: . Additionally, the current reachable set of agent contains information about its future behavior if as depicted in Figure 2. We can see that if , then the reachable set of contains only past information, and thus it is useless for checking collision. One interesting case is when . In this case, we do not know whether the received reachable set is useful or not.
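The three cases above (useful, useless, undecided) can be illustrated with a small classification routine. The sketch below is a conservative derivation under the assumption that each local clock differs from the global time by at most its synchronization error; it is not claimed to be the exact inequality of Equation (1), and all names (`Usefulness`, `classify`, `guaranteedHorizon`) are hypothetical. If the receiver's clock reads `tNow`, the sender's clock reads some value within `tNow ± (deltaI + deltaJ)`, which gives the two thresholds used in the code.

```java
/** Sketch: classify a received reachable set by comparing clocks under bounded sync errors. */
final class UsefulnessCheck {

    enum Usefulness { USEFUL, STALE, UNKNOWN }

    /**
     * @param tNow       receiver's current local time (seconds)
     * @param validUntil end of the validity period, on the sender's local clock (seconds)
     * @param deltaI     bound on the receiver's clock error w.r.t. global time (assumed)
     * @param deltaJ     bound on the sender's clock error w.r.t. global time (assumed)
     */
    static Usefulness classify(double tNow, double validUntil, double deltaI, double deltaJ) {
        double slack = deltaI + deltaJ;
        if (tNow + slack < validUntil) {
            return Usefulness.USEFUL;  // even the latest possible sender clock is before the end
        }
        if (tNow - slack > validUntil) {
            return Usefulness.STALE;   // even the earliest possible sender clock is past the end
        }
        return Usefulness.UNKNOWN;     // the clock uncertainty straddles the end time
    }

    /** Length of the future window, on the receiver's clock, that the set certainly covers. */
    static double guaranteedHorizon(double tNow, double validUntil, double deltaI, double deltaJ) {
        return Math.max(0.0, validUntil - deltaI - deltaJ - tNow);
    }

    public static void main(String[] args) {
        System.out.println(classify(1.000, 2.000, 0.003, 0.003));
    }
}
```

The `UNKNOWN` outcome corresponds to the case in which the receiver cannot tell whether the received set still describes the sender's future; a cautious implementation would treat it like `STALE`.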
Remark 1
We note that the proposed approach does not rely on Lamport's happens-before relation [lamport1978time] to compute the local reachable set of each agent. If the agent does not receive reachable set messages from the others before a requested timestamp expires, it still calculates the local reachable set based on its current state and the state information of other agents in the messages it received previously. In other words, our method does not require the reachable set of each agent to be computed according to the ordering of the events (sending or receiving a message) in the system, but only relies on the local clock period and the time-synchronization errors between agents. Such an implementation ensures that the computation process can be accomplished in real time, and is not affected by the message transmission delay.
The peer-to-peer collision checking procedure depicted in Algorithm 4.2 works as follows: when a new reachable set message arrives, the receiving agent decodes the message and checks the usefulness of the received reachable set and its current reachable set. Then, the agent combines its current reachable set and the received reachable set to compute the minimum possible distance between the two agents. If the distance is larger than an allowable threshold , there is no collision between the two agents in some known time interval in the future, i.e., .
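When both reachable sets are hyperrectangles, the minimum possible distance between the two agents is the distance between two axis-aligned boxes. A minimal Java sketch follows; it assumes the boxes have already been projected onto the position coordinates and that the Euclidean distance is used, and the names are hypothetical rather than taken from Algorithm 4.2.

```java
/** Sketch: minimum Euclidean distance between two axis-aligned boxes. */
final class BoxDistance {

    static double minDistance(double[] loA, double[] hiA, double[] loB, double[] hiB) {
        double sumSq = 0.0;
        for (int d = 0; d < loA.length; d++) {
            // Gap along axis d; zero when the two intervals overlap on this axis.
            double gap = Math.max(0.0, Math.max(loB[d] - hiA[d], loA[d] - hiB[d]));
            sumSq += gap * gap;
        }
        return Math.sqrt(sumSq);
    }

    /** No collision is possible if even the closest points are farther apart than the threshold. */
    static boolean collisionFree(double[] loA, double[] hiA, double[] loB, double[] hiB,
                                 double threshold) {
        return minDistance(loA, hiA, loB, hiB) > threshold;
    }

    public static void main(String[] args) {
        double[] loA = {0.0, 0.0}, hiA = {1.0, 1.0};
        double[] loB = {4.0, 5.0}, hiB = {5.0, 6.0};
        System.out.println(minDistance(loA, hiA, loB, hiB)); // gaps of 3 and 4 give 5.0
    }
}
```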
Lemma 2
The decentralized real-time collision verification algorithm is sound.
Proof
From Lemma 1, we know that the received reachable set contains all possible trajectories of the agent from to . Also, the current reachable set of the agent , , contains all possible trajectories of the agent from to . If those reachable sets are useful, then they contain all possible trajectories of the two agents from to some time in the future based on the agent clock. Therefore, the minimum distance between the two agents computed from the two reachable sets is the smallest distance among all possible distances in the time interval . Consequently, the collision-free guarantee is sound in the time interval .
We have studied how to use exchanged reachable sets to do peer-to-peer collision detection. Next, we consider how to verify online the global behavior of a distributed CPS in a decentralized manner.
5 Decentralized Real-Time Global Safety Verification
Definition 2 (Globally useful reachable set.)
Consider a distributed CPS with agents with time synchronization errors , a globally useful reachable set of the whole system under the view of agent based on its current local time clock is defined below:
(2) 
For any time such that for , we have . In other words, contains all possible trajectories of all agents from the current local time of agent to the future time defined by .
It should be noted that to construct a global reachable set, an agent needs to wait for all messages to arrive and then decode all of them. This process may have an expensive computation cost, especially when the number of agents increases. Since this global reachable set is only valid in an interval of time, the amount of time that is available for verifying the global property may be small and not enough for the agent to perform the global safety verification. Having additional hardware for handling the processes of receiving and decoding messages in parallel is a good solution to overcome this challenge.
Using the globally useful reachable set, the global safety verification problem is equivalent to checking whether the globally useful reachable set intersects with the global unsafe region defined by , where and is the state vector of agent . The procedure for global safety verification is summarized in Algorithm 5.3.
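To illustrate the intersection check, the sketch below assumes that the global unsafe region is a polyhedron of the form {x | Cx ≤ d} and that the globally useful reachable set is a single box over the stacked states of all agents; both the form and the names are assumptions for illustration, not a transcription of Algorithm 5.3. For a box, the minimum of each linear form is attained at a corner that can be read off from the signs of the coefficients, so if one constraint cannot be satisfied anywhere in the box, the box cannot meet the unsafe region.

```java
/** Sketch: sound (but not complete) test that a box misses the polyhedron {x | Cx <= d}. */
final class GlobalSafetyCheck {

    /**
     * Returns true if some row i has min over the box of c_i . x greater than d_i,
     * which proves the box and the unsafe polyhedron are disjoint.
     * Returns false if no single row separates them (possibly unsafe).
     * Assumes finite box bounds.
     */
    static boolean provablySafe(double[][] c, double[] d, double[] lo, double[] hi) {
        for (int i = 0; i < c.length; i++) {
            double min = 0.0;
            for (int j = 0; j < lo.length; j++) {
                // A non-negative coefficient is minimized at lo, a negative one at hi.
                min += c[i][j] >= 0 ? c[i][j] * lo[j] : c[i][j] * hi[j];
            }
            if (min > d[i]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical 1-D positions x1, x2 of two agents; unsafe when both lie in [0, 1].
        double[][] c = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        double[] d = {1, 0, 1, 0};
        System.out.println(provablySafe(c, d, new double[] {2.0, 0.2}, new double[] {3.0, 0.5}));
    }
}
```

The test is deliberately one-sided: a `true` result is a proof of safety, whereas `false` only means that this cheap check could not rule out an intersection, in which case an exact method such as a linear program could be used.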
Lemma 3
The decentralized real-time global safety verification algorithm is sound.
Proof
Similar to Lemma 2, the soundness of the verification algorithm is guaranteed because of the soundness of the globally useful reachable set containing all possible trajectories of all agents at any time , where .
6 Case study
The decentralized real-time safety verification for distributed CPS proposed in this paper is implemented in Java as a package called . This package is currently integrated as a library in StarL, which is a novel platform-independent framework for programming reliable distributed robotics applications on Android [DBLP:journals/corr/LinM15]. StarL is specifically suitable for controlling a distributed network of robots over Wi-Fi since it provides many useful functions and sophisticated algorithms for distributed applications. In our approach, we use the reliable communication network of StarL, which is assumed to be asynchronous and peer-to-peer. There may be message dropouts and transmission delays; however, every message that an agent tries to send is eventually delivered with some time guarantees. All experimental results of our approach are reproducible and available online at: http://www.verivital.com/rtreach/.
6.1 Experiment setup
We evaluate the proposed approach via a distributed search application using quadcopters (a video recording is available at https://youtu.be/YC_7BChsIf0), in which each quadcopter executes its search mission, provided by users as a list of waypoints depicted in Figure 3. These quadcopters follow the waypoints to search for some specific objects. For safety reasons, they are required to work only in a specific region defined by users. In this case study, the quadcopters are controlled to operate at the same constant altitude. The experiments indicate that the proposed approach scales well, as it works for different numbers of quadcopters. We choose to present in this section the experimental results for the distributed search application with eight quadcopters.
The first step in our approach is locally computing the reachable set of each quadcopter using the face-lifting method. The quadcopter has nonlinear motion dynamics given in Equation 3, in which , , and are the pitch, roll, and yaw angles, is the sum of the propeller forces, is the mass of the quadcopter and is the gravitational acceleration constant. As the quadcopter is set to operate at a constant altitude, we have , which yields the following constraint: . Let and be the velocities of a quadcopter along the x and y axes. Using the constraint on the total force, the motion dynamics of the quadcopter can be rewritten as a dimensional nonlinear ODE as depicted in Equation 4.
A PID controller is designed to control the quadcopter to move from its current position to desired waypoints. Details about the controller parameters can be found in the available source code. The PID controller has a control period of milliseconds. In every control period, the control inputs pitch and roll are computed based on the current positions of the quadcopter and the current target position (i.e., the current waypoint it needs to reach). Using the control inputs, the current positions and velocities given by GPS and the motion dynamics of the quadcopter, the real-time reachable set computation algorithm (Algorithm 3.1) is executed inside the controller. This algorithm computes the reachable set of a quadcopter from its current local time to the next seconds. The allowable runtime for this algorithm is milliseconds. The local safety property is verified by the real-time reachable set computation algorithm at runtime. The computed reachable set is then encoded and sent to another quadcopter. When a reachable set message arrives, the quadcopter decodes the message to reconstruct the current reachable set of the sender. The GPS error is assumed to be . The time-synchronization error between the quadcopters is milliseconds.
We want to verify in real time: 1) the local safety property for each quadcopter; 2) collision occurrence; and 3) the geospatial free property. The local safety property is defined by , i.e., the maximum allowable velocities along the x-axis of two arbitrary quadcopters are not larger than . The collision is checked using the minimum allowable distance between two arbitrary quadcopters . The geospatial free property requires that some quadcopters never go into a specific region at the same time.
6.2 Verifying local safety property and collision occurrence
Figure 4 presents a sample of a sequence of events happening in the distributed search application. One can see that each quadcopter can determine based on its local clock whether there is no collision up to some known time in the future. In addition, the local safety property can also be verified at runtime. For example, in the figure, the quadcopter receives a reachable set message from the quadcopter which is valid from to of the quadcopter ’s clock. After decoding this message, taking into account the time-synchronization error , quadcopter realizes that the received reachable set message is useful for checking collision for the next seconds of its clock. After checking collision, quadcopter 1 knows that it will not collide with the quadcopter 0 in the next 1.645 seconds (based on its clock).
It should be noted that we can intuitively verify the collision occurrences by observing the intermediate reachable sets of all quadcopters and their interval hulls. The intermediate reachable sets of the quadcopters in every time interval computed by the real-time reachable set computation algorithm (i.e., Algorithm 3.1) are described in Figure 5. The zoom plot within the figure presents a very short-time interval reachable set of the quadcopters. We note that the intermediate reachable set of a quadcopter is represented as a list of hyperrectangles and is used for verifying the local safety property at runtime. The reachable set that is sent to another quadcopter is the interval hull of these hyperrectangles. The intermediate reachable set cannot be transferred via a network since it is very large (i.e., hundreds of hyperrectangles). The interval hull of all hyperrectangles contained in the intermediate reachable set covers all possible trajectories of a quadcopter in the time interval of . Therefore, it can be used for safety verification. One may question why we use the interval hull instead of the convex hull of the reachable set, since the former yields a more conservative result. The reason is that we want to perform the safety verification online, and computing the convex hull of hundreds of hyperrectangles is a time-consuming operation. Therefore, in the real-time setting, the interval hull operation is a suitable solution. From the figure, we can see that the interval hulls of the reachable sets of all quadcopters do not intersect with each other. Therefore, there is no collision occurrence (in the next 2 seconds of global time).
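The interval hull mentioned above is cheap to compute: it is the componentwise minimum of the lower bounds and maximum of the upper bounds, which takes a single pass over the list of hyperrectangles. The following Java sketch shows this operation; the class and method names are hypothetical.

```java
/** Sketch: interval hull of a list of axis-aligned hyperrectangles. */
final class IntervalHull {

    /**
     * Returns {lo, hi} of the smallest axis-aligned box containing every input box.
     * los[k] and his[k] are the lower and upper corners of the k-th box (at least one box).
     */
    static double[][] hull(double[][] los, double[][] his) {
        double[] lo = los[0].clone();
        double[] hi = his[0].clone();
        for (int k = 1; k < los.length; k++) {
            for (int d = 0; d < lo.length; d++) {
                lo[d] = Math.min(lo[d], los[k][d]);
                hi[d] = Math.max(hi[d], his[k][d]);
            }
        }
        return new double[][] {lo, hi};
    }

    public static void main(String[] args) {
        double[][] los = {{0.0, 0.0}, {0.5, -1.0}};
        double[][] his = {{1.0, 1.0}, {2.0, 0.5}};
        double[][] h = hull(los, his);
        System.out.println(java.util.Arrays.toString(h[0]) + " " + java.util.Arrays.toString(h[1]));
    }
}
```

The cost is linear in the number of boxes and dimensions, and the message size is fixed at two corners regardless of how many intermediate boxes were computed.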
Table 1: Average encoding, decoding, transferring, collision checking and total verification times for each quadcopter.
Time                          Quad. 1  Quad. 2  Quad. 3  Quad. 4  Quad. 5  Quad. 6  Quad. 7  Quad. 8
Encoding Time (ms)            0.058    0.055    0.0553   0.0525   0.0557   0.0583   0.0584   0.0597
Decoding Time (ms)            0.0169   0.0193   0.0197   0.019    0.0210   0.0181   0.0177   0.022
Transferring Time (ms)        2.64     2.48     1.42     1.11     1.12     1.08     1.05     1.13
Collision Checking Time (ms)  0.04     0.05     0.07     0.05     0.03     0.07     0.07     0.14
Total Verification Time (ms)  28.9363  27.9     20.6232  18.3055  18.2527  18.235   18.0223  19.1037
Since we implement the decentralized real-time safety verification algorithm inside the quadcopter’s controller, it is important to analyze whether or not the verification procedure affects the control performance of the controller. To reason about this, we measure the average encoding, decoding, transferring and collision checking times for all quadcopters using samples, which are presented in Table 1. We note that the transferring time is the average time for one message transferred from other quadcopters to the quadcopter. It can be seen that the encoding, decoding and collision checking times at each quadcopter constitute a tiny amount of time. The total verification time is the sum of the reachable set computation, encoding, transferring, decoding and collision checking times. Note that the allowable runtime for the reachable set computation algorithm is specified by users as milliseconds. Therefore, the (average) total time for the safety verification procedure on each quadcopter is , where , and is the number of quadcopters. As shown in Table 1, the (average) total verification time for each quadcopter is small ( milliseconds), compared to the control period milliseconds. Besides, from the experiment, we observe that the computation time for the control signal of the PID controller (not presented in the table) is also small, i.e., from to milliseconds. Since milliseconds, we can conclude that the verification procedure does not affect the control performance of the controller.
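As a worked check of the total verification time, the sketch below uses a formula that is inferred from the numbers in Table 1 rather than quoted from the text: one reachable set computation and one encoding per period, plus one transfer, decoding and collision check for each of the other quadcopters. With eight quadcopters and a reachable set computation time of 10 milliseconds (also an inferred value), this formula reproduces the totals in the table, e.g., 28.9363 ms for Quad. 1. The class and parameter names are hypothetical.

```java
/** Sketch: total verification time per control period, as inferred from Table 1. */
final class VerificationTime {

    /**
     * total = tReach + tEncode + (n - 1) * (tTransfer + tDecode + tCheck)
     * All times in milliseconds; n is the number of quadcopters.
     */
    static double total(double tReach, double tEncode, double tTransfer,
                        double tDecode, double tCheck, int n) {
        return tReach + tEncode + (n - 1) * (tTransfer + tDecode + tCheck);
    }

    public static void main(String[] args) {
        // Quad. 1 column of Table 1, with the inferred 10 ms reachability runtime.
        System.out.printf("%.4f ms%n", total(10.0, 0.058, 2.64, 0.0169, 0.04, 8));
    }
}
```

The per-peer term dominates as the number of quadcopters grows, which is why the transferring time is the main factor separating the columns of the table.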
Interestingly, from the verification time formula, we can estimate the range of the number of agents that the decentralized real-time verification procedure can deal with. The idea is that, in each control period , after computing the control signal, the remaining time bandwidth can be used for verification. Let , , , be the maximum (minimum) encoding, transferring, decoding and collision checking times on a quadcopter, be the maximum (minimum) control signal computation time for each control period , then the number of agents that the decentralized real-time safety verification procedure can deal with (with the assumption that the communication network works well) satisfies the following constraint:
(5) 
Let us consider our case study. From Table 1, we assume that , , , , , , , milliseconds. Also, we assume that and milliseconds. We can then estimate theoretically that the number of quadcopters that our verification approach can deal with is .
6.3 Verifying geospatial free property
To illustrate how our approach verifies the global behavior of a distributed CPS, we consider the geospatial free property, which requires that some (or all) quadcopters never go into a specific region at the same time. For simplicity, we reconsider the distributed search application with two quadcopters (quad 1 and quad 2) whose forbidden region is defined by . Figure 6 describes a sample of events showing that quadcopter 2 can verify based on its local clock that it will not collide with quadcopter 1 and that the global geospatial free property is guaranteed in the next seconds.
7 Discussion
The current implementation of our approach deploys the safety verifier of each agent inside the controller, and a single thread is used to execute the control and verification tasks. The main drawback of this implementation is that it may decrease the overall performance of the controller and even cause the controller to crash. To prevent this, in practice, the controller and verifier should be implemented as two separate software components. In this case, the computation burden of safety checks in the verifier does not affect the performance of the controller. The control task and the verification task can be executed efficiently in parallel, as depicted in Figure 7. More importantly, this software architecture adopts the architecture of a fault-tolerant system [goodloe2010monitoring] to prevent the propagation of failure from one component to others. It also facilitates the use of a simplex architecture for safety control in dangerous circumstances.
As shown in Figure 7, the verifier component consists of four subcomponents: a reachable set calculator, an encoder, a decoder, and a safety checker. These subcomponents should also be implemented so that they can execute in parallel. The local safety property is verified inside the reachable set calculator at runtime. As the number of reachable set messages that need to be decoded increases with the number of participating agents, it is necessary to have multiple decoders working in parallel. These decoders listen for incoming reachable set messages on different ports assigned to them by the verifier and immediately decode any message that arrives. This parallel decoding helps to reduce the decoding time significantly. The decoded reachable sets are then sent to the safety checker, which contains multiple checkers running in parallel, each responsible for checking for a collision between the agent and one other agent. Each checker is paired with a decoder, i.e., the checker only waits for the decoded reachable set from its corresponding decoder. Therefore, the pairwise collision detection task can be done very quickly. The safety checker also has a global checker which is responsible for checking global properties. The global checker is only triggered when the decoder component finishes decoding all arrived reachable set messages. For this reason, having decoders that work in parallel is essential to reduce the overall verification time, which must be very small in the real-time setting.
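The decoder/checker pairing described above can be sketched as follows. This is only a structural illustration under several assumptions: reachable sets are axis-aligned boxes, messages are JSON-encoded bytes with hypothetical `sender` and `box` fields, and the per-port listeners are replaced by a thread pool that handles one message per worker. None of these names come from the actual implementation.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def decode(message):
    """Decode one reachable set message (assumed JSON bytes) into (sender, box)."""
    data = json.loads(message.decode("utf-8"))
    return data["sender"], [tuple(interval) for interval in data["box"]]


def boxes_intersect(a, b):
    """True if two axis-aligned boxes, each a list of (lo, hi) per dimension, overlap."""
    return all(lo1 <= hi2 and lo2 <= hi1
               for (lo1, hi1), (lo2, hi2) in zip(a, b))


def decode_and_check(message, own_box):
    """One decoder/checker pair: decode a peer's message, then check that peer only."""
    sender, box = decode(message)
    return sender, box, boxes_intersect(own_box, box)


def verify(own_box, messages, global_check):
    """Run one decoder/checker pair per message, then the global checker."""
    with ThreadPoolExecutor(max_workers=max(1, len(messages))) as pool:
        results = list(pool.map(lambda m: decode_and_check(m, own_box), messages))
    # The global checker runs only after every message has been decoded.
    collisions = [sender for sender, _, hit in results if hit]
    peer_boxes = {sender: box for sender, box, _ in results}
    return collisions, global_check(own_box, peer_boxes)
```

`verify` returns the list of peers whose reachable box overlaps the agent's own box, together with the result of the supplied global check. In CPython, threads illustrate the structure rather than deliver a true speed-up for CPU-bound decoding; separate processes or a runtime without a global interpreter lock would be needed for that.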
To analyze how fast our verification technique can be with the proposed software architecture, let and respectively be the worst-case times of reachable set computation, encoding, transferring and decoding, and be the worst-case times of peer-to-peer collision detection and global safety verification. For a system with agents, the total worst-case verification time is . If we do the verification sequentially, i.e., using only one port for reachable set communication and one checker for all peer-to-peer collision detection and global safety verification, the total worst-case verification time is: .
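The difference between the two regimes can be illustrated with a small sketch. It assumes that, in the parallel architecture, each of the six stages contributes its worst-case time once, while in the sequential case the transfer, decoding and peer-to-peer check are repeated once per peer. These formulas, the function names and the parameter names are assumptions made for illustration and should be read against the expressions in the text.

```python
def parallel_worst_case(t_reach, t_enc, t_tx, t_dec, t_pair, t_global):
    """Assumed parallel model: every stage is paid once, regardless of agent count."""
    return t_reach + t_enc + t_tx + t_dec + t_pair + t_global


def sequential_worst_case(n_agents, t_reach, t_enc, t_tx, t_dec, t_pair, t_global):
    """Assumed sequential model: per-peer stages are repeated for each of the N - 1 peers."""
    peers = n_agents - 1
    return t_reach + t_enc + peers * (t_tx + t_dec + t_pair) + t_global
```

Under this model the two times coincide for two agents, and the sequential time grows linearly with the number of agents while the parallel time stays constant.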
Scalability. From the above discussion, one can see that the software architecture plays an important role when we implement our approach on a real platform. In practice, if each participating agent has sufficiently powerful hardware for communication and computation, and the software for our approach is implemented in a parallel manner as proposed above, then the worst-case verification time does not depend on the number of agents in the system. Therefore, our decentralized real-time safety verification approach is scalable to systems with a large number of agents. Also, the proposed software architecture is especially useful when reachable set messages are lost. In this hazardous situation, the agent still has partial information with which to check whether a collision may occur, based on the reachable set messages that are available. Therefore, the planner can still rerun the path planning algorithm, using the current and past information it has, to find the safest path for the agent given this incomplete information.
8 Related Work
Our work is inspired by the static and dynamic analysis of timed distributed traces [duggirala2012static] and the real-time reachability analysis for verified simplex design [bak2014real]. The former proposes a sound method of constructing a global reachable set for a distributed CPS based on the recorded traces and time synchronization errors of participating agents. The global reachable set is then used to verify a global property using Z3 [de2008z3]. This method can be considered a centralized analysis in which the reachable set of the whole system is constructed and verified by one analyzer. Such a verification approach is offline, which is fundamentally different from our approach, as we deal with online verification in a decentralized manner. Our real-time verification method borrows the face-lifting technique developed in [bak2014real] and applies it to a distributed CPS.
Another interesting approach to real-time monitoring for linear systems was recently published in [chen2017model]. In this work, the authors proposed an approach that combines offline and online computation to decide whether a given plant model has entered an uncontrollable state, that is, a state from which no control strategy can prevent the plant from entering the unsafe region. This method is useful for a single real-time CPS, but not for a distributed CPS with multiple agents.
Additionally, there have been other significant works on verifying distributed CPS. The authors of [eidson2012distributed, tang2012unified, zhang2008reconfigurable] presented real-time software for distributed CPS but did not perform safety verification of the individual components or of the whole system. The works presented in [johnson2012parametrized, bae2015designing, kumar2012hybrid] can be used to verify distributed CPS, but they do not consider the real-time aspect. An interesting work proposed in [loos2011adaptive] can formally model and verify a distributed car control system against several safety objectives, such as collision avoidance for an arbitrary number of cars. However, it does not address the verification problem of distributed CPS in a real-time manner. The novelty of our approach is that it can over-approximate, with high precision and in real time, the reachable set of each agent whose dynamics are nonlinear.
The work most closely related to our scheme was recently introduced in [liu2017provably]. The authors proposed an online verification approach using reachability analysis that can guarantee safe motion of mobile robots with respect to walking pedestrians modeled as hybrid systems. This work utilizes the CORA toolbox [althoff2015introduction] to perform reachability analysis, while our work uses a face-lifting technique. However, this work does not consider the time elapsed in encoding, transferring and decoding the reachable set messages between agents, which plays an important role in distributed systems.
9 Conclusion and Future Work
We have proposed a decentralized real-time safety verification method for distributed cyber-physical systems. By utilizing the timing information and the reachable set information from exchanged reachable set messages, a sound guarantee about the safety of the whole system is obtained for each participant based on its local time. Our method has been successfully applied to a distributed search application using quadcopters built upon the StarL framework. The main benefit of our approach is that it allows participants to take advantage of formal guarantees available locally in real time to perform intelligent actions in dangerous situations. This work is a fundamental step in dealing with real-time safe motion/path planning for distributed robots. For future work, we seek to deploy this method on a real platform and extend it to distributed CPS with heterogeneous agents, where the agents can have different motion dynamics and thus different control periods. In addition, the scalability of the proposed method can be improved by exploiting the benefit of parallel processing, i.e., each agent handles multiple reachable set messages and checks for collisions in parallel.
Acknowledgments
The material presented in this paper is based upon work supported by the Air Force Office of Scientific Research (AFOSR) through contract number FA95501810122 and the Defense Advanced Research Projects Agency (DARPA) through contract number FA875018C0089. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of AFOSR or DARPA.