Real-time wireless networks (RTWNs) are fundamental to many Industrial Internet-of-Things (IIoT) applications in a broad range of fields such as military, civil infrastructure and industrial automation [1, 2, 3]. These applications have stringent timing and reliability requirements to ensure timely collection of environmental data and reliable delivery of control decisions. The Quality of Service (QoS) offered by an RTWN is thus often measured by how well it satisfies the end-to-end (from sensors via controllers to actuators) deadlines of the real-time tasks executed in the RTWN. Packet scheduling in RTWNs plays a critical role in achieving the desired QoS. Though packet scheduling in RTWNs has been studied for a long time, the explosive growth of IIoT applications, especially in terms of their scale and complexity, has dramatically increased the difficulty of this inherently challenging problem. The fact that most RTWNs must deal with unexpected disturbances and the lossy nature of wireless links in industrial environments further aggravates the problem.
Unexpected disturbances in RTWNs in general can be classified into internal disturbances within the network infrastructure (e.g., link failure due to multi-user interference or weather-related changes in channel signal-to-noise ratio (SNR)) and external disturbances from the environment being monitored and controlled (e.g., detection of an emergency, sudden pressure or temperature changes). When an external disturbance is detected by a certain sensor node, the workload associated with this sensor node needs to be changed for a certain time duration to monitor the environment more frequently. Many centralized dynamic scheduling approaches have been proposed in the literature, but most of them are designed for handling changes in network resource supply (e.g., [4, 5, 6]). Studies on addressing external disturbances in RTWNs, the focus of this paper, are relatively few. Most of those works rely on centralized decision making and assume reliable network environments. This motivates us to explore a fully distributed framework for handling external disturbances in lossy RTWNs. In the rest of the paper, we simply refer to external disturbance as disturbance.
The challenge of handling disturbances in RTWNs comes from the unpredictability of disturbance occurrence at run time. Specifically, it is generally unknown when a disturbance will occur, which disturbance it will be, and what the network status will be at that point (e.g., how many packets have been delivered to their destinations). Since it is computationally infeasible to enumerate all possibilities before the network starts, on-line dynamic scheduling approaches are required to react quickly to unexpected workload changes incurred by disturbances.
The existence of lossy wireless links in industrial environments raises another challenge in handling disturbances in RTWNs. Specifically, lossy links introduce packet losses with a certain non-zero probability. Packet loss in a sensing process can significantly degrade data freshness, and packet loss in a feedback control loop may lead to system instability and cause safety concerns. Further, if a packet that delivers disturbance-related information is lost, it may cause catastrophe to the system. Thus, most industrial RTWNs require a desired end-to-end Packet Delivery Ratio (PDR), e.g. , for all packets running in the system.
In this work, we introduce a fully distributed packet scheduling framework, referred to as FD-PaS, to handle disturbances in lossy RTWNs. (An earlier version of the paper appeared in .) FD-PaS makes on-line decisions locally without any centralized control point when disturbances occur. This is achieved by sending the disturbance information only to a subset of all nodes via the routing paths of the tasks running in the network. In this way, a broadcast task is no longer needed in FD-PaS for notifying all nodes about the disturbance information, which significantly reduces the response time to handle the disturbance. To ensure this partial disturbance propagation scheme works properly, we need to overcome several challenges. For example, to avoid transmission collision among different nodes with inconsistent schedules, we propose a multi-priority wireless packet preemption mechanism called MP-MAC in the data link layer to ensure that high-priority packets can always be delivered by preempting the transmissions of low-priority packets. Further, to minimize the timing and reliability degradation, we formulate a transmission dropping problem to determine a temporary dynamic schedule for individual nodes to handle the disturbance. We prove that the transmission dropping problem is NP-hard, and introduce an efficient heuristic to be executed by individual nodes locally. Both the MP-MAC design and the dynamic schedule construction method (they jointly comprise the FD-PaS framework) are implemented on our RTWN testbed. Our extensive performance evaluation validates the correctness of the FD-PaS design and demonstrates its effectiveness in providing fast response for handling disturbances.
The remainder of this paper is organized as follows. Section 2 discusses the related work and Section 3 describes the system model. Section 4 gives an overview of the FD-PaS framework. We discuss how to propagate disturbances and avoid transmission collisions in Sections 5 and 6, respectively. Section 7 formulates the dynamic transmission dropping problem and presents the method to determine the time duration for handling disturbance. Section 8 discusses the dynamic schedule generation in both reliable and lossy RTWNs. Performance evaluation results are summarized in Section 9. We conclude the paper and discuss future work in Section 10.
2 Related Work
Network resource management in RTWNs in the presence of unexpected disturbances has drawn a lot of attention in recent years. Traditional static packet scheduling approaches (e.g., [8, 9, 10]), where decisions are made offline or only get updated infrequently, can support deterministic real-time communication, but either cannot properly handle unexpected disturbances or must make rather pessimistic assumptions. Many centralized dynamic scheduling approaches for handling internal disturbances have been proposed (e.g., [4, 5, 6]). Studies on addressing external disturbances are relatively few and mostly rely on centralized decision making. The approach in  stores a predetermined number of link layer schedules in the system and chooses the appropriate one when disturbances are detected. However, this approach is either incapable of handling arbitrary disturbances or needs to make some approximation. Both  and  support admission control in response to adding/removing tasks for handling disturbances in the network. However, they do not consider scenarios in which not all tasks can meet their deadlines. The protocol in  proposes to allocate reserved slots for occasionally occurring emergencies (i.e., disturbances), and allows regular tasks to steal slots from the emergency schedule when no emergency exists. However, how to satisfy the deadlines of regular tasks in the presence of emergencies is not considered.
In recent years, a number of algorithms have been designed for packet scheduling in Time Slotted Channel Hopping (TSCH) networks, in both centralized (e.g., [15, 16, 17]) and distributed (e.g., [18, 19, 20]) manners. Most of those approaches, however, assume static network topologies and fixed network traffic, which limits their application in dynamic networks. To overcome this drawback,  proposes Orchestra, a distributed scheduling solution that schedules packet transmissions in TSCH networks to support real-time applications. However, Orchestra does not consider real-time constraints, i.e., it ignores the hard deadlines associated with tasks running in the network. It provides only best-effort service with no guarantee on the end-to-end latency of each task.
In , a centralized dynamic approach, named OLS, to handle disturbances in RTWNs is proposed. OLS is built on a dynamic programming based approach which can be rather time consuming even for relatively small RTWNs. Moreover, OLS may drop more periodic packets than necessary due to the limited payload size of the packet in RTWNs and thus further degrade the system performance. To overcome the drawbacks of OLS, D-PaS in [23, 24] proposes to offload the computation of the dynamic schedules to individual nodes locally by leveraging their local computing capabilities, that is, letting each node construct its own schedule so as to achieve better performance than OLS in terms of fewer dropped packets and lower time overhead. However, as observed from the motivating example presented in , centralized approaches, including D-PaS, suffer from long disturbance response time especially in large RTWNs.
Most MAC layer designs for supporting packet prioritization are based on star topology. For example, the wireless arbitration (WirArb) method  is designed to use different frequencies to indicate different priorities. It only supports star topology where the gateway keeps sensing the arbitration signals and determines which user has a higher priority to access the channel. A similar problem is studied in  in the context of vehicular ad hoc networks. The proposed multi-priority MAC protocol has seven channels, among which one is the public control channel (CCH) for safety action messages and the others are service channels for non-safety applications. The protocol transmits packets of different priorities with optimal transmission probabilities in a dynamic manner. PriorityMAC adds two very short sub-slots before each time slot to indicate the priority. Four priority levels are defined but only three levels of over-the-air preemption can be achieved. The last priority level is only used for buffer reordering. In PriorityMAC, a higher priority packet indicates the priority in the sub-slots to deter the transmissions of lower priority packets. PriorityMAC is also based on star topology, so each device must be directly connected to the coordinator.
A rich set of methods has been designed to improve the reliability of wireless packet transmission over lossy links in most RTWN solutions (e.g., WirelessHART , ISA 100.11a , and 6TiSCH ).  proposed a set of reliable graph routing algorithms in WirelessHART networks to explore path diversity to improve reliability. [31, 32] proposed algorithms to allocate a necessary number of retransmission time slots to guarantee a desired success ratio of packet delivery. However, all aforementioned studies focus on packet scheduling in static RTWN settings over lossy links, and cannot be easily extended to handle abruptly increased network traffic caused by unexpected disturbances.
3 System Model
We adopt the system architecture of a typical RTWN, in which multiple sensors and actuators are wirelessly connected to a controller node directly or through relay nodes. (Note that the controller node is for initial network setup and performing control computations. FD-PaS does not need it for making any on-line decision and updating schedules.) We refer to non-controller nodes as device nodes. We assume that all device nodes have routing capability and are equipped with a single omni-directional antenna to operate on a single channel in half-duplex mode. The network is modeled as a directed graph , where the node set and represents the controller node. A direct link represents a wireless link from node to with a Packet Delivery Ratio (PDR) , which represents the probabilistic transmission success rate on link . (Link PDR is usually measured during the site survey and is stable during normal network operations. In case the value of changes significantly, the new value is assumed to be broadcast to all the nodes in the network.) The controller node connects to all the nodes via some routes and is responsible for executing relevant control algorithms. It also contains a network manager which conducts network configuration and resource allocation.
We use the concept of task to describe packet transmission from sensor nodes to actuator nodes. Specifically, the system runs a fixed set of unicast tasks . Each task follows a designated single routing path with hops and we use to represent the routing path of . It periodically generates a packet which originates at a sensor node, passes through the controller node (not necessary for FD-PaS but to carry out control computations) and delivers a control message to an actuator. Fig. 1 depicts an example RTWN with three tasks running on 7 nodes and task parameters are given in Table I.
When external disturbances (e.g., sudden change in temperature or pressure) occur, many IIoT applications would require more frequent sampling and control actions, which in turn increase network resource demands. To capture such abrupt increase in network resource demands, we adopt the rhythmic task model  which has been shown to be effective for handling disturbances in event-triggered control systems . (Note that our FD-PaS framework is not limited to the rhythmic task model and is applicable to any task model that provides workload changing patterns for handling disturbances.) In the rhythmic task model, each unicast task has two states: nominal state and rhythmic state. In the nominal state, follows nominal period and nominal relative deadline , which are all constants. When a disturbance occurs, the task enters the rhythmic state in which its period and relative deadline are first reduced in order to respond to the disturbance, and then gradually return to their nominal values by following some monotonically non-decreasing pattern. We use vectors and to represent the periods and relative deadlines of when it is in the rhythmic state. As soon as enters the rhythmic state, its period and relative deadline adopt sequentially the values specified by and , respectively. returns to the nominal state when it starts using and again.
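As an illustration of the rhythmic task model, the following minimal Python sketch lists the release slots and absolute deadlines of a task that enters the rhythmic state, steps through its rhythmic period and deadline vectors, and then resumes its nominal parameters. The function name, the slot-based time unit, and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def rhythmic_releases(
    t_enter: int,
    rhythmic_periods: List[int],
    rhythmic_deadlines: List[int],
    nominal_period: int,
    nominal_deadline: int,
    nominal_count: int = 2,
) -> List[Tuple[int, int, str]]:
    """Return (release_slot, absolute_deadline_slot, state) for each packet.

    The task enters the rhythmic state at slot t_enter, uses the vector
    entries one after another, then falls back to the nominal parameters.
    """
    if len(rhythmic_periods) != len(rhythmic_deadlines):
        raise ValueError("period and deadline vectors must have equal length")
    # The model requires a monotonically non-decreasing recovery pattern.
    if any(b < a for a, b in zip(rhythmic_periods, rhythmic_periods[1:])):
        raise ValueError("rhythmic periods must be non-decreasing")

    packets = []
    t = t_enter
    for period, deadline in zip(rhythmic_periods, rhythmic_deadlines):
        packets.append((t, t + deadline, "rhythmic"))
        t += period
    for _ in range(nominal_count):
        packets.append((t, t + nominal_deadline, "nominal"))
        t += nominal_period
    return packets


# Hypothetical task: nominal period/deadline of 10 slots, reduced to 2, 4, 6.
releases = rhythmic_releases(100, [2, 4, 6], [2, 4, 6], 10, 10)
for release, deadline, state in releases:
    print(release, deadline, state)
```

In this sketch the task is considered back in the nominal state at the first release that uses the nominal period again (slot 112 in the example).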
Here we assume that at most one task can be in the rhythmic state at any time during the network operation. To simplify the notation, we refer to any task currently in the rhythmic state as rhythmic task and denote it as while task is a periodic task which is currently not in the rhythmic state. As shown in Fig. 2, when enters the rhythmic state, we also say that the system switches to the rhythmic mode. The system returns to the nominal mode when the disturbance has been completely handled, typically some time after returns to the nominal state. Since disturbances may cause catastrophe to the system, the rhythmic task has a hard deadline when the system is in the rhythmic mode while periodic tasks can tolerate occasional deadline misses.
Each task consists of an infinite sequence of instances. The -th instance of , referred to as packet , is associated with release time , deadline and finish time . Without loss of generality, we assume that enters the rhythmic state at (denoted as ) and returns to the nominal state at (denoted as ). Thus, stays in its rhythmic state during , and . Any packet of released in the system rhythmic mode is referred to as a rhythmic packet while the packets of task are periodic packets. The delivery of packet at the -th hop is referred to as a transmission denoted as .
Traditionally, RTWNs employ Link-based Scheduling (LBS) to allocate time slots for individual tasks where each slot is allocated to a link by specifying the sender and receiver . If packets from different tasks share a common link and are both buffered at the same sender, their transmission order is decided by a node-specified policy (e.g., FIFO). This approach introduces uncertainty in packet scheduling and may violate the end-to-end (e2e) timing constraints on packet delivery. To tackle this problem, Transmission-based Scheduling (TBS) and Packet-based Scheduling (PBS) are proposed in  and , respectively, to construct deterministic schedules. Each of the two scheduling models has its own advantages and disadvantages and is preferred in different usage scenarios as discussed in . Hence, we consider both models in our FD-PaS framework.
In the TBS model, each time slot is allocated to the transmission of a specific packet at a particular hop or kept idle. Once the network schedule is constructed, packet transmission in each time slot is unique and fixed. In the PBS model, each time slot is allocated to a specific packet or kept idle. Within each time slot assigned to , every node along ’s routing path decides the action to take (e.g., transmit, receive or idle), depending on whether the node has received or not. For example, consider a task with two slots being assigned in each period. In the TBS model, the first and second slots are dedicated for ’s first and second hops, respectively. In the PBS model, the two slots are allocated to each packet of and the second slot can be used to transmit ’s first hop if the transmission fails in the first slot.
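The two-slot example above can be replayed with a small Python sketch. It is a simplified model under stated assumptions: each slot carries at most one attempt, `outcomes[k]` says whether an attempt made in slot k would succeed, and the function names are hypothetical.

```python
from typing import List


def hops_completed_tbs(slot_hops: List[int], outcomes: List[bool]) -> int:
    """TBS: slot k is dedicated to hop slot_hops[k] (1-based hop index)."""
    done = 0
    for hop, ok in zip(slot_hops, outcomes):
        # The slot is usable only if the packet is waiting at this hop's sender.
        if hop == done + 1 and ok:
            done += 1
    return done


def hops_completed_pbs(num_hops: int, outcomes: List[bool]) -> int:
    """PBS: every slot of the packet is used for its next pending hop."""
    done = 0
    for ok in outcomes:
        if done < num_hops and ok:
            done += 1
    return done


# Two-hop task with two slots; the attempt in the first slot fails.
outcomes = [False, True]
tbs_progress = hops_completed_tbs([1, 2], outcomes)  # second slot is wasted
pbs_progress = hops_completed_pbs(2, outcomes)       # second slot retries hop 1
print(tbs_progress, pbs_progress)
```

With the first attempt lost, the TBS schedule makes no progress because its second slot is reserved for a hop whose sender never received the packet, while the PBS schedule reuses that slot for the first hop.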
Since each link in the network may suffer packet losses, i.e., , packet transmissions may fail, which can significantly affect the timely delivery of real-time packets. To handle such cases, a retransmission mechanism is commonly employed in RTWNs [28, 30]. Specifically, if a sender node does not receive any ACK from the receiver node within the current slot, it automatically retransmits the packet in the next possible time slot.
To quantify the reliability requirement of the e2e packet delivery for each task, a required e2e PDR for , denoted as , is introduced. Based on , the transmission of any packet of is reliable if and only if the achieved e2e PDR of is larger than or equal to , i.e., . To simplify presentation, we assume that all tasks in the network share a common required e2e PDR value, denoted as . However, our proposed approach can be easily extended to support different ’s for different tasks. Table II summarizes the frequently used symbols in this paper.
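The reliability condition can be illustrated with a short numerical sketch. It assumes, as a simplification that the text does not state explicitly, that losses on different links and in different slots are independent and that the TBS model is used, so a hop with link PDR p and r assigned slots succeeds with probability 1 - (1 - p)^r and the e2e PDR is the product over all hops. The function names and numbers are hypothetical.

```python
from math import prod
from typing import Sequence


def e2e_pdr(link_pdrs: Sequence[float], retries: Sequence[int]) -> float:
    """E2e PDR of a path, assuming independent losses (illustrative model)."""
    if len(link_pdrs) != len(retries):
        raise ValueError("one retry count is needed per hop")
    # A hop fails only if all of its assigned slots fail.
    return prod(1.0 - (1.0 - p) ** r for p, r in zip(link_pdrs, retries))


def is_reliable(link_pdrs: Sequence[float], retries: Sequence[int],
                required_pdr: float) -> bool:
    """A packet is reliable iff its achieved e2e PDR meets the requirement."""
    return e2e_pdr(link_pdrs, retries) >= required_pdr


# Hypothetical two-hop path with link PDRs 0.9 and 0.8.
print(e2e_pdr([0.9, 0.8], [1, 1]))            # one slot per hop
print(e2e_pdr([0.9, 0.8], [2, 2]))            # two slots per hop
print(is_reliable([0.9, 0.8], [2, 2], 0.95))
```

Under these assumptions, one slot per hop yields an e2e PDR of about 0.72, while two slots per hop raise it to about 0.95, which is why additional retransmission slots are assigned per hop.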
Based on the above system model, the problem that we aim to solve in this paper is presented as follows.
Problem 1: Assume that for a given RTWN, a static schedule is provided which can guarantee both the e2e timing and reliability requirements of all tasks when there are no disturbances. That is, the required number of slots is assigned to each packet (either in the TBS model or PBS model) in the system nominal mode. Upon detection of a disturbance at (a release time of ’s packet; we assume that disturbances can be detected only at the time when the sensor samples the environment data, i.e., the release time of a certain packet), determine the dynamic schedule in the system rhythmic mode such that (i) the system can start handling rhythmic packets no later than , (ii) timing and reliability requirements of all the rhythmic packets are satisfied, and (iii) the system can safely return to the nominal mode after which all packets can be reliably delivered by their nominal deadlines. The objective is to minimize the total reliability degradation on all packets from periodic tasks in the system rhythmic mode.
Constraint (i) ensures that disturbances can be handled at the earliest possible time (i.e., before the nominal arrival time of the next packet). If Constraint (i) were violated, the corresponding control system could become unstable or suffer from severe performance degradation. The meanings of Constraints (ii) and (iii) are self-explanatory.
It has been shown through a motivational example in  that centralized packet scheduling approaches (e.g., OLS and D-PaS) have two main drawbacks when solving the above problem. First, they rely on a single point (e.g., the controller) in the network to make on-line decisions for handling the disturbance. This is a significant roadblock in scaling up the packet scheduling framework to be deployed in large-scale RTWNs. Second, centralized approaches suffer from a considerably long response time to disturbances, especially in large RTWNs. This is because centralized approaches require the disturbance information to be sent to the controller first. After that, a broadcast packet is needed to disseminate the generated dynamic schedule to all nodes in the network to handle the disturbance. In this work, we propose a new approach to address these drawbacks.
Table II: Frequently used symbols
|Symbol||Description|
|(),||Device nodes and controller node|
|||Unicast tasks|
|||Number of hops of|
|()||Nominal period (deadline) of|
|()||Rhythmic period (deadline) vector of|
|||The -th released packet of task|
|||The h-th transmission of packet|
|||Required e2e packet delivery ratio (for all tasks)|
|,||E2e PDR value and retry vector of|
|||Number of trials for -th hop assigned by|
|,||Slot when leaves its nominal state and its rhythmic state, respectively|
|, , ,||Start point, end point, end point candidate and end point upper bound|
|,||Static schedule and dynamic schedule|
|||Set of nodes receiving the disturbance information|
|,||Set of active rhythmic and periodic packets|
|||Set of dropped periodic packets and transmissions within|
|||PDR degradation of|
4 Overall Framework of FD-PaS
In order to achieve fast response to disturbances in RTWNs, in this work we propose a fully distributed packet scheduling framework, referred to as FD-PaS. The key idea of FD-PaS is to make dynamic, local schedule adaptation at each node along the path of the rhythmic task while avoiding transmission collisions from other nodes that still follow their static schedules in the system rhythmic mode.
Fig. 3 gives an overview of the execution model of FD-PaS. After network initialization, each node generates locally a static schedule, , using the local schedule generation mechanism in D-PaS and follows to transmit packets. When a disturbance is detected by rhythmic task at , a notification is propagated to all the nodes responsible for handling the disturbance. Let these nodes be . Upon receiving the notification, each node in determines the time duration of the network being in the rhythmic mode and generates a dynamic schedule for handling the disturbance. Starting from , one nominal period of after detecting the disturbance, the nodes in follow while all other nodes keep using static schedule to transmit periodic packets. Thus, by not relying on a broadcast packet to disseminate the dynamic schedule generated by a centralized point in the network, FD-PaS is able to significantly reduce the response time of reacting to disturbances. For ease of discussion, in the rest of the paper, we refer to disturbance response time (DRT) as the time duration from to the start time of system rhythmic mode and disturbance handling latency (DHL) as the time duration of system rhythmic mode (see Fig. 3).
To ensure that FD-PaS works properly, several challenges need to be tackled. First, when a disturbance occurs, only the sensor node that has detected it knows which task will enter the rhythmic state, while the rest of the nodes in that are to handle the disturbance have no knowledge about this. Second, if the nodes in follow the dynamic schedule while other nodes follow the static schedule , transmission collisions would occur which may cause rhythmic packets to violate their timing and reliability requirements (e.g. missing deadlines). Third, to properly handle disturbances, efficient methods are needed by the nodes in to determine a dynamic schedule in which the reliability degradation on periodic packets is minimized. We discuss in detail how FD-PaS tackles these challenges in the following sections.
5 Propagating Disturbance Information
In centralized approaches, all nodes in the RTWN must know the disturbance information since a dynamic schedule must be deployed at each node. However, such a network-wide propagation mechanism does not scale and often violates constraint (i) in Problem 1 as shown by the motivating example. To overcome this drawback, we propose to disseminate the disturbance information to only a subset of all nodes, denoted as , to minimize the DRT. This scheme requires the following three questions be answered: (1) which nodes in the network belong to , (2) how to propagate the disturbance information to nodes in , and (3) does each node in have sufficient time to generate the dynamic schedule before the system enters the rhythmic mode? Below we present our answers to these questions.
Consider questions (1) and (2) above. Recall that when a disturbance occurs, the rhythmic task will enter its rhythmic state following reduced periods and deadlines as specified in and . An updated schedule is needed to accommodate the increased workload of . To ensure that each (re)transmission can be successful, both the sender and the receiver of must follow the same schedule. Thus, all nodes along the routing path of must know the disturbance information to generate a consistent dynamic schedule, and should be included in . For example, for the example in Fig. 1 when enters the rhythmic state. When a disturbance is detected at , its information can be piggybacked onto and transmitted to all nodes in . Propagating disturbance information in this manner guarantees that all nodes in receive the disturbance information within one nominal period of , i.e., , since the static schedule ensures that each task is assigned with the required number of transmission and retransmission slots along its routing path within in order to meet the e2e timing and reliability requirements.
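To illustrate the partial propagation scheme, the following minimal Python sketch derives the set of notified nodes from the routing path of the rhythmic task and selects which schedule a node follows in a given slot. The node names, the function names, and the half-open interval used for the rhythmic mode are assumptions made for illustration only.

```python
from typing import Sequence, Set


def notified_nodes(rhythmic_path: Sequence[str]) -> Set[str]:
    """Nodes that receive the piggybacked disturbance information."""
    return set(rhythmic_path)


def schedule_in_use(node: str, slot: int, rhythmic_path: Sequence[str],
                    mode_start: int, mode_end: int) -> str:
    """Return which schedule a node follows in the given slot.

    Only nodes on the rhythmic task's path switch to the dynamic schedule,
    and only while the system is in the rhythmic mode (assumed half-open).
    """
    in_rhythmic_mode = mode_start <= slot < mode_end
    if in_rhythmic_mode and node in notified_nodes(rhythmic_path):
        return "dynamic"
    return "static"


# Hypothetical path: sensor V1 -> relay V3 -> controller Vc -> relay V4 -> actuator V6.
path = ["V1", "V3", "Vc", "V4", "V6"]
print(schedule_in_use("V3", 12, path, 10, 30))  # on the path, in rhythmic mode
print(schedule_in_use("V2", 12, path, 10, 30))  # off the path
print(schedule_in_use("V3", 30, path, 10, 30))  # rhythmic mode has ended
```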
Now consider question (3). As required in Constraint (i) of Problem 1, the system should start handling the rhythmic packets from after the disturbance is detected at . This requires that (i) the disturbance information be successfully propagated to the relevant nodes before enters its rhythmic state at , and (ii) each node in completes the construction of the dynamic schedule before it starts receiving/transmitting the first rhythmic packet. The propagation scheme discussed above ensures that condition (i) is met. Regarding condition (ii), our prior work showed that one idle slot (10ms) is sufficient for a typical device node in RTWNs (e.g., TI CC2538 SoC) to complete its local schedule computation . The theorem below establishes that such an idle slot indeed exists within the time frame specified in condition (ii).
Theorem 5. If an RTWN system is schedulable under a given static schedule, any node () in has at least one idle slot (neither receiving nor sending any transmission) between time () when it receives the disturbance information and time () when it is involved in the transmission of the first rhythmic packet after enters its rhythmic state at .
We first recall the following lemma from .
Lemma 5. If an RTWN system is schedulable under a given static schedule, i.e., each packet completes all its transmissions before the deadline, for any node and task passing through , there exists at least one idle slot at among any three consecutive transmissions of passing .
Proof. Since in our system model, sensors and actuators are connected via the controller node, every task follows a routing path with at least two hops corresponding to two transmissions (assigned with multiple transmission and retransmission slots). Suppose occurs at and is the transmission from which receives the disturbance information (if is the sensor, it detects the disturbance at ). There exists at least one transmission between and (the first transmission that is involved in the dynamic schedule, occurring at ). Then, according to Lemma 5, has at least one idle slot between and (i.e., between and ). Thus, the theorem holds.
Based on Theorem 5 and the disturbance propagation time bound, the proposed partial disturbance propagation scheme guarantees that any disturbance can be responded to within one nominal period of the rhythmic task, so Constraint (i) in Problem 1 is satisfied.
6 Avoiding Transmission Collisions
According to the disturbance propagation mechanism presented in Section 5, only the nodes on the path of the rhythmic task are included in . Nodes in construct their local schedules individually and employ them in the system rhythmic mode. All other nodes in the network follow the static schedule. With this execution model, unless the disturbance information is propagated to the entire RTWN, inconsistencies between the dynamic and static schedules in the system rhythmic mode may easily arise, which would result in transmission collisions. To ensure that the disturbances are handled appropriately, in the FD-PaS framework, the transmissions of rhythmic packets need to be always successful even in the presence of collision with other periodic packets.
In conventional RTWNs such as WirelessHART  and 6TiSCH , TDMA-based data link layers are widely adopted to provide synchronized and collision-free channel access. In addition, most of those protocols employ the Clear Channel Assessment (CCA) operation at the beginning of each transmission for collision avoidance. CCA, however, cannot prioritize packet transmissions. When multiple transmissions happen in the same time slot and share the same destination, it cannot guarantee that the more important packets (e.g., rhythmic packets) are granted access to the channel.
To tackle this challenge, we propose an enhancement to the IEEE 802.15.4e standard , called Multi-Priority MAC (MP-MAC), to support prioritization of packet transmissions in RTWNs. Several attempts have been made in the literature towards supporting this feature. For example, the PriorityMAC was proposed in  to prioritize critical traffic in RTWNs. It introduces the concept of subslots, in which the transmitter does a very short transmission to indicate the priority of the packet to be transmitted in the following time slot. By adding two subslots before each time slot, PriorityMAC is able to create three priority levels. Different from PriorityMAC, the design of the MP-MAC aims to be lightweight and scalable. In MP-MAC, the transmitter does not explicitly conduct a short transmission to indicate the priority. Instead it implicitly indicates the priority of the transmission by adjusting the Start-Of-Frame (SOF) time offset. Compared with PriorityMAC, MP-MAC is more energy efficient (by avoiding transmissions in the subslots), and able to support more priority levels.
Fig. 4 gives a comparison of the slot timing of 802.15.4e (top) and MP-MAC (bottom). In an 802.15.4e time slot, the sender transmits a packet and the receiver responds with an acknowledgement (ACK) if the packet is successfully received (no acknowledgement is provided for broadcast and multicast packets). The packet transmission starts at TxOffset after the start of the time slot, while the ACK starts at TxAckDelay after the completion of the packet transmission. A long Guard Time (LGT) and a short Guard Time (SGT) are used by the receiver and sender respectively to tolerate clock drift and radio/CPU operation delays. With this standard design of 802.15.4e, if multiple senders transmit packets in the same time slot, they are not aware of the other transmissions and thus interfere with one another. The slot timing of MP-MAC is presented at the bottom of Fig. 4. In MP-MAC, instead of being set as a constant, TxOffset is varied to implicitly indicate the priority of the packet (shown as red dashed lines). A packet with a higher priority is associated with a shorter TxOffset to start the transmission earlier. In addition, a CCA operation is performed before each transmission to ensure that there is no higher priority packet transmission present in the channel. This enhancement ensures that only the highest priority packet (with the shortest TxOffset) is transmitted, and all lower priority transmissions yield to it.
Similar to the guard times, the TxOffset values for different priorities need to be set sufficiently apart so that different senders and receivers have consensus on the priorities. In MP-MAC, we define PriorityTick as the difference between two consecutive TxOffsets. To support different priorities in MP-MAC, the length of the time slot, compared to the standard design, needs to be extended by . A longer PriorityTick can ensure successful packet prioritization, but leads either to a longer SlotDuration and reduced network throughput, or to a smaller number of supported priorities if the size of the time slot is fixed. Since PriorityTick is a hardware-dependent parameter, we will elaborate on the selection of PriorityTick in our testbed experiments and demonstrate the effectiveness of MP-MAC in Section 9.1.
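The priority-by-offset idea can be sketched in a few lines of Python. This is a simplified model, not the testbed implementation: it assumes that TxOffset grows linearly with the priority index (0 being the highest priority), that CCA always detects an earlier transmission, and that every contending sender can hear the others. The constants and function names are illustrative and are not the values selected in the testbed experiments.

```python
from typing import Dict, List

BASE_TX_OFFSET_US = 2120  # illustrative TxOffset of the highest priority
PRIORITY_TICK_US = 300    # illustrative PriorityTick; hardware dependent


def tx_offset_us(priority: int) -> int:
    """TxOffset for a priority level; a smaller index means higher priority."""
    if priority < 0:
        raise ValueError("priority must be non-negative")
    return BASE_TX_OFFSET_US + priority * PRIORITY_TICK_US


def transmitters(senders: Dict[str, int]) -> List[str]:
    """Senders that actually transmit in a slot, given node -> priority.

    Senders with the shortest TxOffset start first; every other sender
    finds the channel busy during its CCA and yields. More than one name
    in the result means equal priorities, i.e., a collision remains.
    """
    if not senders:
        return []
    earliest = min(tx_offset_us(p) for p in senders.values())
    return sorted(n for n, p in senders.items() if tx_offset_us(p) == earliest)


# A rhythmic packet (priority 0) contends with two periodic packets.
winners = transmitters({"A": 1, "B": 0, "C": 2})
print(winners)
```

The sketch also makes the trade-off visible: each additional priority level pushes the latest TxOffset back by one PriorityTick, which is the source of the slot extension discussed above.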
7 System Rhythmic Mode
MP-MAC ensures that once the dynamic schedules are generated locally, the nodes in can follow those schedules to handle the disturbance without transmission collisions with other nodes in the network. Since all the nodes in receive the same disturbance information, the dynamic schedules generated locally at these nodes are all consistent. The construction of a dynamic schedule must guarantee that 1) all rhythmic packets meet their timing and reliability requirements, 2) the reliability degradation of periodic packets is minimized, and 3) the system can reuse the static schedule after the rhythmic mode ends and all packets can be reliably delivered by their nominal deadlines.
7.1 Problem Formulation
In FD-PaS, the network starts operation by following a static schedule which guarantees that all tasks meet their timing and reliability requirements if no disturbance occurs. The static schedule is generated at each node locally using the local schedule generation technique proposed in . To satisfy the reliability requirement, the retransmission mechanism introduced in [31, 32] is employed for each task to achieve the desired PDR value, i.e., . In the following, we assume the network adopts the TBS model where additional time slots are assigned to individual transmissions. (The case is similar for the PBS model where slots are assigned to individual packets.) We denote the static schedule as , where is the slot ID, is the task ID and is the hop index. For any given time slot , we have if is assigned to the -th transmission of . Otherwise, to indicate an idle slot. Let be the retry vector of packet used in the static schedule in which denotes the number of slots assigned to hop of . We use to denote the number of slots assigned to (i.e., ) in the static schedule which guarantees the e2e PDR value to be larger than in the system nominal mode.
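To make the notation concrete, a static schedule under the TBS model can be represented as a slot-indexed list. The sketch below is only one possible encoding: the use of `None` for an idle slot, 0-based hop indices, and the helper names are assumptions for illustration.

```python
IDLE = None  # assumed marker for an idle slot


def retry_vector(schedule, task_id, num_hops, start=0, end=None):
    """Count the slots assigned to each hop of `task_id` in schedule[start:end].

    `schedule[t]` is either IDLE or a (task_id, hop_index) pair.
    """
    end = len(schedule) if end is None else end
    counts = [0] * num_hops
    for entry in schedule[start:end]:
        if entry is not IDLE and entry[0] == task_id:
            counts[entry[1]] += 1
    return counts


def idle_slots(schedule, start, end):
    """Return the IDs of the idle slots in the half-open interval [start, end)."""
    return [t for t in range(start, end) if schedule[t] is IDLE]
```

With `schedule = [(0, 0), (0, 0), None, (0, 1), (1, 0), None, (0, 1)]`, task 0 has two slots on each of its two hops, and slots 2 and 5 are idle.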
As shown in Fig. 2, when a disturbance is detected at , requires to enter its rhythmic state from the next release time , i.e., . Then, the system enters the rhythmic mode with an increased workload induced by . A dynamic schedule is thus needed before the system switches back to the nominal mode and reuses static schedule . starts from and ends at a carefully chosen end point of the system rhythmic mode. To achieve guaranteed fast disturbance handling, we further define as a user specified parameter which bounds the maximum allowed DHL, and is often application dependent. Though it is natural to use idle slots in to accommodate the increased rhythmic workload, they are not always sufficient to guarantee the timing and reliability requirements of all rhythmic packets. In this case, some periodic transmissions have to be dropped. Since any node keeps following the static schedule to transmit periodic packets, periodic transmissions cannot be adjusted in the dynamic schedule. (Some periodic tasks may share common nodes with on their routing paths, which indicates that the periodic transmissions at these nodes can be adjusted in the dynamic schedule. Due to page limit, we leave this discussion to our future work and focus on the case that all periodic transmissions should not be adjusted in the dynamic schedule.) Therefore, if any periodic transmission in is replaced by a rhythmic transmission in , the number of elements in is reduced such that the reliability of packet is degraded. If the remaining number of assigned slots (denoted as ) is less than , the timing requirement of is also violated since at least slots are needed to guarantee the delivery of . To capture the reliability degradation for periodic packet , let represent the difference between the required PDR and the updated PDR value in the dynamic schedule, i.e., . Note that the timing degradation of each packet can also be captured by where if is dropped.
Then, the question is which periodic transmissions should be replaced by rhythmic transmissions to generate dynamic schedule such that (i) all rhythmic packets meet their timing and reliability requirements and (ii) the total reliability degradation of periodic packets is minimized.
Formally, to satisfy Constraints (ii), (iii) and (iv) in Problem 1, we aim to solve the following two subproblems.
Problem 1.1 – End Point Selection: Given task set , , and static schedule , this subproblem determines the end point that satisfies the following two constraints.
Here, is the finish time of the last packet released in ’s rhythmic state. ensures that the current rhythmic event can be completely handled before the system switches back to the nominal mode.
The system can switch back to the nominal mode and reuse the static schedule from and all packets after can be reliably delivered by their nominal deadlines.
Problem 1.2 – Dynamic Schedule Generation: this subproblem generates the dynamic schedule such that the total reliability degradation of periodic packets is minimized and the following two constraints are satisfied.
All rhythmic packets meet their timing and reliability requirements.
In the dynamic schedule , any periodic transmission slot can only either be replaced by a rhythmic transmission slot or kept unchanged.
Below we first discuss how FD-PaS solves the first problem.
7.2 End Point Selection
Determining the right end point for the dynamic schedule is vital since it impacts not only the DHL but also the number of dropped periodic packets. A concept similar to the end point is used by OLS and is referred to as switch point . Since both OLS and FD-PaS require the system to reuse the static schedule after , to select the end point in FD-PaS, we borrow some ideas in OLS including aligning the actual release time of to its nominal one and reducing the number of end point candidates by only considering the actual release times of .
FD-PaS and OLS have two key differences for end point selection. First, to satisfy Constraint 7.1, we need to determine which packets must be completed before the system reuses the static schedule at end point . Since OLS must obey a user-specified bound on the number of adjusted transmissions in dynamic schedule , a transmission set containing all transmissions to be scheduled in must be constructed. However, FD-PaS has no such requirement (due to its distributed nature), thus only needs to construct an active packet set containing all packets to be scheduled. Second, according to Constraint 7.1, transmissions of periodic packets must not be adjusted and can only be replaced by rhythmic transmissions in the dynamic schedule. Thus, for the active packet set, we only need to consider rhythmic packets to be scheduled by . These differences require modifications to the end point selection process, which are detailed below.
Let denote the active packet set containing all rhythmic packets to be scheduled within and denote the periodic packet set in which each periodic packet has at least one transmission slot in the static schedule . Naturally, any rhythmic packet with both release time and deadline in must be included in . The question is how to treat the rhythmic packet released before with a deadline after . As shown in Fig. 2, let be such a packet. To ensure the system can reuse the static schedule from , the actual release time of must be aligned to its nominal release time after . Same as OLS, we shorten the time interval between and by shifting backward to the closest nominal release time of , denoted as . The more challenging part is adjusting the deadline and execution time of since the assigned number of transmission slots may vary depending on which hop occurs after . We construct by adjusting its execution time and deadline according to the position of by considering the following two cases.
Case 1: If , is adjusted to . Suppose the first transmission slot assigned to after is at in the static schedule. If , it indicates that is the first assigned transmission slot for the first hop of , i.e., . Then the execution time of is set to . If and suppose is the -th transmission slot assigned for , the execution time is set to correspondingly.
Case 2: If , suppose the first assigned transmission slot for the first hop of is at in the static schedule, i.e., . is adjusted to to guarantee that the deadline of is smaller than or equal to the first transmission of . Also the execution time of is set to be equal to .
Given , any time slot within can be selected as end point . However, to avoid checking every time instant, which is time consuming, we only need to consider the actual release times of within as end point candidates, denoted as . (Such a space reduction scheme is safe; the proof is similar to that of Lemma 2 in  and is thus omitted due to page limit.) That is,
Then the dynamic schedule generation subproblem can be refined as follows.
8 Dynamic Schedule Generation
In this section, we discuss how FD-PaS determines the dynamic schedule to solve Problem 1.2. For the sake of clarity, we first assume that all links in the network are reliable, i.e. . We then generalize the network model to consider lossy wireless links and extend FD-PaS to satisfy both the timing and reliability requirements of all tasks in Section 8.2.
8.1 Reliable Network Setting
For RTWNs in which all links are reliable, time slots are required for each packet to guarantee its e2e delivery. If any of the transmission slots in the static schedule is replaced by a rhythmic transmission in the dynamic schedule, cannot be delivered and has to be dropped. Thus, the objective in Eq. (2) reduces to minimizing the number of dropped periodic packets. We use to denote the dropped periodic packet set, and the following lemma shows that determining is a non-trivial problem.
We prove the lemma by reducing the set cover problem  to a special case of the packet dropping problem.
The set cover problem is defined as follows: Given a set of elements and a collection of nonempty subsets of where . The set cover problem is to identify a sub-collection whose union equals such that is minimized.
Given a set cover problem, we can construct a special case of the packet dropping problem in polynomial time as follows:
(1) Suppose that after utilizing the original transmission slots of and the idle slots in to accommodate rhythmic transmissions in , there still remain packets of , denoted as , to be scheduled. Each packet only needs one slot to transmit.
(2) In the static schedule , there are periodic packets, denoted as . For each packet , if a transmission of falls into the time window of rhythmic packet (i.e., ), we have .
Thus, one can determine the minimum-size dropped packet set that accommodates all the rhythmic packets if and only if the smallest sub-collection whose union equals can be identified. The lemma is proved.
After the dropped packet set is determined, the dynamic schedule can be obtained in linear time by assigning the transmissions of the rhythmic packets to the static schedule using both idle slots and transmission slots of the dropped packets. Thus Lemma 8.1 readily leads to Theorem 8.1 and the proof is omitted.
Generating a dynamic schedule with the minimum number of dropped packets in reliable RTWNs is NP-hard.
Below we focus on solving the packet dropping problem. An ILP based formulation can be derived by associating each periodic packet with a binary variable indicating whether the packet should be dropped or not. The objective is to minimize the number of dropped packets subject to the constraint that the total number of transmission slots freed from the dropped packets should be sufficient to meet the demand of all the rhythmic transmissions in.
We introduce the following notation:
denotes the transmission vector of periodic packet where each is the number of transmissions from in the static schedule that can be replaced by transmissions of rhythmic packet in the dynamic schedule. Specifically, transmission of can be replaced by if and .
denotes the dropping decision of periodic packet . if is dropped. Otherwise, .
denotes the available slot vector where each represents the total number of idle slots and rhythmic transmission slots in the static schedule that can be used by rhythmic packet .
To drop the minimum number of periodic packets to guarantee the timing requirements of all rhythmic packets, we have the following objective function in the ILP formulation:
Since rhythmic transmissions are at the highest priority, the deadline of each rhythmic packet can be guaranteed only if at least time slots are reserved for in the dynamic schedule. Also, both idle slots and rhythmic transmission slots in the static schedule can be used to satisfy ’s transmission demand. Therefore, objective function (3) is subject to the following constraint.
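For small instances, the optimization described above can be solved exactly by enumeration, which is useful as a reference when evaluating heuristics. The sketch below assumes one reading of the constraint: for every rhythmic packet, the slots freed by dropped periodic packets plus the already available slots must cover its slot demand. The names `trans_vectors`, `available` and `demand` are hypothetical and stand for the transmission vectors, the available slot vector and the per-packet slot demand.

```python
from itertools import combinations


def min_drop_exact(trans_vectors, available, demand):
    """Return a minimum-size set of periodic packet indices to drop, or None.

    trans_vectors[i][k]: slots of periodic packet i usable by rhythmic packet k.
    available[k]: idle and rhythmic slots already usable by rhythmic packet k.
    demand[k]: slots that rhythmic packet k needs to meet its deadline.
    Exhaustive search: exponential in the number of periodic packets.
    """
    n, m = len(trans_vectors), len(demand)
    for size in range(n + 1):  # try smaller drop sets first
        for subset in combinations(range(n), size):
            if all(
                available[k] + sum(trans_vectors[i][k] for i in subset) >= demand[k]
                for k in range(m)
            ):
                return set(subset)
    return None  # demand cannot be met even if every packet is dropped
```

Because the search is exponential, this form is only practical offline; it is not meant for resource-constrained device nodes.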
Given that the packet dropping algorithm is to be deployed on resource-constrained device nodes and the sizes of both the rhythmic packet set () and periodic packet set () can become large as the network grows, we propose a greedy heuristic to solve the packet dropping problem which is time- and space-efficient to be deployed in practical RTWNs. The key idea of the greedy heuristic is to drop the periodic packet which contributes the maximum number of slots to all rhythmic packets.
Alg. 1 describes how the greedy heuristic drops periodic packets. Given the static schedule , a periodic packet set in which each maintains a transmission vector is constructed (Lines 1-2). Considering the idle slots and rhythmic transmission slots in , the greedy heuristic constructs a demand vector for all rhythmic packets in where captures the number of additional slots required by () (Line 3). If all elements in the demand vector equal , which means that the idle slots and rhythmic transmission slots in the static schedule are sufficient to accommodate all rhythmic packets in , no packet needs to be dropped and an empty set is returned (Lines 4-6). Otherwise, the heuristic drops packets in a greedy fashion as follows. In each iteration, periodic packet with the maximum in is added into the dropped packet set and removed from (Lines 8-9). Then the algorithm updates ’s demand vector by subtracting for each (Lines 10-12). If all rhythmic packets are schedulable, i.e., each equals , after dropping , the dropped packet set is returned (Lines 13-15). Otherwise, the transmission vector of each periodic packet is updated according to the status of rhythmic packets (Line 16). Specifically, if rhythmic packet is already schedulable, i.e., , is set to . If which means dropping is redundant to schedule , we have . This process repeats until all rhythmic packets are schedulable and a dropped packet set is returned.
The time complexity of the packet dropping heuristic, Algorithm 1, is where and are the number of rhythmic and periodic packets in the dynamic schedule, respectively.
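The greedy heuristic described above can be sketched in a few lines. This is an illustrative reading of the textual description rather than a transcription of Alg. 1: the data layout, the tie-breaking rule (first packet with the maximum contribution), and the `None` return for infeasible inputs are assumptions.

```python
def greedy_packet_drop(trans_vectors, demand):
    """Greedily choose periodic packets to drop.

    trans_vectors[i][k]: slots of periodic packet i usable by rhythmic packet k.
    demand[k]: additional slots still required by rhythmic packet k.
    Returns the list of dropped packet indices, or None if demand cannot be met.
    """
    remaining = {i: list(v) for i, v in enumerate(trans_vectors)}
    need = [max(0, d) for d in demand]
    dropped = []
    if not any(need):
        return dropped  # idle and rhythmic slots already suffice

    while remaining:
        # Pick the packet contributing the most slots over all rhythmic packets.
        best = max(remaining, key=lambda i: sum(remaining[i]))
        if sum(remaining[best]) == 0:
            break  # nothing left that helps
        contribution = remaining.pop(best)
        dropped.append(best)
        need = [max(0, n - c) for n, c in zip(need, contribution)]
        if not any(need):
            return dropped
        # Cap each remaining contribution by the outstanding demand, so that
        # slots which would be redundant no longer count toward the selection.
        for vec in remaining.values():
            for k in range(len(need)):
                vec[k] = min(vec[k], need[k])
    return None
```

For instance, with transmission vectors `[[2, 0], [0, 1], [1, 0]]` and demand `[1, 1]`, the sketch drops packet 0 first and then packet 1, since packet 2 no longer contributes once the first rhythmic packet is schedulable.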
8.2 Unreliable Network Setting
In the discussion above, we have assumed that all links in the RTWN are reliable, i.e., . With this assumption, both timing and reliability requirements of each task can be directly satisfied when transmission slots are allocated for each packet and no retransmission slot is needed. Although this assumption simplifies the algorithm design and analysis, it is not realistic in real-life settings considering the lossy nature of wireless links. Thus, in this subsection we consider unreliable links and extend FD-PaS to handle disturbance considering both timing and reliability requirements for each task.
For RTWNs containing unreliable links, a retransmission mechanism is required and each packet may be assigned multiple retransmission slots in the static schedule according to the link quality on the routing path. After the system enters the rhythmic mode, Alg. 1 can still be applied if we do not differentiate transmission and retransmission slots allocated for each packet. That is, if any assigned slot of a periodic packet is determined to be occupied by a rhythmic transmission in the dynamic schedule, all its associated transmissions and retransmissions along the routing path will be dropped as well. However, this causes the system performance, in terms of both timing and reliability, to drop significantly, since some of the dropped periodic transmissions could have been kept to deliver the periodic packet. The challenge, then, is to determine the dropped periodic transmission set, denoted as , which leads to the minimum reliability degradation on periodic tasks (i.e., solving the problem defined in Eq. (2)).
Clearly, the packet dropping problem in Sec. 8.1, where dropping any transmission leads to the same reliability degradation , is a special case of the transmission dropping problem with unreliable links. Thus, according to Lemma 8.1, the following theorem holds and the proof is omitted.
Generating a dynamic schedule with the minimum reliability degradation, i.e. solving Problem 1.2, is NP-hard.
Next we focus on solving the transmission dropping problem and propose another heuristic. Note that a packet may still be delivered even if some of its retransmissions are replaced by rhythmic transmissions. Thus, instead of dropping the packet contributing the maximum number of slots as in Alg. 1, the key idea of the heuristic is to drop, at each iteration, the periodic transmission which results in the minimum reliability degradation. In the following we first describe the calculation of the reliability degradation for each transmission.
Given the PDRs of all the links along the routing path of and the retry vector , the reliability value of , , can be derived as:
If a retransmission of at -th hop is dropped, the updated reliability value can be readily computed using Eq. (5) by updating in the retry vector. The reliability degradation, then, is the difference between the two PDR values.
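The computation above can be illustrated with a short sketch. It assumes a common TBS-style model in which a hop with link PDR p and r assigned slots succeeds with probability 1 - (1 - p)^r, and the end-to-end reliability is the product over hops; this form is an assumption for illustration and may differ from the exact expression in Eq. (5). The function names are hypothetical.

```python
def e2e_pdr(link_pdrs, retry_vector):
    """End-to-end delivery ratio under the assumed per-hop retry model."""
    pdr = 1.0
    for p, r in zip(link_pdrs, retry_vector):
        pdr *= 1.0 - (1.0 - p) ** r  # probability that at least one try succeeds
    return pdr


def degradation_if_dropped(link_pdrs, retry_vector, hop):
    """Reliability lost when one slot at `hop` (0-based) is taken away."""
    if retry_vector[hop] <= 0:
        raise ValueError("no slot left to drop at this hop")
    reduced = list(retry_vector)
    reduced[hop] -= 1
    return e2e_pdr(link_pdrs, retry_vector) - e2e_pdr(link_pdrs, reduced)
```

With link PDRs of 0.9 and 0.8 and two slots per hop, dropping a slot on the better link costs less reliability than dropping one on the weaker link, which is the ordering the heuristic relies on.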
Alg. 2 describes the generation of the dropped transmission set using the heuristic. In the initialization phase, the periodic packet set and the rhythmic demand vector are constructed (Lines 1-2), and in Lines 3-5 we check whether any periodic transmission needs to be dropped in the dynamic schedule. If so, we drop periodic transmissions in a greedy manner. At each iteration, we select the periodic transmission with the minimum reliability degradation according to the discussion above (Line 7). If any time slot of falls into the time window of any rhythmic packet needing an extra slot to transmit, it is added into the dropped transmission set and the rhythmic demand vector is updated correspondingly (Lines 8-11). Otherwise, is kept and cannot be selected in the future. If all rhythmic packets are schedulable, the dropped transmission set is returned.
The time complexity of the dropped transmission determination is where and are the numbers of rhythmic and periodic packets in the dynamic schedule, respectively. is the number of slots assigned to each periodic packet in the static schedule.
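The transmission dropping heuristic can be sketched as follows. This is an illustrative reading of the description, not Alg. 2 itself. It assumes the per-hop retry model 1 - (1 - p)^r for reliability, half-open time windows [release, deadline), and that a freed slot is given to the first rhythmic packet that still needs one; all names are hypothetical.

```python
def e2e_pdr(link_pdrs, retries):
    """Assumed reliability model: product of per-hop success probabilities."""
    pdr = 1.0
    for p, r in zip(link_pdrs, retries):
        pdr *= 1.0 - (1.0 - p) ** r
    return pdr


def greedy_transmission_drop(packets, transmissions, windows, need):
    """Choose periodic transmissions to drop with least reliability loss.

    packets: {packet_id: (link_pdrs, retry_vector)}
    transmissions: list of (packet_id, hop, slot) in the static schedule
    windows: [(release, deadline)] per rhythmic packet, slot in [release, deadline)
    need: extra slots still required by each rhythmic packet
    Returns the dropped transmissions, or None if the demand cannot be met.
    """
    retries = {pid: list(rv) for pid, (_, rv) in packets.items()}
    need = list(need)
    candidates = list(transmissions)
    dropped = []

    def cost(tx):
        pid, hop, _ = tx
        pdrs = packets[pid][0]
        reduced = list(retries[pid])
        reduced[hop] -= 1
        return e2e_pdr(pdrs, retries[pid]) - e2e_pdr(pdrs, reduced)

    while any(need) and candidates:
        tx = min(candidates, key=cost)  # minimum reliability degradation
        candidates.remove(tx)           # never selected again
        pid, hop, slot = tx
        target = next(
            (k for k, (rel, dl) in enumerate(windows)
             if need[k] > 0 and rel <= slot < dl),
            None,
        )
        if target is None:
            continue  # slot is useless to rhythmic packets: keep the transmission
        dropped.append(tx)
        retries[pid][hop] -= 1
        need[target] -= 1
    return dropped if not any(need) else None
```

The cost of a candidate is recomputed in every iteration because dropping one retransmission of a packet changes the degradation caused by dropping its next one.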
Finally, with the dropped packet (transmission) set being determined, each node in can readily generate the dynamic schedule to solve Problem 1.2 which is summarized in Alg. 3. According to our testbed experiments in Sec. 9.1, all nodes have runtime less than 1ms (within one time slot of 10ms) to complete the dynamic schedule generation.
Note that the proposed FD-PaS framework can be readily modified to handle disturbances in networks that adopt the PBS model. The only difference appears at the selection of the periodic transmission with the minimum reliability degradation (Line 7 in Alg. 2). Since time slots are allocated to each individual packet instead of transmission in the PBS model, we select the periodic packet with the minimum reliability degradation if one of the assigned slots is replaced by a rhythmic transmission. For computing the reliability value of each packet in the PBS model, readers can refer to .
9 Performance Evaluation
In this section, we present key performance results from both testbed experiments and simulation studies to evaluate the performance of the FD-PaS framework in RTWNs. The testbed implementation is to validate the correctness of the proposed FD-PaS framework and to obtain overhead in real applications. Extensive simulations are for performance evaluation since they allow us to easily vary taskset and network specifications to study the trend. Below we first introduce the experiments from our testbed.
9.1 Testbed Implementation and Evaluation
Our testbed is based on the OpenWSN stack , an open source implementation of the 6TiSCH protocol . OpenWSN enables IPv6 networking over the TSCH mode of the IEEE 802.15.4e MAC layer. A typical OpenWSN network consists of an OpenWSN Root and several OpenWSN devices, as well as an optional OpenLBR (Open Low-Power Border Router) to connect to the IPv6 Internet. It serves as a suitable platform to experiment with our proposed FD-PaS framework on both the data link and application layers of the stack.
We implemented FD-PaS on our RTWN testbed to validate the correctness of the design and evaluate its effectiveness for ensuring prompt response to unexpected disturbances. The MP-MAC was implemented by enhancing the MAC layer of the OpenWSN stack and the dynamic schedule generation algorithm (using the same code as in the simulation) was implemented in the application layer. In the following, we first present the implementation of MP-MAC and its performance evaluation, and then validate the correctness of FD-PaS in a multi-task multi-hop RTWN.
As shown in Fig. 5, our testbed consists of 7 wireless devices (TI CC2538 SoC + SmartRF evaluation board). One of them is configured as the root node (controller node) and the rest are device nodes that form a multi-hop RTWN. A CC2531 sniffer is used to capture packets. An 8-channel logic analyzer is used to record device activities through physical pins, in order to accurately measure the timing information among different devices. Fig. 6 shows the experiment setup for the measurement of application layer performance.
|Parameters||Value (μs)||Parameters||Value (μs)|
|TxAckDelay||1,000||PriorityTick||30 to 400|
|Ext. SlotDuration||10,800||Extended LongGT||3,000|
9.1.1 Implementation and Evaluation of MP-MAC
For fair comparison with PriorityMAC , we used the 10ms slot timing of 802.15.4e in the MP-MAC implementation. Since PriorityMAC adds two subslots (0.4ms each) before each time slot, we also extended the SlotDuration and LongGT of MP-MAC by 0.8ms each. Table III summarizes the slot timing of MP-MAC, and the Adjusted TxOffset is computed as follows:
With a given extended SlotDuration, the number of priority levels that MP-MAC can support, denoted as , is a function of PriorityTick. In our MP-MAC implementation, is computed by . Fig. 7(a) shows how changes when the PriorityTick varies from 30μs to 400μs with a step size of 30μs (the timer resolution in the OpenWSN stack). Compared to PriorityMAC which can only support 3 effective priority levels, MP-MAC can support up to 14 priority levels in theory by extending the time slot by the same amount (0.8ms). Fig. 7(a) also illustrates the bandwidth improvement, defined as , when MP-MAC only needs to maintain 3 priority levels. It can be seen that the bandwidth is improved by due to the reduction of the PriorityTick from μs to μs with 3 priority levels.
Measurement of Packet Error Rate (PER): Reducing the size of PriorityTick can support a larger number of priority levels in the RTWN system. Setting the PriorityTick too small, however, either causes nodes to lose synchronization, or makes low priority senders unable to detect high priority packet transmissions and thus causes transmission collisions. It is thus important to identify safe PriorityTick values to make MP-MAC work appropriately. For this purpose, we set up a testing network with two senders talking directly to one receiver. We intentionally configure the senders to transmit in the same time slot and assign them different priorities (using to denote the distance between the priority levels), and measure the number of correctly received packets on the receiver side. We define Packet Error Rate (PER) as the number of failed transmissions divided by the total number of transmissions. During the test, each sender generates 10,000 packets. Fig. 7(b) shows the PER of the high priority packets by varying the size of PriorityTick from μs to μs. The PER of the low priority packets is always 100%, and is thus omitted in the figure. It can be observed that MP-MAC works properly under most of the PriorityTick settings. Its PER only increases when the PriorityTick is reduced to μs. This indicates that the MP-MAC implementation on our device node (TI CC2538 SoC) can safely support up to 9 priority levels when the PriorityTick is set to be no less than μs. When the PriorityTick is set at μs, it can also be observed from Fig. 7(b) that the PER drops (from around 10% to 5%) when the distance between the two priority levels increases (from to ).
Measurement of Application Layer Performance: To see how MP-MAC behaves in terms of packet transmission latency and packet drop rate (PDR) for different priority levels, we set up a testing network with three senders and a controller node. The three senders are assigned different priorities (high, medium and low). Their schedules are configured in a way that they transmit in the same time slot every slotframe (with a length of s). The retransmission mechanism is enabled on all the senders so that if a collision happens, the failed transmission is retried in the next slotframe until a maximum number of retries is reached, at which point the packet is dropped. We define the PDR as the number of dropped packets divided by the total number of packets. We connect the controller node to an STM32F103 MCU through a UART port to control the packet generation on the senders. This STM32F103 MCU connects to the GPIO of each sender, and uses a pulse signal to trigger the sender to generate a packet. In the experiments, the controller node initiates and timestamps the packet generation. By comparing it to the timestamp of the packet reception, the application layer latency is obtained. After a successful packet reception, the controller node waits for a randomly selected time interval, and then triggers the next packet generation. To test latency and packet drop rate, we gradually reduce this time interval to increase the traffic volume. This causes more transmission collisions in the network, which leads to more packet retransmissions and packet drops.
Fig. 8(a) and (b) show the PDR and application layer latency respectively for the three senders during the test. From the results, we observe that the packets from the high priority sender are always transmitted in the first attempt, while the medium and low priority senders have to yield upon collision, retransmit in future slotframes, and thus suffer longer application layer latency. Similarly, when a collision happens with the packets from the medium priority sender, the low priority sender has to yield again and is therefore observed to have the longest latency. In Fig. 8(a), we note that the high priority packets are always delivered and thus their PDR is consistently 0. On the other hand, both the low priority sender and the medium priority sender experience increasing packet losses when the volume of the network traffic grows, and the impact on the low priority sender is more severe.
9.1.2 Functional validation in a multi-task multi-hop RTWN
We validate the correctness and effectiveness of FD-PaS by deploying it on a 7-node multi-hop network as shown in Fig. 1. The system running in the network consists of three tasks,