Dependability-Aware Routing and Scheduling for Time-Sensitive Networking

09/13/2021
by   Niklas Reusch, et al.
DTU

Time-Sensitive Networking (TSN) extends IEEE 802.1 Ethernet for safety-critical and real-time applications in several areas, e.g., automotive, aerospace or industrial automation. However, many of these systems also have stringent security requirements, and security attacks may impair safety. Given a TSN-based distributed architecture, a set of applications with tasks and messages, as well as a set of security and redundancy requirements, we are interested in synthesizing a system configuration such that the real-time, safety and security requirements are upheld. We use the Timed Efficient Stream Loss-Tolerant Authentication (TESLA) low-resource multicast authentication protocol to guarantee the security requirements, and redundant disjoint message routes to tolerate link failures. We consider that tasks are dispatched using a static cyclic schedule table and that the messages use the time-sensitive traffic class in TSN, which relies on schedule tables (called Gate Control Lists, GCLs) in the network switches. A configuration consists of the schedule tables for tasks as well as the disjoint routes and GCLs for messages. We propose a Constraint Programming-based formulation which can be used to find an optimal solution with respect to our cost function. Additionally, we propose a Simulated Annealing-based metaheuristic, which can find good solutions for large test cases. We evaluate both approaches on several test cases.

1 Introduction

Many modern safety-critical real-time systems are implemented on distributed architectures. They integrate functions with different security and safety requirements over the same deterministic communication network. For example, the network in a modern vehicle has to integrate high-bandwidth video and LIDAR data for Advanced Driver Assistance Systems (ADAS) functions with the highly critical but low bandwidth traffic of e.g. the powertrain functions, but also with the best-effort messages of the low-criticality diagnostic services. See Figure 1 for an example network architecture of a modern vehicle.

Time-Sensitive Networking (TSN) [8021tsn], which is becoming the standard for communication in several application areas, from automotive to industrial control, comprises a set of amendments and additions to the IEEE 802.1 standard, equipping Ethernet with the capabilities to handle real-time mixed-criticality traffic with high bandwidth. A TSN network consists of several end-systems that run mixed-criticality applications, interconnected via network switches and physical links. Available traffic types are Time-Triggered (TT) traffic for real-time applications, Audio-Video Bridging (AVB) for communication that requires less stringent bounded latency guarantees, and Best-Effort (BE) traffic for non-critical traffic.

We assume that safety-critical applications are scheduled using static cyclic scheduling and use the TT traffic type with a given Redundancy Level (RL) for communication. We consider that the task-level redundancy is addressed using solutions such as replication [replication], and we instead focus on the safety and security of the communication in TSN. The real-time safety requirements of critical traffic in TSN networks are enforced through offline-computed schedule tables, called Gate Control Lists (GCLs), that specify the sending and forwarding times of all critical frames in the network. Scheduling time-sensitive traffic in TSN is non-trivial (and fundamentally different from e.g. TTEthernet), because TSN does not schedule communication at the level of individual frames as is the case in TTEthernet. Instead, the static schedule tables (GCLs) govern the behavior of entire traffic classes (queues), which may lead to non-deterministic frame transmissions [craciunas_rtns_16].

Figure 1: Example automotive TSN-based CPS with redundant routing

Since link and connector failures in TSN could result in fatal consequences, the network topology uses redundancy, e.g., derived with methods such as [VoicaBahramRedundancy]. In TSN, IEEE 802.1CB Frame Replication and Elimination for Reliability (FRER) enables the transmission of duplicate frames over different (disjoint) routes, implementing merging of frames and discarding of duplicates.

Modern Cyber-Physical Systems are becoming increasingly interconnected with the outside world, opening new attack vectors [industry4.0_security, automotive_security_survey] that may also compromise safety. Therefore, the security aspects should be treated as equally important to the safety aspects. Timed Efficient Stream Loss-Tolerant Authentication (TESLA) [tesla] has been investigated as a low-resource authentication protocol for several networks, such as FlexRay and TTEthernet [security_aware_tte]. Adding security mechanisms such as TESLA after the scheduling stage is oftentimes not possible without breaking real-time constraints, e.g. on end-to-end latency, and degrading the performance of the system [security_aware_tte]. Thus, we consider TESLA and the overhead and constraints it imposes as part of our configuration synthesis problem formulation.

1.1 Related Work

Scheduling for TSN networks is a well-researched problem. It has been solved for a variety of different traffic type combinations (TT, AVB, BE) and device capabilities using methods such as Integer Linear Programming (ILP), Satisfiability Modulo Theories (SMT) or various metaheuristics such as tabu search [craciunas_rtns_16, SernaRTAS18, Frank16, AVBAwareRoutingScheduling].

Routing has also been extensively researched [MulticastRoutingQoS, PacketRoutingSurvey]. The authors in [AFDXRouting] presented an ILP solution to solve the routing problem for safety-critical AFDX networks. In [TamasTTERouting] the authors used a tabu search metaheuristic to solve the combined routing and scheduling problem for TT traffic in TTEthernet. In [QuorumcastRouting] the authors provide a simple set of constraints to solve a general multicast routing problem using constraint programming, which [VoicaBahramRedundancy] builds on to solve a combined topology and route synthesis problem. In [LoadBalancingRouting] the authors use a load-balancing heuristic to distribute the bandwidth usage over the network and achieve smaller latency for critical traffic.

Multiple authors have also looked at the combined routing and scheduling problem. The authors in [SuneRoutingScheduling] and [QBVRouting] showed that they are able to significantly reduce the latency by solving the combined problem with an ILP formulation. In [HeuristicRoutingScheduling] the authors presented a heuristic for a more complex application model that allows multicast streams. They were able to solve problems that were infeasible to solve using ILP or separate routing and scheduling.

Recently, authors have started to present security- and redundancy-aware problem formulations. The authors in [security_aware_tte] provided a security-aware scheduling formulation for TTEthernet using TESLA for authentication. In [SecurityAwareRoutingAndScheduling] the authors solved the combined routing and scheduling problem and considered authentication using block ciphers. The authors in [ReliabilityAwareRorutingScheduling] and [TSNRoutingScheduling], on the other hand, present a routing and scheduling formulation that is redundancy-aware but has no security considerations.

To the best of our knowledge, our work is the first to provide a formulation that is both security- and redundancy-aware.

1.2 Contributions

In this paper, we address TSN-based distributed safety-critical systems and solve the problem of configuration synthesis such that both safety and security aspects are considered. Determining an optimized configuration means deciding on the schedule tables for tasks as well as the disjoint routes and GCLs for messages. Our contributions are the following:

  1. We apply TESLA to TSN networks considering both the timing constraints imposed by TSN and the security constraints imposed by TESLA.

  2. We formulate an optimization problem to determine: (i) the redundant routing of all messages; (ii) the schedule of all messages, encapsulated into Ethernet frames, represented by the GCLs in the network devices, and (iii) the schedule of all related tasks on end-systems.

  3. We extend our Constraint Programming (CP) formulation from [RTSS_WIP] and propose a new Simulated Annealing (SA)-based metaheuristic to tackle large-scale networks that cannot be solved with CP.

  4. We evaluate the impact of adding the security from TESLA on the schedulability of applications, and we evaluate the solution quality and scalability of the Constraint Programming (CP) and Simulated Annealing (SA) optimization approaches.

We introduce the fundamental concepts of TSN in section 2 and of TESLA in section 3. In section 4 we present the model of our system, consisting of the architecture of the network and the applications running on this architecture. Additionally, we present a threat model and show how it is addressed by TESLA with a security model. In section 5 we formulate the problem we are solving using the established models and present an example. In section 6 and section 7 we present the two different optimization approaches, CP and SA. Then, we evaluate these approaches using several test cases in section 8. Section 9 concludes the paper.

2 Time-Sensitive Networking

Figure 2: Simplified TSN switch representation

Time-Sensitive Networking [8021tsn] has arisen out of the need to have more stringent real-time communication capabilities within standard Ethernet networks. Other technologies that offer real-time guarantees for distributed systems are TTEthernet (SAE AS6802 [sae-as6802, steiner11:CRC]), PROFINET, and EtherCAT [4638425]. TSN comprises a set of (sub-)standards and amendments for the IEEE 802.1Q standard, introducing several new mechanisms for Ethernet bridges, extensions to the IEEE 802.3 media access control (MAC) layer, as well as other standards and protocols (e.g., 802.1ASrev).

The fundamental mechanisms that enable deterministic temporal behavior over Ethernet are, on the one hand, the clock synchronization protocol defined in IEEE 802.1ASrev [8021asrev], which provides a common clock reference with bounded deviation for all nodes in the network, and on the other hand, the timed-gate functionality (IEEE 802.1Qbv [8021qbv]) enhancing the transmission selection on egress ports. The timed-gate functionality enables the predictable transmission of communication streams according to the predefined times encoded in so-called Gate-Control Lists (GCLs). A stream in TSN is a communication carrying a certain payload size from a talker (sender) to one or more listeners (receivers), which may or may not have timing requirements. In the case of critical streams, the communication has a defined period and a maximum allowed end-to-end latency.

Other amendments within TSN (cf. [8021tsn]) provide additional mechanisms that can be used either in conjunction with 802.1Qbv or stand-alone. IEEE 802.1CB [802.1CB] enables stream identification, based on e.g., the destination MAC and VLAN-tag fields in the frame, as well as frame replication and elimination for redundant transmission. IEEE 802.1Qbu [8021qbu] enables preemption modes for mixed-criticality traffic, allowing express frames to preempt lower-priority traffic. IEEE 802.1Qci [8021qci] defines frame metering, filtering, and time-based policing mechanisms on a per-stream basis using the stream identification function defined in 802.1CB.

We detail the Time-Aware Shaper (TAS) mechanism defined in IEEE 802.1Qbv [8021qbv] via the simplified representation of a TSN switch in Figure 2. The figure presents a scenario in which communication received on one of two available ingress ports (A and B) will be routed to an egress port C. The switching fabric will determine, based on internal routing tables and stream properties, to which egress port a frame belonging to the respective stream will be routed (in our logical representation, there is only one egress port). Each port will have a priority filter that determines which of the available traffic classes (priorities) of that port the frame will be enqueued in. This selection will be made based on either the priority code point (PCP) contained in the VLAN-tag of 802.1Q frames or the stream gate instance table of 802.1Qci, which can be used to circumvent the traffic class assignment of the PCP code. As opposed to regular 802.1Q bridges, where the transmission selection sends enqueued frames according to their respective priority, in 802.1Qbv bridges there is a Time-Aware Shaper (TAS), also called timed-gate, associated with each traffic class queue and positioned before the transmission selection algorithm. A timed-gate can be either in an open (o) or closed (C) state. When the gate is open, traffic from the respective queue is allowed to be transmitted, while a closed gate will not allow the respective queue to be selected for transmission, even if the queue is not empty. The state of the gates is encoded in a local schedule called Gate-Control List (GCL). Each entry defines a time value and a state (o or C) for each of the queues. Hence, whenever the local clock reaches the specified time, the timed-gates will be changed to the respective open or closed state. If multiple non-empty queues are open at the same time, the transmission selection selects the queue with the highest priority for transmission.
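The gate mechanism described above can be sketched in a few lines of Python. This is only an illustrative model, not the 802.1Qbv state machine: the names `GclEntry`, `open_gates` and `select_queue` are hypothetical, the GCL is assumed to repeat with a fixed cycle, and a higher queue index is assumed to mean a higher priority.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GclEntry:
    start: int               # offset within the GCL cycle at which the entry takes effect
    open_queues: frozenset   # queues whose gate is open (o); all others are closed (C)


def open_gates(gcl, cycle, now):
    """Return the set of queues whose gate is open at local time `now`."""
    t = now % cycle
    entries = sorted(gcl, key=lambda e: e.start)
    current = entries[-1]  # before the first entry, the last entry of the previous cycle applies
    for entry in entries:
        if entry.start <= t:
            current = entry
    return current.open_queues


def select_queue(gcl, cycle, now, queue_lengths):
    """Pick the highest-priority non-empty queue with an open gate, or None."""
    candidates = [q for q in open_gates(gcl, cycle, now) if queue_lengths.get(q, 0) > 0]
    return max(candidates) if candidates else None


# Hypothetical GCL: queue 7 (scheduled traffic) is open in [0, 100), queues 0 and 1 in [100, 200).
EXAMPLE_GCL = [GclEntry(0, frozenset({7})), GclEntry(100, frozenset({0, 1}))]
EXAMPLE_CYCLE = 200
```

With this example GCL, a frame waiting in queue 7 at time 150 is not selected even though it has the highest priority, because its gate is closed; the frame in queue 1 is sent instead.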

The Time-Aware Shaper functionality of 802.1Qbv, together with the synchronization protocol defined in 802.1ASrev, enables a global communication schedule that orchestrates the transmission of frames across the network such that real-time constraints (usually end-to-end latencies) are fulfilled. The global schedule synthesis has been studied in [craciunas_rtns_16, SernaRTAS18, Pop16, Frank16] focusing on enforcing deterministic transmission, temporal isolation, and compositional system design for critical streams with end-to-end latency requirements.

Craciunas et al. [craciunas_rtns_16] define correctness conditions for generating GCL schedules, resulting in a strictly deterministic transmission of frames with jitter. Apart from the technological constraints, e.g., only one frame transmitted on a link at a time, the deterministic behavior over TSN is enforced in [craciunas_rtns_16] through isolation constraints. Since the TAS determines the temporal behavior of entire traffic classes (as opposed to individual frames like in TTEthernet [steiner10]), the queue state always has to be deterministic. Hence, in [craciunas_rtns_16], critical streams are isolated from each other either in the time or space domain by either allowing only one stream to be present in a queue at a time or by isolating streams that are received at the same time in different queues. This condition is called frame/stream isolation in [craciunas_rtns_16]. In [SernaRTAS18], critical streams are allowed to overlap to some degree (determined by a given jitter requirement) in the same queue in the time domain, thus relaxing the strict isolation.

Both approaches enforce that gate states of different scheduled queues are mutually exclusive, i.e., only one gate is open at any time, thus preventing the transmission selection from sending frames based on their assigned traffic class’s priority. By circumventing the priority mechanism through the TAS, it is ensured that no additional delay is produced through streams of higher priorities, thus enforcing a highly deterministic temporal behavior.

3 Timed Efficient Stream Loss-Tolerant Authentication

TESLA provides a resource efficient way to do asymmetric authentication in a multicast setting [tesla]. It is described in detail in [tesla] and [tesla_rfc].

We are considering systems where one end-system wants to send a multicast-signal to multiple receiver end-systems, e.g., periodic sensor data. A message authentication code (MAC), which is appended to each signal, can guarantee authenticity, i.e., that the sender is who he claims to be, and integrity, i.e., that the message has not been altered. The MAC is generated and authenticated by a secret key that all end-systems share (i.e., symmetric authentication). The downside of this approach is that if any of the receiving end-systems is compromised, the attacker would be able to masquerade as the sender by knowing the secret key. In a multicast setting, an asymmetric approach, in which the receivers do not have to trust each other, is preferable.

The traditional asymmetric authentication approach is to use asymmetric cryptography with digital signatures (i.e., private and public keys); however, as stated in [tesla_article], the method is computationally intensive and not well suited for systems with limited resources and strict timing constraints.

TESLA, however, uses an approach where the source of asymmetry is a time-delayed key disclosure [tesla]. While this can be implemented with much less overhead, it requires time synchronization between the network nodes. For TSN, the time synchronization is given through the 802.1ASrev protocol.

Figure 3 visualizes the TESLA protocol. As described in [tesla_article], when using TESLA, time is divided into fixed intervals of equal length. At startup, a one-way chain of self-authenticating keys $K_i$ is generated using a hash function $H$, where $K_i = H(K_{i+1})$. Each key is assigned to one interval. The protocol is bootstrapped by creating this chain and securely distributing $K_0$ to all receivers [security_aware_tte].

Figure 3: TESLA key chain (Adapted from [tesla_article])

Normally in TESLA, as described in [security_aware_tte], when a sender sends a message in the i-th interval, it appends to that message: the interval index $i$, a keyed-MAC computed with the key of that interval, $K_i$, and a previously used key $K_{i-d}$. Thus, a key remains secret for $d$ intervals. When a receiver receives a message $m_i$ in interval $i$, it cannot yet authenticate it. It must wait until a message arrives in interval $i+d$. This message discloses $K_i$, which can be used to verify the MAC of $m_i$ and thus authenticate it. To ensure that $K_i$ itself is valid, we can use any previously validated key. For example, we can check that $H(K_i) = K_{i-1}$, $H(H(K_i)) = K_{i-2}$, etc. This also makes TESLA robust to packet loss, since any lost key can be reconstructed from a later key, and any key can always be checked against $K_0$.
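The key chain and the delayed verification can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the protocol as specified in [tesla_rfc]: SHA-256 stands in for the hash function, the chain key is used directly as the MAC key (TESLA derives a separate MAC key from it), and the helper names `make_chain`, `verify_key` and `mac` are hypothetical.

```python
import hashlib
import hmac


def make_chain(seed: bytes, n: int) -> list:
    """Return keys [K_0, ..., K_n] such that each key is the hash of its successor."""
    keys = [hashlib.sha256(seed).digest()]
    for _ in range(n):
        keys.append(hashlib.sha256(keys[-1]).digest())
    keys.reverse()  # after reversing, keys[i] == H(keys[i + 1])
    return keys


def verify_key(candidate: bytes, index: int, trusted: bytes, trusted_index: int) -> bool:
    """Check a disclosed key against an earlier trusted key by hashing it back along the chain."""
    if index < trusted_index:
        return False
    k = candidate
    for _ in range(index - trusted_index):
        k = hashlib.sha256(k).digest()
    return hmac.compare_digest(k, trusted)


def mac(key: bytes, message: bytes) -> bytes:
    """Keyed MAC over the message (HMAC-SHA-256 is an assumption of this sketch)."""
    return hmac.new(key, message, hashlib.sha256).digest()
```

A receiver that holds only the first key of the chain buffers a message and its MAC, waits for the key of that interval to be disclosed, runs `verify_key` on the disclosed key, and only then recomputes the MAC. Because verification hashes back to any trusted key, a receiver that missed intermediate keys can still validate a later one.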

Due to the deterministic nature of our schedule, we can make some modifications to the basic TESLA protocol without sacrificing security. The first modification is adopted from [security_aware_tte]. Since bandwidth is scarce, we do not release the key with every message/stream. Instead, it will be released once in its own stream with an appropriate redundancy level. The second modification concerns the TESLA disclosure delay $d$, i.e., the number of intervals a key remains secret. This parameter is useful in a non-deterministic setting. Since the arrival time of a stream is uncertain, a high value for $d$ makes it more likely that a stream can be authenticated, at the cost of an increased latency [tesla_rfc]. However, in our case, we know the exact time a stream will be sent and arrive. Thus, we assume that a stream’s keyed-MAC will be generated using the key from the interval in which it arrives at the last receiver. Furthermore, we will release the key in the immediately following interval, minimizing the latency before a stream can be authenticated.

4 System Models

This section presents the architecture and application models, as well as the threat, security and fault models. Our application model is similar to the one used in related work [security_aware_tte], but we have extended it to consider TSN networks and the optimization of redundant routing in conjunction with scheduling.

Description Notation Unit
Header overhead Byte
Maximum transmission unit Byte
TESLA key size Byte
TESLA MAC size Byte
Hyperperiod H
TSN Network Graph
- Nodes
  - End-system
    - Hash computation time
  - Switch
- Links
  - Network link
    - Link speed
Application
- Tuple
- Period
- Communication Depth
- Tasks
  - Execution end-system
  - Worst-case execution time
  - Period
- Streams
  - Source task
  - Destination tasks
  - Size Byte
  - Period
  - Redundancy Level
  - Security Level
  - MAC generation task
  - MAC verification task
Security Application
- Key release task
- Key verification task
  - Key source end-system
- Key stream
Table 1: Notations

4.1 Architecture Model

We model our TSN network as a directed graph consisting of a set of nodes and a set of edges. The nodes of the graph are either end-systems (ESs) or switches (SWs). The edges of the graph represent the network links.

We assume that all of the nodes in the network are TSN-capable, specifically that they support the standards 802.1ASrev [8021asrev] and 802.1Qbv [8021qbv]. Thus we assume the whole network, including the end-systems, to be time-synchronized with a known bounded precision. All nodes use the time-aware shaper mechanism from 802.1Qbv to control the traffic flow.

Each end-system features a real-time operating system with a periodic table-driven task scheduler. Hash computations, which will be necessary for TESLA operations on that end-system, take a known, end-system-specific amount of time.

A network link is a directed edge between two nodes. Since in Ethernet-compliant networks all links are bi-directional and full-duplex, for each link from one node to another there is also a link in the opposite direction. A link is characterized by its link speed.

Figure 4(a) shows a small example architecture with four end-systems, two switches, and full-duplex links.

(a) Example Architecture
(b) Example Application
Figure 4: Example architecture and application models

4.2 Application Model

An application is modeled as a directed, acyclic graph consisting of a set of nodes representing tasks and a set of edges representing data dependencies between tasks.

A task is executed on a certain end-system and has a known worst-case execution time (WCET). A task needs all its incoming streams (incoming edges in the application graph) to arrive before it can be executed. It produces outgoing streams at the end of its execution time. Communication dependencies between tasks that run on the same end-system are usually handled via, e.g., shared memory pools or message queues, where the overhead of reading/writing data is negligible and included in the WCET definition of the respective tasks. Dependencies between tasks on separate end-systems constitute communication requirements and are modeled by streams. A stream in the TSN context is a communication requirement between a sender and one (unicast) or multiple (multicast) receivers. An example application can be seen in Figure 4(b). An application is periodic, and its period is inherited by all its tasks and streams.

A stream originates at a source task and travels to a set of destination tasks (since we consider multicast streams). The stream size is assumed to be smaller than the maximum transmission unit (MTU) of the network. Each stream has a redundancy level, which determines the number of required disjoint redundant routes for the stream to take. For each of these routes we model a sub-stream, and each stream is associated with the set of all its sub-streams. This notation is useful to differentiate the different routes a stream takes through the network, and to make sure those routes do not overlap. A stream also has a binary security level, which determines whether or not it is authenticated using TESLA.

We define the hyperperiod H as the least common multiple of all application periods. We further define one set that contains all tasks and one set that contains all streams (including redundant copies).
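As a small illustration of the hyperperiod definition, the following Python sketch computes the least common multiple of a list of periods. The function name `hyperperiod` is hypothetical, and the periods are assumed to be positive integers expressed in one common time unit.

```python
from functools import reduce
from math import gcd


def hyperperiod(periods):
    """Least common multiple of all application periods (positive integers, same unit)."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods, 1)


def instances_per_hyperperiod(periods):
    """How often each application is released within one hyperperiod."""
    h = hyperperiod(periods)
    return [h // p for p in periods]
```

For example, applications with periods 300, 400 and 1000 share a hyperperiod of 6000, within which they are released 20, 15 and 6 times, respectively. A schedule table only has to cover one hyperperiod, after which it repeats.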

4.3 Fault Model

Reliability models discussed in [VoicaBahramRedundancy] (e.g., Siemens SN 29500) indicate that the most common type of permanent hardware failures is due to link failures (especially physical connectors) and that ESs and SWs are less likely to fail. These models are complementary to Mean Time to Failure (MTTF) targets established for various safety integrity levels in certification standards such as ISO 26262 for automotive [VoicaBahramRedundancy]. As mentioned, we assume we know the required redundancy level to protect against permanent link failures. Our disjoint routing can guarantee the transmission of a stream with redundancy level n + 1 despite any n link failures. For example, for the routing of the stream with RL 2 in (a), any 1-link failure would still result in a successful transmission.
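The fault-tolerance argument can be checked mechanically on a small example. The sketch below is an illustration under assumptions: each redundant route is represented as a set of directed links, a route counts as usable only if none of its links failed (a conservative view for multicast trees), and the function names and the two example routes are hypothetical.

```python
from itertools import combinations


def are_link_disjoint(routes):
    """True if no two redundant routes share a (directed) link."""
    return all(a.isdisjoint(b) for a, b in combinations(routes, 2))


def tolerates_failures(routes, n):
    """True if, for every possible set of n failed links, at least one route is untouched."""
    all_links = set().union(*routes)
    return all(
        any(route.isdisjoint(failed) for route in routes)
        for failed in combinations(all_links, n)
    )


# Two hypothetical redundant routes of one multicast stream from ES2 to ES3 and ES4.
ROUTE_VIA_SW1 = {("ES2", "SW1"), ("SW1", "ES3"), ("SW1", "ES4")}
ROUTE_VIA_SW2 = {("ES2", "SW2"), ("SW2", "ES3"), ("SW2", "ES4")}
```

With two disjoint routes, every single link failure leaves one route intact, whereas two failures (one on each route) can interrupt the stream, which matches the relation between redundancy level and tolerated failures stated above.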

4.4 Threat Model

We use a similar threat model as [security_aware_tte] and assume that an attacker is capable of gaining access to some end-systems of our system, e.g., through an external gateway or physical access.

We consider that the attackers have the following abilities:

  • They know about the network schedule and the content of the streams on the network;

  • They can replay streams sent by other ES;

  • They can attempt to masquerade as other ES by faking the source address of streams they send;

  • They have access to all keys released and received by the ES they control.

4.5 Security Model

We use TESLA to address the threats identified in the previous section, which means that additional security-related models are required. These additional applications, tasks and streams can be automatically generated from a given architecture and application model.

First off, we need to generate, send, and verify a key in each interval for each set of communicating end-systems. We generate a key authentication application for each sender end-system, which is modeled similarly to a normal application as a directed acyclic graph. The period is equal to the TESLA interval duration (see section 3) and is again inherited by tasks and streams. Each of these applications consists of one key release task scheduled on the sending end-system. Additionally, it consists of key verification tasks on each end-system that receives a stream from the sender. The release task sends a multicast key-stream to each of those verification tasks. The redundancy level of a key-stream is set to the maximum redundancy level of all streams emitted by the sender. The size of a key stream is equal to the key size specified by the TESLA implementation. The security model for our example from Figure 4 can be seen in Figure 5.

For each key verification task, we record the end-system whose key the task is verifying. Its execution time is equal to the length of one hash execution on its execution end-system. A key release task’s execution time is very short, since the key it releases has already been generated during bootstrapping. We model it to last half the time of a hash execution.

Secondly, we need to append MACs to all non-key streams that are authenticated using TESLA. Thus, their length increases by the MAC length specified by the TESLA implementation. For each such stream, a MAC generation task is added to the sender and a MAC validation task to each receiver. Those tasks take the time of one MAC computation on the processing element to execute.

We define the set to contain all key release tasks and to contain all key verification tasks for a given node . Furthermore let contain all key streams.
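The automatic generation of key streams can be sketched as follows. This is a minimal illustration, not the authors' tooling: the `Stream` record and the `key_streams` function are hypothetical, only TESLA-secured streams are assumed to require a key stream, and the example data (sender, receivers and redundancy levels of two streams) is an assumed reading of the running example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    name: str
    sender: str              # end-system of the source task
    receivers: frozenset     # end-systems of the destination tasks
    redundancy_level: int
    secured: bool            # True if the stream is authenticated using TESLA


def key_streams(streams, key_size):
    """Derive one multicast key stream per sending end-system."""
    result = {}
    for s in streams:
        if not s.secured:
            continue
        entry = result.setdefault(
            s.sender,
            {"receivers": set(), "redundancy_level": 0, "size": key_size},
        )
        entry["receivers"] |= s.receivers
        # The key stream inherits the highest redundancy level among the sender's streams.
        entry["redundancy_level"] = max(entry["redundancy_level"], s.redundancy_level)
    return result


# Assumed example data: two secured streams sent from two different end-systems.
EXAMPLE_STREAMS = [
    Stream("s1", "ES1", frozenset({"ES3"}), 1, True),
    Stream("s2", "ES2", frozenset({"ES3", "ES4"}), 2, True),
]
```

For this input, two key streams result, one per sender, and the key stream of the second sender is multicast to both receivers with redundancy level 2.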

(a) shows key release and verification tasks in orange and MAC generation and validation tasks in red.

Figure 5 shows the security applications for our example.

Figure 5: Example security model for the applications in Figure 4

5 Problem Formulation

Given a set of applications running on TSN-capable end-systems that are interconnected in a TSN network as described in the architecture, application, and security models in section 4, we want to determine a system configuration consisting of:

  • an interval duration for TESLA operations,

  • the routing of streams,

  • the task schedule,

  • the network schedule as 802.1Qbv Gate-Control Lists,

such that:

  • all deadline requirements of all applications are satisfied.

  • the redundancy requirements of all streams and the security conditions of TESLA are fulfilled.

  • the overall latency of applications is minimized.

5.1 Motivational Example

We illustrate the problem using the architecture and application from Figure 4. We have one application (b) with 4 tasks, 2 streams and a period and deadline of 1000 . The tasks are mapped to the end-systems as indicated in the figure. Stream will be multicast. The size of both streams is 50 B. For TESLA’s security requirements, i.e. , we generate two additional security applications (Figure 5).

We have a TSN network with a link speed of 10 Mbit/s and zero propagation delay. Our TESLA implementation uses keys that are 16 B and MACs that are 16 B. A hash computation takes 10  on every ES.
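A quick back-of-the-envelope calculation shows how the TESLA MAC lengthens a frame on the wire. The sketch below uses only the payload, MAC, key size and link speed quoted above; it ignores the Ethernet header overhead, whose value is not stated in this section, and the function name and its `overhead_bytes` parameter are hypothetical.

```python
def transmission_time_us(payload_bytes, link_speed_mbit_s, overhead_bytes=0):
    """Time in microseconds to put a frame on a link (1 Mbit/s equals 1 bit per microsecond)."""
    return (payload_bytes + overhead_bytes) * 8 / link_speed_mbit_s


plain = transmission_time_us(50, 10)         # 50 B stream without a MAC
secured = transmission_time_us(50 + 16, 10)  # same stream with a 16 B MAC appended
key = transmission_time_us(16, 10)           # 16 B key stream
```

Under these assumptions, the 50 B stream occupies a 10 Mbit/s link for 40 µs per hop, the secured stream for 52.8 µs, and each key stream adds another 12.8 µs per hop, before any header overhead is counted.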

(a) Schedule without security & redundancy
(b) Schedule with security & redundancy
Figure 6: Example solution schedules for the models in Figure 4

A solution that does not consider the security and redundancy requirements is shown in (a). With the TSN stream isolation constraint outlined in section 2 taken into consideration, the GCLs are equivalent to frame schedules. We depict in (a) the GCLs as a Gantt chart, where the red rectangles show the transmission of streams and on network links, and the blue rectangles show the tasks’ execution on the respective end-systems. To guarantee deterministic message transmission in TSN, we have to isolate the frames in the time (or space) domain, leading to the delay of and thus . We refer the reader to [craciunas_rtns_16] for an in-depth discussion on the non-determinism problem and isolation solution in TSN.

In this paper, we are interested in solutions such as the one in (b), which considers both the redundancy and security requirements. The black dashed line in the figure separates the TESLA key release intervals, where was determined to be 500 . Streams carrying keys are orange, key generation tasks pink, key verification tasks green, and the MAC generation/validation operations on ESs are shown in red. The routing of the non-key streams can be seen in (a). Note how the two redundant copies of , and use non-overlapping paths.

Of particular importance is the delay incurred by the time-delayed release of keys: tasks and can only be executed after the keys authenticating and have arrived in the second interval, and after key verification and MAC validation tasks have been run.

Scheduling problems like the one addressed in this paper are NP-hard, as the Bin-Packing problem can be reduced to them [8607243], and may be intractable for large input sizes. In the following sections, we will propose a Constraint Programming (CP) formulation to solve the problem optimally for small test cases, and a heuristic to solve the problem for large test cases.

6 Constraint Programming Formulation

Constraint Programming (CP) is a technique to solve combinatorial problems defined using sets of constraints on decision variables. For large scheduling problems it becomes intractable to use CP due to the exponential increase in the size of the solution space [cp_handbook]. In order to achieve reasonable runtime performance, we split the problem into sub-problems which we solve sequentially: (i) finding a route for all streams, (ii) finding the TESLA interval duration, and (iii) finding the network and task schedule.

6.1 Optimizing redundant routing

The first step of solving the proposed problem is to find a set of (partially) disjoint routes for each stream, depending on the stream’s redundancy level. The constraints in this section are inspired by [VoicaBahramRedundancy] and [QuorumcastRouting].

We model the stream routes with an integer matrix X, where the columns represent streams (including their redundant copies) and the rows represent nodes of the network. An entry at the position of a stream and a node that refers to a second node represents a link from the first node to the second on the route of that stream. Alternatively, the entry could be nil, in which case the node is not part of the route.

X     s1    s2_0   s2_1
ES1   ES1   nil    nil
ES2   nil   ES2    ES2
ES3   SW1   SW1    SW2
ES4   nil   SW1    SW2
SW1   ES1   ES2    nil
SW2   nil   nil    ES2
Table 2: Matrix X for example from subsection 5.1

Using the matrix X, we can construct the route for each stream bottom-up as a tree, by starting at the receiver nodes. See Table 2 for the matrix of our example.
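As an illustration of the bottom-up construction, the following Python sketch follows the successor entries of Table 2 from a receiver up to the sender. The dictionary layout and the function name `route_to_sender` are assumptions made for this example and are not part of the formulation.

```python
# Successor matrix X from Table 2: X[stream][node] is the successor of
# `node` on the way to the sender, or None (nil) if the node is unused.
X = {
    "s1":   {"ES1": "ES1", "ES2": None,  "ES3": "SW1", "ES4": None,  "SW1": "ES1", "SW2": None},
    "s2_0": {"ES1": None,  "ES2": "ES2", "ES3": "SW1", "ES4": "SW1", "SW1": "ES2", "SW2": None},
    "s2_1": {"ES1": None,  "ES2": "ES2", "ES3": "SW2", "ES4": "SW2", "SW1": None,  "SW2": "ES2"},
}


def route_to_sender(x, stream, receiver):
    """Follow successors from `receiver` until the sender (its own successor)."""
    path = [receiver]
    node = receiver
    while x[stream][node] != node:
        node = x[stream][node]
        # A nil entry or a revisited node means there is no valid route.
        if node is None or node in path:
            raise ValueError("receiver is not connected to the sender")
        path.append(node)
    return path


print(route_to_sender(X, "s1", "ES3"))    # ['ES3', 'SW1', 'ES1']
print(route_to_sender(X, "s2_1", "ES4"))  # ['ES4', 'SW2', 'ES2']
```

Reversing the returned list gives the route in transmission order, from sender to receiver.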

To determine the route for each stream , for each node we have the following optimization variables:

  • represents an entry of our matrix X. The domain of is defined as: . We refer to as the successor of on the path to the stream sender node.

  • represents the length of the path from to , i.e. the length of the path from node to the sender node of the stream.

Furthermore, we define a few helper variables and functions. First off, we define as the set of all distinct streams, i.e., excluding the redundant copies of streams with redundancy level (RL) greater than one. Additionally we define as the set of all redundant copies (including the stream itself) of . Then we define the following helper function:

(1)

This function allows us, for any given , to determine the number of redundant copies (including itself) that use the link from to (nil is counted as zero).

Then we have the following constraint optimization problem:

(RC1)

where

(RC2)

s.t.

(R1)
(R2)
(R3.1)
(R3.2)
(R3.3)
(R4)
(R5)
(R6)

Please note that == and != are boolean expressions that evaluate to 1 if true and to 0 otherwise.

The cost function we are minimizing ((RC1),(RC2)) measures the length of the route of each stream. (Footnote 1: For some use cases, fully disjoint routes are not necessary. Refer to Appendix A for an updated formulation for this case.)

The constraint (R1) prevents cycles in the route, as defined in [QuorumcastRouting]. The constraint (R2) disallows “loose ends”, i.e., a node that has a successor/predecessor must have a predecessor/successor itself. Please note that we refer to the successor on the path from receiver to sender, i.e., the predecessor on the route. The constraint (R3.1) states that all receivers of a stream have to have a successor. The constraints (R3.2), (R3.3), and (R4) impose that the sender of the stream has itself as the successor, no other end-system has a successor, and the path length is 0 at the sender node, respectively. The constraint (R5) restricts the bandwidth usage of each link to be under . If multiple copies of the same stream use the same link, only one of them is counted as consuming bandwidth since we assume that streams are intelligently split and merged using IEEE 802.1CB. The constraint (R6) forbids the routes of redundant copies of a stream to overlap at any point.
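The link-usage helper and the disjointness requirement can be checked on the example matrix with a short Python sketch. It assumes that overlap is evaluated per directed link and that the sender's self-loop does not count as a link; the function names are hypothetical, and the exact constraint expressions of the formulation are not reproduced here.

```python
from itertools import combinations

# Successor entries of the two redundant copies of s2 (Table 2); None is nil.
X = {
    "s2_0": {"ES2": "ES2", "ES3": "SW1", "ES4": "SW1", "SW1": "ES2", "SW2": None},
    "s2_1": {"ES2": "ES2", "ES3": "SW2", "ES4": "SW2", "SW1": None, "SW2": "ES2"},
}


def links(x, stream):
    """Directed links (node, successor) of a stream, skipping nil and the sender self-loop."""
    return {(n, succ) for n, succ in x[stream].items() if succ is not None and succ != n}


def copies_on_link(x, copies, link):
    """Number of redundant copies whose route uses `link`."""
    return sum(link in links(x, s) for s in copies)


def routes_disjoint(x, copies):
    """True if no two redundant copies share a link (assumed reading of (R6))."""
    return all(links(x, a).isdisjoint(links(x, b)) for a, b in combinations(copies, 2))


copies = ["s2_0", "s2_1"]
print(copies_on_link(X, copies, ("SW1", "ES2")))  # 1
print(routes_disjoint(X, copies))                 # True
```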

6.2 Optimizing the TESLA key disclosure interval

To set up the TESLA protocol, we need to choose the duration of one key disclosure interval. This parameter has a big influence on the latency of secure streams and thus on the feasibility and quality of the schedule.

When choosing this duration, there is a trade-off between overhead and latency. A short interval reduces the latency of secure streams but necessitates more key generation/verification tasks and key streams. Thus, we want to determine the maximum interval duration for which the latency is still within all deadline bounds. To this end, we formulate constraints inspired by [security_aware_tte], for which we then determine the optimal solution. The resulting value is used as a constant in the subsequent optimization of the schedule.

We introduce a new notation: for each application, we define the communication depth, i.e., the length of the longest path in the application graph where only edges with associated secure streams are counted (ES-internal dependencies and non-secure streams are ignored). This gives us a measure of the longest chain of secure communications within the application, which we can use to estimate the number of necessary TESLA intervals.

Then we have the following formulation:

(P0)

s.t.

(P1)
(P2)
(P3)

The constraint (P1) guarantees that is small enough to accommodate the authentication of all secure streams for all applications. The communication depth of an application gives a lower bound of how many TESLA intervals are necessary to accommodate all these streams within the period of the application, since there have to be intervals to accommodate the authentication of secure streams.

The purpose of the constraints (P2) and (P3) is to align the TESLA intervals with the schedule. The constraint (P2) makes the interval duration a divisor of the hyperperiod, while constraint (P3) makes it either a multiple or a divisor of the greatest common divisor of all application periods.
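The selection of the interval duration can be sketched by brute-force enumeration in Python. The alignment rules follow the description of (P2) and (P3) above. The bound used for (P1) is an assumption made for this sketch only: an application with communication depth d is taken to need d + 1 intervals within its period. The function names are hypothetical.

```python
from functools import reduce
from math import gcd


def candidate_intervals(periods):
    """Durations that divide the hyperperiod and are a multiple or divisor of the gcd of all periods."""
    hyperperiod = reduce(lambda a, b: a * b // gcd(a, b), periods)
    g = reduce(gcd, periods)
    return [
        t for t in range(1, hyperperiod + 1)
        if hyperperiod % t == 0 and (g % t == 0 or t % g == 0)
    ]


def largest_interval(apps):
    """apps: list of (period, communication_depth) pairs.

    ASSUMPTION: (P1) is approximated as (depth + 1) * interval <= period.
    """
    periods = [period for period, _ in apps]
    feasible = [
        t for t in candidate_intervals(periods)
        if all((depth + 1) * t <= period for period, depth in apps)
    ]
    return max(feasible) if feasible else None


# Two applications with periods 1000 and 500 (same time unit), each with depth 1.
print(largest_interval([(1000, 1), (500, 1)]))  # 250
```

Enumerating every integer up to the hyperperiod is only practical for small values; a CP solver would instead treat the duration as a decision variable to maximize.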

6.3 Optimizing scheduling

In this step, we want to find a schedule for all tasks and streams which minimizes the overall latency of streams while fulfilling all constraints imposed by deadlines, TESLA, and TSN. The routes for each stream and the TESLA interval duration are given by the previous optimization steps and assumed constant here.

We define the following integer optimization variables:

  • as the offset of stream on link or node

  • as the transmission duration of stream on link or node

  • as the end-time of stream on link or node

  • as the index of the earliest interval where stream can be authenticated on any receiver

  • as the offset of task (on node )

  • as the end-time of task (on node )

As an example, let us assume a hyperperiod of 1000us and a stream with a period of 500us. An offset of 100us and an end-time of 150us on a link would imply that the stream is scheduled on that link in the following time intervals: (100, 150) and (600, 650).
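The periodic repetition in this example can be reproduced with a small Python helper. The function name is hypothetical, and the sketch assumes that the period divides the hyperperiod.

```python
def periodic_instances(offset, end, period, hyperperiod):
    """Expand one (offset, end) window into all its repetitions within the hyperperiod."""
    return [(offset + k * period, end + k * period) for k in range(hyperperiod // period)]


print(periodic_instances(100, 150, 500, 1000))  # [(100, 150), (600, 650)]
```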

Furthermore we define several helper variables. Let be the set containing all receiver end-systems of stream :

Let be the set containing all links on the route of stream as well as sender and receiver nodes:

(2)

Using these helper functions we define the following constraint-optimization problem for the task and network scheduling step:

(CS1)

where

(CS2)

s.t.

(S1)
(S2.1)
(S2.2)
(S3.1)
(S3.2)
(S4.1)
(S4.2)

The constraint (S1) sets the deadline for the completion of an application to its period. The constraints (S2.1) and (S2.2) set all optimization variables to zero for every stream, for all nodes and links not part of its route. For all other links and nodes, constraints (S3.1) and (S3.2) set the end-time to be the sum of offset and length. For each link on the route of a stream, constraint (S4.1) sets the length to be the byte-size of the stream divided by the link-speed. In constraint (S4.2), the length of secure streams on end-systems is set to the length of one hash-computation on that end-system, approximating the duration of MAC generation/verification.

(S5)
(S6)
(S7.1)
(S7.2)
(S7.3)

In constraint (S5), the earliest authentication interval for a stream is bound to be after the latest interval in which the stream is transmitted. In constraint (S6), the start time of the stream on any receiver end-system is then bound to be greater than or equal to the start time of that interval plus the end-time of the necessary preceding key verification task. The constraints (S7.1), (S7.2) and (S7.3) make sure that every stream is scheduled consecutively along its route. Here, constraint (S7.1) enforces the precedence between two consecutive links, (S7.2) between the MAC generation on the sender and the first link, and (S7.3) between the last link and the following MAC verification.

(S8)
(S9)

The constraint (S8) prevents any streams from overlapping on any nodes or links. Furthermore, constraint (S9) guarantees that for each link connected to an output port of a switch, the frames arriving on all input ports of that switch that want to use this output port cannot overlap in the time domain. This is the frame isolation necessary for determinism in our TSN configuration, which is further explained in [craciunas_rtns_16].
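A non-overlap requirement of this kind can be checked for two periodic transmissions on the same link by comparing all of their windows within the hyperperiod. The Python sketch below is a plain pairwise check, not the CP constraint itself; the tuple layout (offset, length, period) and the function names are assumptions for illustration.

```python
def windows(offset, length, period, hyperperiod):
    """All (start, end) windows of a periodic transmission within the hyperperiod."""
    return [(offset + k * period, offset + k * period + length)
            for k in range(hyperperiod // period)]


def overlap(a, b, hyperperiod):
    """a, b: (offset, length, period) tuples. True if any two windows intersect."""
    for s1, e1 in windows(*a, hyperperiod):
        for s2, e2 in windows(*b, hyperperiod):
            # Half-open intervals: touching end-to-start does not count as overlap.
            if s1 < e2 and s2 < e1:
                return True
    return False


print(overlap((100, 50, 500), (620, 50, 1000), 1000))  # True
print(overlap((100, 50, 500), (150, 50, 1000), 1000))  # False
```

The first call reports an overlap because the second repetition of the first stream occupies (600, 650), which intersects (620, 670).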

(T1)
(T2.1)
(T2.2)
(T3.1)
(T3.2)
(T4)
(T5)

The constraint (T1) sets the end-time of a task to be the sum of offset and length. The constraints (T2.1) and (T2.2) model the dependency between a task and all its outgoing streams: such streams may only start after the task has finished. Similarly, constraints (T3.1) and (T3.2) model the dependency between a task and its incoming streams: such a task may only start after all incoming streams have arrived. Finally, constraint (T4) prevents any two tasks from overlapping, while constraint (T5) prevents a task from overlapping with a MAC generation/verification operation.

7 Metaheuristic Formulation

As mentioned in section 6, the scheduling problem addressed in this paper is NP-hard. As a consequence, a pure CP formulation solved using a CP solver is not tractable for large problem sizes. Hence, in this section, we propose a metaheuristic-based strategy, which aims to find good solutions (without the guarantee of optimality) in a reasonable time, even for large test cases.

An overview of our strategy is presented in algorithm 1. We use a Simulated Annealing (SA) metaheuristic [SimulatedAnnealing] to find solutions, each consisting of a set of routes and a schedule. As an input, we provide our architecture model and the application model. SA randomly explores the solution space in each iteration by generating “neighbors” of the current solution using design transformations (or “moves”). We consider both routing and scheduling-related moves, and the choice is controlled by a parameter that gives the probability of a routing move. To measure the quality of a solution, we use a cost function with two parameters, which are factors for punishing overlap of redundant streams and missed deadlines for applications, respectively. While we always accept better solutions, the central idea of Simulated Annealing is to also accept worse solutions with a certain probability in order to not get stuck in local optima [burke2005search].

Algorithm 1 shows the main loop of the heuristic. We start out with an initial solution, a cost value, and a positive temperature (lines 2-4). Then, we repeat the steps described below until a stopping criterion, such as a time or iteration limit, is met. We create a slight permutation of the current solution by using the function RandomNeighbour (line 6). We calculate the cost of the new solution (line 7) and the delta between the new and old cost (line 8). If the delta is smaller than 0, i.e., if the new solution is better than the current one, we choose it as the current solution (lines 10-12). Alternatively, the new solution is also accepted if a randomly chosen value between 0 and 1 is smaller than the value of the acceptance probability function. This acceptance probability decreases with the temperature over time and is also influenced by the delta, which gives a measure of how much worse the new solution is. Finally, since we will occasionally accept worse solutions, we keep track of the best cost achieved overall and adjust it if necessary (lines 12-14).

1 Function 
2       = = InitialSolution(, );
3       = c = Cost(, , );
4       t = ;
5       while stopping-criterion not True do
6             = RandomNeighbour();
7             = Cost(, , );
8             = ;
9             if  or random[0,1)  then
10                   = ;
11                   = ;
12                   if  then
13                         = ;
14                         =
15                  
16             t = ;
17            
18       end while
19      return ;
20      
21
Algorithm 1 Simulated Annealing Metaheuristic
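The loop structure of a Simulated Annealing search can be illustrated with a generic Python sketch on a toy problem. The acceptance function exp(-delta / t) and the geometric cooling t = alpha * t are common textbook choices and are assumptions here, as are all parameter names and default values; the toy cost and neighbour functions stand in for the routing and scheduling moves.

```python
import math
import random


def simulated_annealing(initial, cost, neighbour, t_start=10.0, alpha=0.95,
                        iterations=1000, rng=None):
    """Minimize `cost`, returning the best solution seen and its cost."""
    rng = rng or random.Random(0)
    current = best = initial
    c_current = c_best = cost(initial)
    t = t_start
    for _ in range(iterations):  # stopping criterion: iteration limit
        candidate = neighbour(current, rng)
        c_new = cost(candidate)
        delta = c_new - c_current
        # Always accept improvements; accept worse candidates with
        # probability exp(-delta / t) (assumed acceptance function).
        if delta < 0 or rng.random() < math.exp(-delta / t):
            current, c_current = candidate, c_new
            if c_current < c_best:
                best, c_best = current, c_current
        t *= alpha  # assumed geometric cooling schedule
    return best, c_best


# Toy problem: find the integer x that minimizes (x - 3)^2, starting far away.
toy_cost = lambda x: (x - 3) ** 2
toy_neighbour = lambda x, rng: x + rng.choice([-1, 1])
best, best_cost = simulated_annealing(20, toy_cost, toy_neighbour)
print(best, best_cost)
```

Tracking the best solution separately from the current one matters because the search is allowed to move to worse solutions.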

7.1 Precedence graph

Figure 7: Example precedence graph with associated order

We introduce a helper data structure in the form of a precedence graph. A precedence graph is a collection of special DAGs, one for each application. These DAGs are expanded versions of the DAGs from the application model. Here, streams are modeled as nodes instead of edges, and each redundant copy of a stream has its own node. See Figure 7 for an example. This data structure helps to model all the dependencies between tasks and streams in the scheduling algorithm. Additionally, we will use the set of all topological orders of this graph as our solution space for the scheduling step. An order can be seen as a scheduling priority assignment that respects all precedence constraints.
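A precedence graph of this kind and one of its topological orders can be sketched in Python with the standard-library `graphlib` module (Python 3.9 or later). The application below, with tasks `t1` and `t2` and a stream `s1` sent in two redundant copies, is a hypothetical example and not the one shown in Figure 7.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each node maps to the set of nodes that must be scheduled before it.
# Streams are nodes, and each redundant copy has its own node.
predecessors = {
    "t1": set(),
    "s1_0": {"t1"},
    "s1_1": {"t1"},
    "t2": {"s1_0", "s1_1"},
}

# One valid scheduling order; other topological orders may exist.
order = list(TopologicalSorter(predecessors).static_order())


def respects_precedence(order, predecessors):
    """True if every node appears after all of its predecessors."""
    position = {node: i for i, node in enumerate(order)}
    return all(position[p] < position[n]
               for n, preds in predecessors.items() for p in preds)


print(order)
print(respects_precedence(order, predecessors))  # True
```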

7.2 Initial solution

1 Function InitialSolution()
       // routing
2       foreach  do
3             foreach  do
4                   if IsFirstCopyOfStream() then
5                         = ShortestPaths();
6                        
7                   else
8                         = ShortestPathsWeighted();
9                        
10                   = ShortestPath() ;
11                  
12             end foreach
13            
14       end foreach
      // schedule
15       P = CreatePrecedenceGraph();
16       foreach  do
17             = TopologicalOrder();
18            
19       end foreach
20      foreach  do
21             = TopologicalOrder();
22            
23       end foreach
24      ;
25       = Schedule();
26       return ;
27      
28
Algorithm 2 InitialSolution

In the beginning, we create an initial solution from the given architecture and application model. A solution is a tuple consisting of a set of routes and a schedule. Algorithm 2 details the function to find the initial solution.

To find an initial set of routes, we iterate through all streams and all pairs of sender and receiver ES (lines 2-3). For each such pair, we calculate and store the k shortest paths for the given topology (line 5). For each redundant copy of a stream beyond the first, we calculate the shortest paths in a weighted graph, where all links used by previous copies receive a penalty weight instead of 1 (line 7). For the initial solution, we choose the shortest path for each pair (line 8). Note that our k-shortest-path algorithm only generates paths without repeated nodes that do not traverse any end-system.
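The idea of steering redundant copies away from already used links can be sketched with a plain Dijkstra search in Python. This is a simplification: it computes a single shortest path per copy rather than k shortest paths, it does not exclude paths through intermediate end-systems, and the penalty value of 10 as well as all names are assumptions chosen for the example.

```python
import heapq

# Hypothetical topology: two end-systems, each connected to both switches.
GRAPH = {
    "ES1": {"SW1", "SW2"},
    "ES2": {"SW1", "SW2"},
    "SW1": {"ES1", "ES2"},
    "SW2": {"ES1", "ES2"},
}


def shortest_path(graph, src, dst, weight=None):
    """Dijkstra on an undirected graph; links missing from `weight` cost 1."""
    weight = weight or {}
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt in sorted(graph[node]):
            if nxt not in visited:
                step = weight.get(frozenset((node, nxt)), 1)
                heapq.heappush(queue, (dist + step, nxt, path + [nxt]))
    return None


def redundant_routes(graph, src, dst, copies, penalty=10):
    """Route each copy in turn; links used by earlier copies get the penalty weight."""
    weight, routes = {}, []
    for _ in range(copies):
        path = shortest_path(graph, src, dst, weight)
        if path is None:
            raise ValueError("no route between sender and receiver")
        routes.append(path)
        for u, v in zip(path, path[1:]):
            weight[frozenset((u, v))] = penalty
    return routes


print(redundant_routes(GRAPH, "ES1", "ES2", 2))
# [['ES1', 'SW1', 'ES2'], ['ES1', 'SW2', 'ES2']]
```

Because the penalty only discourages reuse, overlap is still possible when no disjoint alternative exists, which is why the cost function punishes any overlap that remains.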

To find an initial schedule, we have to create the precedence graph (line 11) and decide an order of this graph.

For the initial solution, we construct an order on the level of applications, i.e., we avoid interleaving nodes of different applications. We prioritize key applications (lines 12-14) before other (normal) applications (lines 15-17). This order is consequently used to create a schedule (line 19). See Figure 7 for an example order.

7.3 Neighbourhood function

1 Function RandomNeighbour()
2       p = ;
3       if  then
4             s = RandomStream();
5             = RandomReceiver(s);
6             = RandomPath();
7            
8      else
9             = RandomNormalApplication();
10             = RandomNormalApplication();
11             = SwitchSchedulingOrder(, , );
12             = Schedule();
13             = OptimizeLatency(, );
14            
15       end if
16      return ;
17      
18
Algorithm 3 RandomNeighbour

The neighbourhood function is detailed in algorithm 3. It is used during Simulated Annealing to create a slight permutation of a given candidate solution. It contains two fundamental moves: changing the routing or changing the schedule. Which move is taken is decided randomly (line 3). A parameter determines how likely it is that the routing move is taken, e.g., a value of 0.5 would result in a probability of 50%.

A routing move consists of choosing a random stream out of the set of all streams (line 4), choosing a random receiver out of all receivers of that stream (line 5), and then assigning a random path out of the set of k-shortest-paths calculated during the creation of the initial solution (line 6).

A scheduling move consists of choosing two random normal (non-key) applications (lines 9-10), switching their order in the precedence graph (line 11), and recalculating the schedule (line 12). Whenever a new schedule is calculated, we also optimize its latency (line 13). This is further explained in subsection 7.6.

7.4 Cost function

The cost function is used in the simulated annealing metaheuristic to evaluate the quality of a solution. A lower cost means a better solution. Algorithm 4 shows how our cost function is calculated. It consists of two components: a routing cost and a schedule cost. The routing cost is the sum of the number of overlaps of redundant streams (one for each stream for each link), which is punished with a penalty factor, and the total accrued length of all routes. The schedule cost is the sum of the number of infeasible applications, which is punished with a second penalty factor, and the total sum of all application latencies (distance between the start-time of the first task and the end-time of the last task). Both factors should be sufficiently high such that solutions with less overlap and fewer infeasible applications are preferred.

1 Function Cost(, , )
2       = * Overlaps() + Length();
3       = * Infeasible() + Latency();
4       return ;
5      
6
Algorithm 4 Cost
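The structure of such a penalty-based cost can be sketched in Python as follows. The sketch assumes that the two components are simply added and takes the already counted quantities as inputs; the parameter names `a` and `b` for the two penalty factors and the numeric values are illustrative assumptions.

```python
def solution_cost(overlaps, route_length, infeasible_apps, total_latency, a, b):
    """Penalty-weighted cost; lower is better."""
    routing_cost = a * overlaps + route_length
    schedule_cost = b * infeasible_apps + total_latency
    return routing_cost + schedule_cost  # assumed combination: plain sum


# A longer but overlap-free routing is preferred over a shorter one with overlap.
disjoint = solution_cost(overlaps=0, route_length=14, infeasible_apps=0,
                         total_latency=900, a=1000, b=10000)
shared = solution_cost(overlaps=1, route_length=12, infeasible_apps=0,
                       total_latency=900, a=1000, b=10000)
print(disjoint, shared)  # 914 1912
```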

7.5 ASAP list scheduling

To calculate a schedule for a given precedence graph with associated order and routing, we use an ASAP list-scheduling heuristic [SinnenTaskSchedulingForParallelSystems], which schedules each node of the precedence graph in the given order.

The algorithm, presented in algorithm 5, starts by iterating through each entry of the given order (line 2). An entry may either be a task or a stream. For each entry, we determine where it will be scheduled and create an indexable list with all these locations (line 3). For a task, that list would contain just one end-system, while for a stream, it may contain many links (each of which is synonymous with an output port of a switch/ES) and also multiple end-systems if the stream is secure, thus requiring MAC generation/verification.

Using these locations we also create a set of blocks (algorithm 5). A block is a tuple which is associated to an entry (task/stream) and a location (node/link). represents the block offset. and are parameters representing a lower and upper bound on the offset, which are used during the algorithm. The set is implemented as a linked list, where and are references to neighboring blocks on the route . Note that in the case of multicast streams could contain references to multiple blocks.

1 Function Schedule()
2       foreach  do
3             L = GetRoute(n, );
4             B = CreateBlocks(n, L);
5             l = L[0];
6             i = 0;
7             while true do
8                   b = B[l];
9                   = CalculateLowerBound(n, b, , );
10                   = EarliestOffset(b, l);
11                   if  ==  then
12                         return false;
13                        
14                   else if  then
15                         = ;
16                         foreach  do
17                               if IsBlockOnLink(g) then
18                                     = LatestQueueAvailableTime(g, );
19                                    
20                              
21                         end foreach
22                        i = i + 1;
23                         if i len(L) then
24                               l = L[i];
25                              
26                        else
27                               break ;
28                              
29                         end if
30                        
31                   else
32                         g = b.prev;
33                         = EarliestQueueAvailableTime(b, );
34                         l = b.prev.l;
35                         i = L.indexOf(l);
36                        
37                   end if
38                  
39             end while
40             = UpdateSchedule(B);
41            
42       end foreach
43      return ;
44      
45
Algorithm 5 Scheduling - ASAP Heuristic

We now iterate over all these blocks (lines 7-8). For each block, we begin by calculating the lower bound on the offset (line 9). (Footnote 2: The algorithm can be found in Appendix B.) Usually, this lower bound is the end-time (offset+length) of the block on the previous link, making sure that a stream is scheduled consecutively along its route. For the first block, the lower bound is the maximum of all end-times of the last blocks of the predecessors of the current entry in the precedence graph. For example for application in Figure 7, the lower bound of the offset of the block of would be set to the maximum of the end-times of the last blocks of , and .

Also, for a secure stream, for all blocks on receiver ESs (i.e., MAC validation tasks), the lower bound is set to the end-time of the corresponding key verification task in the TESLA interval after the stream was received on the ES, since, according to the TESLA security condition, the stream can only be authenticated from that point on.
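The effect of this rule on the lower bound can be illustrated with a small Python calculation. The sketch assumes that TESLA intervals start at multiples of the interval duration, that a stream received within interval i can be authenticated in interval i + 1, and that the key verification task ends at a fixed time relative to the start of its interval. All names and values are hypothetical.

```python
def mac_validation_lower_bound(receive_end, interval, key_verification_end):
    """Earliest start of MAC validation on a receiver end-system.

    receive_end: time at which the stream has fully arrived at the receiver.
    interval: duration of one TESLA key disclosure interval.
    key_verification_end: end-time of the key verification task, relative to
        the start of its interval (assumed identical in every interval).
    """
    next_interval = receive_end // interval + 1  # interval after reception
    return next_interval * interval + key_verification_end


# Stream received at 320 with 500-long intervals: validation waits until 540.
print(mac_validation_lower_bound(320, 500, 40))  # 540
```

This shows why the key release delay dominates the latency of secure streams: arriving earlier within an interval does not allow earlier authentication.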

In the next step, the earliest possible offset for the current block is calculated (line 10). This function returns the earliest offset greater than or equal to the lower bound within the feasible region. For more detail see subsubsection 7.5.1.

If such an offset is found and it is smaller than or equal to the upper bound, we can assign it to the block (line 14). We then iterate through each of the following blocks and set their upper bound to the latest point in time when their node is available and has been since the offset (lines 15-18). This is done to fulfill the TSN constraint which forbids different streams to interleave within a queue (cf. [raagard], [craciunas_rtns_16] for a more detailed explanation).

If such an offset is found but it is larger than the upper bound, it is impossible to schedule the block while the port is still available, i.e., without it interleaving with other streams (line 25). Consequently, we have to backtrack and schedule the previous block at a later time. Therefore, we set the lower bound of the previous block to the earliest time when the current port is available and remains so until the offset (lines 26-27).

(a) Step 1
(b) Step 2
(c) Step 3
Figure 8: Backtrack example: Scheduling

Figure 8 gives an example of this process. In step 1, has already been scheduled, and we are in the process of scheduling . We have scheduled the first block on and are now trying to schedule the second one on . The lower bound of our offset is set to the end-time of the first block. The upper bound is set to the latest time after which is still available after the offset of the first block, i.e., the start time of on that link. Finally, we find the earliest offset to be only after the end time of . It cannot be earlier since then the blocks of and would overlap. However, scheduling at that time is not possible since it would mean that the two streams interleave at the same port. Consequently, in step 2, we backtrack and reschedule the first block of by setting the lower bound on its offset to the earliest time when its port is available and remains so until . In step 3, we are able to schedule the second block of without problems.

Once we have successfully found an offset for each block, we can update the schedule (line 40). This will remove the found blocks from the feasible region.

7.5.1 Calculating the earliest offset

Calculating the earliest offset (algorithm 6 shows the function) for a given block is an important part of the heuristic. It takes a block as an input and calculates the feasible region for that block (line 2). It then returns the lowest possible time that is within the feasible region and greater than or equal to the lower bound (lines 3-6).

1 Function EarliestOffset(b)
       /* ordered set of intervals */
2       I = GetFeasibleRegion(b);
3       foreach  do
4             = max(, i.begin);
5             if i.contains() then
6                   return ;
7                  
8            
9       end foreach
10      
11
Algorithm 6 ASAP Heuristic - EarliestOffset
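The search over an ordered feasible region can be sketched in Python as follows. The sketch represents the region as a sorted list of free (begin, end) intervals and, as an assumption, checks explicitly that the whole block of a given length fits into the interval; the names and the example values are hypothetical.

```python
def earliest_offset(feasible, lower_bound, length):
    """Earliest start >= lower_bound such that the block fits into a free interval.

    feasible: list of free (begin, end) intervals, sorted by begin.
    Returns None if no interval can hold the block.
    """
    for begin, end in feasible:
        start = max(lower_bound, begin)
        if start + length <= end:
            return start
    return None


free = [(0, 100), (150, 400)]
print(earliest_offset(free, lower_bound=80, length=50))   # 150
print(earliest_offset(free, lower_bound=200, length=50))  # 200
print(earliest_offset(free, lower_bound=0, length=500))   # None
```

In the first call the block would start at 80 but end at 130, past the first free interval, so the next interval is used instead.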

The function to calculate the feasible regions for a given block is detailed in algorithm 7. We start by getting all free intervals on the node/link