DeltaPath: dataflow-based high-performance incremental routing

08/21/2018
by   Desislava Dimitrova, et al.

Routing controllers must react quickly to failures, reconfigurations and workload or policy changes, to ensure service performance and cost-efficient network operation. We propose a general execution model which views routing as an incremental data-parallel computation on a graph-based network model plus a continuous stream of network changes. Our approach supports different routing objectives with only minor re-configuration of its core algorithm, and easily accommodates dynamic user-defined routing policies. Moreover, our prototype demonstrates excellent performance: on Google's Jupiter topology it reacts with a median time of 350ms to link failures and serves more than two million path requests per second, each with latency under 1ms. This is three orders of magnitude faster than the popular ONOS open-source SDN controller.


1. Introduction

We show how online route computations can be designed and implemented as incremental streaming computations over graphs. We first propose an abstract execution model which translates SDN routing logic into a streaming computation. The model is expressive: it accommodates a variety of routing algorithms with different cost functions (as used in traffic engineering), or routing policies (e.g. those used for NFV service chaining). It is also easy to use: it requires only basic declarative configuration to define an algorithm which is then automatically executed incrementally. It also leverages the high throughput and low latency of dataflow frameworks for scalability and performance.

We then present DeltaPath, a high-performance incremental implementation of this routing abstraction as a data-parallel computation in Timely Dataflow (Murray et al., 2013), a distributed execution platform. Network changes cause DeltaPath to identify only the affected set of forwarding rules for recomputation. To our knowledge, we are the first to propose such an approach for SDN routing applications.

SDNs offer flexible network control by aggregating network-related information at a single point. This global view allows for centralized route computation, replacing traditional distributed routing protocols. However, scaling centralized route computations to the size and traffic volumes of datacenters is challenging, prompting multi-threaded (Erickson, 2013; Flo, 2012) or distributed (Berde et al., 2014; Botelho et al., [n. d.]; Tootoonchian and Ganjali, [n. d.]) SDN controller design.

While scalability is a challenge, an SDN must also respond quickly to changes in the underlying topology (due to failures or reconfigurations), traffic policies, and workloads in order to maintain connectivity and comply with service level agreements: a link failure must initiate re-routing to avoid traffic loss, and crossing a link utilization threshold should trigger traffic redirection.

Existing SDN routing modules re-execute all required routing computations from scratch (Flo, 2012; ONO, 2016; Ope, 2016; Ryu, 2016), but this is inefficient (Narváez et al., [n. d.]): even forwarding rules not affected by the change will be recomputed. While much research has looked at algorithms for more flexible traffic control, little attention has been paid to efficient, incremental execution of those algorithms in SDN controllers, even though incremental routing algorithms are well-studied (Cisco, 2003; Gredler and Goralski, 2005).

Here, we show that incremental routing can be combined with the (so far) separate field of streaming graph computations to provide a powerful new model for constructing SDN control planes, and point to a large unexplored design space for network control systems and their execution models.

Contributions. We make the following contributions. First, we propose a generic model for executing routing computations in a pure streaming and incremental fashion (§ 3). Second, we present DeltaPath (§ 4): a prototype shortest-path routing application which also supports QoS routing (§ 4.1) and routing policies (§ 4.3). Third, we conduct evaluations with realistic-size topologies, including Google's Jupiter (§ 5), and show that DeltaPath computes end-to-end paths efficiently (§ 5.1) and recovers extremely quickly from link and switch failures (§ 5.2) – it is significantly faster (7.9x tail and 30x median latency) than the popular ONOS controller (§ 5.2). We also evaluate DeltaPath's performance for updates to link parameters (§ 5.3), other routing strategies (§ 5.4), and policy evaluation (§ 5.5). Finally, we discuss how DeltaPath could be deployed in practice (§ 6).

2. Background and terms

Here we clarify constraint-based routing and other terms we use in the rest of the paper.

Constraint-based routing is a class of routing computations that comply with (i) administrative policies, and/or (ii) Quality of Service (QoS) requirements (Crawley et al., 1998; Younis and Fahmy, 2003) such as bandwidth, packet delay or loss, etc.

Policy-based routing, or policy routing, complies with high-level policies like those presented in (Vissicchio et al., 2015; Foster et al., 2011; Soulé et al., 2014). We focus on policies which restrict the type and number of nodes on a path and support service chaining (Anwer et al., 2015; Zave et al., 2017). A network administrator who wants to steer external traffic through a firewall while letting traffic from trusted clients pass untouched might use a waypoint policy to specify that routes for external traffic should contain a firewall node, and a NOT-constraint policy to exclude firewalls from paths used by trusted traffic. These policies are agnostic to the underlying routing algorithm.

QoS routing includes performance targets: video conferencing requires low-latency paths, while data backup prefers sufficient bandwidth (Casado et al., [n. d.]). In practice, QoS requirements are met via different cost functions and path selection criteria. A link cost function expresses the cost of a link as a function of a metric (e.g. bandwidth, load, monetary cost, delay). The costs of all links comprising a path define a path cost according to an additive or a min/max path cost function. Finally, a path selection criterion drives the choice among paths of equal cost. QoS routing is one tool for traffic engineering, which we discuss in § 7.

Shortest path routing, shortest length, or hop-based routing is the simplest QoS routing algorithm. In shortest path, the path cost is an additive function over constant link costs of 1. Shortest path is insensitive to traffic dynamics but can be adapted to express other algorithms. For example, changing the link cost function to reflect link utilization or residual capacity results in shortest-distance routing (Ma and Steenkiste, [n. d.]) with bandwidth constraints. Further replacing the additive path cost function with a min over link weights results in widest-path routing. We show in § 4.1 how to implement these in our abstract execution model. Other link attributes can be represented in a similar way. We leave enhancements for load balancing (Gandhi et al., 2016; He et al., [n. d.]; Katta et al., 2016) and bandwidth provisioning (Nagaraj et al., [n. d.]; Tomovic and Radusinovic, [n. d.]) for future work.

3. Routing model

We now describe our execution model for routing computations: a graph-based representation of network topology, routing policies, and flow requests.

The routing model adopts a dataflow abstraction for routing computations with three types of inputs represented as event streams: (i) topology changes, (ii) policies, and (iii) path requests. The dataflow continuously performs three types of computation, reacting to the input events it receives.

First, it uses the stream of topology changes to produce a set of base forwarding rules, according to a QoS routing strategy (e.g. shortest path, widest path, least-delay path, etc.). Second, it combines the policy stream and the base rules to produce the policy forwarding rules, which together with the base rules form the full set of forwarding rules. We discuss how this set of rules can be deployed in the dataplane in § 6. Third, for each path request, it generates a routing path by traversing the full set of rules. Path requests enable network behavior troubleshooting and complement dataplane-driven routing (cf. § 5.1). The overall dataflow consists of three generic operators as shown in Figure 1 (top).

The topology representation used for routing computations is an undirected property graph G = (V, E) (we assume duplex links for ease of presentation; extending the network model to support unidirectional links is straightforward and requires no changes to DeltaPath's functionality) where:

  • V is a set of nodes, each with a unique NodeID, a label giving the node type (e.g. “switch”, “server”, etc.), and zero or more properties relevant to routing, such as the online status of a switch.

  • E is a set of edges representing physical links. An edge is identified by its endpoints and has associated properties (such as link capacity, current utilization, average delay, etc.), a weight w corresponding to the link's cost (usually computed according to a link cost function), and a delta value δ, used to represent changes in E (explained below).

As the network topology changes, G evolves correspondingly. Here, we consider three types of change: (i) adding or removing links, (ii) adding or removing nodes, and (iii) updating link weights. Changes in node and edge properties can be handled similarly.

Link additions and removals are represented by adding edges with delta values of +1 and -1, respectively. Such edge collections are added to E and result in the removal of existing edges by aggregating the deltas of edges with common attributes and discarding all edges whose deltas sum to zero. For instance, a link between two switches, represented by an edge with weight w and delta +1, is removed by adding an edge with the same endpoints and weight and delta -1. Adding or removing a node (e.g. a switch) reduces to adding or removing all edges with that node as an endpoint. Finally, updates to link weights are modeled by removing the corresponding edge, followed by adding a new edge with the same attributes and the updated weight. This delta-based approach is reminiscent of the one used in (Loo et al., 2006) and other view maintenance approaches in databases, and enables asynchronous incremental computations, as we show in § 4.
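
The following minimal Rust sketch (not DeltaPath's actual code; the tuple layout and function name are illustrative) shows how such delta-tagged edge records can be aggregated: deltas of identical edges are summed and edges whose deltas cancel out are discarded.

use std::collections::HashMap;

// Aggregate delta-tagged edge records (src, dst, weight, delta): deltas of
// edges with identical attributes are summed, cancelled edges are dropped.
fn consolidate(changes: &[(u32, u32, u64, i64)]) -> Vec<(u32, u32, u64, i64)> {
    let mut acc: HashMap<(u32, u32, u64), i64> = HashMap::new();
    for &(s, d, w, delta) in changes {
        *acc.entry((s, d, w)).or_insert(0) += delta;
    }
    acc.into_iter()
        .filter(|&(_, delta)| delta != 0)
        .map(|((s, d, w), delta)| (s, d, w, delta))
        .collect()
}

fn main() {
    // A previously added link (delta +1) and its removal (delta -1) cancel out;
    // a weight update is modelled as a removal plus an addition with the new weight.
    let changes = vec![(1, 2, 5, 1), (1, 2, 5, -1), (1, 2, 8, 1)];
    assert_eq!(consolidate(&changes), vec![(1, 2, 8, 1)]);
}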

Policies express constraints on the nodes appearing in a path. Adopting the notation in (Foster et al., 2011), a policy is one or more node constraints on a path between an origin node and a target node, both of which can be identified by node IDs or IP addresses. In the example of § 2, the waypoint policy specifies that a path from an external origin node to an internal target node must contain a firewall node. A NOT-constraint denoting that a flow must not go through a firewall excludes the firewall node from paths whose origin is a node registered with a trusted client. Policies are added or removed via the policy operator of Figure 1.

The model also supports path policies such as the ones proposed in (Vissicchio et al., 2015), e.g., a single failover path, 2-way multipath, and redundant transmission over two paths. The model complements switch-based access policies by providing forwarding rules for whitelisted traffic (if blacklisting is used, it can be handled directly at the switch). Other types of policies, such as those with general path constraints in the form of regular expressions and complex multipath policies, are beyond the scope of this paper and we leave them as future work.

Path requests represent either flow requests or troubleshooting queries. The former arise when not all flows can be handled at the switch due to limited memory. The latter lets us retrieve the path a flow should have taken and use it to troubleshoot the flow's observed routing behavior. A request has the form (FlowID, o, t), where FlowID uniquely identifies a flow, and o and t are the flow's origin and target nodes respectively.

4. DeltaPath: dataflow-based routing

We now present DeltaPath’s design and detail the functionality and implementation of its core components.

Figure 1. DeltaPath’s logical dataflow with three operators (top) and physical dataflow (bottom). Rectangles in the physical dataflow denote data shuffle stages.

DeltaPath leverages the global network view of SDNs to first proactively compute forwarding rules for all pairs of network switches according to a QoS routing strategy. This happens in the routing operator of Figure 1 during DeltaPath's initialization on an initial snapshot of the graph (cf. § 3). To minimize disruption of switch operations, forwarding rules (and their policy-derived counterparts) are pre-installed at switches. DeltaPath thereafter employs efficient incremental computations to reactively update rules as needed, e.g., upon a network or a policy change. DeltaPath's approach applies to a wide range of routing strategies (cf. § 4.1) and is agnostic to the granularity of routing decisions, e.g., per flow, flow class or ToR switch (cf. § 6).

Each operator in DeltaPath is implemented as a streaming operator above Timely Dataflow, first introduced in Naiad (Murray et al., 2013). Timely Dataflow is an ideal platform for us due to (i) its event-driven programming model, which allows us to naturally represent streams of asynchronous routing requests and network updates, and (ii) its native support for arbitrary iterative computations, common to routing tasks. Furthermore, in our experience, Timely’s Rust implementation outperforms other platforms such as Flink (fli, 2018) and Spark Streaming (spa, 2018).

Timely Dataflow employs a data-parallel execution model and processes the dataflow graph across a configurable set of workers, as shown in the lower part of Figure 1. Each worker processes a partition of the input streams (data parallelism) asynchronously and exchanges data with other workers via channels. All records are tagged with logical timestamps (epochs) to keep track of workers' progress and ensure the consistency of the asynchronous computation: each worker periodically issues statements about its local work, which together comprise a global view of progress.
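
To make these ingredients concrete, the following minimal sketch uses the public timely crate's standard API (it is not DeltaPath's code): records of an input stream are partitioned across workers, exchanged by key, and processed epoch by epoch, with a probe reporting when all workers have finished an epoch.

extern crate timely;

use timely::dataflow::InputHandle;
use timely::dataflow::operators::{Input, Exchange, Inspect, Probe};

fn main() {
    // Initializes and runs a timely dataflow across the configured workers.
    timely::execute_from_args(std::env::args(), |worker| {
        let index = worker.index();
        let mut input = InputHandle::new();

        // Build the dataflow: shuffle each record to a worker by value, then inspect it.
        let probe = worker.dataflow(|scope|
            scope.input_from(&mut input)
                 .exchange(|x| *x)
                 .inspect(move |x| println!("worker {}: saw {:?}", index, x))
                 .probe()
        );

        // Feed epochs of input; advance_to() closes an epoch, and the probe
        // tells us when every worker has finished processing it.
        for round in 0..5 {
            if index == 0 { input.send(round); }
            input.advance_to(round + 1);
            while probe.less_than(input.time()) { worker.step(); }
        }
    }).unwrap();
}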

4.1. QoS routing in DeltaPath

The routing operator (Figure 2) implements a QoS routing strategy as a continuous cyclic computation accepting two inputs: a stream S_E of topology changes, represented as edges (cf. § 3), and a stream S_R of forwarding rules, connected in a feedback loop with the operator's output. The output is a stream of base forwarding rules.

Figure 2. DeltaPath routing as a cyclic dataflow computation.

A network administrator deploys a routing strategy by defining three functions, which are used to instantiate the routing operator: a link cost function f_L, a path cost function f_P, and a path selection function f_S. DeltaPath then transparently provides incrementality.

The routing operator maintains some state: a snapshot E of the current network graph (cf. § 3) and a corresponding collection R of base forwarding rules. Specifically, E is a collection of edge records of the form (s, d, w), where s and d are endpoint nodes and w is the edge weight given by f_L. Edge records are constructed on-the-fly from incoming network changes by the instantiate component of Figure 2. R is a collection of forwarding rules of the form (s, d, n, c, h), where n is the next-hop node in the path from s to d, c is the total cost of the path given by f_P, and h is the length of the path in number of hops. A delta value δ is used to express updates to edges and forwarding rules in E and R respectively. At startup (a one-time event following the deployment of the routing application in a live controller), S_R is populated with initial forwarding rules, called tautologies, generated by the initialize component of Figure 2. Table 1 summarizes this notation.

s: source node            S_E: stream of link updates
d: destination node       S_R: stream of rule updates
n: next-hop node          E: set of links (edges)
w: link weight (cost)     ΔE: set of link changes
c: path cost              R: set of rules
h: path length (hops)     ΔR: set of rule changes
f_P: path cost function   f_S: path selection function
f_L: link cost function   δ: delta value (+1/-1)
Table 1. Notation of Section 4
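
The two record types of this state can be sketched in Rust as follows (field names mirror Table 1 and are illustrative, not DeltaPath's actual type definitions).

// An edge record in E: endpoints, weight given by f_L, and a delta value.
#[derive(Clone, Debug, PartialEq)]
struct Edge {
    s: u32,     // source node
    d: u32,     // destination node
    w: u64,     // link weight (cost)
    delta: i64, // +1 = addition, -1 = removal
}

// A forwarding rule in R: "to reach d from s, forward to n at cost c in h hops".
#[derive(Clone, Debug, PartialEq)]
struct Rule {
    s: u32,     // source node
    d: u32,     // destination node
    n: u32,     // next-hop node
    c: u64,     // path cost given by f_P
    h: u32,     // path length in hops
    delta: i64, // +1 = assertion, -1 = retraction
}

fn main() {
    // The tautology rule for node 7: "7 reaches itself via itself at zero cost".
    let t = Rule { s: 7, d: 7, n: 7, c: 0, h: 0, delta: 1 };
    println!("{:?}", t);
}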

Cost functions. The link cost function f_L defines edge weights, which can be either static (as defined by hop count, link capacity or propagation delay) or dynamic (updated by network monitoring, as defined by link utilization or observed delay). Since the choice of f_L does not affect the routing computation, DeltaPath easily extends towards multi-parameter routing strategies, e.g., based on capacity, delay and buffer space (Guck et al., 2016; Widyono and Group, 1994).

The path cost function f_P defines the cost of a newly discovered path that is constructed at a specific point during the iteration by ‘expanding’ a previously computed path with an adjacent edge. f_P is applied to the cost c of the existing path and the weight w of the edge. The function f_S selects the best path (among all discovered) for each (s, d) pair. It is an aggregation function applied to the collection R of forwarding rules in the local state of the routing operator. While f_P can be an additive or a min/max function, f_S is typically a min/max.

Table 2 shows the functions for shortest path, widest path, and two versions of the shortest-distance algorithm (one with weights denoting links’ available bandwidth, and another with weights denoting links’ utilization).

Note that in real networks, different classes of traffic coexist, requiring different QoS routing strategies. In the previous example, two reasonable strategies could be a shortest-distance strategy with delay-based weights for video conference flows, and a widest-shortest path strategy for backup application flows. This could be realized using two instances of DeltaPath, each running its own strategy in parallel. This approach provides a simple solution to traffic differentiation and could be optimized by leveraging knowledge of traffic patterns.

QoS routing algorithm               link cost function (f_L)        path cost function (f_P)   path selection function (f_S)*
hop-based                           w = 1                           c + w                      min(c)
shortest-distance (free bandwidth)  w = 1 / link's free bandwidth   c + w                      min(c)
shortest-distance (link utilization) w = link's utilization         c + w                      min(c)
shortest-widest                     w = link's free bandwidth       min{c, w}                  {max(c), min(h)}
* The path selection function is applied to all rules for an (s, d) pair in the collection R (cf. Figure 2)
Table 2. Example cost functions for QoS routing in DeltaPath
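
As an illustration only (the Strategy struct and its signatures are assumptions, not DeltaPath's configuration API), the entries of Table 2 could be supplied as three plain Rust functions.

type Cost = u64;

struct Strategy {
    link_cost: fn(free_bw: Cost, utilization: Cost) -> Cost,  // f_L: weight of a link
    path_cost: fn(path: Cost, link: Cost) -> Cost,            // f_P: extend a path by a link
    select: fn(a: (Cost, u32), b: (Cost, u32)) -> (Cost, u32), // f_S: keep the better (cost, hops)
}

// Hop-based shortest path: every link costs 1, path cost is additive, keep the minimum.
fn hop_based() -> Strategy {
    Strategy {
        link_cost: |_bw, _util| 1,
        path_cost: |p, l| p + l,
        select: |a, b| if a.0 <= b.0 { a } else { b },
    }
}

// Shortest-widest path: weight is the link's free bandwidth, a path is as wide as
// its narrowest link, and selection prefers the widest path, breaking ties by hops.
fn shortest_widest() -> Strategy {
    Strategy {
        link_cost: |free_bw, _util| free_bw,
        path_cost: |p, l| p.min(l),
        select: |a, b| {
            if a.0 != b.0 {
                if a.0 > b.0 { a } else { b }          // wider path wins
            } else if a.1 <= b.1 { a } else { b }      // equal width: fewer hops wins
        },
    }
}

fn main() {
    let hop = hop_based();
    // Extending a 2-hop path of cost 2 by one link costs 3 under hop-based routing.
    assert_eq!((hop.path_cost)(2, (hop.link_cost)(100, 30)), 3);

    let sw = shortest_widest();
    // A path is only as wide as its narrowest link.
    assert_eq!((sw.path_cost)(80, (sw.link_cost)(40, 0)), 40);
}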

4.2. Incremental routing computation

For simplicity we describe DeltaPath's implementation assuming a shortest-distance routing strategy, where the path between two nodes in the network graph minimizes the sum of weights (e.g. utilization) on its edges. Other routing strategies are implemented by providing the appropriate cost functions of § 4.1.

The routing operator is, in database terms, a delta-based join of E and R at a specific point in time (an epoch in Timely Dataflow). It implements a data-parallel incremental label propagation algorithm as a streaming dataflow operator. Label propagation is an iterative procedure whereby at each step every node receives values (labels) from its neighbors, applies an update function to its own label, and propagates the result back to its neighbors.

The algorithm works in two phases. First, an initial label propagation over the topology yields a complete set of forwarding rules (the base rules) according to a routing strategy; thereafter, the same algorithm incrementally reacts to changes in the topology and maintains these rules.

Algorithm 1 shows the routing logic applied on a per-epoch basis. Assume that an initial network topology exists in E, and the set of tautology rules has been generated and pushed to S_R. For a shortest-distance strategy, the tautology rule for a network node d has the form (d, d, d, 0, 0) – intuitively, “each node can reach itself via itself at zero cost”.

Lines 3-8 process incoming forwarding rules in S_R. The algorithm first collects the updates to R (line 4) in a collection ΔR (initially all tautologies) and joins ΔR with E as follows: for a rule (s, d, n, c, h) and an edge (s', d', w), the join is performed on the ID of the source nodes, s = s', and produces a new forwarding rule of the form (d', d, s, f_P(c, w), h+1). (The semantics here imply that the edges in E are symmetric, cf. § 3.) Intuitively, the new rule indicates that the destination node d' of the link can reach the destination node d of the rule through s (which is the same as s', hence adjacent to d') with a cost defined by the path cost function (c is the cost of the rule, amounting to the aggregated edge weights of the current path between s and d).

Lines 3-8 execute repeatedly. After each iteration, the new forwarding rules are pushed to S_R (line 8). These are used at the start of the next iteration to update R (line 4) using the path selection function f_S, which selects the best forwarding rule (path) for each distinct (s, d) pair in R.

Algorithm: DeltaPath routing
Input: a stream S_E of network updates and a stream S_R of forwarding rules
Output: a stream S_R of forwarding rules
1   // The state (initialized once)
2   let E be the network graph and R the collection of forwarding rules
    // Process the stream of forwarding rules
3   while there are forwarding rules in the input stream S_R do
4       update the local state R and collect the changes into ΔR
5       for each pair of a rule (s, d, n, c, h, δ_r) in ΔR and an edge (s, d', w, δ_e) in E do
6           create a new forwarding rule r = (d', d, s, f_P(c, w), h+1)
7           set the delta of r to δ_r × δ_e
8           push r to the rule stream S_R
    // Process updates coming from the network
9   while there are edge records in the input stream S_E do
10      update the local state E and collect the changes into ΔE
11      for each pair of an edge (s, d', w, δ_e) in ΔE and a rule (s, d, n, c, h, δ_r) in R do
12          create a new forwarding rule r = (d', d, s, f_P(c, w), h+1)
13          set the delta of r to δ_e × δ_r
14          push r to the output stream S_R
Algorithm 1 Core routing logic in DeltaPath

Lines 9-14 describe the processing of incoming network changes in S_E. The algorithm first collects the updates to E in a collection ΔE (line 10). During this step, it matches ΔE records against edges in E and sums the δ values of edges with the same attributes. All records whose δ values sum to zero are then garbage-collected. This process guarantees a consistent view of the topology graph, where the network updates are reflected on the topology before the incremental rule computation begins. Then, the algorithm joins ΔE with R on the ID of the source nodes to generate new forwarding rules. The latter are pushed to S_R and drive the incremental computation in lines 3-8. The algorithm uses the δ values of matched records to identify rules to recompute. Specifically, it multiplies the δ values of matched records (lines 7 and 13) so that resulting records with δ = -1 cancel out existing invalid rules, while rules with δ = +1 and the next minimum cost become the new established rules. At each step, rules whose δ values sum to zero are garbage-collected, as with edges.

The computation converges when ΔE and ΔR are both empty, at which point the current epoch has been fully processed. The updated forwarding rules are then pushed to the output stream S_R. These rules correspond only to the updates applied to R with respect to the previous epoch.
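
The derivation step of lines 5-8 and 11-14 can be sketched in plain Rust as follows (a self-contained illustration with assumed tuple types, not DeltaPath's Timely operator): an edge record and a rule record that share a source node produce a candidate rule whose delta is the product of the two deltas.

type Edge = (u32, u32, u64, i64);           // (s, d, w, delta)
type Rule = (u32, u32, u32, u64, u32, i64); // (s, d, n, c, h, delta)

fn join_step(edge_changes: &[Edge], rules: &[Rule]) -> Vec<Rule> {
    let mut out = Vec::new();
    for &(es, ed, w, de) in edge_changes {
        for &(rs, rd, _n, c, h, dr) in rules {
            if es == rs {
                // Path cost function for shortest-distance routing: additive.
                out.push((ed, rd, es, c + w, h + 1, de * dr));
            }
        }
    }
    out
}

fn main() {
    // Adding edge (1,2) with weight 4 against the tautology "1 reaches 1 via
    // itself at zero cost" derives the rule "2 reaches 1 via 1 at cost 4, 1 hop".
    let edges = vec![(1, 2, 4, 1)];
    let rules = vec![(1, 1, 1, 0, 0, 1)];
    assert_eq!(join_step(&edges, &rules), vec![(2, 1, 1, 4, 1, 1)]);
}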

Example: Figure 3 (top) shows a hypothetical collection R of forwarding rules after the shortest-distance algorithm has converged (the h attribute is omitted), restricted for simplicity to two rules between two switches. The ΔE table contains two example updates: the removal of one link and the addition of a new link with weight 3. In this case, Algorithm 1 (lines 9-14) will join ΔE and R to produce the records in the bottom table of Figure 3, which are pushed to the rule stream S_R. Note that the top two rules in this table match the existing, now invalid, rules in R. When the latter is updated (line 4), the algorithm will sum their δ values and garbage-collect them.

Figure 3. An example of incrementally updating forwarding rules. The top two rules in the result table match the existing invalid rules in R. When updating R, the routing algorithm sums up their δ values and garbage-collects them.

Discussion: DeltaPath's routing component does not materialize any path; instead, it computes per-hop forwarding rules for the paths between all pairs of reachable nodes. Path construction is done lazily by the policy and path retrieval components, as we explain in § 4.3 and § 4.4.

Algorithm 1 can be regarded as a specialization of Differential Dataflow (McSherry et al., 2013), a library built on top of Timely Dataflow that automatically incrementalizes computations in the spirit of DeltaPath. To further explore the potential of our approach, we have also implemented a prototype on Differential Dataflow; however, performance in that case degraded due to the different (internal) data structures it uses (McSherry, 2018).

As a final note, DeltaPath's routing component could be made fault-tolerant via standard replication approaches, but since Timely Dataflow already distributes the computation transparently, more sophisticated fault-tolerance techniques (Jacques-Silva et al., 2016; Carbone et al., 2015) are also applicable. We leave this as future work.

4.3. Policy evaluation

DeltaPath's policy component allows network operators to define custom routing policies. Recall that we only consider two types of policies here: (i) waypointing policies, which require a path from an origin to a target to traverse one or more specified nodes, and (ii) NOT-constraint policies, which exclude specified nodes from such a path (cf. § 3).

Waypointing policies. Waypointing policies are evaluated by the policy component in two steps. First, the input policy is parsed to create a tree structure of constraints. Then, the component traverses the tree and initiates a series of path retrievals conforming with the policy. The example policy from § 3 breaks down into two path retrievals: one from the origin to the firewall node, and a second one from the firewall node to the target. In general, for a waypointing policy with n constraints, the policy component triggers n+1 path retrievals. The procedure is identical to the one described in § 4.4 but executes on the set of base forwarding rules. The resulting policy rules are combined with the base rules and pushed to the path retrieval component. Intuitively, a collection of rules for a waypointing policy is a subset of the base rules generated by the routing component for the current graph snapshot.
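
A minimal sketch of this decomposition (hypothetical helper names; retrieve stands in for the pointer chasing of § 4.4) concatenates the n+1 retrieved segments into one policy-compliant path.

fn waypoint_path(
    origin: u32,
    target: u32,
    waypoints: &[u32],
    retrieve: impl Fn(u32, u32) -> Vec<u32>, // path retrieval over the base rules
) -> Vec<u32> {
    // Visit origin, each waypoint in order, and the target.
    let mut hops = Vec::new();
    hops.push(origin);
    hops.extend_from_slice(waypoints);
    hops.push(target);

    let mut path = Vec::new();
    for pair in hops.windows(2) {
        let segment = retrieve(pair[0], pair[1]);
        // Avoid duplicating the shared endpoint between consecutive segments.
        if path.is_empty() {
            path.extend(segment);
        } else {
            path.extend(segment.into_iter().skip(1));
        }
    }
    path
}

fn main() {
    // Dummy retrieval that returns the two endpoints directly,
    // standing in for a real lookup over the forwarding rules.
    let direct = |a: u32, b: u32| vec![a, b];
    assert_eq!(waypoint_path(10, 30, &[20], direct), vec![10, 20, 30]);
}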

NOT and path-constraint policies. DeltaPath's general model (cf. § 3) provides a very simple approach to implementing NOT and path-constraint policies. A NOT-constraint can be regarded as a node failure, while a path-constraint can be simulated by a link failure. As an example, consider the NOT-constraint policy of § 3, which excludes the firewall node from paths used by trusted traffic. First, a request to remove the firewall node from E is sent to the routing component. This removal takes place in a copy of the current graph snapshot so that other active policies are not affected. Then, the updated base rules are sent to the policy component, and a policy-compliant path between the origin and the target is retrieved and pushed to the output. The copy of E associated with the specific NOT-constraint policy is kept separately in the state of the routing operator and is updated by incoming network changes independently of the original graph. Policies with path constraints are supported in a similar fashion. For example, to find a backup path, DeltaPath (1) looks up the links constituting the main path and (2) removes those links from a copy of the topology graph. The recomputed forwarding rules deliver a link-disjoint backup path. The backup path enables policies on 2-way multipath and redundant transmission as well.

Despite its simplicity, this approach might not scale well for very large networks. One direction we are currently exploring to further optimize the implementation is reusing common forwarding rules across the different copies of E.

4.4. Path retrieval

Given the full set of forwarding rules, a path retrieval reduces to simple pointer chasing using the next-hop attribute n of each rule. For a source node s and a destination node d, the path retrieval starts with a lookup in the collection of rules to retrieve the rule whose source is s and destination is d. It then repeats the lookup with the retrieved next hop as the new source and d as the destination, retrieving a second rule. The procedure iterates until it finds a rule whose next hop is the destination d itself. The sequence of retrieved forwarding rules constitutes the shortest path between the source s and the destination d.
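
A minimal sketch of this pointer chasing (illustrative types, not DeltaPath's data layout): rules maps a (source, destination) pair to the next hop of the corresponding rule.

use std::collections::HashMap;

fn retrieve_path(rules: &HashMap<(u32, u32), u32>, s: u32, d: u32) -> Option<Vec<u32>> {
    let mut path = vec![s];
    let mut current = s;
    while current != d {
        // Look up the rule for (current, d); no rule means d is unreachable.
        let &next = rules.get(&(current, d))?;
        path.push(next);
        current = next;
    }
    Some(path)
}

fn main() {
    // Rules describing a 3-hop path 1 -> 2 -> 3 -> 4.
    let mut rules = HashMap::new();
    rules.insert((1, 4), 2);
    rules.insert((2, 4), 3);
    rules.insert((3, 4), 4);
    assert_eq!(retrieve_path(&rules, 1, 4), Some(vec![1, 2, 3, 4]));
}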

The aforementioned pointer chasing is essentially the functionality of DeltaPath's path retrieval component, applied to its collection of policy rules upon a flow request. Path retrievals are also performed during the evaluation of policies, as explained in § 4.3.

5. Evaluations

We evaluate DeltaPath’s performance and compare it with the state-of-the-art and widely-deployed ONOS SDN controller – a choice we discuss below. We are particularly interested in the following metrics:


  • policy evaluation: DeltaPath computes a waypointing policy with 5 nodes in 3ms and evaluates NOT-constraints in comparable time to node failures.

  • recovery latency from isolated link and switch failures: we show that DeltaPath recomputes routing state 30 times faster than the ONOS controller.

  • recovery from successive failures: DeltaPath’s on-demand recomputation is stable under a sequence of failures, whereas ONOS’ recovery time quickly degrades due to expensive precomputation operations.

We focus on link removals since algorithmically they are equivalent to additions, but represent failures and other unscheduled events of high importance in production networks.

We run DeltaPath and ONOS on a single quad-socket Intel Xeon E5-4640 machine running Debian Jessie (8.2), with frequency scaling enabled and 8 cores (16 threads) per socket. Our compiler is rustc 1.20.0-stable.

Topologies. We evaluate against the four topologies in Table 3. Fat-tree (Al-Fares et al., [n. d.]) is a version of the common leaf-spine structure which maintains multiple paths between any two access switches and consists of multiple pods – sets of access and aggregation switches. Jellyfish (Singla et al., 2012) is a low-diameter, random-graph-inspired network designed for easy extensibility, and sized here to support the same number of hosts as Fat-tree. Topo-R is an older topology from a real, operational industrial datacenter which has evolved over time and so has an irregular structure compared to a tree topology. Jupiter is Google's scalable datacenter fabric (Singh et al., 2015), whose building blocks consist of Top-of-Rack, aggregation, and spine layers. Blocks at the upper two layers resemble a 2-stage blocking network.

Topology Hosts Switches Ports Links
Jellyfish 27648 1280 48 6912
Fat-tree 27648 2880 48 55396
Topo-R 19404 546 * 917
Jupiter 98304 5632 64 87040

* The number of ports per switch in Topo-R varies.

Table 3. Network topologies used in the experiments

Link cost. We evaluate with two link weight assignments. The Hop-count plan gives all links an equal weight of 1, as is common in datacenter networks, and effectively represents hop-based routing. The Uniform plan corresponds to shortest-distance routing with weights representing link utilization, distributed in the range [1,100] with 100 denoting the full link bandwidth. To simulate utilization changes we introduce a random weight update as an arbitrary increase or decrease in utilization for a randomly chosen link. We also introduce batch size as the number of updates processed concurrently, to reflect dynamic datacenter workloads. We note that shortest-distance algorithms use link utilization more efficiently (Ma and Steenkiste, [n. d.]) than their widest-path counterparts. This choice of algorithms allows us to demonstrate both the applicability of our approach and its generality for constraint-based routing.

Comparing to existing controllers. Comparing DeltaPath to existing SDN controllers is a challenge, since they are not designed for performance measurement or analysis, and use widely different interfaces to express routing policy. We considered ONOS (ONO, 2016), OpenDaylight (Ope, 2016), Ryu (Ryu, 2016), and OpenMUL (Ope, 2017). With the exception of OpenMUL, which uses APSP, all perform SSSP (most often Dijkstra's algorithm) on a per-flow basis. We are unaware of any open-source SDN controllers that perform incremental APSP computation. We chose ONOS based on performance and market dominance (consortium, 2016; Cadenas et al., 2016).

ONOS expects flow demands to be specified declaratively via intents. For example, a PointToPointIntent specifies that two hosts engage in communication. Intents can be unprotected, served by a single shortest path between the endpoints, or protected, having a second, node-disjoint path as backup. When a link failure occurs, all affected intents are looked up in an index. Unprotected paths are recalculated by running Dijkstra on the updated topology graph. Protected paths fail over quickly via stored backups but then require running Suurballe’s algorithm (Suurballe, 1974) again to restore redundancy.

As we show, ONOS is slow in handling link updates because those are interpreted as network failures and trigger full path recomputation on a per-intent basis.

5.1. Retrieving end-to-end paths

Before presenting the core evaluations we discuss DeltaPath's performance in computing a path. Path retrieval is essential in supporting policies (cf. § 5.5), and DeltaPath's efficiency lies in extracting the path from a precomputed set of individual forwarding rules (in contrast to less efficient approaches that either precompute and materialize routing paths or trigger path computations on the fly).

DeltaPath can return a single path five orders of magnitude faster than ONOS, and can answer 2.3 million path requests per second on Jupiter.

After initializing a network with the set of base forwarding rules, we generate batches of randomly selected path requests and increase the batch size from 1 to 8192 in powers of 2. For each batch size we submit 500 batches to a single DeltaPath worker (a scenario where parallelism is undesirable due to communication overhead, since path lookup is cheap in DeltaPath). We measure how long it takes to return all the paths in each batch. No changes to topology or link weights occur in these experiments.

Figure 4. Throughput vs Latency for path lookup, Hop-count plan

Figure 4 shows throughput vs. latency for the different batch sizes in the Hop-count plan. DeltaPath can comfortably retrieve more than 1M paths per second for all topologies at low latency. Using a uniformly random distribution of link weights (Uniform plan) results in similar behavior at higher latency, and we omit it due to space limits.

Differences in the average path length explain the relative lookup performance across topologies. Intuitively, longer paths require a higher number of forwarding rule lookups and thus longer lookup times. Specifically, in Figure 4 the average path is 5.3 hops for Jupiter, 5 hops for Topo-R, 4 hops for Fat-tree and 2.7 for Jellyfish. For the same reason we observed a decrease in performance (both throughput and latency) for the Uniform plan.

We compare DeltaPath's lookup time with ONOS's time to compute an unprotected path (a single shortest path), since ONOS reactively runs a shortest path computation for each flow request. Finding a protected path is even more expensive and we leave those results out. Since ONOS does not support link weights or batched lookups, we use the Hop-count plan with a batch size of 1.

DeltaPath is five orders of magnitude faster than ONOS with Dijkstra. This should be unsurprising: extracting a path from a precomputed set of forwarding rules is far more time-efficient than running a path search. However, such a proactive approach is only feasible because DeltaPath can incrementally update APSP state extremely fast.

5.2. Reacting to link failures

We first measure how DeltaPath’s time to react to link failures compares with ONOS. In particular, we explore the tradeoff between DeltaPath’s approach of recomputing all paths incrementally in one go, and ONOS’s precomputation of single failover paths for a (protected) subset of flows, combined with flow-by-flow recomputation for affected flows.

We first show DeltaPath’s latency in the face of a single link failure, i.e. how fast it recomputes forwarding rules for all switches after a link has been detected as down. We start with a network with an initial set of rules calculated by DeltaPath, and then remove a randomly-selected link from the topology. We repeat this experiment (with different links) 500 times, each time starting from the same state.

We measure DeltaPath’s latency to identify and update all affected forwarding rules. Running with a single worker here minimizes median latency, but we show results for 8 workers since this minimizes tail latency.

Figure 5. Single-link failure recovery (Hop-count plan).
Figure 6. Single-link failure recovery (Uniform plan).
Figure 7. Link failure recovery for ONOS vs. DeltaPath.

Figures 5 and 6 (summarized in Table 4) show results for DeltaPath. Larger networks naturally mean higher latencies, but median recovery time remains below 200ms except for Jupiter, at around 350ms. DeltaPath performs better on Jellyfish than on Fat-tree, intuitively because Jellyfish has a smaller diameter than Fat-tree for the same number of hosts (Singla et al., 2012).

The tails reported in Table 4 result from failing links that carry many shortest paths between pairs of switches, triggering many forwarding rule updates. Uniform shows higher latencies since its random weights tend to introduce longer shortest paths (using sum of weights rather than hop count).

Jellyfish Fat-tree Topo-R Jupiter
Hop-count Median 25.65 171.94 7.54 347.28
plan Worst 114.63 705.87 61.89 1628.98
Uniform Median 25.32 131.32 8.32 354.42
plan Worst 336.57 2685.66 66.18 5916.01
Table 4. Summary of Figures 5 and 6 [ms]

Comparison with ONOS: Figure 7 now compares DeltaPath with ONOS on the Fat-tree. For ONOS, we generate an intent between a random pair of switches and then fail a link that affects this intent, repeating the experiment 500 times. Since ONOS does not support link weights, we compare with DeltaPath with all link weights set to 1. Here we run DeltaPath with a single worker, since ONOS handles each intent on a single thread.

“DeltaPath” shows the time for DeltaPath to fully recompute APSP following the link failure. After this time, routing has effectively been recomputed for all paths in the system.

“ONOS Dijkstra” shows this time for ONOS to recompute the path for the single affected intent in the experiment, assuming it is unprotected and has no precomputed failover.

This computation for just one path is, at the median, 26 times slower than DeltaPath and 7.6 times slower at the tail. Multiple intents sharing the affected link multiply this cost for ONOS, but not for DeltaPath.

“Protected intents” in ONOS have precomputed failover paths and can be restored much faster. “ONOS Lookup” shows how long ONOS takes to find each affected intent given the failed link. After this lookup, failing over to the precomputed backup is extremely fast – faster than DeltaPath.

However, this failover latency has a cost. If another link affecting the backup path subsequently fails before the backup is recomputed, ONOS must then compute SSSP for each affected intent, which as the “ONOS Dijkstra” result shows can be prohibitive in cost for links carrying many intents.

Moreover, this “window of vulnerability” for ONOS can be quite large. “ONOS Suurballe” shows how long ONOS takes to recompute a new backup path using Suurballe's algorithm, on a much smaller network than the other results: ONOS runs out of memory running Suurballe on a 48-ary fat tree, since its implementation keeps multiple copies of the entire graph and uses extensive path materialization, so the Suurballe results here are for a 16-ary fat tree.

With that caveat, this is the time taken for ONOS to restore path redundancy from the point where the failure was detected. Note again that this figure is for a single ONOS intent, and it can already run into several minutes of CPU time.

The tradeoff DeltaPath offers, therefore, is a latency of tens to hundreds of milliseconds to restore complete connectivity, independent of the number of flows in the system. This is against ONOS' fast failover for the subset of intents protected by (potentially prohibitively) expensive precomputation and slow restoration of unprotected paths, both of which have costs that increase linearly with the number of flows in the system.

As an aside, DeltaPath also compares well with distributed routing in Open Shortest Path First (OSPF). An OSPF link failure causes each switch to calculate a new routing tree with itself as root. Convergence time after a failure is detected consists of this computation plus a configurable delay timer (spf-delay), whose default is 5s on Brocade switches (Bro, 2017); Juniper (Jun, 2017) and Cisco (Cis, 2017) ship their own defaults, with Cisco recently reducing its value from 5s.

Entire switch failures: Finally, we repeat the experiment but now fail an entire switch and its attached links. Figures 8 and 9, summarized in Table 5, show the time taken for DeltaPath to recover all forwarding rules.

Figure 8. Switch failure recovery (Hop-count plan)
Figure 9. Switch failure recovery (Uniform plan)
Figure 10. Throughput vs Latency for path weight updates (Uniform plan)

Performance is similar to the single-link case but with a longer tail, since a switch failure creates more work than an average single-link failure on the same network. Similar arguments apply as to the relative performance of different topologies and link weight distributions.

The comparison with ONOS is also similar. Since ONOS pre-computes a second, node-disjoint path for each “protected” intent, fast failover will be the same for a switch failure as for a link, and so the figures are essentially the same as in Figure 7. Note, however, that ONOS recomputation times are linear in the number of intents that must be rerouted, and this figure will likely be much higher in the case of a switch failure.

Jellyfish Fat-tree Topo-R Jupiter
Hop-count Median 57.69 534.83 3.22 939.67
plan Worst 225.13 5224.34 116.77 11009.44
Uniform Median 113.13 1267.63 13.5 1882.75
plan Worst 537.52 5234.34 277.01 18921.72
Table 5. Summary of Figures 8 and 9 [ms]

Discussion: DeltaPath's performance in this experiment is due both to an incremental algorithm and to a high-performance implementation. Govindan et al. (Govindan et al., [n. d.]) report that 80% of failures in Jupiter take between 10 and 100 minutes to resolve. This includes the manual task of restoring physical connectivity, but it is still interesting to relate DeltaPath's performance to the lower bound. From Table 4, median recovery time for the Jupiter topology with hop-count (equal link weights) constitutes 3% of the reported 10-minute lower bound, while worst-case performance constitutes 16%.

5.3. Link weight updates

The merits of DeltaPath's incremental computation also apply to handling concurrent link weight updates. Reacting quickly to large numbers of link weight updates is important for flexible network control, one example being bandwidth-constrained routing, which we address in § 5.4. Here, we focus on the throughput-vs-latency trade-off DeltaPath achieves in handling weight updates, using the Uniform plan.

In this experiment, we form batches of link weight updates which mimic the addition and removal of host-to-host flows and their resource consumption. We (i) randomly select a path and add its links to the batch, and (ii) fix link weight updates at 5% of total link capacity, representing the maximum flow size seen in public traces (Roy et al., [n. d.]; Benson et al., 2010). Taking a conservative number lets us establish an upper bound on performance rather than tailor results to individual workload distributions. We apply 500 such batches successively to a network initialized as in § 5.2, making updates in this experiment additive. DeltaPath uses 32 worker threads.

Metric                    Target (s)   Jellyfish   Fat-tree   Topo-R   Jupiter
median throughput         1            350         5          1800     -
(updates/s)               10           973*        33         3230*    13
mean path length (hops)                5           6.4        5        8
(*) extrapolated result due to insufficient topology size to extract a large pool of paths.
Table 6. Throughput of processed link weight updates for a target latency.

Figure 10 shows throughput vs. median latency for applying updates as the batch size varies. Each marker represents a batch size between 1 and 1024 in powers of two. Table 6 further extracts the median throughput and the corresponding average path length (in hop count) for target latencies of 1s and 10s.

DeltaPath achieves the 1s latency target at high throughput on all topologies except Jupiter, where the median latency for an 8-hop path exceeds the target. At a 10s latency target, throughput grows to 13 updates/s in Jupiter and 20 updates/s in Fat-tree.

Convergence time is the main factor in DeltaPath's performance: large, well-connected graphs like Jupiter and Fat-tree need a higher number of iterations to converge due to their longer average path lengths (cf. Table 6). With twice as many nodes, Jupiter also sees a larger number of affected paths.

To our knowledge, DeltaPath is the first system to handle concurrent network changes efficiently. Most open-source SDN routing modules restart the computation, which does not scale. Despite its popularity, ONOS does not support arbitrary link weight updates. If we were to enable it, in the time ONOS takes to update a single intent, DeltaPath can update all rules affected by 32 link weight updates.

5.4. QoS routing

We now show how DeltaPath behaves for all routing algorithms of Table 2. For each algorithm, we first initialize DeltaPath with the Uniform plan (except in the case of hop-based routing, where we use the Hop-count plan), and let it compute an initial set of base rules. Then, we generate 1000 flow requests, which are submitted one after the other to DeltaPath, and we measure DeltaPath's latency to retrieve the best path and update its collection of forwarding rules. Flows' origin and target nodes are randomly chosen, whereas their sizes and inter-arrival times are based on the distributions extracted from the publicly available Facebook traces in (Roy et al., [n. d.]) (for cache leaders). The collection of base rules is updated after each request by all algorithms except hop-based. This is done to keep the graph snapshot consistent for the next request, and reflects the fact that link weights change according to the active flows and their size (i.e. the bandwidth they reserve).

Figure 11 shows DeltaPath's latency distribution for each algorithm on Jupiter (results are similar for other topologies). Shortest path (hop-based) is the fastest since it is workload insensitive, while the shortest-widest path algorithm has higher latencies because the paths it updates tend to be longer than in other algorithms. The two shortest-distance algorithms react faster, with SD free BW (relating weights to available link bandwidth) being the faster of the two. This is explained by the difference in link cost functions: the linear cost function of SD utilization (relating weights to link utilization) causes even small flows such as those in (Roy et al., [n. d.]) to trigger rule updates, while this is not the case for SD free BW.

We conclude that DeltaPath can offer performance benefits to different algorithms which can be expressed in its model.

Figure 11. Latency for serving flow requests (including path retrieval and rule updates) on Jupiter.

5.5. Policy evaluation

Here we evaluate DeltaPath’s performance on policy evaluation, as described in § 4.3. For the experiments of this section we only use Jupiter, which is the largest of all topologies in Table 3. Performance results with other topologies are similar and are omitted due to lack of space.

Waypointing policies. Figure 12 (top) shows DeltaPath's latency in evaluating waypointing policies on Jupiter. In these experiments we vary the number of intermediate nodes the path has to go through between the origin and the target. For each number of intermediate nodes, we generate 500 policies by randomly choosing the origin, target and intermediate nodes among all nodes in the topology. Each box-plot shows DeltaPath's latency distribution in evaluating 500 random policies independently. Since waypointing policies are reduced into one or more fast path retrievals (cf. § 4.4), latency in Figure 12 (top) remains low and increases only slightly with the number of intermediate nodes.

Figure 12. Latency for policy evaluation on Jupiter. Policies without constraints correspond to a single path retrieval and are included here for reference.

NOT-constraint policies. Figure 12 (bottom) shows DeltaPath's latency in evaluating NOT-constraint policies. In these experiments we vary the number of intermediate nodes to exclude from a path between the origin and the target. For each number of excluded nodes, we generate 500 NOT-constraint policies by randomly choosing the origin, target and intermediate nodes as before. Each box-plot shows DeltaPath's latency distribution in evaluating 500 random policies independently. All node removals per policy are submitted in a single batch to the routing component and are processed in parallel.

The time to evaluate NOT-constraint policies is significantly longer than for waypointing policies because, in this case, the policy component needs to wait for the routing component to compute the new base rules after the node removals (cf. § 4.3). Note that the results in Figure 12 (bottom) are consistent with those in Figures 8 and 9 for Jupiter.

5.6. Scalability

We finally examine how DeltaPath's throughput and latency scale with the worker thread count. In most cases, DeltaPath scales respectably, showing a 6-fold speedup on Jupiter with 32 worker threads.

We initialized each topology with the Uniform plan and measured throughput and latency over 1000 runs while increasing the update batch size from 1 to 512 in powers of two, and the number of worker threads from 1 to 32 (the maximum number of hardware contexts on the machine).

#workers 1 2 4 8 16 32
Fat-Tree 26 24 24 30 56 77
(37.91) (41.09) (40.87) (32.50) (17.62) (12.97)
Jellyfish 174 212 198 188 337 426
(5.50) (4.70) (5.03) (5.29) (2.96) (2.34)
Topo-R 492 566 656 1142 1343 1445
(2.02) (1.76) (1.52) (0.87) (0.74) (0.69)
Jupiter 8 10 14 19 34 53
(123.14) (96.00) (70.28) (51.08) (29.32) (18.80)
Table 7. Maximum throughput (updates/s) vs. parallelism, Uniform plan, at the optimal batch size in each case, with latency (msec) per link weight update below in parentheses.

Table 7 shows maximum median throughput and latency (in parentheses) for varying degrees of parallelism, for the optimal batch size in each case.

Such measurements should be treated with caution, but for many topologies we can conclude that DeltaPath scales well once the number of updates is large enough to make the computation CPU bound. This fits intuitively with the data-parallel computation of the routing operator.

6. Deployment considerations

DeltaPath is designed as a routing application that is easy to deploy on any SDN controller. The one requirement we impose is that the format and semantics of the streams (network updates, policy updates, flow rules) be consistent with the controller API.

Routing granularity. DeltaPath’s execution model is agnostic to routing granularity. The technique applies equally well to computing per-flow forwarding rules or finding paths between ToR switches. Host-based and switch-based routing are handled by the same DeltaPath computation, where the latter simply requires mapping the produced rules, containing switch addresses, to IP addresses. DeltaPath allows network architects to make routing granularity decisions.

Dynamic QoS routing. In real networks, several DeltaPath instances may be deployed in parallel, each running a distinct QoS routing strategy (§ 4.1). The behavior of those instances depends on whether the routing strategy uses static or dynamic weights. With static weights, forwarding rules can be pre-installed at switches and the controller intervenes only upon changes in network topology or administrator policies. With dynamic weights, however, routing decisions adapt continuously to live performance metrics, e.g., link utilization, and pre-installing rules is not feasible. Flows receive performance guarantees at the cost of inflated setup times (caused by both controller processing and frequent rule updates in switches).

In practice, only a subset of network flows would require dynamic routing. Interactive applications in modern-day datacenters (e.g., web and OLTP) have stringent performance requirements in terms of delivered throughput and correspondingly low tail latency (Dean and Barroso, [n. d.]). Parley (Jeyakumar et al., 2014) and the results of (Lee et al., [n. d.]) on the Wikipedia benchmark (wik, 2018) demonstrate that bandwidth provisioning has a strong impact on tail latency. Based on the results in § 5.3, we believe DeltaPath could offer an alternative scalable runtime mechanism to enforce bandwidth guarantees for interactive applications. We leave those investigations for future work.

Multipath routing. DeltaPath currently discovers all paths between pairs of nodes and uses the path selection function to keep only one. DeltaPath could support ECMP-like load balancing by simply adapting the implementation to keep all discovered paths. An alternative research direction, which we currently pursue, extends DeltaPath's execution model towards algorithms for disjoint shortest paths (Suurballe, 1974).

7. Related work

Traffic engineering. Traffic engineering (TE) is an active and busy field of research. One approach, widely used in practice for its simplicity, is to orchestrate link weights of distributed protocols such as OSPF and its multipath enhancement ECMP (Chiesa et al., 2016; Fortz and Thorup, [n. d.]; Fortz et al., 2002). A drawback is long convergence time after failures or updates to link weights. Centralized network control can only partially alleviate the latency problem (Liu et al., [n. d.]), for which reason recent TE proposals compute sets of paths offline (Liu et al., 2016; Suchara et al., 2011), infrequently upon demand (Hong et al., [n. d.]) or topology change (Kumar et al., 2016), or in a hierarchical process starting at hosts (Kumar et al., [n. d.]). Our approach does not target TE design but complements it by enabling faster path selection, getting us closer to dynamic centralized TE.

Shortest path algorithms. In traditional routing, OSPF has long supported an incremental optimization (McQuillan et al., 1980), as has IS-IS (Gredler and Goralski, 2005). In contrast, the centralized computation and single network view in SDNs favor all-pairs shortest path (APSP) algorithms, where an incremental approach is harder. King (King, 1999) proposed the first incremental APSP algorithm to outperform from-scratch recomputation. Later, (Demetrescu and Italiano, 2001, 2003) showed practicality for large networks.

Algorithmic optimizations from the database and data mining communities, such as those in (Akiba et al., 2014; D’Angelo et al., [n. d.]; Raghavendra et al., 2012) and (Hayashi et al., 2016), can additionally be applied to our approach. Bi-directional heuristic search (Kaindl and Kainz, [n. d.]) deserves exploration too.

SDN routing modules. Current SDN controller implementations lag behind the algorithms. Most SDN routing applications are based on pure single-source shortest path (SSSP) routing logic. SDN routing applications derived from academic projects (McCaule, 2016; Ryu, 2016) typically implement basic, unoptimized algorithms (Berde et al., 2014). ONOS (ONO, 2016) and OpenDaylight (Ope, 2016) are more modular and flexible (e.g. ONOS supports both Dijkstra and Suurballe's disjoint shortest paths). In contrast, OpenMUL (Ope, 2017), another production-ready controller, implements APSP. All open-source SDN routing modules we are aware of rely on non-incremental algorithms. We show, however, that an incremental approach has significant performance advantages.

Incremental graph processing. Incremental graph processing systems already exist: Kineograph (Cheng et al., 2012) ingests graph updates in batches to construct a series of consistent graph snapshots, and Chronos (Han et al., 2014) optimizes this with locality-aware scheduling across multiple graph snapshots. However, these batch-oriented systems are designed for high throughput rather than the low latency required for SDN route computation.

8. Conclusion

Representing routing as an online, incremental fixed-point computation over streams of network and policy updates is a promising approach to building scalable SDN controllers.

Combining this model with a high-performance implementation based on Timely Dataflow, our prototype SDN routing application for all-pairs shortest paths, DeltaPath, delivers much higher performance than existing controllers and shows that it is feasible to maintain APSP data even under highly dynamic conditions such as frequent changes in link attributes.

DeltaPath’s performance shows that an execution model which recasts route calculations as incremental dataflow computation can change the design space for centralized datacenter network control.

References

  • Flo (2012) 2012. Floodlight is a Java-based OpenFlow controller. http://www.projectfloodlight.org/floodlight/. (2012).
  • ONO (2016) 2016. Open Network Operating System (ONOS). http://onosproject.org/. (2016).
  • Ope (2016) 2016. OpenDaylight: Open Source SDN Platform. https://www.opendaylight.org/. (2016).
  • Ryu (2016) 2016. Ryu SDN Framework. https://osrg.github.io/ryu/. (2016).
  • Cis (2017) 2017. Change of Default OSPF and IS-IS SPF and Flooding Timers and iSPF Removal. https://www.cisco.com/c/en/us/support/docs/ip/ip-routing/211432-Change-of-Default-OSPF-and-IS-IS-SPF-and.html. (2017).
  • Jun (2017) 2017. Junos OS OSPF configuration guide. https://www.juniper.net/documentation/en_US/junos/information-products/pathway-pages/config-guide-routing/config-guide-ospf.pdf. (2017).
  • Bro (2017) 2017. Network OS Layer-3 Routing Configuration Guide. http://www.brocade.com/content/dam/common/documents/content-types/configuration-guide/nos-601-l3guide.pdf. (2017).
  • Ope (2017) 2017. OpenMUL High Performance SDN. (2017).
  • fli (2018) 2018. Apache Flink. https://flink.apache.org/. (2018).
  • spa (2018) 2018. Apache Spark. https://spark.apache.org/. (2018).
  • wik (2018) 2018. WikiBench: Web hosting benchmark. http://www.wikibench.eu/. (2018). Accessed: 2018-05.
  • Akiba et al. (2014) Takuya Akiba, Yoichi Iwata, and Yuichi Yoshida. 2014. Dynamic and Historical Shortest-path Distance Queries on Large Evolving Networks by Pruned Landmark Labeling. In Proceedings of the 23rd International Conference on World Wide Web (WWW ’14). 237–248.
  • Al-Fares et al. ([n. d.]) Mohammad Al-Fares, Alexander Loukissas, and Amin Vahdat. [n. d.]. A Scalable, Commodity Data Center Network Architecture. SIGCOMM Comput. Commun. Rev. 38, 4 (Aug. [n. d.]), 63–74.
  • Anwer et al. (2015) Bilal Anwer, Theophilus Benson, Nick Feamster, and Dave Levin. 2015. Programming Slick Network Functions. In Proceedings of the 1st ACM SIGCOMM Symposium on Software Defined Networking Research (SOSR ’15). Article 14, 13 pages.
  • Benson et al. (2010) Theophilus Benson, Aditya Akella, and David A. Maltz. 2010. Network Traffic Characteristics of Data Centers in the Wild. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement (IMC ’10). 267–280.
  • Berde et al. (2014) Pankaj Berde, Matteo Gerola, Jonathan Hart, Yuta Higuchi, Masayoshi Kobayashi, Toshio Koide, Bob Lantz, Brian O’Connor, Pavlin Radoslavov, William Snow, and Guru Parulkar. 2014. ONOS: Towards an Open, Distributed SDN OS. In Proceedings of the Third Workshop on Hot Topics in Software Defined Networking (HotSDN ’14). 1–6.
  • Botelho et al. ([n. d.]) F. Botelho, A. Bessani, F. M. V. Ramos, and P. Ferreira. [n. d.]. On the Design of Practical Fault-Tolerant SDN Controllers. In 2014 Third European Workshop on Software Defined Networks. 73–78.
  • Cadenas et al. (2016) M. Cadenas, L. Sciacca, R. Alvizu, and G. Maier. 2016. A Performance Analysis Tool for SDN Controllers: ONOS versus OpenDaylight Comparison. ONOS/CORD Workshop. (2016).
  • Carbone et al. (2015) Paris Carbone, Gyula Fóra, Stephan Ewen, Seif Haridi, and Kostas Tzoumas. 2015. Lightweight asynchronous snapshots for distributed dataflows. arXiv preprint arXiv:1506.08603 (2015).
  • Casado et al. ([n. d.]) Martin Casado, Nate Foster, and Arjun Guha. [n. d.]. Abstractions for Software-defined Networks. Commun. ACM 57, 10 ([n. d.]), 86–95.
  • Cheng et al. (2012) Raymond Cheng, Ji Hong, Aapo Kyrola, Youshan Miao, Xuetian Weng, Ming Wu, Fan Yang, Lidong Zhou, Feng Zhao, and Enhong Chen. 2012. Kineograph: Taking the Pulse of a Fast-changing and Connected World. In Proceedings of the 7th ACM European Conference on Computer Systems (EuroSys ’12). 85–98.
  • Chiesa et al. (2016) Marco Chiesa, Gábor Rétvári, and Michael Schapira. 2016. Lying Your Way to Better Traffic Engineering. In Proceedings of the 12th International on Conference on Emerging Networking EXperiments and Technologies (CoNEXT ’16). 391–398.
  • Cisco (2003) Cisco. 2003. Cisco IOS Software Releases 12.0 S. https://www.cisco.com/c/en/us/td/docs/ios/12_0s/feature/guide/ospfispf.html. (2003).
  • consortium (2016) ODL consortium. 2016. OpenDaylight Performance: A Practical, Empirical Guide. https://www.opendaylight.org/resources/odl-performance. (2016).
  • Crawley et al. (1998) E. Crawley, R. Nair, B. Rajagopalan, and H. Sandick. 1998. A Framework for QoS-based Routing in the Internet. (1998).
  • D’Angelo et al. ([n. d.]) Gianlorenzo D’Angelo, Mattia D’Emidio, and Daniele Frigioni. [n. d.]. Distance Queries in Large-Scale Fully Dynamic Complex Networks. Springer International Publishing, Cham, 109–121.
  • Dean and Barroso ([n. d.]) Jeffrey Dean and Luiz André Barroso. [n. d.]. The Tail at Scale. Commun. ACM 56, 2 (Feb. [n. d.]), 74–80.
  • Demetrescu and Italiano (2001) Camil Demetrescu and Giuseppe F. Italiano. 2001. Fully Dynamic All Pairs Shortest Paths with Real Edge Weights. In IEEE Symposium on Foundations of Computer Science. 260–267.
  • Demetrescu and Italiano (2003) Camil Demetrescu and Giuseppe F. Italiano. 2003. A New Approach to Dynamic All Pairs Shortest Paths. In Proceedings of the Thirty-fifth Annual ACM Symposium on Theory of Computing (STOC ’03). 159–166.
  • Erickson (2013) David Erickson. 2013. The Beacon Openflow Controller. In Proceedings of the Second ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking (HotSDN ’13). 13–18.
  • Fortz et al. (2002) B. Fortz, J. Rexford, and M. Thorup. 2002. Traffic engineering with traditional IP routing protocols. IEEE Communications Magazine 40, 10 (Oct 2002), 118–124.
  • Fortz and Thorup ([n. d.]) B. Fortz and M. Thorup. [n. d.]. Internet traffic engineering by optimizing OSPF weights. In Proceedings IEEE INFOCOM 2000. Conference on Computer Communications. Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies (Cat. No.00CH37064), Vol. 2. 519–528 vol.2.
  • Foster et al. (2011) Nate Foster, Rob Harrison, Michael J Freedman, Christopher Monsanto, Jennifer Rexford, Alec Story, and David Walker. 2011. Frenetic: A network programming language. In ACM Sigplan Notices, Vol. 46. ACM, 279–291.
  • Gandhi et al. (2016) Rohan Gandhi, Y. Charlie Hu, and Ming Zhang. 2016. Yoda: A Highly Available Layer-7 Load Balancer. In Proceedings of the Eleventh European Conference on Computer Systems (EuroSys ’16). Article 21, 16 pages.
  • Govindan et al. ([n. d.]) Ramesh Govindan, Ina Minei, Mahesh Kallahalla, Bikash Koley, and Amin Vahdat. [n. d.]. Evolve or Die: High-Availability Design Principles Drawn from Google’s Network Infrastructure. In Proceedings of the 2016 ACM SIGCOMM Conference. 58–72.
  • Gredler and Goralski (2005) H. Gredler and W. Goralski. 2005. The Complete IS-IS Routing Protocol (1 ed.). Springer-Verlag London.
  • Guck et al. (2016) J. W. Guck, M. Reisslein, and W. Kellerer. 2016. Function Split Between Delay-Constrained Routing and Resource Allocation for Centrally Managed QoS in Industrial Networks. IEEE Transactions on Industrial Informatics 12, 6 (Dec 2016), 2050–2061.
  • Han et al. (2014) Wentao Han, Youshan Miao, Kaiwei Li, Ming Wu, Fan Yang, Lidong Zhou, Vijayan Prabhakaran, Wenguang Chen, and Enhong Chen. 2014. Chronos: A Graph Engine for Temporal Graph Analysis. In Proceedings of the Ninth European Conference on Computer Systems (EuroSys ’14). Article 1, 14 pages.
  • Hayashi et al. (2016) Takanori Hayashi, Takuya Akiba, and Ken-ichi Kawarabayashi. 2016. Fully Dynamic Shortest-Path Distance Query Acceleration on Massive Networks. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management (CIKM ’16). 1533–1542.
  • He et al. ([n. d.]) Keqiang He, Eric Rozner, Kanak Agarwal, Wes Felter, John Carter, and Aditya Akella. [n. d.]. Presto: Edge-based Load Balancing for Fast Datacenter Networks. SIGCOMM Comput. Commun. Rev. 45, 4 (Aug. [n. d.]), 465–478.
  • Hong et al. ([n. d.]) Chi-Yao Hong, Srikanth Kandula, Ratul Mahajan, Ming Zhang, Vijay Gill, Mohan Nanduri, and Roger Wattenhofer. [n. d.]. Achieving High Utilization with Software-driven WAN. In Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM. 15–26.
  • Jacques-Silva et al. (2016) Gabriela Jacques-Silva, Fang Zheng, Daniel Debrunner, Kun-Lung Wu, Victor Dogaru, Eric Johnson, Michael Spicer, and Ahmet Erdem Sariyüce. 2016. Consistent regions: guaranteed tuple processing in IBM streams. Proceedings of the VLDB Endowment 9, 13 (2016), 1341–1352.
  • Jeyakumar et al. (2014) Vimalkumar Jeyakumar, Abdul Kabbani, Jeffrey C. Mogul, and Amin Vahdat. 2014. Flexible Network Bandwidth and Latency Provisioning in the Datacenter. Technical Report. http://arxiv.org/abs/1405.0631
  • Kaindl and Kainz ([n. d.]) Hermann Kaindl and Gerhard Kainz. [n. d.]. Bidirectional Heuristic Search Reconsidered. J. Artif. Int. Res. 7, 1 (Dec. [n. d.]), 283–317.
  • Katta et al. (2016) Naga Katta, Mukesh Hira, Changhoon Kim, Anirudh Sivaraman, and Jennifer Rexford. 2016. HULA: Scalable Load Balancing Using Programmable Data Planes. In Proceedings of the Symposium on SDN Research (SOSR ’16). Article 10, 10:1–10:12 pages.
  • King (1999) Valerie King. 1999. Fully Dynamic Algorithms for Maintaining All-Pairs Shortest Paths and Transitive Closure in Digraphs. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science (FOCS ’99). 81–.
  • Kumar et al. ([n. d.]) Alok Kumar, Sushant Jain, Uday Naik, Anand Raghuraman, Nikhil Kasinadhuni, Enrique Cauich Zermeno, C. Stephen Gunn, Jing Ai, Björn Carlin, Mihai Amarandei-Stavila, Mathieu Robin, Aspi Siganporia, Stephen Stuart, and Amin Vahdat. [n. d.]. BwE: Flexible, Hierarchical Bandwidth Allocation for WAN Distributed Computing. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication (SIGCOMM ’15). 1–14.
  • Kumar et al. (2016) Praveen Kumar, Yang Yuan, Chris Yu, Nate Foster, Robert D. Kleinberg, and Robert Soulé. 2016. Kulfi: Robust Traffic Engineering Using Semi-Oblivious Routing. CoRR abs/1603.01203 (2016). arXiv:1603.01203
  • Lee et al. ([n. d.]) Jeongkeun Lee, Yoshio Turner, Myungjin Lee, Lucian Popa, Sujata Banerjee, Joon-Myung Kang, and Puneet Sharma. [n. d.]. Application-driven Bandwidth Guarantees in Datacenters. In Proceedings of the 2014 ACM Conference on SIGCOMM (SIGCOMM ’14). 467–478.
  • Liu et al. ([n. d.]) Hongqiang Harry Liu, Srikanth Kandula, Ratul Mahajan, Ming Zhang, and David Gelernter. [n. d.]. Traffic Engineering with Forward Fault Correction. In Proceedings of the 2014 ACM Conference on SIGCOMM. 527–538.
  • Liu et al. (2016) Y. Liu, D. Niu, and B. Li. 2016. Delay-Optimized Video Traffic Routing in Software-Defined Interdatacenter Networks. IEEE Transactions on Multimedia 18, 5 (May 2016), 865–878.
  • Loo et al. (2006) Boon Thau Loo, Tyson Condie, Minos Garofalakis, David E Gay, Joseph M Hellerstein, Petros Maniatis, Raghu Ramakrishnan, Timothy Roscoe, and Ion Stoica. 2006. Declarative networking: language, execution and optimization. In Proceedings of the 2006 ACM SIGMOD international conference on Management of data. ACM, 97–108.
  • Ma and Steenkiste ([n. d.]) Qingming Ma and P. Steenkiste. [n. d.]. On path selection for traffic with bandwidth guarantees. In Proceedings 1997 International Conference on Network Protocols. 191–202.
  • McCaule (2016) Murphy McCaule. 2016. POX. https://openflow.stanford.edu/display/ONL/POX+Wiki. (2016).
  • McQuillan et al. (1980) John M. McQuillan, Ira Richer, and Eric C. Rosen. 1980. The New Routing Algorithm for the ARPANET. IEEE Transactions on Communications 28, 5 (1980).
  • McSherry (2018) Frank McSherry. 2018. Differential Dataflow in Rust. https://github.com/frankmcsherry/differential-dataflow. (2018).
  • McSherry et al. (2013) Frank McSherry, Derek Murray, Rebecca Isaacs, and Michael Isard. 2013. Differential dataflow. In Proceedings of CIDR 2013.
  • Murray et al. (2013) Derek G. Murray, Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, and Martín Abadi. 2013. Naiad: A Timely Dataflow System. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP ’13). 439–455.
  • Nagaraj et al. ([n. d.]) Kanthi Nagaraj, Dinesh Bharadia, Hongzi Mao, Sandeep Chinchali, Mohammad Alizadeh, and Sachin Katti. [n. d.]. NUMFabric: Fast and Flexible Bandwidth Allocation in Datacenters. In Proceedings of the 2016 Conference on ACM SIGCOMM 2016 Conference. 188–201.
  • Narváez et al. ([n. d.]) Paolo Narváez, Kai-Yeung Siu, and Hong-Yi Tzeng. [n. d.]. New Dynamic Algorithms for Shortest Path Tree Computation. IEEE/ACM Trans. Netw. 8, 6 (Dec. [n. d.]), 734–746.
  • Raghavendra et al. (2012) Ramya Raghavendra, Jorge Lobo, and Kang-Won Lee. 2012. Dynamic Graph Query Primitives for SDN-based Cloudnetwork Management. In Proceedings of the First Workshop on Hot Topics in Software Defined Networks (HotSDN ’12). 97–102.
  • Roy et al. ([n. d.]) Arjun Roy, Hongyi Zeng, Jasmeet Bagga, George Porter, and Alex C. Snoeren. [n. d.]. Inside the Social Network’s (Datacenter) Network. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication (SIGCOMM ’15). 123–137.
  • Singh et al. (2015) Arjun Singh, Joon Ong, Amit Agarwal, Glen Anderson, Ashby Armistead, Roy Bannon, Seb Boving, Gaurav Desai, Bob Felderman, Paulie Germano, Anand Kanagala, Jeff Provost, Jason Simmons, Eiichi Tanda, Jim Wanderer, Urs Hölzle, Stephen Stuart, and Amin Vahdat. 2015. Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network. In SIGCOMM ’15.
  • Singla et al. (2012) Ankit Singla, Chi-Yao Hong, Lucian Popa, and P. Brighten Godfrey. 2012. Jellyfish: Networking Data Centers Randomly. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation (NSDI’12). 17–17.
  • Soulé et al. (2014) Robert Soulé, Shrutarshi Basu, Parisa Jalili Marandi, Fernando Pedone, Robert Kleinberg, Emin Gun Sirer, and Nate Foster. 2014. Merlin: A language for provisioning network resources. In Proceedings of the 10th ACM International on Conference on emerging Networking Experiments and Technologies. ACM, 213–226.
  • Suchara et al. (2011) Martin Suchara, Dahai Xu, Robert Doverspike, David Johnson, and Jennifer Rexford. 2011. Network Architecture for Joint Failure Recovery and Traffic Engineering. In Proceedings of the ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems. 97–108.
  • Suurballe (1974) J. W. Suurballe. 1974. Disjoint paths in a network. Networks 4, 2 (1974), 125–145.
  • Tomovic and Radusinovic ([n. d.]) S. Tomovic and I. Radusinovic. [n. d.]. Fast and efficient bandwidth-delay constrained routing algorithm for SDN networks. In 2016 IEEE NetSoft Conference and Workshops (NetSoft). 303–311.
  • Tootoonchian and Ganjali ([n. d.]) Amin Tootoonchian and Yashar Ganjali. [n. d.]. HyperFlow: A Distributed Control Plane for OpenFlow. In Proceedings of the 2010 Internet Network Management Conference on Research on Enterprise Networking (INM/WREN’10). 3–3.
  • Vissicchio et al. (2015) Stefano Vissicchio, Olivier Tilmans, Laurent Vanbever, and Jennifer Rexford. 2015. Central control over distributed routing. ACM SIGCOMM Computer Communication Review 45, 4 (2015), 43–56.
  • Widyono and Group (1994) Ron Widyono and Tenet Group. 1994. The design and evaluation of routing algorithms for real-time channels. (1994).
  • Younis and Fahmy (2003) O. Younis and S. Fahmy. 2003. Constraint-based routing in the internet: Basic principles and recent research. IEEE Communications Surveys Tutorials 5, 1 (Third 2003), 2–13.
  • Zave et al. (2017) Pamela Zave, Ronaldo A. Ferreira, Xuan Kelvin Zou, Masaharu Morimoto, and Jennifer Rexford. 2017. Dynamic Service Chaining with Dysco. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication (SIGCOMM ’17). 57–70.