It is desirable for networks to be resilient in the face of link failures. However, naive methods for generating routing protection schemes that account for congestion have complexity that grows combinatorially with the number of failures to protect against. That is, if a network has links and protection is required for up to link failures, then there are possible failure scenarios to plan for. For and , this number is ; for , it is . This scaling behavior precludes brute-force approaches to resilient traffic engineering. Furthermore, planning for the combinatorially large number of scenarios should be coordinated in such a way as to minimize disruptions to the traffic pattern when new failures occur. Both of these issues make it clear that optimizing traffic routing with respect to individual failure scenarios is an inadequate basis for a traffic engineering strategy.
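To make the scaling concrete, the following minimal Python sketch counts failure scenarios under the assumption that a scenario is a nonempty set of at most a given number of failed links. The function name `count_failure_scenarios` and the sample sizes in the loop are illustrative and do not come from the text.

```python
from math import comb


def count_failure_scenarios(num_links: int, max_failures: int) -> int:
    """Count nonempty sets of at most `max_failures` failed links.

    Assumption: a failure scenario is an unordered set of failed links.
    """
    return sum(comb(num_links, f) for f in range(1, max_failures + 1))


if __name__ == "__main__":
    # Illustrative sizes only (hypothetical, not taken from the paper).
    for links, failures in [(100, 1), (100, 3), (500, 3)]:
        print(links, failures, count_failure_scenarios(links, failures))
```

Even for modest networks the count grows quickly with the number of protected failures, which is the obstacle to brute-force planning described above.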
This challenge was addressed in  using R3, a congestion-avoiding routing reconfiguration framework that is resilient under multiple failures. The basic idea behind R3 is to account for all possible failure scenarios within a single optimization problem by adding “virtual” traffic corresponding to the capacity of links that might fail. This converts uncertainty in network topology into uncertainty in traffic. A base routing that optimizes maximum link utilization is solved for along with a protection routing that encodes detours in the presence of link failures. As links fail, and are updated using a handful of simple arithmetic operations, and traffic is rerouted accordingly. The simplicity of the updates minimizes network losses and latency. Meanwhile, in the background a solver can continuously monitor the current network connectivity and solve for optimal base and protection routings to replace the near-optimal updates as network stability permits.
R3 enjoys several theoretical guarantees regarding congestion avoidance, optimality, and the order of link failures. Perhaps more importantly, it is also efficient in practice, where the theoretical requirements for these guarantees do not typically hold. For example, a single node on the network periphery may be isolated with fewer than failures, but the traffic pattern that R3 generates will not be adversely affected by this degeneracy.
Proactive alternatives to R3 were proposed in  and : however, their reliance on predicted traffic demand adds an element of uncertainty that R3 avoids. Furthermore, these alternatives do not offer the theoretical guarantees of R3.
In §III we establish more formal notations and definitions for the basic quantities of interest to R3, and we demonstrate how basic routing constraints can be effectively formulated using tensor product structure. We continue this approach in §IV, conveniently giving the linear program embodying the offline precomputation for R3 in explicit matrix form. We address technicalities arising in the adaptation of R3 to wireless networks in §V. In §VI we move on to understand point-to-multipoint communications of the sort prevalent in wireless networks before introducing the corresponding generalization of R3 in §VII.
II Informal overview of R3
Let be a directed multigraph modeling the network topology: network routers correspond to vertices in , and network links are represented by directed edges in . It will be convenient to write and respectively for the source and target (sink) of a link . Let with zero diagonal (i.e., ) be the traffic demand and write . Let be the link capacity. If and , then the value of a base routing specifies the fraction of traffic with origin and destination that traverses the link . Thus the total amount of traffic on link is . More generally, a routing (defined formally in (5) below) is any function from to that satisfies natural constraints corresponding to conservation, totality, and global acyclicity of flow.
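The link load just described is a weighted sum over origin/destination pairs. A minimal Python sketch follows, assuming the demand and routing are stored in dictionaries; the data layout, the name `link_load`, and the toy numbers are illustrative choices rather than the paper's notation.

```python
def link_load(demand, routing, link):
    """Total traffic on `link`: sum over OD pairs of demand times routed fraction.

    demand:  {(origin, destination): traffic volume}
    routing: {(origin, destination): {link: fraction of that demand on link}}
    """
    return sum(
        volume * routing.get(od, {}).get(link, 0.0)
        for od, volume in demand.items()
    )


if __name__ == "__main__":
    # Toy instance (hypothetical): demand a->b is split evenly between the
    # direct link "ab" and the detour "ac", "cb".
    demand = {("a", "b"): 10.0, ("a", "c"): 4.0}
    routing = {
        ("a", "b"): {"ab": 0.5, "ac": 0.5, "cb": 0.5},
        ("a", "c"): {"ac": 1.0},
    }
    print(link_load(demand, routing, "ac"))
```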
In the R3 framework, the capacitated topology and demand are given along with a number of allowed link failures. A base routing and protection routing are derived to ensure congestion-free traffic flow under link failures if sufficient connectivity exists.
The protection routing has the particular requirement that its nontrivial origin/destination pairs are of the form , and it encodes weighted alternative paths from to . Thus when link fails, the remaining paths from to can be reweighted and used in place of . This reconfiguration (which applies to both and ) only requires simple arithmetic operations and can be applied essentially instantaneously once a link failure is detected. Meanwhile, a background process can continuously solve for base and protection routings for the current topology and number of remaining allowed link failures to smoothly transition from optimal pre-planned routes to routes that are optimal for the actual current failures and residual possible failures.
To plan for arbitrary link failures, we use the rerouting virtual demand set . Each point corresponds to a potential load on the network that saturates no more than links on its own. In principle and could be obtained by solving the constrained optimization problem
This optimization requires that the sum of actual and maximum virtual traffic not exceed the link capacity times the maximum link utilization . So long as the objective , congestion-free routing is possible under link failures (and frequently in practice this works nicely even if failures can partition the network, since the online reconfiguration can remove unreachable demands).
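The congestion condition can be checked by hand on tiny instances. The sketch below assumes the rerouting virtual demand set of the original R3 formulation, in which each virtual demand lies between zero and the corresponding link capacity and the capacity-normalized virtual demands sum to at most the number of allowed failures. It also relies on the fact that, for an integer number of failures, a linear load is maximized over this set at a point that fully saturates at most that many links. All function names and the data layout are illustrative. The explicit enumeration is precisely what R3 is designed to avoid and serves only as a cross-check.

```python
from itertools import combinations


def in_virtual_demand_set(x, capacity, num_failures, tol=1e-9):
    """Membership test: 0 <= x_l <= c_l and sum_l x_l / c_l <= num_failures."""
    for link, cap in capacity.items():
        if not -tol <= x.get(link, 0.0) <= cap + tol:
            return False
    total = sum(x.get(link, 0.0) / cap for link, cap in capacity.items())
    return total <= num_failures + tol


def max_virtual_load(protection, capacity, link, num_failures):
    """Worst-case virtual load on `link`, by enumerating saturated link sets.

    protection: {failed link l: {link e: fraction of l's traffic detoured on e}}
    """
    best = 0.0
    for k in range(1, num_failures + 1):
        for failed in combinations(capacity, k):
            load = sum(
                capacity[l] * protection.get(l, {}).get(link, 0.0)
                for l in failed
            )
            best = max(best, load)
    return best


def is_congestion_free(actual_load, protection, capacity, num_failures):
    """True if actual plus worst-case virtual load fits on every link."""
    return all(
        actual_load.get(e, 0.0)
        + max_virtual_load(protection, capacity, e, num_failures)
        <= capacity[e]
        for e in capacity
    )
```

In this sketch, `is_congestion_free` returning `True` corresponds to an objective value of at most one in the optimization above.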
In practice, the form of the optimization problem above is not immediately useful. However, it can be transformed into an equivalent linear program using the duality theorem. We elaborate on this transformation and the actual linear program we work with in §IV. The solution time varies only indirectly with , though for larger values more redundancy is demanded of a solution and routing performance will necessarily be affected. Thus the value of chosen should reflect some specific planning consideration.
With and in hand, traffic can be routed using and reconfigured using both and as follows. If link fails, we reconfigure according to
This simple update rule is also applied for subsequent failures and yields essentially instantaneous rerouting.
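For concreteness, here is a minimal Python sketch of such an update. It assumes the rule of the original R3 proposal as we recall it: when a link fails, its protection routing is rescaled by one minus its value on the failed link itself, and every base or protection flow that used the failed link is shifted onto the rescaled detour. The dictionary layout and the name `reconfigure` are illustrative, and the rule should be checked against the equations of the source before use.

```python
def reconfigure(base, protection, failed):
    """Apply one link-failure update to base and protection routings.

    base:       {(origin, destination): {link: fraction}}
    protection: {protected link: {link: fraction}}
    failed:     identifier of the link that has just failed
    """
    p_failed = protection[failed]
    self_fraction = p_failed.get(failed, 0.0)
    if self_fraction >= 1.0:
        # Degenerate case: the failed link needs no protection.
        detour = {}
    else:
        detour = {
            link: value / (1.0 - self_fraction)
            for link, value in p_failed.items()
            if link != failed
        }

    def shift(flow):
        moved = flow.get(failed, 0.0)
        updated = {link: v for link, v in flow.items() if link != failed}
        if moved:
            for link, weight in detour.items():
                updated[link] = updated.get(link, 0.0) + moved * weight
        return updated

    new_base = {od: shift(flow) for od, flow in base.items()}
    new_protection = {
        link: shift(flow) for link, flow in protection.items() if link != failed
    }
    return new_base, new_protection
```

Only additions, multiplications, and one division per failed link are involved, which is why the update can be applied as soon as a failure is detected.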
There are three major subtleties in the offline configuration phase of R3 in which the base routing and protection routing are computed that are not addressed in . The first of these subtleties is the intricate indexing required in setting up the key linear program. The second and third are related to parallel links and the preservation of routing constraints. These are respectively tackled by judicious use of tensor algebra in §III and §IV, a topology virtualization step that uses virtual nodes to eliminate parallel links (necessary for the self-consistency of the framework) combined with load evaluation as detailed in §V-A, and auxiliary techniques as mentioned in §V-B.
Finally, R3 was developed for wired network backbones: however, we have extended the approach in such a way that it can apply to networks with both wired and wireless connections. The key is to impose an additional constraint that ties the capacity of a wireless transmitter to a point-to-multipoint connection incorporating multiple links.
III Basic routing constraint
A function , written , is called a (flow representation of a) routing if the following conditions are satisfied for all :
Here in (5b) indicates that are all distinct, and that is neither a source nor a target of .
We note that  ignores the requirement in (5b) that should not be a source or target of [i.e., that should have positive in- and out-degrees], omits (5d) and (5f), and notationally suggests that there are no parallel links: however, all of these modifications are self-evidently desirable, not least in that they avoid degeneracies and manifestly enforce symmetry. That said, it may be desirable for the sake of computational efficiency to omit (5d) and (5f).
It turns out to be useful to deal with a weaker notion than a routing. For instance, a routing for the graph in Figure 1 must take spurious nonzero values.
Although in most respects such spurious values are harmless, they also involve equations to pointlessly solve and they complicate our understanding. As such we mention the weaker notion of a semirouting, in which (5) is satisfied only for such that there are paths in from to and from to , and such that and . A restricted semirouting that identically takes the value zero on triples not of this form is also useful to consider. That said, we restrict ourselves to routings in the rest of this paper.
Much of the effort in setting up a more useful equivalent of (1) is tied to intricate indexing that some basic tensor algebra can clarify. Without loss of generality, let and , so that and . Let denote the th standard basis vector in : then , where as usual denotes the tensor product. Introduce generic vectors
and a scalar corresponding to the (actual plus virtual) maximum link utilization as building blocks for
Here we recall that direct sum of and is , so that .
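The index bookkeeping behind these constructions can be verified numerically. The sketch below uses plain Python lists and the standard Kronecker convention, under which the tensor product of the j-th basis vector of dimension m with the k-th basis vector of dimension n is the basis vector of dimension mn with index (j-1)n+k (one-based). The helper names are illustrative.

```python
def basis(k, n):
    """k-th standard basis vector of dimension n (k is one-based)."""
    return [1 if i == k - 1 else 0 for i in range(n)]


def kron(x, y):
    """Tensor (Kronecker) product of two vectors stored as lists."""
    return [xi * yj for xi in x for yj in y]


def direct_sum(x, y):
    """Direct sum of two vectors, i.e. their concatenation."""
    return list(x) + list(y)


if __name__ == "__main__":
    m, n = 3, 4
    for j in range(1, m + 1):
        for k in range(1, n + 1):
            assert kron(basis(j, m), basis(k, n)) == basis((j - 1) * n + k, m * n)
    # The dimension of a direct sum is the sum of the dimensions.
    assert len(direct_sum(basis(1, m), basis(2, n))) == m + n
```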
A similar equation
encodes the requirement that be a routing: here denotes the entrywise product. Let ind_R be an array formed by stacking rows in lexicographic order. The following MATLAB snippet indicates how to obtain and :
% L is an Nx2 array of link sources and targets
P = [];
ind_P = [];
sigma = zeros(size(R,1),1);
for ell = 1:size(L,1)
    ind = ismember(ind_R(:,1:2),L(ell,:),'rows');
    P = [P,R(:,ind)];
    ind_P = [ind_P;ind_R(ind,:)];
    sigma = sigma+any(R(:,ind),2);
end
The specification of (up to signs of rows that are irrelevant and may be chosen freely) and can be completed by proceeding through the scalar equations of (5) in order and subsequently eliminating trivial or redundant equations in the order they are encountered, so that scalar equations remain, i.e. is a matrix and is a vector of dimension . The bound on arises as follows: (5a) gives scalar equations; (5b) gives scalar equations (ignoring the possibility of sources/targets); (5c) and (5d) each give scalar equations, and (5e) and (5f) each give scalar equations.
The term in (1) is the optimal objective of subject to and , where , , , and . This optimal objective is the same as that of the dual linear program subject to and .
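The equality of primal and dual objectives can be illustrated on a scalar version of the inner problem. The sketch below assumes capacity-normalized variables, so that the primal problem maximizes a nonnegative linear function over a box intersected with a budget constraint, and it constructs an explicit dual-feasible point with the same objective value. The function names, the normalization, and the certificate are our own illustrative choices and are not the matrices of the text.

```python
def primal_optimum(values, num_failures):
    """max sum_l values[l] * y_l  s.t.  0 <= y_l <= 1,  sum_l y_l <= num_failures.

    Assumes nonnegative values and an integer num_failures >= 1, so the
    optimum is the sum of the num_failures largest values.
    """
    return sum(sorted(values, reverse=True)[:num_failures])


def dual_certificate(values, num_failures):
    """Feasible point (lam, pi) of the dual problem

    min num_failures * lam + sum_l pi_l  s.t.  pi_l + lam >= values[l],  pi, lam >= 0.
    """
    if num_failures < 1:
        raise ValueError("num_failures must be at least 1")
    ranked = sorted(values, reverse=True)
    lam = ranked[num_failures - 1] if num_failures <= len(ranked) else 0
    pi = [max(v - lam, 0) for v in values]
    return lam, pi


def dual_objective(lam, pi, num_failures):
    """Dual objective value at the point (lam, pi)."""
    return num_failures * lam + sum(pi)


if __name__ == "__main__":
    values = [10, 5, 0, 7]  # hypothetical capacity-weighted protection values
    lam, pi = dual_certificate(values, 2)
    print(primal_optimum(values, 2), dual_objective(lam, pi, 2))
```

Because the dual point is feasible and matches the primal value, both are optimal. The same reasoning is what allows the inner maximization to be replaced by a minimization that can be merged into a single linear program.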
Writing , this dual linear program is (after some trivial rearrangements)
From here we immediately get the R3LP linear program (for a fixed positive integer)
Note that (12) has obvious variants called in which semiroutings and restricted semiroutings are considered instead.
The remaining details are as follows. Let , , and denote the column vectors with entries all equal to , , or , respectively; we may also write, e.g., , where is a matrix with all entries equal to zero. Define the block matrices
where , is the -dimensional identity matrix, denotes the diagonal operation, and is an involutory permutation matrix of dimension that effectively swaps link indices à la and that is conveniently defined as follows:
Writing and , R3LP takes the MATLAB-ready form
V-A Dealing with parallel links
There is no problem with defining when there are parallel links. However, there is a serious but subtle problem with defining that is manifested by components of that are structurally forced to be equal.
Note that the source/target pairs are distinct iff there are no parallel links. In this case only we can regard as a subset of . In the event that there are parallel links, the notion of a “protection routing” as embodied by becomes either ill-defined (unless all parallel links have the same capacity) or useless (since parallel links need not have the same capacity).
That is, we must regard as a function on or on . Both cases can apply if there are no parallel links, since then there is a bijection between and the set of unique source/target pairs , and we can regard as a function on which is zero outside of . But if there are parallel links and only the first case applies, then the expression cannot be assigned a consistent meaning unless it takes the same value for all parallel links . But this is essentially the second case, and then the notion of the protection routing generally becomes useless, since there is then no way to completely account for parallel links with different capacities. The inextricability of the protection routing and link capacities is also latent in the matrix formulation of §IV, which turns out to rest in an essential way on interpreting as a function on .
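Whether the difficulty arises for a given topology is easy to test, since parallel links are exactly those that share a source/target pair. A minimal Python sketch follows, with links given as a list of (source, target) tuples; the name `parallel_link_groups` is illustrative.

```python
from collections import defaultdict


def parallel_link_groups(links):
    """Map each repeated (source, target) pair to the indices of its links.

    links: list of (source, target) tuples, one per link of the multigraph.
    The result is empty exactly when the source/target pairs are distinct.
    """
    groups = defaultdict(list)
    for index, pair in enumerate(links):
        groups[pair].append(index)
    return {pair: idxs for pair, idxs in groups.items() if len(idxs) > 1}


if __name__ == "__main__":
    links = [("a", "b"), ("a", "b"), ("b", "c")]
    print(parallel_link_groups(links))
```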
In trying to cut this Gordian knot, the obvious tactic is to insert virtual vertices and links. However, this turns out to introduce new problems. For instance, suppose that every parallel link is split into two links joined at a virtual vertex. Then while this eliminates any internal inconsistency associated with , it also introduces a degeneracy into R3LP that forces , obliterating the non-congestion guarantee for that is at the heart of R3. Furthermore, experiments (not detailed here) show that removing constraints (12b) and (12c) associated with either the “outgoing half” or “incoming half” of the new links does not fix this problem (which turns out to be due to entries of the form that can be ignored when suitable care is taken).
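The splitting step itself is simple to sketch. The Python fragment below replaces every link that has a parallel twin by two links joined at a fresh virtual vertex. Giving both halves the capacity of the original link is our assumption, and the function name and the vertex labels are illustrative. The sketch covers only the topological transformation, not the degeneracy in R3LP discussed above.

```python
from collections import Counter


def split_parallel_links(links, capacity):
    """Split each parallel link into two links joined at a virtual vertex.

    links:    list of (source, target) tuples
    capacity: list of capacities aligned with `links`
    Returns the new link list and its aligned capacity list.
    """
    multiplicity = Counter(links)
    new_links, new_capacity = [], []
    for index, (source, target) in enumerate(links):
        if multiplicity[(source, target)] > 1:
            virtual = ("virtual", index)  # hypothetical label for the new vertex
            new_links += [(source, virtual), (virtual, target)]
            new_capacity += [capacity[index], capacity[index]]
        else:
            new_links.append((source, target))
            new_capacity.append(capacity[index])
    return new_links, new_capacity
```

After the transformation no two links share a source/target pair, so the protection routing can again be indexed by such pairs.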
It seems unlikely that more elaborate virtual topology schemes (e.g., splitting vertices) would succeed where the one sketched above fails. In any event, we have searched for but have not found such a scheme that works. Additionally, while it is conceivable that simultaneously fusing parallel links and altering the rerouting virtual demand set in  could be done in such a way as to address the case of failures, it seems unlikely that such a strategy could ever work for .
The underlying degeneracy that is introduced by topology virtualization turns out to be protection routing values of the form . As  points out, for on the original topology, an equality
“implies that link  carries no actual demand from OD pairs or virtual demand from links other than . So link  does not need to be protected and can be safely ignored.”
With this in mind, we can evaluate the maximum load, given as the optimal objective to
with protection routing values of the form either left unchanged or reset to zero, and compare these results with the dual objective .
In practice, values include contributions from ignorable diagonal protection routing values, and properly accounting for such cases after a topology virtualization allows us to recapture guarantees of congestion-free routing.
V-B Preservation of routing constraints
It turns out that the reconfiguration scheme of  does not actually enforce (5). It is clear that (5a) continues to hold and easy to show (using the fact that the original base and protection routings satisfy (5b)) that (5b) also continues to hold. But (5c), (5d), (5e), and (5f) do not automatically continue to hold. In fact, it is not hard to construct an example in which traffic is routed along a cycle after reconfiguration.
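A reconfigured routing can be screened for this defect by looking for a directed cycle among the links that carry positive flow for a given origin/destination pair. The following Python sketch performs a depth-first search; the name `has_flow_cycle`, the tolerance, and the data layout are illustrative.

```python
def has_flow_cycle(flow, endpoints, tol=1e-12):
    """Return True if the links with positive flow contain a directed cycle.

    flow:      {link: fraction of one OD pair's traffic on that link}
    endpoints: {link: (source vertex, target vertex)}
    """
    successors = {}
    for link, value in flow.items():
        if value > tol:
            source, target = endpoints[link]
            successors.setdefault(source, []).append(target)

    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(vertex):
        color[vertex] = GRAY  # vertex is on the current search path
        for nxt in successors.get(vertex, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                return True  # back edge: a cycle has been found
            if state == WHITE and visit(nxt):
                return True
        color[vertex] = BLACK
        return False

    return any(
        color.get(v, WHITE) == WHITE and visit(v) for v in list(successors)
    )
```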
Though this problem is irksome, it is not critical: auxiliary techniques (e.g., forwarding only once, flow decomposition, or prohibiting turns ) can ameliorate it, and like the reconfiguration as a whole, it is a transient issue that lasts only until a new base routing can be solved for. It is also plausible that additional constraints along the lines of might circumvent the problem altogether.
VI Wireless R3
A formalism for wireless networks requires the capability to describe point-to-multipoint (P2MP) transmission. (Multipoint-to-point reception can be described in an obviously similar way, but is not treated here.) Towards this end, we introduce some notation before giving a toy example. Let be the set of links with source vertex . For , let be a surjective function: for each , the preimage is the set of links belonging to the th P2MP group at vertex . A group of cardinality 1 corresponds to a dedicated point-to-point transmission.
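The grouping structure can be sketched in a few lines of Python. The fragment below takes, for a single vertex, a map from outgoing links to group indices (assumed here to run from 1 to the number of groups), checks surjectivity, and returns the preimages. The names `p2mp_groups` and `point_to_point_groups`, as well as the link labels in the example, are illustrative.

```python
def p2mp_groups(group_of_link, num_groups):
    """Preimages of a surjective map from a vertex's outgoing links to groups.

    group_of_link: {link: group index in 1..num_groups}
    Raises ValueError if an index is out of range or some group is empty.
    """
    groups = {g: [] for g in range(1, num_groups + 1)}
    for link, g in group_of_link.items():
        if g not in groups:
            raise ValueError(f"group index {g} out of range")
        groups[g].append(link)
    if any(not members for members in groups.values()):
        raise ValueError("map is not surjective")
    return {g: sorted(members) for g, members in groups.items()}


def point_to_point_groups(groups):
    """Indices of groups of cardinality 1 (dedicated point-to-point links)."""
    return [g for g, members in groups.items() if len(members) == 1]


if __name__ == "__main__":
    groups = p2mp_groups({"v->a": 1, "v->b": 1, "v->c": 2}, 2)
    print(groups, point_to_point_groups(groups))
```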
Noting that and writing , we can summarize the additional structure for P2MP transmission in the commutative diagram (i.e., a digraph with edges labeled by functions such that function compositions corresponding to paths with the same source and target give the same results)
where here indicates a generic inclusion. The group capacity is given in terms of a family of vertex-specific maps via and for we have .
We illustrate §VI-A with an example. Figure 2 depicts the underlying digraph and P2MP groups of a network in which the communications between three fixed terrestrial nodes, a ship, a plane, and a satellite are caricatured.
By inspection, we have , , , , , and . Assuming (by default) that in the absence of parallel links the link indices correspond to the lexicographical ordering of source/target pairs, the maps are given (without loss of generality) by
The lexicographic ordering on links carries over to elements of , and .
VI-C Wireless constraint
Absent parallel links, the additional constraint imposed by wireless communications can now be written down:
where and to avoid redundancy.
VI-D Example 2
The presence of parallel links introduces additional intricacy which we illustrate through an example. Consider as in the left panel of Figure 3.