1 Introduction
Communication networks have become a critical infrastructure of our digital society, and the performance requirements on networks are becoming increasingly stringent. Especially inside datacenters, traffic is currently growing explosively. This is due to the popularity of data-centric applications and AI [singh2015jupiter], but also, for example, to the trend toward resource disaggregation (requiring fast access to remote resources such as GPUs) and hardware-driven and distributed training [li2019hpcc]. Accordingly, over the last years, great efforts have been made to improve the performance of datacenter networks [kellerer2019adaptable].
A particularly innovative approach to improving datacenter network performance is to adapt the datacenter topology to the workload it serves, in a dynamic and demand-aware manner. Such adjustments are enabled by emerging reconfigurable optical communication technologies, such as optical circuit switches, which provide dynamic matchings between racks [sirius, zhou2012mirror, kandula2009flyways, rotornet, opera, helios, firefly, megaswitch, quartz, chen2014osa, projector, cthrough, splaynet, venkatakrishnan2018costly, schwartz2019online, proteus, osa, 100times, fleet, osn21, griner2021cerberus]. Indeed, empirical studies have shown that datacenter traffic features much spatial and temporal structure [sigmetrics20complexity, benson2010network, roy2015inside], which may be exploited for optimization.
This paper studies the optimization problem underlying such reconfigurable datacenter networks. In particular, we consider a typical leaf-spine datacenter network where a set of racks is interconnected by b optical circuit switches, each of which provides one matching between top-of-rack switches, so b matchings in total.
1.1 The Model
Our problem can be modeled as an online dynamic version of the classic b-matching problem [anstee1987polynomial]. In this problem, each node can be connected with at most b other nodes (using optical links), which results in a b-matching.
Input. We are given an arbitrary (undirected) static weighted and connected network F on a set V of nodes connected by a set E of non-configurable links: the fixed network. Let U be the set of all possible unordered pairs of nodes from V. For any node pair {u,v} ∈ U, we call u and v the endpoints of the pair, and we let ℓ(u,v) denote the length of a shortest path between nodes u and v in the graph F. Note that u and v are not necessarily directly connected in F.
The fixed network can be enhanced with reconfigurable links, providing a matching of degree b: any node pair from U may become a matching edge (such an edge corresponds to a reconfigurable optical link), but the number of matching edges incident to any node has to be at most b, for a given integer b.
The demand is modeled as a sequence σ = (σ_1, σ_2, ...) of communication requests revealed over time, where σ_t ∈ U. (A request could either be an individual packet or a certain amount of data transferred. This model of a request sequence is often considered in the literature and is more fine-grained than, e.g., a sequence of traffic matrices.)
Output and objective. The task is to schedule the reconfigurable links over time, that is, to maintain a dynamically changing b-matching M ⊆ U. Each node pair from M is called a matching edge, and we require that each node has at most b incident matching edges. We aim to jointly minimize routing and reconfiguration costs, defined below.
Costs. The serving cost (i.e., routing cost) for a request σ_t = {u,v} depends on whether u and v are connected by a matching edge. In our model, a given request can either take only the fixed network or a direct matching edge (i.e., routing is segregated [projector]). If {u,v} ∉ M, the request is routed exclusively on the fixed network, and the corresponding cost is ℓ(u,v) (shorter paths imply smaller resource costs, i.e., a lower “bandwidth tax” [rotornet]). If {u,v} ∈ M, the request is served by the matching edge, and the routing cost is 0 (note that this is the most challenging cost function: our results only improve if this cost is larger).
Once the request is served, an algorithm may modify the set of matching edges: the reconfiguration cost is α per node pair added to or removed from the matching M. (The reconfiguration cost and time can be assumed to be independent of the specific edge.)
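For concreteness, the cost accounting just defined can be sketched as follows. This is an illustrative snippet, not our actual implementation: the class and method names are ours, and the degree constraint on M is not enforced here.

```python
# Sketch of the cost model: serving a request costs 0 over a matching
# edge and the fixed-network distance otherwise; every single change
# to the matching (addition or removal) costs alpha.

class CostModel:
    def __init__(self, dist, alpha):
        self.dist = dist            # frozenset({u, v}) -> shortest-path length in F
        self.alpha = alpha          # reconfiguration cost per edge change
        self.matching = set()       # current set of matching edges M
        self.cost = 0               # total cost accumulated so far

    def serve(self, u, v):
        pair = frozenset((u, v))
        if pair not in self.matching:
            self.cost += self.dist[pair]   # routed on the fixed network
        # requests over a matching edge are free (cost 0)

    def add_edge(self, u, v):
        self.matching.add(frozenset((u, v)))      # include edge: pay alpha
        self.cost += self.alpha

    def remove_edge(self, u, v):
        self.matching.discard(frozenset((u, v)))  # remove edge: pay alpha
        self.cost += self.alpha
```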
Online algorithms. A (randomized) algorithm Onl is online if it has to make its decisions (in our case, e.g., which edge to include next in the matching and which to evict) without knowing the future requests. Such an algorithm is said to be c-competitive [BorEl98] if there exists a constant γ such that for any input instance I, it holds that E[Onl(I)] ≤ c · Opt(I) + γ, where Opt(I) is the cost of the optimal (offline) solution for I and E[Onl(I)] is the expected cost of algorithm Onl on I. The expectation is taken over all random choices of Onl; the input itself is worst-possible, i.e., created adversarially. It is worth noting that γ may depend on the parameters of the network, such as the number of nodes, but has to be independent of the actual sequence of requests. Hence, in the long run, this additive term becomes negligible in comparison to the actual cost of the online algorithm Onl.
Generalization to the (b,b')-matching problem. The comparison of an online algorithm to the fully clairvoyant offline solution may seem unfair. Therefore, we consider our matching problem in a generalized scenario, denoted (b,b')-matching. In such a setting, the restriction imposed on an online algorithm remains unchanged: the number of matching edges incident to any node has to be at most b. However, the optimal solution is more constrained: the number of matching edges incident to any node has to be at most b' ≤ b. Similar settings are frequently studied in the literature as resource augmentation models, e.g., in the context of paging and caching [SleTar85, Young91, Young94, BansalBN12] or scheduling problems, see, e.g., [ChadhaGKM09, KalyanasundaramP00].
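The degree constraints above are simple per-node bounds, which a small helper can check directly (an illustrative function, not from the paper):

```python
from collections import Counter

def is_b_matching(edges, b):
    """Check that no node has more than b incident matching edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return all(d <= b for d in degree.values())
```

In the generalized setting, an online solution would be validated with the bound b, while the offline solution would be validated with the stricter bound b'.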
1.2 Our contributions
Motivated by the problem of how to establish topological shortcuts in datacenter networks supported by optical switches, we present a randomized online algorithm for the (b,b')-matching problem which achieves a competitive ratio of O(log b). This is asymptotically optimal: we show a lower bound of Ω(log b) by a simple reduction. The best known deterministic online algorithm for this problem is only O(b)-competitive. Our analysis relies on a reduction to a uniform case (where α = 1 and all path lengths are equal to 1), which allows us to avoid delicate charging arguments and enables a simplified analytical treatment.
We show that our randomized algorithm is not only better than the deterministic online algorithm in theory, in terms of the worst-case competitive ratio, but also attractive in practice: our empirical results, based on various real-world traces, show that our algorithm is significantly faster while achieving roughly the same matching quality.
As a contribution to the research community, to facilitate follow-up work, and to ensure reproducibility, we will make all our implementations available as open source together with this paper.
1.3 Organization
2 Algorithm and Analysis
We first define the uniform variant of the (b,b')-matching problem. There, the distance between each pair of nodes is 1 (i.e., ℓ(u,v) = 1 for any node pair {u,v}) and α = 1 (recall that α is the cost of adding or removing a matching edge of M).
In the following, let ℓ_max = max_{u,v} ℓ(u,v) denote the maximum distance in the fixed network, and let ρ = 1 + ℓ_max/α. We first show that it suffices to solve the uniform variant of the problem: once we do this, we can obtain an algorithm for the general variant, losing an additional factor of O(ρ). We note that in all practical applications α is by several orders of magnitude greater than ℓ_max, and thus ρ is close to 1.
Afterwards, we show a simple algorithm that uses known algorithms for the paging problem to solve the uniform variant of the (b,b')-matching problem. This will yield an O(b)-competitive deterministic algorithm and an O(log b)-competitive randomized one for the general variant of the (b,b')-matching problem.
2.1 Reduction to uniform case
We now show a reduction to the uniform case.
Theorem 1
Assume there exists a (deterministic or randomized) c-competitive algorithm A for the variant of the (b,b')-matching problem where ℓ(u,v) = 1 for all node pairs and α = 1. Then there exists a (respectively, deterministic or randomized) O(c · (1 + ℓ_max/α))-competitive algorithm Alg for the (b,b')-matching problem, for arbitrary distances ℓ, an arbitrary integer α ≥ 1, and ℓ_max = max_{u,v} ℓ(u,v).
Let I be the input for algorithm Alg. Alg creates (in an online manner) another input instance I'. For any node pair {u,v}, let k_{u,v} = ⌈α / ℓ(u,v)⌉.
For any node pair independently, we call each k_{u,v}-th request to {u,v} in I special; the remaining ones are called ordinary. Now I' contains only the special requests, and furthermore, for I', we assume that ℓ(u,v) = 1 and α = 1. Alg simply runs A on I' and repeats its reconfiguration choices (modifications of the matching M). That is, changes of M are performed by Alg only upon the k_{u,v}-th occurrence of a given node pair {u,v}.
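The construction of the instance I' is a simple per-pair counting filter; the sketch below is illustrative and uses the threshold k = ⌈α/ℓ(u,v)⌉ from our presentation of the reduction:

```python
from collections import Counter
from math import ceil

def to_uniform_instance(requests, dist, alpha):
    """Keep only every k-th request per node pair (the 'special' ones),
    where k = ceil(alpha / dist); the resulting sequence is the input
    I' of the uniform instance. Illustrative code, names are ours."""
    seen = Counter()
    special = []
    for u, v in requests:
        pair = frozenset((u, v))
        seen[pair] += 1
        k = ceil(alpha / dist[pair])
        if seen[pair] % k == 0:        # every k-th occurrence is special
            special.append((u, v))
    return special
```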
To show the theorem, it suffices to show that the following three inequalities hold for some value γ independent of the input I (here ℓ_max = max_{u,v} ℓ(u,v)):
(i) E[Alg(I)] ≤ 2(α + ℓ_max) · E[A(I')] + γ,
(ii) E[A(I')] ≤ c · Opt(I') + γ,
(iii) Opt(I') ≤ (2/α) · Opt(I).
Showing the first inequality. We fix any node pair {u,v} and analyze the cost pertaining to it, both for Alg and for A. We partition I into disjoint intervals, so that the first interval starts at the beginning of the input sequence and each interval except the last one ends at a special request to {u,v}, inclusively. (We note that the partition differs for different choices of {u,v}.) We look at any non-last interval Φ and its counterpart Φ' in the input I'. Note that, besides the k_{u,v} requests to {u,v} in Φ and the one request to {u,v} in Φ', these intervals may also contain requests to other node pairs. We now compare the cost pertaining to {u,v} in Φ incurred by Alg (denoted C_Φ(Alg)) to the cost pertaining to {u,v} in Φ' incurred by A (denoted C_{Φ'}(A)). We consider two cases.

If C_{Φ'}(A) = 0, then A must have {u,v} in its matching when it is requested at the end of Φ', and moreover, it cannot have performed any reconfigurations that touch {u,v}. This means that it must have had {u,v} in its matching right after the special request to {u,v} preceding Φ', and kept {u,v} there throughout the whole considered interval. As Alg mimics the choices of A, it has {u,v} in M during Φ, and thus all requests to {u,v} in Φ are free for Alg as well.

Otherwise, let R denote the number of reconfigurations concerning {u,v} performed by A in Φ' (possibly R = 0, but then A paid for the special request, so in any case C_{Φ'}(A) ≥ max(R, 1)). As A performed R reconfigurations concerning {u,v} in Φ', so does Alg in Φ, paying R·α. Furthermore, Alg pays at most k_{u,v}·ℓ(u,v) for requests to {u,v} (ℓ(u,v) for each). We note that k_{u,v}·ℓ(u,v) ≤ α + ℓ(u,v) ≤ α + ℓ_max, and thus
C_Φ(Alg) ≤ R·α + α + ℓ_max ≤ 2(α + ℓ_max) · C_{Φ'}(A).
The same bound holds in expectation when A is randomized, and thus E[C_Φ(Alg)] ≤ 2(α + ℓ_max) · E[C_{Φ'}(A)].
The first inequality follows by summing this bound over all (non-last) intervals and all possible choices of {u,v}. The additive term γ upper-bounds the total cost in the last intervals: there is at most one of them per node pair, and in each the cost is at most α + ℓ_max, as such an interval contains no special request, so Alg performs no reconfigurations in it and serves fewer than ⌈α/ℓ(u,v)⌉ requests at cost ℓ(u,v) each.
Showing the second inequality. This one follows immediately, as A is c-competitive on the input I'.
Showing the third inequality. We again fix any node pair {u,v} and consider the same partitioning into intervals as above. This time, however, on the basis of the optimal solution Opt for I, we construct an offline solution Off for the input I' by mimicking all reconfiguration choices of Opt on input I. Note that I' contains only a subset of the requests from I, and thus, in response to a single request in I', Off may react with a sequence of reconfigurations, some of which are redundant (i.e., remove some edge from the matching and then fetch it back). However, as the matching that Off maintains is the same as that of Opt, it is feasible.
We now argue that for any interval Φ in I and the corresponding interval Φ' in I', it holds that C_{Φ'}(Off) ≤ (2/α) · C_Φ(Opt), where C_Φ(Opt) and C_{Φ'}(Off) denote the costs pertaining to {u,v} of Opt in Φ and of Off in Φ', respectively.

If C_Φ(Opt) < α, then Opt performs no reconfigurations concerning {u,v} in Φ. In this case, Opt has to pay either for all k_{u,v} requests to {u,v} within Φ or for none of them. The first case would imply that its cost is at least k_{u,v}·ℓ(u,v) ≥ α, and thus the only possibility is that it does not pay for any request. In this case, Off does not perform any reorganizations pertaining to {u,v} either, and it does not pay for the only request to {u,v} in Φ'. Hence, C_{Φ'}(Off) = 0.

Otherwise, C_Φ(Opt) ≥ α, and Opt performs at most R = C_Φ(Opt)/α reconfigurations pertaining to {u,v} in Φ. Off executes the same reconfigurations, and hence it pays at most R for reconfigurations in Φ' pertaining to {u,v} and at most 1 ≤ R for the only request to {u,v} in Φ'. Thus, C_{Φ'}(Off) ≤ 2R = (2/α) · C_Φ(Opt).
The third inequality now follows by summing this relation over all intervals (including the last ones) and node pairs {u,v}, and observing that the optimal cost for I' can only be smaller than the cost of Off on I'.
Combining all three inequalities. By combining all three inequalities, we obtain that
E[Alg(I)] ≤ 2(α + ℓ_max) · (c · (2/α) · Opt(I) + γ) + γ ≤ 4c · (1 + ℓ_max/α) · Opt(I) + γ',
for a value γ' independent of the input, which concludes the proof.
2.2 Algorithm for uniform case
Below we present a randomized algorithm that solves the uniform variant of the problem (where ℓ(u,v) = 1 for all node pairs and α = 1). By Theorem 1, this yields an algorithm for the general variant (arbitrary ℓ and α values) with asymptotically the same competitive ratio.
Let the (h,k)-paging problem be a resource-augmented variant of the paging problem [SleTar85] where the online algorithm has a cache of size k and the optimal algorithm has a cache of size h ≤ k.
Theorem 2
Assume there exists a (deterministic or randomized) f(h,k)-competitive algorithm for the (h,k)-paging problem for some function f. Then, there exists an (also deterministic or randomized) O(f(b',b))-competitive algorithm for the uniform variant of the (b,b')-matching problem.
We note that the cost model of the paging problem, as defined in many papers (see, e.g., [SleTar85, FKLMSY91, Young91]), differs slightly from ours in two aspects:

In the paging problem, whenever a page is requested, it must be fetched into the cache if it is not yet there (bypassing is not allowed). In contrast, in the matching problem, an algorithm does not have to include the requested node pair in the matching.

In the paging problem, an algorithm pays only for fetching a page into the cache (there is no eviction cost, unlike in our model). Because bypassing is not allowed, the paging problem does not include the cost of serving a request either.
We will handle these differences in our proof.
Using any deterministic algorithm for (h,k)-paging with competitive ratio O(k) (e.g., LRU or FIFO [SleTar85]), Theorem 2 yields an O(b)-competitive deterministic algorithm for the (b,b')-matching problem.
Using an O(log k)-competitive randomized algorithm for the (h,k)-paging problem [Young91] (better constant factors were achieved for the case h = k [FKLMSY91, McGSle91, AcChNo00]) yields an O(log b)-competitive randomized algorithm for the (b,b')-matching problem.
Algorithm definition. Our algorithm for the (b,b')-matching problem runs a separate paging algorithm P_v for each node v; initially, the caches corresponding to all nodes are empty. The algorithm dynamically creates the input sequences for the paging algorithms running at particular nodes. The input for the algorithm run at node v is a sequence of node pairs having v as one of their endpoints, defined below. At all times, the paging algorithm P_v keeps a subset of at most b such node pairs in its cache. On the basis of the (possibly random) decisions made at each node, the algorithm constructs its own solution, choosing which node pairs are kept as matching edges in M.
More precisely, whenever the algorithm handles a request {u,v}, it passes the query to the paging algorithms running at its endpoints: separately to P_u at node u and to P_v at node v. By the definition of the paging problem, both these nodes may reorganize their caches, removing an arbitrary number of elements (node pairs), and afterwards they need to fetch {u,v} into their caches (if it is not already there).
On this basis, the algorithm reorganizes the matching M, maintaining the following invariant:
A node pair {u,v} is kept in the matching M if and only if {u,v} is in the caches of both its endpoints.
Therefore, when {u,v} is requested, the actions of the algorithm are limited to the following: (i) some node pairs having u or v as an endpoint may be removed from M, and (ii) {u,v} becomes a matching edge in M (if it was not already one). Note that there are cases where the algorithm has fewer matching edges than allowed by the threshold b. While this does not hinder the theoretical analysis, it is worth noting that having an edge in the matching can only help us. Thus, in our experiments the removals are lazy, i.e., an edge is first only marked for removal, and marked edges are pruned whenever their number incident to a node exceeds b. We note that our reduction itself is purely deterministic, but it may result in a randomized algorithm if we use randomized algorithms for solving the paging subproblems at the nodes.
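The algorithm just described can be sketched in a few lines of Python. For simplicity, this illustrative sketch instantiates the per-node paging algorithms with deterministic LRU (a randomized paging algorithm would be used to obtain the randomized bounds); all names are ours, not taken from our implementation:

```python
from collections import OrderedDict

class MatchingViaPaging:
    """Maintain a dynamic b-matching by running one paging instance
    (here: LRU) per node; a pair is a matching edge iff it is cached
    at both of its endpoints."""

    def __init__(self, b):
        self.b = b           # cache size = degree bound b
        self.cache = {}      # node -> OrderedDict of pairs, in LRU order

    def _touch(self, node, pair):
        lru = self.cache.setdefault(node, OrderedDict())
        if pair in lru:
            lru.move_to_end(pair)          # cache hit: refresh recency
        else:
            if len(lru) >= self.b:
                lru.popitem(last=False)    # cache full: evict least recent
            lru[pair] = True               # fetch the requested pair

    def request(self, u, v):
        pair = frozenset((u, v))
        self._touch(u, pair)               # query the paging algorithm at u
        self._touch(v, pair)               # query the paging algorithm at v

    def in_matching(self, u, v):
        pair = frozenset((u, v))
        return (pair in self.cache.get(u, ()) and
                pair in self.cache.get(v, ()))
```

Note how the invariant is enforced implicitly: an eviction at either endpoint immediately removes the corresponding pair from the matching.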
Bounding the competitive ratio. We now proceed with showing the desired competitive ratio of O(f(b',b)).
Proof (of Theorem 2).
Fix any input instance I for the (b,b')-matching problem. It induces a paging instance for each node: we denote the instance at node v by I_v, and let P_v be the online paging algorithm run at v. By the assumptions of the theorem, for any node v it holds that
(1) E[P_v(I_v)] ≤ f(b',b) · Opt_pg(I_v) + γ_v,
where γ_v is a constant independent of the input sequence and Opt_pg(I_v) denotes the cost of an optimal paging solution for the input I_v.
We will show the following relations, where Alg(I) denotes the cost of our algorithm on I:
(i) Alg(I) ≤ 4 · Σ_v P_v(I_v), and
(ii) Σ_v Opt_pg(I_v) ≤ 14 · Opt(I).
We note that we do not aim at optimizing the constants here, but rather at the simplicity of the description. Clearly, combining these two inequalities with (1) immediately yields the theorem.
We start with proving the second inequality. On the basis of the optimal solution Opt for the (b,b')-matching input I, we first create an offline solution Off for the same problem, but having the property that right after a node pair is requested, it is included in the matching. We call this the forcing property. It can be achieved by changing the decisions of Opt in the following way: whenever there is a request to a pair {u,v} that is in the matching neither directly before nor directly after Opt serves the request, then, after executing Opt's reorganizations (if any), we include {u,v} in the matching. If this modification causes the matching degree at u to exceed b', we evict an arbitrary matching edge e_u incident to u; we perform an analogous action for v, evicting an edge e_v if necessary. Afterwards, right after serving the next request, but before implementing the reorganizations of Opt, we revert these changes: we remove {u,v} from the matching and include e_u and e_v back. Note that for a request to {u,v} for which Opt paid 1, the solution Off adds at most 6 updates of the matching, at a total cost of 6 (recall that α = 1 in the uniform variant).
Now, we observe that for any node v, the solution Off naturally induces a feasible solution Off_v for the paging input I_v: this solution simply repeats all actions of Off pertaining to edges with one endpoint equal to v. By the forcing property of Off, the solutions Off_v are feasible: upon a request to a node pair {u,v}, the pair is fetched into the caches of both u and v. Furthermore, Σ_v Off_v(I_v) ≤ 2 · Off(I), as an inclusion of {u,v} into the matching in the solution of Off leads to two fetches: one in the solution of Off_u and one in Off_v. (A more careful analysis, including the eviction costs that are not present in the paging problem, would show that Σ_v Off_v(I_v) and 2 · Off(I) can differ by at most an additive constant independent of the input.)
Finally, we observe that an optimal paging solution for I_v can only be cheaper than Off_v, and thus Σ_v Opt_pg(I_v) ≤ Σ_v Off_v(I_v) ≤ 2 · Off(I) ≤ 14 · Opt(I).
We now proceed to prove the first inequality. When a requested node pair {u,v} is in the matching M, the algorithm does nothing and no cost is incurred, neither by it nor by any of the paging algorithms; hence, we may assume that {u,v} ∉ M. In such a case, either {u,v} is not in the cache of u, or not in the cache of v, or in neither of these two caches. That is, either P_u or P_v, or both, must fetch {u,v} into their caches, which causes Σ_v P_v(I_v) to grow by 1 or 2. Moreover, the paging algorithms running at u and v may evict some number of node pairs from their caches (these actions are free in the paging problem). In effect, the algorithm pays for the request (at cost 1), includes {u,v} in the matching (at cost α = 1), and removes some number of edges from M.
Let Alg'(I) be the cost of the algorithm on I, neglecting the cost of removals of edges from M. The analysis above showed that each increase of Alg'(I) by at most 2 (the serving cost 1 plus the inclusion cost 1) can be charged to an increase of Σ_v P_v(I_v) by at least 1, and thus Alg'(I) ≤ 2 · Σ_v P_v(I_v). The proof of the first inequality follows by noting that the total number of removals from M is at most the total number of inclusions into M, and thus Alg(I) ≤ 2 · Alg'(I) ≤ 4 · Σ_v P_v(I_v).
2.3 Lower bound
We now show that our algorithm is asymptotically optimal by proving that the (b,b')-matching problem contains the paging problem with bypassing as a special case. In the bypassing variant of the paging problem, an algorithm does not have to fetch the requested item into the cache.
Lemma 1
Assume that there exists a (deterministic or randomized) f(b',b)-competitive algorithm for the (b,b')-matching problem for some function f. Then, there exists an (also deterministic or randomized) O(f(h,k))-competitive algorithm for (h,k)-paging with bypassing, where h = b' and k = b.
Let A be an f(b',b)-competitive algorithm for the (b,b')-matching problem. Assume that an algorithm for paging with bypassing has to operate on a universe of N items. To construct an algorithm B for the paging problem, we thus create a star graph on N+1 nodes: a center r and nodes c_1, ..., c_N, with the set of non-configurable links connecting r with all remaining nodes, each of length 1. The nodes c_1, ..., c_N correspond to the universe of items. For any paging request to an item c_i, B generates a block of α requests to the node pair {r, c_i}. B internally runs A on the star graph and repeats its choices: B always caches the items that are connected by matching edges to r in the solution of A.
For now, we ignore the costs of removing edges from the matching. It is easy to observe that the cost of B is then at most 1/α times the cost of A. Furthermore, without loss of generality, for a block of α requests to the node pair {r, c_i}, an optimal algorithm for the (b,b')-matching problem either includes {r, c_i} in the matching right before the block or does not change its matching at all. Thus, the optimal solutions for the (b,b')-matching instance and the paging instance coincide, and their costs differ exactly by the factor of α. Putting these bounds together, we obtain that B is O(f(h,k))-competitive for the paging problem with bypassing when the costs of matching removals are ignored, and at most a constant factor worse when matching removals are taken into account.
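The translation at the heart of this reduction can be sketched as follows; the code is illustrative, and per our presentation of the construction, each paging request expands into a block of α requests on the star:

```python
def paging_to_matching_requests(paging_requests, alpha):
    """Translate a paging-with-bypassing input over a universe of items
    into a request sequence on a star graph with center 'r': each paging
    request to item i becomes a block of alpha requests to {r, i}.
    (Block length chosen to match the reconfiguration cost alpha.)"""
    out = []
    for item in paging_requests:
        out.extend([("r", item)] * alpha)   # one block per paging request
    return out
```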
As noted by Epstein et al. [epstein11], the bypassing variant is asymptotically equivalent to the non-bypassing one. Using known lower bounds for the paging problem [SleTar85, Young91] along with Lemma 1, we immediately obtain the following corollary.
Theorem 3
The competitive ratio of any algorithm for the (b,b')-matching problem is at least Ω(b/(b−b'+1)) when the algorithm is deterministic and at least Ω(log(b/(b−b'+1))) when the algorithm is randomized; in particular, for b' = b these bounds become Ω(b) and Ω(log b), respectively. The results hold for an arbitrary α.
3 Empirical Evaluation
In order to empirically evaluate the performance of our algorithm, we performed extensive experiments on real-world workloads from various datacenter operators. In particular, we benchmark our randomized algorithm against a state-of-the-art (deterministic) online b-matching algorithm [perf20bmatch] and against a maximum weight matching algorithm [maxmatching86].
3.1 Methodology
Setup. We implemented all algorithms in Python (3.7.3), leveraging the NetworkX library (2.3). For the implementation of the maximum weight matching algorithm, we used the algorithm provided by NetworkX, which is based on Edmonds' blossom algorithm [maxmatching86]. Our experiments were run on a machine with two Intel Xeon E5-2697 v3 (SR1XF) CPUs with 2.6 GHz and 14 cores each, and 128 GB RAM. The host machine was running Ubuntu 20.04 LTS.
Simulation Workloads. Real-world datacenter traffic can vary significantly with respect to the spatial and temporal structure it features, which depends on the applications running [sigmetrics20complexity]. Hence, our simulations are based on the following real-world datacenter traffic workloads from Facebook and Microsoft, which cover a wide range of application domains.

Facebook [roy2015inside]: We use three different workloads, each from a different Facebook cluster: a batch-processing trace from one of Facebook's Hadoop clusters, traces from one of Facebook's database clusters, which serve SQL requests, and traces from one of Facebook's web-service clusters.

Microsoft [projector]: This data set is simply a probability distribution describing the rack-to-rack communication (a traffic matrix). In order to generate a trace, we sample from this distribution i.i.d. Hence, this trace does not contain any temporal structure by design (e.g., it is not bursty) [sigmetrics20complexity]. However, it is known to contain significant spatial structure (i.e., it is skewed).
In all our simulations, we consider a typical fat-tree based datacenter topology, with the number of nodes matching the respective sizes of the Facebook and Microsoft clusters. The cost of each request {u,v} is calculated as the shortest path length between u and v. Hence, if u and v are connected by a reconfigurable link, the cost equals 0; otherwise, the routing cost is computed as the number of hops from u to v. Furthermore, each simulation is repeated multiple times and the results are averaged.
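The per-request cost computation used in our simulations can be sketched with a plain BFS, so that the snippet is self-contained without NetworkX (illustrative code, following the model's cost function):

```python
from collections import deque

def hop_distance(adj, src, dst):
    """BFS hop count between src and dst in the fixed network,
    given as an adjacency dict."""
    if src == dst:
        return 0
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                if nxt == dst:
                    return dist[nxt]
                queue.append(nxt)
    return None  # disconnected (does not occur in our topologies)

def request_cost(adj, matching, u, v):
    """Cost 0 over a reconfigurable (matching) link, else the hop count."""
    if frozenset((u, v)) in matching:
        return 0
    return hop_distance(adj, u, v)
```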
3.2 Results and discussion
We discuss the main results of our simulations based on the traces introduced above. For each traffic trace, we evaluate the routing cost and execution performance of our randomized b-matching algorithm. In particular, we evaluate the impact of different cache sizes and compare our randomized algorithm (RBMA) to the deterministic online b-matching algorithm of [perf20bmatch] (BMA) and to an offline maximum weight matching algorithm (SOBMA).
Routing Cost. We first discuss the observed routing cost. Fig. 0(a) shows the results for the Facebook database cluster. The violet line denotes the oblivious case, where each request is routed solely over the static network and no matching edges are present, i.e., a network without any reconfigurable switch. The results show that RBMA achieves a significant routing cost reduction, which grows with the cache size. In comparison to BMA, RBMA performs almost identically for smaller cache sizes. Fig. 0(c) shows that RBMA's routing cost reduction is only marginally worse for smaller numbers of requests, e.g., up to 200,000 requests. Still, in comparison to SOBMA, the gap with respect to the routing cost reduction widens as the number of requests grows. In contrast to the results achieved on Facebook's database cluster, Figs. 1(c) and 2(c) show that RBMA achieves routing cost reductions similar to both BMA and SOBMA.
Regarding the Microsoft traces, we can observe that RBMA achieves a routing cost reduction similar to BMA. Furthermore, as the cache size grows, RBMA achieves the same routing cost reduction as BMA, as shown in Fig. 3(a). Fig. 3(c) shows that SOBMA performs significantly better than RBMA. However, Microsoft's traces have no temporal traffic patterns, and therefore offline algorithms such as SOBMA have a significant advantage.
Execution Time. All our simulations on all traces show that our RBMA algorithm outperforms BMA [perf20bmatch] with respect to runtime efficiency. Furthermore, the size of the cache has a smaller impact on the execution time than for BMA. In particular, Figs. 0(b), 1(b), and 2(b) show that a larger cache size can lead to a considerable decrease in execution performance in the case of the BMA algorithm, whereas our randomized algorithm is comparatively robust to changes in the size of the cache. The results for the Microsoft cluster in Fig. 3(b) also show that the runtime of our randomized algorithm grows more slowly than that of BMA.
Summary. Our RBMA algorithm achieves almost the same routing cost reduction as BMA, while also achieving competitive cost reductions compared to an offline algorithm. With respect to runtime efficiency, we find that our randomized algorithm significantly outperforms BMA on all workloads. We conclude that RBMA provides an attractive trade-off between routing cost reduction and runtime efficiency.
4 Related Work
The design of datacenter topologies has already received much attention in the networking community. The most widely deployed networks are based on Clos topologies and multi-rooted fat-trees [clos, singh2015jupiter, f10], and there are also interesting designs based on hypercubes [bcube, mdcube] and expander graphs [xpander, jellyfish].
Existing dynamic and demand-aware datacenter networks can be classified according to the granularity of reconfigurations. Solutions such as Proteus [proteus], OSA [osa], Cerberus [griner2021cerberus], or DANs [dan], among others, are more coarse-grained and, e.g., rely on a (predicted) traffic matrix. Solutions such as ProjecToR [projector, spaa21rdcn], MegaSwitch [megaswitch], Eclipse [venkatakrishnan2018costly], Helios [helios], Mordia [mordia], C-Through [cthrough], ReNet [apocs21renets], or SplayNets [splaynet] are more fine-grained and support per-flow and decentralized reconfigurations. Reconfigurable demand-aware networks may also rely on expander graphs, e.g., Flexspander [flexspander] and Kevin [zerwas2022kevin], and are currently also considered a promising solution to speed up data transfers in supercomputers [100times, fleet]. The notion of demand-aware networks raises novel optimization problems related to switch scheduling [mckeown1999islip], and recently, interesting first insights have been obtained both for offline [venkatakrishnan2018costly] and for online scheduling [schwartz2019online, dinitz2020scheduling, spaa21rdcn, perf20bmatch]. Due to the increased reconfiguration time experienced in demand-aware networks, many existing demand-aware architectures additionally rely on a fixed network. For example, ProjecToR always maintains a “base mesh” of connected links that can handle low-latency traffic while it opportunistically reconfigures free-space links in response to changes in traffic patterns.
This paper primarily focuses on the algorithmic problems of demand-aware datacenter architectures. Our optimization problem is related to graph augmentation models, which consider the problem of adding edges to a given graph so that path lengths are reduced. For example, Meyerson and Tagiku [meyerson2009minimizing] study how to add “shortcut edges” to minimize the average shortest path distance, Bilò et al. [bilo2012improved] and Demaine and Zadimoghaddam [demaine2010minimizing] study how to augment a network to reduce its diameter, and there are several interesting results on how to add “ghost edges” to a graph such that it becomes (more) “small world” [ghostedges, smallworldshortcut, gozzard2018converting]. However, these edge additions can be optimized globally and in a biased manner, and hence do not need to form a matching; we are also not aware of any online versions of this problem. The dynamic setting is related to classic switch scheduling problems [mckeown1999islip, chuang1999matching].
Regarding the specific matching problem considered in this paper, a polynomial-time algorithm for the static version of this problem has been known for several decades [Schrij03, anstee1987polynomial]. Recently, Hanauer et al. [infocom22matching] presented several efficient algorithms for the static problem variant, for applications in the context of reconfigurable datacenters. Bienkowski et al. [perf20bmatch] initiated the study of an online version of this problem, presented an O(b)-competitive deterministic algorithm, and showed that this is asymptotically optimal among deterministic algorithms. In this paper, we have shown that a randomized approach can provide a significantly lower competitive ratio as well as faster runtimes.
Finally, we note that there is a line of papers studying (bipartite) online matching variants [onlinematching, onlinematchingsimple, adwordsprimaldual, concavematching, rankingprimaldual, bipartitematchingstronglylp, adwordslp, adwordsec]. This problem has attracted significant attention in the last decade because of its connection to online auctions and the AdWords problem [adwordssurvey]. Despite the similarity in names (e.g., a bipartite (static) matching variant was considered in [kalyanasundaram2000optimal]), this model is fundamentally different from ours.
5 Conclusion
We have presented a randomized online algorithm which computes heavy matchings between, e.g., racks in a reconfigurable datacenter, guaranteeing a significantly lower competitive ratio and a faster running time compared to the state-of-the-art deterministic algorithm (which is asymptotically optimal among deterministic algorithms). Our algorithm and its analysis are simple, and easy to implement and teach.
That said, our work leaves open several interesting avenues for future research. In particular, we still lack non-asymptotic and tight bounds on the achievable competitive ratio, both in the deterministic and in the randomized case. Furthermore, so far we have assumed a conservative online perspective, where the algorithm does not have any information about future requests. In practice, traffic often features temporal structure, and it would be interesting to explore algorithms that can leverage certain predictions about future demands without losing the worst-case guarantees.