1 Introduction
Large cloud providers spend billions of dollars to provision and operate planet-scale wide-area networks (WANs) that interconnect geo-distributed cloud datacenters. Cloud WANs are critical to the business of cloud providers since they enable low-latency and high-throughput applications in the cloud. Over the last decade, cloud providers have deployed SDN-based centralized network traffic engineering (TE) systems to effectively utilize cloud WANs [23, 20, 21, 29].
TE systems allocate demands between datacenters to achieve high link utilization [20, 23], fairness across network flows [20] and resilience to link failures [35, 2] in WANs. Traditionally, cloud TE systems formulate the traffic allocation problem as an optimization problem with the objective of achieving a desirable network property (e.g., minimum latency, maximum throughput). The software-defined TE controller periodically gathers network state, solves the optimal traffic allocation problem, and translates the high-level traffic allocations into router configurations (Figure 1).
After a decade of operation, production WAN TE systems are facing two major challenges today. First, the deployment of new edge sites and datacenters has increased the size of cloud WANs by an order of magnitude [20]. Larger cloud WAN topologies increase the time to compute optimal traffic allocations, thereby lengthening the TE control loop [1]. Longer TE control loops delay the SDN controller’s reaction to changes in network state such as topology and demand matrices. We quantify the effect of TE computation delays using traffic matrices and the network topology of a large commercial cloud provider in Figure 2: the left panel shows that stale traffic allocations from the previous TE run can oversubscribe nearly 40% of links in the network; the right panel shows that the fraction of satisfied demand decreases with the duration for which obsolete routes remain active in the network.
Second, cloud WANs have evolved from carrying first-party discretionary traffic to real-time and user-facing traffic. As a result, cloud TE systems have to contend with rapid changes in traffic volume, a hallmark of organic user-driven demands. Changes in traffic demands or network topology require quick recomputation of traffic allocations to prevent the network from undergoing both under- and over-utilization due to stale routes. Therefore, increased uncertainty in traffic demands coupled with sudden topology changes due to link failures exacerbates the negative effects of long TE control loops on network performance.
ML for TE acceleration. In this work, we develop a deep learning-based approach for WAN TE to tackle the rapidly increasing scale of TE optimization in cloud WANs. Previous work on learning to engineer WAN traffic attempted to achieve performance similar to that of TE optimization while operating under demand uncertainty [57]. In contrast, we argue that while carefully designed learning-based TE systems can achieve near-optimal performance, the key benefit of ML-based TE is the runtime acceleration it offers to the TE control loop.
Key insight. While linear programming (LP) solvers used by TE systems find optimal TE solutions, they scale poorly with network size. Moreover, state-of-the-art algorithms for solving LPs are limited in their ability to exploit the parallelism of modern GPU systems. Our key insight is that the highly parallelizable nature of neural network inference is ideal for accelerating TE optimization. Previous work has tackled the scaling challenge of TE optimization by decomposing the network topology into clusters and invoking LP solvers to solve the subproblem for each cluster in parallel [1]. However, due to the iterative nature of routing residual demands after network decomposition, the speedup is limited (§3).
Challenges of applying ML to TE. Off-the-shelf deep learning models, however, do not apply to the TE problem. First, standard fully-connected deep neural networks ignore the effects of WAN connectivity on traffic allocations, leading to traffic allocations that are far from the LP optimum. Second, neural networks lack the ability to constrain the range of output variables of the model. This can cause the traffic allocations from the model to exceed link capacities in the network, leading to unnecessary traffic drops in practice. Finally, as the scale of the TE problem increases with more network nodes or demand pairs, it becomes challenging to design a deep learning model that is sufficiently expressive to produce near-optimal traffic allocations.
We develop Teal to tackle these challenges. Teal builds a graph neural network (GNN) to leverage the structure and properties of the WAN topology. The GNN extracts features from demands and feeds them into a policy network that computes flow allocations for all demands. Teal uses multi-agent reinforcement learning (RL) to train the policy network offline on abundant historical traffic allocations in the production network of a commercial cloud provider. Finally, Teal augments the policy network with the alternating direction method of multipliers (ADMM) to enforce link capacity constraints on traffic allocations produced by the policy network (§4). We evaluate Teal both in simulation and on production traffic matrices of a global commercial cloud provider (§6) and show:
Teal is near-optimal. Teal realizes near-optimal flow allocation that is within 3.7% at all times of the largest satisfiable demand achieved by the TE optimization solver.
Teal accelerates TE. Teal is several orders of magnitude faster than baselines across different network sizes. It achieves up to 59× speedup relative to NCFlow [1] on large topologies of over 1,500 nodes.
To aid further research and development, we will release Teal’s source code upon publication. We place Teal in the context of related work in §7.
2 Background and Motivation
In this section, we formulate the traffic engineering optimization (§2.1) and describe the scaling challenges faced by systems implementing the TE optimization (§2.2).
2.1 TE optimization formulation
The goal of network traffic engineering (TE) algorithms is to efficiently utilize the expensive network resources to achieve performance goals like minimal latency, maximal throughput, fairness between customer allocations, etc.
Network. We represent the WAN topology as a graph $G = (V, E, c)$, where nodes ($V$) represent network sites, edges ($E$) between the sites represent network links resulting from long-haul fiber connectivity, and $c: E \to \mathbb{R}^{+}$ assigns capacities to links. Let $n = |V|$ denote the number of network sites. Each network site can consist of either one or multiple aggregated routers.
Traffic demands. The demand $d \in D$ between a pair of network sites $s$ and $t$ in $G$ is the volume of network traffic originating from $s$ that must be routed to $t$ within a given duration of time. A separate component in the system periodically gauges demands for the next time interval (e.g., five minutes) based on the needs of various services, historical demands, and bandwidth enforcement [23]. The gauged demand is considered fixed for the next time interval and is input to the TE optimization. The TE algorithm computes allocations along network paths to meet the given demand [57, 1].
Network paths. Traffic corresponding to a network demand $d$ flows on a set of network paths $P_d$. These paths are precomputed by network operators and are input to the TE optimization. This version of the TE optimization, which allocates traffic onto paths as opposed to individual edges, is called the path formulation of TE. The path formulation reduces the computational complexity of the TE optimization and also reduces the number of switch forwarding entries required to implement the traffic allocation [20, 1].
Traffic allocations. A traffic allocation $\mathcal{F}$ allocates all demands $d \in D$ onto the corresponding network paths $P_d$. Thus, the traffic allocation for a demand $d$ is a mapping from the path set $P_d$ to non-negative values, i.e., $\mathcal{F}_d: P_d \to [0, 1]$, such that $\mathcal{F}_d(p)$ is the fraction of traffic demand $d$ allocated on path $p$. The traffic allocation at time interval $i$ is denoted as $\mathcal{F}^{(i)}$.
Constraints. For any demand $d \in D$, we require $\sum_{p \in P_d} \mathcal{F}_d(p) \le 1$, to only allocate as much traffic as the demand. Additionally, we constrain the allocations by $\sum_{d \in D} \sum_{p \in P_d,\, p \ni e} \mathcal{F}_d(p) \cdot d \le c(e)$ for every $e \in E$, to ensure that the traffic allocations do not exceed the capacity of network links.
TE objectives. The goal of TE algorithms can range from maximizing network throughput to minimizing latency, and previous work has explored algorithms with a variety of TE objectives. We show that Teal can achieve near-optimal performance and acceleration for all well-known TE objectives (Section 6). In this section, we illustrate the TE optimization problem using the maximum network flow objective since it has been adopted by production TE systems [20, 29]. The TE optimization computes a routing policy, $\mathcal{F}$, that satisfies the demand and capacity constraints while maximizing the TE objective. We summarize this TE formulation in Equation 1.
\begin{equation*}
\begin{aligned}
\text{maximize} \quad & \sum_{d \in D} \sum_{p \in P_d} \mathcal{F}_d(p) \cdot d \\
\text{subject to} \quad & \sum_{p \in P_d} \mathcal{F}_d(p) \le 1, && \forall d \in D \\
& \sum_{d \in D} \sum_{p \in P_d,\, p \ni e} \mathcal{F}_d(p) \cdot d \le c(e), && \forall e \in E \\
& \mathcal{F}_d(p) \ge 0, && \forall d \in D,\ \forall p \in P_d
\end{aligned}
\tag{1}
\end{equation*}
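To make the path formulation concrete, the following sketch evaluates a candidate allocation on a toy three-site topology: it checks the demand and capacity constraints described above and computes the total-flow objective. The topology, capacities, demand volume, and all names (`evaluate_allocation`, `splits`, and so on) are illustrative assumptions and are not taken from Teal.

```python
from collections import defaultdict


def evaluate_allocation(capacity, demands, paths, splits, tol=1e-9):
    """Return (total_flow, feasible) for a path-based traffic allocation.

    capacity: {edge: capacity}, where edge = (u, v)
    demands:  {(src, dst): volume}
    paths:    {(src, dst): [path, ...]}, where path = list of edges
    splits:   {(src, dst): [fraction of the demand on each path]}
    """
    load = defaultdict(float)
    total_flow = 0.0
    feasible = True
    for pair, volume in demands.items():
        fractions = splits[pair]
        # Demand constraint: non-negative fractions that sum to at most 1.
        if any(f < -tol for f in fractions) or sum(fractions) > 1 + tol:
            feasible = False
        for path, frac in zip(paths[pair], fractions):
            flow = frac * volume
            total_flow += flow
            for edge in path:
                load[edge] += flow
    # Capacity constraint: the load on every edge stays within its capacity.
    if any(load[e] > capacity[e] + tol for e in load):
        feasible = False
    return total_flow, feasible


# Hypothetical toy instance: one demand A -> C with a direct path and a detour via B.
capacity = {("A", "B"): 10.0, ("B", "C"): 10.0, ("A", "C"): 5.0}
demands = {("A", "C"): 12.0}
paths = {("A", "C"): [[("A", "C")], [("A", "B"), ("B", "C")]]}
splits = {("A", "C"): [0.4, 0.6]}  # 4.8 units direct, 7.2 units via B

total, ok = evaluate_allocation(capacity, demands, paths, splits)
```

Sending the whole demand on the direct path (`[1.0, 0.0]`) would route the same volume but overload the 5-unit link, so the same function reports it as infeasible.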
2.2 Scaling challenges of TE
In their early years, cloud WANs consisted of tens of datacenters. Due to the small scale of the WAN topology, it was feasible to compute traffic allocations by solving optimization formulations that encoded variants of the multicommodity flow problem (Equation 1). However, rapid deployment of datacenters in the WAN has increased the solving time of TE optimizations to several minutes [1]. Since the overall temporal budget for the TE control loop is only five minutes, it is important to accelerate solving the TE optimization. We discuss intuitive ways of improving the solving time of TE optimization and why they fall short.
Parallelizing TE optimization. An intuitive way to accelerate TE optimizations is to parallelize the optimization solver. However, algorithms for solving convex optimization problems are limited in their ability to exploit parallelism. State-of-the-art LP solvers like Gurobi [18] and CPLEX [22] rely on optimization methods that are severely bottlenecked by sequential operations. For instance, the standard algorithm, the simplex method [44], moves one small step at a time toward the optimal solution along the edges of the feasible region, and requires thousands to millions of such steps. As a result, optimization solvers do not benefit much from multiple threads. A common strategy adopted by LP solvers is to concurrently run independent instances of different optimization algorithms [45]. Each instance runs sequentially on one thread, and results are reported from whichever instance completes first. This is not an efficient use of multiple threads and thus only achieves marginal speedup as more threads become available.
Approximation algorithms. Combinatorial approximation algorithms (e.g., Fleischer’s algorithm [11]) find faster but approximate solutions with provable guarantees. Nonetheless, they are iterative algorithms that incrementally allocate more flows while computing a solution. In practice, they are hardly faster than LP solvers [1].
Decomposing TE into subproblems. Recently, NCFlow [1] proposed to decompose the WAN topology into clusters and simultaneously solve TE optimization in each cluster using LP solvers. However, to allocate as much flow as a global optimizer, NCFlow requires resolving the optimization problem iteratively on the remaining demands and residual capacities in each iteration. Moreover, NCFlow requires approximately number of clusters on a topology of nodes, restraining itself from achieving greater parallelism. We compare our approach with NCFlow in Section 6.
3 Accelerate TE optimization with ML
Motivated by the fast inference speed and immense expressive power [5] of neural networks, the core technique behind deep learning and most of the visible ML breakthroughs, we leverage deep learning to accelerate TE optimization and encode a TE scheme in neural networks. We train the neural network out-of-band on large volumes of historical network data to exploit its power of discovering regularities in data and specializing in a particular (not necessarily general) WAN topology of interest to operators. During deployment, the neural network makes an inference for input demand matrices and outputs traffic allocations. This inference process, known as “forward propagation,” consists of numerous highly parallelizable operations like matrix multiplications. Thus, neural network inference benefits from parallelism without iteratively approaching a TE solution like prior work (§2.2).
Leveraging hardware parallelism. Moreover, deep learning research has made tremendous efforts to enable hardware acceleration of neural network evaluation on GPUs and other specialized hardware [17, 67, 7]. Off-the-shelf frameworks [46, 54, 55] are available for harvesting massive parallelism and optimizing the inference speed on hardware accelerators.
Leveraging abundant training data. Although deep learning is blessed with the massive parallelism in neural networks and fast computation on GPUs, we clarify that we do not attempt to replace LP solvers, which are admittedly more general. We also expect other approximation algorithms for TE (including NCFlow) that do not use deep learning to be more generalizable on unseen WAN topologies. By contrast, our learning-based TE scheme is highly specialized in the optimization for a given WAN topology, finding and potentially relying on the data patterns in the graph connectivity, link capacities, and traffic demands [footnote 1]. We consciously train our learning-based scheme to “overfit” a particular topology because in practice, WAN topologies do not change frequently: network planning that adds new sites or links only occurs every few months [39], and link failures are also rare events [51] (we show in Section 6.3 that our scheme can handle a small number of unseen link failures). Regardless, we will retrain the TE scheme upon significant topology changes or performance degradation.
Footnote 1: Unlike prior work [57], we are not interested in learning any temporal pattern in the time series of traffic demands (see TE formulation in §2.1 and discussion in §7). However, our ML-based approach relies on the spatial pattern within every traffic demand matrix, and thus might not generalize well to a substantially different set of demand matrices.
3.1 Challenges of applying ML to TE
While deep learning inference has the potential to accelerate the TE control loop, off-the-shelf models for deep learning do not apply as-is to the TE problem for the following reasons:
Graph connectivity and network flows. Modeling TE optimization naïvely, e.g., with the classic fully-connected deep neural networks (DNNs), ignores the connectivity and structure of the network topology. For instance, connectivity between nodes impacts link utilization and demand patterns. A family of neural networks, called graph neural networks (GNNs), is better suited for modeling graph-structured data. Off-the-shelf GNNs can model traditional graph attributes like nodes and edges, but they do not capture network flows, the focal point of TE optimization.
Constrained optimization. Unlike linear programming, neural networks are known to lack the ability to learn hard constraints [33], e.g., total link utilization should be within the link’s capacity. Consequently, the TE solution output by off-the-shelf neural networks is not directly actionable since the resulting link utilizations can violate capacity constraints.
High-dimensional solution space. Consider the path formulation widely adopted in practice (§2.1). For traffic demand between any source-destination pair in a topology of $n$ nodes, a TE scheme is required to split the demand into a handful of (e.g., four) flows routed over preconfigured paths, i.e., the output flow allocation is described by $O(n^2)$ split ratios (specifically, $4n(n-1)$ with four paths per ordered node pair). To put it in context, on a WAN topology with a thousand network sites, the solution space’s dimension becomes as high as nearly 4 million ($4 \times 1000 \times 999 = 3{,}996{,}000$), exposing ML-based TE schemes to the “curse of dimensionality” [28].
4 Teal: Learning-accelerated TE
In this section, we describe the design of Teal (Traffic Engineering Accelerated with Learning). The goal of Teal is to train a fast and scalable TE scheme with deep learning while achieving near-optimal traffic allocation on large-scale topologies, given any operator-specified objective (e.g., maximizing the total flow). The rationale behind using deep learning is to harness the massive parallelism and hardware acceleration unlocked by neural networks. Moreover, every component of Teal is designed to be parallelizable (fast on GPUs) and scalable (performant on large WANs).
4.1 Overview
We first describe the high-level workflow of Teal during deployment (Figure 3). On the arrival of new traffic demand matrices or changes in link capacities, Teal passes the updated demands and capacities into a novel graph neural network (GNN), which we call the FlowGNN (§4.2). We note that link failures are an extreme case of capacity change where the failed link’s capacity drops to zero. FlowGNN extracts features from demands into an encoded feature vector, known as the “embedding,” which captures the information necessary to determine flow allocations in the subsequent stages. FlowGNN is highly parameter-efficient and performs feature extraction in a distributed fashion.
In the path formulation of TE, each demand is allocated onto a handful of (e.g., four) preconfigured network paths (§2.1). Teal aggregates flow embeddings that belong to the same demand and sends them to a traffic allocation policy, encoded in a “policy network,” to produce the traffic split ratios across the path set. The shared policy network processes demands independently in parallel. Teal trains the policy network offline to coordinate flow allocations and collaboratively optimize a global TE objective (e.g., total flow). We achieve this by adapting a multi-agent reinforcement learning (RL) algorithm (§4.3) to TE. Since the policy network operates on individual demand pairs rather than the entire demand matrix, it is compact with respect to the parameters to learn and is oblivious to the scale of WAN topologies.
The traffic split ratios output by the policy network may exceed some link capacities and cause the network to drop traffic. They may also be suboptimal in achieving the desired TE objective. Hence, to reduce constraint violations and close the optimality gap, Teal augments the policy network with 1–3 fast iterations of a classical constrained optimization algorithm, the alternating direction method of multipliers (ADMM) (§4.4). In each iteration, ADMM starts from a potentially infeasible TE solution and takes a small step toward the feasible region, i.e., tweaking flow allocations to satisfy capacity constraints and optimize the TE objective. Each iteration of ADMM is inherently parallelizable. When warm-started with a reasonably good solution, such as the allocations output by Teal’s policy network, ADMM can significantly improve the solution quality within a few iterations.
For each WAN topology, Teal trains its “model” (the FlowGNN and the RL policy network) end to end to optimize an operator-provided TE objective. ADMM requires no offline training. The three major components of Teal (the FlowGNN, the multi-agent RL policy, and ADMM) are devised to be highly parallelizable with a distributed nature. Their computation time and TE performance scale well with the network size and demands (§6).
4.2 Feature extraction with FlowGNN
In light of the graph structure of a WAN topology, Teal leverages graph neural networks (GNNs) for feature extraction. GNNs are a family of neural networks invented to capture graph-structured data [47]. They have found applications in domains such as network planning [70], social networks [10, 40], and traffic prediction [32].
A GNN stores information in graph attributes, commonly in nodes, using a vector representation called the “embedding.” The information is exchanged on the GNN through “pooling” [16]: each node gathers the embeddings from adjacent nodes, performs a transformation (e.g., a fully-connected neural network) on the aggregated embeddings, and updates its own embedding with the result. GNNs are intrinsically distributed and parallelizable as the pooling is performed simultaneously on every node in each GNN layer. Multiple GNN layers can be stacked together to progressively disseminate information multiple hops away. Additionally, GNNs are parameter-efficient because each layer shares and learns the same transformation function for pooling. Meanwhile, the transformation works in a relatively low-dimensional space (the embedding space) and does not scale in proportion with the input graph size.
Nevertheless, the focal point of TE is the allocation of flows, with each flow traversing a chain of network links (edges) on a precomputed path (for a demand). TE is also concerned with the interference between flows as they contend for link capacities. Hence, we put a spotlight on flows and explicitly represent flow-centered entities (edges and paths) as the nodes of our TE-specific GNN, which we call the FlowGNN.
Figure 4 illustrates the construction of FlowGNN with a small example. At a high level, the FlowGNN alternates between a regular GNN layer that roughly captures the capacity constraints in TE, and a dense DNN layer that roughly captures the demand constraints [footnote 2]. The regular GNN layer is in the shape of a bipartite graph and is constructed as follows (following the formulation in §2.1). We create an “edge-node” for every edge in the topology and a “path-node” for every path in the path set of every demand, and connect an edge-node and a path-node with a link in the GNN if and only if the edge is on the path. We emphasize that each path-node is specific to a demand and thus does not portray (e.g., the crowdedness of) a physical network path. Instead, it represents a flow routed for that demand on that path. During the initialization of the GNN, we load the edge capacity into the embedding of each edge-node, and the demand volume into the embedding of each path-node. By doing so, we hope that the regular GNN layer might learn to respect capacity constraints through the pooling operations between each edge-node and its neighboring path-nodes. The dense DNN layer is added to coordinate the flows of the same demand. It collects the embeddings of all the path-nodes that carry the flows for that demand (i.e., one path-node per path in the demand’s path set) and outputs the same number of embeddings, which are immediately stored back to the respective path-nodes before being passed to the regular GNN in the next FlowGNN layer.
Footnote 2: This design mimics coordinate descent [61] on a two-dimensional space, which repeatedly optimizes each coordinate in turn.
As embeddings are updated through alternating regular GNN layers and dense DNN layers in FlowGNN, they eventually encode the extracted features required by the next stage for flow allocations. In particular, the final embedding in a path-node represents the learned features of the flow allocated for the corresponding demand on the corresponding path.
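The bipartite construction and one round of pooling can be sketched in plain Python as follows. This is only an illustration under simplifying assumptions: embeddings are scalars, the learned transformation is replaced by a fixed average of a node's own value and the mean of its neighbors, and the dense DNN layer is omitted. The function names and the toy topology are hypothetical.

```python
def build_flow_graph(paths):
    """Build the bipartite edge-node / path-node structure.

    paths: {demand: [path, ...]}, where path = list of edges.
    Returns (path_nodes, path_edges, edge_neighbors):
      path_nodes[i]     = (demand, path index) represented by path-node i
      path_edges[i]     = edges traversed by path-node i
      edge_neighbors[e] = indices of the path-nodes whose path contains edge e
    """
    path_nodes, path_edges, edge_neighbors = [], [], {}
    for demand, plist in paths.items():
        for k, path in enumerate(plist):
            idx = len(path_nodes)
            path_nodes.append((demand, k))
            path_edges.append(list(path))
            for edge in path:
                edge_neighbors.setdefault(edge, []).append(idx)
    return path_nodes, path_edges, edge_neighbors


def pooling_round(edge_emb, path_emb, path_edges, edge_neighbors):
    """One synchronous pooling round with a fixed (unlearned) mean aggregation."""
    def mean(xs):
        return sum(xs) / len(xs)

    new_edge = {e: 0.5 * edge_emb[e] + 0.5 * mean([path_emb[i] for i in nbrs])
                for e, nbrs in edge_neighbors.items()}
    new_path = [0.5 * path_emb[i] + 0.5 * mean([edge_emb[e] for e in path_edges[i]])
                for i in range(len(path_emb))]
    return new_edge, new_path


# Hypothetical toy instance with two demands.
paths = {("A", "C"): [[("A", "C")], [("A", "B"), ("B", "C")]],
         ("A", "B"): [[("A", "B")]]}
demands = {("A", "C"): 12.0, ("A", "B"): 3.0}
capacity = {("A", "C"): 5.0, ("A", "B"): 10.0, ("B", "C"): 10.0}

path_nodes, path_edges, edge_neighbors = build_flow_graph(paths)
# Initialization: capacities on edge-nodes, demand volumes on path-nodes.
edge_emb = dict(capacity)
path_emb = [demands[d] for d, _ in path_nodes]
new_edge, new_path = pooling_round(edge_emb, path_emb, path_edges, edge_neighbors)
```

Every edge-node and path-node update in `pooling_round` depends only on the previous round's embeddings, which is what makes the computation easy to parallelize.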
4.3 Traffic allocation with multiagent RL
Teal leverages FlowGNN to effectively extract features from flows and save them as embedding vectors. Next, Teal seeks to learn a traffic allocation policy encoded in a policy network that maps these flow embeddings to traffic splits. Precisely speaking, Teal trains its “model” that consists of the FlowGNN and the policy network end to end in practice.
Since each flow typically shares multiple links with other cross flows, the policy network prefers a global view of flows for coordination and allocation. However, concatenating all flow embeddings (or directly using a traffic demand matrix) would result in a prohibitively large input space. The output space is equally high-dimensional: on a WAN topology of a thousand nodes, the policy network is expected to output millions of traffic split ratios. Empirically, we find that a gigantic policy network makes training intractable and leads to a significant portion of demand being unsatisfied (§6.6).
To reduce the problem dimensions and the number of weights to learn, Teal turns to distributed processing of individual demands using a shared, significantly smaller policy network (Figure 5). Specifically, Teal trains a single policy network to simultaneously allocate each demand onto its preconfigured path set. E.g., when the path set has four paths, the policy network’s input only contains four flow embeddings from the FlowGNN, while its output is four real values that describe how the demand is split onto its prespecified paths. The policy network is oblivious to the WAN topology size, i.e., it does not grow with the increase in WAN sites or traffic demands. We note that the policy network does not process each flow individually as it would have neglected the fact that the flows of the same demand are fundamentally connected with each other for having the same source and destination. Distributed processing of demands makes the policy network substantially more compact and tractable to learn.
Given Teal’s end-to-end model (the parameter-efficient, inherently distributed FlowGNN and the compact, distributed policy network shared among demands), we ask: what learning algorithm is suitable for training this model, which generates local traffic splits for every demand in a distributed manner, such that a global TE objective is optimized? To answer this question, we discuss the candidate learning paradigms that are applicable for training the above model.
Supervised learning: In an offline setting, LP solvers (e.g., Gurobi [18]) can be used to compute the optimal allocation for each demand and provide ground-truth traffic splits to “imitate” using standard supervised learning. On large WANs, however, generating ground-truth labels for supervised learning [footnote 3] can take a prohibitively long time and incur overly high memory usage. E.g., Gurobi requires 5.52 hours to find the optimal allocation for a single demand matrix on a 1739-node topology while consuming up to 18.1 GB of memory.
Footnote 3: “Learning-to-route” [57, 56] considers supervised learning for predicting future traffic demands, so their ground-truth labels are demand matrices that already exist in the training data. However, there is no demand uncertainty in our TE formulation (see §2.1).
Direct loss minimization: Supervised learning minimizes the distance between the optimal splits of each demand and those output by ML as the loss. Alternatively, one could choose another differentiable loss function and directly minimize it with gradient descent [19, 52]. Unfortunately, common TE objectives are non-differentiable. E.g., computing the total (feasible) flow requires reconciling the flows that collectively exceed a link’s capacity, e.g., by proportionally dropping traffic from them. Consequently, the gradient of the total feasible flow with respect to the model’s parameters is zero, preventing the model from learning through gradient descent [footnote 4]. The common workaround is to design a differentiable “surrogate loss” to approximate the non-differentiable loss. For the objective of maximizing the total flow, we design a global surrogate loss that penalizes the total flow intended to be routed (as if on infinite-capacity edges) using the total overused link capacities (the formal definition is in §6.1). In Section 6, we train Teal’s model to optimize the global surrogate loss with direct loss minimization. It falls short of the multi-agent RL algorithm introduced below, perhaps because the surrogate loss serves only as an approximation of the desired loss, i.e., the TE objective.
Footnote 4: In this case, we simply say the loss function is “non-differentiable.”
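As a rough illustration of the surrogate idea for the total-flow objective, the sketch below subtracts the total overused link capacity from the total flow intended to be routed. The formal definition is given in §6.1 and is not reproduced here, so the exact form (an unweighted difference) is an assumption, as are the function name and the toy data.

```python
def surrogate_total_flow(capacity, demands, paths, splits):
    """Intended total flow minus total overused capacity (assumed, unweighted form).

    A training loss would be the negative of this value.
    """
    load = {}
    intended = 0.0
    for pair, volume in demands.items():
        for path, frac in zip(paths[pair], splits[pair]):
            flow = frac * volume
            intended += flow  # counted as if every edge had infinite capacity
            for edge in path:
                load[edge] = load.get(edge, 0.0) + flow
    # Penalty: by how much each edge's load exceeds its capacity.
    overuse = sum(max(0.0, load[e] - capacity[e]) for e in load)
    return intended - overuse


# Hypothetical toy instance: one demand A -> C of 12 units.
capacity = {("A", "B"): 10.0, ("B", "C"): 10.0, ("A", "C"): 5.0}
demands = {("A", "C"): 12.0}
paths = {("A", "C"): [[("A", "C")], [("A", "B"), ("B", "C")]]}

all_direct = surrogate_total_flow(capacity, demands, paths, {("A", "C"): [1.0, 0.0]})
balanced = surrogate_total_flow(capacity, demands, paths, {("A", "C"): [0.4, 0.6]})
```

Here `all_direct` is penalized for the 7 units that exceed the direct link's capacity, while `balanced` incurs no penalty, so gradient-based training on this quantity would favor the balanced split.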
Multi-agent RL: Teal chooses to model the allocation of each demand as an RL agent that seeks to cooperatively accomplish a common goal with other agents in the same environment; this is known as multi-agent RL [13, 58, 14]. Throughout the training, agents jointly update the shared policy network that outputs traffic splits and learn to collaboratively attain a higher global “reward,” i.e., an operator-specified TE objective such as the (non-differentiable) total flow. In practice, Teal allows using almost any TE objective as the reward (§6.4) and does not particularly require a differentiable function. After training, each agent processes its demand independently in parallel. Teal materializes these cooperative agents by adapting COMA [14], a multi-agent RL algorithm, for TE optimization. We design a variant of COMA, which we call COMA*, by leveraging the “one-step” characteristic of TE: a traffic allocation output by the TE scheme does not alter the future input demands (§2.1). This TE-specific insight helps COMA* to estimate the contribution of each agent to the global reward [footnote 5] for better coordination, as well as to reduce the variance in training [59]. The details of COMA* are described in Appendix A.
Footnote 5: A critical problem in RL known as “credit assignment” [6].
4.4 Constrained optimization with ADMM
TE is a constrained optimization problem in essence, but neural networks lack the capability to enforce the hard demand and capacity constraints in TE. Consequently, the traffic allocations directly predicted by Teal’s neural network model might be infeasible and cause link overutilization. They might also be suboptimal due to prediction errors [8] and carry less demand (as we will show in Figure 11(b)).
To improve suboptimal traffic allocations and satisfy more capacity constraints, Teal complements deep learning with several (e.g., 1–3) fast iterations of a classical optimization algorithm, ADMM (alternating direction method of multipliers) [3]. In each iteration, ADMM improves the solution by taking a small step in the direction of the objective function’s gradient while striving to satisfy constraints. ADMM is well-suited to Teal for two reasons. First, different from the optimization methods commonly used in LP solvers (simplex and interior-point methods), ADMM belongs to a family of optimization methods (“dual methods” [34]) that do not require a feasible (constraint-satisfying) solution to begin with. Instead, ADMM allows starting from an infeasible (constraint-violating) point, as Teal’s policy network might output, and proceeds toward the feasible region. Second, ADMM is a parallelizable optimization algorithm with a distributed nature. It decomposes a global optimization problem into small subproblems that can be solved in parallel. Therefore, unlike LP solvers, ADMM benefits significantly from GPU acceleration. Nonetheless, when starting from a random guess, ADMM still requires an excessive number of iterations to find an accurate solution, forfeiting its fast speed within each iteration. However, when ADMM is warm-started with a reasonably good solution, i.e., the traffic allocations output by Teal’s policy network, we find that one or two iterations of ADMM suffice and lead to a notable improvement with a negligible runtime overhead (§6.6).
Before incorporating ADMM into Teal, we must first transform TE optimization into a particular form required by ADMM. During the transformation, we ensure that the optimization objective is decomposable (along each edge or path) such that each iteration of ADMM may optimize the decomposed objectives in parallel and complete quickly on GPUs. The details of ADMM are in Appendix B.
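The effect of warm-starting can be illustrated with a generic, scaled-form ADMM on a deliberately simple problem. The sketch below is not Teal's TE formulation (that transformation is in Appendix B); it projects a target vector `a` onto a box, split as f(x) = 0.5*||x - a||^2 and g(z) = indicator of the box with the constraint x = z. The problem, the function name, and the parameter values are assumptions chosen for illustration.

```python
def admm_box(a, lo, hi, z0, rho=1.0, iters=2):
    """Scaled-form ADMM for: minimize 0.5*||x - a||^2 subject to lo <= x <= hi.

    z0 is the starting point and may itself violate the box constraints.
    Returns z, which is feasible after every iteration.
    """
    z = list(z0)
    u = [0.0] * len(a)  # scaled dual variable
    for _ in range(iters):
        # x-update: closed-form minimizer of f(x) + (rho/2)*(x - z + u)^2.
        x = [(ai + rho * (zi - ui)) / (1.0 + rho) for ai, zi, ui in zip(a, z, u)]
        # z-update: project x + u onto the box (independent per coordinate).
        z = [min(h, max(l, xi + ui)) for xi, ui, l, h in zip(x, u, lo, hi)]
        # Dual update: accumulate the remaining disagreement between x and z.
        u = [ui + xi - zi for ui, xi, zi in zip(u, x, z)]
    return z


a, lo, hi = [1.5, 0.3], [0.0, 0.0], [1.0, 1.0]
optimum = [1.0, 0.3]  # projection of a onto the box

cold = admm_box(a, lo, hi, z0=[0.0, 0.0], iters=2)
warm = admm_box(a, lo, hi, z0=[1.2, 0.35], iters=2)  # near-optimal, slightly infeasible
converged = admm_box(a, lo, hi, z0=[0.0, 0.0], iters=50)

cold_err = max(abs(c - o) for c, o in zip(cold, optimum))
warm_err = max(abs(w - o) for w, o in zip(warm, optimum))
```

After the same two iterations, the warm-started run ends closer to the optimum than the cold-started one, and every update is element-wise, which mirrors why a few ADMM iterations after a good initial guess are cheap on parallel hardware.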
5 Implementation of Teal
Implementing Teal. We implement Teal in PyTorch. The model assumes that a set of four paths is precomputed for each demand on the network topology. Most hyperparameters below (e.g., the number of neural network layers) are selected empirically by testing various values.


FlowGNN. The FlowGNN consists of 6 layers (where each layer contains a regular GNN and a DNN). In each five-minute interval, we initialize the node embeddings in the first FlowGNN layer as follows: each path-node is initialized with the volume of the demand for which the path is preconfigured, and each edge-node is initialized with the capacity of the corresponding link (the capacity is zero if the link is down during the interval). Starting from the one-dimensional embeddings in the first FlowGNN layer, we increment the embedding dimension by one in each subsequent layer, where the extended dimension is filled with the same values as in the initial embeddings (a technique introduced in previous work [42]). Therefore, each final embedding output by FlowGNN has 6 elements.

Multi-agent RL. The policy network in the multi-agent RL is a fully connected DNN with one hidden layer of 24 neurons. It has 24 input neurons that take in the four flow embeddings from FlowGNN for each demand at once (4 paths with 6 elements each). The output layer has 4 neurons and uses softmax to normalize the outputs into 4 valid split ratios.

ADMM. We apply one iteration of ADMM for topologies with fewer than 500 nodes and two iterations otherwise.
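The dimensions described above can be traced with a small, dependency-free sketch. It models only the shapes: the growth of a path embedding from 1 to 6 elements, the concatenation of four embeddings into 24 inputs, and a 24-24-4 network with a softmax output. The GNN/DNN updates inside FlowGNN are omitted, the ReLU activation is an assumption (the text does not name one), and the weights are random placeholders rather than trained values. The actual implementation is in PyTorch; plain Python is used here only to keep the sketch self-contained.

```python
import math
import random


def grow_embedding(initial_value, num_layers=6):
    """Return the shape of a path-node embedding after num_layers layers.

    Only the dimension growth is modeled: each layer after the first appends
    one element filled with the initial value. The GNN/DNN updates that would
    transform the existing elements are omitted.
    """
    embedding = [initial_value]            # layer 1: one-dimensional
    for _ in range(num_layers - 1):        # layers 2..num_layers
        embedding = embedding + [initial_value]
    return embedding


def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    shift = max(logits)                    # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_forward(path_embeddings, params):
    """Map four 6-element path embeddings to four split ratios."""
    x = [v for emb in path_embeddings for v in emb]   # 4 * 6 = 24 inputs
    hidden = [max(0.0, h) for h in linear(params["w1"], params["b1"], x)]  # ReLU (assumed)
    return softmax(linear(params["w2"], params["b2"], hidden))


rng = random.Random(0)
params = {  # random placeholder weights, not trained values
    "w1": [[rng.uniform(-0.1, 0.1) for _ in range(24)] for _ in range(24)],
    "b1": [0.0] * 24,
    "w2": [[rng.uniform(-0.1, 0.1) for _ in range(24)] for _ in range(4)],
    "b2": [0.0] * 4,
}
demand_volume = 2.0  # hypothetical demand shared by its four preconfigured paths
embeddings = [grow_embedding(demand_volume) for _ in range(4)]
split_ratios = policy_forward(embeddings, params)
```

Because the output passes through softmax, the four split ratios are positive and sum to one regardless of the weights, which is what makes them valid splits of a demand across its four paths.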
Training Teal. We train a separate Teal model per TE objective and per network topology. During training, Teal uses the Adam optimizer [25] for stochastic gradient descent with a learning rate of 0.0001. Each training run takes about 6–10 hours on large topologies.
6 Evaluation
In this section, we first describe our methodology in §6.1. In Section 6.2, we compare Teal with the state-of-the-art schemes and show that Teal simultaneously achieves substantial speedup and near-optimal allocation for TE. Upon link failures, Teal reacts promptly through real-time recomputation and retains its benefits (§6.3). Section 6.4 demonstrates Teal’s flexibility with TE objectives. Sections 6.5 and 6.6 aim to understand the sources of Teal’s improvement.
6.1 Methodology
Topologies. We use the publicly released WAN topology B4 [23], a real network topology from a global commercial cloud provider anonymized as Anon, and two topologies from the Internet Topology Zoo [26], UsCarrier and Kdl. To evaluate Teal on large networks, we adapt the AS-level topology of the Internet [4]. Table 2 summarizes the number of nodes and edges in the above topologies. We precompute four shortest paths between every source-destination node pair. For topologies that do not come with link capacities, we set the capacities such that the allocations generated by the best TE scheme in an experiment satisfy most of the traffic demand.
Traffic data. We collect traffic data over a 20-day period in 2022 on Anon, the production WAN operated by a global commercial cloud provider. The network traffic between node pairs in each five-minute interval is aggregated and considered as an input demand matrix to TE. We map the real data collected on Anon onto the other topologies in a random fashion while retaining the realism of the original data. For each node pair in another topology, we generate a time series of demands by randomly selecting a source-destination pair in Anon, along with its demands over a contiguous sequence of intervals. Each sequence starts at a random position.
We generate the training, validation, and test data sets similarly by sampling disjoint sequences from the data described above, where each sequence contains 700 intervals for training, 100 intervals for validation, and 200 intervals for testing.
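The sampling procedure above can be sketched as follows. This is one possible reading under stated assumptions: the helper names `sample_series` and `split_series` are hypothetical, the trace is a synthetic stand-in (each demand value equals its interval index), and cutting one 1000-interval sequence into consecutive 700/100/200 segments is our interpretation of "disjoint sequences", not a confirmed detail.

```python
import random

INTERVALS_PER_DAY = 24 * 60 // 5            # 288 five-minute intervals per day
TRACE_LEN = 20 * INTERVALS_PER_DAY          # 20-day trace
SPLITS = {"train": 700, "val": 100, "test": 200}
SEQ_LEN = sum(SPLITS.values())              # 1000 intervals per sampled sequence


def sample_series(anon_trace, target_pairs, seq_len, rng):
    """Assign each target node pair a contiguous slice from a random Anon pair.

    anon_trace maps an Anon (src, dst) pair to its list of per-interval demands.
    """
    anon_pairs = sorted(anon_trace)
    series = {}
    for pair in target_pairs:
        donor = rng.choice(anon_pairs)                                # random Anon pair
        start = rng.randrange(len(anon_trace[donor]) - seq_len + 1)   # random position
        series[pair] = anon_trace[donor][start:start + seq_len]
    return series


def split_series(series):
    """Cut each sampled series into disjoint train/val/test segments."""
    out = {name: {} for name in SPLITS}
    for pair, values in series.items():
        offset = 0
        for name, length in SPLITS.items():
            out[name][pair] = values[offset:offset + length]
            offset += length
    return out


rng = random.Random(0)
# Synthetic stand-in for the Anon trace: demand value == interval index.
anon_trace = {(s, d): [float(t) for t in range(TRACE_LEN)]
              for s in range(3) for d in range(3) if s != d}
target_pairs = [("a", "b"), ("b", "a")]     # hypothetical pairs in another topology
series = sample_series(anon_trace, target_pairs, SEQ_LEN, rng)
datasets = split_series(series)
```

Keeping each sampled slice contiguous preserves the temporal pattern of the donor pair's demand, while the random donor and random start position decorrelate the node pairs of the target topology from one another.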
Baselines. We compare Teal with the following baselines.

LPtop: Our traffic trace exhibits a heavy-tailed distribution: the top 10% of demands between node pairs constitute 88.4% of the total volume. To speed up the optimization, we implement LPtop, which uses the LP solver to allocate the top 10% of demands and pins the other 90% to their shortest paths. LPtop is a simple yet effective heuristic scheme that strikes a balance between allocation quality and computation time; it has been largely overlooked in prior work.

NCFlow: NCFlow [1] partitions the network topology into disjoint clusters and solves the subproblems of TE optimization within clusters in parallel. We adopt the same number of clusters as NCFlow used for UsCarrier and Kdl, and apply NCFlow’s default partitioning algorithm (“FMPartitioning”) on the other topologies. Other settings remain default as they are in NCFlow (e.g., stop condition).

TEAVAR: TEAVAR [2] is a risk-aware TE scheme that explicitly accounts for the likelihood of link failures. It allocates traffic to achieve higher link utilization while meeting an operator-specified threshold for availability. We compare Teal with the variant of TEAVAR that NCFlow adapted to maximize the total flow [1].
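The demand partition behind the LPtop baseline described above can be sketched as follows. The helper name `split_top_demands` and the toy demand matrix are hypothetical; the LP solve for the top demands and the shortest-path pinning for the rest are not shown.

```python
def split_top_demands(demands, top_fraction=0.10):
    """Partition demands into the largest top_fraction (by volume) and the rest.

    demands maps a (src, dst) pair to its volume. Returns (top, rest, share),
    where share is the fraction of total volume carried by the top demands.
    """
    ranked = sorted(demands, key=demands.get, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    top = {pair: demands[pair] for pair in ranked[:k]}
    rest = {pair: demands[pair] for pair in ranked[k:]}
    share = sum(top.values()) / sum(demands.values())
    return top, rest, share


# Hypothetical heavy-tailed demand matrix: 2 large and 18 small demands.
demands = {("s", i): 100.0 for i in range(2)}
demands.update({("s", i): 1.0 for i in range(2, 20)})

top, rest, share = split_top_demands(demands)
# `top` would be handed to the LP solver; `rest` would be pinned to shortest paths.
```

In this toy matrix, the top 10% of demands (2 of 20) carry 200 of 218 units of volume, which mirrors the kind of skew that makes the heuristic effective. Because the membership of `top` can change between intervals, the LP model generally has to be rebuilt for each demand matrix.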
Variants of Teal. We assess the impact of several design choices in Teal with microbenchmarks.

Teal w/o ADMM: A variant of Teal without running any iterations of ADMM.

Teal w/ direct loss: In this variant, we use the “direct loss minimization” algorithm described in §4.3 to train Teal’s model. The surrogate loss that approximates the (nondifferentiable) total flow is defined as the (potentially infeasible) total flow intended to be routed, penalized by the total link overutilization. Using the notation from §2.1, the surrogate loss is formally expressed as

Tealglobal w/ direct loss: Same as “Teal w/ direct loss” except that this variant uses the “gigantic policy network” (described in §4.3) that processes demands altogether.
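The verbal definition of the surrogate loss used by the "direct loss" variants (intended total flow, penalized by total link overutilization) can be sketched as follows. The function name, the data layout, and the unit penalty weight are assumptions for illustration; the notation of §2.1 is not reproduced here, and a trainable version would operate on differentiable tensors rather than Python floats.

```python
def surrogate_loss(path_flows, path_edges, capacity):
    """Negative of (intended total flow - total link overutilization).

    path_flows[p] is the flow the model intends to send on path p,
    path_edges[p] lists the links that path p traverses, and
    capacity[e] is the capacity of link e.
    """
    load = {e: 0.0 for e in capacity}
    for p, flow in path_flows.items():
        for e in path_edges[p]:
            load[e] += flow
    intended = sum(path_flows.values())
    overuse = sum(max(0.0, load[e] - capacity[e]) for e in capacity)
    return -(intended - overuse)   # minimizing the loss maximizes the penalized flow


# Hypothetical example: two paths sharing link "b".
path_flows = {"p0": 4.0, "p1": 3.0}
path_edges = {"p0": ["a", "b"], "p1": ["b"]}
capacity = {"a": 5.0, "b": 5.0}
loss = surrogate_loss(path_flows, path_edges, capacity)
```

Here link "b" carries 7 units against a capacity of 5, so the intended flow of 7 is penalized by an overutilization of 2. Unlike the true (feasible) total flow, this quantity changes smoothly almost everywhere as the intended flows change, which is what makes it usable as a training loss.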
Figure 6: Teal is several orders of magnitude faster than the state-of-the-art schemes while realizing near-optimal flow allocation. Dots show mean values and error bars show standard deviations.
Objectives. By default, we optimize the total (feasible) flow, a TE objective commonly adopted in prior work [1, 20, 23]. In §6.4, we also consider two other TE objectives from prior work: 1) minimizing the max link utilization [12, 57], and 2) maximizing the total flow with delay penalties [9], formally defined as
where denotes the delay penalty on link .
Metrics. We evaluate TE schemes on the following metrics.

Computation time: We measure the wall-clock time that a TE scheme takes to compute the traffic allocation for each demand matrix. Teal runs on a single GPU (Nvidia Titan RTX). The other TE schemes do not benefit from GPU acceleration (due to their limitations detailed in §2.2), so we provision a sufficient number of CPU cores (typically 16) and verify that their runtime improves little with additional CPU cores.

Satisfied demand: We measure the percentage of the total demand that a TE scheme satisfies with its allocation in an online setting: until the TE scheme finishes computing the allocation for a new demand matrix, the most recent stale allocation remains in use. This setting is more realistic for production TE systems and has also been adopted previously [1]. Note that the satisfied demand simply normalizes the total flow by the total demand, and is thus suitable for evaluating schemes that optimize the total flow (our default objective unless stated otherwise, e.g., in §6.4).

Offline satisfied demand: We also measure the “offline satisfied demand” (in §6.5), which is the same as satisfied demand except that it assumes an offline setting: we ignore the delay caused by the TE scheme’s computation and assume the computed allocation is instantly deployed from the beginning of each interval. This metric separates out the influence of computation delay and focuses on the solution quality per se. Admittedly, it captures only the hypothetical performance of a scheme, but it is widely used in prior work [31, 1].
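The difference between the online and offline metrics above can be illustrated with a simplified model of a single interval. The time-weighted average, the function names, and all numeric values below are assumptions made for illustration; they are not measurements from this evaluation, and the actual accounting may differ (e.g., when a computation spans several intervals).

```python
def satisfied_demand(total_flow, total_demand):
    """Fraction of the total demand covered by the allocated (feasible) flow."""
    return total_flow / total_demand


def online_satisfied_demand(fresh, stale, compute_time, interval=300.0):
    """Time-weighted satisfied demand within one interval (simplified model).

    fresh: satisfied demand once the new allocation is deployed.
    stale: satisfied demand while the previous allocation is still in use.
    The stale allocation is assumed to stay active for compute_time seconds,
    capped at the interval length.
    """
    stale_share = min(compute_time, interval) / interval
    return stale_share * stale + (1.0 - stale_share) * fresh


# Hypothetical values: both schemes find equally good allocations (0.90),
# and stale routes satisfy 0.80 of the new demand.
offline = 0.90
fast = online_satisfied_demand(fresh=0.90, stale=0.80, compute_time=1.0)
slow = online_satisfied_demand(fresh=0.90, stale=0.80, compute_time=900.0)
```

Under this model, a scheme that finishes in about a second is almost indistinguishable from its offline result, whereas a scheme whose computation exceeds the five-minute interval never deploys a fresh allocation in time and is scored entirely on stale routes.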
6.2 Teal vs. the state of the art
Figure 6 compares Teal with the state-of-the-art schemes regarding the computation time and satisfied demand on four topologies. Overall, Teal is several orders of magnitude faster than the baselines across different sizes of networks, achieving up to a 59× speedup relative to the second-fastest scheme (NCFlow on large topologies). Meanwhile, Teal realizes near-optimal flow allocation that is always within 3.7% of the largest satisfiable demand (achieved by LPall on small topologies).
Small topologies. All the TE schemes are capable of computing a traffic allocation within seconds on small topologies (Figure 5(a) and 5(b)). Therefore, none of them is negatively impacted by its computation time (footnote 6: we verified this claim by looking at the “offline satisfied demand,” but the results are indistinguishable from those in the figures and thus omitted). Figure 5(a) shows that the average satisfied demand of Teal is within 2.8% of LPall and 2.1% higher than NCFlow. (Note that the computation time in Figure 5(a) is negligible and irrelevant, since every scheme finishes within seconds.) In Figure 5(b), Teal’s allocation is also near-optimal, with a difference of only 3.7% from LPall while being 19.7% greater than NCFlow. In particular, we note that NCFlow only meets 72.9% of the total demand since it is essentially an approximation algorithm designed for large topologies. We use the two small topologies as a sanity check to confirm that Teal’s fast computation does not harm its solution quality.
Kdl. On the large topology Kdl with 754 nodes and 1790 edges (Figure 5(c)), Teal requires only 1.1 seconds to compute each allocation. This is 15× faster than NCFlow, 27× faster than LPtop, and 883× faster than LPall. Meanwhile, Teal also satisfies almost the same fraction of demand as the best-performing scheme (90.5% vs. 90.8%). By comparison, LPall requires over 900 seconds of computation, well exceeding the time budget of five minutes. Consequently, LPall is constantly forced to reuse the stale routes computed two or three time windows ago, leading to only 79% of demand being satisfied, despite its ability to compute optimal allocations when allowed unlimited time to run (which we report in §6.5). Both LPtop and NCFlow take less than 30 seconds to finish an allocation, but NCFlow meets only 68.9% of demand, whereas LPtop, the previously overlooked heuristic, satisfies notably more (90.8%).
ASN. To assess each scheme’s scalability, we test them on the large ASN network, which has 1739 nodes and 8558 edges. Figure 5(d) shows even larger speedups from Teal relative to the baselines. The average computation time of Teal (1.75 s) is 59× faster than NCFlow and 183× faster than LPtop (which exceeds the five-minute limit). LPall is practically infeasible to run on ASN because it takes a prohibitively long time to finish, so we omit its results from the figure. Not only does Teal achieve substantial speedup, but it also satisfies the most demand on average. Specifically, it satisfies 6.6% more demand than LPtop and 19.8% more than NCFlow. Once again, the overlooked approach LPtop surpasses NCFlow, which is designed for large topologies like ASN. The superior performance of Teal demonstrates that it is fast and scalable on large networks of thousands of nodes, while also having learned to produce high-quality allocations.
Figure 7 zooms in on the performance as CDF curves. First, we observe in Figure 6(a) that Teal’s computation time is extremely stable across the demand matrices we tested, barely changing between 1.75–1.76 seconds at all percentiles. The consistent behavior stems from the fact that Teal only performs one forward pass on its neural networks, followed by precisely two iterations of ADMM (on this topology). The total amount of computation (e.g., measured by floating-point operations) is completely oblivious to the input demand matrix, and Teal does not iterate until an accuracy target is met either. By contrast, although NCFlow requires 94.1 seconds in the median case, its computation time grows to 231.5 seconds in the worst case, or 2.5× the median. This is because NCFlow runs multiple iterations of the entire optimization, with each iteration operating on the remaining demands and residual capacities. It continues until a prespecified threshold is met, e.g., the current iteration improves by less than 5% compared with the last one. The LP solver (e.g., Gurobi) invoked by NCFlow has a similar stop condition (with a different definition of threshold). Given that NCFlow divides the ASN network into approximately 40 clusters (as preferred by the algorithm), if any of these clusters requires the LP solver to run longer than the others, solving the subproblem in that cluster becomes the runtime bottleneck.
Second, Figure 6(b) shows that Teal outperforms LPtop and NCFlow on satisfied demand at almost all percentiles. In the median case, Teal allocates 20.62% more flow than NCFlow and 8.4% more than LPtop. At the 90th percentile, Teal is 16.7% higher than NCFlow and 1.6% higher than LPtop. We believe that the robust performance of Teal is appealing to production TE systems.
6.3 Reacting to link failures
Teal solves TE optimization within seconds even on topologies of thousands of nodes. The real-time computation enables Teal to promptly react to link failures once they are detected in the network, since Teal can quickly recompute the allocation on the “new” topology (with failed links having zero capacities). In practice, large-scale link failures are extremely rare, so we only evaluate one or two link failures. Due to its heavy computational overhead, TEAVAR is unable to run on any topology listed in Table 2 except for the smallest one, the 12-node B4 network [23]. Therefore, we only compare Teal with TEAVAR and NCFlow on B4.
On B4, all schemes are capable of finishing allocations quickly: NCFlow and TEAVAR require seconds whereas Teal only takes microseconds. Figure 8 shows their allocation performance under no link failures and immediately after one or two link failures. As the number of failures increases, all schemes meet less demand as anticipated, but Teal consistently outperforms TEAVAR by 2.4–4.6% while being on par with NCFlow. Conservatively preparing for possible link failures forces TEAVAR to accept lower link utilization in exchange for higher availability. By contrast, Teal echoes NCFlow’s argument that the performance loss during link failures can be compensated by fast recomputation. However, on the ASN network, we have shown that Teal only requires 1.76 seconds for recomputation whereas NCFlow needs up to 231.5 seconds, during which the network might experience a severe traffic drop. In other words, we extend NCFlow’s argument from fast recomputation to real-time computation, by showing that Teal effectively reduces the period impacted by link failures down to seconds without sacrificing the flow allocation performance.
6.4 Teal under different objectives
We train Teal under different TE objectives to evaluate its applicability. Section 4.3 explains why Teal is flexible with respect to the TE objective, compared with some ML algorithms that require a differentiable loss function. LP solvers are compatible with a wide range of linear objectives, while adapting other handcrafted algorithms such as NCFlow to use a different TE objective is nontrivial. Therefore, we are only able to compare Teal with LPall and LPtop on two different objectives that are used in prior work: 1) minimizing the max link utilization [12, 57] and 2) maximizing the total flow penalized by latencies. We exclude ADMM from Teal for these objectives because we did not decompose them as we did for the total flow (Appendix B); however, such a decomposition is achievable since ADMM can optimize any convex function, much as an LP solver handles any linear objective.
Max link utilization (MLU). Figure 9 shows that all three schemes achieve comparable MLUs; their differences are not statistically significant after taking variance into account. In terms of computation time, Teal requires only 1.1–1.7 seconds to find a solution, whereas LPtop and LPall take approximately 20 seconds on Kdl and about 40–60 seconds on ASN. We observe two interesting phenomena. First, both LPtop and LPall complete faster than when optimizing the total flow. We do not fully understand why Gurobi is faster at minimizing MLU, but both schemes are still 17–36× slower than Teal. Second, LPall runs faster than LPtop. This is because LPtop incurs an additional model-building time for each demand matrix (Table 2), given that the top 10% of demands may vary over time and thus require rebuilding the model in Gurobi for each interval. The model rebuilding adds a small overhead that slows down LPtop in this case.
Latency-penalized total flow. All schemes are able to optimize this objective within a reasonable time except for LPall on the ASN network. On Kdl, Teal is more than two orders of magnitude (718×) faster than LPall. Meanwhile, compared with LPtop, Teal also speeds up the TE optimization on both topologies by 26–172×. In terms of latency-penalized total flow, Teal is comparable to LPtop on Kdl and slightly better on ASN (0.83 vs. 0.76).
6.5 Offline TE performance
To understand how much of Teal’s performance advantage stems from its fast and scalable computation, we evaluate all schemes under the offline setting described in §6.1. LPall is practically infeasible to run on ASN due to its unreasonably long runtime and memory errors. Figure 11 shows the results for the remaining cases.
On Kdl, LPall requires over 900 seconds to process each individual demand matrix, but its output allocation is optimal and may serve as a reference. Teal falls short of the optimal allocation by 4.9% in terms of the “offline satisfied demand,” but remains within 0.7% of the best feasible scheme (LPtop) and significantly outperforms NCFlow by 21.4%. On ASN, Teal and LPtop achieve comparable offline satisfied demand, both being 19% higher than NCFlow. The results suggest that even when we do not account for the computation delay, Teal’s flow allocation is still near-optimal.
6.6 Interpreting Teal
Next, we assess the contribution of several design decisions in Teal with microbenchmarks. The “gigantic policy network” (§6.1) that processes all demands together is not tractable to train on large topologies, so we only evaluate it on Anon and find that its performance after convergence falls significantly short of other schemes (Figure 11(a)).
Compared with “Teal w/ direct loss” (§6.1), Teal allocates 2.8–3.3% more demand in the median case. Although less prominent, the improvement is still critical to a WAN operator. Additionally, Teal’s multiagent RL policy may optimize a flexible array of TE objectives (§6.4), while it is nontrivial to find a good (differentiable) surrogate loss for objectives other than the total flow.
Compared with “Teal w/o ADMM,” Teal achieves a marginal improvement of 1.3–1.7% using ADMM. That said, the improvement is consistent across all percentiles because running ADMM strictly reduces constraint violations and improves the solution quality without noticeable runtime overhead. We believe ADMM’s explainable behavior and strictly better performance are desirable to TE operators, so we include ADMM in Teal to squeeze out this marginal improvement on WAN assets worth billions of dollars.
7 Related Work
TE has been an integral part of service and cloud provider networks. Network operators have leveraged TE to maximize network utilization, guarantee fairness to flows, and prevent link overutilization. While ISPs used switch-native protocols (e.g., MPLS, OSPF) to engineer traffic in their networks [62, tseospf], cloud providers implemented centralized software-defined TE systems to explicitly optimize for desirable network characteristics like low latency, high throughput, and failure resilience [20, 23, 21, 35, 29, 66, 2]. In this section, we place Teal in the context of related work on WAN TE.
Intra-WAN traffic engineering. In the last decade, large cloud providers have deployed SDN-based centralized TE in their planet-scale WANs to allocate traffic between datacenters [23, 20, 29]. Centralized TE systems formulate the problem of allocating traffic in the WAN as an optimization problem and periodically solve the optimization to compute flow allocations in the network. Due to the increase in the scale of WANs and traffic matrices, the time needed to solve the optimization problem has become a bottleneck in the traffic engineering control loop. Researchers have proposed techniques that solve the TE optimization on smaller subsets of the problem and recombine the solutions to compute traffic allocations for the global graph [1, 43]. Teal tackles the scalability challenges faced by modern intra-WAN TE controllers using a learning-based approach.
Inter-WAN traffic engineering. Cloud providers engineer the traffic at the edge of their networks by allocating demands on the links between the cloud and ISPs on the Internet. Recent work has shown the role of engineering inter-WAN traffic for both performance improvements and cost reduction [49, 48]. In contrast, Teal focuses on intra-domain TE.
Cross-layer WAN TE. Researchers have leveraged centralized TE in ways that cut through different layers of the networking stack. For instance, recent work has improved network throughput by adapting the capacities of physical links in the TE control loop based on the optical signal quality [51]. Researchers have engineered both WAN traffic and topologies in response to fiber cuts [69] and leveraged TE for capacity provisioning [50].
Networking research has leveraged learning-based techniques for a variety of classical networking problems such as video-rate adaptation [64, 38], congestion control [24, 65], and traffic demand prediction [30, 37].
ML for traffic engineering. Recently, researchers have begun to leverage ML to allocate traffic in WANs [57, 56]. Their approach focuses on learning to route under traffic uncertainty. However, production WAN TE relies on separate components like bandwidth brokering to influence the traffic matrix for the next TE timestep. Therefore, Teal does not operate under demand uncertainty and receives the traffic matrix as part of its input. Other learning-based approaches to TE [15, 68, 36, 41, 63] aim to match the performance of the LP for specific objective functions like maximum throughput. Our work shows that learning-based approaches can achieve close-to-optimal performance with low computation time to tackle the increasing scale of TE optimization.
8 Conclusion
In this work, we claim that deep learning is an effective tool for scaling intra-domain TE systems to large WAN topologies. We develop Teal, a deep learning framework that combines graph neural networks, multi-agent RL, and ADMM to allocate traffic in WANs. Teal computes near-optimal traffic allocations with a 59× speedup over state-of-the-art TE systems on a WAN topology of over 1,500 nodes.
References
 [1] (2021) Contracting Widearea Network Topologies to Solve Flow Problems Quickly. In Proceedings of USENIX NSDI, pp. 175–200. Cited by: Appendix C, §1, §1, §1, §2.1, §2.1, §2.2, §2.2, §2.2, 3rd item, 4th item, 2nd item, 3rd item, §6.1, §7.
 [2] (2019) TEAVAR: Striking the Right UtilizationAvailability Balance in WAN Traffic Engineering. External Links: Link Cited by: §1, 4th item, §7.

 [3] (2011) Distributed Optimization and Statistical Learning Via the Alternating Direction Method of Multipliers. Foundations and Trends® in Machine Learning 3 (1), pp. 1–122. Cited by: §4.4.
 [4] (2022) The CAIDA AS Relationships Dataset. External Links: Link Cited by: §6.1.
 [5] (2016) An Analysis of Deep Neural Network Models for Practical Applications. arXiv preprint arXiv:1605.07678. Cited by: §3.
 [6] (2003) All Learning Is Local: MultiAgent Learning in Global Reward Games. Advances in Neural Information Processing Systems 16. Cited by: footnote 5.
 [7] (2014) Diannao: a SmallFootprint HighThroughput Accelerator for Ubiquitous MachineLearning. ACM SIGARCH Computer Architecture News 42 (1), pp. 269–284. Cited by: §3.

 [8] (2018) Inference Suboptimality in Variational Autoencoders. pp. 1078–1086. Cited by: §4.4.
 [9] (2001) MATE: MPLS Adaptive Traffic Engineering. pp. 1300–1309 vol.3. External Links: Document Cited by: §6.1.
 [10] (2019) Graph Neural Networks for Social Recommendation. pp. 417–426. Cited by: §4.2.
 [11] (2000) Approximating Fractional Multicommodity Flow Independent of the Number of Commodities. SIAM Journal on Discrete Mathematics 13 (4), pp. 505–520. Cited by: §2.2.
 [12] (2000) Approximating Fractional Multicommodity Flow Independent of the Number of Commodities. SIAM Journal on Discrete Mathematics 13 (4), pp. 505–520. Cited by: §6.1, §6.4.
 [13] (2016) Learning to Communicate with Deep MultiAgent Reinforcement Learning. Advances in Neural Information Processing Systems 29. Cited by: §4.3.

 [14] (2018) Counterfactual MultiAgent Policy Gradients. In Proceedings of AAAI conference on artificial intelligence, Vol. 32. Cited by: §4.3.
 [15] (2021) Distributed and Adaptive Traffic Engineering with Deep Reinforcement Learning. In Proceedings of IEEE/ACM International Symposium on Quality of Service (IWQOS), pp. 1–10. External Links: Document Cited by: §7.
 [16] (2017) Neural Message Passing for Quantum Chemistry. In International Conference on Machine Learning, pp. 1263–1272. Cited by: §4.2.

 [17] (2022) Cloud Tensor Processing Units (TPUs). External Links: Link Cited by: §3.
 [18] (2022) Gurobi Optimizer Reference Manual. External Links: Link Cited by: §2.2, §4.3, 1st item.
 [19] (2010) Direct Loss Minimization for Structured Prediction. Advances in Neural Information Processing Systems 23. Cited by: §4.3.
 [20] (201308) Achieving High Utilization with Softwaredriven WAN. ACM SIGCOMM Computer Communication Review 43 (4), pp. 15–26. External Links: ISSN 01464833 Cited by: §1, §1, §1, §2.1, §2.1, §6.1, §7, §7.
 [21] (2018) B4 and After: Managing Hierarchy, Partitioning, and Asymmetry for Availability and Scale in Google’s SoftwareDefined WAN. External Links: Link Cited by: §1, §7.
 [22] (2022) CPLEX Optimizer. External Links: Link Cited by: §2.2.
 [23] (2013) B4: Experience with A GloballyDeployed Software Defined WAN. ACM SIGCOMM Computer Communication Review 43 (4), pp. 3–14. Cited by: §1, §1, §2.1, §6.1, §6.1, §6.3, §7, §7.
 [24] (2019) A Deep Reinforcement Learning Perspective on Internet Congestion Control. pp. 3050–3059. Cited by: §7.
 [25] (2014) Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980. Cited by: §5.
 [26] (2011) The Internet Topology Zoo. IEEE Journal on Selected Areas in Communications 29 (9), pp. 1765–1775. Cited by: §6.1.
 [27] (1999) ActorCritic Algorithms. Advances in Neural Information Processing Systems 12. Cited by: Appendix A.
 [28] (2000) The Curse of Dimensionality. pp. 4–8. Cited by: §3.1.
 [29] (202204) Decentralized Cloud WideArea Network Traffic Engineering with BLASTSHIELD. Renton, WA, pp. 325–338. External Links: ISBN 9781939133274, Link Cited by: §1, §2.1, §7, §7.
 [30] (2019) Cloud Resource Demand Prediction Using Differential Evolution Based Learning. pp. 1–5. Cited by: §7.
 [31] (2016) Kulfi: Robust Traffic Engineering Using Semioblivious Routing. arXiv preprint arXiv:1603.01203. Cited by: 3rd item.
 [32] (2020) Traffic Prediction with Advanced Graph Neural Networks. External Links: Link Cited by: §4.2.
 [33] (2017) Enforcing Output Constraints Via SGD: A Step Towards Neural Lagrangian Relaxation. Cited by: §3.1.
 [34] (1954) The Dual Method of Solving the Linear Programming Problem. Naval Research Logistics Quarterly 1 (1), pp. 36–47. Cited by: §4.4.
 [35] (2014) Traffic Engineering with Forward Fault Correction. pp. 527–538. External Links: Link, Document Cited by: §1, §7.
 [36] (2020) Automated Traffic Engineering in SDWAN: Beyond Reinforcement Learning. In IEEE INFOCOM WKSHPS Workshops, Vol. , pp. 430–435. External Links: Document Cited by: §7.
 [37] (2020) Dynamic Graph Neural Network for Traffic Forecasting in Wide Area Networks. pp. 1–10. Cited by: §7.
 [38] (2017) Neural Adaptive Video Streaming with Pensieve. pp. 197–210. Cited by: §7.
 [39] (2017) SDN in WideArea Networks: a Survey. pp. 37–42. Cited by: §3, §5.

 [40] (2020) SocialSTGCNN: a Social SpatioTemporal Graph Convolutional Neural Network for Human Trajectory Prediction. pp. 14424–14432. Cited by: §4.2.
 [41] (2019) DeepRoute on Chameleon: Experimenting with Largescale Reinforcement Learning and SDN on Chameleon Testbed. In Proceedings of IEEE International Conference on Network Protocols (ICNP), pp. 1–2. External Links: Document Cited by: §7.
 [42] (2020) Solving Mixed Integer Programs Using Neural Networks. arXiv preprint arXiv:2012.13349. Cited by: 1st item.
 [43] (2021) Solving LargeScale Granular Resource Allocation Problems Efficiently with POP. In Proceedings of ACM SOSP, pp. 521–537. Cited by: §7.
 [44] (200001) The (Dantzig) Simplex Method for Linear Programming. Computing in Science and Engg. 2 (1), pp. 29–31. External Links: ISSN 15219615, Link, Document Cited by: §2.2.
 [45] (202008) Parallelism in LP and MIP. Note: https://cdn.gurobi.com/wpcontent/uploads/2020/08/HowtoExploitParallelisminLinearandMixedIntegerProgramming.pdf Cited by: §2.2.
 [46] (2019) PyTorch: An Imperative Style, HighPerformance Deep Learning Library. In Advances in Neural Information Processing Systems, pp. 8024–8035. Cited by: §3.
 [47] (2021) A Gentle Introduction to Graph Neural Networks. Distill. Note: https://distill.pub/2021/gnnintro External Links: Document Cited by: §4.2.
 [48] (2017) Engineering Egress with Edge Fabric: Steering Oceans of Content to the World. pp. 418–431. Cited by: §7.
 [49] (2021) CostEffective Cloud Edge Traffic Engineering with Cascara. pp. 201–216. Cited by: §7.
 [50] (2021) CostEffective Capacity Provisioning in Wide Area Networks with Shoofly. In Proceedings of ACM SIGCOMM, New York, NY, USA, pp. 534–546. External Links: ISBN 9781450383837, Link, Document Cited by: §7.
 [51] (2018) RADWAN: Rate Adaptive Wide Area Network. New York, NY, USA. External Links: ISBN 9781450355674, Link, Document Cited by: §3, §7.
 [52] (2016) Training Deep Neural Networks Via Direct Loss Minimization. pp. 2169–2177. Cited by: §4.3.
 [53] (1999) Policy Gradient Methods for Reinforcement Learning with Function Approximation. Advances in Neural Information Processing Systems 12. Cited by: Appendix A.

 [54] (2022) An EndtoEnd Open Source Machine Learning Platform. External Links: Link Cited by: §3.
 [55] (2022) Open Neural Network Exchange. External Links: Link Cited by: §3.
 [56] (2017) Learning to route with deep RL. In NIPS Deep Reinforcement Learning Symposium, Cited by: §7, footnote 3.
 [57] (2017) Learning to Route. In Proceedings of ACM HotNets, pp. 185–191. Cited by: §1, §2.1, §6.1, §6.4, §7, footnote 1, footnote 3.
 [58] (2019) Grandmaster Level in StarCraft II Using MultiAgent Reinforcement Learning. Nature 575 (7782), pp. 350–354. Cited by: §4.3.
 [59] (2013) The Optimal Reward Baseline for GradientBased Reinforcement Learning. arXiv preprint arXiv:1301.2315. Cited by: §4.3.
 [60] (2002) Optimal Payoff Functions for Members of Collectives. In Modeling Complexity in Economic and Social Systems, pp. 355–369. Cited by: Appendix A.
 [61] (2015) Coordinate Descent Algorithms. Mathematical Programming 151 (1), pp. 3–34. Cited by: footnote 2.
 [62] (200003) Traffic Engineering with MPLS in the Internet. IEEE Network 14 (2), pp. 28–33. External Links: ISSN 08908044, Link, Document Cited by: §7.
 [63] (2018) ExperienceDriven Networking: a Deep Reinforcement Learning based Approach. CoRR abs/1801.05757. External Links: Link, 1801.05757 Cited by: §7.
 [64] (202002) Learning in Situ: A Randomized Experiment in Video Streaming. In Proceedings of USENIX NSDI, Santa Clara, CA, pp. 495–511. External Links: ISBN 9781939133137, Link Cited by: §7.
 [65] (201807) Pantheon: the Training Ground for Internet CongestionControl Research. In Proceedings of USENIX ATC, Boston, MA, pp. 731–743. External Links: ISBN 9781939133014, Link Cited by: §7.
 [66] (2017) Taking the Edge Off with Espresso: Scale, Reliability and Programmability for Global Internet Peering. pp. 432–445. Cited by: §7.
 [67] (2015) Optimizing FPGABased Accelerator Design for Deep Convolutional Neural Networks. pp. 161–170. Cited by: §3.
 [68] (2020) CFRRL: Traffic Engineering With Reinforcement Learning in SDN. IEEE Journal on Selected Areas in Communications 38 (10), pp. 2249–2259. External Links: Document Cited by: §7.
 [69] (2021) ARROW: RestorationAware Traffic Engineering. New York, NY, USA, pp. 560–579. External Links: ISBN 9781450383837, Link, Document Cited by: §7.
 [70] (2021) Network Planning with Deep Reinforcement Learning. In Proceedings of ACM SIGCOMM, pp. 258–271. Cited by: §4.2.
Appendix A COMA details
At a high level, COMA builds on the idea of “counterfactual reasoning” [60], deducing the answer to a “What if…” question: at the moment every agent is about to make a decision (action), what would be the difference in global reward if only one agent’s action changed while the other agents’ actions remained fixed? For example, in the context of TE that aims to maximize the total flow, our variant of COMA reasons as follows: compared with the current traffic allocations, how much would the total flow differ if we reallocated the flows of only one demand while keeping the allocations of the other demands unchanged? This performance difference measures the contribution of an agent’s action to the overall reward. Specifically, the reward difference defines the “advantage” of the current joint action over the counterfactual baseline (where only one agent tweaks its action). The “advantage” is heavily used in this family of RL algorithms (known as “actor-critic” methods [27]) to effectively reduce the variance in training.

At each time step when a new demand matrix arrives or any link capacity varies (e.g., due to a link failure), Teal passes the flow embeddings (stored in pathnodes of the FlowGNN) for the same demand to the RL agent designated to manage the demand. We define these flow embeddings as the state $s_i$ observed locally by agent $i$. Presented only with the partial view captured by $s_i$, agent $i$ makes an action $a_i$, a vector of split ratios that describe the allocation of the agent’s managed demand. Let $\pi_\theta$ denote the policy network parameterized by $\theta$ and shared by all agents. Learning the weights $\theta$ with gradient descent is known as “policy gradient” [53], which typically requires a stochastic form $\pi_\theta(a_i \mid s_i)$ that represents the probability of outputting $a_i$ given $s_i$. Since allocations are deterministic in TE, a common way to convert $\pi_\theta$ to a stochastic policy is to make it output the mean and variance of a Gaussian distribution to facilitate training, during which actions are sampled from the Gaussian distribution, $a_i \sim \pi_\theta(\cdot \mid s_i)$. After deployment, we use the mean of the Gaussian as the action.

We use $\mathbf{s}$ to denote the central state formed by all local states $s_i$, and $\mathbf{a}$ to denote the joint action formed by all local actions $a_i$. A reward $R(\mathbf{s}, \mathbf{a})$, such as the total flow, is available after all agents have made their decisions. To compute the “advantage” $A_i(\mathbf{s}, \mathbf{a})$ when only agent $i$ alters its action, COMA proposes to estimate the expected return, namely a discounted sum of future rewards, obtained by taking the joint action $\mathbf{a}$ in central state $\mathbf{s}$. By comparison, our variant computes the expected return by leveraging the “one-step” nature of TE: an action (flow allocation) in TE does not impact the future states (traffic demands). Consequently, the expected return effectively equals the reward $R(\mathbf{s}, \mathbf{a})$ obtained at a single step. Moreover, suppose that agent $i$ varies its action to $a_i'$ while the other agents keep their current actions; the new joint action, denoted as $(\mathbf{a}_{-i}, a_i')$, can be directly evaluated by simulating its effect, i.e., we compute the TE objective obtained if the new joint action were to be used. Putting everything together, COMA computes the “advantage” for agent $i$ as follows:

$$A_i(\mathbf{s}, \mathbf{a}) = R(\mathbf{s}, \mathbf{a}) - \mathbb{E}_{a_i' \sim \pi_\theta(\cdot \mid s_i)}\big[ R\big(\mathbf{s}, (\mathbf{a}_{-i}, a_i')\big) \big], \tag{2}$$

where we perform Monte Carlo sampling to estimate the counterfactual baseline, e.g., by drawing a number of random samples for $a_i'$. The gradient of $\theta$ is then given by

$$g = \mathbb{E}_{\pi_\theta}\Big[ \sum_i A_i(\mathbf{s}, \mathbf{a}) \, \nabla_\theta \log \pi_\theta(a_i \mid s_i) \Big], \tag{3}$$

which is used for training the policy network with standard policy gradient. In practice, Teal trains the FlowGNN and the policy network of COMA end to end, so $\theta$ represents all the parameters to learn in the end-to-end model, meaning that the gradient is backpropagated from the policy network to the FlowGNN.
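The counterfactual advantage described in this appendix can be illustrated with a minimal NumPy sketch. It is not Teal's implementation. The toy reward `total_flow`, the softmax mapping `to_splits`, the function `coma_advantages`, and the two-demand topology are all hypothetical and chosen only to make the example self-contained. The sketch covers only the Monte Carlo estimate of the one-step advantage and omits the policy-gradient update and the FlowGNN.

```python
import numpy as np

def to_splits(raw):
    """Softmax: map unconstrained samples to split ratios summing to 1 (an assumption)."""
    z = np.exp(raw - raw.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def total_flow(splits, demands, path_edges, capacity):
    """Toy one-step reward: flow delivered after throttling oversubscribed edges.

    splits:     (n_agents, n_paths) split ratios, one row per demand
    demands:    (n_agents,) demand volumes
    path_edges: (n_agents, n_paths, n_edges) 0/1 path-to-edge incidence
    capacity:   (n_edges,) link capacities
    """
    path_flow = splits * demands[:, None]
    load = np.einsum("nk,nke->e", path_flow, path_edges)
    edge_scale = np.minimum(1.0, capacity / np.maximum(load, 1e-12))
    # A path is throttled by its most congested edge.
    path_scale = np.where(path_edges > 0, edge_scale, 1.0).min(axis=2)
    return float((path_flow * path_scale).sum())

def coma_advantages(mean, std, raw_actions, demands, path_edges, capacity,
                    n_samples=32, rng=None):
    """Reward of the joint action minus a sampled counterfactual baseline per agent."""
    rng = np.random.default_rng() if rng is None else rng
    reward = total_flow(to_splits(raw_actions), demands, path_edges, capacity)
    advantages = np.empty(len(demands))
    for i in range(len(demands)):
        baseline = 0.0
        for _ in range(n_samples):
            alt = raw_actions.copy()
            alt[i] = rng.normal(mean[i], std[i])  # resample agent i only
            baseline += total_flow(to_splits(alt), demands, path_edges, capacity)
        advantages[i] = reward - baseline / n_samples
    return reward, advantages

# Hypothetical instance: two demands, two candidate paths each, three links.
demands = np.array([8.0, 8.0])
capacity = np.array([10.0, 10.0, 5.0])
path_edges = np.zeros((2, 2, 3))
path_edges[0, 0, 0] = 1  # demand 0, path 0 uses link 0
path_edges[0, 1, 2] = 1  # demand 0, path 1 uses shared link 2
path_edges[1, 0, 1] = 1  # demand 1, path 0 uses link 1
path_edges[1, 1, 2] = 1  # demand 1, path 1 uses shared link 2

rng = np.random.default_rng(0)
mean, std = np.zeros((2, 2)), np.ones((2, 2))
raw_actions = rng.normal(mean, std)  # one Gaussian sample per agent
reward, adv = coma_advantages(mean, std, raw_actions, demands, path_edges,
                              capacity, rng=rng)
print(reward, adv)
```

Because a TE action does not influence future demands, each counterfactual is scored by a single call to the reward simulator rather than by a learned critic. In this sketch that is the inner call to `total_flow`.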
Appendix B ADMM details
In this section, we derive the ADMM iterates for the TE problem 1, reproduced here:
maximize  
subject to  
ADMM requires the problem to be in a specific separable form, so we must first decouple the constraints of the original problem. Because constraint [2] couples the edge traffic across paths and demands, we introduce dummy variables, one for each pair of a path (drawn from the path set of any demand) and an edge. We note that each path stems from exactly one demand. We then replace constraint [2] with equivalent constraints expressed in terms of these dummy variables.
Finally, we add slack variables, one per demand and one per edge, to turn inequalities [1] and [3] into equality constraints:
maximize  
subject to  
Introducing Lagrange multipliers for the equality constraints, the augmented Lagrangian for this transformed problem is
where and
The ADMM iterates at each step are then given by
with the initial iterates warm-started by the policy network.
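For orientation, the “specific form” that ADMM expects can be written generically as below. This is the standard textbook template, not the TE-specific derivation of this appendix: the symbols $f$, $g$, $A$, $B$, $c$, $x$, $z$, $\lambda$, and the penalty parameter $\rho$ are generic and are not the paper's notation. A maximization problem such as the TE objective fits the template after negating the objective.

```latex
% Generic two-block ADMM (unscaled form); all symbols are generic placeholders.
\begin{aligned}
&\text{minimize}   && f(x) + g(z) \\
&\text{subject to} && Ax + Bz = c
\end{aligned}

% Augmented Lagrangian with multiplier \lambda and penalty \rho > 0
L_\rho(x, z, \lambda) = f(x) + g(z) + \lambda^\top (Ax + Bz - c)
                        + \frac{\rho}{2} \, \lVert Ax + Bz - c \rVert_2^2

% Iterates at step k
\begin{aligned}
x^{k+1}       &= \operatorname*{arg\,min}_{x} \; L_\rho(x, z^{k}, \lambda^{k}) \\
z^{k+1}       &= \operatorname*{arg\,min}_{z} \; L_\rho(x^{k+1}, z, \lambda^{k}) \\
\lambda^{k+1} &= \lambda^{k} + \rho \, (Ax^{k+1} + Bz^{k+1} - c)
\end{aligned}
```

Decoupling the constraints with dummy and slack variables is what brings a problem into this equality-constrained, block-separable shape, so that each block update becomes simple. Warm-starting corresponds to choosing the initial iterates from an external guess rather than from zeros.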
Appendix C More experimental results
Following prior work [1], we also present how different schemes keep track of changing demands over time in Figure 13. Figure 6(a) shows that LP-top’s runtime ranges from a median of 313 s to a worst case of 794 s, persistently violating the time budget. When we plot the time series in Figure 13, we observe that LP-top can only compute a new allocation for every second or third demand matrix. However, using stale routes from 10–20 minutes ago does not lead to notable performance degradation for LP-top, presumably because the real-world traffic demand (§6.1) does not fluctuate much, at least in total volume: fluctuations of individual demands between node pairs are perhaps smoothed out in the sum. Therefore, NCFlow’s fast computation does not compensate for its worse allocation in this case. By contrast, Teal consistently allocates the most demand on the network in each time interval, reinforcing our observation about Teal’s robustness over time.