Many typical network optimization problems are known to be NP-hard (e.g., load balancing, QoS routing). Thus, we have witnessed the use of different traditional optimization algorithms to address such problems (e.g., ILP, SGD). In essence, a network optimization tool can be built by combining two main elements: a network model and an optimization algorithm. In this context, the accuracy of the model is critical to achieving high-quality results, as it is the component in charge of estimating the resulting performance after a configuration parameter in the network is changed.
Some network optimization problems (e.g., load balancing) can be solved using relatively simple network models (e.g., based on fluid models), while others (e.g., end-to-end delay optimization) require much more complex models, such as packet-level network simulators. However, these complex models are computationally very expensive and, as a result, do not meet the requirements to achieve online network optimization in large-scale network scenarios.
Alternatively, many analytic network models have been developed in the past. However, such models make strong assumptions that do not hold in real-world networks, for instance neglecting queuing delay or assuming probabilistic routing. In addition, they are unable to accurately model scenarios involving arbitrary sequences of complex queuing policies. As a result, they are not accurate for large networks with realistic routing and queuing configurations and, consequently, they are not practical for modelling relevant performance metrics like delay, jitter or loss in real-world networks.
This issue has attracted the interest of the networking community, which has recently been investigating the application of Deep Learning (DL) techniques to build efficient network models, particularly for complex network scenarios and/or performance metrics. Researchers are using neural networks to model computer networks and using such models for network optimization, in some cases in combination with advanced strategies based on Deep Reinforcement Learning [12, 4].
Most existing DL-based solutions for network modelling rely mainly on common Neural Network (NN) architectures such as feed-forward NNs or Convolutional NNs. However, data from computer networks is essentially graph-structured, and these kinds of NNs are not suited to learning from data structured as graphs. As a result, they have very limited applications. For instance, they cannot be trained on a set of networks and then operate successfully on other networks, nor can they handle routing and queueing policies different from those seen during the training phase. All this is the result of their poor generalization capability over graphs.
In this context, Graph Neural Networks (GNN) have recently emerged as effective techniques to model graph-structured data. In particular, these new types of neural networks are tailored to understand the complex relationships between connected elements in graphs. In more detail, the internal architecture of a GNN is dynamically built based on the elements and connections of the input graphs, which makes it possible to learn generic modelling functions that do not depend on the graph structure. As a result, GNNs have demonstrated an outstanding capability to generalize over graphs of variable size and structure in many different problems [8, 2, 25].
In this paper, we present a new GNN model that is able to predict the per-path delay given an input topology, routing configuration, queue scheduling policy, and traffic matrix. This architecture is easily extensible to estimate other relevant performance metrics such as jitter or packet loss. More importantly, it offers a strong relational inductive bias over graphs, which endows the model with strong generalization capabilities over topologies, routings, queuing policies and traffic unseen during the training phase.
II Graph-based Deep Learning Network Model
In this section, we present a GNN-based model that introduces queue scheduling as an inherent component of the neural network architecture. This makes it possible to model accurately how different queueing configurations (scheduling and queue parameters) affect the network performance. The model architecture is designed to generalize to different networks never seen during the training phase. We refer the reader to [18, 8, 3] for a comprehensive background on GNNs.
The main intuition behind this architecture is as follows. The model considers three main entity types: (i) the queues present in the different network devices, which have some configuration parameters (size, priority, weight, etc.); (ii) the links of the topology (i.e., connections between nodes), which include their capacity as input features; and (iii) the paths formed by the input routing scheme. The per-source-destination traffic of the input traffic matrix is encoded in the initial path states; in particular, these states contain information about the average bandwidth generated on each path.
Figure 1 shows a scheme that represents how the GNN treats these three components. First, the state of a path depends on the concatenation of the queues and links it traverses; the figure illustrates this for one example path and the sequence of queues and links it follows. At the same time, the state of queues and links depends on all the paths that cross them. Hence, there is a circular dependency between the states of these three elements that the GNN model should resolve to eventually produce delay estimates on each path.
A computer network can be defined by a set of queues and the set of links that connect them. Let us also consider a function that, given a link, returns all the queues that inject traffic into it. The routing scheme in the network can be described by a set of paths, where each path traverses an ordered sequence of queues and links, indexed by their position along the path. The properties (features) of queues, links and paths are given as initial feature vectors. In this particular case, the initial features of a queue are its size, priority and weight. The initial features of a link are its capacity and the scheduling policy of the node. Finally, the initial path features are defined by the bandwidth generated for each source-destination pair.
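As a minimal sketch of the notation above, the three entity types can be held in plain Python containers. All class, field and function names here (`Queue`, `Link`, `Path`, `queues_of_link`, `paths_through_queue`) are hypothetical and chosen only for illustration, as are the example values; they are not taken from the implementation used in this paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Queue:
    size: int        # queue size in packets
    priority: int    # priority level of the queue
    weight: float    # scheduling weight (used by weighted policies)


@dataclass
class Link:
    capacity: float    # link capacity
    policy: str        # scheduling policy of the node feeding the link
    queues: List[int]  # ordered indices of the queues injecting traffic here


@dataclass
class Path:
    bandwidth: float             # average traffic of the source-destination pair
    hops: List[Tuple[int, int]]  # ordered (queue index, link index) pairs


def queues_of_link(links: List[Link], j: int) -> List[int]:
    """Return the queues that inject traffic into link j."""
    return links[j].queues


def paths_through_queue(paths: List[Path]) -> Dict[int, List[int]]:
    """Invert the routing: map each queue index to the paths traversing it."""
    result: Dict[int, List[int]] = {}
    for k, path in enumerate(paths):
        for q, _ in path.hops:
            result.setdefault(q, []).append(k)
    return result


# Illustrative toy network: three queues, two links, two paths.
queues = [Queue(32, 0, 0.5), Queue(32, 1, 0.5), Queue(64, 0, 1.0)]
links = [Link(10_000.0, "WFQ", [0, 1]), Link(10_000.0, "FIFO", [2])]
paths = [Path(400.0, [(0, 0), (2, 1)]), Path(250.0, [(1, 0)])]
```

Storing each path as an ordered list of hops keeps the traversal order explicit, which matters later because paths aggregate queue and link states sequentially.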
II-C Network Model
We describe the state of each queue, link and path by a hidden vector. These unknown hidden vectors are expected to contain some meaningful information about links (e.g., utilization), queues (e.g., load, packet loss rate), and paths (e.g., end-to-end metrics such as delays or packet losses). Considering the aforementioned assumptions, the following principles can be stated:
The state of a path depends on the states of all the queues and links that it traverses.
The state of a link depends on the states of all the queues that inject traffic into this link.
The state of a queue depends on the states of all the paths that inject traffic into it.
These principles can be formulated as a system of three equations (Equations 1, 2 and 3), one per principle, in which each state is expressed through an unknown function of the states it depends on.
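One possible way to write the three principles down is sketched here. The symbols are assumptions introduced for illustration only: $h_p$, $h_l$ and $h_q$ denote the hidden states of a path, a link and a queue; $(q^p_1, l^p_1, \dots, q^p_n, l^p_n)$ is the ordered sequence of queues and links traversed by path $p$; $(q_1, \dots, q_m)$ are the queues that inject traffic into link $l$; $P(q)$ is the set of paths that traverse queue $q$; and $f_p$, $f_l$ and $f_q$ are the three unknown functions. The equations are numbered to match the references to Equations 1, 2 and 3 in the text.

```latex
\begin{align}
h_p &= f_p\big(h_{q^p_1}, h_{l^p_1}, \dots, h_{q^p_n}, h_{l^p_n}\big) \tag{1}\\
h_l &= f_l\big(h_{q_1}, \dots, h_{q_m}\big) \tag{2}\\
h_q &= f_q\big(\{\, h_p : p \in P(q) \,\}\big) \tag{3}
\end{align}
```

In this form the circular dependency is visible directly: $h_p$ appears on the right-hand side of (3), $h_q$ on the right-hand side of (1) and (2), and $h_l$ on the right-hand side of (1).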
A direct approximation of these functions is complex given that: (i) Equations 1, 2 and 3 define a complex nonlinear system of equations with the states being hidden variables; (ii) these functions depend on the input routing scheme, the mapping of traffic flows to queues (Type of Service) and the different queue policies in the network; and (iii) the dimensionality of all the possible states is extremely large.
GNNs have shown an outstanding capability to work as universal approximators over graphs. With this, the proposed GNN architecture finds an approximation of these three functions that can be applied to unseen topologies, routing schemes and queue policies.
II-D Proposed GNN Architecture
Algorithm 1 describes the internal architecture of the GNN. This architecture takes advantage of the ability of GNNs to meet the challenges presented above. In particular, it resolves the circular dependencies described in Equations 1, 2 and 3 by executing an iterative message passing process. In each message passing iteration, graph elements exchange their hidden states with their neighbours according to the operations in Algorithm 1, and this is repeated for a fixed number of iterations (loop from line 4). Thus, the hidden states of queues, links and paths should eventually converge to some fixed points.
In Algorithm 1, the loop from line 6, and lines 12 and 15 represent the different message-passing operations that exchange the information encoded in the hidden states between queues, links and paths. Likewise, lines 10, 13 and 16 are update functions that incorporate the newly collected information into the hidden states.
This architecture provides flexibility to represent any routing scheme and queue policy configuration. This is achieved by directly mapping the input routing scheme and queue configuration to specific message passing operations between queues, links and paths. Thus, each path collects messages from all the queues and links included in it (loop from line 5), then each queue receives messages from all the paths containing it (line 11) and, similarly, each link collects information from all the queues that inject traffic into it (line 14). Given that the order of the paths that traverse a queue does not matter, a summation is used to aggregate the paths' hidden states on queues. However, the queues and links along a path are traversed in order, so there is a sequential dependence. For this reason, we use a Recurrent Neural Network (RNN) to aggregate the sequences of queues and links on the paths' hidden states. Similarly, the model aggregates the queue states on their related links using an RNN, as the order of queues is important to understand and model the queue policy (e.g., the priority order).
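The message passing scheme described above can be illustrated with a small, self-contained sketch. It is not Algorithm 1 itself: the trained RNNs and update functions are replaced by a toy recurrent cell with fixed weights (`rnn_step`), and all names (`message_passing`, `path_hops`, `link_queues`) are assumptions made for this example. Only the structure is meant to be faithful: sequential aggregation on paths and links, summation on queues.

```python
import math
from typing import List, Tuple

Vec = List[float]


def rnn_step(state: Vec, message: Vec) -> Vec:
    """Toy recurrent cell standing in for a trained RNN (fixed 0.5/0.5 weights)."""
    return [math.tanh(0.5 * s + 0.5 * m) for s, m in zip(state, message)]


def vec_sum(vectors: List[Vec], dim: int) -> Vec:
    """Order-independent aggregation: element-wise sum."""
    total = [0.0] * dim
    for v in vectors:
        total = [t + x for t, x in zip(total, v)]
    return total


def message_passing(
    h_q: List[Vec],
    h_l: List[Vec],
    h_p: List[Vec],
    path_hops: List[List[Tuple[int, int]]],  # per path: ordered (queue, link) hops
    link_queues: List[List[int]],            # per link: ordered queues feeding it
    iterations: int,
) -> Tuple[List[Vec], List[Vec], List[Vec]]:
    dim = len(h_p[0])
    for _ in range(iterations):
        # Paths read the queues and links they traverse, in order.
        new_h_p = []
        for k, hops in enumerate(path_hops):
            state = h_p[k]
            for q, l in hops:
                state = rnn_step(state, h_q[q])
                state = rnn_step(state, h_l[l])
            new_h_p.append(state)
        h_p = new_h_p
        # Queues sum the states of all paths crossing them (order is irrelevant).
        new_h_q = []
        for q in range(len(h_q)):
            crossing = [h_p[k] for k, hops in enumerate(path_hops)
                        if any(qq == q for qq, _ in hops)]
            new_h_q.append(rnn_step(h_q[q], vec_sum(crossing, dim)))
        h_q = new_h_q
        # Links read the queues that feed them, in order (e.g., priority order).
        new_h_l = []
        for l, qs in enumerate(link_queues):
            state = h_l[l]
            for q in qs:
                state = rnn_step(state, h_q[q])
            new_h_l.append(state)
        h_l = new_h_l
    return h_q, h_l, h_p


# Toy example: three queues, two links, two paths, hidden states of size 2.
h_q0 = [[0.1, 0.0], [0.2, 0.0], [0.3, 0.0]]
h_l0 = [[1.0, 0.0], [1.0, 0.0]]
h_p0 = [[0.4, 0.0], [0.25, 0.0]]
hops = [[(0, 0), (2, 1)], [(1, 0)]]
feeds = [[0, 1], [2]]
h_q, h_l, h_p = message_passing(h_q0, h_l0, h_p0, hops, feeds, iterations=4)
```

In a trained model, a readout function would then map each final path state to a delay estimate; that step is omitted here.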
III Evaluation of the Accuracy of the Model
In this section, we analyze the accuracy of the proposed GNN model to estimate the per-source-destination delay in a wide variety of topologies, routing schemes and traffic intensities.
III-A Simulation Setup
To train our model, we built a ground truth with a packet-level network simulator (OMNeT++ v5.5.1). For each simulation, the traffic, queue policies and routing scheme are chosen randomly according to the ranges defined below. Then, the mean end-to-end delay of every source-destination pair is measured.
Our traffic model follows an approach similar to that of previous work. In particular, we generate the input traffic matrices from three ingredients: U(0.1, 1), a uniform distribution over the range [0.1, 1]; the number of nodes in the network topology; and a tunable parameter that indicates the overall traffic intensity in the simulation. The traffic intensity represents how congested the network is; in our dataset it ranges from 400 to 2000 bits per time unit, where 400 corresponds to the least congested network (0% packet loss) and 2000 to a highly congested network (3% packet loss). Packet arrivals are modelled by a Poisson process, with the mean derived from the generated traffic matrix. The packet size follows a bimodal distribution commonly used in other works. Finally, we randomly assign a Type of Service (ToS) label in the range 0-9 to each source-destination traffic flow.
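The traffic generation step can be sketched as follows. The exact expression for the traffic matrix is not reproduced in the text above, so this example assumes that each off-diagonal entry is a U(0.1, 1) draw scaled by the traffic intensity and divided by the number of nodes minus one; that scaling, the function names and the example values are assumptions of the sketch.

```python
import random
from typing import List


def traffic_matrix(n_nodes: int, intensity: float,
                   rng: random.Random) -> List[List[float]]:
    """Assumed form: TM[i][j] = U(0.1, 1) * intensity / (n_nodes - 1), i != j."""
    tm = [[0.0] * n_nodes for _ in range(n_nodes)]
    for i in range(n_nodes):
        for j in range(n_nodes):
            if i != j:
                tm[i][j] = rng.uniform(0.1, 1.0) * intensity / (n_nodes - 1)
    return tm


def assign_tos(n_nodes: int, rng: random.Random) -> List[List[int]]:
    """Random ToS label in 0..9 per source-destination flow (-1 on the diagonal)."""
    return [[rng.randint(0, 9) if i != j else -1 for j in range(n_nodes)]
            for i in range(n_nodes)]


# Example with an arbitrary 14-node topology at the lowest traffic intensity.
rng = random.Random(0)
tm = traffic_matrix(n_nodes=14, intensity=400.0, rng=rng)
tos = assign_tos(n_nodes=14, rng=rng)
```

Passing an explicit `random.Random` instance keeps each generated sample reproducible from its seed.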
Each node is randomly configured with: (i) a queue scheduling policy, which can be First In First Out (FIFO), Strict Priority (SP), Weighted Fair Queueing (WFQ) or Deficit Round Robin (DRR); (ii) a random number of queues between 2 and 5 (except for FIFO, which has a single queue); and (iii) a random queue size (16, 32 or 64 packets). For WFQ and DRR, we also define a set of random queue weights that add up to 1. Finally, the mapping between ToS and queues is also chosen randomly.
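A minimal sketch of this per-node configuration is shown below. Several details are assumptions of the example rather than statements about the dataset: the queue size is drawn independently for each queue, the raw weights are drawn from U(0.1, 1) before being normalized, and the function and key names are hypothetical.

```python
import random
from typing import Dict

POLICIES = ["FIFO", "SP", "WFQ", "DRR"]
QUEUE_SIZES = [16, 32, 64]  # packets
N_TOS = 10                  # ToS labels 0..9


def random_node_config(rng: random.Random) -> Dict[str, object]:
    """Draw one random node configuration."""
    policy = rng.choice(POLICIES)
    # FIFO uses a single queue; the other policies use between 2 and 5 queues.
    n_queues = 1 if policy == "FIFO" else rng.randint(2, 5)
    sizes = [rng.choice(QUEUE_SIZES) for _ in range(n_queues)]
    weights = None
    if policy in ("WFQ", "DRR"):
        # Normalize random positive values so that the weights add up to 1.
        raw = [rng.uniform(0.1, 1.0) for _ in range(n_queues)]
        total = sum(raw)
        weights = [w / total for w in raw]
    # Random mapping from each ToS label to one of the node's queues.
    tos_to_queue = [rng.randrange(n_queues) for _ in range(N_TOS)]
    return {
        "policy": policy,
        "queue_sizes": sizes,
        "weights": weights,
        "tos_to_queue": tos_to_queue,
    }


rng = random.Random(42)
configs = [random_node_config(rng) for _ in range(200)]
```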
III-B Training and Evaluation
We implement the GNN model using TensorFlow. The source code and all the training/evaluation datasets used in this paper will be publicly available upon acceptance. To train the model, we used the NSFNET and GEANT topologies as described before, and we chose GBN for evaluation. For each topology, 100,000 random samples are used, giving a total of 200,000 samples for training and 100,000 samples for evaluation.
Our model has two relevant hyper-parameters that can be fine-tuned: the size of the hidden states of queues, links and paths, and the number of message passing iterations. Based on early experiments, we found that, in our case, a size of 32 for the three different types of hidden states, together with the chosen number of message passing iterations, leads to good accuracy.
We choose the Mean Squared Error (MSE) as the loss function, which is minimized using an Adam optimizer with an initial learning rate of 0.001 and a decay rate of 0.6 applied every 80,000 steps. In addition, we added a regularization loss term. Figure 2 shows how the training loss decays as the training evolves. The training was executed on a testbed with an Nvidia GeForce GTX 1080 Ti GPU for 2 epochs (i.e., 450,000 training steps). In total, the training phase took around 12 hours (9.25 samples per second).
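The learning rate schedule described above can be written out explicitly. The sketch assumes a staircase decay, i.e., the rate is multiplied by 0.6 once every 80,000 steps rather than decayed continuously; the text does not state which variant was used, and the function name is hypothetical.

```python
def learning_rate(step: int, initial: float = 0.001,
                  decay: float = 0.6, every: int = 80_000) -> float:
    """Staircase exponential decay: multiply by `decay` once per `every` steps."""
    return initial * decay ** (step // every)


# Learning rate at the start of training and after each decay boundary.
schedule = [learning_rate(s) for s in range(0, 450_001, 80_000)]
```

Under this assumption, five decays have been applied by step 450,000, so the final learning rate is 0.001 × 0.6^5, roughly 7.8e-5. In TensorFlow, a schedule of this shape could be expressed with `tf.keras.optimizers.schedules.ExponentialDecay` using `staircase=True`.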
The evaluation results cover the three topologies described in Section III-A. Note that the model was only trained with samples from GEANT and NSFNET, while GBN was not seen during the training. As the proposed GNN can be analyzed as a regression model, we provide two evaluation statistics: (i) the Mean Relative Error (MRE) and (ii) the percentage of variance explained by the model (R²). The model shows good accuracy even on GBN, the network not seen during training. This high accuracy reveals the capability of the proposed GNN architecture to generalize over different topologies never seen during the training phase.
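For reference, the two statistics can be computed as in the following sketch. It assumes that the MRE is the mean of the absolute relative errors and that the explained variance is the usual coefficient of determination; the function names and the delay values are illustrative only and are not results from the paper.

```python
from typing import Sequence


def mean_relative_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of |prediction - truth| / |truth| over all samples."""
    return sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - residual sum of squares / total sum."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up per-path delays (ground truth) and model predictions.
true_delays = [0.10, 0.20, 0.40]
pred_delays = [0.11, 0.18, 0.40]
mre = mean_relative_error(true_delays, pred_delays)
r2 = r_squared(true_delays, pred_delays)
```

The relative error is a natural choice here because per-path delays can span very different magnitudes, and an absolute error would be dominated by the longest paths.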
IV Related Work
One of the fundamental goals of network modelling is to provide a cost function that is then used for optimization. Over the years, many attempts have been made to obtain good cost functions. These include fluid models, analytical models (e.g., queuing theory, network calculus) and discrete-event network simulators (e.g., ns-3, OMNeT++).
Among all the existing techniques, queuing theory is possibly the most popular. As an example, prior work obtains an objective function based on the linearization of well-known queuing theory results. Alternatively, fluid models are efficient and popular for some congestion control problems. However, they make important simplifying assumptions and, as shown before, this leads to considerable inaccuracies. Likewise, network calculus operates over the worst-case scenarios of networks, but such scenarios are rarely observed in operational environments. As a result, these kinds of techniques can often lead to poor performance compared to the use of accurate models such as the one proposed in this paper.
The use of Deep Learning for network modelling is a topic recently addressed by the research community [21, 23, 14]. Existing proposals mainly use common artificial neural networks (e.g., feed-forward NNs, convolutional NNs). The main limitation of these works is that they do not generalize to other topologies and configurations (e.g., routing).
V Conclusion
In this paper, we have proposed a new Graph-based Deep Learning architecture for network modeling. The main novelty of this model is that it is able to predict the impact of arbitrary queuing policies on network performance, while generalizing successfully to unseen network scenarios. Although in this paper we evaluate the model to predict the per-path delay, it can be easily extended to support other QoS metrics such as jitter or packet loss. More importantly, the proposed model is fast compared to other accurate network modeling tools like packet-level simulators.
References
[1] (2012) Fast emergency paths schema to overcome transient link failures in OSPF routing. arXiv preprint arXiv:1204.2465.
[2] (2016) Interaction networks for learning about objects, relations and physics. In Advances in Neural Information Processing Systems, pp. 4502–4510.
[3] (2018) Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261.
[4] (2019) DeepRMSA: a deep reinforcement learning framework for routing, modulation and spectrum assignment in elastic optical networks. Journal of Lightwave Technology 37 (16), pp. 4155–4163.
[5] (2012) Perspectives on network calculus: no free lunch, but still good value. In Proceedings of the ACM SIGCOMM 2012 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, pp. 311–322.
[6] (2006) Optimal throughput-delay scaling in wireless networks, part I: the fluid model. IEEE Transactions on Information Theory 52 (6), pp. 2568–2592.
[7] (2007) On the limitation of fluid-based approach for internet congestion control. Telecommunication Systems 34, pp. 3–11.
[8] (2017) Neural message passing for quantum chemistry. arXiv preprint arXiv:1704.01212.
[9] (2014) A survey on intelligent routing protocols in wireless sensor networks. Journal of Network and Computer Applications 38, pp. 185–201.
[10] (2015) A declarative and expressive approach to control forwarding paths in carrier-grade networks. ACM SIGCOMM Computer Communication Review 45 (4), pp. 15–28.
[11] Wavelength converter placement in least-load-routing-based optical networks using genetic algorithms. Journal of Optical Networking 3 (5), pp. 363–378.
[12] (2019) Self-learning routing for optical networks. In International IFIP Conference on Optical Network Design and Modeling, pp. 467–478.
[13] (2020) Interpreting deep learning-based networking systems. In Proceedings of ACM SIGCOMM, pp. 154–171.
[14] (2018) Understanding the modeling of computer network delays using neural networks. In Proceedings of the 2018 Workshop on Big Data Analytics and Machine Learning for Data Communication Networks, pp. 46–52.
[15] (2011) Performance evaluation of integrated OTN/DWDM networks with single-stage multiplexing of optical channel data units. In 2011 13th International Conference on Transparent Optical Networks, pp. 1–4.
[16] (2004) Routing, flow, and capacity design in communication and computer networks. Elsevier.
[17] (2019) Unveiling the potential of graph neural networks for network modeling and optimization in SDN. In Proceedings of the 2019 ACM Symposium on SDN Research, pp. 140–151.
[18] (2008) The graph neural network model. IEEE Transactions on Neural Networks 20 (1), pp. 61–80.
[19] (2007) Internet packet size distributions: some observations. USC/Information Sciences Institute, Tech. Rep. ISI-TR-2007-643, pp. 1536–1276.
[20] (2001) Discrete event simulation system. In Proc. of the European Simulation Multiconference (ESM'2001), pp. 1–7.
[21] (2017) Machine learning for networking: workflow, advances and opportunities. IEEE Network 32 (2), pp. 92–99.
[22] (2011) A study of the routing and spectrum allocation in spectrum-sliced elastic optical path networks. In 2011 Proceedings IEEE INFOCOM, pp. 1503–1511.
[23] (2018) Deep-Q: traffic-driven QoS inference using deep generative network. In Proceedings of the 2018 Workshop on Network Meets AI & ML, pp. 67–73.
[24] (2018) Experience-driven networking: a deep reinforcement learning based approach. In IEEE INFOCOM 2018 - IEEE Conference on Computer Communications, pp. 1871–1879.
[25] (2018) Graph neural networks: a review of methods and applications. arXiv preprint arXiv:1812.08434.