The Internet has become a part of our everyday lives and is used by more and more individuals, firms and organizations worldwide. Internet disruptions can entail major risks and huge costs, in particular for the growing number of e-businesses. Being cut off from the Internet for even the shortest period of time affects customer confidence and could lead to severe business losses. Motivated by these risks, a new field of research has evolved that investigates the robustness of the Internet with respect to random failures and targeted attacks, based on mathematical graph models [3, 70, 9]. This fundamental abstraction enables analysis and quantitative evaluation of the main function of large and complex networks such as the Internet: their ability to provide – preferably short – communication paths between pairs of entities. Assessing the robustness of the Internet structure and corresponding traffic flows has recently inspired a large number of novel robustness metrics. As of yet, however, there is no comprehensive review of this field. The present survey aims to fill this gap in order to foster future research. It presents important analysis strategies and metrics proposed in the literature, along with major results.
This article is structured as follows. First, Internet Robustness is defined from a graph-theoretical perspective; then, layer and graph models of the Internet as well as network generators are presented, and challenges and aspects of robustness are introduced. The main categories of metrics are introduced, and the corresponding article structure is motivated. Then, important general features of metrics are compared and discussed, followed by a list of symbols, concepts and mathematical notations.
In the main part of the article, the actual metrics and methods for robustness analysis are presented and discussed, using categories that reflect their main idea: Adjacency, Clustering, Connectivity, Distance, Throughput, Spectral Methods, and Geographical Metrics. This is followed by a discussion section that compares the suitability of the metrics for measuring major robustness aspects and outlines a conceptual tool set of metrics for further research on Internet robustness. Finally, a summarizing table provides an overview of the main metrics and their features.
2 Methods and Notation
2.1 Internet Robustness
Internet Robustness can be generally defined as the ability of the Internet to maintain its service of transferring data between node pairs in a timely manner under challenges. However, its interpretation and its relationship to the more general concept of resilience vary to some extent across research communities [3, 111].
In our article, we focus on topological robustness and corresponding topological robustness metrics or graph metrics, which mainly abstract from technical or organizational details of the Internet through the use of mathematical graph theory and particular graph models.
2.2 Layer Models and Graph Models of the Internet
On the one hand, the well-known ISO/OSI and TCP/IP models focus on technical aspects of communication that are described by a hierarchy of layers, such as physical and local connections, Internet routing, transport control and application-layer protocol connections, including the Hypertext Transfer Protocol (HTTP) for the Web but also protocols for exchanging routing information (such as BGP, the Border Gateway Protocol, see RFC 4271), for email exchange, or peer-to-peer (P2P) overlay networks. For instance in , a technically-layered model of Internet Resilience is constituted by a Physical (P), Link+MAC (L), Topology (To), Routing (R), Transport (Tr), Application (A) and End User (U) layer. Each layer provides the base for the next higher one and can be characterized by layer-specific metrics, including non-topological performance measures.
On the other hand, there are organizational and political perspectives on the exchange, forwarding, controlling and filtering of data on the Internet, for instance, models that distinguish the layers of Internet Protocol (IP) interfaces, routers, points-of-presence (PoP), autonomous systems (AS), Internet Service Providers (ISPs) or other organizations, and countries. As an example, in , the authors construct an organizationally-layered family of graph models derived from empirical measurement data.
For both perspectives and most layers, there are historical examples of attacks and failures with impact on connectivity [9, 37]. Some attacks on higher layers, such as BGP misconfiguration, affect the robustness of AS-level and IP-level graphs without damage to the underlying physical infrastructure and interconnections of the lower layers. Other failures and attacks take place on the Physical Layer of the Internet, but can cause faults on higher abstraction levels, such as routing or AS layers.
Correspondingly, it is useful to model and analyze entities and their interconnections as graphs for studying its topology and robustness at any particular layer. Traditionally, the complex-network community has studied Internet Robustness at single layers [2, 101, 3, 70, 9, 4], and this is the main focus of our study; but there are several recent approaches to model the Internet and assess its Robustness using layered hierarchies. Most graph metrics that are discussed in the following sections can be relevant for organizational as well as for technical graph models, and can be applied at every layer where an exchange or relation between different entities is of interest.
Each layer can be modeled by several different graph types. A simple graph is a strong abstraction as it is constituted solely of two entity types: nodes (e.g., routers or ASes) and undirected edges (e.g., links or routing connections), but it has the advantage that less information is needed for analysis. In edge-weighted graphs, transmission speed or capacity constraints are also modeled. A directed graph accounts for routing policies between the ASes, e.g., denoting a customer-provider relationship by an edge that points from the former to the latter; peer-to-peer ASes are connected through two links, one in either direction [117, 116]. Nodes can only communicate if both have a directed path towards a common node, e.g., if they share a common upstream provider. A weighted directed graph can also be taken into consideration. Further important modeling aspects are the amount of traffic flow between nodes and their Euclidean distance.
Concerning properties of complex networks, Internet graphs on both router and AS-levels have been shown to feature scale-free (SF) degree distributions and the small-world property [2, 4], in particular, shorter paths and higher clustering than expected within random networks with a given degree distribution.
Not all network-generating models exhibit these characteristics: the Erdős-Rényi (ER) model exhibits none of them, the Watts-Strogatz (WS) model lacks the scale-free degree distribution, and the clustering coefficient of the Barabási-Albert (BA) model is slightly too low. The Klemm-Eguíluz (KE) model and the HOT model of the router level of the Internet seem to be suitable in this sense. However, there is currently no final consensus on the most accurate model, in particular with respect to the highly dynamic and evolutionary character of the Internet [101, 1]. Therefore, in this article, metrics are generally discussed independently of these models.
2.3 Challenges and Aspects of Robustness
Even with an abstract and topological approach, the question whether an Internet graph is robust should be considered with respect to the requirements of the particular service that is to be delivered; usually, this is a multidimensional problem, which requires several metrics measuring different robustness aspects that should remain in an acceptable state even under severe challenges.
Two main variants of topological robustness are often discussed in the complex-network literature: the robustness of a network against random failures of entities, and its robustness against targeted attacks [3, 70, 9]. These challenges can be further motivated as follows: failures of entities are random and can be caused by software, hardware or human errors. Furthermore, unexpected non-malicious traffic increases in parts of the network can be caused by flash crowds. Not least, geographically extended faults can occur due to earthquakes or power outages.
Targeted and correlated attacks, such as distributed denial of service (DDoS) and malware, aim at destroying crucial entities, e.g., those identified with graph metrics such as highest Degree or Betweenness. Not specifically investigated in our study is the topic of network epidemics, which is relevant for self-disseminating malware such as Internet worms and their propagation strategies, but only indirectly affects network robustness, e.g., by malicious payloads or network saturation. Interactions of malware with topological network properties could be the focus of a dedicated survey on network epidemics.
Many metrics can be used with both random failure and targeted attack strategies. Local or non-adaptive attack strategies calculate the metric values of all entities once and take them down in decreasing order. In contrast, global or adaptive strategies recalculate metrics after every deletion and tend to have a more severe impact, while also needing real-time and global information access. In this article, if an attack is analyzed without further specification of its strategy, it always refers to an adaptive attack on highest-degree nodes.
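As an illustration of the difference between the two strategies, the following Python sketch (our own, not taken from the surveyed literature; the graph is stored as an adjacency dict) implements both a non-adaptive and an adaptive highest-degree attack:

```python
# Sketch: non-adaptive vs. adaptive highest-degree attacks on a simple
# undirected graph given as a dict {node: set of neighbors}.

def remove_node(adj, v):
    """Delete node v and all its incident edges."""
    for u in adj.pop(v):
        adj[u].discard(v)

def non_adaptive_attack(adj, k):
    """Rank nodes by degree once, then delete the top k in that order."""
    order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    for v in order[:k]:
        remove_node(adj, v)

def adaptive_attack(adj, k):
    """Recompute degrees after every deletion (typically more severe)."""
    for _ in range(min(k, len(adj))):
        v = max(adj, key=lambda v: len(adj[v]))
        remove_node(adj, v)

# Star with an extra chain: hub 0 has the highest degree.
adj = {0: {1, 2, 3}, 1: {0, 4}, 2: {0}, 3: {0}, 4: {1}}
adaptive_attack(adj, 1)   # removes hub 0 first
print(sorted(adj))        # [1, 2, 3, 4]
```

The adaptive variant needs global, up-to-date degree information at every step, which is exactly the requirement noted above.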
Loosely corresponding to the challenges, there are major aspects of topological robustness. Disconnection Robustness of a graph is measured by metrics that assess path redundancy, detect and evaluate topological bottlenecks, and analyze the severity of graph disconnections. It is defined as the ability of the network to maintain a connection between all (or most) of the node pairs when challenged. Another crucial feature to take into consideration is the small-world property of the Internet, which is associated with the inverse of the average minimum hop distance between node pairs. Transmission Speed Robustness is the ability to keep the decay of this measure low, even under stress.
Another aspect, Traffic Robustness of a network, is defined as the ability to endure and transmit highly dynamic traffic demands without becoming overloaded.
All of these aspects are important for the Internet Backbone, which forms the core of the Internet, providing shortest paths through high-capacity links and routers. Corresponding distance concepts focus, for example, on the number of hops a data package has to traverse in order to reach the target node, implicitly assuming that the longer the path, the slower the transmission speed. Throughput takes into account that capacities for information forwarding are limited, and calculates the maximum workload of network entities or their likelihood to become overloaded when other entities fail.
However, apart from the above-mentioned exceptions, the study of layer interactions and their implications for Internet robustness is still an emerging research field whose tools and methods are not yet broadly established. The same holds for the study of the robustness of interacting networks, such as electricity and communication networks. For the broader and multi-faceted concept of Resilience, we refer the reader to related work.
As a baseline for future extensions, our study aims to survey the most relevant metrics for single-layer Internet Robustness analyses, with emphasis on the Topology and Routing Layers and above. To what extent the surveyed metrics are suitable for studying the mentioned aspects is a complex research question in itself, which will be discussed in more detail in Section 10.
2.4 Metric Selection and Article Structure
The selection of metrics is based on literature research, mainly conducted via Google Scholar using diverse topic-related keywords, and on inverse search using the references in publications already identified. Out of the vast number of general graph metrics in the literature, metrics are selected for this survey based on the following criterion: the metric has to be applicable to at least one graph type and either directly measure an aspect of network robustness or at least a clearly related characteristic.
Exclusively assigning the extensive number of metrics relevant for Internet robustness to non-overlapping groups would prove infeasible, since some metrics are relevant for several aspects. Therefore, the article is structured as follows: the presentation of metrics and corresponding results for Internet graphs is conducted by dividing them into six major groups that indicate their respective main task. Three of them, Adjacency, Clustering and Connectivity, build on each other and describe the general structure of graphs. The Distance and Throughput categories focus on concepts that are crucial for communication networks, approximating the concrete Internet routing processes via shortest paths. Throughput accounts for Internet-specific link capacity restrictions. Aspects of all these groups are also regarded in Spectral Methods with the help of random walks; the more sophisticated matrix-calculation schemes involved justify their bundling into a dedicated chapter. Afterwards, a separate section assesses the suitability and interactions of the metrics with respect to the different Robustness Aspects.
2.5 Features of Metrics
Many metrics are used with different names in the literature. In order to ensure consistency of notation, only one name is selected and used throughout the article; the others are provided in brackets. At the outset of every section, general features of the metric are displayed.
First, the technical layers (as defined in Section 2.2) on which the metric can be used particularly well – but often not exclusively – are indicated by their respective abbreviations. This categorization should be understood as a preliminary step towards combining the traditional single-layer graph analysis of the Internet with a more detailed technical layer model.
The next feature indicates the graph types to which the metric has been applied in the literature. Whenever an application to further graph types is straightforward, by substituting simple-graph elements by their directed or weighted counterparts, this will be listed in brackets. For example, the entry ’(un)weighted (directed)’ indicates that the metric is applied to simple and weighted graphs, but can be modified for directed ones. In order to highlight metrics that can be applied to every graph type, ’all’ is used.
The third entry states whether the metric is local – either measuring the individual node’s impact on network robustness or its liability to faults of connectivity to the rest of the network – or whether it is global, capturing features of the entire graph. Local metrics often depend on a smaller information set, provide a more detailed insight and need less calculation time, while global metrics are more meaningful for assessing the state of the entire network and also allow for comparison of different graphs.
A metric is denoted static if it is applied without simulating faults – taking a snapshot of characteristics that influence robustness – or dynamic if it assesses network behavior under arbitrary removal strategies; the entries failures or worst-case indicate metrics only suitable for these scenarios.
Next, the codomain is stated, giving a quick impression of the value range of a metric – for instance, whether the metric ranges from zero to one. Its dependency on parameters such as the network size indicates whether a direct comparison of graphs that differ in these aspects is possible. If it is set to ’-’, the presented method is not a metric in the narrower sense, but a general algorithm, e.g., for finding cluster structures and minimum cuts.
Finally, an efficiently computable metric is denoted by a check mark; otherwise, its exact calculation is NP-hard or NP-complete. If calculation orders of (approximating) algorithms are provided in the literature, they are stated in brackets. A metric that is not efficiently computable is not practical for robustness analysis of large graphs, but can still provide interesting ideas and a base for further heuristics.
2.6 List of Symbols
The following notations are used throughout the article:
G = (V, E)  |  Graph with a set of vertices V and a set of edges E.
n  |  Number of nodes in the graph.
m  |  Number of edges in the graph.
N(i)  |  Set of nodes that are neighbors of node i in a simple graph.
⟨x⟩  |  Arithmetical mean of x over all nodes/edges/etc.
By substituting the following elements of simple graphs with their weighted and/or directed equivalents, various metrics can be enhanced with the additional information contained in more advanced graph models.
A  |  Adjacency matrix; a_ij = 1 if an edge from node i to node j exists, a_ij = 0 else.
W  |  Weight matrix; w_ij is the edge weight between nodes i and j.
k_i  |  (In-/out-) degree of node i (Section 3.1).
s_i  |  (In-/out-) strength of node i (Section 3.2).
d(i, j)  |  Minimum hop distance from node i to node j (Section 6).
d_w(i, j)  |  Minimum weighted distance from node i to node j (Section 6).
3 Adjacency
The assessment of node adjacency is one of the first and easiest approaches for investigating network robustness. The intuition is that a vertex with many edges, i.e., a high-degree node, could be more important to the overall graph structure than one with a low degree. The average node degree is a first indicator for the overall network robustness, and the node-degree variance for the resilience against high-degree attacks.
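A minimal Python sketch (our own illustration, not from the surveyed literature) of these two first indicators, comparing a star and a ring of equal size:

```python
# Sketch: average Node Degree and degree variance as first robustness
# indicators for a simple undirected graph given as an edge list.

def degrees(n, edges):
    k = [0] * n
    for u, v in edges:
        k[u] += 1
        k[v] += 1
    return k

def mean_and_variance(k):
    mu = sum(k) / len(k)
    var = sum((x - mu) ** 2 for x in k) / len(k)
    return mu, var

# A star concentrates connectivity in one hub, a ring spreads it evenly:
# similar average degree, but a much larger variance for the star,
# signaling vulnerability to highest-degree attacks.
star = [(0, i) for i in range(1, 6)]
ring = [(i, (i + 1) % 6) for i in range(6)]
print(mean_and_variance(degrees(6, star)))   # high variance
print(mean_and_variance(degrees(6, ring)))   # zero variance
```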
3.1 Node Degree (Degree Centrality)
L, R simple, directed local impact & liability / global static ✓
Notably, in , the simple-graph version is used to build an attack algorithm that first deletes the nodes with the highest degrees. The authors show that this attack algorithm affects ER, KE and WS networks more than random node failure in terms of Global Network Efficiency (Section 6.2) and Number of Nodes in the Largest Component (Section 5.12). However, at each step, the removal of a different node than the one with the highest Node Degree could cause even larger global damage. Therefore, the Degree of a node has some, but also a limited, influence on global Disconnection and Transmission Speed Robustness, beyond assessing the local connectivity of a node to direct neighbors.
Hence, relying on the degree alone for estimating the importance of a node could be misleading. In , a high correlation between Node Degree and Betweenness Centrality (Section 7) was discovered at the AS-level of the Internet, in particular for higher-degree nodes, as well as in WS, BA, ER, and clustered scale-free networks.
The global version of this metric is the Degree-Frequency Distribution, , with defined as the number of nodes with degree . Its codomain is . A classical result of  states that the Internet belongs to the class of scale-free networks with . This distribution indicates that few highly connected nodes (hubs) exist that are very important for the network connectivity. More precisely, the authors found for the Internet on the AS-level between 1997 and 1998, whereas for the router level from 1995, . However, in , it is challenged that the Internet exhibits a hub-like core. The authors claim that a more suitable Internet topology with the same , the HOT network, exhibits a rather different topology with hubs on the periphery, hence not forming a crucial Backbone. Their removal would hence only have local effects.
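The empirical distribution is straightforward to compute from a graph; the following Python sketch (our own illustration, using a toy hub-and-spoke graph) implements the definition of the frequency of each degree relative to the network size:

```python
# Sketch: empirical Degree-Frequency Distribution P(k) = n_k / n,
# where n_k is the number of nodes with degree k and n the network size.

from collections import Counter

def degree_frequency(adj):
    n = len(adj)
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: n_k / n for k, n_k in sorted(counts.items())}

# Hub-and-spoke example: one node of degree 4, four nodes of degree 1.
adj = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(degree_frequency(adj))   # {1: 0.8, 4: 0.2}
```

On a scale-free graph, plotting these frequencies on log-log axes would reveal the heavy tail discussed above.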
3.2 Strength
L, R weighted (un)directed local impact / global static ✓
Strength is the adaptation of Node Degree for weighted graphs [27, 23]. Quantifying the amount of information that would have to be redirected in the case of a failure of node , it can be used as a first indicator for its impact on Transmission Speed and Traffic Robustness. Moreover, the largest-capacity Backbone nodes are indicated by the highest Strength. Its global version  ranges from zero to one: , with defined as the set of nodes with strength . It gives an overview of the heterogeneity of local node importance. Similar to the Degree-Frequency Distribution, a more uniform Strength Distribution leads to a network that is more robust against attacks, but more vulnerable to random failures.
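The local computation can be sketched as follows (our own Python illustration; the edge weights are hypothetical capacities, not empirical data):

```python
# Sketch: node Strength as the weighted-graph analogue of Node Degree,
# s_i = sum of the weights of all edges incident to node i.

def strengths(n, weighted_edges):
    s = [0.0] * n
    for u, v, w in weighted_edges:
        s[u] += w
        s[v] += w
    return s

# Node 0 has only two edges but carries the largest total capacity, so
# its failure would force the most traffic to be redirected.
edges = [(0, 1, 10.0), (0, 2, 10.0), (1, 2, 1.0), (2, 3, 1.0)]
print(strengths(4, edges))   # [20.0, 11.0, 12.0, 1.0]
```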
3.3 Entropy
L, R simple (all) global static ✓
Entropy measures the homogeneity of the Degree-Frequency Distribution (Section 3.1)
The maximum of this metric is . Wang et al.  find that optimizing an SF-network’s Disconnection Robustness to random failures in terms of Percolation Threshold (Section 5.9) is equivalent to maximizing its Entropy – and thus homogeneity of – while maintaining the SF-structure and keeping the average Node Degree constant.
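A short Python sketch (our own illustration, using the natural logarithm; the base convention in the literature may differ) shows how Entropy captures the homogeneity of the Degree-Frequency Distribution:

```python
# Sketch: Shannon Entropy of the Degree-Frequency Distribution,
# H = -sum_k P(k) * ln P(k).  Higher H means a more homogeneous degree
# distribution; H = 0 when all nodes share the same degree.

from collections import Counter
from math import log

def degree_entropy(adj):
    n = len(adj)
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return -sum((c / n) * log(c / n) for c in counts.values())

ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}  # all degrees 2
star = {0: {1, 2, 3, 4, 5}, **{i: {0} for i in range(1, 6)}}
print(abs(degree_entropy(ring)))                     # 0.0
print(degree_entropy(star) > degree_entropy(ring))   # True
```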
3.4 Skewness
L, R simple (all) global static ✓
Skewness indicates a very preferential or heterogeneous network, a very uniform one. is the Skewness of a network with uniform degree . The lowest node rank is given to the node with the highest degree, to the one with the second highest, and so on. Each rank is assigned only once; if there is more than one node with the same degree, the order in which they receive the respective ranks is randomly chosen .
Ghedini and Ribeiro  show that a more homogeneous Degree-Frequency Distribution (Section 3.1), as measured by Skewness , contributes to higher Transmission Speed and Disconnection Robustness in terms of Global Network Efficiency (Section 6.2) and Number of Nodes in the Largest Component (Section 5.12) against highest-Degree attacks, but leaves the graph more vulnerable in this respect to random failures. In the preceding and current sections, contrasting conclusions about the impact of on Disconnection Robustness to random failures are drawn. It must be noted, though, that measures a worst-case scenario at high failure rates and thus a different facet of Disconnection Robustness than , which evaluates graph behavior at lower failure rates. In this light, the apparent conflict between the metrics is resolved.
3.5 Vulnerability Function
L, R simple graph global static ✓
is the standard deviation of, and is the Degree-Frequency Distribution (Section 3.1). A high value of indicates a vulnerable network, a low value a robust one.
Similarly to Entropy (Section 3.3) and Skewness (Section 3.4), the Vulnerability Function assesses the impact of the homogeneity of on robustness, additionally taking the relation of numbers of nodes to edges into account – it is designed not to simply increase when edges are added. For graphs with the latter two parameters fixed, the Vulnerability Function thus shares the implications of the other two metrics on Disconnection and Transmission Speed Robustness.
3.6 Assortative Coefficient
L, R simple graph global static ✓
The Assortative Coefficient  calculates, as the Pearson correlation coefficient of the node degrees at both ends of an edge, whether the network is assortative, meaning that most nodes are connected to nodes with a similar degree, or disassortative. Similar to Average Neighbor Connectivity, this metric summarizes the degree correlations of the graph by calculating the probability that a hub node is connected to another hub, but the information loss is even bigger. The advantage is the clear codomain: a positive value indicates an assortative network, which is thus robust against failures and attacks, and a negative value a disassortative one. If the coefficient is zero, there is no correlation between the node degrees of an edge.
In , assortative networks are found to exhibit a higher Disconnection Robustness concerning the Number of Nodes in the Largest Component (Section 5.12) against both failures and attacks and the disassortativity of the Internet is explained by high-degree connectivity providers and low-degree clients. Furthermore, Liu et al.  find assortative networks to be more Disconnection Robust with respect to Natural Connectivity (Section 8.7) and stability of under attacks, but less so in terms of Algebraic Connectivity (Section 8.4). The latter metric is thus in contrast with the former two. As it only assesses how easily any set of nodes can be disconnected, irrespective of its size, more weight should be put on , which conveniently addresses this problem. Still, further analysis of these three metrics is necessary to finally decide if they indeed measure different facets of this robustness aspect. Only then, an unambiguous interpretation of the impact of on Disconnection Robustness would be possible.
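For concreteness, the coefficient can be computed as a plain Pearson correlation over the two ends of every edge, counting each undirected edge in both directions; the following Python sketch is our own illustration:

```python
# Sketch: Assortative Coefficient as the Pearson correlation of the
# degrees found at the two ends of each edge (each undirected edge is
# counted in both directions, the usual convention).

from math import sqrt

def assortative_coefficient(n, edges):
    k = [0] * n
    for u, v in edges:
        k[u] += 1
        k[v] += 1
    xs, ys = [], []
    for u, v in edges:
        xs += [k[u], k[v]]
        ys += [k[v], k[u]]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# A star is maximally disassortative: the hub attaches only to leaves.
star = [(0, i) for i in range(1, 6)]
print(assortative_coefficient(6, star))   # ≈ -1.0
```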
3.7 Average Neighbor Connectivity
L, R (un)weighted (directed) global static ✓
Average Neighbor Connectivity calculates the average neighbor degree of -degree vertices , summarizing the Joint Degree Distribution , which measures the probability that a randomly chosen edge points from a - to a -degree node. It thus assesses the level of network assortativity and its Disconnection Robustness as measured by Number of Nodes in the Largest Component (Section 5.12) in a more detailed but also more complicated way than the Assortative Coefficient (Section 3.6). According to Costa et al. , it holds that for all if there is no correlation between and . If is increasing in , the investigated graph is assortative; if it is decreasing, the graph is disassortative.
For weighted networks, Barrat et al.  propose the following formulae:
The local weighted average of the nearest-neighbor degree is calculated according to the normalized weight of the connecting edges . Similarly, measures the effective affinity to connect with high-degree or low-degree neighbors, considering the magnitude of interactions. The weighted version of Average Neighbor Connectivity, , measures the average weighted neighbor degree of a degree- node. When applying both the unweighted and weighted versions of this metric to a weighted graph, then indicates that edges with larger weights point to neighbors with larger degree, while shows that such edges are connected to lower-degree neighbors. This procedure can thus help to identify whether hubs are located at the high-capacity meaning high-strength Internet Backbone core or at the periphery (Section 3.1).
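The unweighted version can be sketched in a few lines of Python (our own illustration, not from the surveyed literature):

```python
# Sketch: Average Neighbor Connectivity k_nn(k) -- the mean degree of
# the neighbors of degree-k nodes, averaged over all nodes of degree k.
# Increasing in k -> assortative; decreasing in k -> disassortative.

from collections import defaultdict

def average_neighbor_connectivity(adj):
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    by_degree = defaultdict(list)
    for v, nbrs in adj.items():
        if nbrs:
            by_degree[deg[v]].append(sum(deg[u] for u in nbrs) / len(nbrs))
    return {k: sum(v) / len(v) for k, v in sorted(by_degree.items())}

# Star: leaves (degree 1) neighbor the hub (degree 4) and vice versa,
# so the function decreases in k -- a disassortative pattern.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(average_neighbor_connectivity(star))   # {1: 4.0, 4: 1.0}
```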
3.8 Rich-Club Connectivity (Rich-Club Coefficient)
L, R – all – global static ✓
The Rich-Club Coefficient measures the interconnectedness of hubs by calculating the ratio of existing edges between nodes with a degree higher than to the maximum possible number of edges . Its codomain is with zero indicating no direct interconnections, and one for a fully connected cluster. Uncorrelated networks that are neither assortative nor disassortative usually have a non-zero value of . Therefore, has to be normalized by in order to assess with Rich-Club Connectivity if there is a rich-club behavior (), or not ().
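The unnormalized coefficient can be sketched as follows (our own Python illustration; note that the normalization against a randomized reference graph described above is omitted here):

```python
# Sketch: unnormalized Rich-Club Coefficient phi(k) -- the fraction of
# realized edges among the nodes whose degree exceeds k.

def rich_club(adj, k):
    rich = {v for v, nbrs in adj.items() if len(nbrs) > k}
    n_rich = len(rich)
    if n_rich < 2:
        return None  # undefined: fewer than two "rich" nodes
    links = sum(1 for v in rich for u in adj[v] if u in rich) // 2
    return links / (n_rich * (n_rich - 1) / 2)

# Two interconnected hubs (0 and 1), each with three private leaves:
# the club of nodes with degree > 1 is fully connected.
adj = {0: {1, 2, 3, 4}, 1: {0, 5, 6, 7},
       2: {0}, 3: {0}, 4: {0}, 5: {1}, 6: {1}, 7: {1}}
print(rich_club(adj, 1))   # 1.0
```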
The Internet graph investigated in  lacks rich-club ordering, which is explained by the Backbone hubs gaining their importance through high bandwidths and traffic capacities between each other rather than by high numbers of interconnections. Along with , it is suggested that hubs solely provide connectivity to local networks on the periphery without redundant connections between themselves. The weighted Rich-Club Coefficient thus measures the weighted interconnectedness of high-strength nodes,
with the same codomain as above and again indicating a rich club behavior as well as increased Transmission Speed and Traffic Robustness. For more details, see .
The adjacency metrics presented in this chapter are descriptive and provide a static snapshot of basic characteristics of a graph. The Degree-Frequency Distribution, its summarizing metrics Entropy, Skewness, and Vulnerability Function, and the Strength Distribution indicate whether a network is vulnerable to Disconnection through attacks or random failures and to Transmission Speed reduction due to attacks. To attain the same goal, the easy-to-handle Assortative Coefficient can be used, which also gives a first indication of whether the graph is assortative and thus more Disconnection robust against both failures and attacks. The Average Neighbor Connectivity metric uses a similar measuring approach, but provides more details about the network and has a counterpart for weighted graphs, which makes it very suitable for Internet robustness analysis. Another very interesting metric is the weighted Rich-Club Connectivity. It evaluates the Backbone structures that provide many alternative transmission paths if one of the most important nodes fails. Since this is the only metric measuring this aspect, it should be part of the methodological repertoire for robustness analysis.
4 Clustering
Clustering metrics aim to provide a detailed overview of the community structure of the Internet in order to gain an understanding of its dynamical evolution and organization.
4.1 Clustering Coefficient (Transitivity)
L, R – all – local impact / global static ✓
The Clustering Coefficient (CC) is commonly applied to measure the interconnectedness of nodes with the same neighbor. This metric has many different versions. In all of them, the coefficient ranges from zero to one, with a high value indicating high clustering. The frequently used local version for simple graphs compares , defined as the number of edges among the neighbors of node , to the maximum possible number [125, 25]:
A high denotes a node against whose removal the network is Disconnection and Transmission Speed robust, as many alternative routes between its neighbors exist .
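The standard simple-graph formula can be sketched in Python (our own illustration; the function returns None for nodes of degree below two, where the coefficient is undefined):

```python
# Sketch: local Clustering Coefficient for simple graphs,
# C_i = E_i / (k_i * (k_i - 1) / 2), where E_i is the number of edges
# among the neighbors of node i.

def local_cc(adj, i):
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return None  # C_i undefined for degree 0 or 1
    e = sum(1 for u in nbrs for v in adj[u] if v in nbrs) // 2
    return e / (k * (k - 1) / 2)

# Node 0's neighbors are {1, 2, 3}; only the edge 1-2 exists among
# them, so one of three possible neighbor edges is realized.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(local_cc(adj, 0))   # 1/3 ≈ 0.333
```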
is not defined for nodes with a degree of one, which is problematic as the global CC is defined as . In , the Internet is evaluated from 2002 to 2010 using AS-level graphs based on snapshots of BGP routing-table dumps. The authors discover that most structural metrics are time-invariant. An exception is , which decreases while maintaining its distribution, possibly due to a flattening of the net growth of transit providers in 2005 , as they are substituted by an increasing number of unobservable peer-to-peer links.
An enhanced version includes only nodes of degree higher than one :
Generally, many real-world networks exhibit a high global .
Another approach to deal with the bias problem uses the following definition:
As this approach normalizes by the sum of the possible edges between the neighbors of a node, nodes of degree zero or one can be considered without further precautions. It gives the same weight to each triangle in the network. As hubs are usually involved in a larger number of triangles than low-degree vertices, they tend to have a higher weight in the calculation than with the other two global versions, which weigh each node equally. A comparison thus reveals whether clustering occurs relatively more on hubs (), or on low-degree vertices ().
In , the average for nodes with degree , , is found to be a decreasing function of vertex degree because of the disassortative nature of Internet graphs (Section 3.6), where high-degree vertices mainly interconnect subgraphs consisting of smaller-degree vertices with few inter-subgraph connections. It holds that . Ravasz and Barabási  find a hierarchical exponent on the AS-level and a independent of on the router-level.
In , a CC is introduced that is not correlated to the Assortative Coefficient:
where , in contrast to , takes into account that not all the excess edges are available at the neighbors of . is obtained by a rewiring algorithm. This metric has the following characteristics: If all neighbors of a vertex have degree one, the CC is undefined; holds for all ; and if all neighbors of have a degree larger than or equal to , it follows that . Also, two different global versions are proposed:
For disassortative networks such as the Internet, and give highly contrasting results, whereas and are similar, and does not depend as strongly on the node degrees but remains constant or decays logarithmically with increasing vertex degree.
Comparing these simple-graph CCs, a high or as such indicates a graph of greater Disconnection and Transmission Speed Robustness as more alternative routes with hardly lengthened paths exist around the failed node. An assessment whether clustering rather occurs on hubs or on low-degree vertices , has direct implications for Internet resilience: Doyle et al.  (contrasting Albert et al. ) state that the Internet Backbone is formed by a low-degree mesh graph, and the access networks by hub ring-, star- or tree-like graphs located at the network periphery . This implies that should hold. An increased clustering of low-degree vertices () enhances global Transmission Speed and Traffic Robustness of the network as a whole; locally though, a higher clustering of hubs () significantly enhances the Disconnection Robustness of single peripheral clusters and end-users. Generally, a node of high has a low local impact in terms of these aspects, while one of very low is a local bottleneck. If graphs of equal Degree-Frequency Distribution (Section 3.1) and varying or are investigated, though, the interpretation becomes more complex: The higher , the more intra-connected the clusters and the less inter-connected. Thus, Disconnection and Transmission Speed Robustness against the removal of inner cluster entities is increased, but decreased against faults of community-peripheral bridge entities. To determine the position of a node in a cluster, the Participation Coefficient (Section 4.5) can be used.
As and calculate how many of the available neighbor ties are used for clustering, and thus not as community-connecting bridges, their interpretation corresponds to that of altered with constant distribution . Properly addressing this problem with a single metric, and are very suitable for discovering an overly pronounced community structure that bears the risk of fragmenting Disconnections. In worst-case scenarios of cutting crucial bridges, as presented in Section 5, such a structure could also cause severe Transmission Speed delays and Overload increases in the whole network.
In , the following local version is proposed for weighted networks:
where the global version is again obtained by averaging . Here, connections between nodes with high-weighted connections to node get a higher weight, regardless of their own weight. When comparing and , indicates a network where interconnected triplets are more likely to be formed by edges with larger weights. For , the opposite holds: clustering is less important for the traffic flow and thus also for network organization .
This metric is only able to assess the individual impact of node on Disconnection Robustness on a mere topological level. As the weight of neighbor-interconnecting edges is not taken into account, meaning the ability to redirect possibly large traffic flows through them when fails, it is not suitable for assessing local impacts on Transmission Speed and Traffic Robustness. A better approach in this sense is introduced in :
decreases smoothly with the weight of the neighbor-interconnecting edge, as opposed to , which first stays constant, but abruptly drops when the weight turns zero.
In , a weighted equivalent to the global is introduced:
The triplet value
can be defined as either the arithmetic mean of the weights of the links that form the triplet – which is severely influenced by extreme weight values – the less sensitive geometric mean, or the maximum respectively minimum value of the edge weights. Given two triplets with the same average weight, the latter two options assign a higher/ lower value to the one with larger weight dispersion, thus making the variant with the minimum definition of the triplet value very useful for global Transmission Speed and Traffic Robustness.
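The four triplet-value options can be made concrete as follows. This is a sketch of the generalized weighted CC (total value of closed triplets divided by total value of all triplets), assuming a dict-of-dicts weight representation; all names are ours:

```python
import math

def weighted_cc(wadj, mean="min"):
    """Generalized weighted clustering coefficient: the total value of closed
    triplets divided by the total value of all triplets. A triplet centred at
    v consists of two edges (v,u) and (v,w); its value is a mean of the two
    edge weights. wadj: dict node -> {neighbour: weight}."""
    means = {"arith": lambda a, b: (a + b) / 2,
             "geo":   lambda a, b: math.sqrt(a * b),
             "min":   min,
             "max":   max}
    f = means[mean]
    total = closed = 0.0
    for v in wadj:
        nbrs = sorted(wadj[v])
        for i in range(len(nbrs)):
            for j in range(i + 1, len(nbrs)):
                u, w = nbrs[i], nbrs[j]
                value = f(wadj[v][u], wadj[v][w])
                total += value
                if w in wadj[u]:        # the triplet is closed (a triangle)
                    closed += value
    return closed / total if total else float("nan")
```

Switching `mean` between `"arith"`, `"geo"`, `"min"` and `"max"` reproduces the four definitions discussed above.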
Furthermore, the definition of Wasserman and Faust  is used to make eq. 28 applicable to directed graphs: the nodes , , form a triplet if an edge points from to and one from to , and a closed triplet if, additionally, a direct link from to exists, which preserves the path if fails. This definition can thus be used to measure all three Robustness Aspects on the AS level.
4.2 Edge Clustering Coefficient
L, R simple (all) local impact static ✓
In this definition of the Edge Clustering Coefficient , denotes the number of triangles the edge belongs to. According to Newman , the in the numerator is added in order not to overestimate the importance of edges that do not belong to triangles and connect to a low-degree vertex. is not defined for edges that connect degree-one nodes.
More complex loops, such as squares, can also be taken into account. Then, low values identify edges that connect different communities and are bottlenecks not only on a local scale as in eq. 29, with high impact on Disconnection and Transmission Speed Robustness. As stated in , this holds only for networks with a high average CC, which is the case for the Internet.
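Assuming the triangle-based variant with the correction term in the numerator and the maximum possible number of triangles, min(k_i − 1, k_j − 1), in the denominator, the Edge Clustering Coefficient can be sketched as follows (names are ours):

```python
def edge_cc(adj, u, v):
    """Edge Clustering Coefficient of edge (u,v): (number of triangles the
    edge belongs to, plus 1) divided by the maximum possible number of such
    triangles, min(k_u - 1, k_v - 1). adj: node -> set of neighbours."""
    triangles = len(adj[u] & adj[v])      # common neighbours close a triangle
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    if denom == 0:
        return float("nan")               # undefined at degree-one endpoints
    return (triangles + 1) / denom
```

On two triangles joined by a bridge, the bridge edge obtains the lowest value, flagging it as the inter-community bottleneck.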
L, To, R simple (weighted) global static ✓
Determining the clusters or communities in large graphs is of particular significance for detecting vulnerable points and assessing the network structure. To this end, Modularity  measures the quality of a given global network division into communities, where is a matrix whose elements are defined as the fraction of all links in the graph that interconnect communities and . Consequently, gives the total fraction of inner-community edges. is the fraction of edges that connect to vertices in community . In a graph where edges connect vertices randomly without considering their community, holds, thus . Therefore, a low indicates a bad division into communities, and a graph where every community represents one disconnected cluster.
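Given a fixed community assignment, Modularity can be evaluated directly from the definition above (intra-community edge fraction minus its random-graph expectation); a sketch with our own naming conventions:

```python
def modularity(adj, community):
    """Modularity Q of a given partition: the fraction of intra-community
    edges minus its expectation under community-blind random edge placement.
    adj: node -> set of neighbours; community: node -> community label."""
    m2 = sum(len(adj[v]) for v in adj)        # 2m: every edge counted twice
    e = {}                                    # intra-community edge fraction
    a = {}                                    # fraction of edge ends per community
    for u in adj:
        cu = community[u]
        for v in adj[u]:
            a[cu] = a.get(cu, 0.0) + 1 / m2
            if cu == community[v]:
                e[cu] = e.get(cu, 0.0) + 1 / m2
    return sum(e.get(c, 0.0) - a[c] ** 2 for c in a)
```

Placing all nodes in one community yields Q = 0, the random-graph baseline.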
The following two algorithms are the most suitable out of the ones provided in [86, 88] for Internet graphs since they apply to real networks without homogeneous community sizes and knowledge of the number of communities, using minimal calculation time:
4.3.1 Modularity Matrix
is the Kronecker Delta. The entries of the eigenvector corresponding to the largest positive eigenvalue divide the nodes into a group with positive eigenvector values, and one with negative values. Existing edges between them are not removed as this would change the degrees of the nodes and hence bias further calculations. The calculation is repeated within the groups until the Modularity Matrix eigenvalues are all zero or negative, which indicates that no further useful division is possible. The advantage of this algorithm compared to other bisection methods is that the sizes of the communities, into which the network finally splits, do not have to be homogeneous or known. The disadvantage is that at any step only a division into exactly two communities is possible. As stated in , dividing a network into two parts and then dividing one of those again does not yield the best possible result for a division into three communities.
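A minimal sketch of one bisection step of this method: the modularity matrix B = A − kkᵀ/2m is built implicitly, and the eigenvector of its most positive eigenvalue is found by power iteration after a diagonal shift. The shift bound, iteration count and names are our assumptions:

```python
import random

def modularity_spectral_split(adj):
    """One bisection step of the leading-eigenvector method: the sign pattern
    of the eigenvector belonging to the most positive eigenvalue of the
    modularity matrix B = A - k k^T / 2m splits the nodes into two groups.
    adj: node -> set of neighbours."""
    nodes = sorted(adj)
    k = {v: len(adj[v]) for v in nodes}
    m2 = sum(k.values())                      # 2m
    def B(i, j):                              # modularity matrix entry
        return (1.0 if j in adj[i] else 0.0) - k[i] * k[j] / m2
    # Shift by a bound on the spectral radius so that the most positive
    # eigenvalue of B dominates the power iteration on B + shift*I.
    shift = max(sum(abs(B(i, j)) for j in nodes) for i in nodes)
    x = {v: random.random() - 0.5 for v in nodes}
    for _ in range(500):
        y = {i: shift * x[i] + sum(B(i, j) * x[j] for j in nodes)
             for i in nodes}
        norm = max(abs(t) for t in y.values()) or 1.0
        x = {i: y[i] / norm for i in y}
    group = {v for v in nodes if x[v] >= 0}
    return group, set(nodes) - group
```

On two triangles joined by a single edge, the sign split recovers the two triangles.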
4.3.2 Edge Betweenness Partitioning Algorithm
The Edge Betweenness Centrality metric (Section 7) used in this algorithm is the ratio of shortest paths between two nodes that pass through the considered edge, summed up over all node pairs. The idea behind it is that communities are characterized by having scarce inter-community edges that act as bottlenecks, obtaining high Edge Betweenness Centrality values [86, 46]. The algorithm repeatedly deletes the highest valued links and recalculates the Modularity and the Edge Betweenness Centrality after each removal. Finally, the state with the highest Modularity is chosen.
The implementation of this algorithm does not require homogeneity of community sizes or knowledge about their number, and as opposed to the Modularity Matrix, the graph can be split into any number of communities. Still, the whole process is very costly since after every edge removal, the Edge Betweenness Centrality of each remaining edge must be calculated again, which takes every time. In total, this can take on a sparse graph or on a non-sparse graph in the worst case. In order to reduce the calculation time and to introduce a stochastic element, Tyler et al.  propose to only sum over a random subset of vertices, giving the partial Edge Betweenness Centrality scores for all edges as a Monte Carlo estimation, which provides good results with reasonably small sample sizes.
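The exact (non-sampled) Edge Betweenness Centrality that this algorithm repeatedly recomputes can be obtained with Brandes-style dependency accumulation; the following sketch for unweighted, undirected graphs uses our own names:

```python
from collections import deque, defaultdict

def edge_betweenness(adj):
    """Edge Betweenness Centrality for all edges of an unweighted, undirected
    graph: for every node pair, the fraction of shortest paths passing through
    each edge, summed over all pairs. adj: node -> set of neighbours."""
    eb = defaultdict(float)
    for s in adj:
        dist = {s: 0}
        sigma = defaultdict(float)            # number of shortest s-v paths
        sigma[s] = 1.0
        preds = defaultdict(list)             # predecessors on shortest paths
        order = []
        queue = deque([s])
        while queue:                          # BFS from source s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = defaultdict(float)            # dependency accumulation
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1 + delta[w])
                eb[frozenset((v, w))] += c
                delta[v] += c
    for e in eb:                              # each pair was counted from
        eb[e] /= 2                            # both endpoints as source
    return dict(eb)
```

On two triangles joined by a bridge, the bridge accumulates the 9 crossing pairs, while an intra-triangle edge only serves its own endpoint pair.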
4.4 Z-Score of the Within Module-Degree
L, R simple (all) local impact static ✓
Z-Score of the Within Module-Degree  calculates the rank of a node within its community, which is higher for a larger . is the number of connections that node has within its community, is the average number of intra-community connections of the nodes in , and the corresponding standard deviation. This is a normalized version of the Node Degree (Section 3.1) if just the community as a subgraph is considered. This approach of finding inner-community hubs is sufficient by itself for detecting vertices of high impact on inner-cluster Disconnection and Transmission Speed Robustness, as, by definition, no bottlenecks exist inside communities.
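Assuming a precomputed community labelling, the z-score can be sketched as follows (the population standard deviation and all names are our assumptions):

```python
import statistics

def within_module_z(adj, community, v):
    """Z-score of the within-module degree of node v: its number of
    intra-community links, standardised over all members of its community.
    adj: node -> set of neighbours; community: node -> community label."""
    c = community[v]
    members = [u for u in adj if community[u] == c]
    kin = [sum(1 for w in adj[u] if community[w] == c) for u in members]
    mean = statistics.fmean(kin)
    sd = statistics.pstdev(kin)               # population standard deviation
    if sd == 0:
        return 0.0   # every member has the same within-module degree
    k_v = sum(1 for w in adj[v] if community[w] == c)
    return (k_v - mean) / sd
```

In a star-shaped community, the centre obtains a clearly positive z-score, identifying it as the inner-community hub.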
4.5 Participation Coefficient
L, To, R simple (weighted) local impact & liability static ✓
is the number of links from node to community . The Participation Coefficient  measures how equally distributed the connections of are among all communities. indicates that is only connected to nodes in its own community, while
shows that its edges are uniformly distributed among all communities. It serves as an indicator of the impact of a node, as a bridge, on the Robustness against community Disconnection or Transmission Speed degradation. Furthermore, the removal of a small share of neighbors of a node with high can degrade the node’s Transmission Speed severely, if the removed neighbors belong to the same community.
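Under the common definition P_v = 1 − Σ_c (k_vc / k_v)², the Participation Coefficient can be sketched as follows (names are ours):

```python
def participation_coefficient(adj, community, v):
    """Participation Coefficient P_v = 1 - sum over communities c of
    (k_vc / k_v)^2, where k_vc is the number of links from v into community c.
    0 means all links stay in one community; values near 1 mean the links are
    spread evenly over many communities."""
    k = len(adj[v])
    if k == 0:
        return 0.0
    per_comm = {}
    for w in adj[v]:                          # count links per community
        per_comm[community[w]] = per_comm.get(community[w], 0) + 1
    return 1 - sum((kc / k) ** 2 for kc in per_comm.values())
```

In two triangles joined by a bridge, a bridge endpoint scores 4/9, while a purely intra-community node scores 0.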
Knowledge about the community structure of a graph provides insights into its hierarchical characteristics and gives first indicators of bottlenecks. The algorithms presented with Modularity are useful tools for community detection, and a resulting high value of the metric itself or of the CC of available neighbor edges, , serves as an indicator of an overly pronounced clustering structure that is prone to severe Disconnections, Transmission Speed decreases and Overloads. In the presence of such a structure, the role of a node inside its communities as determined by Z-Score of the Within Module-Degree and Participation Coefficient is crucial; especially the latter detects vulnerable points.
In contrast to , the local by itself measures the ability to locally redirect information flows if node fails, thus its local impact on Disconnection and Transmission Speed Robustness.
The importance of clustering for network flows can be estimated by the comparison of and . As indicators for Transmission Speed and Traffic Robustness, in a weighted graph serves best by accounting for weights of neighbor-interconnecting edges; in a weighted directed network, with the minimum-value triplet definition can be used.
Clustering in a network contributes to its local robustness but entails the presence of fewer bridges between these clusters, and hence a reduced global Disconnection Robustness. In the subsequent sections, the fundamental and NP-complete task  of finding a graph-separating Minimum Cut Set and approximating bi-partitioning algorithms are discussed. After this worst-case approach, metrics to assess Connectivity under multiple random failures are presented.
5.1 Vertex-, Edge- & Conditional Connectivity (Cohesion, Adhesion & P-Connectivity)
To, R simple graph global worst-case NP-complete
The Vertex- and Edge-Connectivity of a graph are defined as the minimum number of nodes and edges whose removal disconnects the graph, respectively [18, 131]. A severe flaw is that the resulting component sizes are not accounted for, making these metrics unsuitable for robustness analysis: On sparse graphs such as an Internet graph containing stub nodes, the result will simply equal the minimum Node Degree. On networks with a high minimum Node Degree and severe bottlenecks, this problem is NP-complete .
A more useful, though also NP-complete concept is the Conditional Connectivity . It is the smallest number of vertices (links) whose removal disconnects the graph, with every component having property . In order to address the above-mentioned flaws, can be set to the minimum of the resulting component sizes. Further metrics that address this problem are presented in the following sections.
To, R simple graph global worst-case ✓
Here, , and are three non-empty sets of nodes that together form the entire graph G. is the set of vertices that, if deleted, causes a separation from and . Therefore, a low value for indicates a network of low Disconnection Robustness, and a high value a robust one.
Calculating the minimum of this metric is NP-hard . In , this problem is addressed by considering every bi-partition of the graph into and , defining as node subset in adjacent to , and analogously, and taking
Interestingly, the inverse of this metric is the minimum possible average Betweenness Centrality (Section 7) of the nodes in , as every shortest path from to must pass through .
5.3 Cheeger Constant (Isoperimetric Number)
To, R simple graph global worst-case NP-complete
This metric  is similar to the Sparsity bisection approach, but focuses on edges instead of nodes. It calculates the minimum ratio of the number of edges cut in a graph bisection to the number of nodes in the resulting smaller cluster. Small cut sets and large impacts in terms of Reachability (Section 5.12) are rewarded. holds for a disconnected graph, and for a fully connected one.
In , bounds for the NP-complete Cheeger Constant are derived using the calculable Algebraic Connectivity (Section 8.4): . However, they are not very tight and thus of little use due to the scale-free nature of the Internet. With the subsequently presented Network Partitioning Algorithm, the Cheeger Constant can be approximated by setting the balancing criterion to , repeatedly running the algorithm for , respectively, and taking the minimum resulting .
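For small graphs, the Cheeger Constant can be evaluated exactly by exhaustive search over all candidate smaller sides, which also makes the exponential cost – and hence the need for approximation on Internet-sized graphs – tangible. A sketch:

```python
from itertools import combinations

def cheeger_constant(adj):
    """Exact Cheeger Constant: the minimum, over all node subsets S with
    |S| <= n/2, of (edges leaving S) / |S|. Exponential in n, so usable only
    on toy graphs. adj: node -> set of neighbours."""
    nodes = sorted(adj)
    n = len(nodes)
    best = float("inf")
    for size in range(1, n // 2 + 1):
        for subset in combinations(nodes, size):
            s = set(subset)
            cut = sum(1 for u in s for v in adj[u] if v not in s)
            best = min(best, cut / len(s))
    return best
```

On two triangles joined by a single edge the constant is 1/3 (cut the bridge, smaller side of 3 nodes), while a complete graph on four nodes yields 2.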
5.4 Minimum m-Degree
To, R simple graph global worst-case NP-C. &
Very similar to the Cheeger Constant (Section 5.3), the Minimum m-Degree denotes the smallest number of edges that must be removed to split the network into two components, one of which contains exactly nodes . Since for estimating Disconnection Robustness the exact component size is not crucial, this restriction can be relaxed as follows.
5.4.1 Network Partitioning Algorithm
This algorithm finds a graph bi-partitioning Minimum Edge Cut Set of size with splitting ratios of approximately . First, a balancing criterion of nodes is set, by which each partition may deviate at most from the splitting ratio . A separation then bisects the graph, denoting the resulting inter-partition links as cuts. Afterward, the following pass is executed: The gain of all nodes, initially denoted as unlocked, is calculated as the decrease in the number of cuts if the node is moved to its opposite partition. The largest (possibly negative) gain node is moved, as long as the balancing criterion is not violated, and denoted as locked. All node gains are recalculated, and the process is repeated until either all nodes are locked or the balancing criterion prevents further moves. The split with the fewest cuts is executed. Then, all nodes are unlocked again, and the procedure is repeated until the cut set cannot be further reduced. The computation takes per pass and the convergence occurs quite fast. In the worst case, passes are needed .
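The pass described above can be sketched as a simplified Fiduccia–Mattheyses-style refinement. The balance criterion is taken here relative to the initial partition sizes, each node moves at most once, and the best prefix of moves is kept; all names are our own:

```python
def fm_pass(adj, part, balance=1):
    """One simplified refinement pass: repeatedly move the unlocked node with
    the largest gain (decrease in cut edges) to the opposite partition, lock
    it, and finally keep the intermediate partition with the fewest cuts.
    adj: node -> set of neighbours; part: node -> 0/1 partition label."""
    part = dict(part)
    sizes = [sum(1 for v in part if part[v] == s) for s in (0, 1)]
    target = sizes[:]                  # balance relative to the initial sizes
    def cuts():
        return sum(1 for u in adj for v in adj[u]
                   if u < v and part[u] != part[v])
    locked = set()
    history = [(cuts(), dict(part))]
    while len(locked) < len(part):
        best, best_gain = None, None
        for v in part:
            if v in locked:
                continue
            side = part[v]
            if abs(sizes[1 - side] + 1 - target[1 - side]) > balance:
                continue               # move would violate the balance criterion
            gain = sum(1 if part[w] != side else -1 for w in adj[v])
            if best_gain is None or gain > best_gain:
                best, best_gain = v, gain
        if best is None:
            break                      # every remaining move is unbalanced
        sizes[part[best]] -= 1
        part[best] = 1 - part[best]
        sizes[part[best]] += 1
        locked.add(best)
        history.append((cuts(), dict(part)))
    return min(history, key=lambda t: t[0])[1]   # fewest-cuts snapshot
```

Starting from an alternating (worst-case) labelling of two bridged triangles, one pass already recovers the single-cut bisection.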
Wang et al.  studied Internet models both on AS and router levels and find that approximately 6.5% of the links have to be removed in order to obtain a splitting ratio of 50%. For a splitting ratio up to 30%, the disconnected parts are highly fragmented pieces in the router model, whereas in the AS model a large cluster contains 20-40% of disconnected nodes. Afterward, in both models, the vertices glue together, which enables them to communicate. The largest cluster in a scale-free network separated this way remains scale-free and functions well.
Furthermore, it is analyzed whether the Minimum Edge Cut Set for randomly distributed nodes, which can form a connected cluster, can be significantly reduced by this algorithm compared to just cutting off all the links adjacent to the target nodes. The results show that this is only possible if the target nodes are connected, and only at the expense of cutting off many non-target nodes with them.
5.5 Ratio of Disruption
To, R simple graph global worst-case NP-complete
Another metric that assesses edge cuts is the Ratio of Disruption where denotes the floor function :
Since this metric takes the size of the other (in most cases larger) component into account, it is more likely to result in more equally-sized graph cuts, but at the expense of bigger cut-set sizes than the Cheeger Constant. This metric is thus designed to find cut-sets which have considerably more severe impacts on a graph’s Reachability (Section 5.12). Unfortunately, it is NP-complete and cannot be conveniently approximated by the Network Partitioning Algorithm (Section 5.4.1). Therefore, this important concept is currently not applicable to large Internet graphs.
5.6 Local Delay Resilience (Resilience)
To, R, Tr simple, directed local liability / global worst-case NP-hard ()
is the subgraph induced by nodes within the -hop environment of node . The local Local Delay Resilience measures the size of its Minimum Cut Set of splitting ratio , assessing its proneness to local Disconnection or severe Transmission delays when entities around it fail. The global Local Delay Resilience is a function of the hop-radius , but as the number of vertices within it is higher in graphs with a high Expansion (Section 6.5), is presented as a function of , denoting the average number of vertices within hops. Karypis and Kumar  present a partitioning heuristic for this NP-hard problem.
In , the investigated AS and router-level graphs are stated to have a high Local Delay Resilience for any . Their directed adaptation, which takes policy-based routing into account, yields substantially lower values than the simple one. Now, is induced by nodes of policy-compliant paths no longer than hops. Thus, in AS-level graphs, only paths that do not violate provider-customer relationships are considered; in order to determine a router-level policy path, first the corresponding AS-level path is computed and then shortest paths within the ASes.
The name of this metric given above is an adapted version of the one found in the literature. It needs to be emphasized that it only measures one aspect of Resilience, and could by no means aggregate all facets of this complex concept (Section 2.3) into one single-valued number.
5.7 Toughness, Integrity, Scattering Number
To, R simple graph global worst-case N/A NP-complete
The previously presented Connectivity metrics concentrate on detecting worst-case bisections, but do not conveniently assess the damage of partitioning the network into more than two components. Toughness , Integrity  and Scattering Number  aim to fill this gap by finding cut sets (either the set of removed nodes or edges ), which minimize or maximize certain parameters.
They account for important characteristics of the disconnected graph, namely cut set size , number of nodes in the largest component and the number of resulting components , but every metric omits one of these. An inner similarity of these metrics can be ascertained by the results they provide: Knowing the Scattering Number and some basic information about the graph, a lower or upper bound for the other metrics can be derived .
To, R simple graph global worst-case N/A NP-complete
As opposed to the metrics presented in the previous section, Tenacity , Edge Tenacity and Mixed Tenacity  take all of the three important parameters into account for evaluating the Disconnection Robustness of a graph.
Here, and represent the sets of removed nodes and edges, respectively, and the resulting number of nodes and edges in the largest component of graph , and its number of components.
A study on the behavior of Tenacity, or an approximating algorithm, applied to Internet graphs has yet to be conducted. As stated in , a large set of graph classes exists that are edge-tenacious, meaning that the minimum of the term measured by is obtained by cutting the entire set of edges in the graph. As the same presumably holds for Internet-representing graphs, this metric would become useless.
5.9 Percolation Threshold
To, R simple graph global failures ✓
Percolation Threshold measures the threshold of uniform node failure probability after which the network disintegrates, meaning that no giant component of graph size order exists . For SF-networks with exponent like the Internet, for infinitely large graphs. For finite Internet networks, the Largest Component Size (Section 5.12) decreases with increasing node failure probability , but the largest component persists for a of nearly 100%.
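For finite graphs, the behaviour around the Percolation Threshold can be explored by simulation: remove each node independently with a given failure probability and measure the fraction of nodes remaining in the largest component. A sketch, with our own names:

```python
import random

def largest_component_fraction(adj, p_fail, rng):
    """Fraction of all nodes lying in the largest connected component after
    each node fails independently with probability p_fail.
    adj: node -> set of neighbours; rng: a random.Random instance."""
    alive = {v for v in adj if rng.random() >= p_fail}
    seen, best = set(), 0
    for v in alive:
        if v in seen:
            continue
        stack, size = [v], 0
        seen.add(v)
        while stack:                      # depth-first component traversal
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w in alive and w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best / len(adj)
```

Averaging this quantity over many runs for increasing p traces the disintegration curve of the network.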
5.10 Reliability Polynomial
To, R simple graph global failures ✓
The K-Reliability Polynomial measures the probability that, given a uniform edge-reliability and thus edge failure probability , all nodes of a subset are connected. The number of -connecting subgraphs with edges is denoted by . This is the most general definition, denoted K-Reliability. If for all , and holds, Page and Perry  rank the importance of edge higher than that of edge based on purely topological characteristics. Here, corresponds to graph , where the nodes adjacent to edge are contracted into one vertex. The 2-Terminal Reliability, , calculates the probability that a message can be transmitted between both members of .
By calculating the All-Terminal Reliability, , before and after removing single entities, their importance to the topological structure can be ranked for a given . The All-Terminal Reliability contains the size of minimum edge cutsets, which disconnect the graph (corresponding to Edge-Connectivity, Section 5.1) as well as their number. According to the authors, the Number of Spanning Trees (Section 8.6) is correlated to All-Terminal Reliability. The algorithm provided in  is not practical for large Internet graphs, but can be used to measure the Disconnection Robustness of its Backbone.
Interestingly, the coefficients of the polynomial indicate the Number of Spanning Trees, Size of Smallest Edge-Cutset and Number of Smallest Edge-Cutsets. The latter two metrics could be useful for analysing the Backbone, if the Size is not determined by the node with minimum degree. An algorithm for Reliability Polynomial calculation is presented in .
5.11 Partition Resilience Factor (Resilience Factor)
To, R simple graph global failures ✓
Again, an adapted metric name is used here to distinguish it from the much broader Resilience concept. The Partial -Connectivity denotes the ratio of node failure sets of size , which disconnect the graph. A low Partition Resilience Factor thus indicates a well-connected network . It attempts to capture subtle Disconnection Robustness differences, but it does not consider the severity of disconnection measured by Reachability (Section 5.12), and is computationally very costly. Only an application to formerly identified Backbone graphs seems appropriate due to much smaller network sizes and the high importance of every single vertex.
5.12 Analysis of Disconnected Components & Reachability (Flow Robustness)
To, R simple graph global dynamic different codomains ✓
In order to assess the impact of an attack or failure, it is necessary to take some characteristics of the disconnected components into account. According to Sun et al. , these could be the Total Number of Isolated Components , the Fraction of Nodes in the Largest Component and the Average Size of Isolated Components . In  it is stated that the latter is not suitable for an application to the Internet as the disconnected component size seems to be heavy-tailed, always resulting in a low Average Size of Isolated Components. In , investigating an AS-level network measured in 2000, less than 3% of the highest-degree nodes have to be removed to reach .
For large graphs, the Distribution of Component Class Frequency  assigns disconnected components of similar size to disjunct classes , whose numbers indicate the decimal logarithm of the upper bound size of the class, and counts the nodes in each of them. Another related metric is the Distribution of the Relative Number of Nodes per Class , which is calculated as . It gives the probability that a node can communicate with at most a given number of nodes for a given percentage of removed nodes.
All these aspects are covered by Reachability (or Flow Robustness) , which is the reason why Disconnection Robustness is best evaluated in terms of the resulting decrease in . It is defined as , with if a path exists between nodes and , and otherwise . It calculates the fraction of node pairs that can still communicate, thus . Confirming the findings in (Section 3.1), a positive correlation is found between the exponent of an SF network and its Disconnection Robustness in terms of under node attacks .
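Reachability does not require all-pairs path searches: two nodes can communicate exactly when they lie in the same connected component, so the metric follows from the component sizes alone. A sketch:

```python
def reachability(adj):
    """Flow Robustness / Reachability: the fraction of ordered node pairs
    still connected by a path, computed from the sizes of the connected
    components. adj: node -> set of neighbours."""
    n = len(adj)
    seen = set()
    connected_pairs = 0
    for v in adj:
        if v in seen:
            continue
        comp = {v}                        # collect the component containing v
        stack = [v]
        seen.add(v)
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in comp:
                    comp.add(w)
                    seen.add(w)
                    stack.append(w)
        c = len(comp)
        connected_pairs += c * (c - 1)    # ordered pairs inside the component
    return connected_pairs / (n * (n - 1))
```

Two disjoint triangles, for instance, yield a Reachability of 12/30 = 0.4.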
In  the of the highest level of a multilevel AS graph after faults on lower levels is investigated. The multilevel graph consists of levels of simple graphs; for any pair of them, the set of all nodes in the higher level is a subset of the node set in the lower level. A connecting path between two nodes on a higher level can only exist if such a path also exists on the lower level. A partition of the tier-1 ISPs is found to be unlikely when the logical link layer is attacked. An increase in , a switch from adaptive to non-adaptive attacks, as well as from static routing on the original shortest-path to perfect dynamic routing resulted in a significantly higher Reachability.
The fundamental task of calculating exact Minimum Cuts is NP-complete. Cohesion and Adhesion are not useful, as resulting component sizes are not considered (contrary to the other worst-case cut finders). A fundamental drawback of those is that suitable approximating algorithms exist so far only for bisections: Sparsity and the Network Partitioning Algorithm are both apt for global Disconnection Robustness analysis. Yet, their calculation requires predetermined sizes of the resulting components, contrary to the NP-complete Cheeger Constant, which can be conveniently approximated by the repeated execution of the Network Partitioning Algorithm. For evaluating general cuts, Reachability in combination with the fraction of removed entities is a convenient tool, as it implicitly accounts for the number and sizes of all components. Nodes close to local bottlenecks are found by Local Delay Resilience, which needs little calculation time, relying only on the subgraph .
Of the measures presented in Sections 5.7 and 5.8, only the Tenacity metrics consider all features of graph partitions of more than two components. But as they probably yield trivial results on Internet graphs, and as all metrics in these sections are NP-complete, none is suitable for robustness assessment.
Percolation Threshold suitably assesses minimum Connectivity requirements after multiple failures, namely that at least a large part of the network remains connected and functioning. Reliability Polynomial and Partition Resilience Factor are more precise failure evaluation measures, but due to their computational cost they are only useful for application to the Backbone.
Distance metrics consider the path length between node pairs. The simple or directed distance is defined as the minimum number of hops it takes to get from node to . Its weighted counterpart, , further takes the speed and capacities of the connections into account.
6.1 Average Shortest Path Length
(Closeness Centrality/ Hop Count/ Average Diameter/ Network Diameter/ Geodesic Distance)
Tr simple (all) local impact / global static / ✓( / )
The local Average Shortest Path Length (ASPL) is useful for finding central nodes that can play a significant role in forwarding information. The global ASPL characterizes the average Transmission Speed with which node pairs communicate . Its degradation due to Challenges hence serves as an indicator for comparing Transmission Speed Robustness of networks that display differing characteristics as measured by other metrics. An implementation of Dijkstra’s algorithm is used in .
In SF networks, the ASPL tends to remain unchanged even when up to 5% of the nodes fail . After a certain threshold of attacks, however, the ASPL is no longer suitable as it approaches when a node is disconnected. To address this issue, only or just nodes in the giant connected component  could be taken into account. These versions, however, can obtain low values for disintegrated graphs, suggesting a graph with redundant connections.
Diameter-Inverse-K (DIK) addresses this problem by excluding the infinite paths :
where is the number of node pairs and the number of pairs connected by a path. It increases continuously as the graph disconnects.
As Distance Distribution Width, the standard deviation of path lengths , is small, is only slightly smaller than the Diameter [32, 90], whose codomain is if infinite paths are taken into account, and otherwise.
Due to these findings, , and overlap considerably and can be used interchangeably in connected graphs, while is preferable in disconnected ones.
The formulae above refer to simple networks. For weighted networks, the following adaptation proposed in  is applicable to every metric containing :
The weighted shortest path between two nodes is the path for which the sum of the inverted weights of the traversed edges is minimal. It is independent of the number of traversed edges, which can be very useful since the route that a data packet takes depends strongly on the speed and capacities of the connections.
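Under this adaptation, a weighted shortest path can be found with Dijkstra's algorithm on the inverted weights, so that high-capacity links count as short. A sketch, with our own names:

```python
import heapq

def weighted_distance(wadj, src, dst):
    """Weighted shortest-path length from src to dst, where each traversed
    edge contributes the inverse of its weight (capacity/speed), found with
    Dijkstra's algorithm. wadj: node -> {neighbour: weight}."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, v = heapq.heappop(heap)
        if v == dst:
            return d
        if d > dist.get(v, float("inf")):
            continue                      # stale queue entry, skip it
        for w, weight in wadj[v].items():
            nd = d + 1.0 / weight
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return float("inf")                   # dst is unreachable
```

Note that a single high-capacity edge (e.g. weight 4, cost 0.25) beats a two-hop detour over slower links, which is exactly the hop-count independence described above.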
6.2 Global Network Efficiency (Harmonic Centrality / Closeness Centrality)
Tr simple (all) local impact / global static ✓( / )
Similar as with the metrics presented in the previous section, the degradation of Global Network Efficiency indicates the Transmission Speed Robustness of a network under Challenges. By summing up the reciprocal of the distances, the problem of disconnected graphs is solved [105, 64]: For the local version, only holds for a totally disconnected node , while indicates that node