1 Introduction
In recent times, the possibility of accessing, handling and mining large-scale network datasets has revived interest in their investigation and theoretical characterization, along with the definition of new modeling frameworks. In particular, mapping projects of the World Wide Web (WWW) and the physical Internet offered the first chance to study the topology and traffic of large-scale networks. Other studies gradually followed, describing population networks of practical interest in social science, critical infrastructures and epidemiology [Barabasi:2000, Amaral:2000, mdbook, psvbook]. The study of large-scale networks, however, confronts us with an array of new challenges. The definitions of centrality, hierarchies and structural organizations are hindered by the large size of these networks and by the complex interplay of connectivity patterns, traffic flows and geographical, social and economic attributes characterizing their basic elements. In this context, a large research effort is devoted to providing effective visualization and analysis tools able to cope with graphs whose size may easily reach millions of vertices.
In this paper, we propose a visualization algorithm based on the $k$-core decomposition, able to uncover in a two-dimensional layout several topological and hierarchical properties of large-scale networks. The $k$-core decomposition [Batagelj02] consists in identifying particular subsets of the graph, called $k$-cores, each one obtained by recursively removing all the vertices of degree smaller than $k$, until the degree of all remaining vertices is larger than or equal to $k$. Larger values of “coreness” clearly correspond to vertices with larger degree and more central position in the network’s structure.
When applied to the graphical analysis of real and computer-generated networks, this visualization tool allows the identification of networks’ fingerprints, according to properties such as hierarchical arrangement, degree correlations and centrality. The distinction between networks with seemingly similar properties is achieved by inspecting the different layouts generated by the visualization algorithm. In addition, the running time of the algorithm grows only linearly with the size of the network, granting the scalability needed for the visualization of very large networks. The proposed visualization algorithm therefore appears to be a convenient method for the general analysis of large-scale complex networks and the study of their architecture. An implementation of the algorithm is publicly available [LANETVI].
The paper is organized as follows: after a brief survey of $k$-core studies (section 2), the basic definitions are introduced in section 3; the graphical algorithms are presented in section 4, along with the basic features of the visualization layout. Section 5 shows how the visualizations obtained with the present algorithm may be used for network fingerprinting, while section 6 is devoted to the application of the algorithm to the visualization of various real and computer-generated networks.
2 Related work
While a large number of algorithms aimed at the visualization of large-scale networks have been developed (e.g., see [COSIN]), only a few explicitly consider the $k$-core decomposition. Vladimir Batagelj et al. [Batagelj99] studied the $k$-core decomposition applied to visualization problems, introducing some graphical tools to analyse the cores, mainly based on the visualization of the adjacency matrix of certain $k$-cores. To the best of our knowledge, the algorithm presented by Baur et al. in the paper “Drawing the AS Graph in 2.5 Dimensions” [Baur04] is the only one completely based on a $k$-core analysis and directly targeted at the study of large information networks. This algorithm uses a spectral layout to place the vertices having the largest coreness. A combination of barycentric and iteratively directed-force layouts is then used to place the vertices of each shell, in decreasing order of coreness. Finally, the network is drawn in three dimensions, using the $z$ axis to place each coreness set in a distinct horizontal layer. It is important to stress that the spectral layout is not able to distinguish two or more disconnected components. The algorithm by Baur et al. is also tuned for representing AS graphs, and its total complexity depends on the size of the highest $k$-core (see [brandes03] for more details on spectral layout), making the computation time of this proposal largely variable. In this respect, the algorithm presented here is considerably different in that it can represent networks in which $k$-cores are composed of several connected components. Another difference is that representations in 2D are more suited for information visualization than other representations (see [2Dvs3D] and references therein). Finally, the algorithm parameters can be universally defined (see section 6), yielding a fast and general tool for analyzing all types of networks.
It is interesting to note that the notion of $k$-cores has recently been used in biologically related contexts, where it was applied to the analysis of protein interaction networks [bader03] or to the prediction of protein functions [gnomeinfo03, Wuchty05]. A further interesting application in the area of networking has been provided by Gaertler et al. [Gaertler04], where the $k$-core decomposition is used for filtering out peripheral Autonomous Systems (ASes) in the case of Internet maps.
3 $k$-core decomposition: main definitions
Let us consider a graph $G=(V,E)$ of $|V|=n$ vertices and $|E|=e$ edges; a $k$-core is defined as follows [Batagelj02]:
A subgraph $H=(C,E|C)$ induced by the set $C \subseteq V$ is a $k$-core or a core of order $k$ iff $\forall v \in C: \mathrm{degree}_H(v) \geq k$, and $H$ is the maximum subgraph with this property.
A $k$-core of $G$ can therefore be obtained by recursively removing all the vertices of degree less than $k$, until all vertices in the remaining graph have at least degree $k$.
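As an illustration of the recursive removal just described, the following minimal Python sketch computes the vertex set of a $k$-core. The adjacency-dictionary input format, the function name `k_core` and the toy graph are assumptions made for the example; they are not part of the original algorithm's interface.

```python
from collections import deque

def k_core(adj, k):
    """Vertices of the k-core of an undirected graph.

    adj: dict mapping each vertex to the set of its neighbours
    (an assumed input format, not prescribed by the text).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    removed = set()
    queue = deque(v for v, d in degree.items() if d < k)
    while queue:
        v = queue.popleft()
        if v in removed:
            continue
        removed.add(v)
        # Removing v lowers the degree of its surviving neighbours,
        # which may push them below k as well.
        for w in adj[v]:
            if w not in removed:
                degree[w] -= 1
                if degree[w] < k:
                    queue.append(w)
    return set(adj) - removed

# Toy graph: a triangle (a, b, c) with a two-vertex tail (d, e) hanging from a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}
print(sorted(k_core(adj, 2)))  # ['a', 'b', 'c']
```

In this toy graph the tail is pruned in two steps (first e, then d), which is the cascading effect that makes the removal recursive.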
Furthermore, we will use the following definitions:
A vertex $i$ has coreness $c$ if it belongs to the $c$-core but not to the $(c+1)$-core. We denote by $c_i$ the coreness of vertex $i$.
A shell $C_c$ is composed of all the vertices whose coreness is $c$. The maximum value $c$ such that $C_c$ is not empty is denoted $c_{max}$. The $k$-core is thus the union of all shells $C_c$ with $c \geq k$.
Each connected set of vertices having the same coreness $c$ is a cluster $Q^c$. Each shell $C_c$ is thus composed of clusters $Q^c_m$, such that $C_c = \cup_{1 \leq m \leq q^c_{max}} Q^c_m$, where $q^c_{max}$ is the number of clusters in $C_c$.
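The definitions of shells and clusters can be made concrete with a short Python sketch. It assumes that the coreness of every vertex has already been computed and is passed in as a dictionary; the function name `shells_and_clusters` and the toy graph are hypothetical.

```python
from collections import defaultdict

def shells_and_clusters(adj, core):
    """Split vertices into shells (by coreness) and each shell into clusters.

    adj:  dict vertex -> set of neighbours (assumed input format)
    core: dict vertex -> coreness, computed beforehand
    Returns (shells, clusters): shells[c] is the set of vertices of coreness c,
    clusters[c] is the list of connected clusters inside that shell.
    """
    shells = defaultdict(set)
    for v, c in core.items():
        shells[c].add(v)

    clusters = defaultdict(list)
    seen = set()
    for start in adj:
        if start in seen:
            continue
        c = core[start]
        cluster, stack = set(), [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            cluster.add(v)
            # Follow only edges that stay inside the same shell.
            for w in adj[v]:
                if w not in seen and core[w] == c:
                    seen.add(w)
                    stack.append(w)
        clusters[c].append(cluster)
    return dict(shells), dict(clusters)

# Toy graph: triangle (a, b, c), a tail d-e attached to a, a leaf f attached to b.
adj = {
    "a": {"b", "c", "d"}, "b": {"a", "c", "f"}, "c": {"a", "b"},
    "d": {"a", "e"}, "e": {"d"}, "f": {"b"},
}
core = {"a": 2, "b": 2, "c": 2, "d": 1, "e": 1, "f": 1}
shells, clusters = shells_and_clusters(adj, core)
```

Here the shell of coreness 1 splits into two clusters, {d, e} and {f}: the two groups are connected only through vertices of higher coreness.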
In Fig.1 we report a simple illustration of the $k$-core decomposition of a connected graph and its visual rendering. Every vertex of a connected graph belongs to the $1$-core. In Fig.1, we have highlighted the different $k$-cores using closed lines of different types. A dashed line encloses all the vertices in the $1$-core (the entire graph). Then, all vertices of degree $1$ are recursively cut out. In Fig.1 all these vertices are colored in blue. The other vertices maintain a degree of at least $2$ also after the pruning of the blue ones, therefore they are not eliminated. The remaining vertices form the $2$-core, enclosed by a dotted line. Further pruning identifies the innermost set of vertices, the $3$-core. One can check that all red vertices in Fig.1 have internal degree (i.e. between red vertices) at least $3$. This core is highlighted by a dash-dotted line. This simple process and its visual rationalization are at the basis of the construction of our visualization algorithm and layout.
4 Graphical representation
The visualization algorithm we propose places vertices in $2$ dimensions, the position of each vertex depending on its coreness and on the coreness of its neighbors. A color code allows for the identification of core numbers, while the vertex’s original degree is conveyed by its size, which depends logarithmically on the degree. For the sake of clarity, our algorithm represents a small percentage of the edges, chosen uniformly at random. As mentioned, a central role in our visualization method is played by the multi-component representation of $k$-cores. In the most general situation, indeed, the recursive removal of vertices having degree less than a given $k$ can break the original network into various connected components, each of which might even be broken once again by the subsequent decomposition. Our method takes this possibility into account; however, we will first present the algorithm in the simplified case (Table 1), in which none of the $k$-cores is fragmented. Then, this algorithm will be used as a subroutine for treating the general case (Table 2).
4.1 Drawing algorithm for $k$-cores with single connected component
The network under study is represented by a graph $G=(V,E)$, where $V$ is the set of vertices and $E$ is the set of links.
$k$-core decomposition. The coreness of each vertex is computed (according to the procedure described in section 3) and stored in a vector, along with the shells $C_c$ and the maximum coreness value $c_{max}$. Each shell is then decomposed into clusters $Q^c_m$ of connected vertices, and each vertex $i$ is labeled by its coreness $c_i$ and by a number $q_i$ representing the cluster it belongs to.
The two-dimensional graphical layout. The visualization is obtained by assigning to each vertex $i$ a pair of polar coordinates ($\rho_i$, $\alpha_i$): the radius $\rho_i$ is a function of the coreness of the vertex $i$ and of its neighbors; the angle $\alpha_i$ depends on the cluster number $q_i$. In this way, shells are displayed as layers with the form of circular shells, the innermost one corresponding to the set of vertices with highest coreness. A vertex $i$ belongs to the $c_{max} - c_i$ layer from the center.
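More precisely, $\rho_i$ is computed according to the following formula:
$$\rho_i = (1-\epsilon)(c_{max}-c_i) + \frac{\epsilon}{|V_{c_j \geq c_i}(i)|} \sum_{j \in V_{c_j \geq c_i}(i)} (c_{max}-c_j) \qquad (1)$$
$V_{c_j \geq c_i}(i)$ is the set of neighbors of $i$ having coreness $c_j$ larger than or equal to $c_i$. The parameter $\epsilon$ controls the possibility of rings overlapping, and is one of only three external parameters required to tune the image’s rendering.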
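To make the radial rule concrete, here is a minimal Python sketch. It assumes that Eq. 1 weights the vertex's own layer (maximum coreness minus its coreness) by one minus the overlap parameter, and the mean layer of its neighbors of equal or larger coreness by the overlap parameter itself. The function name `radius`, the adjacency-dictionary format, the toy graph and the value 0.2 are illustrative assumptions.

```python
def radius(i, adj, core, c_max, eps):
    """Radial coordinate of vertex i, under the assumed reading of Eq. 1.

    adj:  dict vertex -> set of neighbours (assumed input format)
    core: dict vertex -> coreness
    eps:  weight of the neighbours' contribution (0 <= eps <= 1)
    """
    c_i = core[i]
    layer = c_max - c_i
    # Only neighbours whose coreness is at least c_i enter the average.
    inner = [j for j in adj[i] if core[j] >= c_i]
    if not inner:
        # Assumption: an isolated vertex simply sits on its own layer.
        return float(layer)
    mean_nbr_layer = sum(c_max - core[j] for j in inner) / len(inner)
    return (1.0 - eps) * layer + eps * mean_nbr_layer

# Toy graph: triangle (a, b, c), a tail d-e attached to a, a leaf f attached to b.
adj = {
    "a": {"b", "c", "d"}, "b": {"a", "c", "f"}, "c": {"a", "b"},
    "d": {"a", "e"}, "e": {"d"}, "f": {"b"},
}
core = {"a": 2, "b": 2, "c": 2, "d": 1, "e": 1, "f": 1}
eps = 0.2  # illustrative value only
r_d = radius("d", adj, core, c_max=2, eps=eps)
r_e = radius("e", adj, core, c_max=2, eps=eps)
```

Vertices d and e share the same shell, but d has a neighbor of higher coreness and therefore ends up slightly closer to the center than e.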
Inside a given shell, the angle $\alpha_i$ of a vertex $i$ is computed as follows:
$$\alpha_i = 2\pi \sum_{1 \leq m < q_i} \frac{|Q_m|}{|C_{c_i}|} + N\left(\frac{|Q_{q_i}|}{2|C_{c_i}|}, \pi \frac{|Q_{q_i}|}{|C_{c_i}|}\right) \qquad (2)$$
where $Q_{q_i}$ and $C_{c_i}$ are respectively the cluster and shell the vertex belongs to, and $N$ is a normal distribution of mean $|Q_{q_i}|/(2|C_{c_i}|)$ and width $\pi |Q_{q_i}|/|C_{c_i}|$. Since we are interested in distinguishing different clusters in the same shell, the first term on the right side of Eq. 2, referring to clusters with $m < q_i$, allocates the correct partition of the angular sector to each cluster. The second term on the right side of Eq. 2, on the other hand, specifies a random position for the vertex $i$ in the sector assigned to the cluster $Q_{q_i}$.
Colors and size of vertices. Colors are assigned according to the coreness: vertices with coreness $1$ are violet, and the maximum coreness vertices are red, following the rainbow color scale. Finally, the diameter of each vertex corresponds to the logarithm of its degree, giving further information on the vertex’s properties. Note that the vertices with largest coreness are placed uniformly in a disk of radius $u$, which is the unit length ($u$ equals $1$ for this reduced algorithm).
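The color and size conventions can be sketched in Python as follows. The hue range, the offset in the diameter and the function names `vertex_color` and `vertex_diameter` are assumptions chosen for the example; the text only specifies a rainbow scale from violet to red and a logarithmic dependence on the degree.

```python
import colorsys
import math

def vertex_color(c, c_min, c_max):
    """RGB colour on a rainbow scale: violet at c_min, red at c_max."""
    if c_max == c_min:
        return colorsys.hsv_to_rgb(0.0, 1.0, 1.0)
    t = (c - c_min) / (c_max - c_min)
    # Assumed hue range: 0.75 (violet) down to 0.0 (red).
    hue = 0.75 * (1.0 - t)
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)

def vertex_diameter(degree, base=1.0):
    """Diameter growing logarithmically with the degree.

    The offset of 1 is an assumption, added so that degree-1 vertices
    do not collapse to zero size.
    """
    return base * (1.0 + math.log(degree))
```

With these choices, intermediate coreness values pass through blue, green and yellow, and a tenfold increase in degree adds a constant amount to the diameter.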
The complete algorithm is presented in Table 1. In particular, a vector collects the cluster numbers $q_i$ of all vertices, and a table contains the following pair of elements, indexed by the coreness $c$ and cluster label $q$:
$$(c,q) \mapsto \left(\sum_{1 \leq m < q} \frac{|Q_m|}{|C_c|}, \frac{|Q_q|}{|C_c|}\right) \qquad (3)$$
These input quantities, used in Eq. 2, can be computed during the $k$-core decomposition, when the cluster labels are assigned.
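A possible way to build the table entries for one shell, and to turn them into angular sectors, is sketched below in Python. It assumes that each entry holds the fraction of the shell occupied by the preceding clusters and the fraction occupied by the cluster itself; the function names `sector_table` and `sector_bounds` are hypothetical, and the random placement inside the sector is deliberately left out.

```python
import math

def sector_table(shell_clusters):
    """Per-cluster pairs (fraction before the cluster, fraction of the cluster).

    shell_clusters: list of vertex sets, one per cluster of a single shell,
    in the (arbitrary) order in which the clusters were labelled.
    """
    total = sum(len(q) for q in shell_clusters)
    table, before = [], 0
    for q in shell_clusters:
        table.append((before / total, len(q) / total))
        before += len(q)
    return table

def sector_bounds(entry):
    """Start and end angle (radians) of the sector described by one entry."""
    start_frac, frac = entry
    return 2 * math.pi * start_frac, 2 * math.pi * (start_frac + frac)

table = sector_table([{"d", "e"}, {"f"}, {"g"}])
```

A cluster holding half of the shell's vertices thus receives half of the circle, so large clusters remain visually distinguishable from small ones.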
4.2 Extended algorithm using $k$-cores components
The algorithm presented in the previous section can be used as the basic routine to define an extended algorithm aimed at the visualization of networks for which some $k$-cores are fragmented, i.e. made of more than one connected component. This issue is solved by assigning to each connected component of a $k$-core a center and a size, which depend on the relative sizes of the various components. Larger components are put closer to the global center of the representation (which has Cartesian coordinates $(0,0)$), and have larger sizes.
The algorithm begins with the center at the origin $(0,0)$. Whenever a connected component of a $k$-core, whose center has coordinates $(X_p, Y_p)$, is broken into several components by removing all vertices of degree $k$, i.e. by applying the next decomposition step, a new center is computed for each new component. The center of the component $h$ has coordinates $(X_h, Y_h)$, defined by
$$X_h = X_p + \delta (c_{max} - c_h)\, u_p\, \varrho_h \cos(\phi_h), \quad Y_h = Y_p + \delta (c_{max} - c_h)\, u_p\, \varrho_h \sin(\phi_h) \qquad (4)$$
where $\delta$ scales the distance between components, $c_{max}$ is the maximum coreness and $c_h$ is the core number of component $h$ (the components are numbered by $h = 1, \cdots, h_{max}$ in an arbitrary order), $u_p$ is the unit length of its parent component, and $\varrho_h$ and $\phi_h$ are the radial and angular coordinates of the new center with respect to the parent center $(X_p, Y_p)$. We define $\varrho_h$ and $\phi_h$ as follows:
$$\varrho_h = 1 - \frac{|S_h|}{\sum_{1 \leq j \leq h_{max}} |S_j|}, \quad \phi_h = \phi_{ini} + \frac{2\pi}{\sum_{1 \leq j \leq h_{max}} |S_j|} \sum_{1 \leq j \leq h} |S_j| \qquad (5)$$
where $S_h$ is the set of vertices in the component $h$, and $\sum_j |S_j|$ is the sum of the sizes of all components having the same parent component. In this way, larger components will be closer to the original parent component’s center $(X_p, Y_p)$.
The angle $\phi_h$ has two contributions. The initial angle $\phi_{ini}$ is chosen uniformly at random (note that if $\phi_{ini}$ is fixed, all the centers of the various components are aligned in the final representation), while the angle sector is the sum of component angles whose number is less than or equal to the actual component number $h$.
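The placement of component centers can be sketched in Python as follows. The sketch assumes the reading of Eqs. 4 and 5 in which the radial coordinate of a component is one minus its share of the vertices, and its angle is the initial angle plus the cumulative share (including the component itself) times a full turn. The function name `component_centers` and all numeric values are illustrative assumptions.

```python
import math

def component_centers(sizes, c_h, c_max, parent_center, u_p, delta, phi_ini):
    """Centers of sibling components born from the same parent component.

    sizes:         number of vertices of each sibling component
    c_h:           core number shared by the sibling components
    parent_center: (X_p, Y_p), center of the parent component
    u_p:           unit length of the parent component
    delta:         scale of the distance between components
    phi_ini:       initial angle (chosen at random in the text)
    """
    total = sum(sizes)
    x_p, y_p = parent_center
    centers, cumulative = [], 0
    for s in sizes:
        cumulative += s
        rho_h = 1.0 - s / total                      # larger share -> closer to parent
        phi_h = phi_ini + 2 * math.pi * cumulative / total
        r = delta * (c_max - c_h) * u_p * rho_h
        centers.append((x_p + r * math.cos(phi_h), y_p + r * math.sin(phi_h)))
    return centers

centers = component_centers([3, 1], c_h=2, c_max=4, parent_center=(0.0, 0.0),
                            u_p=1.0, delta=1.3, phi_ini=0.0)
```

In this example the component holding three of the four vertices is placed three times closer to the parent center than the single-vertex component; an unfragmented core (a single component) keeps the parent's center.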
Finally, the unit length $u_h$ of a component $h$ is computed as
$$u_h = \frac{|S_h|}{\sum_{1 \leq j \leq h_{max}} |S_j|}\, u_p \qquad (6)$$
where $u_p$ is the unit length of its parent component. Larger unit length and size are therefore attributed to larger components.
For each vertex $i$, radial and angular coordinates are computed by equations 1 and 2 as in the previous algorithm. These coordinates are then considered as relative to the center $(X_h, Y_h)$ of the component $h$ to which $i$ belongs. The position of $i$ is thus given by
$$x_i = X_h + \gamma\, u_h\, \rho_i \cos(\alpha_i), \quad y_i = Y_h + \gamma\, u_h\, \rho_i \sin(\alpha_i) \qquad (7)$$
where $\gamma$ is a parameter controlling the component’s diameter.
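The last two steps, rescaling the unit length of a component and converting a vertex's polar coordinates into a position on the plane, can be sketched together in Python. The sketch assumes that the unit length of a component is its parent's unit length multiplied by the component's share of the vertices, and that the polar coordinates are scaled by the diameter parameter and the unit length before being added to the component's center. Function names and numeric values are hypothetical.

```python
import math

def unit_length(size, siblings_total, u_p):
    """Unit length of a component holding `size` of the `siblings_total`
    vertices shared by all components with the same parent."""
    return size / siblings_total * u_p

def vertex_position(rho_i, alpha_i, center, u_h, gamma):
    """Cartesian position of a vertex from its polar coordinates
    (rho_i, alpha_i), taken relative to the center of its component."""
    x_h, y_h = center
    scale = gamma * u_h * rho_i
    return x_h + scale * math.cos(alpha_i), y_h + scale * math.sin(alpha_i)

u_h = unit_length(3, 4, u_p=1.0)
pos = vertex_position(rho_i=2.0, alpha_i=0.0, center=(1.0, 1.0), u_h=u_h, gamma=1.5)
```

Because the unit length shrinks with the component's share of vertices, small components are drawn as proportionally smaller sets of concentric shells.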
The global algorithm is formally presented in Table 2. The main loop is composed of the following functions. First, the function make_core recursively removes all vertices of degree $k-1$, obtaining the $k$-core, and stores the coreness $k-1$ of the removed vertices. A boolean variable is set to true if the $k$-core is empty, and to false otherwise. The function compute_clusters operates the decomposition of the $(k-1)$-shell into clusters, storing for each vertex its cluster label and filling the table of Eq. 3. The possible decomposition of the $k$-core into connected components is determined by the function compute_components, which also collects into a vector the number of vertices contained in each component. At the following step, the functions compute_origin_coordinates_cmp and compute_unit_size_cmp get, respectively, the center and size of each component of the $k$-core, gathering them in vectors. Finally, the coordinates of each vertex are computed and stored in two further vectors.
Algorithm complexity. The $k$-core decomposition can be computed using the algorithm of Batagelj and Zaversnik [Batagelj03]. Two steps are necessary to perform the $k$-core decomposition of a graph: first, a list of the vertices with their respective neighbors is prepared; then, the recursive pruning algorithm is applied. Building the list of vertices with their degree takes a time $O(n)$. Starting from the lowest degree value $d_{min}$, all the vertices of degree equal to $d_{min}$ are then recursively cut out. Pruning a neighbor of a vertex $i$ of degree $d_{min}+1$ means that the degree of $i$ is decreased to $d_{min}$, so that $i$ is subsequently pruned as well. This is what is meant by the expression “recursively cutting out”. The first shell (of coreness $d_{min}$) contains all vertices removed during this process. When the remaining graph does not contain any vertex of degree $d_{min}$, the algorithm repeats the procedure by recursively removing vertices of degree $d_{min}+1$, thus constructing the shell of coreness $d_{min}+1$. This process is repeated until no vertices are left, obtaining in this way the successive shells. The construction of the shells takes a time $O(e)$ (where $e$ is the number of edges), because removing a vertex implies cutting the edges between this vertex and its neighbors. The building of all coreness sets thus implies that all edges are removed one after the other in the process. In summary, the total time to perform the decomposition is $O(n+e)$. In order to build the clusters, each vertex should verify the coreness of its neighbors, which takes $O(e)$ steps in the worst case. Finally, the total time complexity is $O(n+e)$ for a general graph. This makes the algorithm very efficient for sparse graphs where $e$ is of order $n$.
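The linear running time relies on keeping the vertices sorted by their current degree with a bucket (bin) sort, so that lowering a degree by one costs constant time. The following Python sketch follows that idea in the spirit of the Batagelj and Zaversnik approach; it is an illustrative reimplementation under an assumed adjacency-dictionary input, not the authors' code, and the name `core_numbers` is hypothetical.

```python
def core_numbers(adj):
    """Coreness of every vertex, using degree buckets.

    adj: dict vertex -> set of neighbours (assumed input format).
    """
    core = {v: len(nbrs) for v, nbrs in adj.items()}
    if not core:
        return {}
    max_deg = max(core.values())

    # bin_start[d]: index in `order` of the first vertex of current degree d.
    counts = [0] * (max_deg + 1)
    for d in core.values():
        counts[d] += 1
    bin_start, start = [0] * (max_deg + 1), 0
    for d in range(max_deg + 1):
        bin_start[d] = start
        start += counts[d]

    # order: vertices sorted by degree; pos: position of each vertex in order.
    order, pos, fill = [None] * len(core), {}, list(bin_start)
    for v, d in core.items():
        pos[v] = fill[d]
        order[pos[v]] = v
        fill[d] += 1

    for idx in range(len(order)):
        v = order[idx]          # core[v] is now final
        for w in adj[v]:
            if core[w] > core[v]:
                # Move w to the front of its bucket, then shrink the bucket:
                # w now belongs to the bucket of degree core[w] - 1.
                d_w, p_w = core[w], pos[w]
                p_first = bin_start[d_w]
                first = order[p_first]
                if first != w:
                    order[p_w], order[p_first] = first, w
                    pos[first], pos[w] = p_w, p_first
                bin_start[d_w] += 1
                core[w] -= 1
    return core

# Toy graph: triangle (a, b, c), a tail d-e attached to a, a leaf f attached to b.
adj = {
    "a": {"b", "c", "d"}, "b": {"a", "c", "f"}, "c": {"a", "b"},
    "d": {"a", "e"}, "e": {"d"}, "f": {"b"},
}
core = core_numbers(adj)
```

Each vertex is visited once and each edge at most twice, which gives a running time proportional to the number of vertices plus the number of edges.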
4.3 Basic features of the visualization’s layout
The main features of the layout’s structure obtained with the above algorithms are visible in Fig.2 where, for the sake of simplicity, we do not show any edge. The leftmost panel displays the case in which all $k$-cores have a single component, while in the rightmost one an example of $k$-core fragmentation is reported. Indeed, it is possible that, during the pruning procedure, the remaining nodes forming a $k$-core do not belong to the same connected component. When such a fragmentation occurs, the algorithm computes the multiple components of the $k$-core and displays all of them in a coherent way.
The visualization’s layout is two-dimensional, composed of a series of concentric circular shells (see the five different shells in Fig.2).
Each shell corresponds to a single coreness value and all vertices in it are therefore drawn with the same color. A color scale makes it possible to distinguish different coreness values: in the layouts, as in Fig.2, violet is used for the minimum value of coreness $c_{min}$, then shades of blue, green and yellow compose a graduated scale for increasing coreness values, up to the maximum value $c_{max}$, which is colored in red.
The diameter of each shell depends on the coreness value $c$, and is proportional to $c_{max} - c$ (in Fig.2, the position of each shell is identified by a circle having the corresponding diameter). The presence of a trivial order relation in the coreness values ensures that all shells are placed in a concentric arrangement. On the other hand, when a $k$-core is fragmented in two or more components, the diameters of the different components also depend on the relative number of vertices belonging to each of them, i.e. the ratio between the number of vertices belonging to that component and the total number of vertices in that coreness set. This is very important information, providing a way to distinguish between multiple components at a given coreness value. Looking at the two central components for high coreness values in Fig.2 (right), we immediately realize that the bigger one contains a larger fraction of vertices.
Finally, the size of each node grows with the original degree of that vertex; we use a logarithmic scale for the size of the drawn bullets.
5 Network fingerprinting
The $k$-core decomposition peels the network layer by layer, revealing the structure of the different shells from the outermost one to the more internal ones. The algorithm provides a direct way to distinguish the network’s different hierarchies and structural organization by means of some simple quantities: the radial width of the shells, the presence and size of clusters of vertices in the shells, the correlations between degree and coreness, the distribution of the edges interconnecting vertices of different shells, etc. The following features are useful to extract this structural information out of the visualization. We also highlight the role of the parameters $\epsilon$, $\delta$ and $\gamma$ of the visualization algorithms in helping to determine the structural characteristics of the visualized network.
1) Shells Width: In the graph representations, the width can change considerably from shell to shell. The thickness of a shell depends on the coreness properties of the neighbors of the vertices in the corresponding coreness set. For a given shell diameter (corresponding to the black circle in the median position of shells in Fig.3), each vertex can be placed more internally or more externally with respect to this reference line. Nodes with more neighbors in higher coreness sets are closer to the center and vice versa, as shown in Fig.3: of the two labeled nodes, the more internal one has three edges towards higher coreness nodes, compared to the single edge that the other one sends towards inner shells. The maximum thickness of the shells is controlled by the parameter $\epsilon$ (Eq. 1).
2) Shell Clusters: The angular distribution of vertices in the shells is not completely homogeneous. Fig.3 shows that clusters of vertices can be observed. The idea is to group together all nodes of the same coreness set that are directly linked in the original graph, and to represent them close to one another in the shell. Thus, a shell is divided into many angular sectors, each one containing a cluster of vertices. This feature shows at a glance whether the coreness sets are composed of a single large connected component or divided into many small clusters, or even whether there are isolated vertices (i.e. disconnected from all other nodes in the shell, not from the rest of the $k$-core!).
3) Degree-Coreness Correlation: Another property that can be studied from the obtained layouts is the correlation between the degree of the nodes and the coreness value. In fact, both quantities are centrality measures, and the presence or absence of correlations between them is a very important feature characterizing a network’s topology. The nodes displayed in the most internal shells are those forming the central core of the network; the presence of degree-coreness correlations then corresponds to the fact that the central nodes are most likely high-degree hubs of the network. This effect is indeed observed in many real communication networks with a clear hierarchical structure, such as the Internet at the Autonomous System level or the worldwide air-transportation network. On the contrary, the presence of hubs in external shells is typical of networks without a clear global hierarchical structure, such as the World Wide Web or the Internet at the router level. In this case, star-like configurations appear, with high-degree vertices connected only to very low-degree vertices. These vertices are rapidly pruned out in the $k$-core decomposition even if they have a very high degree, leading to the presence of local hubs in the external $k$-shells, as in Fig. 4.
4) Edges: The visualization shows only a uniformly sampled random fraction of the edges. We can tune the percentage of drawn edges in order to get the best trade-off between the clarity of the visualization and the need to give information on the way the nodes are mainly connected. Edge-reduction techniques can be implemented to improve the algorithm’s capacity to represent edges; however, a homogeneous sampling does not alter the extraction of topological information, while ensuring a low computational cost. Finally, the two halves of each edge are colored with the color of the corresponding extremities, to make the connection among vertices in different shells more evident.
5) Disconnected components: The fragmentation of any given $k$-shell in two or more disconnected components is represented by the presence of a corresponding number of circular shells with different centers (Fig. 2). The diameter of these circles is related to the number of nodes of each component and modulated by the parameter $\gamma$ (Eq. 7). The distance between components is controlled by the parameter $\delta$ (Eq. 4).
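The uniform edge sampling mentioned in item 4 can be sketched in a few lines of Python. The function name `sample_edges`, the edge-list format and the optional seed are assumptions made for the example.

```python
import random

def sample_edges(edges, fraction, seed=None):
    """Uniform random sample of a given fraction of the edges.

    edges:    list of (u, v) pairs (assumed input format)
    fraction: share of edges to draw, between 0 and 1
    seed:     optional seed, so that a drawing can be reproduced
    """
    rng = random.Random(seed)
    n_drawn = round(fraction * len(edges))
    return rng.sample(edges, n_drawn)

edges = [(i, i + 1) for i in range(100)]
drawn = sample_edges(edges, 0.25, seed=1)
```

Since every edge has the same probability of being drawn, the sampled picture preserves, on average, the proportions of edges running between any two shells.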
In summary, the proposed algorithm makes possible a direct, visual investigation of a series of properties: hierarchical structures of networks; connectivity and clustering properties inside a given shell; relations and interconnectivity between different levels of the hierarchy; and correlations between degree and coreness, i.e. between different measures of centrality.
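The degree-coreness correlation read off the layout can also be quantified numerically. The Python sketch below uses the Pearson correlation coefficient, which is a choice made here for illustration and is not prescribed by the text; the function name and the input dictionaries are hypothetical.

```python
import math

def degree_coreness_correlation(degree, core):
    """Pearson correlation between degree and coreness over all vertices.

    degree, core: dicts vertex -> value (assumed to share the same keys).
    Returns NaN when either quantity is constant.
    """
    vs = list(degree)
    n = len(vs)
    mean_d = sum(degree[v] for v in vs) / n
    mean_c = sum(core[v] for v in vs) / n
    cov = sum((degree[v] - mean_d) * (core[v] - mean_c) for v in vs)
    var_d = sum((degree[v] - mean_d) ** 2 for v in vs)
    var_c = sum((core[v] - mean_c) ** 2 for v in vs)
    if var_d == 0 or var_c == 0:
        return float("nan")
    return cov / math.sqrt(var_d * var_c)

# Triangle (a, b, c) with a tail d-e on a and a leaf f on b.
degree = {"a": 3, "b": 3, "c": 2, "d": 2, "e": 1, "f": 1}
core = {"a": 2, "b": 2, "c": 2, "d": 1, "e": 1, "f": 1}
r = degree_coreness_correlation(degree, core)
```

A value close to one corresponds to hubs sitting in the innermost shells. In a pure star, where the hub and its leaves all have coreness 1, the coefficient is undefined, which is the extreme case of a hub located in the outermost shell.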
6 Results from computer-generated and real networks
In the following we provide specific examples in which the use of the proposed visualization algorithm readily allows the identification of characteristic fingerprints and hierarchies in a set of real and computer-generated networks. In particular, the visualization makes it possible to identify the lack of hierarchy and structure of the basic Erdös-Rényi random graph. Similarly, the time correlations present in the Barabási-Albert network find a clear fingerprint in our visualization layout. A further interesting example is the identification of the different hierarchical arrangement of the Internet when visualized at the Autonomous System level and at the router level. These examples provide an illustration of the use and capabilities of the proposed algorithm in the analysis of large sparse graphs. The parameters $\epsilon$, $\delta$ and $\gamma$ are set to fixed values, which provide a readable layout at the definition allowed in the present paper format.



6.1 Visualization and analysis of computer-generated graphs
In this section we present the visualization of a set of computer-generated networks generally used in the literature to model large-scale graphs. We will show that the proposed algorithm provides a very intuitive visualization of the difference between the models and the real networks. In this perspective, the $k$-core decomposition appears as a suitable tool in the examination and validation of network models.
The Erdös-Rényi (ER) model [ErdosRenyi59], with Poissonian degree distribution, is a typical example of graphs with a characteristic value for the degree (the average value $\langle k \rangle$). Since an ER graph can consist of more than one connected component, we consider only the largest of these components. An instance of the visualization of Erdös-Rényi random graphs is provided by Figure 5: the maximum coreness is clearly related to the average degree $\langle k \rangle$. The large central mass is the result of the very homogeneous topology; the vertex degrees have only small fluctuations, thus most vertices belong to the same $k$-core, which is also the highest.
Since many real-world networks have been shown to display a very heterogeneous topology, as measured by broad degree distributions, many models and mechanisms have been proposed to construct heterogeneous networks. The most famous is the Barabási-Albert (BA) model [sf99], which considers growing networks according to the preferential attachment mechanism: each new vertex is connected to $m$ already existing vertices, chosen with a probability proportional to their current degree. This model produces graphs with power-law degree distributions, thus characterized by a very large variety of degree values. Such a graph, generated with a fixed value of $m$ and shown in Fig. 5, produces a quite peculiar decomposition. Indeed, although this graph displays a very heterogeneous vertex degree distribution, its $k$-core decomposition is trivial; only a few layers at very small coreness are visible. The construction mechanism provides a simple explanation. Each new vertex enters the system with degree $m$, but at the following time steps new vertices may connect to it, increasing its degree. Inverting the procedure, we obtain exactly the $k$-core decomposition. The minimum degree is $m$, therefore all coreness sets with $c < m$ are empty. Recursively pruning all vertices of degree $m$, one first removes the last vertex, then the one added at the preceding step, whose degree is now reduced to the initial value $m$, and so on, up to the initial vertices, which may have larger degree. Hence, all vertices except the initial ones belong to the coreness set of coreness $m$. This somewhat pathological property holds for all growing networks with a fixed initial number of links for new vertices. Simple variations of the basic algorithm and the introduction of stochasticity in the growing procedure result in more complicated structures.
6.2 Visualization of real networks
We first present a visualization of a portion of the .fr domain of the World Wide Web (WWW). Its graph is composed of one million pages. This network, whose visualization is presented in Figure 6, is particularly interesting because, at a certain $k$-core level, two disconnected components emerge. Note that, since the actual definition of $k$-cores concerns undirected graphs, we consider here the WWW as undirected.

We also provide the visualization of networks representing the Internet at various granularity levels. More precisely, we consider graphs of the Internet at the autonomous system and router level. The autonomous system level is represented by the collected routes of the Oregon routeviews [oregon] project, called AS, and its extended version AS+ presented in Chen et al. [CCGJSW02], both from May 26, 2001. For the router level, we use the graph obtained by an exploration of Govindan and Tangmunarunkit [govindan00heuristics] in 2000, called here the IR graph, and the IR_CAIDA graph obtained from the CAIDA project [IR_CAIDA] between April 21st and May 8th, 2003. Both router-level networks have a comparable number of nodes. The two AS maps, also of comparable size, differ mainly in the number of links: the AS+ maps were constructed by using information from peering relationships of autonomous systems, obtained from Looking Glass tools. These tools are maintained by ISPs to troubleshoot routing problems.
Figure 7 displays the representation of two different maps of the autonomous system graphs (AS and AS+). All coreness layers are populated and, in any given $k$-shell, the vertices are distributed over a relatively large range of the radial coordinate, which means that their neighborhoods are variously composed. It is worth noting that the coreness and the degree are strongly correlated, with a clear hierarchical structure. Links go principally from one coreness set to another, although there are of course also intra-layer links. The hierarchical structure exhibited by our analysis of the autonomous system level is a striking property; for instance, one might exploit it to show that in the Internet high-degree vertices are naturally (as an implicit result of the self-organizing growth) placed in the innermost structure.
At high resolution, i.e. at the router (IR) level, the Internet’s properties are less structured, as shown in Figure 8, in which a completely different scenario emerges: external layers, of lowest coreness, contain vertices with large degree. For instance, in the IR graph we find 20 vertices with degree larger than 100 which have coreness smaller than 6. The correlation between coreness and degree is thus clearly of a very different nature in the maps of the Internet obtained at different granularities, i.e. routers or autonomous systems.
The lowest shells, containing vertices that are very external, are displayed as quite broad shells, meaning that the corresponding vertices have neighbors with coreness covering a large range of values. The larger coreness shells are thin rings, which means that the neighbors of the vertices in a given layer have similar coreness.



It is worth remarking how the present visualization allows the distinction of networks which appear very similar on the basis of their statistical properties alone. Indeed, we can notice that the IR map is quite different from the IR_CAIDA map. This difference likely finds its origin in the different exploration methods used to gather the two data sets. The IR map has been obtained from one source monitor, using source routing to detect lateral connectivity. The IR_CAIDA map, instead, is the merger of data gathered by several different probing monitors. On one hand, it is likely that the most central cores of the IR network are composed of routers with source routing activated (a fraction of the total routers [govindan00heuristics]). These routers sample destinations unevenly, resulting in a less regular layout. On the other hand, the IR_CAIDA map appears to have a very regular structure, likely due to a more symmetric exploration process. The obtained layout provides at a glance the evidence for pronounced differences in the structural ordering of the two maps, suggesting a critical examination and comparison of the two experimental strategies.



7 Conclusions
In this paper, we have proposed a general visualization tool for large-scale graphs. Exploiting the $k$-core decomposition, and the natural hierarchical structures emerging from it, our algorithm yields a layout that possesses the simplicity of a 2D representation with a considerable amount of information encoded. One can easily read basic features of the graph (degree, hierarchical structure, position of the highest degree vertices, etc.) as well as more entangled features, e.g. the relation between a vertex and the hierarchical position of its neighbors. Our results show the possibility of gaining clear insights on the architecture of many real-world and computer-generated networks by a visualization based on the rationalization of the corresponding graph. In conclusion, the present visualization strategy is a useful tool for distinguishing between networks with different topological properties and structural arrangement, but it may also be used for determining whether a certain model is in good agreement with real data, providing a further interesting tool for model validation. Finally, we also provide a publicly available tool for visualizing networks [LANETVI].
Acknowledgments: We gratefully acknowledge Fabien Mathieu of LIRMM at Montpellier, France, for providing the .fr portion of the WWW graph. This work has been partially funded by the European Commission FET Open project COSIN IST-2001-33555 and contract 001907 (DELIS).