The problem of drawing very large graphs is challenging and has motivated a large body of research (see  for a survey). As the number of vertices and edges grows, layout algorithms become less effective, and runtime increases at both the layout stage and the rendering stage. Recent work (for example ) approaches the problem by replacing the original graph with a “proxy graph”. The proxy graph is typically much smaller than the original graph, and thus layout and rendering are easier. The challenge for the proxy graph approach is to ensure that the proxy graph is a good representation of the original graph; for visualization, we want the drawing of the proxy graph to be faithful  to the original graph.
In this paper we examine a specific proxy graph approach using spectral sparsification as introduced by Spielman et al. : roughly speaking, the spectrum (that is, the eigenvalues of the Laplacian; see Section 3) of the proxy graph approximates the spectrum of the original graph. Since the spectrum is closely related to graph-theoretic properties that are significant for graph drawing, this kind of proxy seems to promise faithful drawings.
We report the results of an empirical investigation of the application of spectral sparsification to graph drawing. Specifically, we consider two closely related spectral sparsification techniques, one deterministic and one stochastic. We consider the quality of the drawings so produced, using real-world and synthetic data sets. Quality is evaluated using the shape-based proxy graph metrics . The results of spectral sparsification are compared to sparsifications obtained by simple random edge sampling. Our investigation confirms the promise of spectral sparsification, and shows that (overall) it is better than simple random edge sampling.
Section 2 recalls the proxy graph approach, and shape-based quality metrics for large graph drawing. Section 3 describes the application of spectral sparsification to graph visualization. Section 4 presents our experiments with spectral sparsification. The results of these experiments are presented and discussed in Section 5. Section 6 concludes.
2.0.1 Proxy graphs and sparsification.
The proxy graph approach is described in Fig. 1: for a given input graph $G$, a proxy graph $G'$ and a drawing $D'$ of $G'$ are computed. The proxy graph $G'$ represents $G$ but is simpler and/or smaller than $G$ in some sense. The user sees the drawing $D'$ of $G'$, and does not see a drawing of the original graph $G$. However, if $G'$ is a “good” representation of $G$, then $D'$ is an adequate visualization of $G$ in that the user can see all the structure of $G$ in the drawing $D'$.
If $G'$ is a subgraph of $G$, and the edge density of $G'$ is smaller than the edge density of $G$, then we say that $G'$ is a sparsification of $G$. Sparsification is the most common kind of proxy.
Sparsification has been extensively investigated in Graph Mining [9, 16, 19] (see survey ). Typically, sparsification is achieved by some kind of stochastic sampling. The most basic sparsification method is random edge sampling (RE): each edge is chosen independently with probability $p$. This and many other simple stochastic strategies have been empirically investigated in the context of visualization of large graphs [20, 29]. In this paper we apply a more sophisticated graph sparsification approach to visualization: the spectral sparsification work of Spielman et al. [1, 25, 26].
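As an illustration of random edge sampling, the following minimal Python sketch keeps each edge independently with a fixed probability. The function name `random_edge_sample`, the parameter name `p`, and the `seed` argument are illustrative assumptions, not part of the implementation used in this paper.

```python
import random

def random_edge_sample(edges, p, seed=None):
    """Keep each edge independently with probability p (RE sampling)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    rng = random.Random(seed)  # seed is a hypothetical convenience for repeatability
    return [e for e in edges if rng.random() < p]

# A 4-cycle with one chord, given as an edge list.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
sample = random_edge_sample(edges, p=0.5, seed=1)
```

Note that the number of edges kept is itself random: its expected value is `p` times the number of input edges, so a target edge count is met only on average.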
2.0.2 Shape-based quality metrics.
Traditional graph drawing metrics such as edge bends, edge crossings, and angular resolution are based on the readability of graphs; these metrics are good for small scale visualisation but become meaningless beyond a few hundred nodes . For large graphs, faithfulness metrics are more important: informally, a drawing $D$ of a graph $G$ is faithful insofar as $D$ determines $G$, that is, insofar as the mapping $G \to D$ is invertible.
Here we use shape-based faithfulness metrics . The aim of these metrics is to measure how well the “shape” of the drawing represents the graph.
For large graphs, such as in Fig. 2, the shape of the drawing is more significant than the number of edge bends and edge crossings. To make this notion more precise, we use “shape graphs”. Given a set $P$ of points, a shape graph $S(P)$ is a graph with vertex set $P$ such that the edges of $S(P)$ define the “shape” of $P$ in some sense. Examples of shape graphs are the Euclidean minimum spanning tree (EMST), the relative neighbourhood graph (RNG), and the Gabriel graph (GG) .
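To make the notion of a shape graph concrete, here is a brute-force Python sketch of the Gabriel graph, using the standard definition: two points are joined when no third point lies in the closed disc whose diameter is the segment between them. The function name `gabriel_graph` is an illustrative assumption, and the cubic-time loop is only suitable for small point sets.

```python
from itertools import combinations

def gabriel_graph(points):
    """Return Gabriel graph edges as index pairs (i, j) with i < j.

    Brute force over all pairs and all witnesses, O(n^3) time.
    """
    edges = []
    for i, j in combinations(range(len(points)), 2):
        (px, py), (qx, qy) = points[i], points[j]
        blocked = False
        for k, (rx, ry) in enumerate(points):
            if k == i or k == j:
                continue
            # r lies in the closed disc with diameter pq exactly when the
            # angle at r is at least 90 degrees: (p - r) . (q - r) <= 0.
            if (px - rx) * (qx - rx) + (py - ry) * (qy - ry) <= 0:
                blocked = True
                break
        if not blocked:
            edges.append((i, j))
    return edges

points = [(0, 0), (4, 0), (2, 1)]
print(gabriel_graph(points))  # [(0, 2), (1, 2)]
```

In this example the long edge between the first two points is excluded because the third point lies inside its diameter disc.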
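Suppose that $G = (V, E)$ is a graph and $P$ is a set of points in the plane, and each vertex $u \in V$ is associated with a point $p(u) \in P$. Denote the set of neighbours of $u$ in $G$ by $N_G(u)$, and the set of neighbours of $p(u)$ in the shape graph $S(P)$ by $N_S(u)$. We say that
$$\mathrm{Jaccard}(S(P), G) = \frac{1}{|V|} \sum_{u \in V} \frac{|N_G(u) \cap N_S(u)|}{|N_G(u) \cup N_S(u)|}$$
is the Jaccard similarity between the shape graph $S(P)$ and $G$. If $D$ is a drawing of $G$ then the (shape-based) quality of $D$ is $Q(D, G) = \mathrm{Jaccard}(S(P), G)$, where $P$ is the set of vertex locations in the drawing $D$. Similarly, if $D'$ is a drawing of a sparsification $G'$ of $G$, then the (shape-based) (proxy) quality of $D'$ is $Q(D', G) = \mathrm{Jaccard}(S(P'), G)$, where $P'$ is the set of vertex locations in the drawing $D'$. Note that if $u$ does not occur in $D'$, we consider that $N_S(u) = \emptyset$. For more details, see .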
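The quality measure can be sketched in a few lines of Python. This sketch assumes that the Jaccard similarity is averaged over all vertices of the original graph, that a vertex missing from the shape graph has an empty neighbour set there, and (as an extra assumption of this sketch) that a vertex isolated in both graphs contributes zero. The names `neighbour_sets` and `shape_quality` are illustrative.

```python
def neighbour_sets(vertices, edges):
    """Map each vertex to the set of its neighbours in an undirected graph."""
    nbrs = {v: set() for v in vertices}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return nbrs

def shape_quality(vertices, graph_edges, shape_edges):
    """Mean Jaccard similarity between neighbourhoods in G and in the shape graph.

    Edge endpoints must all appear in `vertices`. Vertices untouched by
    `shape_edges` keep an empty neighbour set.
    """
    n_g = neighbour_sets(vertices, graph_edges)
    n_s = neighbour_sets(vertices, shape_edges)
    total = 0.0
    for u in vertices:
        union = n_g[u] | n_s[u]
        if union:  # assumption: an empty union contributes 0
            total += len(n_g[u] & n_s[u]) / len(union)
    return total / len(vertices)

# G is a 4-cycle; the shape graph only recovers two of its edges.
vertices = [0, 1, 2, 3]
graph_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
shape_edges = [(0, 1), (1, 2)]
q = shape_quality(vertices, graph_edges, shape_edges)  # 0.5
```

Here vertex 3 has no neighbours in the shape graph, so it contributes zero, which mirrors the rule for vertices that do not occur in the proxy drawing.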
3 The spectral sparsification approach to large graph drawing
First we describe some of the terminology and concepts of spectral graph theory. More details are in standard texts; for example, [5, 10]. (Beware: much of the terminology in spectral graph theory is not standardised.) The adjacency matrix of an $n$-vertex graph $G = (V, E)$ is the $n \times n$ matrix $A$, indexed by $V$, such that $A_{uv} = 1$ if $(u, v) \in E$ and $A_{uv} = 0$ otherwise. The degree matrix $D$ of $G$ is the diagonal matrix with $D_{uu} = \deg(u)$ where $\deg(u)$ is the degree of vertex $u$. The Laplacian of $G$ is $L = D - A$. The spectrum of $G$ is the list $\lambda_1, \lambda_2, \ldots, \lambda_n$ of eigenvalues of $L$. It can be shown that $L$ has real nonnegative eigenvalues, and we assume that $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$; straightforward computation shows that $\lambda_1 = 0$.
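The definition of the Laplacian can be illustrated with a small Python sketch. It assumes a simple undirected graph (no self-loops or repeated edges) on vertices numbered from zero; the function name `laplacian` is illustrative.

```python
def laplacian(n, edges):
    """Return the Laplacian (degree matrix minus adjacency matrix) as nested lists."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1  # each edge adds one to the degree of both endpoints
        L[v][v] += 1
        L[u][v] -= 1  # and contributes -1 in the two off-diagonal positions
        L[v][u] -= 1
    return L

# Path 0 - 1 - 2
L = laplacian(3, [(0, 1), (1, 2)])
for row in L:
    print(row)
```

Every row of the result sums to zero, so the all-ones vector is an eigenvector with eigenvalue zero; this is the "straightforward computation" behind the smallest eigenvalue being zero.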
The spectrum of a graph is closely related to many structural properties of the graph:
- Clustering:
Spectral clustering involves the projection of the graph using its smallest eigenvalues. Spectral clustering solves a relaxation of the ratio cut problem, that is, the problem of dividing a graph into clusters to minimise the ratio between the number of inter-cluster edges and the cluster size . Informally, the ratio cut problem seeks to find clusters of similar size so that the coupling between clusters is minimised.
- Stress:
The spectrum solves a kind of constrained stress problem for the graph. More specifically, the Courant-Fischer theorem (see ) implies that
$$\lambda_i = \min_{x \in X_i} \sum_{(u,v) \in E} (x_u - x_v)^2 \qquad (1)$$
where $X_i$ is the set of unit vectors orthogonal to the first ($i - 1$) eigenvectors. The minimum is achieved when $x$ is an eigenvector corresponding to $\lambda_i$. Note that the right hand side of equation (1) is a kind of stress function.
- Commute distance:
The average time that a random walker takes to travel from vertex $u$ to vertex $v$ and return is the commute distance between $u$ and $v$. Eigenvalues are related to random walks in the graph, and thus to commute distances (see ).
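The link between the Laplacian and the stress function mentioned above rests on a standard identity, sketched here for an unweighted graph with degree matrix $D$, adjacency matrix $A$, Laplacian $L = D - A$, and a real vector $x$ indexed by the vertices (the symbol names are assumed for this derivation):

```latex
\begin{align*}
x^T L x &= x^T D x - x^T A x \\
        &= \sum_{u \in V} \deg(u)\, x_u^2 \;-\; 2 \sum_{(u,v) \in E} x_u x_v \\
        &= \sum_{(u,v) \in E} \left( x_u^2 + x_v^2 - 2 x_u x_v \right) \\
        &= \sum_{(u,v) \in E} (x_u - x_v)^2 .
\end{align*}
```

The third line uses the fact that each vertex appears in exactly as many edges as its degree. In particular the quadratic form is never negative, which is why the eigenvalues of the Laplacian are nonnegative.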
Spielman and Teng , following Benczur and Karger , first introduced the concept of “spectral approximation”. Suppose that $G$ is an $n$-vertex graph with Laplacian $L$, and $G'$ is an $n$-vertex subgraph of $G$ with Laplacian $L'$. If there is an $\epsilon > 0$ such that for every $x \in \mathbb{R}^n$,
$$(1 - \epsilon)\, x^T L x \;\le\; x^T L' x \;\le\; (1 + \epsilon)\, x^T L x \qquad (2)$$
then $G'$ is an $\epsilon$-spectral approximation of $G$. Using the Courant-Fischer Theorem  with (2), one can show that if $G'$ is an $\epsilon$-spectral approximation of $G$ then the eigenvalues and eigenvectors of $G'$ are close to those of $G$. The importance of this is that spectral approximation preserves the structural properties listed above.
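Spielman and Teng first showed that every $n$-vertex graph has a spectral approximation with a nearly linear number of edges . The following theorem is one such result:
Theorem 3.1. Suppose that $G$ is an $n$-vertex graph and $1/\sqrt{n} \le \epsilon \le 1$. Then with probability at least $1/2$, there is an $\epsilon$-spectral approximation $G'$ of $G$ with $O((n \log n)/\epsilon^2)$ edges.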
Further research of Spielman et al. refines and improves spectral sparsification methods (see ). These results have potential for resolving scale issues in graph visualisation by reducing the size of the graph while retaining its (spectral) structure. However, the practical impact of these results for graph visualization is not clear, because of the large constants involved.
The proof of Theorem 3.1 is essentially a stochastic sampling method, using the concept of “effective resistance”. Suppose that we regard a graph as an electrical network where each edge is a 1-$\Omega$ resistor, and a current is applied. The voltage drop over an edge $e$ is the effective resistance $r_e$ of $e$. Effective resistance in a graph is closely related to commute distance, and can be computed simply from the Moore-Penrose inverse  of the Laplacian. If $L^+$ is the Moore-Penrose inverse of $L$ and $e = (u, v)$, then $r_e = L^+_{uu} + L^+_{vv} - 2 L^+_{uv}$.
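The effective resistance formula can be sketched in pure Python with exact rational arithmetic. This is only a small illustration, not the implementation used in the experiments: it assumes a connected graph on vertices numbered from zero, and it obtains the Moore-Penrose inverse through the identity that, for a connected graph, the pseudo-inverse equals the inverse of the Laplacian plus J/n, minus J/n, where J is the all-ones matrix. The helper names `invert` and `effective_resistances` are illustrative.

```python
from fractions import Fraction

def invert(M):
    """Invert a square matrix of Fractions by Gauss-Jordan elimination."""
    n = len(M)
    # Augment M with the identity matrix.
    A = [list(row) + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        # Find a row with a nonzero pivot and swap it into place.
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        inv_p = 1 / A[col][col]
        A[col] = [x * inv_p for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]

def effective_resistances(n, edges):
    """Return {edge: effective resistance} for a connected unit-resistor graph."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    shift = Fraction(1, n)
    # Pseudo-inverse of a connected graph's Laplacian: (L + J/n)^(-1) - J/n.
    Lp = [[x - shift for x in row]
          for row in invert([[x + shift for x in row] for row in L])]
    return {(u, v): Lp[u][u] + Lp[v][v] - 2 * Lp[u][v] for u, v in edges}

# A triangle 0-1-2 with a pendant edge 2-3.
res = effective_resistances(4, [(0, 1), (1, 2), (2, 0), (2, 3)])
print(res[(0, 1)], res[(2, 3)])  # 2/3 1
```

The pendant edge is a bridge and has resistance 1, while each triangle edge has resistance 2/3 (one unit resistor in parallel with two in series). Edges with few alternative routes therefore score highest, which is the property the sparsification methods exploit.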
We next describe two graph drawing algorithms, both variants of algorithms of Spielman et al. . Each takes a graph $G$ and an integer $m'$, and computes a sparsification $G'$ with $m'$ edges, then draws $G'$.
SSS (Stochastic Spectral Sparsification) randomly selects edges with probability proportional to their resistance value. Let $E'$ be the edge set from $m'$ random selections. Let $G'$ be the subgraph of $G$ induced by $E'$; draw $G'$.
DSS (Deterministic Spectral Sparsification). Let $E'$ consist of the $m'$ edges of largest effective resistance. Let $G'$ be the subgraph of $G$ induced by $E'$; draw $G'$.
In both DSS and SSS, the sparsified graph can be drawn with any large-graph layout algorithm.
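The two edge-selection rules can be sketched in Python as follows, taking a precomputed mapping from edges to effective resistances. The text does not say whether the stochastic selections are made with or without replacement; this sketch assumes sampling without replacement so that exactly the requested number of edges is returned. The function names `dss` and `sss`, the parameter `m_prime`, and the `seed` argument are illustrative assumptions.

```python
import random

def dss(resistance, m_prime):
    """Deterministic rule: keep the m_prime edges of largest effective resistance."""
    ranked = sorted(resistance, key=resistance.get, reverse=True)
    return ranked[:m_prime]

def sss(resistance, m_prime, seed=None):
    """Stochastic rule: draw m_prime distinct edges, each draw choosing among
    the remaining edges with probability proportional to effective resistance.
    (Sampling without replacement is an assumption of this sketch.)"""
    rng = random.Random(seed)
    pool = dict(resistance)
    chosen = []
    for _ in range(min(m_prime, len(pool))):
        candidates = list(pool)
        weights = [pool[e] for e in candidates]
        edge = rng.choices(candidates, weights=weights)[0]
        chosen.append(edge)
        del pool[edge]
    return chosen

# Resistances for a triangle 0-1-2 with a pendant edge 2-3.
resistance = {(0, 1): 2 / 3, (1, 2): 2 / 3, (2, 0): 2 / 3, (2, 3): 1.0}
print(dss(resistance, 1))  # [(2, 3)]
stochastic = sss(resistance, 3, seed=0)
```

The deterministic rule always keeps the bridge first, whereas the stochastic rule merely favours it, which is one plausible reading of why the two methods can behave differently at very low relative density.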
4 The experiments
The driving hypothesis for this paper is that for large graphs, spectral sparsification gives good proxy graphs for visualization. To be more precise, we define the relative density of the sparsification $G'$ for a graph $G$ to be $m'/m$, where $G$ has $m$ edges and $G'$ has $m'$ edges. Note that a proxy with higher relative density should be a better approximation to the original graph; thus we expect that drawings of the proxy with higher relative density should have better quality.
Since spectral sparsification (approximately) preserves the eigenvalues, we hypothesize that both SSS and DSS are better than RE. Further, we expect that the difference becomes smaller when the relative density is larger. To state this precisely, let $D_{SSS}$ (respectively $D_{RE}$) denote the drawing obtained by SSS (respectively RE), and let $Q$ denote the shape-based proxy quality of Section 2.0.2. We say that $Q(D_{SSS}, G) / Q(D_{RE}, G)$ is the quality ratio of SSS; similarly define the quality ratio of DSS. We expect that the quality ratio of both SSS and DSS is greater than 1. Further, we expect that the quality ratio for both algorithms tends to 1 as relative density tends to 1.
We implemented DSS, SSS and RE in Java, on top of the OpenIMAJ toolkit . In particular, we used OpenIMAJ to compute the Moore-Penrose inverse. The experiments were performed on a Dell XPS 13 laptop, with an i7 Processor, 16GB memory and 512GB SSD. The laptop was running Ubuntu 16.04 with 20GB swap memory. The computation of the Moore-Penrose inverse used Java 8, with a specified 16GB heap. We used multiple threads to speed up the resistance computation.
We used three data sets. The first set of graphs is taken from “defacto-benchmark” graphs, including the Hachul library, Walshaw’s Graph Partitioning Archive, the sparse matrices collection  and the network repository . These include two types of graphs that have been extensively studied in graph drawing research: grid-like graphs and scale-free graphs. The second set is the GION data set ; this consists of RNA sequence graphs that are used for the analysis of repetitive sequences in sequencing data; these graphs have been used in previous experiments. They are locally dense and globally sparse, and generally have distinctive shapes. The third set consists of randomly generated graphs that contain interesting structures that are difficult to model with sparsification. Specifically, we generated a number of “black-hole graphs”, each of which consists of one or more large and dense parts (so-called “black holes”), and these parts connect with the rest of the graph by relatively few edges. These relatively few edges outside the “black holes” determine the structure of the graph. Such graphs are difficult to sparsify because sampling strategies tend to take edges from the dense “black holes” and miss the structurally important edges. Figs. 2(b) and (c) are black-hole graphs. Details of the graphs that we used are in Table 1.
We sparsified these input graphs to a range of relative density values: from small (1%, 2%, 3%, 4%, 5%, 10%) to medium and large (15%, 20%, …, 100%), using SSS, DSS, and RE.
We measured the quality of the resulting visualizations by the proxy quality metrics described in Section 2.0.2. For shape graphs, we used GG, RNG, and EMST; the results for these three shape graphs are very similar, and here we report the results for GG.
5 Results from the experiments
First we describe typical examples of the results of our experiments, using the graphs illustrated in Fig. 2; these are a relatively small defacto-benchmark graph can_144, and two black-hole graphs cN1031M22638 and gN733M62509.
Sparsifications of cN1031M22638 using RE, DSS, and SSS at relative densities of 3% and 15% are in Fig. 3. At relative density of 3%, both RE and SSS give poor results; the drawings do not show the structure of the graph. However, DSS gives a good representation. At relative density 15%, both DSS and SSS are good, while RE remains poor. A similar example, with relative densities of 1% and 10% for the black-hole graph gN733M62509, is in Fig. 4.
While the results for cN1031M22638 and gN733M62509 are typical, some results did not fit this mold. For can_144, see Fig. 5; here RE and SSS give poor representations, even at very high relative density (40%). However, all three algorithms give good representations at relative density 50%.
5.1 Quality: results and observations
Fig. 6 shows the quality metrics for the three data sets for all three algorithms. The x-axis shows relative densities from 1% to 95%; the y-axis shows quality measures of the proxies.
We make the following five observations from the results.
Quality increases with relative density. In general, quality increases as relative density increases. For many graphs there is a more interesting pattern: quality mostly increases up to a limit, achieved at a relative density between 10% and 30%, and then stays steady. Some of the defacto-benchmark graphs do not show this pattern: they show close to linear improvement in quality with density all the way up to 95%.
Spectral sparsification is better than random edge sampling.
Fig. 7 depicts the quality ratio (y-axis) for DSS and SSS for each data set, over relative density from 1% to 95%. Note that the quality ratio is significant in most cases, especially at low relative density. For example, DSS metrics are around 200 times better than RE metrics, and sometimes much more (for the yeast dataset the ratio is about 400).
For most of the graphs, the quality ratio decreases as the relative density increases. Quality ratio is best for relative density smaller than 10%. When the relative density is more than 15%, RE may be slightly better than DSS for a few graphs, such as defacto-benchmark graphs graph (light blue), and (red). Interestingly, and show a peak at around 10% and 15% before a drop for larger relative density.
Sparsification is better for grid-like graphs than for scale-free graphs. Fig. 8(a) shows the quality change for DSS, SSS, and RE with density, over the grid-like and scale-free defacto-benchmark graphs.
Figure 8: Comparison of proxy quality metrics of defacto-benchmark graphs: (a) Average quality measures, (b) Average of quality ratio. The values are computed by graph types for scale-free graphs (_scalef), grid-like (_grid) graphs, and overall (_avg).
Note that the average values for DSS and SSS are better than the average value for RE when the relative density is less than 35%. When relative density is greater than 40%, there are fluctuations between SSS and DSS. For grid-like graphs, the DSS and SSS proxies give better average proxy measures than RE proxies for relative density less than 20%. For relative density greater than 35%, RE proxies improve. For scale-free graphs, DSS and SSS outperformed RE when relative density is under 80%.
Fig. 8(b) shows the ratio of the quality averages of DSS over RE and of SSS over RE. Overall, the quality ratios decline as relative density increases. For relative density up to 20%, the ratios are good: DSS and SSS are between 1.2 and 3 times better than RE. For both types of graphs, DSS gives the best quality, and SSS comes second.
Deterministic spectral sparsification is better. We compared the average of quality metrics for DSS, SSS and RE sparsification. Fig. 9 shows the average quality values for the three data sets. As expected, average values increase when the relative density increases. Note that DSS gives the best average and SSS is the second best.
Figure 9: Average quality metrics of DSS, SSS and RE over all data sets.
Figure 10: Quality ratios of DSS/RE and SSS/RE over all data sets.
Fig. 10 shows the quality ratios DSS/RE and SSS/RE for all the data sets. Again, DSS gives an overall larger improvement over RE than SSS. The improvement of DSS over RE is good when relative density is less than 35%; SSS shows an improvement over RE as well, but it is not so dramatic. When relative density is beyond 35%, the ratio becomes small (close to 1) or even becomes smaller than 1. Further note from Fig. 10(a)-(c) that DSS and SSS give better quality ratios for black-hole graphs than for GION graphs and defacto-benchmark graphs.
Quality results are consistent across different layout algorithms. The results reported above use FM for layout. However, we found that results using other layout algorithms were very similar. We measured the quality ratios using FM, Fast, Nice and NoTwist layouts from OGDF. For example, Fig. 11 shows the quality ratio of DSS. As the graphs show, the improvement of DSS over RE is consistent across different layout algorithms. The differences in the ratios are very small.
Figure 11: Comparison of average quality ratio of DSS over RE between FM, Fast, Nice and NoTwist layouts. The y-axis shows the average quality ratio.
Although the main purpose of our investigation was to evaluate the effectiveness of spectral sparsification, some remarks about runtime are in order.
Fig. 12(a) illustrates runtimes. The x-axis shows the number of edges, and the y-axis shows the computation time in minutes. Fig. 12(b) shows the amount of time for (parallel) computing resistance values. The x-axis shows the number of edges, and the y-axis shows the computation time in seconds.
The dominant part of runtime is the computation of the Moore-Penrose inverse (and thus effective resistance); for this we used standard software . For the defacto-benchmark graphs, computing the Moore-Penrose inverse takes 10.42 minutes on average. Graph can_144 takes minimum time for the Moore-Penrose inverse calculation (0.0003 mins), and graph graph_3 takes the longest time (115 mins).
6 Concluding remarks
This paper describes the first empirical study of the application of spectral sparsification in graph visualization.
Our experiments suggest that spectral sparsification approaches (DSS and SSS) are better than the random edge approach. Further, the results suggest some guidelines for using spectral sparsification:
- DSS works better than SSS in practice.
- DSS and SSS give better quality metrics for grid-like graphs than for scale-free graphs.
- For sparsifications with low relative density (1% to 20%), DSS and SSS are considerably better than random edge sampling. For relative density larger than 35%, RE may be more practical, because it is simpler, faster, and produces similar results to DSS and SSS.
Future work includes the following:
- Improve the runtime of these methods. For example, Spielman and Srivastava  present a nearly-linear time algorithm that builds a data structure from which we can query the approximate effective resistance between any two vertices in a graph in $O(\log n)$ time. This would allow testing spectral sparsification for larger graphs.
- More extensive evaluation: our experiments compare spectral sparsification with random edge sampling, but not with the wide range of sampling strategies above. Further, extension to larger data sets would be desirable.
- In our experiments, quality is measured using an objective shape-based metric. It would be useful to measure quality subjectively as well, using graph visualization experts as subjects in an HCI-style experiment.
- [1] Batson, J.D., Spielman, D.A., Srivastava, N., Teng, S.: Spectral sparsification of graphs: theory and algorithms. Commun. ACM 56(8), 87–94 (2013)
- [2] Ben-Israel, A., Greville, T.N.: Generalized inverses: theory and applications, vol. 15. Springer Science & Business Media (2003)
- [3] Benczúr, A.A., Karger, D.R.: Approximating s-t minimum cuts in $\tilde{O}(n^2)$ time. In: Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, Philadelphia, Pennsylvania, USA, May 22-24, 1996. pp. 47–55 (1996)
- [4] Chimani, M., Gutwenger, C., Jünger, M., Klau, G.W., Klein, K., Mutzel, P.: The Open Graph Drawing Framework (OGDF). CRC Press (2012)
- [5] Chung, F.: Spectral Graph Theory. American Mathematical Society (1997)
- [6] Davis, T.A., Hu, Y.: The University of Florida sparse matrix collection. ACM Trans. Math. Softw. 38(1), 1:1–1:25 (Dec 2011)
- [7] Eades, P., Hong, S., Nguyen, A., Klein, K.: Shape-based quality metrics for large graph visualization. J. Graph Algorithms Appl. 21(1), 29–53 (2017), https://doi.org/10.7155/jgaa.00405
- [8] Eades, P., Hong, S., Nguyen, A., Klein, K.: Shape-based quality metrics for large graph visualization. Journal of Graph Algorithms and Applications 21(1), 29–53 (2017)
- [9] Gjoka, M., Kurant, M., Butts, C.T., Markopoulou, A.: Walking in facebook: A case study of unbiased sampling of OSNs. In: Proceedings of the 29th Conference on Information Communications. pp. 2498–2506. INFOCOM’10, IEEE Press, Piscataway, NJ, USA (2010)
- [10] Godsil, C.D., Royle, G.F.: Algebraic Graph Theory. Graduate texts in mathematics, Springer (2001), https://doi.org/10.1007/978-1-4613-0163-9
- [11] Gross, J., Yellen, J.: Handbook of Graph Theory. CRC Press (2004)
- [12] Hachul, S., Jünger, M.: Drawing large graphs with a potential-field-based multilevel algorithm. In: GD ’04. pp. 285–295 (2004)
- [13] Hare, J.S., Samangooei, S., Dupplaw, D.: OpenIMAJ and ImageTerrier: Java libraries and tools for scalable multimedia analysis and indexing of images. In: Proceedings of the 19th International Conference on Multimedia 2011. pp. 691–694 (2011)
- [14] Hu, P., Lau, W.C.: A survey and taxonomy of graph sampling. CoRR abs/1308.5865 (2013)
- [15] Hu, Y., Shi, L.: Visualizing large graphs. Wiley Interdisciplinary Reviews: Computational Statistics 7(2), 115–136 (2015), http://dx.doi.org/10.1002/wics.1343
- [16] Leskovec, J., Faloutsos, C.: Sampling from large graphs. In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 631–636. ACM (2006)
- [17] Lovász, L.: Random walks on graphs: A survey. In: Miklós, D., Sós, V.T., Szőnyi, T. (eds.) Combinatorics, Paul Erdős is Eighty, vol. 2, pp. 353–398. János Bolyai Mathematical Society (1996)
- [18] Marner, M.R., Smith, R.T., Thomas, B.H., Klein, K., Eades, P., Hong, S.: GION: interactively untangling large graphs on wall-sized displays. In: Graph Drawing - 22nd International Symposium, GD 2014. pp. 113–124 (2014)
- [19] Morstatter, F., Pfeffer, J., Liu, H., Carley, K.: Is the sample good enough? Comparing data from twitter’s streaming API with Twitter’s firehose, pp. 400–408. AAAI press (2013)
- [20] Nguyen, Q.H., Hong, S.H., Eades, P., Meidiana, A.: Proxy graph: Visual quality metrics of big graph sampling. IEEE Transactions on Visualization and Computer Graphics 23(6), 1600–1611 (June 2017)
- [21] Nguyen, Q.H., Eades, P., Hong, S.: On the faithfulness of graph visualizations. In: GD ’12, Redmond, WA, USA. pp. 566–568 (2012)
- [22] Nguyen, Q.H., Eades, P., Hong, S.: On the faithfulness of graph visualizations. In: IEEE Pacific Visualization Symposium, PacificVis 2013, February 27 2013-March 1, 2013, Sydney, NSW, Australia. pp. 209–216 (2013), https://doi.org/10.1109/PacificVis.2013.6596147
- [23] Rafiei, D., Curial, S.: Effectively visualizing large networks through sampling. In: 16th IEEE Visualization Conference, VIS 2005, Minneapolis, MN, USA, October 23-28, 2005. pp. 375–382 (2005)
- [24] Rossi, R.A., Ahmed, N.K.: The network data repository with interactive graph analytics and visualization. In: AAAI (2015), http://networkrepository.com
- [25] Spielman, D.A., Srivastava, N.: Graph sparsification by effective resistances. CoRR abs/0803.0929 (2008), http://arxiv.org/abs/0803.0929
- [26] Spielman, D.A., Teng, S.: Spectral sparsification of graphs. SIAM J. Comput. 40(4), 981–1025 (2011)
- [27] Toussaint, G.T.: The relative neighbourhood graph of a finite planar set. Pattern Recognition 12(4), 261–268 (1980), https://doi.org/10.1016/0031-3203(80)90066-7
- [28] Von Luxburg, U.: A tutorial on spectral clustering. Statistics and computing 17(4), 395–416 (2007)
- [29] Wu, Y., Cao, N., Archambault, D., Shen, Q., Qu, H., Cui, W.: Evaluation of graph sampling: A visualization perspective. IEEE Transactions on Visualization and Computer Graphics 23(1), 401–410 (Jan 2017)