1 Introduction: networks, their graph spectra and importance
Many systems of interest can be naturally characterised by complex networks; examples include social networks (Mislove et al., 2007; Flake et al., 2000; Leskovec et al., 2007), biological networks (Palla et al., 2005)
and technological networks. Trends, opinions and ideologies spread on a social network, in which people are nodes and edges represent relationships. Networks are mathematically represented by graphs. Of crucial importance to the understanding of the properties of a network or graph is its spectrum, which is defined as the eigenvalues of its adjacency or Laplacian matrix
(Farkas et al., 2001; Cohen-Steiner et al., 2018). The spectrum of a graph can be considered as a natural set of graph invariants and has been extensively studied in the fields of chemistry, physics and mathematics (Biggs et al., 1976). Spectral techniques have been extensively used to characterise the global network structure (Newman, 2006b) and in practical applications thereof, such as facial recognition and computer vision
(Belkin and Niyogi, 2003), learning dynamical thresholds (McGraw and Menzinger, 2008), clustering (Von Luxburg, 2007), and measuring graph similarity (Takahashi et al., 2012). A major limitation in utilizing graph spectra to solve problems such as graph similarity and estimating the number of clusters (just two example applications of the general method for learning graph spectra that we propose in this paper) is the inability to automatically and consistently learn an everywhere-positive, non-singular approximation to the spectral density. Full eigendecomposition, which is prohibitive for large graphs, and iterative moment-matched approximations both give a sum of Dirac delta functions that must be smoothed to be everywhere positive. The choice of smoothing kernel and kernel bandwidth, or of the number of histogram bins, is usually made in an ad-hoc manner and can significantly affect the resulting output. The main contributions of this paper are as follows:

We prove that kernel smoothing, commonly used to visualize and compare graph spectral densities, biases moment information;

We propose a computationally efficient and information-theoretically optimal smooth spectral density approximation, based on the method of Maximum Entropy, which fully respects the moment information. It further admits analytic forms for the symmetric and non-symmetric KL divergences and the Shannon entropy;

We apply our information-theoretic spectral density approximation to two example applications: measuring graph similarity and learning the number of clusters in a graph, outperforming iterative smoothed spectral approaches on both real and synthetic datasets.
2 Preliminaries
Graphs are the mathematical structure underpinning the formulation of networks. Let $G = (V, E)$ be an undirected graph with vertex set $V = \{v_1, \dots, v_n\}$. Each edge between two vertices $v_i$ and $v_j$ carries a non-negative weight $w_{ij} \geq 0$; $w_{ij} = 0$ corresponds to two disconnected nodes. For unweighted graphs we set $w_{ij} = 1$ for two connected nodes. The adjacency matrix is defined as $W = (w_{ij})_{i,j=1}^{n}$. The degree $d_i$ of a vertex $v_i$ is defined as

$$d_i = \sum_{j=1}^{n} w_{ij}. \qquad (1)$$
The degree matrix $D$ is defined as the diagonal matrix that contains the degrees of the vertices along its diagonal, i.e., $D_{ii} = d_i$. The unnormalised graph Laplacian matrix is defined as

$$L = D - W. \qquad (2)$$
As $G$ is undirected, $w_{ij} = w_{ji}$, which means that the weight matrix $W$ is symmetric; given that $D$ is also symmetric, the unnormalized Laplacian $L$ is symmetric. Symmetric real matrices are special cases of normal matrices; they are Hermitian and therefore have real eigenvalues. Another common characterisation of the Laplacian matrix is the normalised Laplacian (Chung, 1997),
$$\mathcal{L} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}, \qquad (3)$$

where $D^{-1/2} W D^{-1/2}$ is known as the normalised adjacency matrix (strictly speaking, the second equality only holds for graphs without isolated vertices). The spectrum of the graph is defined as the density of the eigenvalues of the given adjacency, Laplacian or normalised Laplacian matrix corresponding to the graph. Unless otherwise specified, we will consider the spectrum of the normalised Laplacian.
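As a concrete illustration of these definitions (a sketch of our own, not code from the paper; the toy graph is arbitrary), the following NumPy snippet builds the normalised Laplacian of a small unweighted graph and checks the standard facts used below: the eigenvalues are real, lie in $[0, 2]$, and include a zero for each connected component.

```python
import numpy as np

def normalised_laplacian(W):
    """L_norm = I - D^{-1/2} W D^{-1/2} for a symmetric, non-negative
    adjacency matrix W with no isolated vertices."""
    d = W.sum(axis=1)                       # vertex degrees d_i = sum_j w_ij
    inv_sqrt = 1.0 / np.sqrt(d)
    return np.eye(len(W)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]

# Toy example: an unweighted 4-cycle.
W = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
eigvals = np.linalg.eigvalsh(normalised_laplacian(W))
```

For the 4-cycle the spectrum is $\{0, 1, 1, 2\}$: one zero eigenvalue (the graph is connected) and all eigenvalues within $[0, 2]$.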
3 Motivations for a New Approach to Approximating and Comparing the Spectra of Large Graphs
For large sparse graphs with millions, or billions, of nodes, learning the exact spectrum by eigendecomposition is infeasible due to its $O(n^3)$ cost. Powerful iterative methods, such as the Lanczos algorithm, which require only matrix-vector multiplications and hence have a computational cost scaling with the number of non-zero entries in the matrix, are often used instead. These approaches approximate the graph spectrum with a sum of weighted Dirac delta functions, closely matching the first $m$ moments (where $m$ is the number of iterative steps used, as detailed in Appendix B) of the spectral density (Ubaru et al., 2016), i.e.,

$$p(\lambda) = \sum_{i=1}^{m} w_i\, \delta(\lambda - \lambda_i), \qquad (4)$$
where $w_i \geq 0$ with $\sum_{i=1}^{m} w_i = 1$, and $\lambda_i$ denotes the $i$-th eigenvalue in the approximation. However, such an approximation is undesirable because natural divergence measures between densities, such as the information-based relative entropy (Cover and Thomas, 2012; Amari and Nagaoka, 2007) of equation (5),
$$D_{\mathrm{KL}}(p \,\|\, q) = \int p(\lambda) \log \frac{p(\lambda)}{q(\lambda)}\, d\lambda, \qquad (5)$$
can be infinite for densities that are mutually singular. Using the Jensen–Shannon divergence instead simply rescales the divergence into $[0, \log 2]$. This can lead to counter-intuitive scenarios, such as an infinite (or maximal) divergence upon the removal or addition of a single edge or node in a large network, or an infinite (or maximal) divergence between two graphs generated using the same random graph model and identical hyper-parameters.
3.1 The argument against kernel smoothing
To alleviate these limitations, practitioners typically generate a smoothed spectral density by convolving the Dirac mixture with a smooth kernel (Takahashi et al., 2012; Banerjee, 2008), often a Gaussian or Cauchy (Banerjee, 2008), to facilitate visualisation and comparison. The smoothed spectral density, with reference to Equation (4), thus takes the form

$$p_{\sigma}(\lambda) = \sum_{i=1}^{m} w_i\, k_{\sigma}(\lambda - \lambda_i), \qquad (6)$$

where $k_{\sigma}$ denotes the smoothing kernel with bandwidth $\sigma$.
We make two assumptions regarding the nature of the kernel function $k_{\sigma}$ in order to prove our main theoretical result about the effect of kernel smoothing on the moments of the underlying spectral density. Both assumptions are met by the commonly employed Gaussian kernel.
Assumption 1
The kernel function $k_{\sigma}$ is supported on the real line $(-\infty, \infty)$.
Assumption 2
The kernel function $k_{\sigma}$ is symmetric and possesses all moments.
The $k$-th moment of a Dirac mixture smoothed by a kernel satisfying Assumptions 1 and 2 is perturbed from its unsmoothed counterpart by an amount $\sum_{i=1}^{m} w_i \sum_{j=2}^{k} \binom{k}{j}\lambda_i^{k-j}\mu_j$, where $\mu_j \neq 0$ if $j$ is even and $\mu_j = 0$ otherwise, and $\mu_j$ denotes the $j$-th central moment of the kernel function $k_{\sigma}$. The moments of the Dirac mixture are given as

$$m_k = \int \lambda^k \sum_{i=1}^{m} w_i\,\delta(\lambda - \lambda_i)\,d\lambda = \sum_{i=1}^{m} w_i \lambda_i^{k}. \qquad (7)$$

The moments of the smoothed density (Equation (6)) are

$$\tilde{m}_k = \int \lambda^k \sum_{i=1}^{m} w_i\,k_{\sigma}(\lambda - \lambda_i)\,d\lambda = \sum_{i=1}^{m} w_i \sum_{j=0}^{k} \binom{k}{j}\lambda_i^{k-j}\mu_j = m_k + \sum_{i=1}^{m} w_i \sum_{j=2}^{k} \binom{k}{j}\lambda_i^{k-j}\mu_j. \qquad (8)$$
We have used the binomial expansion, the fact that the infinite domain is invariant under a shift reparametrisation, and the fact that the odd central moments of a symmetric distribution vanish. The above proves that kernel smoothing alters moment information, and that the alteration becomes more pronounced for higher moments. Furthermore, given that $w_i \geq 0$, $\mu_{2j} > 0$ and (for the normalised Laplacian) $\lambda_i \geq 0$, the corrective term is manifestly positive, so the smoothed moment estimates are biased.
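The bias can be checked numerically. The sketch below (our own illustration, not code from the paper; the eigenvalues, weights and bandwidth are arbitrary) computes the exact moments of a small Dirac mixture and the analytic moments after Gaussian smoothing via the binomial expansion above; the correction is zero for the first two moments, equals $\sigma^2$ for the second moment of a unit-mass mixture, and grows with the moment order.

```python
import numpy as np
from math import comb

lam = np.array([0.0, 0.5, 1.0, 1.3, 1.9])   # eigenvalues in [0, 2]
w = np.full(len(lam), 1.0 / len(lam))       # uniform Dirac weights
sigma = 0.1                                  # Gaussian kernel bandwidth

def kernel_central_moment(j):
    """j-th central moment of the Gaussian kernel: 0 for odd j,
    sigma^j (j-1)!! for even j."""
    if j % 2 == 1:
        return 0.0
    return sigma**j * float(np.prod(np.arange(j - 1, 0, -2))) if j > 0 else 1.0

def moment(k):
    """Unsmoothed k-th moment of the Dirac mixture."""
    return float(np.sum(w * lam**k))

def smoothed_moment(k):
    """k-th moment after Gaussian smoothing, via the binomial expansion."""
    return float(sum(wi * sum(comb(k, j) * li**(k - j) * kernel_central_moment(j)
                              for j in range(k + 1))
                     for wi, li in zip(w, lam)))

bias = [smoothed_moment(k) - moment(k) for k in range(7)]
```

As the theory predicts, `bias[0]` and `bias[1]` vanish, `bias[2]` equals $\sigma^2$, and the bias is strictly positive and increasing for the higher even-order terms.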
For large random graphs, the moments of a generated instance converge to those averaged over many instances (Feier, 2012), hence by biasing our moment information we limit our ability to learn about the underlying stochastic process. We include a detailed discussion regarding the relationship between the moments of the graph and the underlying stochastic process in Appendix Section E.
4 An Information Theoretically Optimal Approach to the Problem of Smooth Spectra for Massive Graphs
For large, sparse graphs corresponding to real networks with millions or billions of nodes, where eigendecomposition is intractable, we may still be able to compute a certain number of matrix-vector products, which we can use to obtain unbiased estimates of the spectral density moments using stochastic trace estimation (as explained in Appendix A). We can then settle on a unique spectral density consistent with exactly this moment information, known as the density of Maximum Entropy, explained in Section 4.1.

4.1 Maximum Entropy: MaxEnt
The method of maximum entropy, hereafter referred to as MaxEnt (Pressé et al., 2013), is information-theoretically optimal in so far as it makes the fewest additional assumptions about the underlying density (Jaynes, 1957) and is flattest, in the sense of being closest in KL divergence to the uniform distribution, among all densities satisfying the constraints (Granziol et al., 2019). To determine the spectral density using MaxEnt, we maximise the entropic functional
$$S[p] = -\int p(\lambda) \log p(\lambda)\, d\lambda \qquad (9)$$

with respect to $p(\lambda)$, subject to the power moment constraints $\int p(\lambda)\lambda^i\, d\lambda = \mu_i$ on the spectral density, where the $\mu_i$ are estimated using stochastic trace estimation (STE) as explained in Appendix A. The resultant entropic spectral density has the form
$$p(\lambda) = \exp\Big(-\sum_{i=0}^{m} \alpha_i \lambda^i\Big), \qquad (10)$$

where the coefficients $\{\alpha_i\}$ are derived from optimising (9). We use the MaxEnt algorithm proposed in (Granziol et al., 2019) to learn these coefficients. For simplicity, we refer to the resulting density as the entropic graph spectrum (EGS). We make our Python code available at https://github.com/diegogranziol/PythonMaxEnt
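A minimal sketch of how such coefficients can be obtained (our own illustration using an undamped Newton iteration on the convex dual over a quadrature grid, not the MEMe algorithm of Granziol et al. (2019) used in the paper; the target moments and grid are arbitrary assumptions):

```python
import numpy as np

grid = np.linspace(0.0, 2.0, 4001)          # support of the normalised Laplacian
dx = grid[1] - grid[0]

def integrate(y):
    """Trapezoidal quadrature on the fixed grid."""
    return float(np.sum(y[:-1] + y[1:]) * dx / 2)

def maxent_density(mu, num_iter=50):
    """Fit p(lam) = exp(-sum_{i>=1} alpha_i lam^i) (the zeroth multiplier is
    absorbed by normalisation) so that its power moments match mu, via
    Newton's method on the convex dual of the MaxEnt problem."""
    m = len(mu)
    powers = np.vstack([grid**(i + 1) for i in range(m)])    # lam^1 .. lam^m
    alpha = np.zeros(m)
    for _ in range(num_iter):
        p = np.exp(-alpha @ powers)
        p /= integrate(p)
        ex = np.array([integrate(p * powers[i]) for i in range(m)])
        grad = mu - ex                                       # dual gradient
        # dual Hessian = covariance matrix of the monomials under p
        cov = np.array([[integrate(p * powers[i] * powers[j]) for j in range(m)]
                        for i in range(m)]) - np.outer(ex, ex)
        alpha -= np.linalg.solve(cov, grad)
    return p

mu = np.array([1.0, 1.2])                   # target E[lam], E[lam^2]
p = maxent_density(mu)
fitted = [integrate(p * grid**(i + 1)) for i in range(2)]
```

At convergence the fitted density is everywhere positive and reproduces the input moments, here a truncated-Gaussian-like density on $[0, 2]$ with mean $1$ and second moment $1.2$.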
4.2 The Entropic Graph Spectral Learning algorithm
5 Visualising the Modelling Power of EGS
Having developed a theory as to why a smooth, exact moment-matched approximation of the spectral density is crucial to learning the characteristics of the underlying stochastic process, and having proposed a method (Algorithm 1) to learn such a density, we test the practical utility of our method and algorithm on examples where the limiting spectral density is known.
5.1 Erdős–Rényi graphs and the semicircle law
For Erdős–Rényi graphs with edge creation probability $p$ such that $np \to \infty$, the limiting spectral density of the normalised Laplacian converges to the semicircle law, and that of the Laplacian converges to the free convolution of the semicircle law and a Gaussian (Jiang, 2012). We consider here to what extent our EGS, learnt with a finite number of moments, can effectively approximate the density. Wigner's density is fully defined by its central moments: the odd moments vanish and the even moments are given by $m_{2k} = (R/2)^{2k} C_k$, where $R$ is the radius of the semicircle and the $C_k$ are known as the Catalan numbers. As a toy example we generate a semicircle and use the analytical moments to compute its corresponding EGS (FIG 1). As can be seen in FIG 1(a), with a small number of moments the central portion of the density is already well approximated, but the end points are not. This is largely corrected with a larger number of moments.
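For reference, the central-moment formula quoted above can be verified numerically. The sketch below (our own illustration) compares $(R/2)^{2k} C_k$ against direct quadrature of the semicircle density $\rho(x) = \frac{2}{\pi R^2}\sqrt{R^2 - x^2}$ on $[-R, R]$:

```python
import numpy as np
from math import comb

def catalan(k):
    """Catalan numbers C_k = binom(2k, k) / (k + 1)."""
    return comb(2 * k, k) // (k + 1)

R = 1.0
# Even central moments of Wigner's semicircle of radius R: (R/2)^{2k} C_k.
analytic = [(R / 2) ** (2 * k) * catalan(k) for k in range(5)]

# Numerical check: integrate x^{2k} against the semicircle density on [-R, R].
x = np.linspace(-R, R, 200001)
rho = 2.0 / (np.pi * R**2) * np.sqrt(R**2 - x**2)

def integrate(y):
    return float(np.sum(y[:-1] + y[1:]) * (x[1] - x[0]) / 2)

numeric = [integrate(x**(2 * k) * rho) for k in range(5)]
```

For $R = 1$ this gives the familiar sequence $1, 1/4, 1/8, 5/64, 14/256, \dots$, and the quadrature agrees with the analytic values.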


We generate an Erdős–Rényi graph and learn the moments using stochastic trace estimation. We then compare the fit between the EGS computed using different numbers of input moments and the graph eigenvalue histogram computed by eigendecomposition. We plot the results in FIG 2. One striking difference between this experiment and the previous one is the number of moments needed to give a good fit. This can be seen especially clearly in the top left subplot of FIG 2, where the 3-moment, i.e. Gaussian, approximation completely fails to capture the bounded support of the spectral density. Given that the exponential polynomial density is positive everywhere, it needs more moment information to learn the regions of boundedness of the spectral density within its domain. In the previous example we artificially alleviated this phenomenon by placing the support of the semicircle within the entire domain. It can be clearly seen in FIG 2 that increasing moment information successively improves the fit to the support. Furthermore, the oscillations, which are characteristic of an exponential polynomial function, decay in magnitude as more moments are used.
5.2 Beyond the semicircle law
For the adjacency matrix of a sparse Erdős–Rényi graph, the limiting spectral density does not converge to the semicircle law and has an elevated central portion, while the scale-free limiting density converges to a triangle-like distribution (Farkas et al., 2001). For other random graph models, such as the Barabási–Albert model (Barabási and Albert, 1999), also known as the scale-free network, the probability of a new node being connected to a given existing node is proportional to the number of links that existing node already has, violating the independence assumption required to derive the semicircle density. We generate a Barabási–Albert network and, similarly to Section 5.1, learn the EGS and plot the resulting spectral density against the eigenvalue histogram, shown in FIG 3. For the Barabási–Albert network, due to the extremity of the central peak, a much larger number of moments is required to obtain a reasonable fit. We also note that increasing the number of moments is akin to increasing the number of bins in terms of spectral resolution, as seen in FIG 3.
6 EGS for Measuring Graph Similarity
In this section, we test the use of our EGS in combination with the symmetric KL divergence to measure similarity between different types of synthetic and real-world graphs. Note that our proposed EGS, based on the MaxEnt distribution, enables the symmetric KL divergence to be computed analytically; we show this in Appendix F. We first investigate the feasibility of recovering the parameters of random graph models, and then move on to classifying network types as well as computing graph similarity among various synthetic and real-world graphs.
6.1 Inferring parameters of random graph models
We investigate whether one can recover the parameter values of a random graph model via its learned EGS. We generate a random graph of a given size and parameter value and learn its entropic spectral characterisation using our EGS learner (Algorithm 1). Then, we generate another graph of the same size but learn its parameter value by minimising the symmetric KL divergence between its entropic spectral surrogate and that of the original graph. We repeat the above procedure for different random graph models, i.e. Erdős–Rényi (ER), Watts–Strogatz (WS) and Barabási–Albert (BA), and different graph sizes (50, 100 and 150 nodes); the results are shown in Table 1. It can be seen that, given the approximate EGS, we are able to recover well the parameters of the graph producing that spectrum.
[Table 1: recovered parameter values for the ER, WS and BA models at graph sizes 50, 100 and 150. Table 2: symmetric KL divergences of the large BA and YouTube networks against the ER, WS and BA reference models.]
6.2 Learning real world network types
Determining which random graph model best fits a real-world network, as characterised by spectral divergence, can lead to a better understanding of its dynamics and characteristics. This has been explored for small biological networks (Takahashi et al., 2012), where full eigendecomposition is viable. Here, we conduct similar experiments for large networks based on our EGS method. We first test on a large synthetic BA network. By minimising the symmetric KL divergence between its EGS and those of small (1,000-node) random networks (ER, WS, BA), we successfully recover its own type. As a real-world use case, we further repeat the experiment to determine which random network model best describes the YouTube network from the SNAP dataset (Leskovec and Krevl, 2014) and find, as shown in Table 2, that BA gives the lowest divergence, which aligns with other findings for social networks (Barabási and Albert, 1999).
6.3 Comparing different real world networks
We now consider the feasibility of comparing real-world networks using EGSs. Specifically, we take biological networks, citation networks and road networks from the SNAP dataset (Leskovec and Krevl, 2014), and compute the symmetric KL divergences between their EGSs. We present the results in a heat map (FIG 4). We see very clearly that the intra-class divergences among the biological, citation and road networks are much smaller than their inter-class divergences. This strongly suggests that the combination of our EGS method and the symmetric KL divergence can be used to identify similarity in networks. Furthermore, as can be seen in the divergences between the human and mouse networks, the spectra of human gene networks are more closely aligned with each other than they are with the spectra of mouse gene networks. This suggests a reasonable amount of intra-class distinguishability as well.
7 EGS for Estimating Cluster Number
It is known from spectral graph theory (Chung, 1997) that the multiplicity of the zero eigenvalue of the Laplacian (and the normalized Laplacian) is equal to the number of connected components in the graph (Von Luxburg, 2007). Previous literature has argued (Ubaru and Saad) that, for a small number of inter-cluster connections, matrix perturbation theory (Bhatia, 2013) implies we should expect a corresponding number of eigenvalues close to zero. We make this argument precise with the following theorem.
The zero eigenvalue of the normalised Laplacian, perturbed by adding a single edge between node $i$ and node $j$ of two previously disconnected clusters $C_1$ and $C_2$, is bounded to first order by

(11)

where $d_i$ denotes the degree of node $i$ (and similarly $d_j$), i.e. the sum of the weights over all nodes connecting to node $i$. Proof: using Weyl's bound on the eigenvalues of Hermitian matrices (Bhatia, 2013),

(12)
By the definition of the normalized Laplacian,

(13)

to first order in the binomial expansion; we hence have the result. For two clusters of identical degree $d$ connected by a single inter-cluster link, the zero eigenvalue is thus perturbed only slightly to first order. Hence, for $m$ inter-cluster connections, our bound grows linearly in $m$, and the intuition of a small change in the eigenvalue holds as long as the number of edges between clusters is much smaller than the degree of the nodes within the clusters.
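This behaviour is easy to verify numerically. The sketch below (our own illustration; dense NumPy matrices and clique-shaped clusters are assumptions, not the paper's experimental setup) builds three disconnected 20-node cliques, whose normalised Laplacian has a zero eigenvalue of multiplicity three, and shows that adding a single inter-cluster edge leaves one eigenvalue small relative to the bulk:

```python
import numpy as np

def normalised_laplacian(W):
    d = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(d)
    return np.eye(len(W)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]

def disjoint_cliques(sizes):
    """Adjacency matrix of a disjoint union of cliques of the given sizes."""
    n = sum(sizes)
    W = np.zeros((n, n))
    start = 0
    for s in sizes:
        W[start:start + s, start:start + s] = 1.0 - np.eye(s)
        start += s
    return W

W = disjoint_cliques([20, 20, 20])           # three disconnected clusters
eig_before = np.linalg.eigvalsh(normalised_laplacian(W))

W[0, 20] = W[20, 0] = 1.0                    # a single inter-cluster edge
eig_after = np.linalg.eigvalsh(normalised_laplacian(W))
```

Before the edge is added there are exactly three zero eigenvalues; afterwards two remain, and the third becomes a small positive value far below the bulk of the spectrum, as the perturbation argument predicts.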
Algorithm 2: Cluster number estimation from the EGS
1: Input: normalised graph Laplacian, graph dimension $n$, tolerance
2: Output: number of clusters $N_c$
3: Compute the EGS $p(\lambda)$ via Algorithm 1
4: Find the first local minimum $\lambda^{*}$ of $p(\lambda)$
5: Calculate $N_c = n \int_0^{\lambda^{*}} p(\lambda)\, d\lambda$
7.1 Learning the number of clusters in large graphs
For the case of large sparse graphs, where only iterative methods such as the Lanczos algorithm can be used, the same arguments as in Section 3 apply: the Dirac delta functions are now weighted, and to obtain a reliable estimate of the eigengap one must smooth them. We would expect a smoothed spectral density plot to have a spike near zero; the moments of the spectral density encode this information, with the mass of the peak spread out. We hence look for the first spectral minimum in the EGS and calculate the number of clusters as shown in Algorithm 2. We conduct a set of experiments to evaluate the effectiveness of our spectral method in Algorithm 2 for learning the number of distinct clusters in a network, where we compare it against the Lanczos algorithm with kernel smoothing on both synthetic and real-world networks.
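A toy sketch of this estimator follows (our own illustration on a synthetic smooth density; the half-Gaussian peak and Gaussian bulk are assumptions standing in for a real EGS output):

```python
import numpy as np

grid = np.linspace(0.0, 2.0, 4001)
dx = grid[1] - grid[0]

def cluster_count(p, n):
    """Integrate the smooth spectral density up to its first local minimum
    (the boundary of the near-zero peak) and multiply by the node count n."""
    i = next(k for k in range(1, len(p) - 1)
             if p[k] <= p[k - 1] and p[k] <= p[k + 1])
    return n * float(np.sum(p[:i] + p[1:i + 1]) * dx / 2)   # trapezoid rule

# Toy smooth density: a sharp half-Gaussian peak at zero carrying mass 3/n
# (three clusters) plus a broad bulk centred at lambda = 1.
n = 300
s0, s1 = 0.02, 0.3
peak = (3 / n) * np.sqrt(2 / np.pi) / s0 * np.exp(-grid**2 / (2 * s0**2))
bulk = (1 - 3 / n) / (s1 * np.sqrt(2 * np.pi)) * np.exp(-(grid - 1)**2 / (2 * s1**2))
est = cluster_count(peak + bulk, n)
```

The estimate recovers the three planted "clusters" from the near-zero mass, up to the small amount of bulk density leaking below the first minimum.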
7.1.1 Synthetic networks
The synthetic data consists of disconnected sub-graphs of varying sizes and cluster numbers, to which a small number of inter-cluster edges are added. We use an identical number of matrix-vector multiplications for both the EGS and Lanczos methods (see Appendix C for experimental details), estimate the number of clusters, and report the fractional error. The results are shown in Table 3, where in each case the method achieving the lowest detection error is highlighted in bold. It is evident that the EGS approach outperforms Lanczos as the number of clusters and the network size increase. We also observe a general improvement in EGS performance for larger graphs, visible in its decreasing fractional error as the graph size increases, which is not matched by kernel-smoothed Lanczos.
[Table 3: fractional error in the estimated number of clusters for Lanczos and EGS, for cluster counts (network sizes) of 9 (270), 30 (900), 90 (2700) and 240 (7200).]
To test the performance of our approach on networks that are too large for eigendecomposition, we generate two large networks by mixing the ER, WS and BA random graph models. The first large network has 201,600 nodes and comprises 305 interconnected clusters whose sizes vary from 500 to 1,000 nodes. The second large network has 404,420 nodes and comprises 1,355 interconnected clusters whose sizes vary from 200 to 400 nodes. The results in FIG 6 show that, for both methods, the detection error generally decreases as more moments are used, and that our EGS approach again outperforms the Lanczos method on both large synthetic networks.


7.1.2 Small real world networks
We next experiment with relatively small real-world networks, such as the Email network in the SNAP dataset, an undirected graph in which nodes represent members of a large European research institution and edges represent email communication between them. For such a network, we can still calculate the ground-truth number of clusters by computing the eigenvalues explicitly and finding the spectral gap near zero. For the Email network, we count the very small eigenvalues before a large jump in magnitude (measured on a log scale) and set this as the ground truth. This is shown in FIG 5, where we display the value of each eigenvalue in increasing order and how this results in a broadened peak in the EGS. The area under this peak multiplied by the number of network nodes gives the number of clusters. We note that this number differs from the value given by the number of departments at the research institution in this dataset. A likely reason for this ground-truth inflation is that certain departments, for example Astrophysics, Theoretical Physics and Mathematics, may collaborate to such an extent that their division in name is not reflected in the node connection structure. We plot the log error against the number of moments for both EGS and Lanczos in FIG 6(a), with EGS showing superior performance. We repeat the experiment on the Net Science collaboration network, a co-authorship network of scientists working on network theory and experiment (Newman, 2006a). The results in FIG 6(b) show that EGS quickly outperforms the Lanczos algorithm once a modest number of moments is used.
7.1.3 Large real world networks
For large datasets, where even the Cholesky decomposition becomes completely prohibitive on powerful machines, we can no longer define a ground truth using a complete eigendecomposition. Alternative "ground truths" supplied in (Mislove et al., 2007), which regard each connected component with more than 3 nodes as a community, are not universally accepted. This definition, along with that of self-declared group membership (Yang and Leskovec, 2015), often leads to contradictions with our definition of a community. A notable example is the Orkut dataset, where the number of stated communities is greater than the number of nodes (Leskovec and Krevl, 2014). Beyond the impossibility of learning such a value from the eigenspectrum, if the main reason to learn about clusters is to partition groups and to summarise networks into smaller substructures, such a definition is undesirable.
We present our findings for the number of clusters in the DBLP, Amazon and YouTube networks (Leskovec and Krevl, 2014) in Table 4, using a varying number of moments. We see that for both the DBLP and Amazon networks the estimated number of clusters seems to converge as the number of moments increases, whereas for YouTube no such trend is visible. This can be explained by looking at the approximate spectral density of the networks implied by maximum entropy in FIG 8. For both DBLP and Amazon (FIG 8(a) and 8(b) respectively), our method implies a clear spectral gap near the origin, indicating the presence of clusters. For the YouTube dataset, shown in FIG 8(c), no such clear spectral gap is visible, and hence the number of clusters cannot be estimated accurately.



[Table 4: estimated number of clusters for the DBLP, Amazon and YouTube networks using 40, 70 and 100 moments.]
8 Conclusion
In this paper, we propose a novel, efficient framework for learning a continuous approximation to the spectrum of large scale graphs, which overcomes the limitations introduced by kernel smoothing. We motivate the informativeness of spectral moments using the link between random graph models and random matrix theory. We show that our algorithm is able to learn the limiting spectral densities of random graph models for which analytical solutions are known. We showcase the strength of this framework in two real world applications, namely, computing the similarity between different graphs and detecting the number of clusters in the graph. Interestingly, we are able to classify different real world networks with respect to their similarity to classical random graph models. The EGS may be of further use to researchers studying network properties and similarities.
Appendix A Stochastic Trace Estimation
The intuition behind stochastic trace estimation is that we can accurately approximate the moments of the spectral density using computationally cheap matrix-vector multiplications. The moments can be estimated using a Monte-Carlo average,

$$\mathbb{E}_{\lambda}[\lambda^{k}] = \frac{1}{n}\,\mathrm{tr}(A^{k}) = \frac{1}{n}\,\mathbb{E}_{\mathbf{z}}\big[\mathbf{z}^{\top} A^{k} \mathbf{z}\big] \approx \frac{1}{nd} \sum_{j=1}^{d} \mathbf{z}_j^{\top} A^{k} \mathbf{z}_j, \qquad (14)$$

where the $\mathbf{z}_j$ are random vectors with zero mean and unit covariance and $A$ is the matrix whose eigenvalue density we seek. Each product $A^{k}\mathbf{z}_j$ is built from repeated matrix-vector multiplications, so for sparse matrices the moments can be estimated at a cost proportional to the number of non-zero matrix entries. We use these as moment constraints in our entropic graph spectrum (EGS) formalism to derive the functional form of the spectral density. Examples of this in the literature include (Ubaru et al., 2017; Fitzsimons et al., 2017).
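A minimal Hutchinson-type sketch of equation (14) follows (our own illustration; the dense random test matrix, probe count and shift are assumptions made for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
B = rng.normal(size=(n, n)) / np.sqrt(n)
A = (B + B.T) / 2                            # symmetric test matrix
A -= np.linalg.eigvalsh(A)[0] * np.eye(n)    # shift so all eigenvalues >= 0

def ste_moments(A, num_moments, num_probes, rng):
    """Hutchinson estimator of tr(A^k)/n for k = 1..num_moments, using only
    matrix-vector products with Gaussian probe vectors."""
    n = A.shape[0]
    acc = np.zeros(num_moments)
    for _ in range(num_probes):
        z = rng.normal(size=n)               # zero mean, unit covariance
        v = z.copy()
        for k in range(num_moments):
            v = A @ v                        # v = A^{k+1} z
            acc[k] += z @ v                  # unbiased estimate of tr(A^{k+1})
    return acc / (num_probes * n)

est = ste_moments(A, 4, 30, rng)
exact = np.array([np.mean(np.linalg.eigvalsh(A)**k) for k in range(1, 5)])
```

With 30 probes the estimated moments agree with the exact spectral moments to within a few percent, at a cost of only matrix-vector products.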
Appendix B Comment on the Lanczos Algorithm
In the state-of-the-art iterative Lanczos algorithm (Ubaru et al., 2017), the tridiagonal matrix $T_m$ can be derived from the moment matrix corresponding to the discrete measure satisfying the first moments of the spectral density (Golub and Meurant, 1994), and hence it can be seen as a weighted Dirac approximation to the spectral density matching those moments. The weight given to each Ritz eigenvalue $\theta_i$ (an eigenvalue of the matrix $T_m$) is the square of the first component of the corresponding eigenvector, $w_i = [\mathbf{u}_i]_1^2$, hence the approximated spectral density can be written as

$$p(\lambda) \approx \sum_{i=1}^{m} [\mathbf{u}_i]_1^2\, \delta(\lambda - \theta_i). \qquad (15)$$
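The weighted Dirac construction can be sketched as follows (our own minimal Lanczos implementation with full reorthogonalisation, not the code of Ubaru et al. (2017); the test matrix and step count are arbitrary). The Ritz values and their squared first eigenvector components reproduce the low-order moments of the spectral measure seen from the start vector:

```python
import numpy as np

def lanczos_tridiagonal(A, z, m):
    """m Lanczos steps on symmetric A from start vector z, with full
    reorthogonalisation; returns the m x m tridiagonal matrix T."""
    n = len(z)
    Q = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(m - 1)
    q = z / np.linalg.norm(z)
    for j in range(m):
        Q[:, j] = q
        v = A @ q
        alpha[j] = q @ v
        v -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ v)    # full reorthogonalisation
        if j < m - 1:
            beta[j] = np.linalg.norm(v)
            q = v / beta[j]
    return np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)

rng = np.random.default_rng(1)
B = rng.normal(size=(50, 50))
A = (B + B.T) / 2
z = rng.normal(size=50)

T = lanczos_tridiagonal(A, z, 10)
theta, U = np.linalg.eigh(T)                 # Ritz values
weights = U[0, :]**2                         # squared first eigenvector entries

# The mixture sum_i weights_i * delta(theta_i) matches the low-order moments of
# the spectral measure of A seen from z (here the 4th moment).
zhat = z / np.linalg.norm(z)
moment_lanczos = float(weights @ theta**4)
moment_exact = float(zhat @ np.linalg.matrix_power(A, 4) @ zhat)
```

The weights sum to one, and for $m$ steps the Dirac mixture reproduces the moments $\hat{\mathbf{z}}^{\top} A^{k} \hat{\mathbf{z}}$ exactly (up to roundoff) for $k \leq 2m - 1$.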
Appendix C Experimental Details
We use Gaussian random vectors for our stochastic trace estimation, for both EGS and Lanczos (Ubaru et al., 2017). We explain the procedure of going from the adjacency matrix to the Laplacian moments in Algorithm 3. When comparing EGS with Lanczos, we set the number of moments equal to the number of Lanczos steps, as both amount to matrix-vector multiplications in the Krylov subspace. We further use Chebyshev polynomial inputs instead of power moments for improved performance and conditioning. In order to normalise the moment input, we use the normalised Laplacian, whose eigenvalues are bounded by 2, and divide by 2 so that the spectrum lies in $[0, 1]$. To make a fair comparison, we take the output from Lanczos (Ubaru et al., 2017) and apply kernel smoothing (Lin et al., 2016) before applying our cluster number estimator.
Appendix D EGSs of Real World Networks with Varying Number of Moments
In order to more clearly showcase the practical value of an EGS based on a large number of moments, we show the symmetric KL divergence between real-world networks using only a 2-moment, i.e. Gaussian, approximation. The Gaussian is fully defined by its normalisation constant, mean and variance, and so can be specified with 3 Lagrange multipliers. The results of the same analysis as in Figure 4, now obtained using the Gaussian approximation, are shown in Figure 9. The networks are still somewhat distinguished; however, one can see, for example, that citation networks and road networks are less clearly separated, to the point that the inter-class distance becomes smaller than the intra-class distance, which for the purpose of network classification is not a helpful property. The problem persists for somewhat larger moment numbers; for example, when we choose the number of moments that has been reported stable for other off-the-shelf maximum entropy algorithms, similar results are observed in Figure 10. In comparison, this is not the case for the larger number of moments used in Figure 4 in the main text.

Appendix E On the Importance of Moments
Given that all iterative methods essentially generate a moment-matched empirical spectral density (ESD) approximation, it is instructive to ask what information is contained within the first few spectral moments.
To answer this question concretely, we consider the spectra of random graphs. By investigating the finite-size corrections and the convergence of individual moments of the empirical spectral density (ESD) to those of the limiting spectral density (LSD), we see that the observed spectra are faithful to those of the underlying stochastic process. Put simply, given a random graph model, if we compare the moments of the spectral density observed for a single instance of the model to those averaged over many instances, we see that the observed moments are informative about the underlying stochastic process.
E.0.1 ESD moments converge to those of the LSD
For random graphs, with independent edge creation probabilities, their spectra can be studied through the machinery of random matrix theory (Akemann et al., 2011).
We consider the entries of an $n \times n$ matrix to be zero mean and independent, with bounded moments. For such a matrix, a natural scaling which ensures a bounded spectral norm as $n \to \infty$ is $n^{-1/2}$. It can be shown (see for instance (Feier, 2012)) that the moments of a particular instance of a random graph, and of the related random matrix, converge in probability to those of their limiting counterparts, with a finite-size correction that vanishes as $n \to \infty$.
E.0.2 Finite-size corrections to moments get worse for larger moments
A key result, akin to the normal distribution for classical densities, is the semicircle law for random matrix spectra (Feier, 2012). For matrices with independent entries with a common element-wise bound, common expectation and variance, and common diagonal expectation, it can be shown that the corrections to the semicircle law for the moments of the eigenvalue distribution,

(16)

have a corrective factor bounded by (Füredi and Komlós, 1981)

(17)
Hence, the finite-size effects are larger for the higher moments than for their lower counterparts. This is an interesting result, as it means that for large graphs the lowest-order moments, which are exactly those learned by any iterative process, best approximate those of the underlying stochastic process.
Appendix F Analytic Forms for the Differential Entropy and Divergence of the EGS
For the EGS $p(\lambda) = \exp(-\sum_i \alpha_i \lambda^i)$, to calculate the differential entropy we simply note that

$$-\int p(\lambda) \log p(\lambda)\, d\lambda = \int p(\lambda) \sum_i \alpha_i \lambda^i\, d\lambda = \sum_i \alpha_i \mu_i, \qquad (18)$$

where $\mu_i = \int p(\lambda)\lambda^i\, d\lambda$.
The KL divergence between two EGSs, $p(\lambda) = \exp(-\sum_i \alpha_i \lambda^i)$ and $q(\lambda) = \exp(-\sum_i \beta_i \lambda^i)$, can be written as

$$D_{\mathrm{KL}}(p \,\|\, q) = \int p(\lambda) \log \frac{p(\lambda)}{q(\lambda)}\, d\lambda = \sum_i (\beta_i - \alpha_i)\, \mu_i^{p}, \qquad (19)$$

where $\mu_i^{p}$ refers to the $i$-th moment of the density $p$. Similarly, the symmetric KL divergence can be written as

$$D_{\mathrm{KL}}(p \,\|\, q) + D_{\mathrm{KL}}(q \,\|\, p) = \sum_i (\beta_i - \alpha_i)\,(\mu_i^{p} - \mu_i^{q}), \qquad (20)$$

where the $\alpha_i$ and $\beta_i$ are derived from the optimisation and the moments $\mu_i$ are given by stochastic trace estimation.
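The identity (20) can be checked numerically. The sketch below (our own illustration using two quadratic-exponent EGS densities on a quadrature grid; the coefficient values are arbitrary) compares the analytic moment-difference form against direct quadrature of $\int (p - q)(\log p - \log q)\,d\lambda$:

```python
import numpy as np

grid = np.linspace(0.0, 2.0, 20001)

def integrate(y):
    """Trapezoidal quadrature on the fixed grid."""
    return float(np.sum(y[:-1] + y[1:]) * (grid[1] - grid[0]) / 2)

def egs(coeffs):
    """EGS density exp(-sum_{i>=1} c_i lam^i), normalised on [0, 2]; the
    zeroth coefficient is absorbed by the normalisation."""
    expo = -sum(c * grid**i for i, c in enumerate(coeffs, start=1))
    return np.exp(expo) / integrate(np.exp(expo))

def moments(p, order):
    return np.array([integrate(p * grid**i) for i in range(1, order + 1)])

alpha = np.array([-10.0, 5.0])   # exponent 10*lam - 5*lam^2 (centre 1)
beta = np.array([-6.0, 4.0])     # exponent 6*lam - 4*lam^2 (centre 0.75)
p, q = egs(alpha), egs(beta)

# Analytic symmetric KL: sum_i (beta_i - alpha_i) (mu_i^p - mu_i^q).
sym_kl_analytic = float((beta - alpha) @ (moments(p, 2) - moments(q, 2)))
# Direct numerical evaluation of D(p||q) + D(q||p) for comparison.
sym_kl_numeric = integrate((p - q) * (np.log(p) - np.log(q)))
```

The two values agree to quadrature precision, and the symmetric divergence is positive, as it must be.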
Appendix G Spectral Density with More Moments
We display the process of spectral learning for both EGS and Lanczos by plotting the spectral density of both methods against the ground truth in FIG 11. In order to make a valid comparison, we smooth the Lanczos-implied density using a Gaussian kernel. Whilst the kernel width could in theory be optimised over, we considered a range of values and took the smallest for which the density was sufficiently smooth, i.e. everywhere positive on the bounded domain. We note that both EGS and Lanczos approximate the ground truth better with a greater number of moments, and that Lanczos learns the extrema of the spectrum before the bulk of the distribution, while EGS captures the bulk right from the start.
References
Akemann, G., Baik, J. and Di Francesco, P. (eds) (2011). The Oxford Handbook of Random Matrix Theory. Oxford University Press.
Amari, S. and Nagaoka, H. (2007). Methods of Information Geometry. Vol. 191, American Mathematical Society.
Banerjee, A. (2008). The Spectrum of the Graph Laplacian as a Tool for Analyzing Structure and Evolution of Networks. Ph.D. thesis.
Barabási, A.-L. and Albert, R. (1999). Emergence of scaling in random networks. Science 286(5439), pp. 509–512.
Belkin, M. and Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 15(6), pp. 1373–1396.
Bhatia, R. (2013). Matrix Analysis. Vol. 169, Springer Science & Business Media.
Biggs, N., Lloyd, E. K. and Wilson, R. J. (1976). Graph Theory 1736–1936. Clarendon Press.
Chung, F. (1997). Spectral Graph Theory. American Mathematical Society.
Cohen-Steiner, D., Kong, W., Sohler, C. and Valiant, G. (2018). Approximating the spectrum of a graph. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1263–1271.
Cover, T. M. and Thomas, J. A. (2012). Elements of Information Theory. John Wiley & Sons.
Farkas, I. J., Derényi, I., Barabási, A.-L. and Vicsek, T. (2001). Spectra of "real-world" graphs: beyond the semicircle law. Physical Review E 64(2), 026704.
Feier, A. R. (2012). Methods of Proof in Random Matrix Theory. Ph.D. thesis, Harvard University.
Fitzsimons, J., Granziol, D., Cutajar, K., Osborne, M., Filippone, M. and Roberts, S. (2017). Entropic trace estimates for log determinants. arXiv:1704.07223.
Flake, G. W., Lawrence, S. and Giles, C. L. (2000). Efficient identification of web communities. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 150–160.
Füredi, Z. and Komlós, J. (1981). The eigenvalues of random symmetric matrices. Combinatorica 1(3), pp. 233–241.
Golub, G. H. and Meurant, G. (1994). Matrices, moments and quadrature. Pitman Research Notes in Mathematics Series, pp. 105–105.
Granziol, D., Ru, B., Zohren, S., Dong, X., Osborne, M. and Roberts, S. (2019). MEMe: an accurate maximum entropy method for efficient approximations in large-scale machine learning. Entropy 21(6), 551.
Jaynes, E. T. (1957). Information theory and statistical mechanics. Physical Review 106, pp. 620–630.
Jiang, T. (2012). Empirical distributions of Laplacian matrices of large dilute random graphs. Random Matrices: Theory and Applications 1(03), 1250004.
Leskovec, J., Adamic, L. A. and Huberman, B. A. (2007). The dynamics of viral marketing. ACM Transactions on the Web (TWEB) 1(1), 5.
Leskovec, J. and Krevl, A. (2014). SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data
Lin, L., Saad, Y. and Yang, C. (2016). Approximating spectral densities of large matrices. SIAM Review 58(1), pp. 34–65.
McGraw, P. N. and Menzinger, M. (2008). Laplacian spectra as a diagnostic tool for network structure and dynamics. Physical Review E 77(3), 031102.
Mislove, A., Marcon, M., Gummadi, K. P., Druschel, P. and Bhattacharjee, B. (2007). Measurement and analysis of online social networks. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement (IMC'07), San Diego, CA, pp. 29–42.
Newman, M. E. J. (2006a). Finding community structure in networks using the eigenvectors of matrices. Physical Review E 74(3), 036104.
Newman, M. E. J. (2006b). Modularity and community structure in networks. Proceedings of the National Academy of Sciences 103(23), pp. 8577–8582.
Palla, G., Derényi, I., Farkas, I. and Vicsek, T. (2005). Uncovering the overlapping community structure of complex networks in nature and society. Nature 435(7043), pp. 814.
Pressé, S., Ghosh, K., Lee, J. and Dill, K. A. (2013). Principles of maximum entropy and maximum caliber in statistical physics. Reviews of Modern Physics 85, pp. 1115–1141.
Takahashi, D. Y., Sato, J. R., Ferreira, C. E. and Fujita, A. (2012). Discriminating different classes of biological networks by analyzing the graphs spectra distribution. PLoS One 7(12), e49949.
Ubaru, S., Chen, J. and Saad, Y. (2016). Fast estimation of tr(f(A)) via stochastic Lanczos quadrature.
Ubaru, S., Chen, J. and Saad, Y. (2017). Fast estimation of tr(f(A)) via stochastic Lanczos quadrature. SIAM Journal on Matrix Analysis and Applications 38(4), pp. 1075–1099.
Ubaru, S. and Saad, Y. Applications of trace estimation techniques.
Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing 17(4), pp. 395–416.
Yang, J. and Leskovec, J. (2015). Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems 42(1), pp. 181–213.