Network analysis has attracted considerable research efforts in many areas of artificial intelligence, as networks are able to encode rich and complex data, such as human relationships and interactions. One major challenge of network analysis is how to represent network properly so that the network structure can be preserved. The most straightforward way is to represent the network by its adjacency matrix. However, it suffers from the data sparsity. Other traditional network representation relies on handcrafted network feature design (e.g., clustering coefficient), which is inflexible, non-scalable, and requires hard human labor.
In recent years, network representation learning, also known as network embedding, has been proposed and aroused considerable research interest. It aims to automatically project a given network into a low-dimensional latent space, and represent each vertex by a vector in that space. For example, a number of recent works apply advances in natural language processing (NLP), most notably models known as word2vec[Mikolov et al.2013], to network embedding and propose word2vec-based algorithms, such as DeepWalk [Perozzi, Al-Rfou, and Skiena2014] and node2vec [Grover and Leskovec2016]. Besides, researchers also consider network embedding as part of dimensionality reduction techniques. For instance, Laplacian Eigenmap (LE) [Belkin and Niyogi2003] aims to learn the low-dimensional representation to expand the manifold where the data lie.
Essentially, these methods mainly focus on preserving microscopic structure of network, like pairwise relationship between vertexes. Nevertheless, scale-free property, one of the most fundamental macroscopic properties of networks, is largely ignored.
Scale-free property depicts that the vertex degrees follow a power-law distribution, which is a common knowledge for many real-world networks. We take an academic network as the example, where each edge indicates if a vertex (researcher) has cited at least one publication of another vertex (researcher). Figure 1(a)
demonstrates the degree distribution of this network. The linearity on log-log scale suggests a power-law distribution: the probability decreases when the vertex degree grows, with a long tail tending to zero[Faloutsos, Faloutsos, and Faloutsos1999, Adamic and Huberman1999]. In other words, there are only a few vertexes of high degrees. The majority of vertexes connected to a high-degree vertex is, however, of low degree, and not likely connected to each other.
Moreover, compared with the microscopic structure, the macroscopic scale-free property imposes a higher level constraint on the vertex representations: only a few vertexes can be close to many others in the learned latent space. Incorporating scale-free property in network embedding can reflect and preserve the sparsity of real-world networks and, in turn, provide effective information to make the vertex representations more discriminative.
In this paper, we study the problem of learning scale-free property preserving network embedding. As the representation of a network, vertex embedding vectors are expected to well reconstruct the network. Most existing algorithms learn network embedding in the Euclidean space. However, we find that most traditional network embedding algorithms will overestimate the number of higher degrees. Figure 1(b) gives an example, where the representation is learned by Laplacian Eigenmap. We analyze and try to understand this theoretically, and study the feasibility of recovering power-law distributed vertex degree in the Euclidean space, by converting our problem to the Sphere Packing problem. Through our analysis we find that theoretically, moderately increasing the dimension of embedding vectors can help to preserve the scale-free property. See details in Section 2.
Inspired by our theoretical analysis, we then propose the degree penalty principle for designing scale-free property preserving network embedding algorithms in Section 3: punishing the proximity between vertexes with high degrees. We further introduce two implementations of our approach based on the spectral techniques [Belkin and Niyogi2003] and the skip-gram model [Mikolov et al.2013]. As Figure 1(c) suggests, our approach can better preserve the scale-free property of networks. In particular, the Kolmogorov-Smirnov (aka. K-S statistic) distance between the obtained degree distribution and its theoretical power-law distribution is , while the value for the degree distribution obtained by LE is (the smaller the better).
To verify the effectiveness of our approach, we conduct experiments on both synthetic data and five real-world datasets in Section 4. The experimental results show that our approach is able to not only preserve the scale-free property of networks, but also outperform several state-of-the-art embedding algorithms in different network analysis tasks.
We summarize our contribution of this paper as follows:
We analyze the difficulty and feasibility of reconstructing a scale-free network based on learned vertex representations in the Euclidean space, by converting our problem to the Sphere Packing problem.
We propose the degree penalty principle and two implementations to preserve the scale-free property of networks and improve the effectiveness of vertex representations.
We validate our proposed principle by conducting extensive experiments and find that our approach achieves a significant improvement on six datasets and three tasks compared to several state-of-the-art baselines.
2 Theoretical Analysis
In this section, we try to study why most network embedding algorithms will overestimate higher degrees theoretically, and analyze if there exists a solution of scale-free property preserving network embedding in the Euclidean space. This section also provides intuitions of our approach in Section 3.
Notations. We consider an undirected network , where is the vertex set containing vertexes and is the edge set. Each indicates an undirected edge between and . We define the adjacency matrix of as , where if and otherwise. Let be a diagonal matrix where is the degree of .
Network embedding. In this paper, we focus on the representation learning for undirected networks. Given an undirected graph , the problem of graph embedding aims to represent each vertex into a low-dimensional space , i.e., learning a function , where is the embedding matrix, and network structures can be preserved in .
Network reconstruction. As the representation of a network, the learned embedding matrix is expected to well reconstruct the network. In particular, one can reconstruct the network edges based on distances between vertexes in the latent space . For example, the probability of an edge existing between and can be defined as
where the Euclidean distance between embedding vectors and of the vertex and , in respective, is considered. In practice, a threshold is chosen and an edge will be created if . We call the above method as -NN, which is geometrically informative and a common method used in many existing work [Shaw and Jebara2009, Belkin and Niyogi2003, Alanis-Lobato, Mier, and Andrade-Navarro2016].
2.2 Reconstructing Scale-free Networks
Given a network, we call it as a scale-free network, when its vertex degrees follow a power-law distribution. In other words, there are only a few vertexes of high degrees. The majority of vertexes connected to a high-degree vertex is, however, of low degree, and not likely connected to each other. Formally, the probability density function of vertex degreehas the following form:
where is the exponent parameter and is the normalization term. In practice, the above power-law form only applies for vertexes with degrees greater than a certain minimum value [Clauset, Shalizi, and Newman2009a].
However, it is difficult to reconstruct a scale-free network in the Euclidean space by -NN. As we see in Figure 1(b), the higher degrees will be overestimated. We aim to explain this theoretically.
Intuition. While reconstructing the network by -NN, we select a certain , and for a vertex with embedding vector , we regard all points that fall in the closed ball of radius centered at , denoted by , as ones having edges with . When is a high-degree vertex, there will be many points in . We expect these points are far away from each other, keeping more vertexes with low degree and thus in turn keep the power-law distribution. However, intuitively, as these points are placed in the same closed ball , it will be more likely that their distances from each other are less than . As a result, there will be many edges created among those points, which violates the assumption of the scale-free property.
Following the above idea, we introduce a theorem below, which is discussed in , and without loss of generality, we set as .
Theorem 1 (Sphere Packing).
There are points in whose distances from each other are larger than or equal to , if and only if, there exists disjoint spheres of radius in .
The center of any sphere of radius in falls in and the distance between any two centers of two different spheres of radius is larger than or equal to 1. ∎
Theorem 1 converts our problem of reconstructing a scale-free network to the Sphere Packing Problem, which seeks to find the optimal packing of spheres in high dimensional spaces [Cohn and Elkies2003, Vance2011, Venkatesh2012]. Figure 2 gives an example, where a network centered by a high-degree vertex is embedded into a two-dimensional space. The embedding result corresponds to an equivalent Sphere Packing solution, which fails to place enough disjoint spheres and leads to a failed network reconstruction (many nonexistent edges are created). Formally, we define the packing density as follows:
Definition 1 (Packing Density).
The packing density is the fraction of the space filled by the spheres making up the packing. For convenience, we define the optimal sphere packing density in as , which means no packing of spheres in achieves a packing density larger than .
As Theorem 1 suggests, we aim to find a packing solution with large density so that more points in a closed sphere can keep their distance larger than 1. However, in general cases, finding the optimal packing density remains an open problem. Still, we are able to derive the upper and lower bounds of for sufficiently large .
Theorem 2 (Upper and lower bounds for ).
Specifically, we have
And the following inequality holds for sufficiently large :
Suppose is in , , and there can be at most points in whose distances from each other are larger than , then
where means taking the integer part. The upper bound holds for sufficiently large
By Theorem 1
we only need to estimate the number of disjoint spheres of radiusthat can be fitted in . The volume of a -dimensional ball of radius is given by . Plugging in the radius, the volumes of and a sphere of radius are respectively and . Since the optimal packing density is given by , we have
Discussion. The lower bound of in Eq. 5 suggests the feasibility to reconstruct scale-free network by -NN in the Euclidean space, when , the dimension of embedding vector, is sufficiently large. For instance, when , we have , which is enough to keep scale-free property holds for most real-world networks.
3 Our Approach
General idea. Inspired by our theoretical analysis, we propose a principle of degree penalty for designing scale-free property preserving embedding algorithms: while preserving first- and second-order proximity, the proximity between vertexes that have high degrees shall be punished. We give the general idea behind this principle below.
Scale-free property is featured by the ubiquitous existence of “big hubs” which attract most edges in the network. Most of existing network embedding algorithms, explicitly or implicitly, attempt to keep these big hubs close to their neighbors by preserving first-order and/or second-order proximity [Belkin and Niyogi2003, Tang et al.2015, Perozzi, Al-Rfou, and Skiena2014]. However, connecting to big hubs does not imply proximity as strong as connecting to vertexes with mediocre degrees. Taking social network as an example, where a famous celebrity may receive a lot of followers. However, the celebrity may not be familiar or similar to her followers. Besides, two followers of the same celebrity may not know each other at all and can be totally dissimilar. As a comparison, a mediocre user is more likely to known and to be similar to her followers.
From another perspective, a high-degree vertex is more likely to hurt the scale-free property, as placing more disjoint spheres in a closed ball is more difficult (Section 2).
The degree penalty principle can be implemented by various methods. In this paper, we introduce two proposed models based on our principle, implemented by spectral techniques and skip-gram models respectively.
3.1 Model I: DP-Spectral
Our first model, Degree Penalty based Spectral Embedding (DP-Spectral), mainly utilizes graph spectral techniques. Given a network and its adjacency matrix , we define a matrix to indicate the common neighbors of any two vertexes in :
where is the number of common neighbors of and . can be also regarded as a measurement of second-order proximity, and can be easily extended to further consider first-order proximity by
As we aim to model the influence of vertex degrees in our model, we further extend to consider degree penalty as
where is the model parameter used to indicate the strength of degree penalty, and is a diagonal matrix where is the degree of . Thus is proportional to and is inversely proportional to vertex degrees.
Objective. Our goal is to learn vertex representations , where is the th row of and represents the embedding vector of vertex , and minimize
under the constraint
Optimization. In order to minimize Eq. 9, we utilize graph Laplacian, which is an analogy to the Laplacian operator in multivariate calculus. Formally, we define graph Laplacian, , as follows:
The desired minimizing the objective function (9) is obtained by putting
is an eigenvector of. In practice, we use normalized form of and , (i.e., and ).
3.2 Model II: DP-Walker
Our second method, Degree Penalty based Random Walk (DP-Walker), utilizes a skip-gram model, word2vec.
We start with a brief description of the word2vec model and its applications on network embedding tasks. Given a text corpus, Mikolov et al. proposed word2vec to learn the representation of each word by facilitating the prediction of its context. Inspired by it, some network embedding algorithms like DeepWalk [Perozzi, Al-Rfou, and Skiena2014] and node2vec [Grover and Leskovec2016] define a vertex’s “context” by co-occurred vertexes in random walk generated paths.
Specifically, the Random Walk rooted at vertex is a stochastic process , where is a vertex sampled from the neighbors of and m is the path length. Traditional methods regard
as a uniform distribution where each neighbor ofhas the equal chance to be sampled.
However, as our proposed Degree Penalty principle suggests, a neighbor of a vertex with high degree may not be similar to . In other words, shall have less chance to be sampled as one of ’s context. Thus, we define the probability of the random walk jumping from to as
where can be found in Eq. 7 and is the model parameter. According to Eq. 14, we find that will have a greater chance to be sampled when it has more common neighbors with and has a lower degree. After obtaining random walk generated paths, we enable skip-gram to learn effective vertex representations for by predicting each vertex’s context. This results in an optimization problem
where is the path length we consider. Specifically, for each random walk , we feed it to skip-gram algorithm [Mikolov et al.2013] to update the vertex representations. For implementation, we use Hierarchical Softmax [Morin and Bengio2005, Mnih and Hinton2009]
to estimate the concerned probability distribution.
4.1 Experiment Setup
Datasets. We use four datasets, whose statistics are summarized in Table 1 for the evaluations.
Synthetic: We generate a synthetic dataset by the Preferential Attachment model [Vazquez2003], which describes the generation of scale-free networks.
Coauthor [Leskovec, Kleinberg, and Faloutsos2007]: This network covers scientific collaborations between authors. Vertexes are authors. Vertexes are authors. An undirected edge exists between two authors if they have coauthored at least one paper.
Citation [Tang et al.2008]: Similar to Coauthor, this is also an academic network, where vertexes are authors. Edges indicate citations instead of coauthor relationship.
Mobile: This is a mobile network provided by PPDai333The largest unsecured micro-credit loan platform in China.. Vertexes are PPDai registered users. An edge between two users indicates that one of the users has called the other. Overall, it consists of over one million calling logs.
Baseline methods. We compare the following four network embedding algorithms in our experiments:
Laplacian Eigenmap (LE) [Belkin and Niyogi2003]: This represents spectral-based network embedding algorithms. It aims to learn the low-dimensional representation to expand the manifold where the data lie.
DeepWalk [Perozzi, Al-Rfou, and Skiena2014]: This represents skip-gram model based network embedding algorithms. It first generates random walks on the network, and defines the context of a vertex by its co-occurred vertexes in paths. Then, it learns vertex representations by predicting each vertex’s context. Specifically, we perform random walks starting from each vertex, and each random walk will have a length of .
DP-Spectral: This is a spectral technique based implementation of our degree penalty principle.
DP-Walker: This is another implementation of our approach. It is based on a skip-gram model.
Unless otherwise specified, the embedding dimension for our experiments is .
Tasks. We first utilize different algorithms to learn vertex representations for a certain dataset, then apply the learned embeddings in three different tasks:
Network reconstruction: this task aims to validate if an algorithm is able to preserve the scale-free property of networks. We evaluate the performance of different algorithms by the correlation coefficients between the reconstructed degrees and the degrees in the given network.
Vertex classification: on this task, we consider the vertex labels. For instance, in Citation, each vertex has a label to indicate the researcher’s research area. Specifically, given a vertex , we define its feature vector as , and train a linear regression classifier to determine its label.
4.2 Network Reconstruction
Comparison results. In this task, after obtaining vertex representations, we use the -NN algorithm introduced in Section 2 to reconstruct the network. We then evaluate the performance of different methods on reconstructing power-law distributed vertex degrees by considering three different correlation coefficients of the reconstructed degrees and original degrees: Pearson’s [Shao2007], Spearman’s [Spearman1904], and Kendall’s [Kendall1938]. All of these statistics are used to evaluate some relationship between paired samples. Pearson’s is used to detect linear relationships, while Spearman’s and Kendall’s are capable of finding monotonic relationships.
To select , for each algorithm, we roll over all values of ranging from to with step , and choose the value which maximizes the Pearson’s correlation coefficient between the original and recovered degrees. Our selection of Pearson’s
as evaluation metric is because of the scale-free property of degree distribution.
From Table 2, we see that our algorithms outperform the baselines significantly. Especially, Pearson’s of DP-Walker achieves at least improvement, and Spearman’s of DP-Spectral achieves at least improvement. The good fitness of the vertex degree reconstructed by our algorithms suggests that we can better preserve the scale-free property of the network. Besides, we see that the best to maximize Pearson’s for DP-Spectral is more stable (i.e., around ) than other methods. Thus one can tune DP-Spectral’s parameters more easily in practice.
Preserving Scale-free Property. After reconstructing the network, for each embedding algorithm, we validate its effectiveness to preserve the scale-free property, by fitting the reconstructed degree distribution (e.g., Figure 1(b) and (c)) to a theoretical power-law distribution [Alstott, Bullmore, and Plenz2014]. We then calculate the Kolmogorov-Smirnov distance between these two distributions and find that the proposed methods can always obtain better results compared to baselines (0.115 vs 0.225 averagely).
Parameter analysis. We further study the sensitivity of model parameters: embedding dimension and degree penalty weight . We only present the results on Synthetic and Facebook and omit others due to space limitation. Figure 3(a) and 3(b) shows the Pearson correlation coefficients resulted by our algorithm with different embedding dimensions. When the embedding dimension grows, the performance increases significantly. The figures suggest that when embedding a network into Euclidean space, there is a dimension which is sufficient for preserving scale-free property, and further increase of the embedding dimension has limited effect. Correlation does not change drastically for DP-Walker as the dimension increases, as the figure suggests. The figures also show that DP-Walker is able to obtain fairly high Pearson correlation even in a lower dimension and it requires an embedding dimension lower than DP-Spectral does for a satisfactory performance.
We also study how influences the performance. Figure 3(c) and 3(d) shows that DP-Spectral is more sensitive to the choice of . This is largely due to the fact that in DP-Spectral, the influence of is manifested in the objective function (Eq. 9), which imposes a stronger restriction than its counterpart in DP-Walker, which is embodied in the sampling process. Figure 3(c) and 3(d) shows that the optimal choice of varies from graph to graph. It makes sense, since can be viewed as a punishment on the degrees, and the influence of is therefore supposedly related to the topology of the original network.
4.3 Link Prediction
In this task, we consider the following evaluation metrics: Precision, Recall, and F1-score. Table 3 demonstrates the performance of different methods on the link prediction task. For our methods, we use the model parameter as optimized in Table 2. We see that, in most cases, DP-Spectral obtains the best performance, which suggests that with the help of the proposed principle, we can not only preserve the scale-free property of networks, but also improve the effectiveness of the embedding vectors.
4.4 Vertex Classification
lists the accuracy of vertex classification task on Citation. Our task is to determine an author’s research area, which is a multi-classification problem. We define features as vertex representation obtained by the four different embedding algorithms. Generally, from the table, we see that DP-Walker and DP-Spectral beat respectively Deepwalk and Laplacian Eigenmap. In particular, DP-Spectral achieves the best result for 5 out of 7 labels. Besides, we can also observe its stability of the performance. DP-Spectral’s result on all labels is more stable than other methods. In comparison, LE achieves a satisfactory result for two labels, but for others the result can be poor. Specifically, the standard deviation of DP-Spectral is 0.04, while the value is 0.26 for LE and 0.1 for DeepWalk.
5 Related Work
Network embedding aims to learn representations for vertexes in a given network. Some researchers regard network embedding as part of dimensionality reduction techniques. For example, Laplacian Eigenmaps (LE) [Belkin and Niyogi2003] aims to learn the vertex representation to expand the manifold where the data lie. As a variant of LE, Locality Preserving Projections (LPP) [He et al.2005] learns a linear projection from feature space to embedding space. Besides, there are other linear [Jolliffe2002] and non-linear [Tenenbaum, De Silva, and Langford2000] network embedding algorithms for dimensionality reduction. Recent network embedding works take advancements in natural language processing, most notably models known as word2vec [Mikolov et al.2013]
, which learns the distributed representations of words. Building on word2vec, Perozzi et al. define a vertex’s “context” by their co-occurrence in a random walk path[Perozzi, Al-Rfou, and Skiena2014]. More recently, Grover et al. propose a mixture of width-first and breadth-first search based procedure to generate paths of vertexes [Grover and Leskovec2016]. Dong et al. further develop the model to handle heterogeneous networks [Dong, Chawla, and Swami2017]. LINE [Tang et al.2015] decomposes a vertex’s context into first-order (neighbors) and second-order (two-degree neighbors) proximity. Wang et al. preserve community information in their vertex representations [Wang et al.2017]. However, all of above methods focus on preserving microscopic network structure and ignore macroscopic scale-free property of networks.
Scale-free Networks. The scale-free property has been discovered to be ubiquitous over a variety of network systems [Mood1950, Newman2005, Clauset, Shalizi, and Newman2009b], such as the Internet Autonomous System graph [Faloutsos, Faloutsos, and Faloutsos1999], the Internet router graph [Govindan and Tangmunarunkit2000], the degree distributions of subsets of the world wide web [Barabási and Albert1999]. newman2005power provides a comprehensive list of such work [Newman2005]. However, investigating the scale-free property in a low-dimensional vector space and establishing its cooperation with network embedding have not been fully considered.
In this paper, we study the problem of learning the scale-free property preserving network embeddings. We first analyze the feasibility of reconstructing a scale-free network based on learned vertex representations in the Euclidean space by converting our problem to the Sphere Packing problem. We then propose the degree penalty principle as our solution and introduce two implementations by leveraging spectral techniques and a skip-gram model respectively. The proposed principle can also be implemented using other methods, which is left as our future work. We at last conduct extensive experiments on both synthetic data and five real-world datasets to verify the effectiveness of our approach.
The work is supported by the Fundamental Research Funds for the Central Universities, 973 Program (2015CB352302), NSFC (U1611461, 61625107, 61402403), and key program of Zhejiang Province (2015C01027).
- [Adamic and Huberman1999] Adamic, L., and Huberman, B. A. 1999. The nature of markets in the world wide web. Q. J. Econ.
- [Alanis-Lobato, Mier, and Andrade-Navarro2016] Alanis-Lobato, G.; Mier, P.; and Andrade-Navarro, M. A. 2016. Efficient embedding of complex networks to hyperbolic space via their laplacian. In Scientific reports.
- [Alstott, Bullmore, and Plenz2014] Alstott, J.; Bullmore, E.; and Plenz, D. 2014. powerlaw: A Python Package for Analysis of Heavy-Tailed Distributions. PLoS ONE 9:e85777.
- [Barabási and Albert1999] Barabási, A.-L., and Albert, R. 1999. Emergence of scaling in random networks. science 286(5439):509–512.
- [Belkin and Niyogi2003] Belkin, M., and Niyogi, P. 2003. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15(6):1373–1396.
- [Clauset, Shalizi, and Newman2009a] Clauset, A.; Shalizi, C. R.; and Newman, M. E. J. 2009a. Power-law distributions in empirical data. SIAM Review 51(4):661–703.
- [Clauset, Shalizi, and Newman2009b] Clauset, A.; Shalizi, C. R.; and Newman, M. E. 2009b. Power-law distributions in empirical data. SIAM review 51(4):661–703.
- [Cohn and Elkies2003] Cohn, H., and Elkies, N. 2003. New upper bounds on sphere packings i. Annals of Mathematics 689–714.
- [Cohn and Zhao2014] Cohn, H., and Zhao, Y. 2014. Sphere packing bounds via spherical codes. Duke Mathematical Journal 163(10):1965–2002.
- [Dong, Chawla, and Swami2017] Dong, Y.; Chawla, N.; and Swami, A. 2017. metapath2vec: Scalable representation learning for heterogeneous networks. In KDD’17, 135–144.
- [Faloutsos, Faloutsos, and Faloutsos1999] Faloutsos, M.; Faloutsos, P.; and Faloutsos, C. 1999. On power-law relationships of the internet topology. In COMPUT COMMUN REV, volume 29, 251–262.
- [Govindan and Tangmunarunkit2000] Govindan, R., and Tangmunarunkit, H. 2000. Heuristics for internet map discovery. In INFOCOM’00, volume 3, 1371–1380.
- [Grover and Leskovec2016] Grover, A., and Leskovec, J. 2016. node2vec: Scalable feature learning for networks. In KDD’16, 855–864.
- [He et al.2005] He, X.; Yan, S.; Hu, Y.; Niyogi, P.; and Zhang, H. 2005. Face recognition using laplacianfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(3):328–340.
- [Jolliffe2002] Jolliffe, I. 2002. Principal component analysis. Wiley Online Library.
- [Kabatiansky and Levenshtein1978] Kabatiansky, G. A., and Levenshtein, V. I. 1978. On bounds for packings on a sphere and in space. Problemy Peredachi Informatsii 14(1):3–25.
- [Kendall1938] Kendall, M. G. 1938. A new measure of rank correlation. Biometrika 30(1/2):81–93.
- [Leskovec and Mcauley2012] Leskovec, J., and Mcauley, J. 2012. Learning to discover social circles in ego networks. In NIPS’12, 539–547.
- [Leskovec, Kleinberg, and Faloutsos2007] Leskovec, J.; Kleinberg, J. M.; and Faloutsos, C. 2007. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery From Data 1(1):2.
- [Mikolov et al.2013] Mikolov, T.; Chen, K.; Corrado, G.; and Dean, J. 2013. Efficient estimation of word representations in vector space. In NIPS’13, 3111–3119.
- [Mnih and Hinton2009] Mnih, A., and Hinton, G. E. 2009. A scalable hierarchical distributed language model. In Koller, D.; Schuurmans, D.; Bengio, Y.; and Bottou, L., eds., Advances in Neural Information Processing Systems 21. Curran Associates, Inc. 1081–1088.
- [Mood1950] Mood, A. M. 1950. Introduction to the theory of statistics.
Morin, F., and Bengio, Y.
Hierarchical probabilistic neural network language model.In AISTATS’05, 246–252.
- [Newman2005] Newman, M. E. 2005. Power laws, pareto distributions and zipf’s law. Contemporary physics 46(5):323–351.
- [Perozzi, Al-Rfou, and Skiena2014] Perozzi, B.; Al-Rfou, R.; and Skiena, S. 2014. Deepwalk: Online learning of social representations. In KDD’14, 701–710.
- [Shao2007] Shao, J. 2007. Mathematical Statistics. Springer, 2 edition.
[Shaw and Jebara2009]
Shaw, B., and Jebara, T.
Structure preserving embedding.
Proceedings of the 26th Annual International Conference on Machine Learning, ICML ’09, 937–944. New York, NY, USA: ACM.
- [Spearman1904] Spearman, C. 1904. The proof and measurement of association between two things. The American Journal of Psychology 15(1):72–101.
- [Tang et al.2008] Tang, J.; Zhang, J.; Yao, L.; Li, J.; Zhang, L.; and Su, Z. 2008. Arnetminer: extraction and mining of academic social networks. In KDD’08, 990–998.
- [Tang et al.2015] Tang, J.; Qu, M.; Wang, M.; Zhang, M.; Yan, J.; and Mei, Q. 2015. Line: Large-scale information network embedding. In WWW’15, 1067–1077.
- [Tenenbaum, De Silva, and Langford2000] Tenenbaum, J. B.; De Silva, V.; and Langford, J. C. 2000. A global geometric framework for nonlinear dimensionality reduction. science 290(5500):2319–2323.
- [Vance2011] Vance, S. 2011. Improved sphere packing lower bounds from hurwitz lattices. Advances in Mathematics 227(5):2144–2156.
- [Vazquez2003] Vazquez, A. 2003. Growing network with local rules: Preferential attachment, clustering hierarchy, and degree correlations. Physical Review E 67(5):056104.
- [Venkatesh2012] Venkatesh, A. 2012. A note on sphere packings in high dimension. International mathematics research notices 2013(7):1628–1642.
- [Wang et al.2017] Wang, X.; Cui, P.; Wang, J.; Pei, J.; Zhu, W.; and Yang, S. 2017. Community preserving network embedding. In AAAI’17.