1 Introduction
Network analysis has attracted considerable research effort in many areas of artificial intelligence, as networks can encode rich and complex data, such as human relationships and interactions. One major challenge of network analysis is how to represent a network properly so that its structure is preserved. The most straightforward way is to represent the network by its adjacency matrix, which, however, suffers from data sparsity. Other traditional network representations rely on handcrafted feature design (e.g., the clustering coefficient), which is inflexible, non-scalable, and requires heavy human labor.
In recent years, network representation learning, also known as network embedding, has been proposed and has aroused considerable research interest. It aims to automatically project a given network into a low-dimensional latent space and represent each vertex by a vector in that space. For example, a number of recent works apply advances in natural language processing (NLP), most notably models known as word2vec
[Mikolov et al. 2013], to network embedding and propose word2vec-based algorithms, such as DeepWalk [Perozzi, Al-Rfou, and Skiena 2014] and node2vec [Grover and Leskovec 2016]. Besides, researchers also consider network embedding as part of dimensionality reduction techniques. For instance, Laplacian Eigenmap (LE) [Belkin and Niyogi 2003] aims to learn the low-dimensional representation that expands the manifold where the data lie.
Essentially, these methods mainly focus on preserving the microscopic structure of a network, such as the pairwise relationships between vertexes. Nevertheless, the scale-free property, one of the most fundamental macroscopic properties of networks, is largely ignored.
The scale-free property states that the vertex degrees follow a power-law distribution, which holds for many real-world networks. We take an academic network as an example, where an edge indicates that one vertex (researcher) has cited at least one publication of another vertex (researcher). Figure 1(a)
demonstrates the degree distribution of this network. The linearity on the log-log scale suggests a power-law distribution: the probability decreases as the vertex degree grows, with a long tail tending to zero
[Faloutsos, Faloutsos, and Faloutsos 1999, Adamic and Huberman 1999]. In other words, there are only a few vertexes of high degree. The majority of vertexes connected to a high-degree vertex are, however, of low degree, and are not likely to be connected to each other.
Moreover, compared with the microscopic structure, the macroscopic scale-free property imposes a higher-level constraint on the vertex representations: only a few vertexes can be close to many others in the learned latent space. Incorporating the scale-free property in network embedding can reflect and preserve the sparsity of real-world networks and, in turn, provide effective information to make the vertex representations more discriminative.
In this paper, we study the problem of learning scale-free property preserving network embedding. As the representation of a network, the vertex embedding vectors are expected to reconstruct the network well. Most existing algorithms learn network embeddings in the Euclidean space. However, we find that most traditional network embedding algorithms overestimate high vertex degrees. Figure 1(b) gives an example, where the representation is learned by Laplacian Eigenmap. We analyze this phenomenon theoretically, and study the feasibility of recovering power-law distributed vertex degrees in the Euclidean space, by converting our problem to the Sphere Packing problem. Through our analysis we find that, theoretically, moderately increasing the dimension of the embedding vectors can help to preserve the scale-free property. See details in Section 2.
Inspired by our theoretical analysis, we then propose the degree penalty principle for designing scale-free property preserving network embedding algorithms in Section 3: punishing the proximity between vertexes with high degrees. We further introduce two implementations of our approach based on spectral techniques [Belkin and Niyogi 2003] and the skip-gram model [Mikolov et al. 2013]. As Figure 1(c) suggests, our approach can better preserve the scale-free property of networks. In particular, the Kolmogorov-Smirnov (KS) distance between the obtained degree distribution and its theoretical power-law fit is considerably smaller for our approach than for LE (the smaller the better).
To verify the effectiveness of our approach, we conduct experiments on both synthetic data and five real-world datasets in Section 4. The experimental results show that our approach not only preserves the scale-free property of networks, but also outperforms several state-of-the-art embedding algorithms in different network analysis tasks.
We summarize the contributions of this paper as follows:

We analyze the difficulty and feasibility of reconstructing a scale-free network based on learned vertex representations in the Euclidean space, by converting our problem to the Sphere Packing problem.

We propose the degree penalty principle, together with two implementations, to preserve the scale-free property of networks and improve the effectiveness of vertex representations.

We validate our proposed principle by conducting extensive experiments and find that our approach achieves a significant improvement on six datasets and three tasks compared to several state-of-the-art baselines.
2 Theoretical Analysis
In this section, we study theoretically why most network embedding algorithms overestimate high vertex degrees, and analyze whether a scale-free property preserving network embedding exists in the Euclidean space. This section also provides the intuitions behind our approach in Section 3.
2.1 Preliminaries
Notations. We consider an undirected network $G = (V, E)$, where $V$ is the vertex set containing $n$ vertexes and $E$ is the edge set. Each $e_{ij} \in E$ indicates an undirected edge between $v_i$ and $v_j$. We define the adjacency matrix of $G$ as $A$, where $A_{ij} = 1$ if $e_{ij} \in E$ and $A_{ij} = 0$ otherwise. Let $D$ be a diagonal matrix where $D_{ii}$ is the degree of $v_i$.
Network embedding. In this paper, we focus on representation learning for undirected networks. Given an undirected graph $G$, the problem of graph embedding aims to represent each vertex $v_i$ by a vector in a low-dimensional space $\mathbb{R}^d$ ($d \ll n$), i.e., learning a function $f: V \to \mathbb{R}^{d}$, where $U$ is the embedding matrix whose $i$-th row is $u_i = f(v_i)$, such that network structures can be preserved in $U$.
Network reconstruction. As the representation of a network, the learned embedding matrix $U$ is expected to reconstruct the network well. In particular, one can reconstruct the network edges based on the distances between vertexes in the latent space $\mathbb{R}^d$. For example, the probability of an edge existing between $v_i$ and $v_j$ can be defined as a decreasing function of their distance, e.g.,

$$p_{ij} = \frac{1}{1 + \|u_i - u_j\|_2}, \qquad (1)$$

where the Euclidean distance between the embedding vectors $u_i$ and $u_j$ of vertexes $v_i$ and $v_j$, respectively, is considered. In practice, a threshold $\epsilon$ is chosen and an edge is created between $v_i$ and $v_j$ if $\|u_i - u_j\|_2 < \epsilon$. We call the above method $\epsilon$-NN; it is geometrically informative and commonly used in existing work [Shaw and Jebara 2009, Belkin and Niyogi 2003, Alanis-Lobato, Mier, and Andrade-Navarro 2016].
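The ε-NN reconstruction rule can be sketched in a few lines. This is a minimal illustration; the function name and the brute-force pairwise-distance computation are ours, not part of the paper:

```python
import numpy as np

def epsilon_nn_reconstruct(U, eps):
    """Reconstruct a network from embeddings by the eps-NN rule: create an
    edge (i, j) whenever the Euclidean distance between embedding vectors
    u_i and u_j is below the threshold eps."""
    U = np.asarray(U, dtype=float)
    dist = np.linalg.norm(U[:, None, :] - U[None, :, :], axis=-1)
    A = (dist < eps).astype(int)
    np.fill_diagonal(A, 0)  # no self-loops
    return A
```

For instance, with `U = [[0, 0], [0.5, 0], [5, 5]]` and `eps = 1.0`, only the first two vertexes are joined by an edge.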
2.2 Reconstructing Scale-free Networks
Given a network, we call it a scale-free network when its vertex degrees follow a power-law distribution. In other words, there are only a few vertexes of high degree. The majority of vertexes connected to a high-degree vertex are, however, of low degree, and are not likely to be connected to each other. Formally, the probability density function of vertex degree $k$ has the following form:

$$p(k) = C k^{-\gamma}, \qquad (2)$$

where $\gamma$ is the exponent parameter and $C$ is the normalization term. In practice, the above power-law form only applies for vertexes with degrees greater than a certain minimum value $k_{\min}$ [Clauset, Shalizi, and Newman 2009a].
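A small numerical sketch of Eq. 2: the normalization term is approximated here by truncating the sum at a large cutoff; the cutoff and the default exponent are illustrative assumptions, not values from the paper.

```python
import numpy as np

def powerlaw_pmf(k, gamma=2.5, k_min=1, k_max=1_000_000):
    """Discrete power-law pmf p(k) = C * k^(-gamma) for k >= k_min (Eq. 2).
    The normalization constant C is obtained by summing k^(-gamma) up to a
    large cutoff k_max, a numerical stand-in for the full zeta-like sum."""
    support = np.arange(k_min, k_max + 1, dtype=float)
    C = 1.0 / np.sum(support ** -gamma)
    return C * np.asarray(k, dtype=float) ** -gamma
```

The heavier the exponent, the faster the probability of a given degree decays, which matches the long-tail shape in Figure 1(a).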
However, it is difficult to reconstruct a scale-free network in the Euclidean space by $\epsilon$-NN. As we see in Figure 1(b), high vertex degrees are overestimated. We aim to explain this theoretically.
Intuition. While reconstructing the network by $\epsilon$-NN, we select a certain $\epsilon$, and for a vertex $v_i$ with embedding vector $u_i$, we regard all points that fall in the closed ball of radius $\epsilon$ centered at $u_i$, denoted by $B(u_i, \epsilon)$, as ones having edges with $v_i$. When $v_i$ is a high-degree vertex, there will be many points in $B(u_i, \epsilon)$. We expect these points to be far away from each other, keeping more vertexes with low degree and thus in turn keeping the power-law distribution. However, intuitively, as these points are placed in the same closed ball $B(u_i, \epsilon)$, it is more likely that their distances from each other are less than $\epsilon$. As a result, many edges will be created among those points, which violates the scale-free property.
Following the above idea, we introduce the theorem below; without loss of generality, we set $\epsilon$ as 1.
Theorem 1 (Sphere Packing).
There are $N$ points in $B(x, 1)$, the closed ball of radius 1 centered at $x$, whose distances from each other are larger than or equal to 1, if and only if there exist $N$ disjoint spheres of radius $\frac{1}{2}$ in $B(x, \frac{3}{2})$.
Proof.
The center of any sphere of radius $\frac{1}{2}$ in $B(x, \frac{3}{2})$ falls in $B(x, 1)$, and the distance between the centers of any two disjoint spheres of radius $\frac{1}{2}$ is larger than or equal to 1. ∎
Remark.
Theorem 1 converts our problem of reconstructing a scale-free network to the Sphere Packing problem, which seeks the optimal packing of spheres in high-dimensional spaces [Cohn and Elkies 2003, Vance 2011, Venkatesh 2012]. Figure 2 gives an example, where a network centered at a high-degree vertex is embedded into a two-dimensional space. The embedding result corresponds to an equivalent sphere packing solution, which fails to place enough disjoint spheres and leads to a failed network reconstruction (many non-existent edges are created). Formally, we define the packing density as follows:
Definition 1 (Packing Density).
The packing density is the fraction of the space filled by the spheres making up the packing. For convenience, we denote the optimal sphere packing density in $\mathbb{R}^d$ by $\Delta_d$, meaning that no packing of spheres in $\mathbb{R}^d$ achieves a packing density larger than $\Delta_d$.
As Theorem 1 suggests, we aim to find a packing solution with large density so that more points in a closed ball can keep their distances larger than 1. However, in general, finding the optimal packing density $\Delta_d$ remains an open problem. Still, we are able to give upper and lower bounds of $\Delta_d$ for sufficiently large $d$.
Theorem 2 (Upper and lower bounds for $\Delta_d$).
Specifically, we have

$$\Delta_d \ge \frac{\zeta(d)}{2^{d-1}}, \qquad (3)$$

where $\zeta(\cdot)$ is the Riemann zeta function. And the following inequality holds for sufficiently large $d$:

$$\Delta_d \le 2^{-0.599\, d}. \qquad (4)$$

Eq. 4 is one of the best known upper bounds for large $d$ [Cohn and Zhao 2014]. The proof of the above theorem can be found in the work of Kabatiansky and Levenshtein [1978].
Theorem 3.
Suppose $x$ is in $\mathbb{R}^d$, $\epsilon = 1$, and there can be at most $N$ points in $B(x, 1)$ whose distances from each other are larger than 1; then

$$\left\lfloor \frac{\zeta(d)\, 3^d}{2^{d-1}} \right\rfloor \le N \le \left\lfloor \frac{3^d}{2^{0.599\, d}} \right\rfloor, \qquad (5)$$

where $\lfloor \cdot \rfloor$ means taking the integer part. The upper bound holds for sufficiently large $d$.
Proof.
By Theorem 1, we only need to estimate the number of disjoint spheres of radius $\frac{1}{2}$ that can be fitted in $B(x, \frac{3}{2})$. The volume of a $d$-dimensional ball of radius $r$ is given by $V_d(r) = \frac{\pi^{d/2}}{\Gamma(d/2 + 1)}\, r^d$. Plugging in the radii, the volumes of $B(x, \frac{3}{2})$ and a sphere of radius $\frac{1}{2}$ are respectively $(\frac{3}{2})^d V_d(1)$ and $(\frac{1}{2})^d V_d(1)$. Since the optimal packing density is given by $\Delta_d$, we have $N\, (\frac{1}{2})^d V_d(1) \le \Delta_d\, (\frac{3}{2})^d V_d(1)$, i.e., $N \le \lfloor 3^d \Delta_d \rfloor$; combining this with the bounds on $\Delta_d$ in Theorem 2 yields Eq. 5. ∎

Discussion. The lower bound of $N$ in Eq. 5 suggests the feasibility of reconstructing a scale-free network by $\epsilon$-NN in the Euclidean space when $d$, the dimension of the embedding vectors, is sufficiently large: even a moderate $d$ makes the lower bound large enough to keep the scale-free property for most real-world networks.
3 Our Approach
General idea. Inspired by our theoretical analysis, we propose the principle of degree penalty for designing scale-free property preserving embedding algorithms: while preserving first- and second-order proximity, the proximity between vertexes with high degrees shall be punished. We give the general idea behind this principle below.
The scale-free property is characterized by the ubiquitous existence of "big hubs", which attract most edges in the network. Most existing network embedding algorithms, explicitly or implicitly, attempt to keep these big hubs close to their neighbors by preserving first-order and/or second-order proximity [Belkin and Niyogi 2003, Tang et al. 2015, Perozzi, Al-Rfou, and Skiena 2014]. However, connecting to a big hub does not imply proximity as strong as connecting to a vertex with a mediocre degree. Take a social network as an example, where a famous celebrity may have a large number of followers. The celebrity, however, may not be familiar with or similar to her followers. Besides, two followers of the same celebrity may not know each other at all and can be totally dissimilar. In comparison, a mediocre user is more likely to know, and to be similar to, her followers.
From another perspective, a high-degree vertex is more likely to hurt the scale-free property, as placing more disjoint spheres in a closed ball is more difficult (Section 2).
The degree penalty principle can be implemented by various methods. In this paper, we introduce two models based on our principle, implemented by spectral techniques and skip-gram models respectively.
3.1 Model I: DP-Spectral
Our first model, Degree Penalty based Spectral Embedding (DP-Spectral), mainly utilizes graph spectral techniques. Given a network $G$ and its adjacency matrix $A$, we define a matrix $C$ to indicate the common neighbors of any two vertexes in $G$:

$$C = A \cdot A, \qquad (6)$$

where $C_{ij}$ is the number of common neighbors of $v_i$ and $v_j$. $C$ can also be regarded as a measurement of second-order proximity, and can be easily extended to further consider first-order proximity by

$$M = C + A. \qquad (7)$$
As we aim to model the influence of vertex degrees in our model, we further extend $M$ to consider the degree penalty as

$$W_{ij} = \frac{M_{ij}}{(D_{ii} D_{jj})^{\beta}}, \qquad (8)$$

where $\beta$ is the model parameter used to indicate the strength of the degree penalty, and $D$ is a diagonal matrix where $D_{ii}$ is the degree of $v_i$. Thus $W_{ij}$ is proportional to $M_{ij}$ and inversely proportional to the vertex degrees.
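One way to realize the proximity construction above in code, assuming the degree penalty divides each entry by $(d_i d_j)^\beta$; this is a sketch under that assumption, not the authors' released implementation:

```python
import numpy as np

def degree_penalty_weights(A, beta=1.0):
    """Sketch of Eqs. 6-8 under one natural reading of the degree penalty:
    C = A @ A counts common neighbors (second-order proximity), adding A
    mixes in first-order proximity, and each entry M_ij is then divided by
    (d_i * d_j)^beta so that high-degree pairs are penalized."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)                      # vertex degrees
    M = A @ A + A                          # Eqs. 6-7
    penalty = np.outer(d, d) ** beta       # (d_i * d_j)^beta
    return M / np.maximum(penalty, 1e-12)  # Eq. 8, guarding against d_i = 0
```

On a star graph, for example, two leaves sharing the hub as a common neighbor keep a strong weight, while each hub-leaf pair is discounted by the hub's degree.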
Objective. Our goal is to learn vertex representations $U$, where $u_i$, the $i$-th row of $U$, represents the embedding vector of vertex $v_i$, and minimize

$$\sum_{i,j} \|u_i - u_j\|_2^2 \, W_{ij} \qquad (9)$$

under the constraint

$$U^{\top} D' U = I, \qquad (10)$$

where $D'$ is a diagonal matrix with $D'_{ii} = \sum_j W_{ij}$. Eq. 10 is imposed so that the embedding vectors will not collapse onto a subspace of dimension less than $d$ [Belkin and Niyogi 2003].
Optimization. In order to minimize Eq. 9, we utilize the graph Laplacian, which is an analogy to the Laplace operator in multivariate calculus. Formally, we define the graph Laplacian $L$ as follows:

$$L = D' - W. \qquad (11)$$

Observe that

$$\sum_{i,j} \|u_i - u_j\|_2^2 \, W_{ij} = 2\,\mathrm{tr}(U^{\top} L U). \qquad (12)$$

The desired $U$ minimizing the objective function (9) is obtained by collecting the generalized eigenvectors of

$$L u = \lambda D' u \qquad (13)$$

corresponding to the $d$ smallest nonzero eigenvalues.
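A sketch of the resulting solver, following the standard Laplacian-eigenmap recipe; SciPy's generalized symmetric eigensolver is assumed, and the function name is ours rather than the authors' code:

```python
import numpy as np
from scipy.linalg import eigh

def dp_spectral_embed(W, dim):
    """Sketch of the DP-Spectral optimization: build the graph Laplacian
    L = D' - W with D'_ii = sum_j W_ij, then solve the generalized
    eigenproblem L u = lambda D' u and keep the eigenvectors with the
    smallest nonzero eigenvalues, as in Laplacian-eigenmap training."""
    W = np.asarray(W, dtype=float)
    Dp = np.diag(W.sum(axis=1))
    L = Dp - W
    eigvals, eigvecs = eigh(L, Dp)   # ascending generalized eigenvalues
    return eigvecs[:, 1:dim + 1]     # skip the trivial constant eigenvector
```

This direct dense solver is only illustrative; for large networks one would use a sparse eigensolver instead.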
3.2 Model II: DP-Walker
Our second method, Degree Penalty based Random Walk (DP-Walker), utilizes the skip-gram model word2vec.
We start with a brief description of the word2vec model and its applications to network embedding tasks. Given a text corpus, Mikolov et al. propose word2vec to learn the representation of each word by facilitating the prediction of its context. Inspired by this, network embedding algorithms such as DeepWalk [Perozzi, Al-Rfou, and Skiena 2014] and node2vec [Grover and Leskovec 2016] define a vertex's "context" by co-occurring vertexes in random-walk-generated paths.
Specifically, a random walk rooted at vertex $v_i$ is a stochastic process $(v^{(1)} = v_i, v^{(2)}, \ldots, v^{(m)})$, where $v^{(t+1)}$ is a vertex sampled from the neighbors of $v^{(t)}$ and $m$ is the path length. Traditional methods regard $p(v^{(t+1)} \mid v^{(t)})$ as a uniform distribution, where each neighbor of $v^{(t)}$ has an equal chance to be sampled.
However, as our proposed degree penalty principle suggests, a neighbor $v_j$ of a vertex $v_i$ may not be similar to $v_i$ when $v_j$ has a high degree. In other words, $v_j$ shall have less chance to be sampled as part of $v_i$'s context. Thus, we define the probability of the random walk jumping from $v_i$ to $v_j$ as

$$p(v_j \mid v_i) \propto \frac{M_{ij}}{D_{jj}^{\beta}}, \qquad (14)$$

where $M$ can be found in Eq. 7 and $\beta$ is the model parameter. According to Eq. 14, $v_j$ will have a greater chance to be sampled when it has more common neighbors with $v_i$ and has a lower degree. After obtaining the random-walk-generated paths, we employ skip-gram to learn effective vertex representations $U$ by predicting each vertex's context. This results in the optimization problem

$$\min_{U}\; -\log p\big(\{v^{(t-w)}, \ldots, v^{(t+w)}\} \setminus v^{(t)} \,\big|\, u^{(t)}\big), \qquad (15)$$

where $w$ is the context window size and $m$ is the path length we consider. Specifically, for each random walk, we feed it to the skip-gram algorithm [Mikolov et al. 2013] to update the vertex representations. For the implementation, we use Hierarchical Softmax [Morin and Bengio 2005, Mnih and Hinton 2009] to estimate the concerned probability distribution.
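A sketch of the degree-penalized walk generator, under one plausible reading of Eq. 14 in which the proximity to a neighbor is divided by that neighbor's degree raised to $\beta$; function and variable names are ours:

```python
import numpy as np

def dp_random_walk(A, start, length, beta=1.0, seed=0):
    """Sketch of a DP-Walker path generator: from the current vertex v_i the
    walk jumps to a neighbor v_j with probability proportional to
    M_ij / d_j^beta, so neighbors sharing many common neighbors with v_i are
    favored while high-degree neighbors are penalized."""
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    M = A @ A + A                         # proximity matrix from Eq. 7
    walk = [start]
    for _ in range(length - 1):
        i = walk[-1]
        scores = A[i] * M[i] / d ** beta  # only neighbors of v_i are eligible
        walk.append(int(rng.choice(len(d), p=scores / scores.sum())))
    return walk
```

The generated paths would then be fed to a skip-gram trainer exactly as DeepWalk feeds its uniform walks.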
4 Experiments
4.1 Experiment Setup
Datasets. We use six datasets, whose statistics are summarized in Table 1, for the evaluations.

Synthetic: We generate a synthetic dataset by the Preferential Attachment model [Vazquez2003], which describes the generation of scalefree networks.

Facebook [Leskovec and Mcauley 2012]: This dataset is a subnetwork of Facebook (http://facebook.com), where vertexes indicate users of Facebook, and edges denote friendships between users.

Twitter [Leskovec and Mcauley 2012]: This dataset is a subnetwork of Twitter (http://twitter.com), where vertexes indicate users of Twitter, and edges denote following relationships.

Coauthor [Leskovec, Kleinberg, and Faloutsos 2007]: This network covers scientific collaborations between authors. Vertexes are authors, and an undirected edge exists between two authors if they have coauthored at least one paper.

Citation [Tang et al.2008]: Similar to Coauthor, this is also an academic network, where vertexes are authors. Edges indicate citations instead of coauthor relationship.

Mobile: This is a mobile network provided by PPDai (the largest unsecured micro-credit loan platform in China). Vertexes are PPDai registered users. An edge between two users indicates that one of them has called the other. Overall, it consists of over one million calling logs.
Dataset  Synthetic  Facebook  Twitter  Coauthor  Citation  Mobile
|V|  10000  4039  81306  5242  48521  198959
|E|  399580  88234  1768149  28980  357235  1151003
Baseline methods. We compare the following four network embedding algorithms in our experiments:

Laplacian Eigenmap (LE) [Belkin and Niyogi 2003]: This represents spectral-based network embedding algorithms. It aims to learn the low-dimensional representation that expands the manifold where the data lie.

DeepWalk [Perozzi, Al-Rfou, and Skiena 2014]: This represents skip-gram based network embedding algorithms. It first generates random walks on the network and defines the context of a vertex by its co-occurring vertexes in the resulting paths. Then, it learns vertex representations by predicting each vertex's context. Specifically, we perform a fixed number of random walks of fixed length starting from each vertex.

DP-Spectral: This is a spectral technique based implementation of our degree penalty principle.

DP-Walker: This is another implementation of our approach, based on a skip-gram model.
Unless otherwise specified, we use the same embedding dimension $d$ for all methods in our experiments.
Tasks. We first utilize different algorithms to learn vertex representations for a certain dataset, then apply the learned embeddings in three different tasks:

Network reconstruction: this task aims to validate whether an algorithm is able to preserve the scale-free property of networks. We evaluate the performance of different algorithms by the correlation coefficients between the reconstructed degrees and the degrees in the given network.

Link prediction: given two vertexes $v_i$ and $v_j$, we feed their embedding vectors $u_i$ and $u_j$ to a linear regression classifier and determine whether there exists an edge between them. Specifically, we use a feature vector derived from $u_i$ and $u_j$, and randomly select a balanced set of vertex pairs for training and evaluation.
Vertex classification: in this task, we consider the vertex labels. For instance, in Citation, each vertex has a label indicating the researcher's research area. Specifically, given a vertex $v_i$, we use its embedding vector $u_i$ as the feature vector and train a linear regression classifier to determine its label.
4.2 Network Reconstruction
Dataset  Method (best ε)  Pearson  Spearman  Kendall
Synthetic  LE (0.55)  0.14  0.054  0.039
  DeepWalk (0.91)  0.47  0.22  0.18
  DP-Spectral (0.52)  0.92  0.79  0.63
  DP-Walker (0.95)  0.94  0.63  0.52
Facebook  LE (0.52)  0.48  0.18  0.12
  DeepWalk (0.81)  0.73  0.65  0.49
  DP-Spectral (0.52)  0.87  0.67  0.51
  DP-Walker (0.84)  0.75  0.73  0.57
Twitter  LE (0.81)  0.17  0.19  0.17
  DeepWalk (0.51)  0.08  0.21  0.26
  DP-Spectral (0.93)  0.50  0.34  0.27
  DP-Walker (0.087)  0.40  0.33  0.27
Coauthor  LE (0.50)  0.32  0.04  0.03
  DeepWalk (0.92)  0.66  0.31  0.25
  DP-Spectral (0.51)  0.64  0.69  0.55
  DP-Walker (0.93)  0.75  0.44  0.35
Citation  LE (0.99)  0.11  0.27  0.20
  DeepWalk (0.97)  0.51  0.28  0.20
  DP-Spectral (0.50)  0.45  0.72  0.54
  DP-Walker (0.98)  0.62  0.65  0.50
Mobile  LE (0.51)  0.10  0.05  0.04
  DeepWalk (0.71)  0.77  0.22  0.20
  DP-Spectral (0.50)  0.40  0.68  0.60
  DP-Walker (0.78)  0.93  0.22  0.20
Comparison results. In this task, after obtaining the vertex representations, we use the $\epsilon$-NN algorithm introduced in Section 2 to reconstruct the network. We then evaluate the performance of different methods on reconstructing power-law distributed vertex degrees by three correlation coefficients between the reconstructed degrees and the original degrees: Pearson's $r$ [Shao 2007], Spearman's $\rho$ [Spearman 1904], and Kendall's $\tau$ [Kendall 1938]. All of these statistics evaluate the relationship between paired samples. Pearson's $r$ detects linear relationships, while Spearman's $\rho$ and Kendall's $\tau$ capture monotonic relationships.
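All three coefficients are available in scipy.stats; below is a toy illustration on made-up degree sequences (the numbers are not from the paper):

```python
from scipy.stats import pearsonr, spearmanr, kendalltau

# Compare an original degree sequence with a hypothetical reconstruction.
original      = [50, 20, 10, 5, 3, 2, 1, 1]
reconstructed = [45, 25, 12, 4, 4, 2, 1, 1]

r,   _ = pearsonr(original, reconstructed)    # linear relationship
rho, _ = spearmanr(original, reconstructed)   # monotonic (rank) relationship
tau, _ = kendalltau(original, reconstructed)  # rank-pair concordance
```

Because degree distributions are heavy-tailed, the three coefficients can diverge noticeably, which is why the table reports all of them.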
To select $\epsilon$, for each algorithm, we sweep over a range of values of $\epsilon$ with a small step, and choose the value that maximizes Pearson's correlation coefficient between the original and recovered degrees. We select Pearson's $r$ as the criterion because of the scale-free property of the degree distribution.
From Table 2, we see that our algorithms outperform the baselines significantly; in particular, Pearson's $r$ of DP-Walker and Spearman's $\rho$ of DP-Spectral achieve considerable improvements over the baselines. The good fit of the vertex degrees reconstructed by our algorithms suggests that we can better preserve the scale-free property of the network. Besides, we see that the best $\epsilon$ maximizing Pearson's $r$ for DP-Spectral is more stable (i.e., around 0.5) than for the other methods, so one can tune DP-Spectral's parameters more easily in practice.
Preserving Scale-free Property. After reconstructing the network, for each embedding algorithm, we validate its effectiveness in preserving the scale-free property by fitting the reconstructed degree distribution (e.g., Figure 1(b) and (c)) to a theoretical power-law distribution [Alstott, Bullmore, and Plenz 2014]. We then calculate the Kolmogorov-Smirnov distance between these two distributions and find that the proposed methods always obtain better results than the baselines (0.115 vs. 0.225 on average).
Parameter analysis. We further study the sensitivity of the model parameters: the embedding dimension $d$ and the degree penalty weight $\beta$. We only present the results on Synthetic and Facebook and omit the others due to space limitations. Figures 3(a) and 3(b) show the Pearson correlation coefficients produced by our algorithms with different embedding dimensions. When the embedding dimension grows, the performance increases significantly. The figures suggest that when embedding a network into the Euclidean space, there is a dimension that is sufficient for preserving the scale-free property, and further increasing the embedding dimension has limited effect; in particular, the correlation does not change drastically for DP-Walker as the dimension increases. The figures also show that DP-Walker obtains a fairly high Pearson correlation even in a lower dimension, and that it requires a lower embedding dimension than DP-Spectral does for a satisfactory performance.
We also study how $\beta$ influences the performance. Figures 3(c) and 3(d) show that DP-Spectral is more sensitive to the choice of $\beta$. This is largely because in DP-Spectral the influence of $\beta$ is manifested in the objective function (Eq. 9), which imposes a stronger restriction than its counterpart in DP-Walker, where $\beta$ is embodied in the sampling process. Figures 3(c) and 3(d) also show that the optimal choice of $\beta$ varies from graph to graph. This makes sense, since $\beta$ can be viewed as a punishment on the degrees, and its influence is therefore supposedly related to the topology of the original network.
Dataset  Method  Precision  Recall  F1
Synthetic  LE  0.52  0.53  0.53
  DeepWalk  0.51  0.51  0.51
  DP-Spectral  0.64  0.68  0.66
  DP-Walker  0.61  0.63  0.62
Facebook  LE  0.75  0.92  0.83
  DeepWalk  0.84  0.97  0.90
  DP-Spectral  0.76  0.98  0.85
  DP-Walker  0.82  0.95  0.89
Twitter  LE  0.58  0.35  0.43
  DeepWalk  0.65  0.77  0.70
  DP-Spectral  0.59  0.98  0.73
  DP-Walker  0.54  0.58  0.56
Coauthor  LE  0.61  0.83  0.70
  DeepWalk  0.55  0.58  0.56
  DP-Spectral  0.62  0.89  0.73
  DP-Walker  0.56  0.66  0.61
Citation  LE  0.54  0.56  0.55
  DeepWalk  0.54  0.56  0.55
  DP-Spectral  0.52  0.99  0.68
  DP-Walker  0.55  0.57  0.56
Mobile  LE  0.75  0.36  0.48
  DeepWalk  0.55  0.60  0.57
  DP-Spectral  0.63  0.89  0.74
  DP-Walker  0.54  0.58  0.56
4.3 Link Prediction
In this task, we consider the following evaluation metrics: Precision, Recall, and F1-score. Table 3 reports the performance of different methods on the link prediction task. For our methods, we use the model parameters optimized in Table 2. We see that, in most cases, DP-Spectral obtains the best performance, which suggests that, with the help of the proposed principle, we can not only preserve the scale-free property of networks, but also improve the effectiveness of the embedding vectors.
4.4 Vertex Classification
Table 4 lists the accuracy of the vertex classification task on Citation. Our task is to determine an author's research area, which is a multi-class classification problem. We use the vertex representations obtained by the four embedding algorithms as features. Generally, from the table, we see that DP-Walker and DP-Spectral beat DeepWalk and Laplacian Eigenmap respectively. In particular, DP-Spectral achieves the best result for 5 out of 7 labels, and its performance across labels is more stable than the other methods. In comparison, LE achieves a satisfactory result for two labels, but for the others its results can be poor. Specifically, the standard deviation of DP-Spectral's accuracy over labels is 0.04, while the value is 0.26 for LE and 0.1 for DeepWalk.
5 Related Work
Method  Arch  CN  CS  DM  THM  GRA  UNK
LE  0.36  0.75  0.14  0.37  0.46  0.13  0.86
DeepWalk  0.54  0.54  0.52  0.56  0.56  0.56  0.85
DP-Walker  0.56  0.57  0.54  0.58  0.58  0.55  0.85
DP-Spectral  0.71  0.74  0.78  0.76  0.74  0.75  0.85
Network embedding.
Network embedding aims to learn representations for the vertexes in a given network. Some researchers regard network embedding as part of dimensionality reduction techniques. For example, Laplacian Eigenmaps (LE) [Belkin and Niyogi 2003] aims to learn the vertex representation that expands the manifold where the data lie. As a variant of LE, Locality Preserving Projections (LPP) [He et al. 2005] learns a linear projection from the feature space to the embedding space. Besides, there are other linear [Jolliffe 2002] and nonlinear [Tenenbaum, De Silva, and Langford 2000] embedding algorithms for dimensionality reduction. Recent network embedding works take advantage of advances in natural language processing, most notably models known as word2vec [Mikolov et al. 2013], which learns distributed representations of words. Building on word2vec, Perozzi et al. define a vertex's "context" by co-occurrence in a random walk path [Perozzi, Al-Rfou, and Skiena 2014]. More recently, Grover et al. propose a mixture of depth-first and breadth-first search based procedures to generate paths of vertexes [Grover and Leskovec 2016]. Dong et al. further develop the model to handle heterogeneous networks [Dong, Chawla, and Swami 2017]. LINE [Tang et al. 2015] decomposes a vertex's context into first-order (neighbors) and second-order (two-hop neighbors) proximity. Wang et al. preserve community information in their vertex representations [Wang et al. 2017]. However, all of the above methods focus on preserving the microscopic network structure and ignore the macroscopic scale-free property of networks.
Scale-free Networks. The scale-free property has been discovered to be ubiquitous across a variety of network systems [Mood 1950, Newman 2005, Clauset, Shalizi, and Newman 2009b], such as the Internet Autonomous System graph [Faloutsos, Faloutsos, and Faloutsos 1999], the Internet router graph [Govindan and Tangmunarunkit 2000], and the degree distributions of subsets of the World Wide Web [Barabási and Albert 1999]. Newman [2005] provides a comprehensive list of such work. However, investigating the scale-free property in a low-dimensional vector space and establishing its cooperation with network embedding have not been fully considered.
6 Conclusion
In this paper, we study the problem of learning scale-free property preserving network embeddings. We first analyze the feasibility of reconstructing a scale-free network based on learned vertex representations in the Euclidean space, by converting our problem to the Sphere Packing problem. We then propose the degree penalty principle as our solution and introduce two implementations by leveraging spectral techniques and a skip-gram model respectively. The proposed principle can also be implemented by other methods, which we leave as future work. Finally, we conduct extensive experiments on both synthetic data and five real-world datasets to verify the effectiveness of our approach.
Acknowledgements.
The work is supported by the Fundamental Research Funds for the Central Universities, 973 Program (2015CB352302), NSFC (U1611461, 61625107, 61402403), and key program of Zhejiang Province (2015C01027).
References
 [Adamic and Huberman1999] Adamic, L., and Huberman, B. A. 1999. The nature of markets in the world wide web. Q. J. Econ.
 [Alanis-Lobato, Mier, and Andrade-Navarro2016] Alanis-Lobato, G.; Mier, P.; and Andrade-Navarro, M. A. 2016. Efficient embedding of complex networks to hyperbolic space via their Laplacian. Scientific Reports.
 [Alstott, Bullmore, and Plenz2014] Alstott, J.; Bullmore, E.; and Plenz, D. 2014. powerlaw: A Python Package for Analysis of HeavyTailed Distributions. PLoS ONE 9:e85777.
 [Barabási and Albert1999] Barabási, A.-L., and Albert, R. 1999. Emergence of scaling in random networks. Science 286(5439):509–512.
 [Belkin and Niyogi2003] Belkin, M., and Niyogi, P. 2003. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15(6):1373–1396.
 [Clauset, Shalizi, and Newman2009a] Clauset, A.; Shalizi, C. R.; and Newman, M. E. J. 2009a. Power-law distributions in empirical data. SIAM Review 51(4):661–703.
 [Clauset, Shalizi, and Newman2009b] Clauset, A.; Shalizi, C. R.; and Newman, M. E. 2009b. Power-law distributions in empirical data. SIAM Review 51(4):661–703.
 [Cohn and Elkies2003] Cohn, H., and Elkies, N. 2003. New upper bounds on sphere packings i. Annals of Mathematics 689–714.
 [Cohn and Zhao2014] Cohn, H., and Zhao, Y. 2014. Sphere packing bounds via spherical codes. Duke Mathematical Journal 163(10):1965–2002.
 [Dong, Chawla, and Swami2017] Dong, Y.; Chawla, N.; and Swami, A. 2017. metapath2vec: Scalable representation learning for heterogeneous networks. In KDD’17, 135–144.
 [Faloutsos, Faloutsos, and Faloutsos1999] Faloutsos, M.; Faloutsos, P.; and Faloutsos, C. 1999. On power-law relationships of the internet topology. Comput. Commun. Rev. 29:251–262.
 [Govindan and Tangmunarunkit2000] Govindan, R., and Tangmunarunkit, H. 2000. Heuristics for internet map discovery. In INFOCOM’00, volume 3, 1371–1380.
 [Grover and Leskovec2016] Grover, A., and Leskovec, J. 2016. node2vec: Scalable feature learning for networks. In KDD’16, 855–864.
 [He et al.2005] He, X.; Yan, S.; Hu, Y.; Niyogi, P.; and Zhang, H. 2005. Face recognition using laplacianfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(3):328–340.
 [Jolliffe2002] Jolliffe, I. 2002. Principal component analysis. Wiley Online Library.
 [Kabatiansky and Levenshtein1978] Kabatiansky, G. A., and Levenshtein, V. I. 1978. On bounds for packings on a sphere and in space. Problemy Peredachi Informatsii 14(1):3–25.
 [Kendall1938] Kendall, M. G. 1938. A new measure of rank correlation. Biometrika 30(1/2):81–93.
 [Leskovec and Mcauley2012] Leskovec, J., and Mcauley, J. 2012. Learning to discover social circles in ego networks. In NIPS’12, 539–547.
 [Leskovec, Kleinberg, and Faloutsos2007] Leskovec, J.; Kleinberg, J. M.; and Faloutsos, C. 2007. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery From Data 1(1):2.
 [Mikolov et al.2013] Mikolov, T.; Chen, K.; Corrado, G.; and Dean, J. 2013. Efficient estimation of word representations in vector space. In NIPS’13, 3111–3119.
 [Mnih and Hinton2009] Mnih, A., and Hinton, G. E. 2009. A scalable hierarchical distributed language model. In Koller, D.; Schuurmans, D.; Bengio, Y.; and Bottou, L., eds., Advances in Neural Information Processing Systems 21. Curran Associates, Inc. 1081–1088.
 [Mood1950] Mood, A. M. 1950. Introduction to the Theory of Statistics.
 [Morin and Bengio2005] Morin, F., and Bengio, Y. 2005. Hierarchical probabilistic neural network language model. In AISTATS'05, 246–252.
 [Newman2005] Newman, M. E. 2005. Power laws, Pareto distributions and Zipf's law. Contemporary Physics 46(5):323–351.
 [Perozzi, Al-Rfou, and Skiena2014] Perozzi, B.; Al-Rfou, R.; and Skiena, S. 2014. DeepWalk: Online learning of social representations. In KDD'14, 701–710.
 [Shao2007] Shao, J. 2007. Mathematical Statistics. Springer, 2 edition.
 [Shaw and Jebara2009] Shaw, B., and Jebara, T. 2009. Structure preserving embedding. In ICML'09, 937–944.
 [Spearman1904] Spearman, C. 1904. The proof and measurement of association between two things. The American Journal of Psychology 15(1):72–101.
 [Tang et al.2008] Tang, J.; Zhang, J.; Yao, L.; Li, J.; Zhang, L.; and Su, Z. 2008. Arnetminer: extraction and mining of academic social networks. In KDD’08, 990–998.
 [Tang et al.2015] Tang, J.; Qu, M.; Wang, M.; Zhang, M.; Yan, J.; and Mei, Q. 2015. Line: Largescale information network embedding. In WWW’15, 1067–1077.
 [Tenenbaum, De Silva, and Langford2000] Tenenbaum, J. B.; De Silva, V.; and Langford, J. C. 2000. A global geometric framework for nonlinear dimensionality reduction. science 290(5500):2319–2323.
 [Vance2011] Vance, S. 2011. Improved sphere packing lower bounds from hurwitz lattices. Advances in Mathematics 227(5):2144–2156.
 [Vazquez2003] Vazquez, A. 2003. Growing network with local rules: Preferential attachment, clustering hierarchy, and degree correlations. Physical Review E 67(5):056104.
 [Venkatesh2012] Venkatesh, A. 2012. A note on sphere packings in high dimension. International mathematics research notices 2013(7):1628–1642.
 [Wang et al.2017] Wang, X.; Cui, P.; Wang, J.; Pei, J.; Zhu, W.; and Yang, S. 2017. Community preserving network embedding. In AAAI’17.