Code for NeurIPS paper "vGraph: A Generative Model for Joint Community Detection and Node Representation Learning"
This paper focuses on two fundamental tasks of graph analysis: community detection and node representation learning, which capture the global and local structures of graphs, respectively. In the current literature, these two tasks are usually independently studied while they are actually highly correlated. We propose a probabilistic generative model called vGraph to learn community membership and node representation collaboratively. Specifically, we assume that each node can be represented as a mixture of communities, and each community is defined as a multinomial distribution over nodes. Both the mixing coefficients and the community distribution are parameterized by the low-dimensional representations of the nodes and communities. We designed an effective variational inference algorithm which regularizes the community membership of neighboring nodes to be similar in the latent space. Experimental results on multiple real-world graphs show that vGraph is very effective in both community detection and node representation learning, outperforming many competitive baselines in both tasks. We show that the framework of vGraph is quite flexible and can be easily extended to detect hierarchical communities.
Implementation of the paper "vGraph: A Generative Model For Joint Community Detection and Node Representational Learning" under NeurIPS Reproducibility challenge 2019
Graphs, or networks, are a general and flexible data structure to encode complex relationships among objects. Examples of real-world graphs include social networks, airline networks, protein-protein interaction networks, and traffic networks. Recently, there has been increasing interest from both academic and industrial communities in analyzing graphical data. Examples span a variety of domains and applications such as node classification cao2015grarep ; tang2015line and link prediction gao2011temporal ; wang2011community in social networks, role prediction in protein-protein interaction networks krogan2006global , and prediction of information diffusion in social and citation networks newman2004finding .
One fundamental task of graph analysis is community detection, which aims to cluster nodes into multiple groups called communities. Each community is a set of nodes that are more closely connected to each other than to nodes in different communities. A community level description is able to capture important information about a graph’s global structure. Such a description is useful in many real-world applications, such as identifying users with similar interests in social networks newman2004finding or proteins with similar functionality in biochemical networks krogan2006global . Community detection has been extensively studied in the literature, and a number of methods have been proposed, including algorithmic approaches ahn2010link ; derenyi2005clique and probabilistic models gopalan2013efficient ; mcauley2014discovering ; yang2013overlapping ; yang2013community . A classical approach to detecting communities is spectral clustering white2005spectral , which assumes that neighboring nodes tend to belong to the same communities and detects communities by finding the eigenvectors of the graph Laplacian.
Another important task of graph analysis is node representation learning, where nodes are described using low-dimensional features. Node representations effectively capture local graph structure and are often used as features for many prediction tasks. Modern methods for learning node embeddings grover2016node2vec ; perozzi2014deepwalk ; tang2015line have proved effective on a variety of tasks such as node classification cao2015grarep ; tang2015line , link prediction gao2011temporal ; wang2011community and graph visualization tang2016node ; wang2016structural .
Clustering, which captures the global structure of graphs, and learning node embeddings, which captures local structure, are typically studied separately. Clustering is often used for exploratory analysis, while generating node embeddings is often done for predictive analysis. However, these two tasks are highly correlated and it may be beneficial to perform both tasks simultaneously. The intuition is that (1) node representations can be used as good features for community detection (e.g., through K-means) cavallari2017learning ; rozemberczki2018gemsec ; tsitsulin2018verse , and (2) the node community membership can provide good contexts for learning node representations wang2017community . However, how to leverage the relatedness of node clustering and node embedding in a unified framework for joint community detection and node representation learning is under-explored.
In this paper, we propose a novel probabilistic generative model called vGraph for joint community detection and node representation learning. vGraph assumes that each node $w$ can be represented as a mixture of multiple communities and is described by a multinomial distribution over communities $z$, i.e., $p(z|w)$. Meanwhile, each community $z$ is modeled as a distribution over the nodes $c$, i.e., $p(c|z)$. vGraph models the process of generating the neighbors for each node. Given a node $w$, we first draw a community assignment $z$ from $p(z|w)$. This indicates which community the node is going to interact with. Given the community assignment $z$, we generate an edge $(w, c)$ by drawing another node $c$ according to the community distribution $p(c|z)$. Both the distributions $p(z|w)$ and $p(c|z)$ are parameterized by the low-dimensional representations of the nodes and communities. As a result, this approach allows the node representations and the communities to interact in a mutually beneficial way. We also design an effective inference algorithm that is trained with backpropagation. We use variational inference to maximize the lower bound of the data likelihood. The Gumbel-Softmax jang2016categorical trick is leveraged since the community membership variables are discrete. Inspired by existing spectral clustering methods dong2012clustering , we add a smoothness regularization term to the objective function of the variational inference routine to ensure that the community memberships of neighboring nodes are similar. The whole framework of vGraph is very flexible and general. We also show that it can be easily extended to detect hierarchical communities.
In the experiment section, we show results on three tasks: overlapping community detection, non-overlapping community detection, and node classification, all using various real-world datasets. Our results show that vGraph is very competitive with existing state-of-the-art approaches for these tasks. We also present results on hierarchical community detection.
Community Detection. Many community detection methods are based on matrix factorization techniques. Typically, these methods try to recover the node-community affiliation matrix by performing a low-rank decomposition of the graph adjacency matrix or other related matrices kuang2012symmetric ; li2018community ; wang2011community ; yang2013overlapping . These methods are not scalable due to the complexity of matrix factorization, and their performance is restricted by the capacity of the bi-linear models. Many other studies develop generative models for community detection. Their basic idea is to characterize the generation process of graphs and cast community detection as an inference problem yang2013community ; zhang2015incorporating ; zhou2015infinite . However, the computational complexity of these methods is also high due to complicated inference. Compared with these approaches, vGraph is more scalable and can be efficiently optimized with backpropagation and Gumbel-Softmax jang2016categorical ; maddison2016concrete . Additionally, vGraph is able to learn and leverage the node representations for community detection.
Node Representation Learning. The goal of node representation learning is to learn distributed representations of nodes in graphs so that nodes with similar local connectivity tend to have similar representations. Some representative methods include DeepWalk perozzi2014deepwalk , LINE tang2015line , node2vec grover2016node2vec and GraRep cao2015grarep . Typically, these methods explore the local connectivity of each node by conducting random walks with either breadth-first search perozzi2014deepwalk or depth-first search tang2015line . Despite their effectiveness in a variety of applications, these methods mainly focus on preserving the local structure of graphs, therefore ignoring global community information. In vGraph, we address this limitation by treating the community label as a latent variable. This way, the community label can provide additional contextual information which enables the learned node representations to capture the global community information.
Framework for node representation learning and community detection. There exists previous work cavallari2017learning ; jia2019communitygan ; tsitsulin2018verse ; tu2018unified ; wang2017community that attempts to solve community detection and node representation learning jointly. However, their optimization process alternates between community assignment and node representation learning instead of simultaneously solving both tasks cavallari2017learning ; tu2018unified . Compared with these methods, vGraph is scalable and the optimization is done end-to-end.
Mixture Models. Methodologically, our method is related to mixture models, particularly topic models (e.g., PLSA hofmann1999probabilistic and LDA blei2003latent ). These methods simulate the generation of words in documents, in which topics are treated as latent variables, whereas we consider generating neighbors for each node in a graph, and the community acts as a latent variable. Compared with these methods, vGraph parameterizes the distributions with node and community embeddings, and all the parameters are trained with backpropagation.
Graphs are ubiquitous in the real world. Two fundamental tasks on graphs are community detection and learning node embeddings, which focus on global and local graph structures respectively and hence are naturally complementary. In this paper, we study jointly solving these two tasks. Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ represent a graph, where $\mathcal{V}$ is a set of vertices and $\mathcal{E}$ is the set of edges. Traditional graph embedding aims to learn a node embedding $\phi_i \in \mathbb{R}^d$ for each $v_i \in \mathcal{V}$, where $d$ is predetermined. Community detection aims to extract the community membership $\mathcal{F}$ for each node. Suppose there are $K$ communities on the graph $\mathcal{G}$; we can then denote the community assignment of node $v_i$ as $\mathcal{F}(v_i) \subseteq \{1, \ldots, K\}$. We aim to jointly learn node embeddings $\phi$ and community affiliation of vertices $\mathcal{F}$.
In this section, we introduce our generative approach vGraph, which aims at collaboratively learning node representations and detecting node communities. Our approach assumes that each node can belong to multiple communities representing different social contexts epasto2019single . Each node should generate different neighbors under different social contexts. vGraph parameterizes the node-community distributions by introducing node and community embeddings. In this way, the node representations can benefit from the detection of node communities. Similarly, the detected community assignment can in turn improve the node representations. Inspired by existing spectral clustering methods dong2012clustering , we add a smoothness regularization term that encourages linked nodes to be in the same communities.
vGraph models the generation of node neighbors. It assumes that each node can belong to multiple communities. For each node, different neighbors will be generated depending on the community context. Based on the above intuition, we introduce a prior distribution $p(z|w)$ for each node $w$ and a node distribution $p(c|z)$ for each community $z$. The generative process of each edge $(w, c)$ can be naturally characterized as follows: for node $w$, we first draw a community assignment $z \sim p(z|w)$, representing the social context of $w$ during the generation process. Then, the linked neighbor $c$ is generated based on the assignment $z$ through $c \sim p(c|z)$. Formally, this generation process can be formulated in a probabilistic way:

$$p(c|w) = \sum_{z} p(c|z)\, p(z|w). \tag{1}$$
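vGraph parameterizes the distributions $p(z|w)$ and $p(c|z)$ by introducing a set of node embeddings and community embeddings. Note that different sets of node embeddings are used to parametrize the two distributions. Specifically, let $\phi_i$ denote the embedding of node $i$ used in the distribution $p(z|w)$, $\varphi_i$ denote the embedding of node $i$ used in $p(c|z)$, and $\psi_j$ denote the embedding of the $j$-th community. The prior distribution and the node distribution conditioned on a community are parameterized by two softmax models:

$$p_{\phi,\psi}(z=j|w) = \frac{\exp(\phi_w^\top \psi_j)}{\sum_{i=1}^{K} \exp(\phi_w^\top \psi_i)}, \tag{2}$$

$$p_{\psi,\varphi}(c|z=j) = \frac{\exp(\psi_j^\top \varphi_c)}{\sum_{c' \in \mathcal{V}} \exp(\psi_j^\top \varphi_{c'})}. \tag{3}$$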
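The two softmax models and the resulting edge likelihood can be illustrated with a small sketch. This is not the authors' implementation: it is a pure-Python toy, and the function names, the variable names (`node_emb`, `ctx_emb`, `comm_emb`) and the embedding values are all assumptions chosen for illustration.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def prior_over_communities(w, node_emb, comm_emb):
    """p(z | w): softmax over communities of <node_emb[w], comm_emb[j]>."""
    return softmax([dot(node_emb[w], psi_j) for psi_j in comm_emb])


def nodes_given_community(j, comm_emb, ctx_emb):
    """p(c | z=j): softmax over all nodes of <comm_emb[j], ctx_emb[c]>."""
    return softmax([dot(comm_emb[j], v) for v in ctx_emb])


def neighbor_distribution(w, node_emb, ctx_emb, comm_emb):
    """p(c | w) = sum_z p(c | z) p(z | w), returned for every node c."""
    p_z = prior_over_communities(w, node_emb, comm_emb)
    p_c = [0.0] * len(ctx_emb)
    for j, p_zj in enumerate(p_z):
        for c, p_cj in enumerate(nodes_given_community(j, comm_emb, ctx_emb)):
            p_c[c] += p_zj * p_cj
    return p_c


def sample_neighbor(w, node_emb, ctx_emb, comm_emb, rng):
    """Ancestral sampling: draw z ~ p(z | w), then c ~ p(c | z)."""
    p_z = prior_over_communities(w, node_emb, comm_emb)
    z = rng.choices(range(len(p_z)), weights=p_z, k=1)[0]
    p_c = nodes_given_community(z, comm_emb, ctx_emb)
    c = rng.choices(range(len(p_c)), weights=p_c, k=1)[0]
    return z, c


# Toy example: 3 nodes, 2 communities, 2-dimensional embeddings (made-up values).
node_emb = [[0.1, 0.3], [0.5, -0.2], [-0.4, 0.8]]  # used in p(z | w)
ctx_emb = [[0.2, 0.1], [-0.3, 0.6], [0.7, 0.0]]    # used in p(c | z)
comm_emb = [[1.0, 0.0], [0.0, 1.0]]                # one vector per community

p_c_given_w0 = neighbor_distribution(0, node_emb, ctx_emb, comm_emb)
z, c = sample_neighbor(0, node_emb, ctx_emb, comm_emb, random.Random(0))
```

Because both factors are normalized, the mixture `p_c_given_w0` is itself a distribution over the nodes. The inner softmax over all nodes is the summation that makes the node distribution expensive on large graphs.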
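Calculating Eq. 3 can be expensive as it requires summation over all vertices. Thus, for large datasets we can employ negative sampling as done in LINE tang2015line using the following objective function:

$$\log \sigma(\varphi_c^\top \psi_j) + \sum_{i=1}^{K} \mathbb{E}_{v \sim P_n(v)}\left[\log \sigma(-\varphi_v^\top \psi_j)\right], \tag{4}$$

where $\sigma(x) = 1/(1+\exp(-x))$, $P_n(v)$ is a noise distribution, and $K$ is the number of negative samples. This, combined with stochastic optimization, enables our model to be scalable.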
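A negative-sampling objective of this kind can be sketched as follows. The sketch assumes a noise distribution proportional to node degree raised to the power 0.75, which is the choice made in LINE; the text above does not state which noise distribution vGraph uses, and the function names, degrees and embedding values are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def negative_sampling_objective(c, j, ctx_emb, comm_emb, noise_weights, num_neg, rng):
    """log sigma(<ctx_c, psi_j>) plus a Monte Carlo estimate of the negative term.

    One node is drawn from the noise distribution for each of the num_neg
    negative samples, so the expectation is replaced by a single-sample estimate.
    """
    positive = log_sigmoid(dot(ctx_emb[c], comm_emb[j]))
    negatives = rng.choices(range(len(ctx_emb)), weights=noise_weights, k=num_neg)
    negative = sum(log_sigmoid(-dot(ctx_emb[v], comm_emb[j])) for v in negatives)
    return positive + negative


# Toy inputs (made-up values): 3 nodes, 2 communities.
ctx_emb = [[0.2, 0.1], [-0.3, 0.6], [0.7, 0.0]]
comm_emb = [[1.0, 0.0], [0.0, 1.0]]
degrees = [2, 1, 3]
# Assumption: noise distribution proportional to degree^0.75, as in LINE.
noise_weights = [d ** 0.75 for d in degrees]

value = negative_sampling_objective(
    c=2, j=0, ctx_emb=ctx_emb, comm_emb=comm_emb,
    noise_weights=noise_weights, num_neg=5, rng=random.Random(0),
)
```

The cost per edge depends on the number of negative samples rather than on the number of nodes, which is what makes the objective usable with stochastic optimization on large graphs.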
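To learn the parameters of vGraph, we try to maximize the log-likelihood of the observed edges, i.e., $\log p_{\phi,\varphi,\psi}(c|w)$. Since directly optimizing this objective is intractable for large graphs, we instead optimize the following evidence lower bound (ELBO) kingma2013auto :

$$\mathcal{L}_{\mathrm{ELBO}} = \mathbb{E}_{z \sim q(z|c,w)}\left[\log p_{\psi,\varphi}(c|z)\right] - \mathrm{KL}\left(q(z|c,w)\,\|\,p_{\phi,\psi}(z|w)\right), \tag{5}$$

where $q(z|c,w)$ is a variational distribution that approximates the true posterior distribution $p(z|c,w)$, and $\mathrm{KL}(\cdot\,\|\,\cdot)$ represents the Kullback-Leibler divergence between two distributions.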
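Specifically, we parametrize the variational distribution $q(z|c,w)$ with a neural network as follows:

$$q_{\phi,\psi}(z=j|w,c) = \frac{\exp\left((\phi_w \odot \phi_c)^\top \psi_j\right)}{\sum_{i=1}^{K} \exp\left((\phi_w \odot \phi_c)^\top \psi_i\right)}, \tag{6}$$

where $\odot$ denotes element-wise multiplication. We chose element-wise multiplication because it is symmetric and it forces the representation of the edge to be dependent on both nodes.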
The variational distribution $q(z|c,w)$ represents the community membership of the edge $(w, c)$. Based on this, we can easily approximate the community membership distribution of each node $w$, i.e., $p(z|w)$, by aggregating all its neighbors:

$$p(z|w) = \sum_{c} p(z,c|w) = \sum_{c} p(z|w,c)\, p(c|w) \approx \frac{1}{|N(w)|} \sum_{c \in N(w)} q(z|w,c), \tag{7}$$

where $N(w)$ is the set of neighbors of node $w$. To infer non-overlapping communities, we can simply take the $\arg\max$ of $p(z|w)$. However, when detecting overlapping communities, instead of thresholding $p(z|w)$ as in jia2019communitygan , we use

$$\mathcal{F}(w) = \left\{ \arg\max_{k}\, q(z=k|w,c) \right\}_{c \in N(w)}. \tag{8}$$

That is, we assign each edge to one community and then map the edge communities to node communities by gathering nodes incident to all edges within each edge community as in ahn2010link .
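The aggregation from edge-level to node-level communities can be sketched in a few lines. This is an illustrative toy under stated assumptions: the edge posterior is a softmax over the inner products between the element-wise product of the two node embeddings and each community embedding, every node has at least one neighbor, and all names and values are made up.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def edge_posterior(w, c, node_emb, comm_emb):
    """q(z | w, c): softmax of <node_emb[w] * node_emb[c], comm_emb[j]> over j."""
    edge_repr = [a * b for a, b in zip(node_emb[w], node_emb[c])]
    return softmax([dot(edge_repr, psi_j) for psi_j in comm_emb])


def node_membership(w, neighbors, node_emb, comm_emb):
    """Approximate p(z | w) by averaging q(z | w, c) over the neighbors of w.

    Assumes w has at least one neighbor.
    """
    acc = [0.0] * len(comm_emb)
    for c in neighbors[w]:
        q = edge_posterior(w, c, node_emb, comm_emb)
        acc = [a + qk for a, qk in zip(acc, q)]
    return [a / len(neighbors[w]) for a in acc]


def non_overlapping_community(w, neighbors, node_emb, comm_emb):
    """Single label: the most probable community under the averaged membership."""
    return argmax(node_membership(w, neighbors, node_emb, comm_emb))


def overlapping_communities(w, neighbors, node_emb, comm_emb):
    """Set of labels: the hard community of every edge incident to w."""
    return {argmax(edge_posterior(w, c, node_emb, comm_emb)) for c in neighbors[w]}


# Toy graph: node 0 is linked to nodes 1 and 2 (made-up embeddings).
neighbors = {0: [1, 2], 1: [0], 2: [0]}
node_emb = [[1.0, 1.0], [2.0, -1.0], [-1.0, 2.0]]
comm_emb = [[1.0, 0.0], [0.0, 1.0]]

membership = node_membership(0, neighbors, node_emb, comm_emb)
label = non_overlapping_community(0, neighbors, node_emb, comm_emb)
labels = overlapping_communities(0, neighbors, node_emb, comm_emb)
```

In this toy graph the two edges of node 0 point to different communities, so the overlapping assignment contains both labels while the non-overlapping assignment keeps only one.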
For optimization, we need to optimize the lower bound (5) w.r.t. the parameters in the variational distribution and the generative parameters. If $z$ is continuous, the reparameterization trick kingma2013auto can be used. However, $z$ is discrete in our case. In principle, we can still estimate the gradient using a score function estimator glynn1990likelihood ; williams1992simple . However, the score function estimator suffers from high variance, even when used with a control variate. Thus, we use the Gumbel-Softmax reparametrization jang2016categorical ; maddison2016concrete to obtain gradients for the evidence lower bound. More specifically, we use the straight-through Gumbel-Softmax estimator jang2016categorical .
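The forward pass of a straight-through Gumbel-Softmax sample can be sketched without any deep learning library. The sketch below only shows sampling; it has no automatic differentiation, and the temperature value and function names are assumptions. In an autograd framework, the hard one-hot vector is used in the forward pass while gradients flow through the soft relaxation.

```python
import math
import random


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_gumbel(rng):
    """Draw one sample from the standard Gumbel distribution via -log(-log(U))."""
    u = rng.random()
    # Clamp away from 0 and 1 so that both logarithms are finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_softmax_sample(logits, temperature, rng):
    """Return (y_soft, y_hard) for one relaxed categorical sample.

    y_soft is the differentiable relaxation; y_hard is its one-hot argmax.
    A straight-through estimator feeds y_hard forward and back-propagates
    through y_soft (not shown here, since this sketch has no autograd).
    """
    perturbed = [(l + sample_gumbel(rng)) / temperature for l in logits]
    y_soft = softmax(perturbed)
    k = max(range(len(y_soft)), key=y_soft.__getitem__)
    y_hard = [1.0 if i == k else 0.0 for i in range(len(y_soft))]
    return y_soft, y_hard


# Toy posterior over 3 communities; the temperature value is an arbitrary choice.
logits = [math.log(0.7), math.log(0.2), math.log(0.1)]
y_soft, y_hard = gumbel_softmax_sample(logits, temperature=0.5, rng=random.Random(0))
```

Lower temperatures push `y_soft` closer to a one-hot vector at the cost of higher gradient variance, which is the usual trade-off when tuning this estimator.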
A community can be defined as a group of nodes that are more similar to each other than to those outside the group pei2015nonnegative . For a non-attributed graph, two nodes are similar if they are connected and share similar neighbors. However, vGraph does not explicitly weight local connectivity in this way. To resolve this, inspired by existing spectral clustering studies dong2012clustering , we augment our training objective with a smoothness regularization term that encourages the learned community distributions of linked nodes to be similar. Formally, the regularization term is given below:

$$\mathcal{L}_{reg} = \lambda \sum_{(w,c) \in \mathcal{E}} \alpha_{w,c} \cdot d\left(p(z|c),\, p(z|w)\right), \tag{9}$$

where $\lambda$ is a tunable hyperparameter, $\alpha_{w,c}$ is a regularization weight, and $d(\cdot,\cdot)$ is the distance between two distributions (squared difference in our experiments). Motivated by rozemberczki2018gemsec , we set $\alpha_{w,c}$ to be the Jaccard’s coefficient of nodes $w$ and $c$, which is given by:

$$\alpha_{w,c} = \frac{|N(w) \cap N(c)|}{|N(w) \cup N(c)|}, \tag{10}$$

where $N(w)$ denotes the set of neighbors of $w$. The intuition is that $\alpha_{w,c}$ measures how similar the neighborhoods of the two nodes are: the higher the value of Jaccard’s coefficient, the more the two nodes are encouraged to have similar distributions over communities.
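The Jaccard weight and the smoothness term can be computed as in the following sketch. It assumes the squared difference is summed over communities, and the graph, the membership distributions and the value of the regularization strength are made up for illustration.

```python
def jaccard(neighbors, w, c):
    """Jaccard's coefficient of the neighbor sets of nodes w and c."""
    union = neighbors[w] | neighbors[c]
    if not union:
        return 0.0
    return len(neighbors[w] & neighbors[c]) / len(union)


def squared_distance(p, q):
    """Squared difference between two distributions, summed over communities."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def smoothness_regularizer(edges, neighbors, membership, lam):
    """lam * sum over edges (w, c) of jaccard(w, c) * d(p(z|c), p(z|w))."""
    total = 0.0
    for w, c in edges:
        weight = jaccard(neighbors, w, c)
        total += weight * squared_distance(membership[c], membership[w])
    return lam * total


# Toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
neighbors = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
# Made-up community membership distributions p(z | node) over 2 communities.
membership = [[0.9, 0.1], [0.8, 0.2], [0.5, 0.5], [0.1, 0.9]]

alpha_01 = jaccard(neighbors, 0, 1)
reg = smoothness_regularizer(edges, neighbors, membership, lam=1.0)
```

Nodes 0 and 1 share one of three distinct neighbors, so their weight is 1/3, while the pendant edge (2, 3) has no common neighbor and contributes nothing to the penalty.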
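By combining the evidence lower bound and the smoothness regularization term, the entire loss function we aim to minimize is given below:

$$\mathcal{L} = -\mathbb{E}_{z \sim q_{\phi,\psi}(z|c,w)}\left[\log p_{\psi,\varphi}(c|z)\right] + \mathrm{KL}\left(q_{\phi,\psi}(z|c,w)\,\|\,p_{\phi,\psi}(z|w)\right) + \mathcal{L}_{reg}. \tag{11}$$

For large datasets, negative sampling can be used for the first term.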
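As a sanity check on how the terms combine, the per-edge loss can be evaluated exactly on a toy example. Note the substitution: this sketch sums over all communities to compute the expectation exactly instead of sampling a community with Gumbel-Softmax, and the probability values and function names are assumptions.

```python
import math


def kl_divergence(q, p):
    """KL(q || p) for two discrete distributions over the same support."""
    return sum(qk * math.log(qk / pk) for qk, pk in zip(q, p) if qk > 0)


def edge_loss(q, prior, lik, reg=0.0):
    """Negative ELBO for one edge (w, c) plus an optional regularization value.

    q[k]     = q(z=k | w, c), the variational posterior
    prior[k] = p(z=k | w), the prior over communities
    lik[k]   = p(c | z=k), the probability of the neighbor under community k
    """
    reconstruction = sum(qk * math.log(lk) for qk, lk in zip(q, lik))
    return -reconstruction + kl_divergence(q, prior) + reg


# Made-up numbers for a single edge and 2 communities.
q = [0.7, 0.3]
prior = [0.5, 0.5]
lik = [0.2, 0.05]

loss = edge_loss(q, prior, lik)
# Exact log-likelihood log p(c | w) = log sum_z p(c | z) p(z | w).
log_marginal = math.log(sum(p * l for p, l in zip(prior, lik)))
```

Without the regularizer, the negated loss is the evidence lower bound, so it never exceeds `log_marginal`; the gap closes when `q` equals the true posterior.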
One advantage of vGraph’s framework is that it is very general and can be naturally extended to detect hierarchical communities. In this case, suppose we are given a $d$-level tree and each node is associated with a community; the community assignment can then be represented as a $d$-dimensional path vector $\vec{z} = (z^{(1)}, z^{(2)}, \ldots, z^{(d)})$, as shown in Fig. 3. Then, the generation process is formulated as below: (1) a tree path $\vec{z}$ is sampled from a prior distribution $p_{\phi,\psi}(\vec{z}|w)$. (2) The context $c$ is decoded from $\vec{z}$ with $p_{\psi,\varphi}(c|\vec{z})$. Under this model, the likelihood of the network is

$$\log p_{\phi,\varphi,\psi}(c|w) = \log \sum_{\vec{z}} p_{\psi,\varphi}(c|\vec{z})\, p_{\phi,\psi}(\vec{z}|w). \tag{12}$$

At every node of the tree, there is an embedding vector associated with the community. Such a method is similar to the hierarchical softmax parameterization used in language models morin2005hierarchical .
As vGraph can detect both overlapping and non-overlapping communities, we evaluate it on three tasks: overlapping community detection, non-overlapping community detection, and vertex classification.
We evaluate vGraph on 20 standard graph datasets. For non-overlapping community detection and node classification, we use 6 datasets: Citeseer, Cora, Cornell, Texas, Washington, and Wisconsin. For overlapping community detection, we use 14 datasets, including Facebook, Youtube, Amazon, Dblp, and Coauthor-CS. For Youtube, Amazon, and Dblp, we consider subgraphs with the 5 largest ground-truth communities due to the runtime of baseline methods. To demonstrate the scalability of our method, we additionally include visualization results on a large dataset, Dblp-full. Dataset statistics are provided in Table 1. More details about the datasets are provided in Appendix A.
For overlapping community detection, we use F1-Score and Jaccard Similarity to measure the performance of the detected communities as in yang2013community ; li2018community . For non-overlapping community detection, we use Normalized Mutual Information (NMI) tian2014learning and Modularity. Note that Modularity does not utilize ground truth data. For node classification, Micro-F1 and Macro-F1 are used.
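Since modularity needs only the graph and the predicted labels, it is easy to compute directly. The sketch below uses the standard Newman definition for an undirected, unweighted graph with each edge listed once; the text above does not specify which implementation was used in the experiments, and the toy graph is made up.

```python
def modularity(edges, labels):
    """Newman modularity of a hard partition of an undirected, unweighted graph.

    Q = sum over communities of (intra_edges / m) - (degree_sum / (2 m))^2,
    where m is the number of edges. Each edge must appear exactly once.
    """
    m = len(edges)
    degree = {}
    intra = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if labels[u] == labels[v]:
            intra[labels[u]] = intra.get(labels[u], 0) + 1
    degree_sum = {}
    for node, d in degree.items():
        degree_sum[labels[node]] = degree_sum.get(labels[node], 0) + d
    return sum(
        intra.get(k, 0) / m - (degree_sum[k] / (2 * m)) ** 2
        for k in degree_sum
    )


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = [0, 0, 0, 1, 1, 1]
one_community = [0, 0, 0, 0, 0, 0]

q_good = modularity(edges, two_triangles)
q_trivial = modularity(edges, one_community)
```

Splitting the graph into its two triangles scores higher than putting every node in one community, which scores zero, and no ground-truth labels are involved in either computation.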
For overlapping community detection, we choose four competitive baselines: BigCLAM yang2013overlapping , a nonnegative matrix factorization approach based on the Bernoulli-Poisson link that only considers the graph structure; CESNA yang2013community , an extension of BigCLAM, that additionally models the generative process for node attributes; Circles mcauley2014discovering , a generative model of edges w.r.t. attribute similarity to detect communities; and SVI gopalan2013efficient , a Bayesian model for graphs with overlapping communities that uses a mixed‐membership stochastic blockmodel.
To evaluate node embedding and non-overlapping community detection, we compare our method with five baselines: MF wang2011community , which represents each vertex with a low-dimensional vector obtained through factoring the adjacency matrix; DeepWalk perozzi2014deepwalk , a method that adopts truncated random walk and Skip-Gram to learn vertex embeddings; LINE tang2015line , which aims to preserve the first-order and second-order proximity among vertices in the graph; Node2vec grover2016node2vec , which adopts biased random walk and Skip-Gram to learn vertex embeddings; and ComE cavallari2017learning , which uses a Gaussian mixture model to learn an embedding and clustering jointly using random walk features.
For all baseline methods, we use the implementations provided by their authors and use the default parameters. For methods that only output representations of vertices, we apply K-means to the learned embeddings to get non-overlapping communities. Reported results are averaged over 5 runs. No node attributes are used in any of our experiments. For methods that require node attributes, such as CESNA yang2013community and Circles mcauley2014discovering , we generate node attributes from node degree features. It is hard to compare the quality of community results when the numbers of communities are different for different methods. Therefore, we set the number of communities to be detected, $K$, as the number of ground-truth communities for all methods, as in li2018community . For vGraph, we use full-batch training when the dataset is small enough. Otherwise, we use stochastic training with a batch size of 5000 or 10000 edges. The initial learning rate is set to 0.05 and is decayed by 0.99 after every 100 iterations. We use the Adam optimizer and train for 5000 iterations. When smoothness regularization is used, $\lambda$ is set to . For community detection, the model with the lowest loss is chosen. For node classification, we evaluate node embeddings after 1000 iterations of training. The dimension of node embeddings is set to 128 in all experiments for all methods. For the node classification task, we randomly select 70% of the labels for training and use the rest for testing.
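The learning-rate schedule described above can be written as a one-line function. This sketch assumes that "decayed by 0.99" means the rate is multiplied by 0.99 at every 100th iteration; the function and parameter names are hypothetical.

```python
def learning_rate(iteration, initial=0.05, decay=0.99, step=100):
    """Step decay: multiply the initial rate by `decay` once per `step` iterations."""
    return initial * decay ** (iteration // step)


# Learning rates at a few points of a 5000-iteration run.
schedule = {t: learning_rate(t) for t in (0, 99, 100, 1000, 5000)}
```

Under this reading the rate stays at 0.05 for the first 100 iterations and has been multiplied by 0.99 fifty times by iteration 5000, so it remains above half of its initial value for the whole run.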
Table 2 shows the results on overlapping community detection. Some of the methods are not very scalable and cannot obtain results in 24 hours on some larger datasets. Compared with these studies, vGraph outperforms all baseline methods in 11 out of 14 datasets in terms of F1-score or Jaccard Similarity, as it is able to leverage useful representations at node level. Moreover, vGraph is also very efficient on these datasets, since we employ variational inference and parameterize the model with node and community embeddings. By adding the smoothness regularization term (vGraph+), we see a further increase in performance, which shows that our method can be combined with concepts from traditional community detection methods.
The results for non-overlapping community detection are presented in Table 3. vGraph outperforms all conventional node embeddings + K-Means in 4 out of 6 datasets in terms of NMI and on all 6 datasets in terms of modularity. ComE, another framework that jointly solves node embedding and community detection, also generally performs better than other node embedding methods + K-Means. This supports our claim that learning these two tasks collaboratively instead of sequentially can further enhance performance. Compared to ComE, vGraph performs better in 4 out of 6 datasets in terms of NMI and 5 out of 6 datasets in terms of modularity. This shows that vGraph can also outperform frameworks that learn node representations and communities together.
Table 4 shows the result for the node classification task. vGraph significantly outperforms all the baseline methods in 9 out of 12 datasets. The reason is that most baseline methods only consider the local graph information without modeling the global semantics. vGraph solves this problem by representing node embeddings as a mixture of communities to incorporate global context.
In order to gain more insight, we present visualizations of the facebook107 dataset in Fig. 6(a). To demonstrate that our model can be applied to large networks, we present results of vGraph on a co-authorship network with around 100,000 nodes and 330,000 edges in Fig. 6(b). More visualizations are available in appendix B. We can observe that the community structure, or “social context”, is reflected in the corresponding node embedding (node positions in both visualizations are determined by t-SNE of the node embeddings). To demonstrate the hierarchical extension of our model, we visualize a subset of the co-authorship dataset in Fig. 10. We visualize the first-tier communities and second-tier communities in panel (a) and (b) respectively. We can observe that the second-tier communities grouped under the same first-tier communities interact more with themselves than they do with other second-tier communities.
In this paper, we proposed vGraph, a method that performs overlapping (and non-overlapping) community detection and learns node and community embeddings at the same time. vGraph casts the generation of edges in a graph as an inference problem. To encourage collaborations between community detection and node representation learning, we assume that each node can be represented by a mixture of communities, and each community is defined as a multinomial distribution over nodes. We also design a smoothness regularizer in the latent space to encourage neighboring nodes to be similar. Empirical evaluation on 20 different benchmark datasets demonstrates the effectiveness of the proposed method on both tasks compared to competitive baselines. Furthermore, our model is also readily extendable to detect hierarchical communities.
David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022, 2003.
Thomas Hofmann. Probabilistic latent semantic analysis. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, pages 289–296. Morgan Kaufmann Publishers Inc., 1999.
Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229–256, 1992.
Citeseer, Cora, Cornell, Texas, Washington, and Wisconsin are available online (https://linqs.soe.ucsc.edu). For Youtube, Amazon, and Dblp, we consider subgraphs with the 5 largest ground-truth communities due to the runtime of the baseline methods.
Facebook (https://snap.stanford.edu/data/ego-Facebook.html) is a set of Facebook ego-networks. It contains 10 different ego-networks with identified circles. Social circles formed by friends are regarded as ground-truth communities.
Youtube (http://snap.stanford.edu/data/com-Youtube.html) is a network of social relationships of Youtube users. The vertices represent users; the edges indicate friendships among the users; the user-defined groups are considered as ground-truth communities.
Amazon (http://snap.stanford.edu/data/com-Amazon.html) is collected by crawling the Amazon website. The vertices represent products and the edges indicate products frequently purchased together. The ground-truth communities are defined by the product categories on Amazon.
Dblp (http://snap.stanford.edu/data/com-DBLP.html) is a co-authorship network from Dblp. The vertices represent researchers and the edges indicate co-author relationships. Authors who have published in the same journal or conference form a community.
Coauthor-CS (https://aminer.org/aminernetwork) is a computer science co-authorship network. We chose 21 conferences and grouped them into five categories: Machine Learning, Computational Linguistics, Programming Language, Data Mining, and Database.