1 Introduction
Studies on attributed networks, which describe complex systems together with their features in a simplified way, have become increasingly popular in recent years. Unlike networks that characterize only the relationships between nodes, attributed networks provide more information by also collecting node features. Node features are ubiquitous in both nature and society. For example, in social networks, individual attributes include gender, nationality, location, and interests. In protein-protein interaction networks, a protein is characterized by its amino acid types, its protein structures (α-helices, β-sheets, or turns), and so on. In academic citation networks, a paper consists of a title, keywords, authors, and a venue. In the World Wide Web, a web page contains words, pictures, videos, and so forth. This additional information plays a vital role in network analysis tasks such as node clustering, node classification, link prediction, and outlier detection.
In the past few years, attributed network embedding, or representation learning (RL), has become a research hotspot. As opposed to traditional methods designed for specific tasks Gao et al. (2010); Perozzi et al. (2014a); Ruan et al. (2013), RL methods are task-independent and aim to map nodes to low-dimensional, continuous embeddings that preserve both the topological properties and the attribute information of the attributed network. The learned embeddings are then used as new node features for a variety of tasks. Recently, many attributed network RL methods have been proposed for broader applications, and these methods fall into two categories: discriminative methods and generative methods.
Discriminative methods can be further divided into three types: matrix factorization methods, random-walk based approaches, and graph neural networks (GNNs). Matrix factorization (MF) methods, such as TADW Yang et al. (2015) and AANE Huang et al. (2017), preprocess the node links into a relation matrix or transform the attributes into a similarity matrix, and then decompose the constructed matrices. However, these methods are time-consuming Pan et al. (2016) and are therefore unsuitable for large-scale attributed networks. Random-walk based methods for attributed networks mainly build on DeepWalk Perozzi et al. (2014b) and Node2Vec Grover and Leskovec (2016) to learn embeddings. TriDNR Pan et al. (2016) uses DeepWalk to model the structural information and then adopts Paragraph2Vec to describe the relations among the nodes, the attributes, and the labels. Feat-Walk Huang et al. (2019) first constructs the similarity matrix of the attributes and then performs DeepWalk on both the adjacency and similarity matrices. DANE Gao and Huang (2018), ASNE Liao et al. (2018), and ANRL Zhang et al. (2018) first learn the structural proximity by executing Node2Vec or calculating the $k$-order neighbors, and then use deep neural networks to encode the structural and attribute proximities into the embeddings nonlinearly. The above-mentioned methods are transductive: the nodes do not share parameters, so the algorithms must be retrained whenever new nodes arrive. To address this problem, many inductive methods, i.e., graph neural networks (GNNs), have been proposed. Among all GNNs, the graph convolutional network (GCN) Kipf and Welling (2016a) is the most popular. Hamilton et al. summarize GCN and its variants as message-passing algorithms that adopt various aggregators to learn node embeddings by aggregating local attribute information Hamilton et al. (2017). The graph attention network (GAT) Veličković et al. (2017) introduces an attention mechanism to describe the impact of valuable information on node embeddings. These discriminative methods usually require prior knowledge to choose the random-walk strategy or to carefully predefine the objective function, which profoundly influences their performance. However, obtaining proper prior information is expensive and challenging.
Generative methods generate new samples according to probability theory and then use the gap between the real and the generated samples as the objective function. Thus, methods of this kind are more suitable than discriminative methods for real-world networks for which no prior knowledge is available. For example, the variational graph auto-encoder (VGAE) Kipf and Welling (2016b) uses a two-layer GCN as an encoder to learn the node embeddings, then calculates the link probability between two nodes from the inner product of their embeddings, and finally decodes the network topology according to the link probability. The adversarially regularized variational graph auto-encoder (ARVGA) Pan et al. (2018) incorporates an adversarial model into VGAE for robust representation learning. Because of this decoding process, VGAE and ARVGA assume that the more similar the embeddings of two nodes are, the more likely the nodes are to be connected. From the perspective of generative models, these two methods can generate only the topological structure of networks, not the node attributes.
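The inner-product decoding described above can be illustrated with a short sketch. This is an illustration under stated assumptions rather than the VGAE authors' code: the function name `inner_product_decoder` and the toy embeddings are hypothetical, and only NumPy is used.

```python
import numpy as np

def inner_product_decoder(Z):
    """Link probabilities p_ij = sigmoid(z_i . z_j) for every pair of nodes."""
    logits = Z @ Z.T
    return 1.0 / (1.0 + np.exp(-logits))

# Hypothetical 2-D embeddings: nodes 0 and 1 point the same way, node 2 the opposite way.
Z = np.array([[2.0, 0.0],
              [1.5, 0.5],
              [-2.0, 0.0]])
P = inner_product_decoder(Z)

# Similar embeddings give a high link probability, dissimilar ones a low probability.
print(np.round(P, 3))
```

Because the probability grows with embedding similarity, a decoder of this form cannot express a pattern in which nodes with similar embeddings rarely link to each other, which is the limitation discussed next for disassortative networks.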
The approaches above mainly focus on assortative networks, i.e., networks with communities, and assume that the embeddings of densely linked nodes are similar. However, most of them do not work well on disassortative networks, such as networks with multipartite structures, hubs, or hybrid structures Yang et al. (2012), because the nodes in the same block are not necessarily densely linked in networks of this kind. One common model for characterizing networks with complicated structural patterns is the stochastic block model (SBM) Holland and Leinhardt (1981), which introduces the concepts of "block" and block-block link probability to fit diverse patterns. Recently, various extensions of SBM have been presented for different tasks, such as structural pattern detection Abbe (2017), link prediction Guimerà and Sales-Pardo (2009), signed network analysis Jiang (2015); Yang et al. (2017), and dynamic network evolution Yang et al. (2011). However, these SBMs consider only the network topology and are not suitable for attributed networks. Additionally, the learning algorithms for SBMs cannot obtain the node embeddings directly. Based on the above analysis, finding a proper representation learning method for both assortative and disassortative attributed networks is a challenging problem.
To address the above problems, we propose a novel attributed network generative model and its learning algorithm, inspired by the stochastic block model and neural networks. The main contributions of this paper are as follows:
The model can characterize and generate attributed networks with various structural patterns, such as communities, multipartite structures, hubs, or any hybrid of the mentioned structures.
The method introduces the node embedding as the latent variable for both the assortative and disassortative networks. Instead of learning embeddings directly, the proposed model deduces the corresponding distribution, which makes the model more robust.
Compared with traditional probabilistic models, the proposed model can model datasets with more complicated distributions. The attributed networks can be regarded as transformations of latent embeddings with simple distributions through complex neural networks.
2 The Attributed Network Generative Model
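Let $G = (A, X)$ denote the attributed network with $n$ nodes, where each node has $M$-dimensional attributes. $A \in \{0,1\}^{n \times n}$ is the adjacency matrix; its element $a_{ij} = 1$ denotes that node $i$ links to node $j$, otherwise $a_{ij} = 0$. $X \in \{0,1\}^{n \times M}$ or $X \in \mathbb{R}^{n \times M}$ denotes the binary or continuous attribute matrix, and its row $x_i$ denotes the attributes of node $i$.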
In this work, to model both assortative and disassortative attributed networks, we introduce the concept of "block" to embedding methods. Specifically, we assume three conditions: (a) A node belongs to one of $K$ blocks. (b) The embeddings of nodes in the same block are similar. (c) The nodes in the same block share similar link patterns. For example, we can describe an assortative network with communities as follows: the link probabilities of any two nodes within a block and across blocks are 0.9 and 0.1, respectively. We can also describe a disassortative network with multipartite structures as one in which the link probabilities of any two nodes within a block and across blocks are 0.1 and 0.9, respectively. Thus, we can model both assortative and disassortative networks.
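The two examples above differ only in the block-block link probabilities, which a small sketch makes explicit. The helper name `block_matrix` and the choice of three blocks are assumptions made for illustration; the values 0.9 and 0.1 are the ones used in the text.

```python
import numpy as np

def block_matrix(K, p_in, p_out):
    """K x K block-block link probability matrix: p_in within a block, p_out across blocks."""
    Pi = np.full((K, K), p_out)
    np.fill_diagonal(Pi, p_in)
    return Pi

# Assortative pattern (communities): dense within blocks, sparse across blocks.
Pi_assortative = block_matrix(3, p_in=0.9, p_out=0.1)

# Disassortative pattern (multipartite): sparse within blocks, dense across blocks.
Pi_disassortative = block_matrix(3, p_in=0.1, p_out=0.9)

print(Pi_assortative)
print(Pi_disassortative)
```

Other patterns, such as hubs or hybrids, would correspond to further choices of this matrix rather than to a different model.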
Mathematically, we define an attributed network generative model (ANGM) as a 4-tuple:
$$\mathrm{ANGM} = (\omega, \Pi, \mu, \sigma), \qquad (1)$$
where the $K$-dimensional vector $\omega$ refers to the node assignment probability, wherein $\omega_k$ denotes how likely a node is to belong to block $k$, and it satisfies $\sum_{k=1}^{K} \omega_k = 1$. $\Pi$ is a $K \times K$ matrix, and its element $\pi_{kl}$ denotes the link probability between two nodes in blocks $k$ and $l$, respectively. $\mu$ and $\sigma$ are two $K \times D$ matrices, where $D$ is the dimension of the embeddings, and $\mu_k$ and $\sigma_k$ denote the mean and the standard deviation of the embeddings of the nodes in block $k$. Given an attributed network, we can deduce two latent variables: the membership vector $c$ and the embedding matrix $Z$, wherein $c_i = k$ denotes that node $i$ belongs to block $k$ and the vector $z_i$ denotes the embedding of node $i$. Figure 1 shows the probabilistic graphical model of ANGM.
Based on ANGM, the generation process of an attributed network is designed as follows:
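(1) For each node $i$:
  (a) Assign node $i$ to one of $K$ blocks according to the multinomial distribution: $c_i \sim \mathrm{Mult}(\omega)$;
  (b) Generate the embedding of node $i$ according to the Gaussian distribution: $z_i \sim \mathcal{N}(\mu_{c_i}, \sigma_{c_i}^2 I)$;
  (c) Generate the attributes of node $i$:
    if $X$ is binary, i.e., $X \in \{0,1\}^{n \times M}$, $x_i$ is generated according to a Bernoulli distribution: $x_i \sim \mathrm{Ber}(\mu_{x_i})$, where $\mu_{x_i} = f_\theta(z_i)$, $\mu_{x_i} \in [0,1]^M$.
    if $X$ is continuous, i.e., $X \in \mathbb{R}^{n \times M}$, $x_i$ is generated according to a Gaussian distribution: $x_i \sim \mathcal{N}(\mu_{x_i}, \sigma_{x_i}^2 I)$, where $[\mu_{x_i}, \log \sigma_{x_i}^2] = f_\theta(z_i)$, and $\mu_{x_i}, \sigma_{x_i}^2 \in \mathbb{R}^M$.
  Here $f_\theta$ denotes the neural networks parameterized by $\theta$; the input of $f_\theta$ is $z_i$ and the output is the parameters of the Bernoulli distribution or the Gaussian distribution. $f_\theta$ models the nonlinearity between the embeddings and the attributes.
(2) For each node pair ($i$, $j$):
  (a) Generate the link between node $i$ and node $j$ according to a Bernoulli distribution: $a_{ij} \sim \mathrm{Ber}(\pi_{c_i c_j})$.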
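The generation process above can be sketched in a few lines of NumPy. This is a minimal illustration under several assumptions that are not taken from the paper: the attributes are binary, the decoder is a hypothetical single linear layer followed by a sigmoid (standing in for the neural network), links are drawn independently for every ordered pair of distinct nodes, and all parameter values are made up.

```python
import numpy as np

def sample_angm(n, omega, Pi, mu, sigma, decoder, rng):
    """Sample block memberships, embeddings, binary attributes, and links."""
    K, D = mu.shape
    c = rng.choice(K, size=n, p=omega)                     # block of each node
    Z = mu[c] + sigma[c] * rng.standard_normal((n, D))     # Gaussian embedding per block
    probs = decoder(Z)                                     # Bernoulli means, shape n x M
    X = (probs > rng.random(probs.shape)).astype(int)      # binary attributes
    link_p = Pi[c][:, c]                                   # entry (i, j) is Pi[c_i, c_j]
    A = (link_p > rng.random((n, n))).astype(int)          # one draw per ordered pair
    np.fill_diagonal(A, 0)                                 # assumption: no self-loops
    return c, Z, X, A

rng = np.random.default_rng(0)
n, K, D, M = 60, 2, 4, 6
omega = np.array([0.5, 0.5])
Pi = np.array([[0.1, 0.9],
               [0.9, 0.1]])                                # disassortative example
mu = rng.normal(size=(K, D))
sigma = np.full((K, D), 0.1)
W = rng.normal(size=(D, M))                                # hypothetical decoder weights

def decoder(Z):
    return 1.0 / (1.0 + np.exp(-(Z @ W)))

c, Z, X, A = sample_angm(n, omega, Pi, mu, sigma, decoder, rng)

same_block = c[:, None] == c[None, :]
print("within-block density:", A[same_block].mean())
print("cross-block density:", A[~same_block].mean())
```

With the disassortative matrix used here, the sampled network links mostly across blocks even though nodes within a block share nearly identical embeddings.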
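According to the probabilistic graphical model shown in Figure 1 and the generation process, the likelihood of the complete data is written as:
$$P(A, X, Z, c \mid \omega, \Pi, \mu, \sigma, \theta) = P(c \mid \omega)\, P(Z \mid c, \mu, \sigma)\, P(X \mid Z, \theta)\, P(A \mid c, \Pi), \qquad (2)$$
where $c$ and $Z$ are the block memberships and the embeddings, $\omega$, $\Pi$, $\mu$, and $\sigma$ are the ANGM parameters, and $\theta$ parameterizes the neural networks that generate the attributes.
According to the generative process, each factor of the likelihood in Eq. (2) is defined as follows. First, the probability of assigning the nodes to blocks is
$$P(c \mid \omega) = \prod_{i=1}^{n} \omega_{c_i}.$$
Then, the probability of generating the nodes' embeddings is
$$P(Z \mid c, \mu, \sigma) = \prod_{i=1}^{n} \mathcal{N}(z_i; \mu_{c_i}, \sigma_{c_i}^2 I).$$
As for the probability of generating the node attributes, if $X \in \{0,1\}^{n \times M}$,
$$P(X \mid Z, \theta) = \prod_{i=1}^{n} \prod_{m=1}^{M} \mu_{x_i, m}^{x_{im}} (1 - \mu_{x_i, m})^{1 - x_{im}},$$
and if $X \in \mathbb{R}^{n \times M}$,
$$P(X \mid Z, \theta) = \prod_{i=1}^{n} \mathcal{N}(x_i; \mu_{x_i}, \sigma_{x_i}^2 I).$$
Finally, the probability of generating the links between each pair of nodes is
$$P(A \mid c, \Pi) = \prod_{(i, j)} \pi_{c_i c_j}^{a_{ij}} (1 - \pi_{c_i c_j})^{1 - a_{ij}},$$
where the product runs over the node pairs.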
The proposed generative model for attributed networks has two advantages. (1) It can generate networks with different structural patterns by setting different block-block link probability matrices $\Pi$. For example, we can generate networks with communities by setting $\pi_{kk} > \pi_{kl}$ for $k \neq l$; otherwise ($\pi_{kl} > \pi_{kk}$), we obtain multipartite structures. (2) It defines the similarity of the node embeddings from the perspective of "blocks" instead of "neighbors". Thus, it considers global structural information.
3 The Learning Method
In this section, we introduce the ANGM learning algorithm, which fits the model to a given attributed network. Based on Eq. (2), the log-likelihood of the observed data is
$$\log P(A, X) = \log \sum_{c} \int P(A, X, Z, c)\, dZ. \qquad (9)$$
Our goal is to maximize $\log P(A, X)$ to find the optimal model for the given attributed network. However, it is intractable to calculate Eq. (9) directly. Thus, we introduce a variational distribution $q(Z, c \mid X)$ and use Jensen's inequality to obtain a lower bound of Eq. (9). We then maximize the lower bound of the log-likelihood shown in Eq. (10) instead:
$$\mathcal{L} = \mathbb{E}_{q(Z, c \mid X)}\left[\log \frac{P(A, X, Z, c)}{q(Z, c \mid X)}\right]. \qquad (10)$$
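For reference, the Jensen step behind this bound can be spelled out. The following is a sketch in assumed notation ($c$ for the block memberships, $Z$ for the embeddings, $q(Z, c \mid X)$ for the variational distribution), not a transcription of the paper's equations:

```latex
\begin{aligned}
\log P(A, X)
  &= \log \sum_{c} \int q(Z, c \mid X)\,
     \frac{P(A, X, Z, c)}{q(Z, c \mid X)}\, dZ \\
  &\ge \sum_{c} \int q(Z, c \mid X)\,
     \log \frac{P(A, X, Z, c)}{q(Z, c \mid X)}\, dZ \\
  &= \mathbb{E}_{q(Z, c \mid X)}
     \left[ \log \frac{P(A, X, Z, c)}{q(Z, c \mid X)} \right].
\end{aligned}
```

The inequality follows from the concavity of the logarithm. The gap between the two sides is the Kullback-Leibler divergence from $q(Z, c \mid X)$ to the true posterior $P(Z, c \mid A, X)$, so maximizing the bound also pulls the variational distribution toward the posterior.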
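According to the mean-field theory, we know
$$q(Z, c \mid X) = q(Z \mid X)\, q(c \mid X) = \prod_{i=1}^{n} q(z_i \mid x_i)\, q(c_i \mid x_i).$$
We use neural networks $g_\phi$ parameterized by $\phi$ to calculate $q(Z \mid X)$. The input is the node attribute $x_i$, and the outputs are the parameters of the Gaussian distribution. For node $i$,
$$[\tilde{\mu}_i, \log \tilde{\sigma}_i^2] = g_\phi(x_i), \qquad q(z_i \mid x_i) = \mathcal{N}(z_i; \tilde{\mu}_i, \tilde{\sigma}_i^2 I).$$
Then we assume that
$$q(c_i = k \mid x_i) = \tau_{ik}, \qquad \sum_{k=1}^{K} \tau_{ik} = 1,$$
where $\tau_{ik}$ denotes the probability of node $i$ belonging to block $k$.
Note that we assume binary attributes, $X \in \{0,1\}^{n \times M}$, here. It is easy to extend the method to continuous attributes, $X \in \mathbb{R}^{n \times M}$, by using the Gaussian distribution.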
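One term of the lower bound, the expected log-probability of the links, has a closed form once the block memberships are given soft assignments. The sketch below is an illustration and not the paper's update rule. It assumes a matrix `tau` whose entry `tau[i, k]` is the probability that node i belongs to block k, a block-block link probability matrix `Pi`, independent assignments across nodes, and a sum over ordered pairs of distinct nodes.

```python
import numpy as np

def expected_link_loglik(A, tau, Pi, eps=1e-10):
    """Sum over pairs of E_q[a_ij log pi + (1 - a_ij) log(1 - pi)] under soft memberships."""
    log_p = np.log(Pi + eps)
    log_q = np.log(1.0 - Pi + eps)
    on = tau @ log_p @ tau.T      # expected log-probability of a link for each pair
    off = tau @ log_q @ tau.T     # expected log-probability of no link for each pair
    per_pair = A * on + (1 - A) * off
    np.fill_diagonal(per_pair, 0.0)   # assumption: self-pairs are excluded
    return per_pair.sum()

# Toy bipartite network: nodes 0-1 in one block, nodes 2-3 in the other, links only across.
A = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [1, 1, 0, 0],
              [1, 1, 0, 0]])
tau = np.array([[1.0, 0.0],
                [1.0, 0.0],
                [0.0, 1.0],
                [0.0, 1.0]])
Pi_multipartite = np.array([[0.1, 0.9], [0.9, 0.1]])
Pi_community = np.array([[0.9, 0.1], [0.1, 0.9]])

ll_multipartite = expected_link_loglik(A, tau, Pi_multipartite)
ll_community = expected_link_loglik(A, tau, Pi_community)
print(ll_multipartite, ll_community)
```

On this toy network the multipartite matrix scores far higher than the community matrix, which is how a block-based link term can favor disassortative structure when the data call for it.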
To minimize the , we will use the coordinate descent to optimize , , , and , and then use Adam to optimize the parameters of neural networks, i.e., and .
Now, we derive the update formulas of the parameters unrelated to the neural networks.
4 Experiments
In this section, we first compare our model with state-of-the-art methods for node clustering and node classification on real-world networks. Then, we visualize the embeddings learned on synthetic networks to show the performance of our proposed model on both assortative and disassortative networks.
4.1 Baselines
Here, we select seven state-of-the-art node embedding methods and SBM.
Node2Vec Grover and Leskovec (2016) is a random-walk based method for learning node embedding using only network topology.
GAE Kipf and Welling (2016b) is a graph auto-encoder method. The network topology and attributes are mapped to vectors by a GCN, and the vectors are then decoded into the network using the inner product of the embeddings.
VGAE Kipf and Welling (2016b) is a variational version of GAE based on VAE.
ASNE Liao et al. (2018) first learns the structure embeddings using Node2Vec, and then feeds the structure embeddings and attributes to deep neural networks to learn the final embeddings.
ARGE Pan et al. (2018) adds the adversarial model to GAE to learn more robust embeddings.
ARVGE Pan et al. (2018) is a variational version of ARGE.
ANRL Zhang et al. (2018) is a neighbor enhancement auto-encoder for attribute information. It uses Node2Vec to learn the structural proximity and then adopts an attribute-aware skip-gram model to fuse the topology and attribute information.
SBM Latouche et al. (2012) is a probabilistic method for generating and analyzing networks with different structures.
4.2 Node Clustering and Node Classification on Real-world Networks
In this section, we test our method on five real-world networks, as shown in Table 1. Cornell, Texas, Washington, and Wisconsin (Corn., Texa., Wash., and Wisc. for short) are hypertext datasets from four universities. Citeseer (Cite. for short) is an academic citation network. Table 1 shows the statistics of the datasets, where $n$, $m$, $K$, and $M$ are the numbers of nodes, edges, blocks, and attributes.
4.2.1 Data Analysis
First, we analyze the structures contained in real-world networks.
According to the definition of structural patterns Yang et al. (2012), we show the block-block link probability matrices and block models of two selected networks: Cornell and Citeseer. Based on the ground truth, the elements of the block matrices are calculated as $\pi_{kl} = e_{kl} / \hat{e}_{kl}$, where $e_{kl}$ denotes the number of links between blocks $k$ and $l$ in the real-world network and $\hat{e}_{kl}$ represents the number of links between blocks $k$ and $l$ in a fully linked network with the same ground truth as the real-world network. For communities, generally speaking, node $i$ is more likely to be connected to node $j$ if they belong to the same block. For multipartite structures, two nodes in different blocks are more likely to be connected. From Figure 2 (a) and (b), Cornell is a disassortative network containing a community (block 2) and three multipartite structures (blocks 1-3, blocks 3-5, and blocks 2-4). From Figure 2 (c) and (d), the structural patterns in Citeseer, which is an assortative network, are all communities. Thus, the structures in Cornell are more complicated than those in Citeseer. Texas, Washington, and Wisconsin are also disassortative networks.
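The block-block link probability computation described above can be sketched as follows. The function name `block_link_probabilities` is hypothetical, and the code assumes that links are counted over ordered pairs of distinct nodes, so that a symmetric adjacency matrix yields a symmetric block matrix.

```python
import numpy as np

def block_link_probabilities(A, labels, K):
    """Observed links between blocks divided by the links of a fully linked network."""
    H = np.eye(K)[labels]                                  # n x K one-hot memberships
    observed = H.T @ A @ H                                 # links from block k to block l
    sizes = H.sum(axis=0)
    possible = np.outer(sizes, sizes) - np.diag(sizes)     # pairs in a fully linked network
    return np.divide(observed, possible,
                     out=np.zeros_like(observed), where=possible > 0)

# Toy example: two blocks of two nodes each, with links only across blocks.
A = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [1, 1, 0, 0],
              [1, 1, 0, 0]])
labels = np.array([0, 0, 1, 1])
Pi_hat = block_link_probabilities(A, labels, K=2)
print(Pi_hat)
```

A matrix with large off-diagonal entries, as in this toy case, indicates a multipartite structure, while a dominant diagonal indicates communities.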
4.2.2 Experimental Settings
For fairness, we set the embedding dimension of all embedding algorithms to 32 on Cornell, Texas, Washington, and Wisconsin, and to 128 on Citeseer. We first use the algorithms to learn the embeddings and then use GMM and SVM for the clustering and classification tasks, respectively. For classification, we assign 60% of the nodes to the training set and 40% to the testing set. We choose the normalized mutual information (NMI) Kuncheva and Hadjitodorov (2004) and accuracy (AC) Xu et al. (2003) to evaluate the performance of the methods on the clustering task. For the classification task, we choose Macro-F1 and Micro-F1 Pillai et al. (2012) as the evaluation criteria.
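The two clustering metrics can be sketched in plain NumPy. This is an illustrative sketch, not the evaluation code used in the experiments. Normalizing the mutual information by the arithmetic mean of the two entropies is an assumption, since several normalizations are in use, and the accuracy searches over all label permutations by brute force, which is practical only for a small number of clusters.

```python
import numpy as np
from itertools import permutations

def contingency(y_true, y_pred):
    """Count matrix: rows are true classes, columns are predicted clusters."""
    _, t_idx = np.unique(np.asarray(y_true), return_inverse=True)
    _, p_idx = np.unique(np.asarray(y_pred), return_inverse=True)
    C = np.zeros((t_idx.max() + 1, p_idx.max() + 1))
    np.add.at(C, (t_idx, p_idx), 1)
    return C

def nmi(y_true, y_pred):
    """Mutual information divided by the mean of the two label entropies."""
    P = contingency(y_true, y_pred)
    P = P / P.sum()
    pt, pp = P.sum(axis=1), P.sum(axis=0)
    nz = P > 0
    mi = (P[nz] * np.log(P[nz] / np.outer(pt, pp)[nz])).sum()
    h_t = -(pt * np.log(pt)).sum()
    h_p = -(pp * np.log(pp)).sum()
    denom = 0.5 * (h_t + h_p)
    return mi / denom if denom > 0 else 1.0

def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one mapping from clusters to classes."""
    C = contingency(y_true, y_pred)
    k = max(C.shape)
    padded = np.zeros((k, k))
    padded[:C.shape[0], :C.shape[1]] = C
    best = max(sum(padded[perm[j], j] for j in range(k))
               for perm in permutations(range(k)))
    return best / C.sum()

y_true = [0, 0, 1, 1, 2, 2]
y_perfect = [1, 1, 0, 0, 2, 2]     # same partition, different label names
y_noisy = [1, 1, 0, 2, 2, 2]       # one node assigned to the wrong cluster

print(nmi(y_true, y_perfect), clustering_accuracy(y_true, y_perfect))
print(nmi(y_true, y_noisy), clustering_accuracy(y_true, y_noisy))
```

Both metrics ignore the names of the cluster labels, which is why they suit the output of an unsupervised clusterer such as GMM.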
4.2.3 Experimental Results
In Table 2, ANGM performs best on 5 and 4 of 6 networks under the NMI and AC metrics, respectively. ANGM improves the NMI score by more than 20% on Cornell compared with the second-best method, ANRL. In terms of the AC metric, ANGM improves by more than 15% over the second-best method, ARVGE, on Wisconsin. On Citeseer, which consists of communities, ANGM ranks second, but its AC score is only 3% lower than that of the best method, ANRL. From Table 3, we can see that the Macro-F1 and Micro-F1 of ANGM are the highest on 4 of 5 networks, especially on Washington, where they reach 62.13% and 72.83%, which are 15% and 7% higher than those of the second-best method (ASNE), respectively. On Citeseer, although ANGM is not the best, the difference in NMI (or AC) among the best four methods (ANGM, ANRL, GAE, and VGAE) is less than 1.5%.
From Tables 2 and 3, we can conclude that ANGM performs better than the other algorithms on disassortative networks with complicated structural patterns (Cornell, Texas, Washington, and Wisconsin) and comparably on assortative networks with communities (Citeseer). This is because our proposed method uses the block-block link probability matrix $\Pi$ to fit networks with different structures, whereas the compared methods are designed only for assortative networks.
4.3 Visualization of Representation on Synthetic Networks with Different Structures
To test and visualize the performance of our method on networks with different structures, we generate four types of attributed networks.
4.3.1 Generation Model for Synthetic Networks
First, we use a simplified version of ANGM, i.e., , to generate attributed networks with different structures: communities, multipartite structures, hubs, and hybrid structures. and are the numbers of nodes and blocks, respectively; have the same meaning as in Section 2. In terms of attributes, we assume that the nodes in the same block have similar attributes. If node belongs to block , we assume that the elements of the -dimension matrix are set as if , otherwise . For topologies, we generate four types of networks with as follows. For networks with communities, we set if , otherwise . For networks with multipartite structures, we set if , otherwise . For networks with hubs, we set if or or , otherwise . For networks with hybrid structures containing communities and multipartite networks (), we set as follows:
Here, we set , , , , , and . Figure 3 shows the adjacent and attribute matrices of the generated networks.
4.3.2 Experimental results
We first run the embedding methods on the four types of networks, then map the embeddings into a two-dimensional space by applying t-SNE Van Der Maaten (2014), and visualize them as shown in Figures 4-7. In addition, we use GMM to cluster the nodes. Table 4 shows the clustering NMI and AC of the eight methods on the four synthetic networks.
(1) ANGM finds all blocks on the four types of networks, and both the NMI and the AC of ANGM reach 100%, because the block-block link probability matrix $\Pi$ in ANGM is able to characterize networks with various structural patterns.
(2) Node2Vec performs worst on all networks, especially on networks with multipartite structures, because it uses only the topology information and not the attribute information.
(3) For graph auto-encoder based methods, the variational versions (VGAE and ARVGE) outperform the non-variational versions (GAE and ARGE). Compared with GAE and ARGE, which learn the embeddings directly, VGAE and ARVGE learn the distributions of the embeddings, which enhances the robustness of the algorithms. ANGM also learns the distributions of the embeddings; thus, ANGM is robust as well.
(4) Among the four types of networks, most of the compared algorithms, especially the GCN-based algorithms (GAE, VGAE, ARGE, and ARVGE), perform worst on the network with multipartite structures and best on the network with communities. Because they assume that attributes propagate along the links, they are suitable only for the case in which linked nodes share similar embeddings.
5 Conclusion
In this paper, we propose a novel block-based generative model for attributed network representation learning. Accordingly, we introduce the concept of "block" to attributed network embedding methods. The link patterns related to blocks can define assortative networks with communities and disassortative networks with multipartite structures, hubs, or any hybrid of them. Then, we use neural networks to capture the nonlinearity between the node embeddings and the node attributes. The topology information and the attribute information are combined by assuming that the nodes in the same block share similar embeddings and similar link patterns. Finally, variational inference is introduced for learning the parameters of the proposed model. Experiments show that our proposed model remarkably outperforms state-of-the-art methods on both real-world and synthetic attributed networks with various structural patterns.
Acknowledgements. This work was supported by the National Natural Science Foundation of China [grant numbers 61572226, 61876069]; Jilin Province Natural Science Foundation [grant number 20150101052JC]; Jilin Province Key Scientific and Technological Research and Development project [grant numbers 20180201067GX, 20180201044GX].
References
- Abbe (2017). Community detection and stochastic block models: recent developments. The Journal of Machine Learning Research 18 (1), pp. 6446–6531.
- Gao and Huang (2018). Deep attributed network embedding. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, pp. 3364–3370.
- Gao et al. (2010). On community outliers and their efficient detection in information networks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 813–822.
- Grover and Leskovec (2016). Node2vec: scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
- Guimerà and Sales-Pardo (2009). Missing and spurious interactions and the reconstruction of complex networks. Proceedings of the National Academy of Sciences 106 (52), pp. 22073–22078.
- Hamilton et al. (2017). Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pp. 1024–1034.
- Holland and Leinhardt (1981). An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association 76 (373), pp. 33–50.
- Huang et al. (2017). Accelerated attributed network embedding. In Proceedings of the 2017 SIAM International Conference on Data Mining, pp. 633–641.
- Huang et al. (2019). Large-scale heterogeneous feature embedding. Proceedings of the AAAI Conference on Artificial Intelligence 33, pp. 3878–3885.
- Jiang (2015). Stochastic block model and exploratory analysis in signed networks. Physical Review E 91 (6), pp. 062805.
- Kingma and Welling (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
- Kipf and Welling (2016a). Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
- Kipf and Welling (2016b). Variational graph auto-encoders. NIPS Workshop on Bayesian Deep Learning.
- Kuncheva and Hadjitodorov (2004). Using diversity in cluster ensembles. In 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No. 04CH37583), Vol. 2, pp. 1214–1219.
- Latouche et al. (2012). Variational Bayesian inference and complexity control for stochastic block models. Statistical Modelling 12 (1), pp. 93–115.
- Liao et al. (2018). Attributed social network embedding. IEEE Transactions on Knowledge and Data Engineering 30 (12), pp. 2257–2270.
- Pan et al. (2018). Adversarially regularized graph autoencoder for graph embedding. In IJCAI, pp. 2609–2615.
- Pan et al. (2016). Tri-party deep network representation. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pp. 1895–1901.
- Perozzi et al. (2014a). Focused clustering and outlier detection in large attributed graphs. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1346–1355.
- Perozzi et al. (2014b). Deepwalk: online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710.
- Pillai et al. (2012). F-measure optimisation in multi-label classifiers. In Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012), pp. 2424–2427.
- Ruan et al. (2013). Efficient community detection in large networks using content and links. In Proceedings of the 22nd International Conference on World Wide Web, pp. 1089–1098.
- Van Der Maaten (2014). Accelerating t-SNE using tree-based algorithms. The Journal of Machine Learning Research 15 (1), pp. 3221–3245.
- Veličković et al. (2017). Graph attention networks. arXiv preprint arXiv:1710.10903.
- Xu et al. (2003). Document clustering based on non-negative matrix factorization. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 267–273.
- Yang et al. (2012). Characterizing and extracting multiplex patterns in complex networks. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 42 (2), pp. 469.
- Yang et al. (2017). Stochastic blockmodeling and variational Bayes learning for signed network analysis. IEEE Transactions on Knowledge and Data Engineering 29 (9), pp. 2026–2039.
- Yang et al. (2015). Network representation learning with rich text information. In Twenty-Fourth International Joint Conference on Artificial Intelligence.
- Yang et al. (2011). Detecting communities and their evolutions in dynamic social networks: a Bayesian approach. Machine Learning 82 (2), pp. 157–189.
- Zhang et al. (2018). ANRL: attributed network representation learning via deep neural networks. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pp. 3155–3161.