Networks constitute a natural paradigm to represent real-world relational data that contain various relationships between entities ranging from online social network of users, and academic publication network of authors, to protein-protein interaction (PPI) network in the physical world. Due to the pervasive nature of networks, analyzing and mining useful knowledge from networks has been an actively researched topic for the past decades. Among various tools for network analysis, network embedding, which learns continuous vector representations for nodes in a network, has recently garnered attention, and has been effectively and efficiently applied to various downstream network-based applications, such as node classification (Dong et al., 2017; Park et al., 2020; Liu et al., 2019), and link prediction (Abu-El-Haija et al., 2017; Epasto and Perozzi, 2019).
A common underlying idea of network embedding methods is that a node embedding vector should be able to preserve the neighborhood structure of the node, i.e., local structural characteristics. DeepWalk (Perozzi et al., 2014) is a pioneering method that leverages the node co-occurrence information to learn the representations of nodes in a network such that nodes that frequently co-occur have similar vector representations (Grover and Leskovec, 2016). Specifically, the co-occurrence based methods usually perform random walks on a network to obtain node sequences on which the skip-gram (Mikolov et al., 2013) model is applied. However, past research on network embedding mostly assumes the existence of a single vector representation for each node, whereas in reality a node usually has multiple aspects. For example, nodes (e.g., authors) in an academic publication network often belong to multiple research communities, and thus modeling each node with a single vector entails information loss.
In this regard, several attempts have been recently made to model the multiple aspects of a node by learning multiple vector representations for each node (Epasto and Perozzi, 2019; Liu et al., 2019; Wang et al., 2019a). However, several limitations remain. First, the aspects of the nodes are determined based on an offline clustering algorithm performed in advance (Figure 1a). In particular, to determine the node aspects, PolyDW (Liu et al., 2019) performs clustering based on matrix factorization (MF) on the adjacency matrix, and similarly, Splitter (Epasto and Perozzi, 2019) performs a local clustering algorithm based on ego-network analysis that encodes the role of a node in different local communities. However, as the clustering is done prior to the actual embedding learning, the cluster membership distribution (i.e., aspect distribution) for each node is fixed, so each node always has the same aspect distribution regardless of its current context. This is especially problematic for the co-occurrence based network embedding methods, because the context of a node changes dynamically over multiple random walks. Moreover, owing to the offline clustering, the embedding methods cannot be trained in an end-to-end manner, which makes the final embedding quality largely dependent on the clustering.
Another limitation is that the interactions among the aspects are not explicitly captured. Although the interactions among aspects are implicitly revealed by the cluster membership of each node, this membership is only used for sampling the node aspects (Epasto and Perozzi, 2019), and is not directly incorporated into training the aspect embedding vectors. Going back to our example of an academic publication network, authors can belong to multiple research communities, and these communities interact with each other. For example, the data mining (DM) and database (DB) communities are more related to each other than the data mining and computer architecture (CA) communities, and thus such interactions (i.e., relatedness between DM and DB, and diversity between DM and CA) should be captured. Furthermore, viewing the aspects from a broader perspective, we argue that the interactions among aspects differ according to the inherent characteristics of the networks. For example, nodes in an academic publication network tend to belong to fewer communities (i.e., aspects are inherently less diverse) than nodes in a PPI network. This is because an ordinary author in an academic publication network usually works on a limited number of research topics, whereas each protein in a PPI network is associated with various functions and involved in various interactions. As such, modeling the interactions among aspects is challenging, and thus should be done in a systematic way rather than implicitly by clustering.
In the light of these issues, we propose asp2vec, a novel end-to-end framework for multi-aspect network embedding. The core idea is to dynamically assign aspects to each node according to its local context as illustrated in Figure 1b (Sec. 4.1). More precisely, we selectively assign (sample) a single aspect for each node based on our assumption that each node should belong to a single aspect under a certain local context. For example, even an author with diverse research areas, who belongs to multiple research communities (i.e., multiple aspects), focuses on a single research topic when collaborating with a particular group of people (i.e., a single aspect within a local context). We materialize this idea by devising the aspect selection module based on the Gumbel-Softmax trick (Jang et al., 2016), which approximates the sampling from a categorical distribution in a differentiable fashion, thereby enabling an end-to-end training procedure (Sec. 4.2). Moreover, we introduce the aspect regularization framework to simultaneously capture the interactions and relationships among aspects in terms of both relatedness and diversity (Sec. 4.3). Finally, we demonstrate that our proposed framework can be readily extended to heterogeneous networks whose nodes and edges are multiple-typed (Sec. 4.5). Through our extensive experiments on 13 real-world datasets, including various types of homogeneous networks and a heterogeneous network, we demonstrate the effectiveness of asp2vec in multiple downstream tasks including link prediction and author identification, in comparison with state-of-the-art multi-aspect network embedding methods. The source code of asp2vec can be found in https://github.com/pcy1302/asp2vec/.
2. Related Work
Network embedding. Network embedding methods aim at learning low-dimensional vector representations for nodes in a graph while preserving the network structure (Perozzi et al., 2014; Grover and Leskovec, 2016), and various other properties such as node attributes (Park et al., 2020), and structural role (Ribeiro et al., 2017). More precisely, inspired by word2vec (Mikolov et al., 2013), DeepWalk (Perozzi et al., 2014) and node2vec (Grover and Leskovec, 2016) perform truncated random walks on graphs to obtain multiple node sequences, and then perform skip-gram to learn node embeddings by making an analogy between random walk sequences on a network and sentences in natural language. In another line of research, graph neural networks (GNNs) (Wu et al., 2019) have recently drawn intensive attention. Their main idea is to represent a node by aggregating information from its neighborhood (Kipf and Welling, 2016; Veličković et al., 2017). However, their performance largely depends on the available node labels (Hamilton et al., 2017; Kipf and Welling, 2016; Ma et al., 2019), whereas our proposed framework is fully unsupervised. Recently, Veličković et al. (2018) proposed a GNN–based unsupervised network embedding method, called DGI, which maximizes the mutual information between the global representation of a graph and its local patches, i.e., node embeddings. However, while random walks naturally capture higher-order structural information and allow for the characterization of different node contexts in networks, the performance of GNNs degenerates with a larger number of convolution layers (Li et al., 2018). Moreover, unlike random walk–based algorithms (Dong et al., 2017), GNNs cannot be easily applied to heterogeneous networks without complicated design (Schlichtkrull et al., 2018; Park et al., 2020).
Multi-aspect network embedding. While most network embedding methods focus on learning a single embedding vector for each node, several recent methods (Epasto and Perozzi, 2019; Liu et al., 2019; Wang et al., 2019a; Yang et al., 2018; Sun et al., 2019; Ma et al., 2019) have been proposed to learn multiple embedding vectors for each node in a graph. More precisely, PolyDW (Liu et al., 2019) performs MF–based clustering to determine the aspect distribution for each node. Then, given random walk sequences on a graph, the aspects of the target and the context nodes are sampled independently according to the aspect distribution, and thus the target and the context nodes may have different aspects. Moreover, although capturing the local neighborhood structure (Yang et al., 2018) is more important for the co-occurrence based network embedding methods, PolyDW exploits clustering that focuses on capturing the global structure. Splitter (Epasto and Perozzi, 2019) performs local clustering to split each node into multi-aspect representations; however, it blindly trains all the aspect embedding vectors to be close to the original node embedding vector rather than considering the aspect of the target node. MCNE (Wang et al., 2019a) introduces a binary mask layer to split a single vector into multiple conditional embeddings; however, the number of aspects should be defined in the datasets (e.g., number of different types of user behaviors). Moreover, it formalizes the task of item recommendation as a supervised task, and combines network embedding methods with BPR (Rendle et al., 2012), whereas our proposed framework is trained in a fully unsupervised manner with an arbitrary number of aspects. MNE (Yang et al., 2018) trains multi-aspect node embeddings based on MF, while considering the diversity of the aspect embeddings. However, the aspect selection process is ignored, and the aspect embeddings are simply trained to be as dissimilar as possible to each other without preserving any relatedness among them.
Heterogeneous network embedding. A heterogeneous network (HetNet) contains multi-typed nodes and multi-typed edges, which should be modeled differently from the homogeneous counterpart, and there has been a line of research on heterogeneous network embedding (Fu et al., 2017; Dong et al., 2017; Hussein et al., 2018). Specifically, metapath2vec (Dong et al., 2017) extends DeepWalk by introducing a random walk scheme that is conditioned on meta-paths, where the node embeddings are learned by heterogeneous skip-gram. While various single aspect embedding methods have been proposed, multi-aspect network embedding is still in its infancy. More precisely, a line of work (Shi et al., 2018; Chen et al., 2018) commonly projects nodes into multiple embedding spaces defined by multiple aspects, and the node embeddings are trained on each space. However, these methods define the aspects based on the edges whose ground-truth labels should be predefined (e.g., the edge between “Obama” and “Honolulu” is labeled as “was_born_in”), which is not always the case in reality. Finally, TaPEm (Park et al., 2019) introduces the pair embedding framework that directly models the relationship between two nodes of different types, but each node still has a single embedding vector. Furthermore, although GNNs have also been recently applied to HetNets, they are either semi-supervised methods (Wang et al., 2019b), or cannot be applied to a genuine HetNet in which both nodes and edges have multiple types (Park et al., 2020).
Random Walk–based Unsupervised Network Embedding.
Inspired by the recent advancement of deep learning methods, especially the skip-gram based word2vec (Mikolov et al., 2013), previous network representation learning methods (Grover and Leskovec, 2016; Perozzi et al., 2014) started viewing nodes in a graph as words in a document. More precisely, these methods commonly perform a set of truncated random walks on a graph, which are analogous to sequences of words, i.e., sentences. Then, the representation for each node is learned by optimizing the skip-gram objective, which is to maximize the likelihood of the context given the target node:

$$\max \sum_{v_i \in \mathbf{w}} \sum_{v_j \in \mathcal{N}(v_i)} \log \frac{\exp(\langle P_i, Q_j \rangle)}{\sum_{v_{j'} \in \mathcal{V}} \exp(\langle P_i, Q_{j'} \rangle)} \tag{1}$$

where $P_i \in \mathbb{R}^d$ is the target embedding vector for node $v_i$, $Q_j \in \mathbb{R}^d$ is the context embedding vector for node $v_j$, $d$ is the number of embedding dimensions, $\langle a, b \rangle$ denotes a dot product between $a$ and $b$, $\mathbf{w}$ is a node sequence, and $\mathcal{N}(v_i)$ denotes the nodes within the context window of node $v_i$. The final embedding vector for node $v_i$ is generally obtained by averaging its target and context embedding vectors, i.e., $(P_i + Q_i)/2$. By maximizing Eqn. 1, nodes that frequently appear together within a context window will be trained to have similar embeddings. Despite its effectiveness in learning node representations, this approach inherently suffers from the limitation that each node is represented by a single embedding vector.
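The random-walk and skip-gram steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy graph `adj`, the helper names, the walk length, and the window size are all assumptions made for the example, and the full softmax is computed directly rather than with the negative sampling typically used in practice.

```python
import random
import numpy as np

def truncated_random_walk(adj, start, length, rng):
    """Uniform random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk

def context_pairs(walk, window):
    """(target, context) pairs for nodes at most `window` positions apart."""
    pairs = []
    for i, target in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        pairs.extend((target, walk[j]) for j in range(lo, hi) if j != i)
    return pairs

def skipgram_log_likelihood(P, Q, pairs):
    """Sum of log softmax(<P_i, Q_j>) over the pairs (objective of Eqn. 1)."""
    scores = P @ Q.T                                   # all dot products <P_i, Q_j>
    m = scores.max(axis=1, keepdims=True)              # for numerical stability
    log_norm = m + np.log(np.exp(scores - m).sum(axis=1, keepdims=True))
    log_prob = scores - log_norm
    return float(sum(log_prob[i, j] for i, j in pairs))

# Hypothetical 4-node undirected graph given as adjacency lists.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walk = truncated_random_walk(adj, start=0, length=6, rng=random.Random(0))
pairs = context_pairs(walk, window=2)

np_rng = np.random.default_rng(0)
P = np_rng.normal(scale=0.1, size=(4, 8))  # target embeddings
Q = np_rng.normal(scale=0.1, size=(4, 8))  # context embeddings
ll = skipgram_log_likelihood(P, Q, pairs)
```

Training would adjust `P` and `Q` to increase `ll`; the gradient step is omitted here to keep the sketch short.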
3.1. Problem Statement
Definition 3.1 (Multi-aspect Node Representation Learning). Given a graph $G = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ represents the set of nodes, and $\mathcal{E}$ represents the set of edges, where $e_{i,j} \in \mathcal{E}$ connects $v_i$ and $v_j$, we aim to derive a multi-aspect embedding matrix $Q_i \in \mathbb{R}^{K \times d}$ for each node $v_i$, where $K$ is an arbitrary predefined number of aspects. More precisely, each aspect embedding vector $Q_i^{(s)} \in \mathbb{R}^d$, $s = 1, \dots, K$, should 1) preserve the network structure information, 2) capture node $v_i$’s different aspects, and 3) model the interactions among the aspects (i.e., relatedness and diversity).
4. Proposed Framework: asp2vec
We present our proposed framework for context–based multi–aspect network embedding (Sec. 4.1) that includes the aspect selection module (Sec. 4.2) and the aspect regularization framework (Sec. 4.3). Then, we demonstrate how asp2vec can be further extended to a heterogeneous network (Sec. 4.5). Figure 2 summarizes the overall architecture of our proposed framework, called asp2vec.
4.1. Context–based Multi–aspect Network Embedding
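Given a target node and its local context, the core idea is to first determine the current aspect of the target node based on its local context, and then predict the context node with respect to the selected aspect of the target node. More precisely, given a target node embedding vector $P_i \in \mathbb{R}^d$, and its currently selected aspect $\delta(v_i) \in \{1, \dots, K\}$, our goal is to predict its context embedding vectors with respect to the selected aspect $\delta(v_i)$, i.e., $\{Q_j^{(\delta(v_i))} \mid v_j \in \mathcal{N}(v_i)\}$. Formally, for each random walk sequence $\mathbf{w} \in \mathcal{W}$, we aim to maximize the following objective:

$$\mathcal{J}^{(\mathbf{w})}_{\text{asp2vec}} = \sum_{v_i \in \mathbf{w}} \sum_{v_j \in \mathcal{N}(v_i)} \sum_{s=1}^{K} p(\delta(v_i) = s) \log p(v_j \mid v_i, \delta(v_i) = s), \qquad p(v_j \mid v_i, \delta(v_i) = s) = \frac{\exp(\langle P_i, Q_j^{(s)} \rangle)}{\sum_{v_{j'} \in \mathcal{V}} \exp(\langle P_i, Q_{j'}^{(s)} \rangle)} \tag{2}$$

where $Q_j^{(s)} \in \mathbb{R}^d$ is the embedding vector of node $v_j$ in terms of aspect $s$, $p(\delta(v_i) = s)$ denotes the probability of $v_i$ being selected to belong to the aspect $s$, where $\sum_{s=1}^{K} p(\delta(v_i) = s) = 1$, and $p(v_j \mid v_i, \delta(v_i) = s)$ denotes the probability of a context node $v_j$ given a target node $v_i$ whose aspect is $s$. Note that this is in contrast to directly maximizing the probability of its context node regardless of the aspect of the target node as in Eqn. 1.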
4.2. Determining the Aspect of the Center Node
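Next, we introduce the aspect selection module that computes the aspect selection probability $p(\delta(v_i))$. Here, we assume that the aspect of each target node $v_i$ can be readily determined by examining its local context $\mathcal{N}(v_i)$, i.e., $p(\delta(v_i)) \equiv p(\delta(v_i) \mid \mathcal{N}(v_i))$. More precisely, we apply softmax to model the probability of $v_i$’s aspect:

$$p(\delta(v_i) = s \mid \mathcal{N}(v_i)) = \frac{\exp(\langle P_i, \text{Readout}^{(s)}(\mathcal{N}(v_i)) \rangle)}{\sum_{s'=1}^{K} \exp(\langle P_i, \text{Readout}^{(s')}(\mathcal{N}(v_i)) \rangle)} \tag{3}$$

where we leverage a readout function $\text{Readout}^{(s)}(\cdot)$ to summarize the information encoded in the local context of node $v_i$, i.e., $\mathcal{N}(v_i)$, with respect to aspect $s$.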
4.2.1. Gumbel-Softmax based Aspect Selection
Note that modeling the aspect selection probability based on softmax (Eqn. 3) assigns a continuous probability distribution over the $K$ aspects. However, we argue that although a node may simultaneously belong to multiple aspects in a global perspective, it should be assigned to a single aspect within a local context. For example, even an author with diverse research areas, who belongs to multiple research communities, focuses on a single research topic when collaborating with a particular group of people. In other words, the aspect selection should be done in a discrete manner based on a categorical distribution over aspects, i.e., hard selection. However, since hard selection is a non-differentiable operation, it blocks the flow of the gradients down to the embedding vectors, and prevents end-to-end back-propagation. As a workaround, we adopt the Gumbel-Softmax trick (Jang et al., 2016), which approximates the sampling from a categorical distribution by reparameterizing the gradients with respect to the parameters of the distribution. More precisely, given a $K$-dimensional categorical distribution with class probabilities $\pi_1, \dots, \pi_K$, the Gumbel-Max trick provides a simple way to draw a one-hot sample $z$ from the categorical distribution as follows:

$$z = \text{one-hot}\left(\arg\max_{i} \left[\log \pi_i + g_i\right]\right) \tag{4}$$

where $g_i$ is the Gumbel noise drawn from the Gumbel(0,1) distribution, which can be obtained as follows:

$$g_i = -\log(-\log(u_i)), \qquad u_i \sim \text{Uniform}(0, 1) \tag{5}$$

Due to the non-differentiable nature of the argmax() operation in Eqn. 4, we further approximate it by using softmax to ensure differentiability, which is known as the Gumbel-Softmax trick (Jang et al., 2016):

$$y_i = \frac{\exp((\log \pi_i + g_i)/\tau)}{\sum_{j=1}^{K} \exp((\log \pi_j + g_j)/\tau)}, \qquad i = 1, \dots, K \tag{6}$$

where $\tau$ is the temperature parameter that controls the extent to which the output approximates the argmax() operation: as $\tau \to 0$, samples from the Gumbel-Softmax distribution become one-hot. Finally, replacing the softmax in Eqn. 3 with the Gumbel-Softmax in Eqn. 6, we obtain the following aspect selection probability:

$$p(\delta(v_i) = s \mid \mathcal{N}(v_i)) = \frac{\exp((\log \pi_s + g_s)/\tau)}{\sum_{s'=1}^{K} \exp((\log \pi_{s'} + g_{s'})/\tau)} \tag{7}$$

where $\pi_s$ here denotes the softmax probability of aspect $s$ for node $v_i$ given by Eqn. 3. Combining the loss for all walks $\mathbf{w} \in \mathcal{W}$ described in Eqn. 2, the minimization objective for learning the context-based multi-aspect node representation is given by:

$$\mathcal{L}_{\text{asp2vec}} = -\sum_{\mathbf{w} \in \mathcal{W}} \mathcal{J}^{(\mathbf{w})}_{\text{asp2vec}} \tag{8}$$
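To make the relaxation concrete, the following NumPy sketch draws Gumbel noise, takes the hard Gumbel-Max sample, and computes the Gumbel-Softmax relaxation at two temperatures using the same noise. The function names, the example probabilities `pi`, and the small constant `eps` guarding the logarithms are assumptions made for illustration; a real training loop would use an automatic-differentiation framework instead of plain NumPy.

```python
import numpy as np

def sample_gumbel(shape, rng, eps=1e-20):
    """Gumbel(0, 1) noise via g = -log(-log(u)), u ~ Uniform(0, 1)."""
    u = rng.uniform(0.0, 1.0, size=shape)
    return -np.log(-np.log(u + eps) + eps)

def gumbel_softmax(log_pi, g, tau):
    """Relaxed one-hot sample: softmax((log_pi + g) / tau)."""
    z = (log_pi + g) / tau
    z = z - z.max()              # stabilize the exponentials
    y = np.exp(z)
    return y / y.sum()

rng = np.random.default_rng(0)
pi = np.array([0.6, 0.3, 0.1])   # hypothetical aspect probabilities (K = 3)
g = sample_gumbel(pi.shape, rng)

hard = int(np.argmax(np.log(pi) + g))              # Gumbel-Max: index of the one-hot sample
y_warm = gumbel_softmax(np.log(pi), g, tau=1.0)    # smooth relaxation
y_cold = gumbel_softmax(np.log(pi), g, tau=0.01)   # low temperature: close to one-hot
```

With the noise held fixed, lowering `tau` concentrates the mass on the index chosen by the Gumbel-Max sample, which is the sense in which the relaxation approximates hard selection.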
4.2.2. Readout Function
Recall that we expect the output of $\text{Readout}^{(s)}(\mathcal{N}(v_i))$ to capture the current aspect $s$ of the target node $v_i$ by leveraging its local context information $\mathcal{N}(v_i)$. Among various choices for $\text{Readout}(\cdot)$, we choose the average pooling operation:

$$\text{Readout}^{(s)}(\mathcal{N}(v_i)) = \frac{1}{|\mathcal{N}(v_i)|} \sum_{v_j \in \mathcal{N}(v_i)} Q_j^{(s)} \tag{9}$$

It is worth noting that, thanks to its simplicity and efficiency, we choose average pooling over more advanced techniques that assign different weights to each context node, such as recurrent neural network–based or attention–based pooling (Zhou et al., 2016).
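As a rough illustration of how the average-pooling readout feeds the aspect selection softmax, the sketch below pools the aspect embeddings of the context nodes and scores each aspect against the target embedding. The array layout (`Q` stored as an `(n, K, d)` tensor), the function name, and the random inputs are assumptions for the example only.

```python
import numpy as np

def aspect_probabilities(P_i, Q, context_ids):
    """Softmax over aspects of <P_i, mean-pooled context aspect embeddings>.

    P_i: (d,) target embedding; Q: (n, K, d) aspect embeddings;
    context_ids: indices of the nodes in the context window of the target.
    """
    readout = Q[context_ids].mean(axis=0)   # (K, d): average pooling per aspect
    logits = readout @ P_i                  # (K,): one score per aspect
    logits = logits - logits.max()          # stabilize the exponentials
    probs = np.exp(logits)
    return probs / probs.sum()

rng = np.random.default_rng(0)
n, K, d = 5, 3, 4                           # hypothetical sizes
P = rng.normal(size=(n, d))
Q = rng.normal(size=(n, K, d))
probs = aspect_probabilities(P[0], Q, context_ids=[1, 2, 4])
```

Because the pooling is a plain mean, every context node contributes equally, which is the simplicity trade-off noted above.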
Discussion. If each node is associated with some attribute information, determining the aspect of each node can be more intuitive than solely relying on the context node embeddings originally learned to preserve the graph structure. In this regard, we study in Sec. 4.5.2 how to model the aspect when node attributes are given.
4.3. Modeling Relationship Among Aspects
As different aspect embeddings are intended to capture different semantics of each node, they should primarily be sufficiently diverse among themselves. However, at the same time, these aspects are inherently related to each other to some extent. For example, in an academic publication network, each node (i.e., author) may belong to various research communities (i.e., aspects), such as the DM, CA, and DB communities. In this example scenario, we can easily tell that the DM and DB communities are more related than the DM and CA communities, which shows the importance of modeling the interaction among aspects. In short, aspect embeddings should not only be 1) diverse (e.g., DM versus CA) so as to independently capture the inherent properties of individual aspects, but also 2) related (e.g., DM and DB) to each other to some extent so as to capture some common information shared among aspects.
To this end, we introduce a novel aspect regularization framework, called $\text{reg}_{\text{asp}}$, which is given by:

$$\text{reg}_{\text{asp}} = \sum_{i=1}^{K-1} \sum_{j=i+1}^{K} \text{A-Sim}(Q_*^{(i)}, Q_*^{(j)}) \tag{10}$$

where $Q_*^{(i)} \in \mathbb{R}^{n \times d}$ is the aspect embedding matrix in terms of aspect $i$ for all nodes $v \in \mathcal{V}$; $|\mathcal{V}| = n$, and $\text{A-Sim}(\cdot, \cdot)$ measures the similarity score between two aspect embedding matrices: a large $\text{A-Sim}(Q_*^{(i)}, Q_*^{(j)})$ means that the two aspects share some information in common, whereas a small value means that aspect $i$ and aspect $j$ capture vastly different properties. The aspect similarity score is computed by the sum of the similarity scores between aspect embeddings of each node:

$$\text{A-Sim}(Q_*^{(i)}, Q_*^{(j)}) = \sum_{h=1}^{|\mathcal{V}|} f(Q_h^{(i)}, Q_h^{(j)}) \tag{11}$$

where $Q_h^{(i)} \in \mathbb{R}^d$ is the embedding vector of node $v_h$ in terms of aspect $i$, and $f(Q_h^{(i)}, Q_h^{(j)})$ denotes the aspect similarity score (ASS) between embeddings of aspects $i$ and $j$ with respect to node $v_h$. By minimizing $\text{reg}_{\text{asp}}$ in Eqn. 10, we aim to learn diverse aspect embeddings capturing inherent properties of each aspect. In this work, we evaluate the ASS based on the cosine similarity between two aspect embedding vectors given by the following equation:

$$f(Q_h^{(i)}, Q_h^{(j)}) = \frac{\langle Q_h^{(i)}, Q_h^{(j)} \rangle}{\lVert Q_h^{(i)} \rVert \, \lVert Q_h^{(j)} \rVert}, \qquad -1 \le f(Q_h^{(i)}, Q_h^{(j)}) \le 1 \tag{12}$$

However, as mentioned earlier, the aspect embeddings should not only be diverse but also to some extent related to each other (e.g., DM and DB). In other words, as the aspects are not completely independent from each other, we should model their interactions. To this end, we introduce a binary mask $w^{h}_{i,j}$ to selectively penalize the aspect embedding pairs according to their ASSs (Ayinde et al., 2019). More precisely, the binary mask is defined as:

$$w^{h}_{i,j} = \begin{cases} 1, & |f(Q_h^{(i)}, Q_h^{(j)})| \ge \epsilon \\ 0, & \text{otherwise} \end{cases} \tag{13}$$

where $\epsilon$ is a threshold parameter that controls the amount of information shared between a pair of aspect embedding vectors of a given node: a large $\epsilon$ encourages the aspect embeddings to be related to each other, whereas a small $\epsilon$ encourages diverse aspect embeddings. Specifically, for a given node $v_h$, if the absolute value of the ASS between the embedding vectors of aspects $i$ and $j$, i.e., $|f(Q_h^{(i)}, Q_h^{(j)})|$, is greater than $\epsilon$, then we penalize the pair, and accept it otherwise. In other words, we allow two aspect embedding vectors of a node to be similar to each other to some extent (as much as $\epsilon$). By doing so, we expect the aspect embeddings to be sufficiently diverse, but at the same time to share some information in common.

Note that the aspect regularization framework is more effective for networks with diverse aspects. For example, users in social networks tend to belong to multiple (diverse) communities compared with authors in academic networks, because authors usually work on a limited number of research topics. Therefore, as the aspect regularization framework encourages the aspect embeddings to be diverse, it is expected to be more effective in social networks than in academic networks. We later demonstrate that this is indeed the case, which implies that asp2vec can also uncover the hidden characteristics of networks.

With the mask in Eqn. 13, the aspect similarity score in Eqn. 11 becomes:

$$\text{A-Sim}(Q_*^{(i)}, Q_*^{(j)}) = \sum_{h=1}^{|\mathcal{V}|} w^{h}_{i,j} \, |f(Q_h^{(i)}, Q_h^{(j)})| \tag{14}$$

We consider both positive and negative similarities by taking an absolute value of the cosine similarity, i.e., $|f(\cdot, \cdot)|$, because negative similarity still means the two aspects are semantically related yet in an opposite direction. The final objective jointly minimizes the context-based multi-aspect network embedding loss in Eqn. 8 and the aspect regularization loss in Eqn. 10:

$$\mathcal{L} = \mathcal{L}_{\text{asp2vec}} + \lambda \, \text{reg}_{\text{asp}} \tag{15}$$

where $\lambda$ is the coefficient for the aspect regularization.
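The masked regularizer can be sketched in a few lines of NumPy. This is an illustrative reading of the description above rather than the reference implementation: the `(n, K, d)` layout of `Q`, the function name, and the toy vectors are assumptions, and the penalty is computed as the sum over nodes and aspect pairs of the absolute cosine similarity wherever it reaches the threshold.

```python
import numpy as np

def aspect_regularizer(Q, threshold):
    """Sum over nodes h and aspect pairs i < j of |cos(Q_h^(i), Q_h^(j))|,
    counting only pairs whose absolute cosine similarity is >= threshold.

    Q: (n, K, d) array of aspect embeddings.
    """
    unit = Q / np.linalg.norm(Q, axis=2, keepdims=True)   # normalize each aspect vector
    cos = np.einsum("hid,hjd->hij", unit, unit)           # (n, K, K) cosine similarities
    abs_cos = np.abs(cos)
    mask = abs_cos >= threshold                           # binary mask on each pair
    i, j = np.triu_indices(Q.shape[1], k=1)               # aspect pairs with i < j
    return float((abs_cos * mask)[:, i, j].sum())

# One hypothetical node with K = 3 aspects in d = 2 dimensions.
Q = np.array([[[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]])
reg_all = aspect_regularizer(Q, threshold=0.0)      # every pair is penalized
reg_masked = aspect_regularizer(Q, threshold=0.7)   # only the most similar pair is penalized
```

In this toy case the pairwise similarities are 0.6, 0.0, and 0.8, so raising the threshold to 0.7 leaves only the last pair in the penalty, which illustrates how a larger threshold tolerates more shared information between aspects.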
Discussions on the Number of Parameters. A further appeal of our proposed framework is its superior performance with a relatively limited number of parameters. More precisely, PolyDW (Liu et al., 2019), which is one of the most relevant competitors, requires $2|\mathcal{V}|dK$ parameters for node embeddings ($|\mathcal{V}|dK$ each for target and context) in addition to the parameters required for the offline clustering method, i.e., matrix factorization, which requires another $2|\mathcal{V}|d$, thus $2|\mathcal{V}|d(K+1)$ in total. Moreover, Splitter (Epasto and Perozzi, 2019) requires $|\mathcal{V}|d(K+2)$ for node embeddings ($2|\mathcal{V}|d$ for DeepWalk parameters and $|\mathcal{V}|dK$ for persona embedding), and additional parameters for local and global clustering. On the other hand, our proposed framework only requires $|\mathcal{V}|d(K+1)$ for node embeddings ($|\mathcal{V}|d$ for target and $|\mathcal{V}|dK$ for context) without additional parameters for clustering. Hence, we argue that asp2vec outperforms state-of-the-art multi-aspect network embedding methods with fewer parameters.
4.4. Task: Link Prediction
To evaluate the performance of our framework, we perform link prediction, which is to predict the strength of the linkage between two nodes. Link prediction is the best choice for evaluating the multi-aspect network embedding methods, because aspect embeddings are originally designed to capture various interactions among nodes, such as membership to multiple communities within a network, which is best revealed by the connection information. Moreover, link prediction is suggested by recent work as the primary evaluation task for unsupervised network embedding methods compared with node classification that involves a labeling process that may be uncorrelated with the graph itself (Abu-El-Haija et al., 2017; Epasto and Perozzi, 2019).
Recall that in our proposed framework, a node $v_i$ has a center embedding vector $P_i \in \mathbb{R}^d$, and $K$ different aspect embedding vectors $Q_i^{(s)} \in \mathbb{R}^d$, $s = 1, \dots, K$, which add up to $K+1$ embedding vectors in total for each node. In order to obtain the final embedding vector for node $v_i$, we first compute the average of the aspect embedding vectors, and add it to the center embedding vector:

$$U_i = P_i + \frac{1}{K} \sum_{s=1}^{K} Q_i^{(s)} \tag{16}$$

where $U_i \in \mathbb{R}^d$ is the final embedding vector for node $v_i$. Note that in previous work (Liu et al., 2019; Epasto and Perozzi, 2019) that learns multiple embedding vectors for each node, the final link prediction is done by calculating the sum (Liu et al., 2019) or the maximum (Epasto and Perozzi, 2019) of the cosine similarity between all possible pairs of aspect embedding vectors. Both require $O(|\mathcal{V}|^2 K^2)$ dot product operations to compute the link probability between all pairs of nodes, which is time consuming. In this work, we simply use the final embedding vector $U_i$, on which any off-the-shelf classification algorithm, such as logistic regression, is trained, facilitating more practical usage in the real world.
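A small sketch of the final-embedding step follows. The averaging of aspect vectors and the addition to the center vector come from the description above; the `(n, K, d)` layout, the function names, and the use of an element-wise (Hadamard) product as the edge feature passed to a classifier are assumptions made for the example, since the text does not specify how node pairs are turned into classifier inputs.

```python
import numpy as np

def final_embeddings(P, Q):
    """U_i = P_i + mean over aspects of Q_i^(s).  P: (n, d), Q: (n, K, d)."""
    return P + Q.mean(axis=1)

def edge_feature(U, i, j):
    """Hypothetical edge feature: element-wise product of two node embeddings."""
    return U[i] * U[j]

P = np.ones((2, 2))
Q = np.array([[[1.0, 1.0], [3.0, 3.0]],
              [[0.0, 2.0], [2.0, 0.0]]])
U = final_embeddings(P, Q)        # one d-dimensional vector per node
feat = edge_feature(U, 0, 1)      # could be fed to, e.g., logistic regression
```

Each candidate link then costs a single d-dimensional operation instead of a comparison over all pairs of aspect vectors.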
4.5. Extension to Heterogeneous Network
Heretofore, we described our approach for learning context–based multi–aspect node representations for a network with a single type of nodes and a single type of edges, i.e., a homogeneous network. In this section, we demonstrate that our proposed multi-aspect network embedding framework, asp2vec, can be readily extended to learn representations for nodes in a HetNet. Note that PolyDW (Liu et al., 2019) was also shown to be extensible to a HetNet. However, it is limited to a bipartite network without node attribute information, whereas asp2vec can be extended to any type of HetNet with node attribute information.
Recall that in Sec. 4.4 we defined our primary task as link prediction. In this regard, among various link prediction tasks that can be defined in a HetNet, such as recommendation (Hu et al., 2018) (i.e., user-item) and author identification (i.e., paper-author), we specifically focus on the task of author identification in big scholarly data, which is to predict the true authors of an anonymized paper (Chen and Sun, 2017; Park et al., 2019; Zhang et al., 2018), i.e., the link probability between paper-typed nodes and author-typed nodes. Note that each paper is assumed to be associated with its content information, i.e., abstract text.
4.5.1. Context–based Multi–aspect Heterogeneous Network Embedding
To begin with, we first perform meta-path guided random walks (Dong et al., 2017) to generate sequences of nodes, which is an extension of the truncated random walk (Perozzi et al., 2014; Grover and Leskovec, 2016) to a HetNet. More precisely, random walks are conditioned on a meta-path $\mathcal{P}$ (Dong et al., 2017), which is a composition of relations that contains a certain semantic. Figure 3 shows an example meta-path “Author-Paper-Author (APA)” that refers to a co-authorship relation. After generating a set of random walks $\mathcal{W}_{\mathcal{P}}$ guided by a meta-path $\mathcal{P}$, we aim to maximize the probability of context nodes given a target node similar to Eqn. 2. However, unlike the former case, a walk $\mathbf{w} \in \mathcal{W}_{\mathcal{P}}$ now contains multiple types of nodes. Hence, we propose asp2vec-het by revising Eqn. 2 to incorporate the heterogeneity as follows:

$$\mathcal{J}^{(\mathbf{w})}_{\text{Het}} = \sum_{v_i \in \mathbf{w}} \sum_{t \in \mathcal{T}_V} \sum_{v_j \in \mathcal{N}_t(v_i)} \sum_{s=1}^{K} p(\delta(v_i) = s) \log p(v_j \mid v_i, \delta(v_i) = s) \tag{17}$$

where $\mathcal{T}_V$ denotes the set of node types (e.g., author, paper, and venue), and $\mathcal{N}_t(v_i)$ denotes node $v_i$’s context nodes with type $t$.
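A meta-path guided random walk can be sketched as below. This is a simplified illustration under stated assumptions: the toy author-paper graph, the `node_type` mapping, and the function name are hypothetical, node types are encoded as single characters, and the meta-path is assumed to be cyclic (its first and last types coincide, as in APA) so that it can be repeated.

```python
import random

def metapath_walk(adj, node_type, start, metapath, length, rng):
    """Random walk that only steps to neighbors whose type matches the next
    symbol of a cyclic meta-path such as 'APA'."""
    assert metapath[0] == metapath[-1], "meta-path must start and end with the same type"
    assert node_type[start] == metapath[0], "start node must match the first type"
    cycle = metapath[:-1]                      # 'APA' -> 'AP', repeated as A, P, A, P, ...
    walk = [start]
    while len(walk) < length:
        wanted = cycle[len(walk) % len(cycle)]
        candidates = [v for v in adj[walk[-1]] if node_type[v] == wanted]
        if not candidates:                     # no neighbor of the required type
            break
        walk.append(rng.choice(candidates))
    return walk

# Hypothetical bipartite author-paper graph.
adj = {
    "a1": ["p1"], "a2": ["p1", "p2"], "a3": ["p2"],
    "p1": ["a1", "a2"], "p2": ["a2", "a3"],
}
node_type = {v: v[0].upper() for v in adj}     # 'A' for authors, 'P' for papers
walk = metapath_walk(adj, node_type, "a1", "APA", length=7, rng=random.Random(0))
types = "".join(node_type[v] for v in walk)
```

The resulting sequence alternates between author and paper nodes, so the context window of each target can be split by node type as in Eqn. 17.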
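In order to apply asp2vec to a HetNet, we need to consider the following two cases and model them differently: 1) case 1: the aspect of the target node is unknown, and thus should be inferred, and 2) case 2: the aspect of the target node is already revealed, and thus there is no need to infer its aspect. Concretely, taking a sample random walk guided by meta-path APA (Figure 3), case 1 is when the target node is an author, and case 2 is when the target node is a paper. Specifically, in case 2 shown in Figure 3b, we argue that the aspect (i.e., topic) of a paper can be easily inferred by looking at its content (i.e., text), whereas in case 1 shown in Figure 3a, the aspect of an author should still be inferred from its context. Therefore, case 1 should be modeled by our asp2vec-het framework in Eqn. 17, and case 2 can be simply modeled by a previous single aspect embedding method, such as metapath2vec (Dong et al., 2017), as follows:

$$\mathcal{J}^{(\mathbf{w})}_{\text{M2V}} = \sum_{v_i \in \mathbf{w}} \sum_{t \in \mathcal{T}_V} \sum_{v_j \in \mathcal{N}_t(v_i)} \log p(v_j \mid v_i) \tag{18}$$

Combining the two cases over all walks, the loss to be minimized is:

$$\mathcal{L}_{\text{Het}} = -\sum_{\mathcal{P} \in \mathcal{S}(\mathcal{P})} \sum_{\mathbf{w} \in \mathcal{W}_{\mathcal{P}}} \left[ \mathcal{J}^{(\mathbf{w})}_{\text{Het}} + \mathcal{J}^{(\mathbf{w})}_{\text{M2V}} \right] \tag{19}$$

where $\mathcal{S}(\mathcal{P})$ denotes all predefined meta-path schemes.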
Note that although we specifically described the scenario in the author identification for the ease of explanations, asp2vec-het can be easily applied to other tasks such as recommendation, where the task is to infer the user-item pairwise relationship.
4.5.2. Determining node aspect in a HetNet.
Recall that in Sec. 4.2, given a homogeneous network, we determine the aspect of a target node based on its context nodes. In a HetNet, the node aspects can be similarly determined; however, we should take into account the heterogeneity of the node types among the context nodes. In other words, some node types might be more helpful than others in terms of determining the aspect of the target node. For example, for a target node “author” in Figure 3, a context node whose type is “paper” is more helpful than another author-typed node because a paper is usually written about a single topic, whereas an author can work on various research areas. Moreover, it is important to note that determining the aspect of the target node becomes even more straightforward if node attribute information is provided, for example, the content of the paper. In this regard, we leverage the paper abstract information to obtain the paper embedding vector, and rely only on the paper nodes to determine the aspect of the target author node. Specifically, we employ the GRU-based paper encoder introduced by Zhang et al. (2018), which takes pretrained word vectors of an abstract and returns a single vector that encodes the paper content. Refer to Section 3.1 of (Zhang et al., 2018) for more detail.
|Network type|Category|Dataset|Num. nodes|Num. edges|
|Homogeneous Network|Social Network|Filmtrust (Dir.)|1,642|1,853|
|Homogeneous Network|Word co-occurrence|Wikipedia|4,777|184,812|
|HetNet|DBLP|Num. authors|Num. papers|Num. venues|
Datasets. We evaluate our proposed framework on thirteen commonly used publicly available datasets including a HetNet. The datasets can be broadly categorized into social network, protein network, word-co-occurrence network, and academic network. Table 2 shows the statistics of the datasets used in our experiments. Refer to the appendix for more detail.
Methods Compared. As asp2vec is an unsupervised multi-aspect network embedding framework that can be applied on both homogeneous and heterogeneous networks, we compare with the following unsupervised methods:
Unsupervised embedding methods for homogeneous networks
Models a single aspect
DGI (Veličković et al., 2018): It is the state-of-the-art unsupervised network embedding method that maximizes the mutual information between the graph summary and the local patches of a graph.
Models multiple aspects
PolyDW (Liu et al., 2019): It performs MF–based clustering to obtain aspect distribution for each node from which an aspect for the target and the context nodes are independently sampled.
Splitter (Epasto and Perozzi, 2019): It splits each node into multiple embeddings by performing local graph clustering, called persona2vec.
Unsupervised embedding methods for heterogeneous networks
Models a single aspect
Camel (Zhang et al., 2018): It is a task-guided heterogeneous network embedding method developed for the author identification task in which content-aware skip-gram is introduced.
Models multiple aspects
TaPEm (Park et al., 2019): It is the state-of-the-art task-guided heterogeneous network embedding method that introduces the pair embedding framework to directly capture the pairwise relationship between two heterogeneous nodes.
Experimental Settings and Evaluation Metrics. For evaluations of asp2vec and asp2vec-het, we perform link prediction and author identification, respectively. For link prediction, we follow the protocol of Splitter (Epasto and Perozzi, 2019; Abu-El-Haija et al., 2017), and for author identification, we follow the protocol of (Zhang et al., 2018; Park et al., 2019). As for the evaluation metrics, we use AUC-ROC for link prediction, and recall, F1, and AUC for author identification. Note that for all the tables in this section, the number of aspects $K$ is 1 for all the single aspect embedding methods, i.e., DW, DGI, M2V++, Camel, and TaPEm. For more information regarding the hyperparameters and the model training–related details, refer to the appendix.
5.1. Performance Analysis
Overall Evaluation. Table 1 shows the link prediction performance of various methods. We have the following observations: 1) asp2vec generally performs well on all datasets, especially outperforming other methods on social networks, PPI and Wikipedia networks. We attribute such behavior to the fact that nodes in these networks inherently exhibit multiple aspects compared with nodes in the academic networks, where each node is either a paper (Cora) or an author (ca-HepTh, ca-AstroPh, and 4area) that generally focuses on a single research topic, and thus there are fewer distinct aspects to be captured. In particular, nodes (authors) in ca-AstroPh and ca-HepTh networks are even more focused as they are specifically devoted to Astro Physics and High Energy Physics Theory, respectively. On the other hand, social networks contain various communities; for example, BlogCatalog is reported to have 39 ground truth groups in the network. 2) We observe that the prediction performance on the academic networks is generally high for all methods, which aligns with the results reported in (Epasto and Perozzi, 2019) where DW generally performs on par with Splitter under the same embedding size. We conjecture that this is because links in academic networks are relatively easy to predict, as academic research communities are relatively focused and nodes therefore have less diverse aspects, as mentioned above. 3) We observe that DGI generally performs better as the embedding size increases, outperforming others on some datasets at large embedding sizes. However, DGI is not scalable to large dimension sizes, which is also mentioned in the original paper (Veličković et al., 2018) (DGI uses a fixed embedding size for all datasets, but due to memory limitations it is reduced to 256 for the Pubmed dataset, which contains 19,717 nodes).
Benefit of Aspect Selection Module. Table 3 shows the comparisons between asp2vec with conventional softmax (Eqn. 3), and the Gumbel-Softmax trick (Eqn. 7). We found that the Gumbel-Softmax indeed is beneficial, and more importantly, we observe that the improvements are more significant for social networks, PPI, and Wikipedia network, compared with the academic networks. This verifies that the aspect modeling is more effective for networks with inherently diverse aspects, such as social and PPI networks.
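To make the contrast between the two selection schemes concrete, the following is a minimal sketch written with the Python standard library rather than PyTorch. The function names, the example logits, and the temperature value are illustrative assumptions and are not taken from the authors' implementation.

```python
import math
import random

def softmax(logits, tau=1.0):
    """Numerically stable softmax with temperature tau."""
    scaled = [x / tau for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def gumbel_softmax(logits, tau=0.5, rng=random):
    """Relaxed one-hot sample: perturb logits with Gumbel(0, 1) noise, then softmax."""
    noisy = []
    for x in logits:
        u = max(rng.random(), 1e-12)      # clamp away from 0 so log(u) is finite
        g = -math.log(-math.log(u))       # Gumbel(0, 1) noise
        noisy.append(x + g)
    return softmax(noisy, tau)

# Hypothetical aspect logits for one target node (assumed values).
aspect_logits = [1.0, 2.0, 0.5]
soft_weights = softmax(aspect_logits)                          # deterministic weights
sampled_weights = gumbel_softmax(aspect_logits, tau=0.5,
                                 rng=random.Random(0))         # stochastic, near one-hot
```

As the temperature decreases, the sampled weights concentrate on a single aspect, while the index of the largest weight is distributed according to the plain softmax of the logits.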
To further verify the benefit of the aspect selection module compared with the offline clustering-based method, we examine how the aspects are actually assigned to nodes. Our assumption is that nodes that frequently appear in random walks are more likely to exhibit more aspects than the less frequently appearing ones. Therefore, the variance of the aspect probability distribution of a frequently appearing node will be relatively smaller than that of a less frequently appearing one. For example, given four aspects, a frequently appearing node may have an aspect probability distribution [0.2,0.3,0.3,0.2], whereas a less frequently appearing node is likely to have a skewed distribution such as [0.7,0.0,0.3,0.0], which has a higher variance. In Figure 4, we plot the variance of the aspect probability distribution of each target node according to its frequency within the random walks. Precisely, each dot in the figure represents the variance of the aspect distribution to which each target node is assigned. As expected, asp2vec (Figure 4 bottom) tends to assign low-variance aspect distributions to high-frequency nodes and high-variance aspect distributions to low-frequency nodes. On the other hand, as PolyDW determines aspects by an offline clustering algorithm, no such tendency is observed (Figure 4 top), which verifies the superiority of our context-based aspect selection module.
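The variance comparison in the example above can be checked with a few lines of Python. This sketch assumes the population variance over the four aspect probabilities; whether the figure uses the population or the sample variance is not stated in the text.

```python
from statistics import pvariance

# Aspect probability distributions from the example in the text.
frequent_node = [0.2, 0.3, 0.3, 0.2]   # frequently appearing node
rare_node = [0.7, 0.0, 0.3, 0.0]       # less frequently appearing node

var_frequent = pvariance(frequent_node)  # approximately 0.0025
var_rare = pvariance(rare_node)          # approximately 0.0825
```

The skewed distribution has a variance roughly 33 times larger than the near-uniform one, which is the gap the scatter plot is meant to expose.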
Benefit of Aspect Regularizer. Table 4 shows the link prediction AUC-ROC over various thresholds of the aspect regularizer. 1) We observe that the performance drops significantly when the aspect regularization framework is not incorporated, which verifies the benefit of the aspect regularization framework. 2) The aspect regularization framework is less effective on the academic networks (see the "best vs. No" column). Again, this is because academic networks inherently have less diverse aspects, which reduces the need for modeling diverse aspect embeddings. 3) In Figure 5, we illustrate heatmaps where each element denotes the cosine similarity between a pair of aspect embedding vectors. More precisely, we compute the pairwise similarity between aspect embedding vectors for all nodes, and then compute their average to obtain the final heatmap. We observe that the aspect embeddings are trained to be highly similar to each other without the aspect regularizer (Figure 5a), which shows the necessity of the aspect regularization framework. Moreover, with a small threshold (Figure 5b), the similarity among the aspect embeddings is small, which demonstrates that a small threshold encourages the aspect embeddings to be diverse. Finally, for another threshold setting (Figure 5c), we observe that some of the aspect embeddings are trained to be less similar (i.e., green) than others (i.e., red), allowing more flexibility in learning the aspect embeddings.
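The heatmap computation described above (pairwise cosine similarity between the aspect vectors of each node, averaged over all nodes) can be sketched as follows. The data layout `aspect_embeddings[n][k]`, the function names, and the toy vectors are assumptions made for illustration; this is one reading of the description, not the authors' plotting code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def average_aspect_similarity(aspect_embeddings):
    """aspect_embeddings[n][k] is the k-th aspect vector of node n (assumed layout).

    Returns a K x K matrix whose (i, j) entry is the cosine similarity between
    aspect i and aspect j, averaged over all nodes.
    """
    num_nodes = len(aspect_embeddings)
    num_aspects = len(aspect_embeddings[0])
    heatmap = [[0.0] * num_aspects for _ in range(num_aspects)]
    for node_aspects in aspect_embeddings:
        for i in range(num_aspects):
            for j in range(num_aspects):
                heatmap[i][j] += cosine(node_aspects[i], node_aspects[j]) / num_nodes
    return heatmap

# Toy input: two nodes, two aspects, two dimensions (hypothetical values).
toy = [
    [[1.0, 0.0], [0.0, 1.0]],  # node 0: orthogonal aspect vectors
    [[1.0, 0.0], [1.0, 0.0]],  # node 1: identical aspect vectors
]
heatmap = average_aspect_similarity(toy)
# Off-diagonal entry: (0 + 1) / 2 = 0.5; diagonal entries are 1.0.
```

Off-diagonal values near 1 correspond to aspect embeddings that have collapsed onto each other, while values near 0 indicate diverse aspects.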
Evaluations on HetNet. Table 5 shows the result of author identification. 1) We observe that asp2vec-het significantly outperforms the state-of-the-art task-guided author identification methods in all the metrics, which verifies the benefit of our context-based aspect embedding framework. More precisely, asp2vec-het outperforms TaPEm (Park et al., 2019), which is a recently proposed method whose motivation is similar to ours in that it models multiple aspects of each node by the pair embedding framework. This verifies that explicitly learning multiple aspect embedding vectors for each node is superior to learning an embedding vector for every possible node pair. 2) Recall that there are multiple types of nodes in a HetNet, and some types of nodes are more helpful than others in determining the aspect of a target node. Table 6 shows the performance when different types of context nodes are used for determining the aspect of the target node, i.e., the author node. We observe that using only paper nodes performs the best, whereas using only author nodes performs the worst. This is expected because we encode the paper abstract to generate paper embeddings, and thus the aspect of the target author can be readily determined by examining paper nodes within its context nodes. On the other hand, the author embeddings only capture the local structural information, which is less indicative of the target aspect than the node attributes.
In this paper, we present a novel multi-aspect network embedding method called asp2vec that dynamically determines the aspect based on the context information. The major components of asp2vec are 1) the aspect selection module, which is based on the Gumbel-Softmax trick to approximate the discrete sampling of the aspects and facilitate end-to-end training of the model, and 2) the aspect regularization framework, which encourages the learned aspect embeddings to be diverse, but to some extent related to each other. We also demonstrate how to extend asp2vec to a HetNet. Through experiments on multiple networks of various types, we empirically show the superiority of our proposed framework.
Acknowledgment: IITP2018-0-00584, IITP2019-0-01906, IITP2019-2011-1-00783, 2017M3C4A7063570, 2016R1E1A1A01942642
- Learning edge representations via low-rank asymmetric projections. In CIKM.
- Regularizing deep neural networks by enhancing diversity in feature extraction. TNNLS.
- PME: projected metric embedding on heterogeneous networks for link prediction. In KDD.
- Task-guided and path-augmented heterogeneous network embedding for author identification. In WSDM.
- Metapath2vec: scalable representation learning for heterogeneous networks. In KDD.
- Is a single embedding enough? learning node representations that capture multiple social contexts. In WWW.
- Hin2vec: explore meta-paths in heterogeneous information networks for representation learning. In CIKM.
- Node2vec: scalable feature learning for networks. In KDD.
- Inductive representation learning on large graphs. In NeurIPS.
- Leveraging meta-path based context for top-n recommendation with a neural co-attention model. In KDD.
- Are meta-paths necessary? revisiting heterogeneous graph embeddings. In CIKM.
- Categorical reparameterization with gumbel-softmax. arXiv preprint arXiv:1611.01144.
- Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
- Deeper insights into graph convolutional networks for semi-supervised learning. In AAAI.
- Is a single vector enough? exploring node polysemy for network embedding. In KDD.
- Disentangled graph convolutional networks. In ICML.
- Distributed representations of words and phrases and their compositionality. In NeurIPS.
- Unsupervised attributed multiplex network embedding.
- Task-guided pair embedding in heterogeneous network. In CIKM.
- Deepwalk: online learning of social representations. In KDD.
- BPR: bayesian personalized ranking from implicit feedback. arXiv preprint arXiv:1205.2618.
- Struc2vec: learning node representations from structural identity. In KDD.
- Modeling relational data with graph convolutional networks. In ESWC.
- Easing embedding learning by comprehensive transcription of heterogeneous information networks. In KDD.
- VGraph: a generative model for joint community detection and node representation learning. In NeurIPS.
- Graph attention networks. arXiv preprint arXiv:1710.10903.
- Deep graph infomax. arXiv preprint arXiv:1809.10341.
- MCNE: an end-to-end framework for learning multiple conditional network representations of social network. In KDD.
- Heterogeneous graph attention network. In WWW.
- A comprehensive survey on graph neural networks. arXiv preprint arXiv:1901.00596.
- Multi-facet network embedding: beyond the general solution of detection and representation. In AAAI.
- Camel: content-aware and meta-path augmented metric learning for author identification. In WWW.
- Attention-based bidirectional long short-term memory networks for relation classification. In ACL.
Appendix for Reproducibility
Appendix A Experimental Details
A.1. Code and Datasets
asp2vec is implemented using PyTorch. The source code and instructions on how to run the code can be found at https://github.com/pcy1302/asp2vec. Table 7 shows the URLs of the author implementations of the compared methods, and Table 8 shows the URLs from which the datasets can be downloaded. Nodes in social networks refer to users, nodes in the PPI network refer to proteins, nodes in Wikipedia refer to words, nodes in Cora (citation network) refer to papers, and nodes in ca-HepTh, ca-AstroPh and 4area (all three are co-authorship networks) refer to authors. Each node in Cora is associated with a bag-of-words representation of a document (size=1433), whereas the remaining datasets do not have node attribute information. Finally, as DGI uses node attribute information, we used the adjacency matrix to initialize the node attribute matrix to apply DGI on datasets without node attribute information (i.e., all but Cora).
A.2. Evaluation Protocol
Homogeneous Network: Link Prediction. For link prediction in homogeneous networks, we follow the convention of (Abu-El-Haija et al., 2017; Grover and Leskovec, 2016; Epasto and Perozzi, 2019). More precisely, we first split the edges of the original graph into two equal-sized sets, i.e., a training set and a test set. To obtain the test set, we randomly remove edges while preserving the connectivity of the original graph. To obtain the negative counterparts of the training and test sets, we randomly generate a set of edges and split it in half into a negative training set and a negative test set, respectively. We train a logistic regression classifier on the training edges and their negative counterparts (Epasto and Perozzi, 2019; Abu-El-Haija et al., 2017), and the link prediction performance is then measured by ranking (AUC-ROC) the removed edges among the test edges and their negative counterparts. Finally, for the above preprocessing, we used the script at https://github.com/google/asymproj_edge_dnn/blob/master/create_dataset_arrays.py, which is from the implementation of (Abu-El-Haija et al., 2017).
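As a rough illustration of this protocol, the sketch below shows one possible way to remove half of the edges without disconnecting the graph, together with an AUC-ROC computed by ranking positive against negative edges. It is not the preprocessing script referenced above: the spanning-forest strategy, the function names, the toy graph, and the scores are all assumptions chosen to keep the example short.

```python
import random

def split_edges(edges, num_nodes, rng):
    """Split undirected edges into train/test sets.

    Assumed strategy: reserve a random spanning forest for training so that
    removing the test edges cannot disconnect the graph.
    """
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    shuffled = edges[:]
    rng.shuffle(shuffled)
    forest, removable = [], []
    for u, v in shuffled:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            forest.append((u, v))      # needed for connectivity, never removed
        else:
            removable.append((u, v))   # closes a cycle, safe to remove
    num_test = min(len(edges) // 2, len(removable))
    test = removable[:num_test]
    train = forest + removable[num_test:]
    return train, test

def auc_roc(pos_scores, neg_scores):
    """Probability that a positive edge outscores a negative one (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Toy graph: a 6-cycle plus four chords (hypothetical data).
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5),
             (0, 2), (1, 3), (2, 4), (3, 5)]
train_edges, test_edges = split_edges(toy_edges, 6, random.Random(0))

# Hypothetical classifier scores for test edges and their negative counterparts.
auc = auc_roc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])  # 8 of 9 pairs ranked correctly
```

When the graph has too few cycle-closing edges, this strategy yields a test set smaller than half of the edges, which is one reason a production script may handle the split differently.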
Heterogeneous Network: Author Identification. For author identification in a heterogeneous network, we follow the convention of (Zhang et al., 2018; Park et al., 2019). More precisely, we use papers published before timestamp T for training, and split papers that are published after timestamp T in half to make validation and test datasets. We report the test performance when the validation result is the best. For final evaluation, we randomly sample a set of negative authors and combine it with the set of true authors to generate 100 candidate authors for each paper, and then rank the true authors among them.
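The candidate-ranking step can be sketched as follows. The helper names, the author identifiers, the scores, and the cutoff k are hypothetical; only the candidate set size of 100 comes from the text.

```python
import random

def build_candidates(true_authors, all_authors, num_candidates, rng):
    """Combine the true authors with randomly sampled negative authors."""
    pool = [a for a in all_authors if a not in true_authors]
    negatives = rng.sample(pool, num_candidates - len(true_authors))
    return list(true_authors) + negatives

def recall_at_k(scores, true_authors, k):
    """Fraction of true authors that appear among the k highest-scoring candidates."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = sum(1 for a in ranked[:k] if a in true_authors)
    return hits / len(true_authors)

rng = random.Random(0)
all_authors = list(range(500))   # hypothetical author ids
true_authors = {3, 42}           # hypothetical true authors of one paper
candidates = build_candidates(true_authors, all_authors, 100, rng)

# Hypothetical model scores: one true author ranked first, the other last.
scores = {a: rng.random() for a in candidates}
scores[3] = 2.0
scores[42] = -1.0
recall = recall_at_k(scores, true_authors, k=10)  # one of two true authors in the top 10
```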
A.3. Experimental Settings
For all the compared methods, we use the hyperparameters that are reported to be the best by the original papers. For PolyDW (Liu et al., 2019), the number of random walks for each node is 110, the length of each walk is 11, the window size is 4, the number of samples per walk is 10, and the number of negative samples is 5. For Splitter (Epasto and Perozzi, 2019), the number of random walks for each node is 40, the length of each walk is 10, the window size is 5, and the number of negative samples is 5. We found that the above settings work the best.
Prior to training asp2vec, we perform a warm-up step. More precisely, we initialize the target and aspect embedding vectors of asp2vec with the final trained embedding vectors of Deepwalk (Perozzi et al., 2014). We found that the warm-up step usually gives better performance.
For DW and asp2vec, we follow the setting of node2vec (Grover and Leskovec, 2016), where the number of random walks for each node is 10, the length of each walk is 80, the window size is 3, and the number of negative samples is 2. Two further hyperparameters of asp2vec are set to 0.5 and 0.01, respectively. Note that we also tried settings other than those mentioned above, but no significantly different results were obtained.
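To show what these walk settings mean in practice, here is a minimal sketch of uniform (DeepWalk-style) random walks and the extraction of (target, context) pairs within a window. The adjacency list, the function names, and the uniform transition rule are assumptions for illustration and do not reproduce the authors' implementation.

```python
import random

def random_walks(adj, walks_per_node, walk_length, rng):
    """Generate uniform random walks starting from every node."""
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adj)
        rng.shuffle(nodes)
        for start in nodes:
            walk = [start]
            # Stop early only if the current node has no neighbors.
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks

def context_pairs(walk, window):
    """Return (target, context) pairs for positions within `window` of each other."""
    pairs = []
    for i, target in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        pairs.extend((target, walk[j]) for j in range(lo, hi) if j != i)
    return pairs

# Toy undirected graph as an adjacency list (hypothetical data).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

# Settings quoted in the text: 10 walks per node, walk length 80, window size 3.
walks = random_walks(adj, walks_per_node=10, walk_length=80, rng=random.Random(0))
pairs = context_pairs(walks[0], window=3)
```

Each pair would then feed a skip-gram objective with negative sampling; that part is omitted here.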
Moreover, we first tried the gensim implementation (https://radimrehurek.com/gensim/models/word2vec.html) of DW (Perozzi et al., 2014) / node2vec (Grover and Leskovec, 2016), but we found that the results were inconsistent. Specifically, for BlogCatalog, PPI, Wikipedia, and Cora, we found that gensim DW significantly underperformed our implementation of DW. Therefore, we report the results of our DW/node2vec implementation, which produced results that were consistently better than those of the gensim implementation.
It is important to note that while we successfully reproduced the results of Splitter on ca-HepTh, and ca-AstroPh reported in (Epasto and Perozzi, 2019), Wiki-vote could not be reproduced. This is because the implementation of Splitter is intended for an undirected graph, whereas Wiki-vote is a directed graph. We conjecture that Epasto et al. (Epasto and Perozzi, 2019) converted Wiki-vote into an undirected graph, and reported the results. However, we used a directed version of Wiki-vote as input, which decreased the link prediction performance.
For fair comparisons, we fixed the number of dimensions for each node throughout all the compared methods. For example, for a given embedding size of DW, we set the dimension of each aspect embedding and the number of aspects of asp2vec such that their product equals that embedding size, even though we do not concatenate the aspect embedding vectors.
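The dimension budget can be illustrated with a small helper that lists the (per-aspect dimension, number of aspects) pairs matching a given total size. The total of 200 is the dimension mentioned for the experiments in this appendix; the helper name and the candidate numbers of aspects are hypothetical.

```python
def dimension_splits(total_dim, candidate_num_aspects):
    """Return (per_aspect_dim, num_aspects) pairs whose product equals total_dim."""
    return [(total_dim // k, k) for k in candidate_num_aspects if total_dim % k == 0]

# Hypothetical candidates; 3 does not divide 200, so it is skipped.
splits = dimension_splits(200, [1, 2, 3, 4, 5, 8, 10])
# -> [(200, 1), (100, 2), (50, 4), (40, 5), (25, 8), (20, 10)]
```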
For asp2vec-het, we set the window size to 3, and number of negative samples to 2, and for a fair comparison, we follow (Park et al., 2019) and we use a metapath “APA”.
Other Results. Table 10 shows the results for various numbers of aspects when the dimension of each aspect embedding is fixed to 20. We observe that the number of aspects used in our experiments consistently gives competitive results, albeit not the best in some cases. This is the reason why we fixed it in our experiments. Table 9 shows results for different combinations of the aspect embedding dimension and the number of aspects whose product is 200. We observe that the best combination differs across datasets, and that some combinations give even better results than those reported in our paper. Again, since the combination used in our experiments consistently gives competitive results, we used it for our experiments with dim=200 in our paper. Algorithm 1 shows the pseudocode for training asp2vec.