1 Introduction
Representations of objects (nodes) in large graph-structured data, such as social or biological networks, have proved extremely effective as feature inputs for graph analysis tasks. Recently, there have been many attempts in the literature to extend neural networks to the representation learning of graphs, such as Graph Convolutional Networks (GCN) [15], GraphSAGE [12] and Graph Attention Networks (GAT) [34].
Despite their enormous success, previous graph neural networks mainly learn representations by treating the neighborhood as a perceptual whole, without going deep into the semantic information in graphs. Taking a movie network as an example, the paths based on the composite relations "Movie-Actor-Movie" and "Movie-Director-Movie" reveal two different semantic patterns, i.e., that two movies share the same actor or the same director. Here a semantic pattern is defined as a specific piece of knowledge expressed by the corresponding path. Although several researchers [35, 30] attempt to capture such graph semantics of composite relations between two objects via meta-paths, existing work relies on given heterogeneous information such as different types of objects and distinct object connections. However, in the real world, much graph-structured data lacks such explicit characteristics. As shown in Figure 1, a scholar cooperation network usually has no explicit node (relation) types, and all nodes are connected through the same relation, i.e., "Co-author". Fortunately, behind this single relation there are various implicit factors that express different connecting reasons, such as "Classmate" and "Colleague" for the same relation "Co-author". These factors can further compose diverse semantic-paths (e.g., "Student-Advisor-Student" and "Advisor-Student-Advisor"), which reveal sophisticated semantic associations and help to generate more informative representations. How to automatically exploit comprehensive semantic patterns based on the implicit factors behind a general graph is therefore a non-trivial problem.
In general, this problem poses several challenges. First, latent factors behind graphs must be inferred adaptively. Several studies have begun to explore the latent factors behind a graph through disentangled representations [20, 18]. However, they mainly focus on inferring latent factors via disentangled representation learning, and fail to discriminatively model the independent implicit factors behind the same type of connection. Second, after discovering the latent factors, how to select the most meaningful semantics and aggregate the diverse semantic information remains largely unexplored. Last but not least, further exploiting the implicit semantic patterns while remaining capable of inductive learning is quite difficult.
To address these challenges, in this paper we propose novel Semantic Graph Convolutional Networks (SGCN), which shed light on the exploration of implicit semantics in the node aggregation process. Specifically, we first propose a latent factor routing method with the DisenConv layer [20] to adaptively infer the probability that each latent factor caused the link from a given node to one of its neighbors. Then, to further explore the diverse semantic information, we transfer the probabilities between every two connected nodes into corresponding semantic adjacency matrices, which represent the semantic-paths in a graph. Afterwards, most semantic-strengthening methods, such as a semantic-level attention module, can be easily integrated into our model to aggregate the diverse semantic information from these semantic-paths. Finally, to encourage the independence of the implicit semantic factors and to support inductive learning, we design an effective joint loss function that maintains independent mapping channels for different factors. This loss function focuses on different semantic characteristics during the training process.
Specifically, the contributions of this paper can be summarized as follows:

We break the heterogeneous restriction of semantic representations with an end-to-end framework. It automatically infers the independent factor behind the formation of each edge and explores the semantic associations among the latent factors behind a graph.

We propose novel Semantic Graph Convolutional Networks (SGCN) to learn node representations by aggregating the implicit semantics from graph-structured data.

We conduct extensive experiments on various real-world graph datasets to evaluate the performance of the proposed model. The results show the superiority of our model in comparison with many powerful baselines.
2 Related Works
Graph neural networks (GNNs) [10, 26], especially graph convolutional networks [13], have proven successful in modeling structured graph data due to their theoretical elegance [5]. They have made new breakthroughs in various tasks, such as node classification [15] and graph classification [6]. Early work used graph spectral theory [13] to derive a graph convolutional layer. Polynomial spectral filters [6] then greatly reduced the computational cost, and Kipf and Welling [15] proposed a linear filter for further simplification. Alongside spectral graph convolution, performing graph convolution directly in the spatial domain was also investigated by many researchers [8, 12]. Among these methods, graph attention networks [34] have aroused considerable research interest, since they adaptively assign weights to the neighbors of a node via an attention mechanism [1, 37].
For semantic learning, a line of studies has explored a kind of semantic-path called the meta-path in heterogeneous graph embedding to preserve structural information. ESim [28] learned node representations by searching a user-defined embedding space. Based on random walks, metapath2vec [7] utilized skip-gram over meta-path-guided walks. HERec [29] proposed a type-constraint strategy to filter node sequences and capture the complex semantics reflected in a heterogeneous graph. Then, Fan et al. [9] suggested a metagraph2vec model for malware detection, where both structures and semantics are preserved. Sun et al. [30] proposed meta-graph-based network embedding models, which simultaneously consider the hidden relations of all the meta information of a meta-graph. Meanwhile, there have been other influential semantic learning approaches: for instance, several models [4, 17, 25] have been applied to various fields because of their latent semantic analysis ability.
In heterogeneous graphs, two objects can be connected via different semantic-paths, which are called meta-paths. This relies on the graph having different types of nodes and relations. A meta-path is defined as a path of the form $A_1 \xrightarrow{R_1} A_2 \xrightarrow{R_2} \cdots \xrightarrow{R_l} A_{l+1}$ (abbreviated as $A_1 A_2 \cdots A_{l+1}$), which describes a composite relation $R = R_1 \circ R_2 \circ \cdots \circ R_l$, where $\circ$ denotes the composition operator on relations. In a homogeneous graph, the relationships between nodes are also generated for different reasons (latent factors), so we can implicitly construct various types of relationships and extract the semantic-paths that correspond to different semantic patterns, thereby improving the performance of the GCN model from the perspective of semantic discovery.
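As a concrete illustration (not taken from the paper): a composite relation such as "Movie-Actor-Movie" can be materialized as a boolean product of relation matrices. A minimal numpy sketch with a hypothetical toy incidence matrix:

```python
import numpy as np

# Hypothetical toy incidence: rows = movies, cols = actors.
movie_actor = np.array([
    [1, 0],   # movie 0 features actor 0
    [1, 1],   # movie 1 features actors 0 and 1
    [0, 1],   # movie 2 features actor 1
])

# The composite relation "Movie-Actor-Movie" is the boolean product of
# the Movie-Actor relation with its transpose (Actor-Movie).
mam = (movie_actor @ movie_actor.T) > 0
np.fill_diagonal(mam, False)  # drop trivial self-paths
print(mam.astype(int))        # movies 0-1 and 1-2 share an actor; 0-2 do not
```

Longer meta-paths compose in the same way, by chaining further matrix products.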
3 Semantic Graph Convolutional Networks
In this section, we introduce the Semantic Graph Convolutional Networks (SGCN). We first present the notations, then describe the overall network progressively.
3.1 Preliminary
We focus primarily on undirected graphs, and it is straightforward to extend our approach to directed graphs. We define $G = (V, E)$ as a graph, comprised of the node set $V$ and edge set $E$, where $N = |V|$ denotes the number of nodes. Each node $v_i \in V$ has a feature vector $x_i \in \mathbb{R}^{d_{in}}$. We use $(v_i, v_j) \in E$ to indicate that there is an edge between node $v_i$ and node $v_j$. Most graph convolutional networks can be regarded as an aggregation function $f$ that outputs the representations of nodes when given the features of each node and its neighbors:

$$H = f(X, A),$$

where $X$ is the feature matrix, $A$ is the adjacency matrix, and the output $H$ denotes the representations of the nodes. That is, the neighborhood of a node contains rich information, which can be aggregated to describe the node more comprehensively. Different from previous studies [15, 12, 34], in our work the proposed $f$ automatically learns semantic-paths from the graph data to explore the corresponding semantic patterns.
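The generic aggregation view above can be illustrated with a minimal mean-aggregation step. This is a generic GCN-style sketch of an aggregation function, not the proposed SGCN aggregator:

```python
import numpy as np

def aggregate(X, A):
    """One mean-aggregation step H = f(X, A): each node's new
    representation averages its own features with its neighbors'."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)    # degrees incl. self-loop
    return (A_hat / deg) @ X                  # row-normalised aggregation

# Toy chain graph 0 - 1 - 2 with 2-dimensional features.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
H = aggregate(X, A)
print(H)
```

Stacking such steps (with learned weight matrices and nonlinearities in between) recovers the usual GCN family.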
3.2 Latent Factor Routing
Here we introduce the disentangled algorithm that calculates the latent factors between every two objects. We assume that each node is composed of $K$ independent components, hence there are $K$ latent factors to be disentangled. For node $v_i$, the hidden representation of $v_i$ is $h_i = [h_{i,1}, h_{i,2}, \ldots, h_{i,K}]$, where $h_{i,k}$ denotes the aspect of node $v_i$ that is pertinent to the $k$-th disentangled factor. In the initial stage, we project its feature vector $x_i$ into $K$ different subspaces:

$$z_{i,k} = \frac{\sigma(W_k^{\top} x_i + b_k)}{\left\| \sigma(W_k^{\top} x_i + b_k) \right\|_2}, \qquad (1)$$

where $W_k$ and $b_k$ are the mapping parameters and bias of the $k$-th subspace, and the nonlinear activation function $\sigma$ is ReLU [23]. To capture aspect $k$ of node $v_i$ comprehensively, we construct $h_{i,k}$ from both $z_{i,k}$ and $\{z_{j,k} : (v_i, v_j) \in E\}$, which can be utilized to identify the latent factors. Here we learn the probability of each factor by leveraging the neighborhood routing mechanism [20, 18] in a DisenConv layer:

$$p_{i,j,k}^{(t)} = \frac{\exp\left( z_{j,k}^{\top} c_{i,k}^{(t-1)} / \tau \right)}{\sum_{k'=1}^{K} \exp\left( z_{j,k'}^{\top} c_{i,k'}^{(t-1)} / \tau \right)}, \qquad (2)$$

$$c_{i,k}^{(t)} = \frac{z_{i,k} + \sum_{j:(v_i,v_j)\in E} p_{i,j,k}^{(t)} z_{j,k}}{\left\| z_{i,k} + \sum_{j:(v_i,v_j)\in E} p_{i,j,k}^{(t)} z_{j,k} \right\|_2}, \qquad (3)$$

where at iteration $t$, $p_{i,j,k}^{(t)}$ indicates the probability that factor $k$ is the reason why node $v_i$ reaches neighbor $v_j$, satisfying $\sum_{k=1}^{K} p_{i,j,k}^{(t)} = 1$. The neighborhood routing mechanism iteratively infers $p_{i,j,k}$ and constructs $c_{i,k}$. Note that there are $L$ DisenConv layers in total, and at the end of each layer $h_{i,k}$ is assigned the final value of $c_{i,k}$; more details can be found in Algorithm 1.
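A minimal numpy sketch of the neighborhood routing mechanism described above, for a single node and its neighbors. The shapes, iteration count and temperature are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def l2norm(v, axis=-1):
    return v / np.linalg.norm(v, axis=axis, keepdims=True)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def routing(z_i, z_nbrs, iters=5, tau=1.0):
    """Neighborhood routing for one target node (sketch of Eqs. (2)-(3)).
    z_i:    (K, d) the K factor channels of the target node (l2-normalised)
    z_nbrs: (n, K, d) factor channels of its n neighbors
    Returns p: (n, K), where p[j, k] estimates the probability that
    factor k explains the edge to neighbor j (rows sum to 1)."""
    c = z_i.copy()                                   # factor centres, init with z_i
    for _ in range(iters):
        # p[j, k] ∝ exp(z_{j,k} · c_k / tau), normalised over factors k
        logits = np.einsum('nkd,kd->nk', z_nbrs, c) / tau
        p = softmax(logits, axis=1)
        # c_k ← normalise(z_{i,k} + Σ_j p[j, k] z_{j,k})
        c = l2norm(z_i + np.einsum('nk,nkd->kd', p, z_nbrs))
    return p

rng = np.random.default_rng(0)
K, d, n = 4, 8, 5
z_i = l2norm(rng.normal(size=(K, d)))
z_nbrs = l2norm(rng.normal(size=(n, K, d)))
p = routing(z_i, z_nbrs)
```

Running this over every node (and stacking several such layers with fresh projections) yields the DisenConv stack the section refers to.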
3.3 Discriminative Semantic Aggregation
When the relation types between nodes and their neighbors are explicit and fixed, it is easy to construct multiple sub-semantic graphs as the input for multiple GCN models. As shown in Figure 2(a), a heterogeneous graph $G$ contains two different types of meta-paths (meta-path 1, meta-path 2). Then $G$ can be decomposed into multiple single-semantic graphs $G_1$ and $G_2$, where each node in $G_1$ ($G_2$) is connected to its neighbors by path-relation 1 (2).
However, we cannot simply transfer this pre-constructed multiple-graph method to all network architectures. In detail, for a graph without different edge types, we have to infer the implicit connecting factor of each edge in order to find semantic-paths, and the probability of each latent factor is calculated during the iterative process described in the last section. To solve this dilemma, we propose a novel algorithm that automatically represents semantic-paths while the model runs.
After the latent factor routing process, we obtain the soft probability tensor $\Phi \in \mathbb{R}^{N \times N \times K}$, where $\Phi_{i,j,k}$ denotes the probability that node $v_i$ connects to $v_j$ because of factor $k$. In our model, the latent factor should identify a single connecting cause for each connected node pair. Hence we transfer the probability tensor $\Phi$ into a semantic adjacency tensor $A^{s}$, whose elements take only binary values (0 or 1): for every connected node pair $(v_i, v_j)$, we set $A^{s}_{i,j,k} = 1$ if $\Phi_{i,j,k}$ is the largest value in $\Phi_{i,j,:}$, and 0 otherwise. As shown in Figure 2(b), each node is represented by $K$ components, and every node may connect with others by one of $K$ relation types, i.e., each edge is labeled by its winning factor. A node's semantic-path-based neighbors are then the 2-hop nodes reached via two consecutive typed edges, and paths composed of different factor pairs are of different types. We define the adjacency matrix for virtual semantic-path-based edges as

$$\hat{A}^{(k_1,k_2)}_{i,j} = \begin{cases} 1, & \exists\, m:\ A^{s}_{i,m,k_1} = 1 \ \wedge\ A^{s}_{m,j,k_2} = 1, \\ 0, & \text{otherwise}, \end{cases} \qquad (4)$$

where $k_1, k_2 \in \{1, \ldots, K\}$ and $i \neq j$. For instance, in Figure 2(b), two semantic-paths starting from the same node are expressed by two different factor pairs $(k_1, k_2)$.
In the semantic information aggregation process, we aggregate the latent vectors connected by a corresponding semantic-path as:

$$s_i = \operatorname{MeanPooling}\left( \left\{ h_{i,k_1},\ h_{j,k_2} \ :\ \hat{A}^{(k_1,k_2)}_{i,j} = 1 \right\} \right), \qquad (5)$$

where we use MeanPooling rather than a sum operator to avoid large values, and $h_{i,k_1}$ and $h_{j,k_2}$ are both returned by the last DisenConv layer, at which point the factor probabilities are stable since the representation of each node already considers the influence of its neighbors. According to Eq. (5), the aggregation of the two latent representations (end points) of a semantic-path denotes the mining result of that semantic relation; e.g., in Figure 2(b), different factor pairs express different semantic pattern representations. For all types of semantic-paths starting from node $v_i$, the weight of each type depends on its frequency. Note that although the semantic adjacency tensor neglects some low-probability factors, our semantic-paths are integrated with the node states of DisenGCN, so the crucial information captured by the basic GCN model is not lost. The advantage of this aggregation method is that our model can distinguish different semantic relations without adding extra parameters, instead of designing separate graph convolutional networks for different semantic-paths. That is to say, the model does not increase the risk of overfitting after learning the graph semantic-paths. Here we only consider 2-order paths, but the method can be straightforwardly extended to mining longer paths.
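The hard factor assignment and 2-order path enumeration can be sketched as follows; the function name and the dense tensor layout are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def semantic_paths(P, A):
    """Sketch of the discriminative semantic aggregation step.
    P: (N, N, K) soft factor probabilities from routing (only meaningful
       where A[i, j] == 1); A: (N, N) binary adjacency.
    Returns the list of 2-order semantic paths (i, m, j) together with
    the factor pair (k1, k2) that labels each path."""
    N = P.shape[0]
    # Hard semantic adjacency: keep only the most probable factor per edge.
    factor = P.argmax(axis=2)                     # (N, N) winning factor id
    paths = []
    for i in range(N):
        for m in np.nonzero(A[i])[0]:
            for j in np.nonzero(A[m])[0]:
                if j != i:                        # skip trivial back-paths
                    paths.append((i, m, j, factor[i, m], factor[m, j]))
    return paths

# Toy chain graph 0 - 1 - 2 with K = 2 latent factors.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
rng = np.random.default_rng(1)
P = rng.random((3, 3, 2))
paths = semantic_paths(P, A)
print(paths)
```

Mean-pooling the endpoint components `h[i, k1]` and `h[j, k2]` of each returned path then gives the per-node semantic summary of Eq. (5).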
3.4 Independence Learning for Mapping Subspaces
In fact, each edge type in a meta-path denotes one unique meaning, so the latent factors in our work should not overlap. Thus, the assumption underlying the use of latent factors to construct semantic-paths is that the different factors extracted by the latent factor routing module focus on different connecting causes. In other words, we should encourage the representations of different factors to be sufficiently independent. Before the probabilities are calculated, the focused views of the subspaces in Eq. (1) should remain different. Our solution considers that the distance between independent factor representations should be sufficiently large when they are projected into one subspace.
First, we project the input values of Eq. (1) into a unified space to obtain vectors $\tilde{z}_{i,k}$:

$$\tilde{z}_{i,k} = W_{p}\, z_{i,k}, \qquad (6)$$

where $W_p$ is the projection parameter matrix. Then, the independence loss based on the distances between representations of unequal factors can be calculated as:

$$\mathcal{L}_{d} = \operatorname{sum}\!\left( \frac{\tilde{Z} \tilde{Z}^{\top}}{\sqrt{d}} \odot \left( \mathbf{1} - I \right) \right), \qquad (7)$$

where $\tilde{Z} \in \mathbb{R}^{K \times d}$ stacks the projected factor representations, $I$ denotes an identity matrix, $\odot$ is the element-wise product, and $\mathbf{1}$ is an all-ones matrix. Specifically, we learn a lesson from [33] and scale the dot products by $\frac{1}{\sqrt{d}}$ to counteract the vanishing-gradient effect for large values. As long as $\mathcal{L}_{d}$ is minimized during training, the distances between different factors tend to become larger; that is, the subspaces capture sufficiently different information to encourage independence among the learned latent factors.

Next, we analyze the validity of this optimization. Latent factor routing aims to utilize the disentangled algorithm to calculate the latent factors between every two objects. However, this approach is a variant of the von Mises-Fisher (vMF) [2] mixture model, and such an EM algorithm cannot optimize the independence of the latent factors within its iterative process. Random initialization of the mapping parameters also cannot guarantee that the subspaces attend to different concerns. To address this shortcoming, we make the following assumption:
Assumption 3.1
The features in different subspaces remain sufficiently independent when the margins of their projections in the unified space are sufficiently distinct.
This assumption is inspired by the Latent Semantic Analysis (LSA) algorithm [16], which projects multi-dimensional features of a vector space model into a lower-dimensional semantic space that preserves the semantic features of the original space in a statistical sense. Our optimization approach is therefore:
$$\min_{W}\ \mathcal{L}_{d} \;\Longleftrightarrow\; \max_{W}\ \sum_{k=1}^{K} \sum_{k' \neq k} \left\| \tilde{z}_{k} - \tilde{z}_{k'} \right\|_{2}^{2}, \qquad (8)$$

where $W$ denotes the training parameters to be optimized. We ignore the scaling factor $\frac{1}{\sqrt{d}}$ and the constant terms of Eq. (7), because they do not affect the optimization procedure. As the inter-distances between subspaces increase, the intra-variance of the factors in each subspace does not exceed its original level (under random initialization). The inter-variance/intra-variance ratio therefore becomes larger; in other words, we obtain more sufficient independence of the mapping subspaces.
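The regularizer can be sketched as follows; we use a squared off-diagonal similarity penalty as one plausible instantiation of the scaled-dot-product idea, which is an assumption rather than the paper's exact formula:

```python
import numpy as np

def independence_loss(Z, W_p):
    """Sketch of the independence regulariser (in the spirit of Eqs. (6)-(7)).
    Z:   (K, d) one representation per factor subspace
    W_p: (d, d) shared projection into a unified space
    Penalises squared scaled dot products between *different* factors, so
    minimising the loss pushes the subspaces toward orthogonality."""
    Zt = Z @ W_p                                   # project into unified space
    d = Zt.shape[1]
    sim = (Zt @ Zt.T) / np.sqrt(d)                 # scaled pairwise dot products
    off_diag = sim * (1.0 - np.eye(Z.shape[0]))    # zero out self-similarity
    return np.square(off_diag).sum()

# Orthogonal factors incur no penalty; identical factors incur a large one.
Z_indep = np.eye(3)
Z_shared = np.ones((3, 3)) / np.sqrt(3.0)
W_p = np.eye(3)
print(independence_loss(Z_indep, W_p), independence_loss(Z_shared, W_p))
```

In training, this scalar would simply be added (with a trade-off weight) to the task loss.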
3.5 Algorithm Framework
In this section, we describe the overall SGCN algorithm for node-related tasks. For a graph $G$, the ground-truth label of node $v_i$ is $y_i \in \{1, \ldots, C\}$, where $C$ is the number of classes. The details of our algorithm are shown in Algorithm 1. First, we calculate the independence loss after the $K$ factor channels capture their features. Then, $L$ layers of DisenConv operations return the stable probability tensor $\Phi$. After that, the automatic graph semantic-path representation is learned based on $\Phi$. To apply the model to different tasks, we design the final layer as a fully-connected layer $\hat{y}_i = W_o h_i + b_o$. For instance, for the semi-supervised node classification task, we implement
$$\mathcal{L}_{task} = -\sum_{v_i \in Y_L} \sum_{c=1}^{C} y_{i,c} \ln \hat{y}_{i,c} \qquad (9)$$

as the loss function, where $Y_L$ is the set of labeled nodes and $\hat{y}_{i}$ is the softmax-normalized prediction; $\mathcal{L}_{d}$ is jointly trained by summing it with this task loss. For the multi-label classification task, since a label consists of more than one positive bit, we define the multi-label loss function for node $v_i$ as:

$$\mathcal{L}_i = -\frac{1}{C} \sum_{c=1}^{C} \left[ y_{i,c} \log \hat{y}_{i,c} + \left(1 - y_{i,c}\right) \log\left(1 - \hat{y}_{i,c}\right) \right]. \qquad (10)$$
Moreover, for the node clustering task, the learned node representation serves as the input feature of K-Means.
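The two task losses can be sketched in numpy; these are standard cross-entropy and per-node binary cross-entropy formulations matching Eqs. (9)-(10) in spirit, with hypothetical toy logits:

```python
import numpy as np

def semi_supervised_loss(logits, labels, labeled_idx):
    """Cross-entropy over the labeled node set (sketch of Eq. (9))."""
    z = logits[labeled_idx]
    z = z - z.max(axis=1, keepdims=True)                     # stable log-softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labeled_idx)), labels[labeled_idx]].mean()

def multilabel_loss(logits, Y):
    """Per-class binary cross-entropy (sketch of Eq. (10))."""
    p = 1.0 / (1.0 + np.exp(-logits))                        # sigmoid per class
    eps = 1e-12
    return -(Y * np.log(p + eps) + (1 - Y) * np.log(1 - p + eps)).mean()

# Toy check: confident, correct logits give near-zero losses.
logits = np.array([[10.0, -10.0], [-10.0, 10.0]])
labels = np.array([0, 1])
loss_ce = semi_supervised_loss(logits, labels, np.array([0, 1]))
loss_ml = multilabel_loss(logits, np.array([[1.0, 0.0], [0.0, 1.0]]))
print(loss_ce, loss_ml)
```

The independence loss of Section 3.4 would be added to either objective with a trade-off weight during joint training.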
3.6 Time Complexity Analysis and Optimization
We should note a problem in Section 3.3: the time complexity of Eq. (4)-(5) implemented by dense matrix multiplication is on the order of $O(N^3)$ in the number of nodes. Such a time complexity brings a heavy computing load, so we optimize the algorithm in the actual implementation. In real-world datasets, one node connects to far fewer neighbors than the total number of nodes in the graph. Therefore, when we create the semantic-path-based adjacency structure, we define a matrix $B \in \mathbb{R}^{N \times n_{max}}$ to denote the 1-order neighbor relationships, where $n_{max}$ is the maximum number of neighbors that we allow; $B_{i,m}$ holds the id of the $m$-th neighbor of $v_i$, with unused slots left empty. The semantic-path relations of each type are then enumerated over $B$, and the pooling of each semantic pattern is the mean pooling over the corresponding entries. With this scheme, the time complexity is reduced to $O(N \cdot n_{max}^2)$.
4 Experiments
In this section, we empirically assess the efficacy of SGCN on several node-related tasks, including semi-supervised node classification, node clustering and multi-label node classification. We then provide a node visualization analysis and semantic-path sampling experiments to verify the validity of our idea.
Table 1: The statistics of datasets.

Dataset  Type  Nodes  Edges  Classes  Features  Multilabel 

Pubmed  Citation Network  19,717  44,338  3  500  False 
Citeseer  Citation Network  3,327  4,732  6  3,703  False 
Cora  Citation Network  2,708  5,429  7  1,433  False 
Blogcatalog  Social Network  10,312  333,983  39    True 
POS  Word Cooccurrence  4,777  184,812  40    True 
4.1 Experimental Setup
4.1.1 Datasets.
We conduct our experiments on 5 real-world datasets, Citeseer, Cora, Pubmed, POS and BlogCatalog [27, 11, 32], whose statistics are listed in Table 1. The first three citation networks are benchmark datasets for semi-supervised node classification and node clustering. In these three, the nodes, edges, and labels represent articles, citations, and research areas, respectively. Their node features correspond to a bag-of-words representation of a document.
POS and BlogCatalog are suitable for the multi-label node classification task. Their labels are part-of-speech tags and user interests, respectively. In detail, BlogCatalog is a social network of bloggers who post blogs on the BlogCatalog website; its labels represent the bloggers' interests, inferred from the text information they provide. POS (Part-of-Speech) is a co-occurrence network of words appearing in the first million bytes of the Wikipedia dump; its labels denote the part-of-speech tags inferred via the Stanford POS-Tagger. Since these two graphs do not provide node features, we use the rows of their adjacency matrices in place of node features.
4.1.2 Baselines.
To demonstrate the advantages of our model, we compare SGCN with several representative graph neural networks, including the graph convolutional network (GCN) [15] and the graph attention network (GAT) [34]. In detail, GCN [15] is a simplified spectral method of node aggregation, while GAT weights a node's neighbors by the attention mechanism. GAT achieves state-of-the-art results in many tasks, but it contains far more parameters than GCN and our model. Besides, ChebNet [6] is a spectral graph convolutional network based on a Chebyshev expansion of the graph Laplacian; MoNet [22] extends CNN architectures by learning local, stationary, and compositional task-specific features; and IPGDN [18] is the advanced version of DisenGCN. We also implement other non-graph-convolution methods, including the random-walk-based network embedding DeepWalk [24], the link-based classification method ICA [19], the inductive embedding approach Planetoid [38], the label propagation approach LP [39], and the semi-supervised embedding learning model SemiEmb [36].
In addition, we conduct ablation experiments on node classification and clustering to verify the effectiveness of the main components of SGCN: SGCN-path is our complete model without the independence loss, and SGCN-indep denotes SGCN without the semantic-path representations.
In the multi-label classification experiment, the original implementations of GCN and GAT do not support multi-label tasks. We therefore modify them to use the same multi-label loss function as ours for a fair comparison. We additionally include three node embedding algorithms, DeepWalk [24], LINE [31], and node2vec [11], because they have been demonstrated to perform strongly on multi-label classification. Besides, we remove IPGDN since it is not designed for multi-label tasks.
4.1.3 Implementation Details.
We train our models on one machine with 8 NVIDIA Tesla V100 GPUs. Some experimental settings and common baselines follow [20, 18], and we optimize the parameters of all models with Adam [14]. Besides, we tune the hyper-parameters of both our model and the baselines using hyperopt [3]. In detail, for semi-supervised classification and node clustering, we tune the number of routing iterations, the number of layers, the number of components $K$ (the number of mapping channels; accordingly, the dimension of a single component in SGCN is the hidden dimension divided by $K$), the dropout rate, the trade-off weight, the learning rate (log-uniform), and the regularization term (log-uniform). It should be noted that in multi-label node classification, the output dimension is set to 128 to achieve better performance, and the dimension of the node embeddings is likewise set to 128 for the other node embedding algorithms. When tuning the hyper-parameters, we also search the number of components $K$ in the latent factor routing process and use the best-performing value in our experiments.
4.2 Semi-Supervised Node Classification
For semi-supervised node classification, there are only 20 labeled instances per class. This means that the information of neighbors must be leveraged when predicting the labels of target nodes. Here we follow the experimental settings of previous works [38, 15, 34].
We report the classification accuracy (ACC) results in Table 2. The majority of nodes only connect with neighbors of the same class. According to Table 2, SGCN clearly achieves the best performance among all baselines. SGCN outperforms the strongest baseline, IPGDN, with 1.55%, 0.47% and 1.1% relative accuracy improvements on the three datasets; compared with the gains of previous models, this is a clear improvement on the node classification task, and our model achieves the best ACC of 85.4% on the Cora dataset. On the other hand, in the ablation experiment (the last three rows of Table 2), the complete SGCN model is superior to either ablated variant on at least two datasets. Moreover, SGCN-indep and SGCN-path both perform better than previous algorithms to some degree, which reveals the effectiveness of our semantic-path mining module and the independence learning for subspaces.

Table 2: Semi-supervised node classification accuracy (%).

Models  Cora  Citeseer  Pubmed
MLP  55.1  46.5  71.4
SemiEmb  59.0  59.6  71.1
LP  68.0  45.3  63.0
DeepWalk  67.2  43.2  65.3
ICA  75.1  69.1  73.9
Planetoid  75.7  64.7  77.2
ChebNet  81.2  69.8  74.4
GCN  81.5  70.3  79.0
MoNet  81.7  -  78.8
GAT  83.0  72.5  79.0
DisenGCN  83.7  73.4  80.5
IPGDN  84.1  74.0  81.2
SGCN-indep  84.2  73.7  82.0
SGCN-path  84.6  74.4  81.6
SGCN  85.4  74.2  82.1

Table 3: Node clustering results (NMI and ARI, %).

Models  Cora NMI  Cora ARI  Citeseer NMI  Citeseer ARI  Pubmed NMI  Pubmed ARI
SemiEmb  48.7  41.5  31.2  21.5  27.8  35.2
DeepWalk  50.3  40.8  30.5  20.6  29.6  36.6
Planetoid  52.0  40.5  41.2  22.1  32.5  33.9
ChebNet  49.8  42.4  42.6  41.5  35.6  38.6
GCN  51.7  48.9  42.8  42.8  35.0  40.9
GAT  57.0  54.1  43.1  43.6  35.0  41.4
DisenGCN  58.4  60.4  43.7  42.5  36.1  41.6
IPGDN  59.2  61.0  44.3  43.0  37.0  42.0
SGCN-indep  60.2  59.2  44.7  42.8  37.2  42.3
SGCN-path  60.5  60.7  45.1  44.0  37.3  42.8
SGCN  60.7  61.6  44.9  44.2  37.9  42.5
4.3 Multi-label Node Classification
In the multi-label classification experiment, every node is assigned one or more labels from a finite set. We follow node2vec [11] and report the performance of each method while varying the fraction of nodes labeled for training from 10% to 90% of the total number of nodes. The rest of the nodes are split equally to form a validation set and a test set. Then, with the best hyper-parameters on the validation sets, we report the average performance over 30 runs on each multi-label test set. We summarize the results of multi-label node classification by Macro-F1 and Micro-F1 scores in Figure 3.
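For reference, the two reported metrics can be computed as follows; this is a standard formulation of micro- and macro-averaged F1 for binary multi-label matrices, not code from the paper:

```python
import numpy as np

def f1_scores(Y_true, Y_pred):
    """Micro- and Macro-F1 for boolean multi-label matrices (nodes x labels).
    Micro pools TP/FP/FN over all labels; Macro averages per-label F1."""
    tp = (Y_true & Y_pred).sum(axis=0).astype(float)
    fp = (~Y_true & Y_pred).sum(axis=0).astype(float)
    fn = (Y_true & ~Y_pred).sum(axis=0).astype(float)
    micro = 2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())
    per_label = 2 * tp / np.maximum(2 * tp + fp + fn, 1e-12)
    return micro, per_label.mean()

# Toy check: perfect predictions give F1 = 1 for both averages.
Y_true = np.array([[1, 0], [0, 1], [1, 1]], dtype=bool)
micro, macro = f1_scores(Y_true, Y_true)
print(micro, macro)
```

Because Macro-F1 averages per label, it is sensitive to classes with few samples, which is why the paper reports both averages.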
First, the proposed SGCN model clearly achieves the best performance on both datasets. Compared with DisenGCN, SGCN combined with semantic-paths achieves its largest improvement of 20.0% with 10% of nodes labeled on the POS dataset. The reason may be that POS is a word co-occurrence network, so there are many regular explicit or implicit semantics among the relationships between different words. On the other dataset, although SGCN does not lead across the board, it achieves the highest accuracy on both metrics. We find that GCN-based algorithms are usually superior to traditional node embedding algorithms overall, although GCN produces poor Micro-F1 results on BlogCatalog. In addition, SGCN makes both Macro-F1 and Micro-F1 strong at the same time, with neither metric degrading, because the approach does not ignore the information provided by classes with few samples but important semantic relationships.
4.4 Node Clustering
To further evaluate the embeddings learned by the above algorithms, we also conduct the clustering task. Following [18], for our model and each baseline, we obtain the node embeddings via a feed-forward pass once the model is trained. We then feed the node embeddings into the K-Means algorithm to cluster nodes. The ground truth is the same as that of the node classification task, and the number of clusters is set to the number of classes. In detail, we employ two metrics, Normalized Mutual Information (NMI) and Average Rand Index (ARI), to validate the clustering results. Since the performance of K-Means is affected by the initial centroids, we repeat the process 20 times and report the average results in Table 3. As shown in Table 3, SGCN consistently outperforms all baselines, and GNN-based algorithms usually achieve better performance. Besides, with the semantic-path representation, SGCN and SGCN-path perform significantly better than DisenGCN and IPGDN, and our proposed algorithm obtains the best results on both NMI and ARI. This shows that SGCN captures more meaningful node embeddings by learning semantic patterns from the graph.
4.5 Visualization Analysis and Semantic-path Sampling
We aim to demonstrate the intuitive changes in node representations after incorporating semantic patterns. Therefore, we utilize t-SNE [21] to project the learned representations (node embeddings) of SGCN and DisenGCN into a 2-dimensional space for a more intuitive visualization. Here we visualize the node embeddings of Cora (the change in representation visualization is similar on the other datasets), where different colors denote different research areas. According to Figure 5, the visualization of SGCN is more distinguishable than that of DisenGCN. It demonstrates that the embedding learned by SGCN presents high intra-class similarity and separates papers of different research areas with distinct boundaries. On the contrary, DisenGCN does not perform as well, since the inter-margins between clusters are not distinguishable enough; in several clusters, many nodes belonging to different areas are mixed together.
Then, to explore the influence of different scales of semantic-paths on model performance, we conduct a semantic-path sampling experiment on Cora. As mentioned in Section 3.6, to capture different numbers of semantic paths, we change the cut-size hyper-parameter $n_{max}$ to restrict the number of sampled neighbors per node. As shown in Figure 5, the SGCN model with path representations achieves higher performance than at the smallest cut size. In terms of the global trend, as $n_{max}$ increases, the classification accuracy of SGCN also improves steadily. This means that a GCN model combined with a sufficiently large scale of semantic-paths can indeed learn better node representations.
5 Conclusion
In this paper, we proposed a novel framework named Semantic Graph Convolutional Networks (SGCN), which incorporates semantic-paths automatically during the node aggregation process. SGCN thus provides semantic learning ability to general graph algorithms. We conducted extensive experiments on various real-world datasets to evaluate the superior performance of the proposed model. Moreover, our method has good extensibility: all kinds of path-based algorithms in the graph embedding field can be directly applied in SGCN to adapt to different tasks, and we will explore this further in future work.
6 Acknowledgements
This research was partially supported by grants from the National Key Research and Development Program of China (No. 2018YFC0832101) and the National Natural Science Foundation of China (Nos. U20A20229 and 61922073). This research was also supported by Meituan-Dianping Group.
References
 [1] (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473. Cited by: §2.
 [2] (2005) Clustering on the unit hypersphere using von Mises-Fisher distributions. J. Mach. Learn. Res. 6 (Sep), pp. 1345–1382. Cited by: §3.4.
 [3] (2013) Hyperopt: a Python library for optimizing the hyperparameters of machine learning algorithms. In Proceedings of the 12th Python in Science Conference, pp. 13–20. Cited by: §4.1.3.
 [4] (2003) Latent Dirichlet allocation. J. Mach. Learn. Res. 3, pp. 993–1022. Cited by: §2.
 [5] (2017) Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine 34 (4), pp. 18–42. Cited by: §2.
 [6] (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pp. 3844–3852. Cited by: §2, §4.1.2.
 [7] (2017) Metapath2vec: scalable representation learning for heterogeneous networks. In Proceedings of the 23rd ACM SIGKDD, pp. 135–144. Cited by: §2.
 [8] (2015) Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems, pp. 2224–2232. Cited by: §2.
 [9] (2018) Gotcha - sly malware! Scorpion: a metagraph2vec based malware detection system. In Proceedings of the 24th ACM SIGKDD, pp. 253–262. Cited by: §2.
 [10] (2005) A new model for learning in graph domains. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, Vol. 2, pp. 729–734. Cited by: §2.
 [11] (2016) Node2vec: scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD, pp. 855–864. Cited by: §4.1.1, §4.1.2, §4.3.
 [12] (2017) Inductive representation learning on large graphs. In NIPS, pp. 1024–1034. Cited by: §1, §2, §3.1.
 [13] (2015) Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163. Cited by: §2.
 [14] (2015) Adam: a method for stochastic optimization. In ICLR. Cited by: §4.1.3.
 [15] (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §1, §2, §3.1, §4.1.2, §4.2.
 [16] (1998) An introduction to latent semantic analysis. Discourse Processes 25 (2-3), pp. 259–284. Cited by: §3.4.
 [17] (2020) Learning the compositional visual coherence for complementary recommendations. In IJCAI-20, pp. 3536–3543. Cited by: §2.
 [18] (2020) Independence promoted graph disentangled networks. In Proceedings of the AAAI Conference on Artificial Intelligence. Cited by: §1, §3.2, §4.1.2, §4.1.3, §4.4.
 [19] (2003) Link-based classification. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 496–503. Cited by: §4.1.2.
 [20] (2019) Disentangled graph convolutional networks. In International Conference on Machine Learning, pp. 4212–4221. Cited by: §1, §3.2, §4.1.3.
 [21] (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9 (Nov), pp. 2579–2605. Cited by: §4.5.
 [22] (2017) Geometric deep learning on graphs and manifolds using mixture model CNNs. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 5115–5124. Cited by: §4.1.2.
 [23] (2010) Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 807–814. Cited by: §3.2.
 [24] (2014) DeepWalk: online learning of social representations. In Proceedings of the 20th ACM SIGKDD, pp. 701–710. Cited by: §4.1.2.
 [25] (2019) A structure-enriched neural network for network embedding. Expert Systems with Applications, pp. 300–311. Cited by: §2.
 [26] (2008) The graph neural network model. IEEE Transactions on Neural Networks 20 (1), pp. 61–80. Cited by: §2.
 [27] (2008) Collective classification in network data. AI Magazine 29 (3), pp. 93–93. Cited by: §4.1.1.
 [28] (2016) Meta-path guided embedding for similarity search in large-scale heterogeneous information networks. arXiv preprint arXiv:1610.09769. Cited by: §2.
 [29] (2018) Heterogeneous information network embedding for recommendation. IEEE Transactions on Knowledge and Data Engineering 31 (2), pp. 357–370. Cited by: §2.
 [30] (2018) Joint embedding of meta-path and meta-graph for heterogeneous information networks. In 2018 IEEE International Conference on Big Knowledge, pp. 131–138. Cited by: §1, §2.
 [31] (2015) LINE: large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, pp. 1067–1077. Cited by: §4.1.2.
 [32] (2011) Leveraging social media networks for classification. Data Mining and Knowledge Discovery 23 (3), pp. 447–478. Cited by: §4.1.1.
 [33] (2017) Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998–6008. Cited by: §3.4.
 [34] (2017) Graph attention networks. arXiv preprint arXiv:1710.10903. Cited by: §1, §2, §3.1, §4.1.2, §4.2.
 [35] (2019) Heterogeneous graph attention network. In The World Wide Web Conference, pp. 2022–2032. Cited by: §1.
 [36] (2012) Deep learning via semi-supervised embedding. In Neural Networks: Tricks of the Trade, pp. 639–655. Cited by: §4.1.2.
 [37] (2020) Estimating early fundraising performance of innovations via graph-based market environment model. In AAAI, pp. 6396–6403. Cited by: §2.
 [38] (2016) Revisiting semi-supervised learning with graph embeddings. arXiv preprint arXiv:1603.08861. Cited by: §4.1.2, §4.2.
 [39] (2003) Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 912–919. Cited by: §4.1.2.