1 Introduction
Numerous real-world data applications, such as social networks [JZhang19, WMWG17] and protein-protein interaction networks [FBSB17], exhibit the favorable property of a graph data structure. Meanwhile, handling graph data is also very challenging: each node has its own unique attributes, and the extensive connections among nodes convey complex but important information as well. Learning from the information of individual nodes and the connections among them simultaneously makes the task even more challenging. Traditional machine learning methods focus on the features of individual nodes, which limits their ability to process graph data. Graph neural networks (GNNs) for graph representation learning learn new feature vectors for nodes through a recursive neighborhood aggregation scheme
[XHLJ10], which in essence fuses node attributes and structural information. With the support of sufficient training samples, a rich body of successful supervised graph neural network models has been developed [KW17, VCCRLB18, YYRHL18]. However, labeled data is not always available in graph representation learning tasks, and those algorithms are not applicable in unsupervised learning settings. To alleviate the training-sample insufficiency problem, unsupervised graph representation learning has aroused extensive research interest. The task is to learn a low-dimensional representation for each graph node such that the representation preserves the graph topology and node content. The learned node representations can then be fed to conventional sample-based machine learning algorithms as well.
Most existing unsupervised graph representation learning models can be roughly grouped into factorization-based models and edge-based models. Factorization-based models capture the global graph information by factorizing the sample affinity matrix [zhang2016collective, yang2015network]. Those methods tend to ignore the node attributes and local neighborhood relationships, which usually contain important information. Edge-based models exploit local and higher-order neighborhood information through edge connections or random-walk paths: nodes tend to have similar representations if they are connected or co-occur in the same path [KW16, DN17, HYL17, GL16, PAS14]. Edge-based models, however, preserve only limited-order node proximity and lack a mechanism to preserve the global graph structure. The recently proposed Deep Graph Infomax (DGI) [velivckovic2018deep] model provides a novel direction that considers both global and local graph structure. DGI maximizes the mutual information between graph patch representations and the corresponding high-level summaries of graphs. It has shown performance competitive even with supervised graph neural networks on benchmark homogeneous graphs.
In this paper, we explore the mutual-information-based learning framework for heterogeneous graph representation problems. Networked data in the real world usually contain very complex structures (involving multiple types of nodes and edges), which can be formally modeled as heterogeneous information networks (HINs). In this paper, we use the terminologies "HIN" and "HG" (heterogeneous graph) interchangeably to refer to such complex networked data. Compared with homogeneous graphs, heterogeneous graphs contain more detailed information and rich semantics through complex connections among multi-typed nodes. Taking the bibliographic network in Figure 1 as an example, it contains three types of nodes (Author, Paper and Subject) as well as two types of edges (Write and Belong-to). Besides, the individual nodes themselves also carry abundant attribute information (e.g., paper textual contents).
Due to the diversity of node and edge types, the heterogeneous graph itself becomes more complex, and the diverse (direct or indirect) connections between nodes convey additional semantic information. In heterogeneous graph studies, the meta-path [SHYYW11] has been widely used to represent composite relations with different semantics. As illustrated in Figure 2, the relations between paper nodes can be expressed by PAP and PSP, which represent papers written by the same author and papers belonging to the same subject, respectively. GNNs initially proposed for homogeneous graphs face great challenges in handling these relations with different semantics in heterogeneous graphs.
To address the above challenges, we propose a novel meta-path-based unsupervised graph neural network model for heterogeneous graphs, namely Heterogeneous Deep Graph Infomax (HDGI). Our contributions in this paper can be summarized as follows:

This paper presents the first model to apply mutual information maximization to representation learning in heterogeneous graphs.

Our proposed method, HDGI, is a novel unsupervised graph neural network with an attention mechanism. It handles graph heterogeneity by applying attention over meta-paths and deals with the unsupervised setting through mutual information maximization.

Our experiments demonstrate that the representations learned by HDGI are effective for both node classification and node clustering tasks. Moreover, HDGI can even outperform state-of-the-art graph neural network models that have access to additional supervised label information.
The rest of this paper is organized as follows. We discuss the related work in Section II. In Section III, we present the problem formulation along with important terminologies used in our method. We propose HDGI in Section IV. We present the experimental results and analyses in Section V. Finally, we conclude the paper in Section VI.
2 Related Work
Graph representation learning. Graph representation learning has become an important topic [CWPZ18] because of the ubiquity of graphs in the real world. Since graphs contain rich structural information, many models [GL16, TQWZYM15] learn node representations based on the structure of the graph. DeepWalk [PAS14] feeds sets of random walks over the graph into Skip-Gram to learn node embeddings. Several methods [MPJZW16, WCWPZY17] attempt to retrieve structural information through matrix factorization. However, all the above methods are proposed for homogeneous graphs.
Heterogeneous graph learning. In order to handle the heterogeneity of graphs, metapath2vec [DCS17] samples random walks under the guidance of meta-paths and learns node embeddings through Skip-Gram in heterogeneous graphs. HIN2Vec [FCL17] learns the embedding vectors of nodes and meta-paths simultaneously while conducting prediction tasks. Wang et al. [WJSWCYY19] consider the attention mechanism in heterogeneous graph learning, where information from multiple meta-path-defined connections can be learned effectively. From the perspective of attributed graphs, SHNE [ZSC19] captures both structural closeness and unstructured semantic relations through joint optimization of heterogeneous Skip-Gram and deep semantic encoding.
Graph neural network. With the recent success of deep learning, graph neural networks (GNNs) [JiaweiZhang2019Tutorial] have made a lot of progress in graph representation learning. The core idea of a GNN is to aggregate the feature information of a node's neighbors through neural networks, learning new features that combine the node's own information with the corresponding structural information in the graph. Most successful GNNs are based on supervised learning, including GCN [KW17], GAT [VCCRLB18], GraphRNN [YYRHL18], SplineCNN [fey2018splinecnn], AdaGCN [sun2019adagcn] and ASGCN [huang2018adaptive]. Unsupervised GNNs can mainly be divided into two categories, i.e., random-walk-based [PAS14, GL16, KW16, DN17, HYL17] and mutual-information-based [velivckovic2018deep].
3 Problem Formulation
In this section, we define the concepts of heterogeneous graph and meta-path-based adjacency matrix, and formulate the problem of heterogeneous graph representation learning.
Definition 3.1 (Heterogeneous Graph (HG))
A heterogeneous graph is defined as $G = (V, E)$ with a node type mapping function $\phi: V \rightarrow \mathcal{A}$ and an edge type mapping function $\psi: E \rightarrow \mathcal{R}$. Each node $v \in V$ belongs to one particular node type in the node type set $\mathcal{A}$, and each edge $e \in E$ belongs to a particular edge type in the edge type set $\mathcal{R}$. The sets of node types and edge types in a heterogeneous graph satisfy $|\mathcal{A}| + |\mathcal{R}| > 2$.
Problem Definition. (Heterogeneous Graph Representation Learning): Given a heterogeneous graph $G$ and the set of node feature vectors $\mathcal{X}$, the representation learning task in $G$ is to learn a low-dimensional node representation $H$ that captures both the structural information from $G$ and the node attributes from $\mathcal{X}$. The learned representation $H$ can be applied to downstream graph-related tasks such as node classification and node clustering. Note that we only focus on learning the representations of one specific type of node in this paper. We represent this set of nodes as the target-type nodes $\mathcal{T}$.
In a heterogeneous graph, two neighboring nodes can be connected by different types of edges. Meta-paths, which specify the node types and edge types between two neighboring nodes in an HG, have been proposed to model such rich information [SHYYW11]. The meta-path is a well-known concept in graph studies, and we will not reintroduce its definition in this paper. Formally, we represent the set of meta-paths used in this paper as $\mathcal{P} = \{\Phi_1, \ldots, \Phi_m\}$, where $\Phi_i$ denotes the $i$-th meta-path type. For example, in Figure 2, Paper-Author-Paper (PAP) and Paper-Subject-Paper (PSP) are two types of meta-paths between papers.
Definition 3.2 (Meta-path-based Adjacency Matrix)
For a meta-path $\Phi_i$, if there exists a meta-path instance of $\Phi_i$ between nodes $v_p$ and $v_q$, we call $v_p$ and $v_q$ "connected neighbors" based on $\Phi_i$. Such neighborhood information can be represented by a meta-path-based adjacency matrix $A^{\Phi_i}$, where $A^{\Phi_i}_{pq} = 1$ if $v_p$ and $v_q$ are connected by meta-path $\Phi_i$, and $A^{\Phi_i}_{pq} = 0$ otherwise.
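As a concrete sketch (not the authors' code), a meta-path-based adjacency matrix can be obtained by chaining incidence matrices along the meta-path and binarizing the product; the function name `metapath_adjacency` and the toy matrices below are hypothetical:

```python
import numpy as np

def metapath_adjacency(*incidences):
    """Compose a chain of (bipartite) incidence matrices along a
    meta-path and binarize the result, e.g. PAP from the
    Paper-Author incidence matrix and its transpose."""
    M = incidences[0]
    for B in incidences[1:]:
        M = M @ B
    A = (M > 0).astype(int)
    np.fill_diagonal(A, 0)  # here we exclude trivial self-connections
    return A

# Toy example: 3 papers, 2 authors; papers 0 and 1 share author 0.
A_pa = np.array([[1, 0],
                 [1, 0],
                 [0, 1]])
A_pap = metapath_adjacency(A_pa, A_pa.T)  # Paper-Author-Paper
```

Whether self-connections are kept is a design choice; they are dropped here for illustration.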
4 HDGI Methodology
A high-level illustration of the proposed Heterogeneous Deep Graph Infomax (HDGI) model is shown in Figure 3. We summarize the notations used in the model description in Table 1. In the following, we elaborate on the four major components of HDGI: (1) the meta-path-based local representation encoder, (2) the global representation encoder, (3) the negative samples generator and (4) the mutual-information-based discriminator.
| Symbol | Interpretation |
|---|---|
| $\Phi_i$ | Meta-path |
| $A^{\Phi_i}$ | Meta-path-based adjacency matrix |
| $\mathcal{X}$ | The set of node feature vectors |
| $X$ | The initial node feature matrix |
| $\mathcal{T}$ | The set of nodes with the target type |
| $N$ | The number of nodes in $\mathcal{T}$ |
| $G$ | The given heterogeneous graph |
| $\mathcal{D}$ | Mutual-information-based discriminator |
| $\mathcal{C}$ | Negative samples generator |
| $\mathcal{R}$ | Global representation encoder |
| $\vec{s}$ | The graph-level summary vector |
| $H^{\Phi_i}$ | Node-level representations |
| $\vec{q}$ | Semantic-level attention vector |
| $\beta_{\Phi_i}$ | Attention weight of meta-path $\Phi_i$ |
| $\widetilde{H}$ | Final negative node representations |
| $H$ | Final positive node representations |
4.1 HDGI Architecture Overview
The input of HDGI is a heterogeneous graph $G$ along with the set of node feature vectors $\mathcal{X}$ and the meta-path set $\mathcal{P}$. Based on the original graph and the meta-path set, the set of meta-path-based adjacency matrices $\{A^{\Phi_1}, \ldots, A^{\Phi_m}\}$ can be calculated. The meta-path-based local representation encoder has a hierarchical structure: it learns individual node representations with respect to each meta-path-based adjacency matrix and then aggregates them through semantic-level attention. Given the output node representations $H$ from the meta-path-based local representation encoder, the global representation encoder outputs a graph-level summary vector $\vec{s}$. The negative samples generator is responsible for generating negative nodes for the graph $G$; these negative nodes, along with the positive nodes from $G$, are used to train the discriminator with the objective of maximizing the mutual information between positive nodes and the graph-level summary vector $\vec{s}$.
4.2 Meta-path-based local representation encoder
The meta-path-based node encoder has a two-level structure. We first derive a node representation $H^{\Phi_i}$ from each meta-path-based adjacency matrix $A^{\Phi_i}$, respectively. After that, the node representations based on all meta-paths are aggregated by an attention mechanism.
4.2.1 Node-level learning
Each $A^{\Phi_i}$ can be viewed as a homogeneous graph. At this step, our target is to derive a node representation $H^{\Phi_i}$ containing the information of the initial node features and $A^{\Phi_i}$. The initial node feature matrix $X$ is constructed by stacking the feature vectors in $\mathcal{X}$. In HDGI, we use GCN [KW17] and GAT [VCCRLB18], respectively, as candidate components in the local representation encoder.
Graph Convolutional Network (GCN) [KW17] introduces a spectral graph convolution operator for graph representation learning. GCN adopts a first-order approximation, where the node representations learned by GCN are:

$$H^{\Phi_i} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A}^{\Phi_i} \tilde{D}^{-\frac{1}{2}} X W^{\Phi_i}\right) \qquad (1)$$

where $\tilde{A}^{\Phi_i} = A^{\Phi_i} + I_N$ and $\tilde{D}$ is the diagonal degree matrix with $\tilde{D}_{jj} = \sum_k \tilde{A}^{\Phi_i}_{jk}$. Matrix $W^{\Phi_i}$ is the filter parameter matrix, which is not shared between different $A^{\Phi_i}$.
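A minimal numpy sketch of this propagation rule (an illustration, not the paper's PyTorch implementation; `gcn_layer` and the toy inputs are hypothetical):

```python
import numpy as np

def gcn_layer(A, X, W):
    """One GCN propagation step with the renormalization trick:
    H = ReLU(D^{-1/2} (A + I) D^{-1/2} X W)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    # symmetric normalization via row/column scaling
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ X @ W, 0)  # ReLU non-linearity

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])       # a meta-path-based adjacency matrix
X = rng.normal(size=(3, 4))     # initial node features
W = rng.normal(size=(4, 2))     # filter parameters for this meta-path
H = gcn_layer(A, X, W)
```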
Graph Attention Network (GAT) [VCCRLB18] updates node representations by aggregating information from their neighbors, including the node itself. The learned hidden representation of node $v_j$ can be represented as:

$$\vec{h}_j^{\Phi_i} = \Big\Vert_{k=1}^{K} \sigma\Big(\sum_{v_t \in \mathcal{N}_j^{\Phi_i}} \alpha_{jt}^{k} W^{\Phi_i}_{k} \vec{x}_t\Big) \qquad (2)$$

where $W^{\Phi_i}_{k}$ is a meta-path-specific weight matrix of the shared linear transformation, $\mathcal{N}_j^{\Phi_i}$ is the set of $\Phi_i$-based neighbors of node $v_j$, $\alpha_{jt}^{k}$ is the attention weight between two connected nodes based on $\Phi_i$, and $K$ is the number of heads in the multi-head attention mechanism. In the experiment section, we show the performance, along with an analysis, of using these two GNNs as the node-level encoder.
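For intuition, a single-head graph attention aggregation can be sketched in numpy as follows (a simplified illustration under assumed shapes, not the authors' GAT code; `gat_layer` is a hypothetical name):

```python
import numpy as np

def leaky_relu(z, slope=0.2):
    return np.where(z > 0, z, slope * z)

def gat_layer(A, X, W, a):
    """Single-head graph attention aggregation (numpy sketch).
    a has size 2 * out_dim; attention is computed only over
    connected pairs, including the self-neighbor."""
    H = X @ W                        # shared linear transformation
    N = A.shape[0]
    A_self = A + np.eye(N)           # include the self-neighbor
    out = np.zeros_like(H)
    for i in range(N):
        nbrs = np.flatnonzero(A_self[i])
        e = leaky_relu(np.array([a @ np.concatenate([H[i], H[j]])
                                 for j in nbrs]))
        alpha = np.exp(e - e.max())
        alpha /= alpha.sum()         # softmax over the neighborhood
        out[i] = np.tanh(alpha @ H[nbrs])
    return out

rng = np.random.default_rng(5)
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])
X = rng.normal(size=(3, 4))
W = rng.normal(size=(4, 2))
a = rng.normal(size=4)               # attention vector, size 2 * out_dim
H_out = gat_layer(A, X, W, a)
```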
For each meta-path $\Phi_i$, a node-level encoder:

$$H^{\Phi_i} = f^{\Phi_i}\left(X, A^{\Phi_i}\right) \qquad (3)$$

is learned in order to output the high-level representation $H^{\Phi_i}$. After the node-level learning, we obtain the set of node representations $\{H^{\Phi_1}, \ldots, H^{\Phi_m}\}$ based on meta-path connections with different semantics.
4.2.2 Semantic-level learning
The representation learned from the structural information of each meta-path contains only the semantic-specific information of the heterogeneous graph. In order to obtain more general node representations, we need to combine these representations $\{H^{\Phi_1}, \ldots, H^{\Phi_m}\}$. The key issue in this combination is determining how much each meta-path should contribute to the final representations; in other words, we need to learn the weights that should be assigned to different meta-paths. Here we add a semantic attention layer to learn these weights:

$$\left(\beta_{\Phi_1}, \ldots, \beta_{\Phi_m}\right) = \mathrm{att}_{sem}\left(H^{\Phi_1}, \ldots, H^{\Phi_m}\right) \qquad (4)$$
We then fuse the representations of multiple semantics according to the learned weights $\beta_{\Phi_i}$. Our semantic attention layer is inspired by HAN [WJSWCYY19], but the learned meta-path weights should make the final representations reflect the fact that a node belongs to the original graph, without any bias from known labels. HAN utilizes classification cross-entropy as its loss function, so its learning direction is guided by the known labels in the training set. In contrast, the attention weights learned in HDGI are guided by a binary cross-entropy loss that indicates whether a node belongs to the original graph. Therefore, the weights learned in HDGI serve the existence of a node, and because no classification label is involved, the weights carry no bias from known labels. In order to make representations based on different meta-paths comparable, we first transform each node's representation with a linear transformation, parameterized by a shared weight matrix $W_{sem}$ and a shared bias vector $\vec{b}$. The importance of the representations based on different meta-paths is measured by a shared attention vector $\vec{q}$. The importance $w_{\Phi_i}$ of meta-path $\Phi_i$ can be calculated as:

$$w_{\Phi_i} = \frac{1}{|\mathcal{T}|} \sum_{v_j \in \mathcal{T}} \vec{q}^{\,T} \tanh\left(W_{sem} \vec{h}_j^{\Phi_i} + \vec{b}\right) \qquad (5)$$
According to the importance of each meta-path, we normalize the weights using the softmax function:

$$\beta_{\Phi_i} = \frac{\exp\left(w_{\Phi_i}\right)}{\sum_{j=1}^{m} \exp\left(w_{\Phi_j}\right)} \qquad (6)$$
Once obtained, the weights of the different meta-paths are used as coefficients in a linear combination of the corresponding representations:

$$H = \sum_{i=1}^{m} \beta_{\Phi_i} H^{\Phi_i} \qquad (7)$$
The representations $H$ serve as the final output local features. It should be mentioned that all parameters in the meta-path-based local representation encoder are shared between the positive nodes and the negative nodes generated by the negative samples generator, which we introduce later. The global representation encoder also leverages the representations $H$ to output the graph-level summary, which is described in the following part.
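The semantic-level attention described above can be sketched in numpy as follows (an illustrative sketch under assumed shapes; `semantic_attention` and the toy inputs are hypothetical, not the paper's implementation):

```python
import numpy as np

def semantic_attention(H_list, W, b, q):
    """Fuse meta-path-specific representations: score each meta-path by
    the mean over nodes of q . tanh(W h + b), softmax the scores, then
    linearly combine the representation matrices."""
    w = np.array([np.mean(np.tanh(H @ W.T + b) @ q) for H in H_list])
    beta = np.exp(w - w.max())
    beta /= beta.sum()               # softmax over meta-paths
    H_fused = sum(b_i * H for b_i, H in zip(beta, H_list))
    return H_fused, beta

rng = np.random.default_rng(1)
# two meta-path-specific representation matrices, e.g. PAP and PSP
H_pap, H_psp = rng.normal(size=(5, 8)), rng.normal(size=(5, 8))
W = rng.normal(size=(8, 8))          # shared linear transformation
b = rng.normal(size=8)               # shared bias vector
q = rng.normal(size=8)               # shared attention vector
H, beta = semantic_attention([H_pap, H_psp], W, b, q)
```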
4.3 Global Representation Encoder
The learning objective of HDGI is to maximize the mutual information between the local representations and the global representation. The local representations of nodes are contained in $H$, and we need the summary vector $\vec{s}$ to represent the global information of the entire heterogeneous graph. Based on $H$, we examine three candidate encoder functions:
Averaging encoder function. Our first candidate encoder function is the averaging operator, where we simply take the mean of the node representations to output the graph-level summary vector $\vec{s}$:

$$\vec{s} = \sigma\left(\frac{1}{N} \sum_{j=1}^{N} \vec{h}_j\right) \qquad (8)$$
Pooling encoder function. In this pooling encoder function, each node's vector is independently fed through a fully-connected layer, and an element-wise max-pooling operator is applied to summarize the information from the node set:

$$\vec{s} = \max\left(\left\{\sigma\left(W_{pool} \vec{h}_j + \vec{b}_{pool}\right) \mid v_j \in \mathcal{T}\right\}\right) \qquad (9)$$

where $\max$ denotes the element-wise max operator and $\sigma$ is a nonlinear activation function.
Set2vec encoder function. The final encoder function we examine is Set2vec [VBK16], which is based on an LSTM architecture. The original set2vec in [VBK16] works on ordered node sequences, but here we need a summary of the graph containing comprehensive information from each node rather than merely the graph structure. Therefore, we apply the LSTM to a random permutation of the unordered node set.
Among these functions, the simple averaging function achieves the best performance in our experiments. We report the results based on the different functions in Figure 5.
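The averaging and pooling readouts can be sketched in a few lines (an illustration with assumed names `avg_readout` and `maxpool_readout`, using the logistic sigmoid as the non-linearity):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def avg_readout(H):
    """Averaging encoder: sigmoid of the mean node representation."""
    return sigmoid(H.mean(axis=0))

def maxpool_readout(H, W, b):
    """Pooling encoder: element-wise max after a fully-connected layer."""
    return np.max(sigmoid(H @ W + b), axis=0)

rng = np.random.default_rng(2)
H = rng.normal(size=(6, 4))          # local node representations
s_avg = avg_readout(H)
s_max = maxpool_readout(H, rng.normal(size=(4, 4)), rng.normal(size=4))
```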
4.4 HDGI Learning
4.4.1 Negative samples generator
The negative samples generator is responsible for generating negative samples (nodes that do not exist in the original graph), which are used to train the mutual-information-based discriminator. DIM [HFMGBTB19] produces negative patch representations by simply using another image from the training set as a fake input. However, the heterogeneous graph representation learning tasks we face are normally single-graph settings. Here, we borrow the idea from [velivckovic2018deep] and extend it to heterogeneous graphs.
As our target is to maximize the mutual information between positive nodes and the graph-level summary vector, the generated negative samples will affect the structural information captured by the model. We therefore need high-quality negative samples that preserve the structural information precisely. In a heterogeneous graph $G$, we have rich and complex structural information from the set of meta-path-based adjacency matrices. In our negative samples generator:

$$\left(\widetilde{X}, \{A^{\Phi_i}\}_{i=1}^{m}\right) = \mathcal{C}\left(X, \{A^{\Phi_i}\}_{i=1}^{m}\right) \qquad (10)$$

we keep all meta-path-based adjacency matrices unchanged, which keeps the overall structure of $G$ stable. We then shuffle the rows of the initial node feature matrix $X$, which changes the indices of the nodes so as to corrupt the node-level connections among them. The structure of the whole graph does not change, but the initial feature attached to each node does. We provide a simple example illustrating the procedure of generating negative samples in Figure 4.
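The row-shuffling corruption is simple to sketch (the function name `corrupt_features` is hypothetical):

```python
import numpy as np

def corrupt_features(X, rng):
    """Negative-sample generator: row-shuffle the feature matrix while
    every meta-path-based adjacency matrix is left untouched."""
    perm = rng.permutation(X.shape[0])
    return X[perm]

rng = np.random.default_rng(3)
X = np.arange(12.0).reshape(4, 3)    # toy feature matrix, 4 nodes
X_neg = corrupt_features(X, rng)     # same rows, permuted node indices
```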
4.4.2 Mutual information based discriminator
According to the proof in Mutual Information Neural Estimation [BBROBCD18], mutual information can be estimated by gradient descent over neural networks. Here, we estimate the mutual information by training a discriminator $\mathcal{D}$ to distinguish between $(\vec{h}_j, \vec{s})$ and $(\widetilde{h}_j, \vec{s})$. The sample $(\vec{h}_j, \vec{s})$ is denoted as positive because node $v_j$ belongs to the original graph, and $(\widetilde{h}_j, \vec{s})$ is denoted as negative because the node is a generated fake one. The discriminator $\mathcal{D}$ is a binary classifier:

$$\mathcal{D}\left(\vec{h}_j, \vec{s}\right) = \sigma\left(\vec{h}_j^{\,T} M \vec{s}\right) \qquad (11)$$
Based on the relationship [HFMGBTB19] between the Jensen-Shannon divergence and mutual information, we can maximize the mutual information with the binary cross-entropy loss of the discriminator:

$$\mathcal{L} = \frac{1}{2N}\left(\sum_{j=1}^{N} \log \mathcal{D}\left(\vec{h}_j, \vec{s}\right) + \sum_{j=1}^{N} \log\left(1 - \mathcal{D}\left(\widetilde{h}_j, \vec{s}\right)\right)\right) \qquad (12)$$
The above objective can be optimized through gradient descent, and the node representations are obtained once the optimization is complete.
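A bilinear discriminator plus the binary cross-entropy objective can be sketched as follows (a numpy illustration written as a loss to minimize; `discriminate`, `infomax_loss` and the toy inputs are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discriminate(H, s, M):
    """Bilinear discriminator: sigmoid(h^T M s) for every node row h."""
    return sigmoid(H @ M @ s)

def infomax_loss(H_pos, H_neg, s, M, eps=1e-12):
    """Binary cross-entropy over positive and negative node/summary
    pairs (negated objective, so lower is better)."""
    p_pos = discriminate(H_pos, s, M)
    p_neg = discriminate(H_neg, s, M)
    return -0.5 * (np.log(p_pos + eps).mean()
                   + np.log(1.0 - p_neg + eps).mean())

rng = np.random.default_rng(4)
H_pos = rng.normal(size=(5, 4))      # positive node representations
H_neg = H_pos[rng.permutation(5)]    # corrupted (negative) counterparts
M = rng.normal(size=(4, 4))          # trainable bilinear scoring matrix
s = sigmoid(H_pos.mean(axis=0))      # graph-level summary vector
loss = infomax_loss(H_pos, H_neg, s, M)
```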
5 Evaluation
In this section, we evaluate the proposed HDGI framework on three real-world heterogeneous graphs. We first introduce the datasets and experimental settings. Then we report the model performance compared to other state-of-the-art competitive methods. The evaluation results show the superiority of our developed model.
| Dataset | Node type | # Nodes | Edge type | # Edges | Meta-path |
|---|---|---|---|---|---|
| ACM | Paper (P) | 3025 | Paper-Author | 9744 | PAP |
|  | Author (A) | 5835 | Paper-Subject | 3025 | PSP |
|  | Subject (S) | 56 |  |  |  |
| IMDB | Movie (M) | 4275 | Movie-Actor | 12838 | MAM |
|  | Actor (A) | 5431 | Movie-Director | 4280 | MDM |
|  | Director (D) | 2082 | Movie-Keyword | 20529 | MKM |
|  | Keyword (K) | 7313 |  |  |  |
| DBLP | Author (A) | 4057 | Author-Paper | 19645 | APA |
|  | Paper (P) | 14328 | Paper-Conference | 14328 | APCPA |
|  | Conference (C) | 20 | Paper-Term | 88420 | APTPA |
|  | Term (T) | 8789 |  |  |  |
5.1 Datasets
We evaluate the performance of HDGI on three heterogeneous graphs; their detailed descriptions are shown in Table 2.

DBLP: The DBLP dataset we use comes from [GLFSH09]. We choose Author as the target node type, and authors are divided into 4 areas: database, data mining, information retrieval, and machine learning. We use the area an author belongs to as the label. The initial features of the target nodes are bag-of-words embeddings based on author profiles. The meta-paths we define in DBLP are Author-Paper-Author (APA), Author-Paper-Conference-Paper-Author (APCPA) and Author-Paper-Term-Paper-Author (APTPA).

ACM: The ACM dataset is proposed by [WJSWCYY19]. The target nodes we choose are Papers, which are categorized into 3 classes: database, wireless communication, and data mining. We extract 2 meta-paths from this graph: Paper-Author-Paper (PAP) and Paper-Subject-Paper (PSP). The features of the ACM dataset are TF-IDF-based embeddings of paper keywords, with dimension 1870.

IMDB: This is a knowledge graph about movies. Movies belonging to three categories (Action, Comedy, and Drama) are used as target nodes, and the meta-paths we choose are Movie-Actor-Movie (MAM), Movie-Director-Movie (MDM) and Movie-Keyword-Movie (MKM). The features of the IMDB dataset are composed of {color, title, language, keywords, country, rating, year} with a TF-IDF encoding. The feature dimension of an IMDB movie node is 6334.
| Dataset | Train | Metric | Raw Feature | Metapath2vec | DeepWalk | DeepWalk+F | DGI | HDGI-A | HDGI-C | GCN | GAT | HAN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ACM | 20% | Micro-F1 | 0.8590 | 0.6125 | 0.5503 | 0.8785 | 0.9104 | 0.9178 | 0.9227 | 0.9250 | 0.9178 | 0.9267 |
|  |  | Macro-F1 | 0.8585 | 0.6158 | 0.5582 | 0.8789 | 0.9104 | 0.9170 | 0.9232 | 0.9248 | 0.9172 | 0.9268 |
|  | 80% | Micro-F1 | 0.8820 | 0.6378 | 0.5788 | 0.8965 | 0.9175 | 0.9333 | 0.9379 | 0.9317 | 0.9250 | 0.9400 |
|  |  | Macro-F1 | 0.8802 | 0.6390 | 0.5825 | 0.8960 | 0.9155 | 0.9330 | 0.9379 | 0.9317 | 0.9248 | 0.9403 |
| DBLP | 20% | Micro-F1 | 0.7552 | 0.6985 | 0.2805 | 0.7163 | 0.8975 | 0.9062 | 0.9175 | 0.8192 | 0.8244 | 0.8992 |
|  |  | Macro-F1 | 0.7473 | 0.6874 | 0.2302 | 0.7063 | 0.8921 | 0.8988 | 0.9094 | 0.8128 | 0.8148 | 0.8923 |
|  | 80% | Micro-F1 | 0.8325 | 0.8211 | 0.3079 | 0.7860 | 0.9150 | 0.9192 | 0.9226 | 0.8383 | 0.8540 | 0.9100 |
|  |  | Macro-F1 | 0.8152 | 0.8014 | 0.2401 | 0.7799 | 0.9052 | 0.9106 | 0.9153 | 0.8308 | 0.8476 | 0.9055 |
| IMDB | 20% | Micro-F1 | 0.5112 | 0.3985 | 0.3913 | 0.5262 | 0.5728 | 0.5482 | 0.5893 | 0.5931 | 0.5985 | 0.6077 |
|  |  | Macro-F1 | 0.5107 | 0.4012 | 0.3888 | 0.5293 | 0.5690 | 0.5522 | 0.5914 | 0.5869 | 0.5944 | 0.6027 |
|  | 80% | Micro-F1 | 0.5900 | 0.4203 | 0.3953 | 0.6017 | 0.6003 | 0.5861 | 0.6592 | 0.6467 | 0.6540 | 0.6600 |
|  |  | Macro-F1 | 0.5884 | 0.4119 | 0.4001 | 0.6049 | 0.5950 | 0.5834 | 0.6646 | 0.6457 | 0.6550 | 0.6586 |

(Available data per method: Raw Feature uses X; Metapath2vec and DeepWalk use A; DeepWalk+F, DGI, HDGI-A and HDGI-C use X, A; GCN, GAT and HAN use X, A, Y, where X denotes node features, A adjacency information, and Y labels.)
5.2 Experimental Setup
There are many ways to measure the quality of learned representations; the most commonly used tasks in graph-related research are node classification [PAS14, GL16, HYL17b] and node clustering [DCS17, WJSWCYY19]. We evaluate HDGI on both kinds of tasks.
5.2.1 Comparison methods
We compare our method HDGI with the following state-of-the-art methods, including both supervised and unsupervised ones:
Unsupervised methods

Raw Feature: The bag-of-words embeddings, which we test directly in the downstream tasks.

Metapath2vec [DCS17]: A meta-path-based heterogeneous graph embedding method, but it can only handle one specific meta-path at a time.

DeepWalk [PAS14]: A random-walk-based graph embedding method, but it is designed for homogeneous graphs.

DeepWalk+Raw Feature (DeepWalk+F): We concatenate the embeddings learned by DeepWalk with the bag-of-words embeddings as the final representations.

DGI [velivckovic2018deep]: A mutual-information-based unsupervised learning method proposed for homogeneous graphs.

HDGI-C: The proposed method using a graph convolutional network to capture local representations.

HDGI-A: The proposed method using the attention mechanism (GAT [VCCRLB18]) to learn local representations.
Supervised methods

GCN [KW17]: A semi-supervised method for node classification in homogeneous graphs.

GAT [VCCRLB18]: GAT applies the attention mechanism to homogeneous graphs and requires a supervised setting.

HAN [WJSWCYY19]: HAN employs node-level attention and semantic-level attention to capture the information from all meta-paths.
For methods designed for homogeneous graphs, including DeepWalk, DGI, GCN and GAT, we test both on the graph with heterogeneity ignored and on the graphs constructed from each meta-path-based adjacency matrix, and report the best result. Metapath2vec can only handle one kind of meta-path, so we test all meta-paths and report the best results.
5.2.2 Reproducibility
For the proposed HDGI, including HDGI-C and HDGI-A, we optimize the model with Adam [KB15]. The dimension of the node-level representations in HDGI-C is set to 512, and the dimension of the semantic-level attention vector is set to 8. For HDGI-A, we set the dimension of the node-level representations to 64 and the number of attention heads to 4; the dimension of the semantic-level attention vector is set to 8 as well. We implement our model in PyTorch and conduct the experiments on a server with 4 GTX 1080 Ti GPUs. Code is available at https://github.com/YuxiangRen/HeterogeneousDeepGraphInfomax.

| Method | ACM NMI | ACM ARI | DBLP NMI | DBLP ARI | IMDB NMI | IMDB ARI |
|---|---|---|---|---|---|---|
| DeepWalk | 25.47 | 18.24 | 7.40 | 5.30 | 1.23 | 1.22 |
| Raw Feature | 32.62 | 30.99 | 11.21 | 6.98 | 1.06 | 1.17 |
| DeepWalk+F | 32.54 | 31.20 | 11.98 | 6.99 | 1.23 | 1.22 |
| Metapath2vec | 27.59 | 24.57 | 34.30 | 37.54 | 1.15 | 1.51 |
| DGI | 41.09 | 34.27 | 59.23 | 61.85 | 0.56 | 2.60 |
| HDGI-A | 57.05 | 50.86 | 52.12 | 49.86 | 0.80 | 1.29 |
| HDGI-C | 54.35 | 49.48 | 60.76 | 62.67 | 1.87 | 3.70 |
5.3 Results
5.3.1 Node classification task
In the node classification task, we train a logistic regression classifier on the representations produced by the unsupervised learning methods, while the supervised methods output classification results as end-to-end models. We conduct the experiments with two different training ratios (20% and 80%). To keep the results stable, we repeat the classification process 10 times and report the average Macro-F1 and Micro-F1 of all methods in Table 3.
In Table 3, we observe that HDGI-C outperforms all other unsupervised learning methods. Compared with the supervised learning methods designed for homogeneous graphs, such as GCN and GAT, HDGI also performs much better, which shows that type information and semantic information in heterogeneous graphs are very important and need to be handled carefully instead of being ignored. HDGI is also competitive with the results reported for the supervised model HAN, which is designed for heterogeneous graphs. The reason may be that HDGI captures more global structural information when mutual information plays a strong role in reconstructing the representation, while supervised-loss-based GNNs overemphasize the direct neighborhoods [velivckovic2018deep]. This also suggests that features learned through supervised learning on graph structures may have limitations, arising either from the structure or from a task-based preference, which can badly affect learning representations from a more general perspective.
5.3.2 Node clustering task
In the node clustering task, we use K-Means to conduct clustering based on the learned representations, with the number of clusters set to the number of node classes. We do not use any label in this unsupervised task and compare among all unsupervised learning methods. To keep the results stable, we also repeat the clustering process 10 times and report the average NMI and ARI of all methods in Table 4. DeepWalk does not perform well because it cannot handle the heterogeneity of graphs. Metapath2vec cannot handle diverse semantic information simultaneously, which makes its representations less effective. The node clustering results also demonstrate that HDGI learns effective representations by considering the structural information, the semantic information and the individual node information simultaneously.
5.3.3 HDGI-A vs. HDGI-C
The comparison between HDGI-C and HDGI-A on the node classification tasks reveals something interesting. HDGI-C performs better than HDGI-A in all experiments, which suggests that graph convolution works better than the graph attention mechanism at capturing local representations here. We conjecture that the reason is that the graph attention mechanism is strictly limited to the direct neighbors of a node, while graph convolution, which considers hierarchical dependencies, can see farther than graph attention. This analysis is also supported by the results of the clustering task.
5.3.4 Comparison between different global representation encoder functions
We present the results of HDGI-C with different global representation encoder functions on the node classification task in Figure 5. The simple averaging function performs best compared with the other functions, but this advantage is very subtle; in fact, each function performs well on our experimental datasets. For larger and more complex heterogeneous graphs, however, our view is consistent with DGI [velivckovic2018deep] that a specialized and sophisticated function may perform better. The selection and design of the global encoder function for heterogeneous graphs of different scales and structures is an open question worthy of further study.
6 Conclusion
In this paper, we propose an unsupervised graph neural network, HDGI, which learns node representations in heterogeneous graphs. HDGI combines several state-of-the-art techniques. It employs convolution-style GNNs along with a semantic-level attention mechanism to capture individual nodes' local representations. By maximizing the local-global mutual information via gradient descent over neural networks, HDGI learns high-level node representations containing graph-level structural information. It additionally exploits the structure of meta-paths to analyze the connection semantics in heterogeneous graphs. Node attributes are simultaneously fused into the representations through the local-global mutual information maximization. We demonstrate the effectiveness of the learned representations across a variety of node classification and clustering tasks on three heterogeneous graphs. HDGI is particularly competitive in node classification tasks with state-of-the-art supervised methods, even though they have access to additional supervised label information. We are optimistic that mutual information maximization will be a promising direction for unsupervised representation learning.
References