In information retrieval, graph neural networks (GNNs), as a graph learning and representation method, have been applied in recommendation (Chang et al., 2020; Wang et al., 2019b; Chen et al., 2021b; Chang et al., 2021; Zhao et al., 2017) and knowledge representation (Gong et al., 2020; Cao et al., 2021; Hao et al., 2020; Mei et al., 2018). Most GNNs focus on homogeneous graphs, while a growing body of research (Schlichtkrull et al., 2018; Wang et al., 2019c; Jin et al., 2020; Fang et al., 2019; Yu et al., 2021) shows that the real world, with its complex interactions, e.g., social networks (Tang et al., 2008; Wang et al., 2019a), can be better modeled by heterogeneous graphs (a.k.a. heterogeneous information networks). Taking recommender systems as an example: a recommender system can be regarded as a bipartite graph consisting of users and items, and much auxiliary information also has a complex network structure that can be naturally modeled as a heterogeneous graph. Besides, some works (Guo and Liu, 2015; Liu et al., 2020; Bi et al., 2020; Jiang et al., 2018; Ghazimatin, 2020; Fan et al., 2019; Li et al., 2020; Chen et al., 2021a) have achieved SOTA performance by designing heterogeneous graph neural networks (HGNNs). In fact, HGNNs can utilize the complex structure and rich semantic information (Wang et al., 2020), and have been widely applied in many fields, such as e-commerce (Zhao et al., 2019; Ji et al., 2021b) and security (Sun et al., 2020; Hu et al., 2019).
Table 1 (excerpt). Our framework categorizes HGNNs along Heterogeneous Graph Transformation (homogenization of the heterogeneous graph, relation subgraph extraction, meta-path subgraph extraction) and Heterogeneous Message Passing (direct-aggregation, dual-aggregation); e.g., RGCN (Schlichtkrull et al., 2018) and HGConv (Yu et al., 2020) use relation subgraphs, while HAN (Wang et al., 2019c) and HPN (Ji et al., 2021a) use meta-path subgraphs.
However, it is increasingly difficult for researchers in the field to compare existing methods and contribute novel ones. The reason is that previous evaluations are conducted at the model level, so we cannot accurately assess the importance of each component under diverse architecture designs and application scenarios. To evaluate HGNNs at the module level, we first propose a unified framework of existing HGNNs that consists of three key components, obtained by systematically analyzing their underlying graph data transformation and aggregation procedures, as shown in Figure 1 (right). The first component, Heterogeneous Linear Transformation, maps features to a shared feature space and is common to general HGNNs. Summarizing the transformed graphs used in different HGNNs, we abstract the second component, Heterogeneous Graph Transformation, which contains relation subgraph extraction, meta-path subgraph extraction, and homogenization of the heterogeneous graph. With that, we can explicitly decouple the selection of the receptive field from the message passing procedure. Hence the third component, the Heterogeneous Message Passing Layer, can focus on the key procedure involving diverse graph convolution layers. As shown in Table 1, our framework both categorizes existing approaches and facilitates the exploration of novel ones.
With the help of the unified framework, we define a design space for HGNNs, which consists of the Cartesian product of different design dimensions, following GraphGym (You et al., 2020). GraphGym already provides analysis results of design dimensions for GNNs; to figure out whether the guidelines distilled for GNNs remain effective, our design space retains the design dimensions it shares with GraphGym. Besides, to capture heterogeneity, we distill three model families according to the Heterogeneous Graph Transformation in our unified framework. Based on the design space, we build a platform, Space4HGNN (https://github.com/BUPT-GAMMA/Space4HGNN), which offers reproducible model implementations, standardized evaluation for diverse architecture designs, and an easy-to-extend API to plug in more architecture design options. We believe Space4HGNN can greatly facilitate research on HGNNs. Specifically, it lets us quickly check the effect of a trick or architectural design, easily innovate on HGNN models, and apply HGNNs in other interesting scenarios. In addition, the platform can serve as the basis of neural architecture search for HGNNs in future work.
With the platform Space4HGNN, we conduct extensive experiments to analyze the design dimensions. We first evaluate the design dimensions shared with GraphGym using uniform random search, and find that they are partly effective in HGNNs. More importantly, to accurately judge the diverse architecture designs in HGNNs, we comprehensively analyze the unique design dimensions in HGNNs and sum up the following insights:
Different model families suit different scenarios: the meta-path model family has an advantage in the node classification task, and the relation model family performs outstandingly in the link prediction task.
The preference for different design dimensions may be opposite across tasks. For example, the node classification task prefers applying L2 Normalization and removing Batch Normalization, while the better choices on the same datasets for the link prediction task are the opposite.
Graph convolution should be selected carefully, as the best choice varies greatly across datasets. Besides, design dimensions such as the number of message passing layers, the hidden dimension, and dropout are all important.
Finally, we distill a condensed design space according to the analysis results, reducing the scale of the original space by about 500 times. We evaluate it on a new benchmark, HGB (Lv et al., 2021), and demonstrate its effectiveness.
We summarize our contributions as follows:
To the best of our knowledge, we are the first to propose a unified framework and define a design space for HGNNs. They offer a module-level perspective and help us evaluate the influence of different design dimensions, such as high-level architectural designs and design principles.
We release Space4HGNN, a platform for the design space of HGNNs, which offers modularized components, standardized evaluation, and reproducible implementations of HGNNs. Based on Space4HGNN, we conduct extensive experimental evaluations to analyze HGNNs comprehensively and provide findings behind the results. The platform allows researchers to make further findings and explore more robust and generalized models.
Following the findings, we distill a condensed design space. Experimental results on a new benchmark HGB (Lv et al., 2021) show that we can easily achieve state-of-the-art performance with a simple random search in the condensed space.
2. Related Work
Table 2. Notation:
|$e_{uv}$||The edge from node $u$ to node $v$|
|$\mathcal{N}(v)$||The neighbors of node $v$|
|$h_v$||The hidden representation of node $v$|
|$W$||The trainable weight matrix|
|$\phi(\cdot)$||The node type mapping function|
|$\psi(\cdot)$||The edge type mapping function|
|$M(\cdot)$||The message function|
2.1. Heterogeneous Graph Neural Network
Different from GNNs, HGNNs need to handle structural heterogeneity and capture the rich semantics of heterogeneous graphs. According to their strategies for handling heterogeneity, HGNNs can be roughly classified into two categories, shown in Table 1: HGNNs based on one-hop neighbor aggregation (similar to traditional GNNs) and HGNNs based on meta-path neighbor aggregation (to mine semantic information).
2.1.1. HGNN based on one-hop neighbor aggregation
To deal with heterogeneity, this kind of HGNN usually contains type-specific convolutions. Similar to GNNs, the aggregation procedure occurs over one-hop neighbors. As the earliest work and an extension of GCN (Kipf and Welling, 2016), RGCN (Schlichtkrull et al., 2018) assigns different weight matrices to different relation types and aggregates one-hop neighbors. As more GNN variants appeared, homogeneous GNNs inspired further HGNNs: HGConv (Yu et al., 2020) dual-aggregates one-hop neighbors based on GATConv (Veličković et al., 2017), and a recent work, SimpleHGN (Lv et al., 2021), designs relation-type weight matrices and embeddings to characterize heterogeneous attention over each edge. Besides, some earlier models, like HGAT (Linmei et al., 2019), HetSANN (Hong et al., 2020), and HGT (Hu et al., 2020), modify GAT (Veličković et al., 2017) by assigning heterogeneous attention to either nodes or edges.
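The relation-specific one-hop aggregation of RGCN can be sketched as follows (a minimal NumPy sketch with simplified mean normalization; not the authors' implementation, which also handles basis decomposition and sparse graphs):

```python
import numpy as np

def rgcn_layer(h, adj_per_rel, W_per_rel, W_self):
    """One simplified RGCN layer: aggregate one-hop neighbors with a
    relation-specific weight matrix per relation, plus a self-loop term."""
    out = h @ W_self  # self-loop contribution
    for A, W in zip(adj_per_rel, W_per_rel):
        deg = A.sum(axis=1, keepdims=True)
        deg[deg == 0] = 1.0  # avoid division by zero for isolated nodes
        out += (A @ h @ W) / deg  # mean over neighbors under this relation
    return np.maximum(out, 0.0)  # ReLU

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))  # 4 nodes with 8-dim features
# Two relations, each a toy adjacency matrix (rows index destination nodes).
adjs = [np.eye(4)[[1, 2, 3, 0]], np.eye(4)[[2, 3, 0, 1]]]
Ws = [rng.normal(size=(8, 8)) for _ in adjs]  # one weight matrix per relation
h_next = rgcn_layer(h, adjs, Ws, rng.normal(size=(8, 8)))
```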
2.1.2. HGNN based on meta-path neighbor aggregation
Another class of HGNNs captures higher-order semantic information with hand-crafted meta-paths. Different from the previous category, the aggregation procedure occurs over neighbors connected by meta-paths. As a pioneering work, HAN (Wang et al., 2019c) first uses node-level attention to aggregate nodes connected by the same meta-path and utilizes semantic-level attention to fuse information from different meta-paths. Because the meta-path subgraph ignores all intermediate nodes, MAGNN (Fu et al., 2020) aggregates all nodes in meta-path instances to ensure that no information is missed. Though meta-paths contain rich semantic information, their selection requires human priors and determines the performance of HGNNs. Some works, like GTN (Yun et al., 2019), learn meta-paths automatically to construct a new graph. HAN (Wang et al., 2019c) and HPN (Ji et al., 2021a), which are easy to extend, are included in our framework for generality.
3. A Unified Framework of Heterogeneous Graph Neural Network
As shown in Table 1, we categorize many mainstream HGNN models, which can be applied in many scenarios, e.g., link prediction (Hao et al., 2020; Mei et al., 2018) and recommendation (Bi et al., 2020; Jiang et al., 2018; Ghazimatin, 2020). Through analyzing the underlying graph data and the aggregation procedures of existing HGNNs, we propose a unified framework of HGNNs that consists of three main components:
Heterogeneous Linear Transformation maps features or representations with heterogeneity to a shared feature space.
Heterogeneous Graph Transformation offers four transformation methods for heterogeneous graph data to select the receptive field.
Heterogeneous Message Passing Layer defines two aggregation methods suitable for most HGNNs.
3.1. Heterogeneous Linear Transformation
Due to the heterogeneity of nodes, different types of nodes have different semantic features and even different dimensions. Therefore, for each node type (e.g., a node $v$ with node type $\phi(v)$), we design a type-specific linear transformation to project the features (or representations) of different types of nodes into a shared feature space:

$$h_v = W_{\phi(v)} \cdot x_v,$$

where $x_v$ and $h_v$ are the original and projected feature of node $v$, respectively. As shown in Figure 2 (a), we transform node features with a type-specific linear transformation for nodes with features. Nodes without features, or whose features are full of noise, can be assigned trainable embedding vectors, which is equivalent to assigning them a one-hot vector combined with a linear transformation.
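The per-type projection can be sketched as follows (a toy NumPy sketch with hypothetical node types and dimensions; a real implementation would keep one trainable linear layer per node type):

```python
import numpy as np

rng = np.random.default_rng(42)
shared_dim = 16  # dimension of the shared feature space

# Hypothetical node types with different raw feature dimensions.
feats = {"author": rng.normal(size=(5, 32)),   # 5 authors, 32-dim features
         "paper": rng.normal(size=(7, 64))}    # 7 papers, 64-dim features

# One type-specific projection matrix per node type (trainable in practice).
W = {ntype: rng.normal(size=(x.shape[1], shared_dim))
     for ntype, x in feats.items()}

# Project every node type into the shared feature space.
h = {ntype: x @ W[ntype] for ntype, x in feats.items()}
```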
3.2. Heterogeneous Graph Transformation
In previous works, aggregation based on one-hop neighbors usually applies the graph convolution layer to the original graph, implicitly selecting a one-hop (relation) receptive field, while aggregation based on meta-path neighbors is usually done on constructed meta-path subgraphs, explicitly selecting a multi-hop (meta-path) receptive field. Relation subgraphs are special meta-path subgraphs (note that the original graph is a special case of relation subgraphs). To unify both, we propose a component that abstracts the selection of the receptive field, which determines which nodes are aggregated. Besides, this component decouples the selection of the receptive field from the message passing procedure introduced in the following subsection.
As shown in Figure 2 (b), we therefore designate a separate stage called Heterogeneous Graph Transformation for graph construction, and categorize it into (i) relation subgraph extraction, which extracts the adjacency matrices of the specified relations; (ii) meta-path subgraph extraction, which constructs the adjacency matrices based on pre-defined meta-paths; (iii) mixed subgraph extraction, which builds both kinds of subgraphs; and (iv) homogenization of the heterogeneous graph (still preserving $\phi$ and $\psi$ for node and edge type mapping). For relation or meta-path extraction, we construct subgraphs by specifying relation types or pre-defined meta-paths.
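Both extraction steps can be sketched with adjacency matrices (a toy sketch on a hypothetical author-paper graph; graph libraries such as DGL provide these transformations directly): a relation subgraph keeps one relation's adjacency as-is, while a meta-path subgraph composes the adjacency matrices along the path.

```python
import numpy as np

# Toy academic graph: 3 authors, 2 papers.
A_ap = np.array([[1, 0],
                 [1, 1],
                 [0, 1]])          # author-writes-paper relation
A_pa = A_ap.T                      # paper-written_by-author relation

# Relation subgraph extraction: keep the adjacency of one relation as-is.
rel_subgraph = A_ap

# Meta-path subgraph extraction for Author-Paper-Author (APA):
# compose the relations along the path; a nonzero entry means two
# authors are connected through at least one shared paper.
apa = (A_ap @ A_pa) > 0
np.fill_diagonal(apa, False)       # drop trivial self connections
```

Note that the meta-path subgraph discards the intermediate paper nodes, which is exactly the information loss MAGNN tries to avoid.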
3.3. Heterogeneous Message Passing Layer
In Section 2.1, we introduced a conventional way to classify HGNNs. However, that classification does not expose enough commonality from the implementation perspective, which makes it difficult to define a design space and search for new models. Therefore, we instead propose to categorize models by their aggregation methods.
Table 3 (excerpt). GAT (Veličković et al., 2017) and its heterogeneous variants: HGAT (Linmei et al., 2019), HetSANN (Hong et al., 2020), HGT (Hu et al., 2020), and Simple-HGN (Lv et al., 2021).
3.3.1. Direct-aggregation

In direct-aggregation, the aggregation procedure reduces neighbors directly without distinguishing node types. The basic baselines of HGNN models are GCN, GAT, and other GNNs designed for homogeneous graphs. A recent work (Lv et al., 2021) shows that simple homogeneous GNNs, e.g., GCN and GAT, are largely underestimated due to improper settings.
As shown in Figure 2 (c: the left one), we explain it under the message passing GNN formulation, taking GAT (Veličković et al., 2017) as an example. The message function is $M(h_u) = \alpha_{vu} W h_u$. The feature of node $v$ in the $l$-th layer is defined as

$$h_v^{(l)} = \sigma\Big(\sum_{u \in \mathcal{N}(v)} \alpha_{vu} W^{(l)} h_u^{(l-1)}\Big),$$

where $W^{(l)}$ is a trainable weight matrix, $\mathcal{N}(v)$ is the set of neighbors of node $v$, and $\alpha_{vu}$ is the normalized attention coefficient between nodes $v$ and $u$:

$$\alpha_{vu} = \operatorname{softmax}_u\Big(\operatorname{LeakyReLU}\big(a^{\top}[W h_v \,\|\, W h_u]\big)\Big).$$

The attention coefficient $\alpha_{vu}$ represents the correlation of node $v$ with its neighbor $u$. Changing the form of $\alpha_{vu}$ yields other heterogeneous variants of GAT, which we summarize in Table 3.
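The attention computation can be sketched numerically (a minimal single-head NumPy sketch under assumed dimensions; real implementations batch this over all edges):

```python
import numpy as np

def gat_attention(h, W, a, neighbors):
    """Normalized GAT attention of node v=0 over its neighbors:
    softmax over LeakyReLU(a^T [W h_v || W h_u])."""
    z = h @ W
    scores = []
    for u in neighbors:
        e = np.concatenate([z[0], z[u]]) @ a          # unnormalized score
        scores.append(np.where(e > 0, e, 0.2 * e))    # LeakyReLU
    scores = np.array(scores, dtype=float)
    exp = np.exp(scores - scores.max())               # numerically stable softmax
    return exp / exp.sum()

rng = np.random.default_rng(1)
h = rng.normal(size=(4, 8))                           # 4 nodes, 8-dim features
alpha = gat_attention(h, rng.normal(size=(8, 8)),
                      rng.normal(size=16), neighbors=[1, 2, 3])
```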
3.3.2. Dual-aggregation

Following (Yu et al., 2020), we split dual-aggregation into two parts: micro-level (intra-type) and macro-level (inter-type) aggregation. As shown in Figure 2 (c: the right one), micro-level aggregation reduces node features within the same relation, generating type-specific features in relation/meta-path subgraphs, and macro-level aggregation reduces type-specific features across different relations. When multiple relations share the same destination node type, their type-specific features are combined by the macro-level aggregation.
Generally, each relation/meta-path subgraph applies the same micro-level aggregation (e.g., a graph convolution layer from GCN or GAT). In fact, our framework allows different homogeneous graph convolutions for different subgraphs. The multiple homogeneous graph convolutions combined with macro-level aggregation form another heterogeneous graph convolution, in contrast to the heterogeneous graph convolution in direct-aggregation. There is only a minor difference between the two: we modify Eq. 3 and define it in Eq. 4, where the aggregation is restricted to the neighbors $u$ of node $v$ that share the same node type.
Example: HAN (Wang et al., 2019c) and HGConv (Yu et al., 2020). In HAN, the node-level attention is equivalent to micro-level aggregation with GATConv, and the semantic-level attention is macro-level aggregation with attention, the same as in HGConv. HGConv uses relation subgraphs, i.e., it aggregates one-hop neighbors, while HAN extracts multiple meta-path subgraphs, i.e., it aggregates multi-hop neighbors. According to the Heterogeneous Graph Transformation in Section 3.2, the constructed graph can be a mixture of meta-path subgraphs and relation subgraphs, so the dual-aggregation can also operate on a custom mixture of subgraphs to aggregate neighbors of different hops.
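The two levels of dual-aggregation can be sketched as follows (a toy NumPy sketch assuming mean as the micro-level aggregation within each relation subgraph and sum as the macro-level aggregation across relations):

```python
import numpy as np

def dual_aggregate(h, adj_per_rel, macro="sum"):
    """Micro-level: mean-aggregate neighbors within each relation subgraph.
    Macro-level: reduce the resulting type-specific features across relations."""
    per_rel = []
    for A in adj_per_rel:
        deg = A.sum(axis=1, keepdims=True)
        deg[deg == 0] = 1.0
        per_rel.append((A @ h) / deg)        # micro-level (intra-relation) mean
    stacked = np.stack(per_rel)              # (num_relations, num_nodes, dim)
    if macro == "sum":
        return stacked.sum(axis=0)           # macro-level (inter-relation) sum
    return stacked.mean(axis=0)

h = np.arange(8, dtype=float).reshape(4, 2)  # 4 nodes, 2-dim features
adjs = [np.eye(4)[[1, 2, 3, 0]], np.eye(4)[[3, 0, 1, 2]]]  # two relations
out = dual_aggregate(h, adjs)
```

Swapping the micro-level mean for a graph convolution layer and the macro-level sum for mean, max, or attention yields the variants explored in our design space.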
4. Design Space for Heterogeneous Graph Neural Network
Inspired by GraphGym (You et al., 2020), we propose a design space for HGNNs, built into a platform, Space4HGNN, that offers modularized HGNN implementations for researchers and is introduced at the end of this section.
4.1. Designs in HGNN
|Design Dimension||Choices|
|Batch Normalization||True, False|
|Dropout||0, 0.3, 0.6|
|Activation||ReLU, LeakyReLU, ELU, Tanh, PReLU|
|L2 Normalization||True, False|
|Layer Connectivity||STACK, SKIP-SUM, SKIP-CAT|
|Pre-process Layers||1, 2, 3|
|Message Passing Layers||1, 2, 3, 4, 5, 6|
|Post-process Layers||1, 2, 3|
|Learning Rate||0.1, 0.01, 0.001, 0.0001|
|Training Epochs||100, 200, 400|
|Hidden Dimension||8, 16, 32, 64, 128|
4.1.1. Common Designs with GraphGym
The designs common with GraphGym involve 12 design dimensions, categorized into three aspects: intra-layer, inter-layer, and training settings. The dimensions with corresponding choices are shown in Table 4.
4.1.2. Unique Design in HGNNs
With the unified framework, we transform the modular components into unique design dimensions in HGNNs. According to (Radosavovic et al., 2019), a collection of related neural network architectures, typically sharing some high-level architectural structures or design principles (e.g., residual connections), can be abstracted into a model family. With that, we distill three model families in HGNNs.
The homogenization model family uses the direct-aggregation combined with any graph convolutions. Here we use the term homogenization because all HGNNs included here apply direct-aggregation after the homogenization of the heterogeneous graph mentioned in Section 3.2. The relation model family applies relation subgraph extraction and dual-aggregation. The meta-path model family applies meta-path subgraph extraction and dual-aggregation. As shown in Table 5, three model families involve three design dimensions with candidate choices.
|Design Dimension||Choices|
|Model Family||Homogenization, Relation, Meta-path|
|Micro-level Aggregation||GCNConv, GATConv, SageConv, GINConv|
|Macro-level Aggregation||Mean, Max, Sum, Attention|
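As a rough check of scale, the size of such a design space is the product of the number of choices per dimension. A sketch counting only the dimensions listed in Tables 4 and 5 (the optimizer and any dimensions not listed here are omitted, so this undercounts the full 40M+ space):

```python
from math import prod

# Numbers of choices per design dimension (from Tables 4 and 5).
choices = {
    "batch_norm": 2, "dropout": 3, "activation": 5, "l2_norm": 2,
    "layer_connectivity": 3, "pre_layers": 3, "mp_layers": 6, "post_layers": 3,
    "lr": 4, "epochs": 3, "hidden_dim": 5,
    "model_family": 3, "micro_aggregation": 4, "macro_aggregation": 4,
}
space_size = prod(choices.values())  # ≈ 28M combinations from these dimensions alone
```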
4.2. Space4HGNN: Platform for Design Space in HGNN
We develop Space4HGNN, a novel platform for exploring HGNN designs, which we believe can significantly facilitate research on HGNNs. It is implemented with PyTorch (https://pytorch.org/) and DGL (https://github.com/dmlc/dgl), using the OpenHGNN package (https://github.com/BUPT-GAMMA/OpenHGNN). It also offers a standardized evaluation pipeline for HGNNs, much like (You et al., 2020) for homogeneous GNNs. For faster experiments, we offer parallel launching. Its highlights are summarized below.
4.2.1. Modularized HGNN Implementation
The implementation closely follows the GNN design space of GraphGym. It is easily extendable, allowing future developers to plug in more choices for each design dimension (e.g., a new graph convolution layer or a new macro-aggregation). It is also easy to add new design dimensions to Space4HGNN, such as the score function in link prediction.
4.2.2. Standardized HGNN Evaluation
Space4HGNN offers a standardized evaluation pipeline for diverse architecture designs and HGNN models. Benefiting from OpenHGNN, we can evaluate diverse datasets in different tasks easily and offer visual comparison results presented in Section 5.
5. Experiments

5.1. Datasets

We select the Heterogeneous Graph Benchmark (HGB) (Lv et al., 2021), a benchmark with multiple datasets of various heterogeneity (i.e., numbers of node and edge types). To save time and submission resources, we report the test performance of the configuration with the best validation performance in Table 8; other experiments are evaluated on a validation set with three random 80-20 training-validation splits. The statistics of HGB are shown in Table 10.
5.2. Evaluation Technique
Our design space covers over 40M combinations, so a full grid search would cost too much. We adapt the controlled random search from GraphGym (You et al., 2020), setting the number of random experiments to 264, except that we ensure every combination of dataset, model family, and micro-aggregation receives 2 hits. We draw bar plots and violin plots of the rankings of each design choice, following the same practice as GraphGym. As shown in Figure 4, in each subplot the rankings of each design choice are aggregated over all 264 setups: the bar plot shows the average ranking (lower is better), and the violin plot indicates the smoothed distribution of the rankings.
5.3. Evaluation of Design Dimensions Common with GraphGym
5.3.1. Overall Evaluation
Findings aligned with GraphGym:
There is no definitive conclusion for the best number of message passing layers; each dataset has its own best number, the same as what GraphGym observed.
The characteristic of training settings (e.g., optimizer and training epochs) is similar to GraphGym.
Findings different from GraphGym:
A single linear transformation (pre-process layer) is usually enough. We think that this is because our heterogeneous linear transformation is node type-specific which has enough parameters to transform representations.
The widely used ReLU activation may no longer be as suitable in HGNNs; Tanh, LeakyReLU, and ELU are better alternatives. PReLU, which stood out in GraphGym, is not the best choice in our design space.
Different from GraphGym, we found that Dropout is necessary to get better performance. We think the reason is that parameters specific to node types and relation types lead to over-parametrization.
5.3.2. Task-wise Evaluation
We previously observed that BN yields better performance in general. However, the task-wise evaluation in Figure 5 shows that BN is better for link prediction but worse for node classification. Meanwhile, although L2-Norm does not seem to help overall performance, it actually performs better for node classification but worse for link prediction. We think that BN scales and shifts nodes according to global information, which may lead to more similar representations and damage node classification, while L2-Norm scales the representations and thus bounds the link score in [-1, 1], which may saturate the Sigmoid of the score function.
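The effect on link scores can be seen numerically: after L2 normalization, the dot-product score of any node pair is a cosine similarity bounded in [-1, 1], so the Sigmoid output is squeezed into a narrow band around 0.5 (a toy sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two node representations with large magnitudes before normalization.
u, v = rng.normal(size=8) * 10, rng.normal(size=8) * 10

# L2-normalize both node representations.
u_n, v_n = u / np.linalg.norm(u), v / np.linalg.norm(v)

score = u_n @ v_n                     # cosine similarity, always in [-1, 1]
prob = 1.0 / (1.0 + np.exp(-score))   # sigmoid confined to roughly [0.27, 0.73]
```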
5.4. Evaluation of Unique Design Dimensions in HGNNs
How to design and apply HGNNs is our core issue. This section analyzes the unique design dimensions in HGNNs to describe the characteristics of high-level architecture designs. From the average rankings shown in Figure 6, we can see that the meta-path model family has a small advantage in the node classification task, the relation model family performs best in results aggregated over all datasets, and the homogenization model family is competitive. For the micro-aggregation design dimension, GCNConv and GATConv are preferable for the link prediction task and the node classification task, respectively. For the macro-aggregation design dimension, Mean and Sum have a more significant advantage.
5.4.1. The Model Family
To more comprehensively describe the corresponding characteristics of different model families, we analyze the results across datasets as shown in Figure 7 and highlight some findings below.
The meta-path model family helps node classification. In the node classification task, the meta-path model family visibly outperforms the other model families on the HGBn-ACM and HGBn-DBLP datasets, where we think some informative and effective meta-paths have been empirically discovered. Some experimental analysis of meta-paths can be found in Appendix D.1.
The meta-path model family does not help link prediction. In previous works, few variants of the meta-path model family were applied to the link prediction task. Although our unified framework makes this possible, the meta-path model family performs poorly on all link prediction datasets, as shown in Figure 7. We think this is because the information carried by the edges of the original graph is important for link prediction, and the meta-path model family ignores it.
The relation model family is a safer choice across all datasets. From Figure 7, the relation model family stands out in the link prediction task, which confirms the necessity of preserving the edges, as well as their type information, in the original graph. Compared with the homogenization model family, the relation model family has more trainable parameters, growing linearly with the number of edge types. Surprisingly, the relation model family is not very effective on HGBl-Freebase, the most heterogeneous dataset with 8 node types and 36 edge types. We think that too many parameters lead to over-fitting, which may challenge the relation model family. According to the distribution of rankings, the relation model family has a significantly lower probability of being ranked last. Therefore, it is a safer choice.
The homogenization model family, with the fewest trainable parameters, is still competitive. As shown in Figure 7, it remains competitive with the relation model family on HGBl-IMDB, and even outperforms the latter on HGBn-Freebase and HGBl-LastFM. Therefore, the homogenization model family is not negligible as a baseline even on heterogeneous graphs, which aligns with (Lv et al., 2021).
5.4.2. The Micro-aggregation and the Macro-aggregation Design Dimensions
Existing HGNNs are usually inspired by GNNs and apply different micro-aggregations (e.g., GCNConv, GATConv). The micro-aggregation design dimension in our design space thus yields many variants of existing HGNNs. As shown in Figure 6, the comparison between micro-aggregations varies greatly across tasks; we provide ranking analysis on individual datasets in the Appendix.
For the macro-aggregation design dimension, Figure 6 shows that Sum has a great advantage in both tasks, which aligns with the theory that sum aggregation is the most expressive (Xu et al., 2019). Surprisingly, Attention is not as effective as Sum; we think the micro-aggregation is already powerful enough, so a complicated Attention in macro-aggregation is unnecessary.
|Design Dimension||Our Condensed Design Space||Condensed Design Space in GraphGym|
|Batch Normalization||True, False||True|
|Activation||ELU, LeakyReLU, Tanh||PReLU|
|L2 Normalization||True, False||-|
|Layer Connectivity||SKIP-SUM, SKIP-CAT||SKIP-SUM, SKIP-CAT|
|Pre-process Layers||1||1, 2|
|Message Passing Layers||1, 2, 3, 4, 5, 6||2, 4, 6, 8|
|Post-process Layers||1, 2||2, 3|
|Learning Rate||0.1, 0.01||0.01|
|Hidden dimension||64, 128||-|
5.5. Evaluation of Condensed Design Space
The above experiments reveal that it is hard to design a single HGNN model that guarantees outstanding performance across the diverse scenarios of the real world. According to the findings in Section 5.3.1, we condense the design space to facilitate model search. Specifically, we remove bad choices within design dimensions and retain the essential design dimensions (e.g., high-level architectural structures and helpful design principles). The evaluation on HGB shows that a simple random search in the condensed design space can find the best designs. More experimental results, compared with GraphGym (You et al., 2020) and GraphNAS (Gao et al., 2019), are analyzed in Appendix E.
5.5.1. The Condensed Design Space
For the design dimensions common with GraphGym, Table 7 compares our condensed design space with the one GraphGym proposed. We keep the same choices as GraphGym for the dimensions where the findings in Section 5.3.1 align (i.e., layer connectivity, optimizer, and training epochs), and propose our own choices for the dimensions where our conclusions differ (e.g., Dropout, activation, BN, L2-Norm). For the unique design dimensions in HGNNs, we conclude that the micro-aggregation and model family design dimensions vary greatly across datasets and tasks, so we retain all their choices and aim to find out whether the variants of existing HGNNs can gain improvements on HGB.
The original design space contains over 40M combinations, while the condensed design space contains about 70K, so the number of possible combinations is reduced by nearly 500 times.
5.5.2. Evaluation in Heterogeneous Graph Benchmark (HGB)
To compare with the performance of standard HGNNs, we evaluate our condensed design space on the new benchmark HGB. We randomly search 100 designs from the condensed design space and evaluate the design with the best validation performance on HGB. As shown in Table 8, the resulting designs achieve comparable performance, so we can easily reach SOTA performance with a simple random search in the condensed design space. Table 9 shows the best designs found, which cover variants of RGCN and HAN. This confirms that the meta-path and relation model families perform well on HGB and answers the question "are meta-paths or their variants still useful in GNNs?" from (Lv et al., 2021). Note that this result does not contradict the conclusion of (Lv et al., 2021), as our design space includes many more components than the vanilla RGCN or HAN models, and proper components can make up for the shortcomings of an existing model.
6. Conclusion

In this work, we propose a unified framework of HGNNs and define a design space for HGNNs, which offers a module-level perspective for evaluating HGNN models. Specifically, we comprehensively analyze the design dimensions common with GraphGym and the design dimensions unique to HGNNs. After that, we distill some findings and condense the original design space. Finally, experimental results show that our condensed design space outperforms others and gains the best average ranking on the HGB benchmark. With that, we demonstrate that focusing on the design space can help drive advances in HGNN research.
For the sake of space, the appendix, including preliminaries, design space and dataset descriptions, more technical details, and additional experimental results, is provided at: https://anonymous.4open.science/r/Space4HGNN-862F.
- A heterogeneous information network based cross domain insurance recommendation system for cold start users. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2211–2220. Cited by: §1, §3.
- . In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 203–212. Cited by: §1.
- Bundle recommendation with graph convolutional networks. In Proceedings of the 43rd international ACM SIGIR conference on Research and development in Information Retrieval, pp. 1673–1676. Cited by: §1.
- Sequential recommendation with graph neural networks. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 378–387. Cited by: §1.
Graph heterogeneous multi-relational recommendation.
Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, pp. 3958–3966. Cited by: §1.
- Structured graph convolutional networks with stochastic masks for recommender systems. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 614–623. Cited by: §1.
- DiffMG: differentiable meta graph search for heterogeneous graph neural networks. In Proceedings of the 27th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Cited by: §E.2.
- Graph neural networks for social recommendation. In The World Wide Web Conference, pp. 417–426. Cited by: §1.
- M-hin: complex embeddings for heterogeneous information networks via metagraphs. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 913–916. Cited by: §1.
- Magnn: metapath aggregated graph neural network for heterogeneous graph embedding. In Proceedings of The Web Conference 2020, pp. 2331–2341. Cited by: §2.1.2.
Graphnas: graph neural architecture search with reinforcement learning. arXiv preprint arXiv:1904.09981. Cited by: §E.2, §5.5.
- Explaining recommendations in heterogeneous networks. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2479–2479. Cited by: §1, §3.
- Neural message passing for quantum chemistry. In International conference on machine learning, pp. 1263–1272. Cited by: Definition A.1.
- Attentional graph convolutional networks for knowledge concept recommendation in moocs in a heterogeneous view. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 79–88. Cited by: §1.
- Automatic feature generation on heterogeneous graph for music recommendation. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 807–810. Cited by: §1.
- Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 1025–1035. Cited by: §A.1, §B.2, Table 1.
- Dynamic link prediction by integrating node vector evolution and local neighborhood representation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1717–1720. Cited by: §1, §3.
- Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §B.1.
- An attention-based graph neural network for heterogeneous structural learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, pp. 4132–4139. Cited by: Table 1, §2.1.1, Table 3.
- Cash-out user detection based on attributed heterogeneous information network with a hierarchical attention mechanism. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 946–953. Cited by: §1.
- Heterogeneous graph transformer. In Proceedings of The Web Conference 2020, pp. 2704–2710. Cited by: Table 1, §2.1.1, Table 3.
- Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4700–4708. Cited by: §B.1.
- Batch normalization: accelerating deep network training by reducing internal covariate shift. In International conference on machine learning, pp. 448–456. Cited by: §B.1, 1st item.
- Heterogeneous graph propagation network. IEEE Transactions on Knowledge and Data Engineering. Cited by: Table 1, §2.1.2.
- Large-scale comb-k recommendation. In Proceedings of the Web Conference 2021, pp. 2512–2523. Cited by: §1.
- Cross-language citation recommendation via hierarchical representation learning on heterogeneous graph. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 635–644. Cited by: §1, §3.
- Learning interaction models of structured neighborhood on heterogeneous information network. arXiv preprint arXiv:2011.12683. Cited by: §1.
- Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §B.2, Table 1, §2.1.1.
- DeepGCNs: can gcns go as deep as cnns?. In The IEEE International Conference on Computer Vision (ICCV), Cited by: §B.1.
- Heterogeneous graph collaborative filtering. arXiv preprint arXiv:2011.06807. Cited by: §1.
- Heterogeneous graph attention networks for semi-supervised short text classification. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 4821–4830. Cited by: Table 1, §2.1.1, Table 3.
- A heterogeneous graph neural model for cold-start recommendation. In Proceedings of the 43rd international ACM SIGIR conference on research and development in information retrieval, pp. 2029–2032. Cited by: §1.
- Are we really making much progress? revisiting, benchmarking, and refining heterogeneous graph neural networks. Cited by: §B.2, Appendix C, §D.1, 3rd item, Table 1, §1, §2.1.1, §3.3.1, Table 3, §5.1, §5.4.1, §5.5.2.
- On link prediction in knowledge bases: max-k criterion and prediction protocols. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 755–764. Cited by: §1, §3.
- Geom-gcn: geometric graph convolutional networks. In International Conference on Learning Representations, Cited by: §D.1.
- On network design spaces for visual recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1882–1890. Cited by: §E.1, §4.1.2.
- Modeling relational data with graph convolutional networks. In European semantic web conference, pp. 593–607. Cited by: §B.2, Table 1, §1, §2.1.1.
- Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15 (1), pp. 1929–1958. Cited by: §B.1.
- Hgdom: heterogeneous graph convolutional networks for malicious domain detection. In NOMS 2020-2020 IEEE/IFIP Network Operations and Management Symposium, pp. 1–9. Cited by: §1.
- Pathsim: meta path-based top-k similarity search in heterogeneous information networks. Proceedings of the VLDB Endowment 4 (11), pp. 992–1003. Cited by: Definition A.4.
- Arnetminer: extraction and mining of academic social networks. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 990–998. Cited by: §1.
- Graph attention networks. arXiv preprint arXiv:1710.10903. Cited by: §B.2, Table 1, §2.1.1, §3.3.1, Table 3.
- Online user representation learning across heterogeneous social networks. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 545–554. Cited by: §1.
- Neural graph collaborative filtering. In Proceedings of the 42nd international ACM SIGIR conference on Research and development in Information Retrieval, pp. 165–174. Cited by: §1.
- A survey on heterogeneous graph embedding: methods, techniques, applications and sources. Cited by: §1.
- Heterogeneous graph attention network. In The World Wide Web Conference, pp. 2022–2032. Cited by: §B.2, Table 1, §1, §2.1.2, §3.3.2.
- How powerful are graph neural networks?. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, Cited by: §B.2, Table 1, §5.4.2.
- Representation learning on graphs with jumping knowledge networks. In International Conference on Machine Learning, pp. 5453–5462. Cited by: §B.1.
- Design space for graph neural networks. Advances in Neural Information Processing Systems 33. Cited by: §E.1, §1, §4.2, §4, §5.2, §5.3.1, §5.5.
- Hybrid micro/macro level convolution for heterogeneous graph learning. arXiv preprint arXiv:2012.14722. Cited by: §B.2, §B.2, Table 1, §2.1.1, §3.3.2, §3.3.2.
- Heterogeneous attention network for effective and efficient cross-modal retrieval. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1146–1156. Cited by: §1.
- Graph transformer networks. Advances in Neural Information Processing Systems 32, pp. 11983–11993. Cited by: §2.1.2.
- Meta-graph based recommendation fusion over heterogeneous information networks. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pp. 635–644. Cited by: §1.
- Intentgc: a scalable graph convolution framework fusing heterogeneous information for recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2347–2357. Cited by: §1.
Appendix A Preliminary
A.1. Graph Neural Network
Graph Neural Networks (GNNs) aim to apply deep neural networks to graph-structured data. Here we focus on message passing GNNs, which can be implemented efficiently and have demonstrated strong performance.
Definition A.1 (Message Passing GNNs (Gilmer et al., 2017)).
Message passing GNNs aim to learn a representation vector $h_v^{(l)} \in \mathbb{R}^{d^{(l)}}$ for each node $v$ after the $l$-th message passing layer, where $d^{(l)}$ denotes the output dimension of the $l$-th message passing layer. The message passing paradigm defines the following edge-wise and node-wise computation for each layer:

$$m_{u \to v}^{(l)} = M^{(l)}\big(h_u^{(l-1)}, h_v^{(l-1)}\big),$$

where $M^{(l)}$ is a message function defined on each edge to generate a message by combining the features of its incident nodes, and $u \to v$ denotes an edge from node $u$ to node $v$;

$$h_v^{(l)} = U^{(l)}\Big(h_v^{(l-1)}, \rho\big(\{m_{u \to v}^{(l)} \mid u \in \mathcal{N}(v)\}\big)\Big),$$

where $\mathcal{N}(v)$ denotes the neighbors of node $v$, and $U^{(l)}$ is an update function defined on each node to update the node representation by aggregating its incoming messages with the aggregation function $\rho$.
Example: GraphSAGE (Hamilton et al., 2017) can be formalized as a message passing GNN, where the message function is $M^{(l)}(h_u^{(l-1)}, h_v^{(l-1)}) = h_u^{(l-1)}$, $\rho$ is the element-wise mean, and the update function is $U^{(l)}(h_v^{(l-1)}, \bar{m}_v^{(l)}) = \sigma\big(W^{(l)} \cdot \mathrm{CONCAT}(h_v^{(l-1)}, \bar{m}_v^{(l)})\big)$.
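A minimal numerical sketch of such a GraphSAGE-mean layer follows (dense NumPy for clarity; the weight names and ReLU choice are illustrative, and a real implementation would use sparse message passing):

```python
import numpy as np

def sage_mean_layer(h, edges, W_self, W_neigh):
    """One GraphSAGE-mean message passing layer (illustrative sketch).

    h        : (N, d_in) node features
    edges    : list of (u, v) directed edges; the message flows u -> v
    W_self   : (d_in, d_out) weight applied to the node's own features
    W_neigh  : (d_in, d_out) weight applied to the aggregated messages
    """
    N, _ = h.shape
    agg = np.zeros_like(h)
    deg = np.zeros(N)
    # Edge-wise step: the message is simply the source feature h_u.
    # Node-wise step: aggregate incoming messages with a mean.
    for u, v in edges:
        agg[v] += h[u]
        deg[v] += 1
    agg /= np.maximum(deg, 1)[:, None]
    # Update: combine self and neighbor representations, then ReLU.
    return np.maximum(h @ W_self + agg @ W_neigh, 0)
```

With identity weights, a node with incoming edges receives the mean of its neighbors' features added to its own, while isolated nodes keep their original features.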
A.2. Heterogeneous Graph
Definition A.2 (Heterogeneous Graph).
A heterogeneous graph, denoted as $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, consists of a node set $\mathcal{V}$ and an edge set $\mathcal{E}$. A heterogeneous graph is also associated with a node type mapping function $\phi: \mathcal{V} \to \mathcal{A}$ and an edge type (or relation type) mapping function $\psi: \mathcal{E} \to \mathcal{R}$. $\mathcal{A}$ and $\mathcal{R}$ denote the sets of node types and edge types. Each node $v \in \mathcal{V}$ has one node type $\phi(v) \in \mathcal{A}$. Similarly, for an edge $e$ from node $u$ to node $v$, $\psi(e) \in \mathcal{R}$. When $|\mathcal{A}| > 1$ or $|\mathcal{R}| > 1$, it is a heterogeneous graph; otherwise it is a homogeneous graph.
Example. As shown in Figure 1 (left), we construct a simple heterogeneous graph to show an academic network. It consists of multiple types of objects (Paper (P), Author (A), Conference (C)) and relations (written-relation between papers and authors, published-relation between papers and conferences).
Definition A.3 (Relation Subgraph).
A heterogeneous graph can also be represented by a set of adjacency matrices $\{A_r\}_{r=1}^{R}$, where $R = |\mathcal{R}|$ is the number of edge types. $A_r \in \mathbb{R}^{N_s \times N_t}$ is an adjacency matrix whose entry $A_r[u, v]$ is non-zero when there is an edge of the $r$-th type from node $u$ to node $v$; $N_s$ and $N_t$ are the numbers of source and target nodes corresponding to the $r$-th edge type, respectively. A relation subgraph of the $r$-th edge type is therefore a subgraph whose adjacency matrix is $A_r$. As shown in Figure 2 (b), the underlying data structure of the academic network in Figure 1 (left) consists of four adjacency matrices.
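As an illustrative sketch, the per-relation adjacency matrices of a toy academic graph can be built as follows (the triple format, node names, and relation names are hypothetical, not the benchmark's actual data format):

```python
import numpy as np

# A toy academic graph: papers P0, P1; authors A0, A1; conference C0.
# Edges are stored as (source, relation, target) triples.
triples = [
    ("A0", "writes", "P0"), ("A0", "writes", "P1"), ("A1", "writes", "P1"),
    ("P0", "published_at", "C0"), ("P1", "published_at", "C0"),
]
# Index nodes per type so each relation gets its own (N_s x N_t) matrix.
node_ids = {"A": ["A0", "A1"], "P": ["P0", "P1"], "C": ["C0"]}
idx = {n: i for ids in node_ids.values() for i, n in enumerate(ids)}

def relation_subgraph(rel, src_type, tgt_type):
    """Adjacency matrix A_r of a single relation subgraph."""
    A = np.zeros((len(node_ids[src_type]), len(node_ids[tgt_type])))
    for s, r, t in triples:
        if r == rel:
            A[idx[s], idx[t]] = 1
    return A

A_writes = relation_subgraph("writes", "A", "P")     # authors x papers
A_pub = relation_subgraph("published_at", "P", "C")  # papers x conferences
```

Each relation thus yields a (generally rectangular) adjacency matrix between its source and target node types.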
Definition A.4 (Meta-path (Sun et al., 2011)).
A meta-path $P$ is defined as a path in the form of $A_1 \xrightarrow{R_1} A_2 \xrightarrow{R_2} \cdots \xrightarrow{R_l} A_{l+1}$, which describes a composite relation $R = R_1 \circ R_2 \circ \cdots \circ R_l$ between two node types $A_1$ and $A_{l+1}$, where $R_i$ denotes the $i$-th relation type of the meta-path and $\circ$ denotes the composition operator on relations.
Definition A.5 (Meta-path Subgraph).
Given a meta-path $P = A_1 \xrightarrow{R_1} A_2 \xrightarrow{R_2} \cdots \xrightarrow{R_l} A_{l+1}$, the adjacency matrix $A_P$ of the meta-path subgraph can be obtained by multiplying the adjacency matrices of the corresponding relations:

$$A_P = A_{R_1} A_{R_2} \cdots A_{R_l}.$$
The notion of meta-path subsumes multi-hop connections, and a meta-path subgraph is the product of multiple relation subgraph adjacency matrices, as shown in Figure 2 (b) (iii). A relation subgraph is thus a special case of a meta-path subgraph composed of a single relation. When the beginning and ending node types of the meta-path are the same, the meta-path subgraph is a homogeneous graph; otherwise it is a bipartite graph.
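The matrix products above can be checked on the toy academic graph (the small matrices are illustrative, chosen to match the co-authorship example):

```python
import numpy as np

# Relation adjacency matrices of the toy academic graph:
A_AP = np.array([[1, 1],
                 [0, 1]])   # author -> paper (writes)
A_PC = np.array([[1],
                 [1]])      # paper -> conference (published at)

# Meta-path A-P-A (co-authorship): multiply along the path; the result
# is homogeneous because it starts and ends at the same node type.
A_APA = A_AP @ A_AP.T

# Meta-path A-P-C: starts and ends at different types, so the extracted
# subgraph is bipartite (authors x conferences).
A_APC = A_AP @ A_PC
```

Entry $A_{APA}[i, j]$ counts the papers co-written by authors $i$ and $j$, so the meta-path product also carries multiplicity information, not just connectivity.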
Appendix B Design Space
B.1. Common Design with GraphGym
As with homogeneous GNNs, an HGNN contains several heterogeneous GNN layers, where each layer can have diverse design dimensions. As illustrated in Figure 3, the adopted heterogeneous GNN layer has an aggregation layer, which involves the unique design dimensions discussed later, followed by a sequence of modules: (1) batch normalization BN() (Ioffe and Szegedy, 2015); (2) dropout DROP() (Srivastava et al., 2014); (3) nonlinear activation function ACT(); (4) L2 normalization L2-Norm(). Formally, the $l$-th heterogeneous GNN layer can be defined as:

$$h^{(l)} = \text{L2-Norm}\Big(\text{ACT}\big(\text{DROP}(\text{BN}(\text{AGG}^{(l)}(h^{(l-1)})))\big)\Big).$$
The layers of message passing, pre-processing and post-processing should also be considered; they are essential design dimensions according to empirical evidence from neural networks. HGNNs face the problems of vanishing gradients, over-fitting and over-smoothing, and the last is seen as the main obstacle to stacking deeper GNN layers. Inspired by ResNet (He et al., 2016), skip connections (Li et al., 2019; Xu et al., 2018) have been proven to alleviate these problems effectively. Therefore, we investigate two choices of skip connection, SKIP-SUM (He et al., 2016) and SKIP-CAT (Huang et al., 2017), with STACK (no skip connection) as a basic comparison.
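The module sequence and skip-connection choices can be sketched as follows (inference-style batch normalization and a fixed dropout mask; the argument names are illustrative, not the paper's API):

```python
import numpy as np

def hgnn_layer(h_prev, aggregate, gamma, beta, drop_mask, skip="SKIP-SUM"):
    """Post-aggregation module sequence of one heterogeneous GNN layer:
    BN -> DROP -> ACT -> L2-Norm, followed by a skip connection.
    `aggregate` stands in for any (dual-)aggregation."""
    z = aggregate(h_prev)
    # Batch normalization over the node dimension (inference-style sketch).
    z = gamma * (z - z.mean(0)) / (z.std(0) + 1e-5) + beta
    z = z * drop_mask                      # dropout via a precomputed mask
    z = np.maximum(z, 0)                   # ReLU activation
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)  # L2-Norm
    if skip == "SKIP-SUM":                 # residual addition (ResNet-style)
        return z + h_prev
    if skip == "SKIP-CAT":                 # dense concatenation (DenseNet-style)
        return np.concatenate([h_prev, z], axis=1)
    return z                               # STACK: no skip connection
```

Note the practical consequence for the hidden dimension: SKIP-SUM keeps the feature width constant across layers, while SKIP-CAT grows it by the layer's output width.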
As with deep learning in general, we also analyze design dimensions of the training settings, such as the optimizer, learning rate and number of training epochs. Besides, the hidden dimension is also included here, since it determines the number of trainable parameters.
B.2. Unique Designs in HGNNs
The Homogenization Model Family
The homogenization model family uses direct-aggregation combined with any graph convolution. We use the term homogenization because all HGNNs included here apply direct-aggregation after the homogenization of the heterogeneous graph mentioned in Section 3.2. Homogeneous GNNs and the heterogeneous variants of GAT mentioned in Section 3.3 all fall into this model family. Homogeneous GNNs are usually evaluated as basic baselines in HGNN papers. Though homogenization loses type information, it has been confirmed that simple homogeneous GNNs can outperform some existing HGNNs (Lv et al., 2021), which means they are non-negligible and deserve to be treated as a model family. We select four typical graph convolution layers as candidates: GraphConv (Kipf and Welling, 2016), GATConv (Veličković et al., 2017), SageConv-mean (Hamilton et al., 2017) and GINConv (Xu et al., 2019).
The Relation Model Family
This model family applies relation subgraph extraction and dual-aggregation. The first HGNN model, RGCN (Schlichtkrull et al., 2018), is a typical example of the relation model family: its dual-aggregation consists of a micro-level aggregation with SageConv-mean and a macro-level aggregation of Sum. HGConv (Yu et al., 2020) combines GATConv with attention. Other designs can be obtained by enumerating combinations of micro-level and macro-level aggregations. In our experiments, the micro-level aggregations are the same as the graph convolutions in the homogenization model family, and the macro-level aggregations are chosen among Mean, Max, Sum, and Attention.
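Dual-aggregation can be sketched as follows (micro-level mean within each relation subgraph, then a macro-level reduction across relations; the relation names and the mean micro-aggregation are illustrative choices, not the only ones evaluated):

```python
import numpy as np

def dual_aggregate(h, rel_edges, macro="mean"):
    """Dual-aggregation sketch over a heterogeneous graph.

    h         : (N, d) node features
    rel_edges : dict mapping relation name -> list of (u, v) edges
    macro     : macro-level reduction across relations
    """
    N = h.shape[0]
    per_relation = []
    for edges in rel_edges.values():
        # Micro-level: SageConv-mean-style aggregation inside one relation.
        agg, deg = np.zeros_like(h), np.zeros(N)
        for u, v in edges:
            agg[v] += h[u]
            deg[v] += 1
        per_relation.append(agg / np.maximum(deg, 1)[:, None])
    stacked = np.stack(per_relation)       # shape (R, N, d)
    # Macro-level: combine the R relation-specific results per node.
    if macro == "mean":
        return stacked.mean(0)
    if macro == "max":
        return stacked.max(0)
    return stacked.sum(0)                  # Sum, as in RGCN
```

An attention macro-aggregation would replace the fixed reduction with learned per-relation weights, as HGConv does.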
The Meta-path Model Family
This model family applies meta-path subgraph extraction and dual-aggregation. The instance HAN (Wang et al., 2019c) has the same dual-aggregation as HGConv (Yu et al., 2020) in the relation model family but a different subgraph extraction. The candidates of micro-level and macro-level aggregations are the same as those in the relation model family.
Appendix C Dataset
We select datasets from the Heterogeneous Graph Benchmark (HGB) (Lv et al., 2021), a benchmark with multiple datasets of varying heterogeneity (i.e., numbers of node and edge types) for node classification and link prediction tasks. HGB is organized as a public competition and does not release the test labels, to prevent data leakage. Since formal submission to the public leaderboard costs a large amount of time and submission resources, we only report the test performance of the configuration with the best validation performance in Table 8, using the same metrics as in (Lv et al., 2021). Other experiments are evaluated on a validation set with three random 80-20 training-validation splits. The statistics of HGB are shown in Table 10. We select five datasets (DBLP, IMDB, ACM, Freebase, PubMed) for the node classification task and six datasets (DBLP, IMDB, ACM, amazon, LastFM, PubMed) for the link prediction task.
Appendix D Evaluation of Unique Design Dimensions
D.1. Analysis for Meta-path
In the node classification task, the meta-path model family visibly outperforms the other model families on the datasets HGBn-ACM and HGBn-DBLP, where we believe some informative and effective meta-paths have been empirically discovered. The micro-aggregation modules are MPNNs that tend to learn similar representations for proximal nodes in a graph (Pei et al., 2019). Moreover, the meta-path model family aims to bring nodes of the same type topologically closer through meta-path subgraph extraction, hoping that the extracted subgraph is assortative (e.g., a citation network) where node homophily holds (i.e., nodes with the same label tend to be proximal, and vice versa). Based on that, we measure the homophily (Pei et al., 2019) of a subgraph extracted by meta-path $P$, which is defined as

$$H(\mathcal{G}_P) = \frac{1}{|\mathcal{V}|} \sum_{v \in \mathcal{V}} \frac{|\{u \in \mathcal{N}(v) : y_u = y_v\}|}{|\mathcal{N}(v)|},$$

where $y_u$ and $y_v$ represent the labels of nodes $u$ and $v$, respectively.
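A small sketch of this homophily measure (plain Python; it uses directed edges and counts each node's incoming neighbors, an assumption since the exact neighborhood convention is not restated here):

```python
def homophily(edges, labels):
    """Node homophily (after Pei et al., 2019): for each node, the
    fraction of its neighbors sharing its label, averaged over nodes
    that have at least one neighbor."""
    same, total = {}, {}
    for u, v in edges:
        total[v] = total.get(v, 0) + 1
        same[v] = same.get(v, 0) + (labels[u] == labels[v])
    ratios = [same[v] / total[v] for v in total]
    return sum(ratios) / len(ratios)
```

On an assortative subgraph the score approaches 1; a score near the label-distribution baseline suggests the meta-path carries little task-relevant signal.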
As shown in Table 11, the homophily of the homogeneous subgraphs extracted by predefined meta-paths in HGBn-ACM and HGBn-DBLP is significantly higher than that in HGBn-PubMed and HGBn-Freebase. For the node classification task, the homophily of subgraphs extracted by meta-paths may thus be a helpful reference for meta-path selection. So for the question "are meta-paths or their variants still useful in GNNs?" from (Lv et al., 2021), we think the meta-path model family is still useful given well-defined meta-paths that reveal task-specific semantics.
D.2. Analysis for Micro-aggregation
As shown in Figure 8, the results of the comparison between micro-aggregations vary greatly across datasets. GCNConv gains significant advantages on the datasets HGBl-amazon and HGBl-LastFM, and GATConv performs best on two of the datasets. GINConv, the SOTA model for graph-level tasks, also stands out here on one dataset, HGBn-DBLP. This confirms that no single GNN model performs well in all situations.
Appendix E Evaluation of the Condensed Design Space
E.1. Evaluation of Different Design Spaces
Plotting rankings with controlled random search only works within the same design space and is not suitable for evaluation across different design spaces. Therefore, for each design space we plot the empirical distribution function (EDF) (Radosavovic et al., 2019): given $n$ configurations and their respective scores $\{e_i\}_{i=1}^{n}$, the EDF is defined as

$$F(e) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}[e_i < e].$$

The EDF essentially gives the probability that a random hyperparameter configuration cannot achieve a given performance metric. Therefore, with the $x$-axis being the performance metric and the $y$-axis being the probability, an EDF curve closer to the lower-right corner indicates that a random configuration is more likely to get a better result.
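The EDF reduces to a one-liner over sampled scores; the values below are hypothetical validation results from a random search, not numbers from our experiments:

```python
import numpy as np

def edf(scores, e):
    """Empirical distribution function over configuration scores: the
    fraction of sampled configurations scoring below threshold e, i.e.
    the probability a random configuration fails to reach e."""
    scores = np.asarray(scores, dtype=float)
    return float((scores < e).mean())

# Five hypothetical validation scores sampled from a design space.
scores = [0.61, 0.72, 0.68, 0.75, 0.70]
```

Evaluating `edf` over a grid of thresholds and plotting threshold against probability reproduces the curves in Figure 9.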
Though the condensed design space in GraphGym is small enough to perform a full grid search for GNNs, it is less suitable for HGNNs because HGNN models are more complicated. To verify the effectiveness of our condensed design space, we compare it with the original design space and the condensed design space from GraphGym (You et al., 2020), randomly sampling 100 designs in each of the three spaces.
As shown in Figure 9, our condensed design space outperforms the others. Specifically, the original design space contains many bad choices (e.g., the SGD optimizer) and performs worst in the distribution estimates. On the other hand, the best design in the original design space is competitive, but at a much higher search cost. Besides, the better performance of our condensed design space compared with GraphGym's shows that we cannot simply transfer a design space condensed on homogeneous graphs to HGNNs; specific condensation is required.
E.2. Comparison with GraphNAS
We also apply GraphNAS (Gao et al., 2019), a GNN neural architecture search method, in the original design space as a comparison. Since NAS aims to find the best architecture, we only report the best performance of GraphNAS in Figure 9. Though GraphNAS outperforms the others on HGBn-DBLP and gains excellent performance on HGBl-ACM and HGBl-IMDB, it performs worst on the other datasets. Compared with GraphNAS, our design space therefore has significant advantages in robustness and stability. We believe a more advanced NAS method (e.g., DiffMG (Ding et al., 2021)) is needed for our design space in future work.