Graph Convolutional Networks with Dual Message Passing for Subgraph Isomorphism Counting and Matching

Graph neural networks (GNNs) and message passing neural networks (MPNNs) have been proven to be expressive for subgraph structures in many applications. Some applications in heterogeneous graphs require explicit edge modeling, such as subgraph isomorphism counting and matching. However, existing message passing mechanisms are not designed well in theory. In this paper, we start from a particular edge-to-vertex transform and exploit the isomorphism property in the edge-to-vertex dual graphs. We prove that searching isomorphisms on the original graph is equivalent to searching on its dual graph. Based on this observation, we propose dual message passing neural networks (DMPNNs) to enhance the substructure representation learning in an asynchronous way for subgraph isomorphism counting and matching as well as unsupervised node classification. Extensive experiments demonstrate the robust performance of DMPNNs by combining both node and edge representation learning in synthetic and real heterogeneous graphs. Code is available at






Graphs have been widely used in various applications across domains from chemoinformatics to social networks. Isomorphism is one of the important properties of graphs, and the analysis of subgraph isomorphisms is useful in real applications. For example, we can determine the properties of compounds by finding functional group information in chemical molecules gilmer2017neural; some substructures in social networks are regarded as irreplaceable features in recommender systems ying2018graph. Finding subgraph isomorphisms is challenging because the computational cost grows exponentially. In particular, finding and counting require global inference over the whole graph. Existing counting and matching algorithms are designed for query patterns up to a certain size (e.g., 5), and some of them cannot be directly applied to heterogeneous graphs where vertices and edges are labeled with types bhattarai2019ceci; sun2020in.

Increasing attention has been paid to using deep learning to count or match subgraph isomorphisms. liu2020neural (liu2020neural) designed a general end-to-end framework to predict the number of subgraph isomorphisms on heterogeneous graphs, and ying2020neural (ying2020neural) combined node embeddings and voting to match subgraphs. They found that neural networks could be 10 to 1,000 times faster than traditional searching algorithms. xu2019how (xu2019how) and morris2019weisfeiler (morris2019weisfeiler) showed that graph neural networks (GNNs) based on message passing are at most as powerful as the WL test weisfeiler1968reduction, and chen2020can (chen2020can) further analyzed the upper bound of message passing and $k$-WL for subgraph isomorphism counting. These studies show that it is theoretically possible for neural methods to count larger patterns in complex graphs. In heterogeneous graphs, edges play an important role in checking and searching isomorphisms because graph isomorphisms require taking account of graph adjacency and edge types. However, existing message passing mechanisms have not paid enough attention to edge representations gilmer2017neural; schlichtkrull2018modeling; vashishth2020composition; jin2021power.

In this paper, we discuss a particular edge-to-vertex transform and find the one-to-one correspondence between subgraph isomorphisms of original graphs and subgraph isomorphisms of their corresponding edge-to-vertex dual graphs. This property suggests that searching isomorphisms on the original graph is equivalent to searching on its dual graph. Based on this observation and the theoretical guarantee, we propose new dual message passing networks (DMPNNs) to learn node and edge representations simultaneously in the aligned space. Empirical results show the effectiveness of DMPNNs on both homogeneous and heterogeneous graphs, with synthetic and real-life data.

Our main contributions are summarized as follows:

  1. We prove that there is a one-to-one correspondence between isomorphisms of connected directed heterogeneous multi-graphs with reversed edges and isomorphisms between their edge-to-vertex dual graphs.

  2. We propose a dual message passing mechanism and design the DMPNN model to explicitly model edges and align node and edge representations in the same space.

  3. We empirically demonstrate that DMPNNs can count subgraph isomorphisms more accurately and match isomorphic nodes more correctly. DMPNNs also surpass competitive baselines on unsupervised node classification, indicating the necessity of explicit edge modeling for general graph representation learning.

(a) Isomorphism
(b) Subgraph isomorphism
(c) Edge-to-vertex transform (undirected)
(d) Edge-to-vertex transform (directed)
Figure 1: Examples of the isomorphism, subgraph isomorphism, and edge-to-vertex transforms.


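To be more general, we assume a graph is a directed heterogeneous multigraph. Let $\mathcal{G}$ be a graph with a vertex set $\mathcal{V}_{\mathcal{G}}$ and each vertex with a different vertex id, an edge set $\mathcal{E}_{\mathcal{G}} \subseteq \mathcal{V}_{\mathcal{G}} \times \mathcal{V}_{\mathcal{G}}$, a label function $\mathcal{X}_{\mathcal{G}}$ that maps a vertex to a vertex label, and a label function $\mathcal{Y}_{\mathcal{G}}$ that maps an edge to a set of edge labels. As each edge can be associated with a set of labels, we can merge multiple edges with the same source and the same target into one edge with multiple labels. A subgraph of $\mathcal{G}$, denoted as $\mathcal{G}_{S}$, is any graph with $\mathcal{V}_{\mathcal{G}_{S}} \subseteq \mathcal{V}_{\mathcal{G}}$, $\mathcal{E}_{\mathcal{G}_{S}} \subseteq \mathcal{E}_{\mathcal{G}} \cap (\mathcal{V}_{\mathcal{G}_{S}} \times \mathcal{V}_{\mathcal{G}_{S}})$, satisfying $\forall v \in \mathcal{V}_{\mathcal{G}_{S}}, \mathcal{X}_{\mathcal{G}_{S}}(v) = \mathcal{X}_{\mathcal{G}}(v)$ and $\forall e \in \mathcal{E}_{\mathcal{G}_{S}}, \mathcal{Y}_{\mathcal{G}_{S}}(e) = \mathcal{Y}_{\mathcal{G}}(e)$. To simplify the statement, we let $\mathcal{Y}_{\mathcal{G}}((u, v)) = \emptyset$ if $(u, v) \notin \mathcal{E}_{\mathcal{G}}$.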

Isomorphisms and Subgraph Isomorphisms

Definition 1 (Isomorphism).

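A graph $\mathcal{G}_{1}$ is isomorphic to a graph $\mathcal{G}_{2}$ if there is a bijection $f: \mathcal{V}_{\mathcal{G}_{1}} \rightarrow \mathcal{V}_{\mathcal{G}_{2}}$ such that: $\forall v \in \mathcal{V}_{\mathcal{G}_{1}}, \mathcal{X}_{\mathcal{G}_{1}}(v) = \mathcal{X}_{\mathcal{G}_{2}}(f(v))$, $\forall v' \in \mathcal{V}_{\mathcal{G}_{2}}, \mathcal{X}_{\mathcal{G}_{2}}(v') = \mathcal{X}_{\mathcal{G}_{1}}(f^{-1}(v'))$, $\forall (u, v) \in \mathcal{E}_{\mathcal{G}_{1}}, \mathcal{Y}_{\mathcal{G}_{1}}((u, v)) = \mathcal{Y}_{\mathcal{G}_{2}}((f(u), f(v)))$, $\forall (u', v') \in \mathcal{E}_{\mathcal{G}_{2}}, \mathcal{Y}_{\mathcal{G}_{2}}((u', v')) = \mathcal{Y}_{\mathcal{G}_{1}}((f^{-1}(u'), f^{-1}(v')))$.

We write $\mathcal{G}_{1} \simeq \mathcal{G}_{2}$ for such isomorphic property and name $f$ as an isomorphism. For example, there are two different isomorphisms between the two triangles in Figure 1(a). As a special case, the isomorphism between two empty graphs without any vertex is $\{\} \rightarrow \{\}$.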

In addition, if a subgraph of is isomorphic to another graph, then the corresponding bijection function is named as a subgraph isomorphism. The formal definition is:

Definition 2 (Subgraph isomorphism).

If a subgraph $\mathcal{G}_{1_S}$ of $\mathcal{G}_{1}$ is isomorphic to a graph $\mathcal{G}_{2}$ with a bijection $f$, we say $\mathcal{G}_{1}$ contains a subgraph isomorphic to $\mathcal{G}_{2}$ and name $f$ as a subgraph isomorphism.

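Subgraph isomorphism related problems commonly refer to two kinds of subgraphs: node-induced subgraphs and edge-induced subgraphs. In node-induced subgraph related problems, the possible subgraphs require that for each vertex in $\mathcal{G}_{S}$, the associated edges in $\mathcal{G}$ must appear in $\mathcal{G}_{S}$, i.e., $\mathcal{E}_{\mathcal{G}_{S}} = \mathcal{E}_{\mathcal{G}} \cap (\mathcal{V}_{\mathcal{G}_{S}} \times \mathcal{V}_{\mathcal{G}_{S}})$; in edge-induced subgraph related problems, the required subgraphs are restricted by associating vertices that are incident to edges, i.e., $\mathcal{E}_{\mathcal{G}_{S}} \subseteq \mathcal{E}_{\mathcal{G}}$, $\mathcal{V}_{\mathcal{G}_{S}} = \{u \mid (u, v) \in \mathcal{E}_{\mathcal{G}_{S}}\} \cup \{v \mid (u, v) \in \mathcal{E}_{\mathcal{G}_{S}}\}$. Node-induced subgraphs are specific edge-induced subgraphs when $\mathcal{G}$ is connected. Hence, we assume all subgraphs mentioned in the following are edge-induced for better generalization. Figure 1(b) shows an example of subgraph isomorphism in which a graph with four vertices is subgraph isomorphic to the triangle pattern.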
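To make the edge-induced setting concrete, the following is a minimal brute-force sketch in Python, not the search procedure used in this paper. It assumes a graph is stored as a pair of dictionaries (vertex labels, and edge label sets keyed by `(source, target)`), and it assumes that a pattern edge matches a graph edge only when their label sets are equal. The function name and the toy graphs are hypothetical.

```python
from itertools import permutations


def count_subgraph_isomorphisms(pattern, graph):
    """Count edge-induced subgraph isomorphisms of `pattern` inside `graph`.

    Each argument is a pair (vertex_labels, edge_labels):
      vertex_labels: dict vertex -> label
      edge_labels:   dict (source, target) -> frozenset of edge labels
    Assumption: a pattern edge matches only an edge with an equal label set.
    """
    p_vertices, p_edges = pattern
    g_vertices, g_edges = graph
    p_nodes = list(p_vertices)
    count = 0
    # Enumerate every injective assignment of pattern vertices to graph vertices.
    for image in permutations(g_vertices, len(p_nodes)):
        f = dict(zip(p_nodes, image))
        # Vertex labels must be preserved.
        if any(p_vertices[u] != g_vertices[f[u]] for u in p_nodes):
            continue
        # Every pattern edge must land on a graph edge with the same labels.
        if all(g_edges.get((f[u], f[v])) == labels
               for (u, v), labels in p_edges.items()):
            count += 1
    return count


A = frozenset({"a"})
# Directed 3-cycle pattern.
triangle = ({0: "x", 1: "x", 2: "x"},
            {(0, 1): A, (1, 2): A, (2, 0): A})
# Four-vertex graph containing one directed 3-cycle plus a tail edge.
graph = ({"p": "x", "q": "x", "r": "x", "s": "x"},
         {("p", "q"): A, ("q", "r"): A, ("r", "p"): A, ("r", "s"): A})

print(count_subgraph_isomorphisms(triangle, graph))  # prints 3
```

The three matches are the rotations of the single directed cycle. The enumeration is exponential in the pattern size, which is exactly the cost that motivates learned counting.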

Edge-to-vertex Transforms

In graph theory, the line graph of an undirected graph $\mathcal{G}$ is another undirected graph that represents the adjacencies between edges of $\mathcal{G}$, e.g., Figure 1(c). We extend line graphs to directed heterogeneous multigraphs.

Definition 3 (Edge-to-vertex transform).

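A line graph (also known as edge-to-vertex dual graph) $\mathcal{H}$ of a graph $\mathcal{G}$ is obtained by associating a vertex $v' \in \mathcal{V}_{\mathcal{H}}$ with each edge $e = g^{-1}(v') \in \mathcal{E}_{\mathcal{G}}$ and connecting two vertices $u', v' \in \mathcal{V}_{\mathcal{H}}$ with an edge from $u'$ to $v'$ if and only if the destination of the corresponding edge $d = g^{-1}(u')$ is exactly the source of $e = g^{-1}(v')$. Formally, we have: $\forall e \in \mathcal{E}_{\mathcal{G}}, \mathcal{X}_{\mathcal{H}}(g(e)) = \mathcal{Y}_{\mathcal{G}}(e)$; $\forall v' \in \mathcal{V}_{\mathcal{H}}, \mathcal{Y}_{\mathcal{G}}(g^{-1}(v')) = \mathcal{X}_{\mathcal{H}}(v')$; $\forall d = (u, v), e = (v, w) \in \mathcal{E}_{\mathcal{G}}, \mathcal{Y}_{\mathcal{H}}((g(d), g(e))) = \mathcal{X}_{\mathcal{G}}(v)$; $\forall (u', v') \in \mathcal{E}_{\mathcal{H}}$ with $g^{-1}(u') = (u, v)$ and $g^{-1}(v') = (v, w)$, $\mathcal{X}_{\mathcal{G}}(v) = \mathcal{Y}_{\mathcal{H}}((u', v'))$.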

We call the bijection $g: \mathcal{E}_{\mathcal{G}} \rightarrow \mathcal{V}_{\mathcal{H}}$ the edge-to-vertex map, and write $\mathcal{H}$ as $L(\mathcal{G})$ where $L$ corresponds to the edge-to-vertex transform. There are several differences between undirected line graphs and directed line graphs. As shown in Figure 1(c) and Figure 1(d), apart from edge directions, an edge together with its inverse in the original graph introduces two corresponding vertices and a pair of reversed edges between them in the line graph.
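The directed edge-to-vertex transform can be sketched in a few lines of Python. This is an illustrative construction rather than the authors' code: it assumes edges are stored as `(source, target)` keys with label sets, labels each line-graph vertex with the label set of its original edge, and labels each line-graph edge with the label of the shared original vertex. The function name `line_graph` is hypothetical.

```python
def line_graph(vertex_labels, edge_labels):
    """Build the directed line graph H of a directed graph G.

    vertex_labels: dict vertex -> label
    edge_labels:   dict (source, target) -> set of edge labels
    Returns (h_vertex_labels, h_edge_labels), where vertices of H are the
    edges of G and an edge d -> e exists in H iff target(d) == source(e).
    """
    # Every edge of G becomes a vertex of H carrying the edge's labels.
    h_vertex_labels = dict(edge_labels)
    h_edge_labels = {}
    for d in edge_labels:
        for e in edge_labels:
            if d[1] == e[0]:
                # Assumption: the H-edge is labeled by the shared G-vertex.
                h_edge_labels[(d, e)] = vertex_labels[d[1]]
    return h_vertex_labels, h_edge_labels


# An edge and its inverse yield two H-vertices joined by a pair of reversed edges.
g_vertex_labels = {0: "x", 1: "y"}
g_edge_labels = {(0, 1): {"a"}, (1, 0): {"a_rev"}}
h_vertices, h_edges = line_graph(g_vertex_labels, g_edge_labels)
print(sorted(h_vertices))
print(h_edges)
```

The double loop is quadratic in the number of edges and is only meant for small examples; grouping edges by source vertex would avoid the full pairwise scan.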

The edge-to-vertex dual graph has many properties. As the vertices of the line graph $\mathcal{H}$ correspond to the edges of the original graph $\mathcal{G}$, some properties of $\mathcal{G}$ that depend only on adjacency between edges may be preserved as equivalent properties in $\mathcal{H}$ that depend on adjacency between vertices. For example, an independent set in $\mathcal{H}$ corresponds to a matching (also known as independent edge set) in $\mathcal{G}$. However, the edge-to-vertex transform may lose information about the original graph. For example, two different graphs may have the same line graph. We observe that if two graphs are isomorphic, their line graphs are also isomorphic; nevertheless, the converse does not always hold. We will discuss the isomorphism and the edge-to-vertex transform in the next section.
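A classic instance of two different graphs sharing one line graph is the triangle and the three-edge star (claw) in the undirected setting. The sketch below checks this by brute force; it is an illustration under the assumption of small, unlabeled, undirected simple graphs, and the helper names are hypothetical.

```python
from itertools import combinations, permutations


def undirected_line_graph(edges):
    """Edges are frozensets {u, v}; two edges are adjacent if they share an endpoint."""
    vertices = list(edges)
    adjacency = {frozenset((d, e)) for d, e in combinations(vertices, 2) if d & e}
    return vertices, adjacency


def are_isomorphic(vertices_1, edges_1, vertices_2, edges_2):
    """Brute-force isomorphism test for small unlabeled undirected graphs."""
    vertices_1, vertices_2 = list(vertices_1), list(vertices_2)
    if len(vertices_1) != len(vertices_2) or len(edges_1) != len(edges_2):
        return False
    for image in permutations(vertices_2):
        f = dict(zip(vertices_1, image))
        # With equal edge counts, mapping every edge onto an edge is sufficient.
        if all(frozenset(f[u] for u in edge) in edges_2 for edge in edges_1):
            return True
    return False


triangle = {frozenset((0, 1)), frozenset((1, 2)), frozenset((0, 2))}
claw = {frozenset((0, 1)), frozenset((0, 2)), frozenset((0, 3))}

lt_vertices, lt_edges = undirected_line_graph(triangle)
lc_vertices, lc_edges = undirected_line_graph(claw)

print(are_isomorphic(lt_vertices, lt_edges, lc_vertices, lc_edges))  # prints True
print(are_isomorphic({0, 1, 2}, triangle, {0, 1, 2, 3}, claw))       # prints False
```

Both line graphs are triangles, while the original graphs do not even have the same number of vertices.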

Isomorphisms vs. Edge-to-vertex Transforms

The edge-to-vertex transform can preserve adjacency relevant properties of graphs. In this section, we discuss isomorphisms and the edge-to-vertex transform. Particularly, we analyze the symmetry of isomorphisms in special situations transforming edges to vertices, and we further extend all graphs into this particular kind of structure for searching.

Proposition 4.

If two graphs and are isomorphic with an isomorphism , then their line graphs and are also isomorphic with an isomorphism such that and .

The proof is shown in Appendix A. Furthermore, we conclude that the dual isomorphism satisfies , . We denote for Proposition 4.

The relation between the isomorphism and its dual is non-injective: the two line graphs in Figure 2(a) are isomorphic but their original graphs are not, which also indicates that an isomorphism between line graphs may correspond to multiple different isomorphisms between the original graphs, or to none at all. That is to say, the edge-to-vertex transform cannot retain all graph adjacency information or guarantee isomorphisms in some situations.

Theorem 5 (Whitney isomorphism theorem).

For connected simple graphs with more than four vertices, there is a one-to-one correspondence between isomorphisms of the graphs and isomorphisms of their line graphs.

Theorem 5 whitney1932congruent states the condition for simple graphs. Inspired by it, we add reversed edges associated with special labels for directed graphs so that graphs can be regarded as undirected (Figure 2(b)). Theorem 6 is the extension for directed heterogeneous multigraphs.

Theorem 6.

For connected directed heterogeneous multigraphs with reversed edges (the reverse of one self-loop is itself), there is a one-to-one correspondence between isomorphisms of the graphs and isomorphisms of their line graphs.

The detailed proof is listed in Appendix B. Moreover, we have Corollary 7 for subgraph isomorphisms and their duals.

Corollary 7.

For connected directed heterogeneous multigraphs with reversed edges and more than one vertex, there is a one-to-one correspondence between subgraph isomorphisms of the graphs and subgraph isomorphisms of their line graphs.

(a) Non-injective case
(b) Adding reversed edges
Figure 2: Non-isomorphic graphs and their line graphs.

Dual Message Passing Neural Networks

The edge-to-vertex transform and the duality property indicate that searching isomorphisms on the original graph is equivalent to searching on its line graph. Hence, we design the dual message passing to model nodes with original structure and model edges with the line graph structure. Moreover, we extend the dual message passing to heterogeneous multi-graphs.

Conventional Graph Convolutions

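kipf2017semi (kipf2017semi) proposed parameterized conventional graph convolutions as the first-order approximation of spectral convolutions $g_{\theta} \star x = U g_{\theta} U^{\top} x$, where $g_{\theta}$ is the filter in the Fourier domain, $U$ is the matrix of eigenvectors of the graph Laplacian, and $x \in \mathbb{R}^{n}$ is the scalar feature vector for the $n$ vertices of $\mathcal{G}$. In practice, $g_{\theta}$ is a diagonal matrix as a function of the eigenvalues of the (normalized) graph Laplacian. Considering that the computational cost of eigendecomposition is $O(n^{3})$, it is approximated by shifted Chebyshev polynomials hammond2011wavelets:

$$g_{\theta} \approx \theta_{0} I_{n} + \theta_{1} \left( \frac{2}{\lambda_{\max}} \Lambda - I_{n} \right), \tag{1}$$

where $\Lambda$ is the diagonal matrix of eigenvalues, $I_{n}$ is an identity matrix, and $\lambda_{\max}$ is the largest eigenvalue so that the input of the Chebyshev polynomials, $\frac{2}{\lambda_{\max}} \Lambda - I_{n}$, is located in $[-1, 1]$. Therefore, the convolution becomes

$$g_{\theta} \star x \approx \theta_{0} x + \theta_{1} \left( \frac{2}{\lambda_{\max}} L - I_{n} \right) x, \tag{2}$$

where $L$ is the (normalized) graph Laplacian matrix. $\lambda_{\max}$ is bounded by $\max\{d_{u} + d_{v} \mid (u, v) \in \mathcal{E}_{\mathcal{G}}\}$ if the Laplacian is $L = D - A$, or by $2$ if the Laplacian is normalized as $L = I_{n} - D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$, where $A$ is the adjacency matrix and $A_{uv}$ corresponds to the number of edges from vertex $u$ to vertex $v$, $D$ is the degree diagonal matrix and $D_{vv} = d_{v}$ is the (out-)degree of vertex $v$ zhang2011laplacian.

Graph convolution networks have shown great success in many fields, including node classification, graph property prediction, graph isomorphism test, and subgraph isomorphism counting. xu2019how (xu2019how) and liu2020neural (liu2020neural) found that the sum aggregation is good at capturing structural information and solving isomorphism problems. Hence, we use the unnormalized graph Laplacian $L = D - A$ and set $\lambda_{\max} = \max\{d_{u} + d_{v} \mid (u, v) \in \mathcal{E}_{\mathcal{G}}\}$.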

Dual Message Passing Mechanism

This convolution can also apply on the line graph , then convolutional operation in is

where is the filter for , is the scalar feature vector for vertices of , and is the largest eigenvalue of the Laplacian , which is no greater than . We can use Eq. (3) to acquire the edge representations of because Definition 3 and Corollary 7 show the line graph also preserves the structural information of for subgraph isomorphisms.

However, Eq. (3) results in a new problem: the computation cost is linear to where . To tackle this issue, we combine the two convolutions in an asynchronous manner in .

Proposition 8.

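If $\mathcal{G}$ is a directed graph with $n$ vertices and $m$ edges, then $D_{+} + D_{-} - A - A^{\top} = B B^{\top}$, where $A \in \mathbb{R}^{n \times n}$ is the adjacency matrix, $D_{+}, D_{-} \in \mathbb{R}^{n \times n}$ are the out-degree and in-degree diagonal matrices respectively, and $B \in \mathbb{R}^{n \times m}$ is the oriented incidence matrix where $B_{ve} = 1$ if vertex $v$ is the destination of edge $e$, $B_{ve} = -1$ if $v$ is the source of $e$, and $B_{ve} = 0$ otherwise. In particular, if $\mathcal{G}$ is with reversed edges, then we have $B B^{\top} = 2 (D - A) = 2 L$, where $L$ is the Laplacian matrix.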
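The incidence identity in Proposition 8 can be checked numerically on a small example. The sketch below uses plain Python lists and assumes a directed multigraph without self-loops whose vertices are numbered from 0; the function names are hypothetical and this is a sanity check, not a proof.

```python
def build_matrices(n, edges):
    """Return adjacency A, oriented incidence B, out-degrees, and in-degrees."""
    A = [[0] * n for _ in range(n)]
    B = [[0] * len(edges) for _ in range(n)]
    out_deg, in_deg = [0] * n, [0] * n
    for j, (u, v) in enumerate(edges):
        A[u][v] += 1
        out_deg[u] += 1
        in_deg[v] += 1
        B[u][j] = -1  # u is the source of edge j
        B[v][j] = 1   # v is the destination of edge j
    return A, B, out_deg, in_deg


def gram(B):
    """Compute B B^T."""
    return [[sum(x * y for x, y in zip(row_r, row_c)) for row_c in B] for row_r in B]


def check_general_identity(n, edges):
    """Check D_out + D_in - A - A^T == B B^T."""
    A, B, out_deg, in_deg = build_matrices(n, edges)
    expected = [[(out_deg[r] + in_deg[r] if r == c else 0) - A[r][c] - A[c][r]
                 for c in range(n)] for r in range(n)]
    return gram(B) == expected


def check_reversed_identity(n, edges):
    """After adding a reversed copy of every edge, check B B^T == 2 (D - A)."""
    both = edges + [(v, u) for u, v in edges]
    A, B, out_deg, _ = build_matrices(n, both)
    two_laplacian = [[2 * ((out_deg[r] if r == c else 0) - A[r][c])
                      for c in range(n)] for r in range(n)]
    return gram(B) == two_laplacian


edges = [(0, 1), (1, 2), (2, 0), (0, 2)]
print(check_general_identity(3, edges))   # prints True
print(check_reversed_identity(3, edges))  # prints True
```

Because $B$ has exactly two nonzero entries per column, products with $B$ and $B^{\top}$ can be computed in time linear in the number of edges when sparse representations are used.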

Proposition 9.

If is a directed graph with vertices and edges and is the line graph of , then , where is the adjacency matrix of , is an identity matrix, and is the unoriented incidence matrix where if vertex is incident to edge , otherwise. In particular, if is with reversed edges, then is also with reversed edges and . Furthermore, we have , where is the Laplacian matrix of .

We use Proposition 8 to inspect the graph convolutions. The second term of Eq. (2) can be written as , and corresponds to the computation in the edge space. We can design a better filter to replace this subtraction operation so that , where is the result of some specific computation in the edge space, which is straightforward to involve Eq. (3). We are able to generalize Eq. (3) by the same idea, but it does not help to reduce the complexity. The second term of Eq. (3) is equivalent to obtained from Proposition 9. Moreover, corresponds to the computation . We can also enhance this computation by introducing , e.g., . We can get the degree matrix without constructing the line graph because it depends on the vertex degrees of : . We manually set .

Finally, the asynchronous updates are defined as follows:

(4) (5)
where and indicate the parameters at the -th update and and are the updated results. The computation of and the computation of are linear to the number of edges with the help of sparse representations for and .

Heterogeneous Multi-graph Extensions

Different relational message passing variants have been proposed to model heterogeneous graphs. Nevertheless, our dual message passing is natural to handle complex edge types and even edge features. Each edge not only carries the edge-level property, but also stores the local structural information in the corresponding line graph. However, Eq. (5) does not reflect the edge direction since regards the source and the target of one edge as the same. Therefore, we extend Eqs. (4-5) and propose dual message passing neural networks (DMPNNs) to support the mixture of various properties:

(6) (7)
where and are -dim hidden states of nodes and edges in the -th DMPNN layer. and are initialized with features, labels, and other properties, eliminates out-edges, filters out in-edges, and and are trainable parameters that are initialized bounded by and , respectively. For the detailed explanations and reparameterization tricks, see Appendix C. After updates, we finally get -dim node and edge representations and in the aligned space.


We evaluate DMPNNs on the challenging subgraph isomorphism counting and matching tasks. Besides, we also learn embeddings and classify nodes without any label or attribute on heterogeneous graphs to verify the generalization and the necessity of explicit edge modeling. Training and testing of DMPNNs and baselines were conducted on a single NVIDIA V100 GPU under the PyTorch paszke2019pytorch and DGL wang2019dgl frameworks.

Subgraph Isomorphism Counting and Matching

DMPNNs are designed based on the duality of isomorphisms, so evaluation on isomorphism-related tasks is the most straightforward. Given a pair of pattern $\mathcal{P}$ and graph $\mathcal{G}$, subgraph isomorphism counting aims to count all different subgraph isomorphisms in $\mathcal{G}$, and matching aims to seek out which nodes and edges belong to those isomorphic subgraphs. We report the root mean square error (RMSE) and the mean absolute error (MAE) between global counting predictions and the ground truth, and evaluate the graph edit distance (GED) between predicted subgraphs and all isomorphic subgraphs. However, computing GED is NP-hard, so we consider the lower bound of GED in continuous space. We use DMPNN and baselines to predict the frequency with which each node or edge appears in isomorphic subgraphs. For example, models are expected to return for nodes and for edges given the pair in Figure 1(a), and return for nodes and for edges given Figure 1(b). The MAE between node predictions and node frequencies, or the MAE between edge predictions and edge frequencies, is regarded as the lower bound of GED. We run experiments on three different seeds and report the best result.
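For reference, the two counting metrics can be computed as in the short Python sketch below. The prediction and target values are made-up numbers for illustration only, not results from the paper.

```python
import math


def rmse(predictions, targets):
    """Root mean square error between predicted and true counts."""
    squared = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return math.sqrt(squared / len(targets))


def mae(predictions, targets):
    """Mean absolute error between predicted and true counts."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


# Hypothetical predicted and ground-truth isomorphism counts for three pairs.
predicted_counts = [2.0, 0.0, 5.0]
true_counts = [2, 1, 3]

print(rmse(predicted_counts, true_counts))
print(mae(predicted_counts, true_counts))  # prints 1.0
```

The same `mae` function applies to the matching task when the two lists hold per-node (or per-edge) predicted and true frequencies, which is the quantity used above as the lower bound of GED.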


We compare with three sequence models and three graph models, including CNN kim2014convolutional, LSTM hochreiter1997long, TXL transformerxl2019dai, RGCN schlichtkrull2018modeling, RGIN liu2020neural, and CompGCN vashishth2020composition. Sequence models embed edges, and we calculate the MAE over edges as the GED. On the contrary, graph models embed nodes so that we consider the MAE over nodes. We jointly train counting and matching prediction modules of DMPNN and other graph baselines:

where is the dataset containing pattern-graph pairs, indicates the ground truth of number of subgraph isomorphisms between pattern and graph , indicates the frequency of vertex appearing in isomorphisms, and are the corresponding predictions. For sequence models, we jointly minimize the MSE of counting predictions and the MSE of edge predictions. We follow the same setting of liu2020neural (liu2020neural) to combine multi-hot encoding and message passing to embed graphs and use pooling operations to make predictions:
where and are trainable matrices to align id and label representations to the same dimension. We also consider the more powerful Deep-LRP chen2020can and add local relational pooling behind dual message passing for node representation learning, denoted as DMPNN-LRP. For a fair comparison, we use 3-layer networks and set the embedding dimensions, hidden sizes, and numbers of filters as 64 for all models. We follow the original setting of Deep-LRP to use 3-truncated BFS. Considering the quadratic computation complexity of TXL, we set the segment size and memory size as 128. All models are trained using AdamW loshchilov2019 with a learning rate 1e-3 and a decay 1e-5.


Table 1 shows the statistics of two synthetic homogeneous datasets with 3-stars, triangles, tailed triangles, and chordal cycles as patterns chen2020can, one synthetic heterogeneous dataset with 75 random patterns (see Footnote 1), and one mutagenic compound dataset MUTAG with 24 patterns liu2020neural. In traditional algorithms, adding reversed edges increases the search space dramatically, but it does not add much extra time for neural methods. Thus, we also conduct experiments on patterns and graphs with reversed edges associated with specific edge labels, which doubles the number of edges and the number of edge labels.

Footnote 1: This Complex dataset corresponds to the Small dataset in the original paper. However, we found that some ground truth counts are not correct because VF2 does not check self-loops. We removed all self-loops from patterns and graphs and obtained the correct ground truth.
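One possible way to add reversed edges with dedicated labels is sketched below. The labeling scheme is an assumption made for illustration: edge labels are integers in `range(num_edge_labels)` and the reverse of label `y` is `y + num_edge_labels`. The paper only states that reversed edges receive specific labels, so the exact scheme may differ. Self-loops are left untouched because the reverse of a self-loop is itself.

```python
def add_reversed_edges(edge_labels, num_edge_labels):
    """Return a copy of the edge dictionary with a labeled reverse for every edge.

    edge_labels: dict (source, target) -> set of int labels in range(num_edge_labels)
    Assumption: the reverse of label y is encoded as y + num_edge_labels.
    """
    augmented = {edge: set(labels) for edge, labels in edge_labels.items()}
    for (u, v), labels in edge_labels.items():
        if u == v:
            continue  # the reverse of a self-loop is itself
        reverse = augmented.setdefault((v, u), set())
        reverse.update(label + num_edge_labels for label in labels)
    return augmented


original = {(0, 1): {0}, (1, 2): {1}}
augmented = add_reversed_edges(original, num_edge_labels=2)
print(augmented)
```

In this toy graph, the number of labeled edges grows from 2 to 4 and the label vocabulary from 2 to 4, matching the doubling described above.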

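| Statistic | Erdős-Rényi | Regular | Complex | MUTAG |
|---|---|---|---|---|
| #train | 6,000 | 6,000 | 358,512 | 1,488 |
| #valid | 4,000 | 4,000 | 44,814 | 1,512 |
| #test | 10,000 | 10,000 | 44,814 | 1,512 |
| $\lvert\mathcal{V}_{\mathcal{P}}\rvert$ (Max / Avg) | 4 / 3.8±0.4 | 4 / 3.8±0.4 | 8 / 5.2±2.1 | 4 / 3.5±0.5 |
| $\lvert\mathcal{E}_{\mathcal{P}}\rvert$ (Max / Avg) | 10 / 7.5±1.7 | 10 / 7.5±1.7 | 8 / 5.9±2.0 | 3 / 2.5±0.5 |
| $\lvert\mathcal{X}_{\mathcal{P}}\rvert$ (Max / Avg) | 1 / 1±0 | 1 / 1±0 | 8 / 3.4±1.9 | 2 / 1.5±0.5 |
| $\lvert\mathcal{Y}_{\mathcal{P}}\rvert$ (Max / Avg) | 1 / 1±0 | 1 / 1±0 | 8 / 3.8±2.0 | 2 / 1.5±0.5 |
| $\lvert\mathcal{V}_{\mathcal{G}}\rvert$ (Max / Avg) | 10 / 10±0 | 30 / 18.8±7.4 | 64 / 32.6±21.2 | 28 / 17.9±4.6 |
| $\lvert\mathcal{E}_{\mathcal{G}}\rvert$ (Max / Avg) | 48 / 27.0±6.1 | 90 / 62.7±17.9 | 256 / 73.6±66.8 | 66 / 39.6±11.4 |
| $\lvert\mathcal{X}_{\mathcal{G}}\rvert$ (Max / Avg) | 1 / 1±0 | 1 / 1±0 | 16 / 9.0±4.8 | 7 / 3.3±0.8 |
| $\lvert\mathcal{Y}_{\mathcal{G}}\rvert$ (Max / Avg) | 1 / 1±0 | 1 / 1±0 | 16 / 9.4±4.7 | 4 / 3.0±0.1 |

Table 1: Statistics of datasets on subgraph isomorphism experiments. $\mathcal{P}$ and $\mathcal{G}$ correspond to patterns and graphs; $\mathcal{V}$, $\mathcal{E}$, $\mathcal{X}$, and $\mathcal{Y}$ denote vertices, edges, vertex labels, and edge labels.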
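| Models | Erdős-Rényi RMSE | Erdős-Rényi MAE | Erdős-Rényi GED | Regular RMSE | Regular MAE | Regular GED | Complex RMSE | Complex MAE | Complex GED | MUTAG RMSE | MUTAG MAE | MUTAG GED |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Zero | 92.532 | 51.655 | 201.852 | 198.218 | 121.647 | 478.990 | 68.460 | 14.827 | 86.661 | 16.336 | 6.509 | 15.462 |
| Avg | 121.388 | 131.007 | 237.349 | 156.515 | 127.211 | 576.476 | 66.836 | 23.882 | 156.095 | 14.998 | 10.036 | 27.958 |
| CNN | 20.386 | 13.316 | NA | 37.192 | 27.268 | NA | 41.711 | 7.898 | NA | 1.789 | 0.734 | NA |
| LSTM | 14.561 | 9.949 | 160.951 | 14.169 | 10.064 | 234.351 | 30.496 | 6.839 | 88.739 | 1.285 | 0.520 | 3.873 |
| TXL | 10.861 | 7.105 | 116.810 | 15.263 | 10.721 | 208.798 | 43.055 | 9.576 | 98.124 | 1.895 | 0.830 | 4.618 |
| RGCN | 9.386 | 5.829 | 28.963 | 14.789 | 9.772 | 70.746 | 28.601 | 9.386 | 64.122 | 0.777 | 0.334 | 1.441 |
| RGIN | 6.063 | 3.712 | 22.155 | 13.554 | 8.580 | 56.353 | 20.893 | 4.411 | 56.263 | 0.273 | 0.082 | 0.329 |
| CompGCN | 6.706 | 4.274 | 25.548 | 14.174 | 9.685 | 64.677 | 22.287 | 5.127 | 57.082 | 0.300 | 0.085 | 0.278 |
| DMPNN | 5.062 | 3.054 | 23.411 | 11.980 | 7.832 | 56.222 | 17.842 | 3.592 | 38.322 | 0.226 | 0.079 | 0.244 |
| Deep-LRP | 0.794 | 0.436 | 2.571 | 1.373 | 0.788 | 5.432 | 27.490 | 5.850 | 56.772 | 0.260 | 0.094 | 0.437 |
| DMPNN-LRP | 0.475 | 0.287 | 1.538 | 0.617 | 0.422 | 2.745 | 17.391 | 3.431 | 35.795 | 0.173 | 0.053 | 0.190 |

Table 2: Performance on subgraph isomorphism counting and matching. Erdős-Rényi and Regular are homogeneous; Complex and MUTAG are heterogeneous.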
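| Models | Setting | Complex RMSE | Complex MAE | Complex GED | MUTAG RMSE | MUTAG MAE | MUTAG GED |
|---|---|---|---|---|---|---|---|
| CNN | w/o rev | 41.711 | 7.898 | NA | 1.789 | 0.734 | NA |
| CNN | w/ rev | 47.467 | 10.128 | NA | 2.073 | 0.865 | NA |
| LSTM | w/o rev | 30.496 | 6.839 | 88.739 | 1.285 | 0.520 | 3.873 |
| LSTM | w/ rev | 32.178 | 7.575 | 90.718 | 1.776 | 0.835 | 5.744 |
| TXL | w/o rev | 43.055 | 9.576 | 98.124 | 1.895 | 0.830 | 4.618 |
| TXL | w/ rev | 37.251 | 9.156 | 95.887 | 2.701 | 1.175 | 6.436 |
| RGCN | w/o rev | 28.601 | 9.386 | 64.122 | 0.777 | 0.334 | 1.441 |
| RGCN | w/ rev | 26.359 | 7.131 | 49.495 | 0.511 | 0.200 | 1.628 |
| RGIN | w/o rev | 20.893 | 4.411 | 56.263 | 0.273 | 0.082 | 0.329 |
| RGIN | w/ rev | 20.132 | 4.126 | 39.726 | 0.247 | 0.091 | 0.410 |
| CompGCN | w/o rev | 22.287 | 5.127 | 57.082 | 0.300 | 0.085 | 0.278 |
| CompGCN | w/ rev | 19.072 | 4.607 | 40.029 | 0.268 | 0.072 | 0.266 |
| DMPNN | w/o rev | 18.974 | 3.922 | 56.933 | 0.232 | 0.088 | 0.320 |
| DMPNN | w/ rev | 17.842 | 3.592 | 38.322 | 0.226 | 0.079 | 0.244 |
| Deep-LRP | w/o rev | 27.490 | 5.850 | 56.772 | 0.260 | 0.094 | 0.437 |
| Deep-LRP | w/ rev | 26.297 | 5.725 | 61.696 | 0.290 | 0.108 | 0.466 |
| DMPNN-LRP | w/o rev | 20.425 | 4.173 | 42.200 | 0.196 | 0.062 | 0.210 |
| DMPNN-LRP | w/ rev | 17.391 | 3.431 | 35.795 | 0.173 | 0.053 | 0.190 |

Table 3: Performance comparison after introducing reversed edges on heterogeneous data.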
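| Models | Training | MUTAG RMSE | MUTAG MAE | Regular RMSE | Regular MAE | Complex RMSE | Complex MAE |
|---|---|---|---|---|---|---|---|
| LSTM | MTL | 1.285 | 0.520 | 14.169 | 10.064 | 30.496 | 6.839 |
| LSTM | STL | -0.003 | +0.030 | +0.159 | -0.029 | -1.355 | -0.096 |
| TXL | MTL | 1.895 | 0.830 | 14.306 | 10.143 | 37.251 | 9.156 |
| TXL | STL | -0.128 | -0.041 | +1.487 | +1.211 | -5.671 | -2.067 |
| RGCN | MTL | 0.511 | 0.200 | 14.652 | 9.911 | 26.359 | 7.131 |
| RGCN | STL | +0.202 | +0.090 | +0.348 | -0.269 | +1.686 | +0.460 |
| RGIN | MTL | 0.247 | 0.091 | 13.128 | 8.412 | 20.132 | 4.126 |
| RGIN | STL | +0.053 | +0.004 | +1.119 | +1.019 | +1.804 | +0.068 |
| CompGCN | MTL | 0.268 | 0.072 | 14.174 | 9.685 | 19.072 | 4.607 |
| CompGCN | STL | +0.088 | +0.086 | +0.252 | +0.738 | +3.625 | +0.260 |
| DMPNN | MTL | 0.226 | 0.079 | 11.980 | 7.832 | 17.842 | 3.592 |
| DMPNN | STL | +0.011 | +0.001 | +0.318 | +0.097 | +3.604 | +0.865 |
| Deep-LRP | MTL | 0.260 | 0.094 | 1.275 | 0.731 | 26.297 | 5.725 |
| Deep-LRP | STL | +0.099 | +0.044 | +0.036 | +0.035 | +3.753 | +0.886 |
| DMPNN-LRP | MTL | 0.173 | 0.053 | 0.617 | 0.422 | 17.391 | 3.431 |
| DMPNN-LRP | STL | +0.040 | +0.020 | +0.513 | +0.252 | +4.263 | +0.928 |

Table 4: Performance comparison in multi-task training (MTL) and single-task training (STL) on subgraph isomorphism counting. For each model, we report the best result with or without reversed edges. STL rows give the change relative to MTL, so positive values are error increases.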


Counting and matching results are reported in Table 2. We find that graph models perform better than sequence models, and DMPNN surpasses almost all message passing based networks in counting and matching. RGIN extends RGCN with the sum aggregator followed by an MLP to make full use of the neighborhood information, and it improves the original RGCN significantly. CompGCN is designed to leverage vertex-edge composition operations to predict potential links, which is contrary to the goal of accurate matching. In contrast, DMPNN learns both node embeddings and edge embeddings in the aligned space but from different, dual structures. We also observe that local relational pooling can significantly decrease errors on homogeneous data by explicitly permuting neighbor subsets. But Deep-LRP is designed for patterns within three nodes and simple graphs, so it cannot naturally handle multi-edges, let alone complex structures in randomly generated data and real-life data. One advantage of DMPNN is to model heterogeneous nodes and edges in the same space. We can see the success of DMPNN-LRP in the three datasets with a maximum pattern size of 4. But it struggles on the Complex dataset, where patterns contain at most 8 nodes.

We also evaluate baselines with additional reversed edges on the Complex and MUTAG datasets. From the results in Table 3, we see that graph convolutions reduce errors with reversed edges in most cases, but sequence models usually become worse. LRP is designed for simple graphs, so it cannot naturally handle heterogeneous edges, but DMPNN generalizes it. This observation also indicates that one of the challenges in neural subgraph isomorphism counting and matching is the complex local structure of graphs rather than the number of edges in graphs; otherwise, reversed edges would be harmful. We compare the efficiency in Appendix D.

In the joint learning, we hope models can learn the mutual supervision that node weights determine the global count and the global count is the upper bound of node weights. We also conduct experiments on single task learning to examine whether models can benefit from this mutual supervision. As shown in Table 4, graph models consistently achieve further performance gains from multi-task learning, while sequence models cannot. Moreover, improvement is more notable if the dataset is more complicated, e.g., patterns with more edges and graphs with non-trivial structures.

Unattributed Unsupervised Node Classification

Unattributed unsupervised node classification focuses on local structures instead of node features and attributes. Node embeddings are learned with the link prediction loss, then linear support vector machines are trained on 80% of the labeled node embeddings to predict the remaining 20%. We report the average Macro-F1 and Micro-F1 over five runs.
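The two reported scores differ in how they average over classes. The sketch below computes both for the single-label case in plain Python; it is an illustration with made-up labels, and the multi-label setting of Yelp would require per-label binary decisions instead.

```python
def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[t] += 1  # t was missed

    def f1(true_pos, false_pos, false_neg):
        denominator = 2 * true_pos + false_pos + false_neg
        return 2 * true_pos / denominator if denominator else 0.0

    # Macro-F1 averages per-class F1; Micro-F1 pools the counts first.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical labels for five test nodes.
macro_f1, micro_f1 = f1_scores([0, 0, 1, 1, 2], [0, 1, 1, 1, 2])
print(round(macro_f1, 4), round(micro_f1, 4))  # prints 0.8222 0.8
```

Macro-F1 weights every class equally, so it is more sensitive to rare classes than Micro-F1, which helps explain why the two columns in the results can differ widely.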


We follow the setting of RGCN and CompGCN: graph neural networks first learn the node representations, and then DistMult models embedding2015yang take pairs of node hidden representations to produce a score for a triplet consisting of the source, the edge type, and the target. Eq. (9) is the objective function, defined over the triplet collection of the graph, where each triplet is paired with negative triplets sampled by uniformly replacing its source or its target with another node.

We report the results of KG embedding models, proximity-preserving based embedding methods, graph convolutional networks, and graph attention networks for comparison. We use the same parameter setting as yang2020heterogeneous (yang2020heterogeneous).
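The link prediction setup described above can be illustrated with a small Python sketch of the DistMult score and uniform negative sampling. The function names, the embedding values, and the 50/50 choice between corrupting the source and the target are assumptions for illustration; the training loss of Eq. (9) is not reproduced here.

```python
import random


def distmult_score(source, relation, target):
    """DistMult score: the sum of element-wise products of the three vectors."""
    return sum(s * r * t for s, r, t in zip(source, relation, target))


def corrupt_triplet(triplet, nodes, rng):
    """Sample a negative triplet by replacing the source or the target uniformly.

    Assumption: the replacement is not filtered, so it may coincide with the
    original node or produce a triplet that exists in the graph.
    """
    source, edge_type, target = triplet
    replacement = rng.choice(nodes)
    if rng.random() < 0.5:
        return (replacement, edge_type, target)
    return (source, edge_type, replacement)


# Hypothetical 3-dimensional embeddings.
score = distmult_score([1.0, 2.0, 3.0], [0.5, 1.0, -1.0], [2.0, 0.0, 1.0])
print(score)  # prints -2.0

rng = random.Random(0)
negative = corrupt_triplet(("a", "r", "b"), ["a", "b", "c"], rng)
print(negative)
```

Because the relation vector acts as a diagonal matrix, the DistMult score is symmetric in the source and the target, which is one reason explicit edge direction handling matters in the encoder.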


| Dataset | #Nodes | #Edges | #Edge types | #Label type | #Labeled node |
|---|---|---|---|---|---|
| PubMed | 63,109 | 244,986 | 10 | 8 | 454 |
| Yelp | 82,465 | 30,542,675 | 4 | 16 | 7,417 |

Table 5: Statistics of two real-life heterogeneous networks on unattributed unsupervised node classification.

yang2020heterogeneous (yang2020heterogeneous) collected and processed two heterogeneous networks to evaluate graph embedding algorithms. PubMed is a biomedical network constructed by text mining and manual processing where nodes are labeled as one of eight types of diseases; Yelp is a business network where nodes may have multiple labels (businesses, users, locations, and reviews). Statistics are summarized in Table 5.


In Table 6, we observe low F1 scores on both datasets, which reflects the difficulty of this task. Traditional KG embedding methods perform very similarly, but graph neural networks vary dramatically. RGCN and RGIN adopt the same relational transformations, but RGIN surpasses RGCN because of sum aggregation and MLPs. HAN and MAGNN explicitly learn the node representations from meta-paths and meta-path neighbors, but these models are evidently prone to overfitting the training data because they predict the connectivity with leaked edge type information. In contrast, CompGCN and HGT obtain better scores since CompGCN incorporates semantics by node-relation composition, and HGT captures semantic relations and injects edge dependencies by relation-specific matrices. Our DMPNN outperforms all baselines by asynchronously learning node embeddings and edge representations in the same aligned space. Even for the challenging 16-way multi-label classification, DMPNN also works without any node attributes.

| Models | PubMed Macro-F1 | PubMed Micro-F1 | Yelp Macro-F1 | Yelp Micro-F1 |
|---|---|---|---|---|
| TransE bordes2013translating | 11.40 | 15.16 | 5.05 | 23.03 |
| DistMult embedding2015yang | 11.27 | 15.79 | 5.04 | 23.00 |
| ConvE dettmers2018convolutional | 13.00 | 14.49 | 5.09 | 23.02 |
| metapath2vec dong2017metapath2vec | 12.90 | 15.51 | 5.16 | 23.32 |
| HIN2vec fu2017hin2vec | 10.93 | 15.31 | 5.12 | 23.25 |
| HEER shi2018easing | 11.73 | 15.29 | 5.03 | 22.92 |
| RGCN schlichtkrull2018modeling | 10.75 | 12.73 | 5.10 | 23.24 |
| RGIN liu2020neural | 12.22 | 15.41 | 5.14 | 23.82 |
| CompGCN vashishth2020composition | 13.89 | 21.13 | 5.09 | 23.96 |
| HAN wang2019heterogeneous | 9.54 | 12.18 | 5.10 | 23.24 |
| MAGNN fu2020magnn | 10.30 | 12.60 | 5.10 | 23.24 |
| HGT hu2020heterogeneous | 11.24 | 18.72 | 5.07 | 23.12 |
| DMPNN | 16.54 | 23.13 | 12.74 | 29.12 |

Table 6: F1 scores (%) on unattributed unsupervised node classification. Some baseline results are taken from yang2020heterogeneous.

Related Work

The isomorphism search aims to find all bijections between two graphs. The subgraph isomorphism search is more challenging, and it has been proven to be an NP-complete problem. Most subgraph isomorphism algorithms are based on backtracking or graph indexing ullmann1976an; he2008graphs. However, these algorithms are hard to apply to complex patterns and large data graphs. The search space of backtracking methods grows exponentially, and indexing methods require a large quantity of disk space. Some methods introduce weak rules to reduce the search space in most cases, such as candidate region filtering, partial matching enumeration, and ordering carletti2018challenging. On the other hand, there are many approximate techniques for subgraph counting, such as path sampling jha2015path and color coding bressan2019bressan. But most approaches are hard to generalize to complex heterogeneous multi-graphs sun2020in.

In recent years, graph neural networks (GNNs) and message passing networks (MPNNs) have achieved success in graph data modeling. There are also some discussions about isomorphisms. xu2019how (xu2019how) and morris2019weisfeiler (morris2019weisfeiler) showed that neighborhood-aggregation schemes are at most as powerful as the Weisfeiler-Leman (1-WL) test. chen2020can (chen2020can) proved that $k$-WL cannot accurately count all patterns with more than $k$ nodes, but the bound of iterations of $k$-WL grows quickly to . These conclusions encourage researchers to empower message passing and explore the possibilities of neural subgraph counting. Empirically, liu2020neural (liu2020neural) combined graph encoding and dynamic memory networks to count subgraph isomorphisms in an end-to-end way. They showed that the memory with linear-complexity read-write operations can significantly improve all encoding models. A more challenging problem is subgraph isomorphism matching. NeuralMatch ying2020neural utilizes neural methods and a voting method to detect subgraph matching. However, it only returns whether one pattern is included in the data graph instead of specific isomorphisms. Neural subgraph matching is still under discussion. Besides, graph learning has also been applied to maximum common subgraph detection bai2021glsearch, providing another possible solution for isomorphisms.


In this paper, we theoretically analyze the connection between the edge-to-vertex transform and the duality of isomorphisms in heterogeneous multi-graphs. We design dual message passing neural networks (DMPNNs) based on the equivalence of isomorphism searching over original graphs and line graphs. Experiments on subgraph isomorphism counting and matching as well as unsupervised node classification support our theoretical exposition and demonstrate effectiveness. We also see a large performance boost on small patterns by stacking dual message passing and local relational pooling. We defer a better integration to future work.


The authors of this paper were supported by the NSFC Fund (U20B2053) from the NSFC of China, the RIF (R6020-19 and R6021-20) and the GRF (16211520) from RGC of Hong Kong, the MHKJFS (MHP/001/19) from ITC of Hong Kong with special thanks to HKMAAC and CUSBLT, and the Jiangsu Province Science and Technology Collaboration Fund (BZ2021065). We thank Dr. Xin Jiang for his valuable comments and the Gift Fund from Huawei Noah’s Ark Lab.


Appendix A  Proof of Proposition 4


Assume the line graph is transformed from by and the line graph is transformed from by , then , , . Moreover, based on the isomorphism , we get
, . We find the bijection mapping to for any , which is a bijection from to .

Similarly, from the two necessary conditions of isomorphism about with and with and the definition , we have
, . Therefore, we conclude that the two line graphs and are isomorphic, where the dual isomorphism satisfies , . We denote . ∎

Figure 3: Simple directed unlabeled graphs with reversed edges and no more than four nodes, and their corresponding line graphs. We observe that the number of isomorphisms of two graphs equals to the number of isomorphisms of their line graphs.

Appendix B  Proof of Theorem 6


Assume and are two connected directed heterogeneous multigraphs with reversed edges and their isomorphisms are , and are their line graphs with isomorphisms , Let , , , . To prove Theorem 6, we show .

The first step is to prove is equivalent to . The necessary conditions of are and ; the necessary conditions of are and . We know that