Graphs have been widely used in applications across domains ranging from chemoinformatics to social networks. Isomorphism is one of the most important properties of graphs, and the analysis of subgraph isomorphisms is useful in real applications. For example, we can determine the properties of compounds by finding functional group information in chemical molecules gilmer2017neural; some sub-structures in social networks are regarded as irreplaceable features in recommender systems ying2018graph. The challenge is that finding subgraph isomorphisms incurs exponential computational cost. In particular, finding and counting require global inference over the whole graph. Existing counting and matching algorithms are designed for query patterns up to a certain size (e.g., 5), and some of them cannot be directly applied to heterogeneous graphs where vertices and edges are labeled with types bhattarai2019ceci; sun2020in.
There has been growing attention to using deep learning to count or match subgraph isomorphisms. liu2020neural (liu2020neural) designed a general end-to-end framework to predict the number of subgraph isomorphisms on heterogeneous graphs, and ying2020neural (ying2020neural) combined node embeddings and voting to match subgraphs. They found that neural networks could run 10 to 1,000 times faster than traditional searching algorithms. xu2019how (xu2019how) and morris2019weisfeiler (morris2019weisfeiler) showed that graph neural networks (GNNs) based on message passing are at most as powerful as the WL test weisfeiler1968reduction, and chen2020can (chen2020can) further analyzed the upper bound of message passing and $k$-WL for subgraph isomorphism counting. These studies show that it is theoretically possible for neural methods to count larger patterns in complex graphs. In heterogeneous graphs, edges play an important role in checking and searching isomorphisms because graph isomorphisms require taking graph adjacency and edge types into account. However, existing message passing mechanisms have not paid enough attention to edge representations gilmer2017neural; schlichtkrull2018modeling; vashishth2020composition; jin2021power.
In this paper, we discuss a particular edge-to-vertex transform and find a one-to-one correspondence between subgraph isomorphisms of original graphs and subgraph isomorphisms of their corresponding edge-to-vertex dual graphs. This property suggests that searching isomorphisms on the original graph is equivalent to searching on its dual graph. Based on this observation and the theoretical guarantee, we propose new dual message passing networks (DMPNNs) to learn node and edge representations simultaneously in an aligned space. Empirical results show the effectiveness of DMPNNs on both homogeneous and heterogeneous graphs, with synthetic and real-life data.
Our main contributions are summarized as follows:
We prove that there is a one-to-one correspondence between isomorphisms of connected directed heterogeneous multi-graphs with reversed edges and isomorphisms between their edge-to-vertex dual graphs.
We propose the dual message passing mechanism and design the DMPNN model to explicitly model edges and align node and edge representations in the same space.
We empirically demonstrate that DMPNNs can count subgraph isomorphisms more accurately and match isomorphic nodes more correctly. DMPNNs also surpass competitive baselines on unsupervised node classification, indicating the necessity of explicit edge modeling for general graph representation learning.
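To be more general, we assume a graph is a directed heterogeneous multigraph. Let $\mathcal{G} = (\mathcal{V}_{\mathcal{G}}, \mathcal{E}_{\mathcal{G}}, \mathcal{X}_{\mathcal{G}}, \mathcal{Y}_{\mathcal{G}})$ be a graph with a vertex set $\mathcal{V}_{\mathcal{G}}$ in which each vertex has a distinct vertex id, an edge set $\mathcal{E}_{\mathcal{G}} \subseteq \mathcal{V}_{\mathcal{G}} \times \mathcal{V}_{\mathcal{G}}$, a label function $\mathcal{X}_{\mathcal{G}}$ that maps a vertex to a vertex label, and a label function $\mathcal{Y}_{\mathcal{G}}$ that maps an edge to a set of edge labels. Because each edge can be associated with a set of labels, multiple edges with the same source and the same target can be merged into one edge with multiple labels. A subgraph of $\mathcal{G}$, denoted as $\mathcal{G}_S$, is any graph with $\mathcal{V}_{\mathcal{G}_S} \subseteq \mathcal{V}_{\mathcal{G}}$ and $\mathcal{E}_{\mathcal{G}_S} \subseteq \mathcal{E}_{\mathcal{G}}$, satisfying $\forall v \in \mathcal{V}_{\mathcal{G}_S}, \mathcal{X}_{\mathcal{G}_S}(v) = \mathcal{X}_{\mathcal{G}}(v)$ and $\forall e \in \mathcal{E}_{\mathcal{G}_S}, \mathcal{Y}_{\mathcal{G}_S}(e) = \mathcal{Y}_{\mathcal{G}}(e)$. To simplify the statement, we let $\mathcal{Y}_{\mathcal{G}}(e) = \emptyset$ if $e \notin \mathcal{E}_{\mathcal{G}}$.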
Isomorphisms and Subgraph Isomorphisms
Definition 1 (Isomorphism).
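A graph $\mathcal{G}_1$ is isomorphic to a graph $\mathcal{G}_2$ if there is a bijection $f: \mathcal{V}_{\mathcal{G}_1} \rightarrow \mathcal{V}_{\mathcal{G}_2}$ such that: $\forall v \in \mathcal{V}_{\mathcal{G}_1}, \mathcal{X}_{\mathcal{G}_1}(v) = \mathcal{X}_{\mathcal{G}_2}(f(v))$, $\forall v' \in \mathcal{V}_{\mathcal{G}_2}, \mathcal{X}_{\mathcal{G}_2}(v') = \mathcal{X}_{\mathcal{G}_1}(f^{-1}(v'))$, $\forall (u, v) \in \mathcal{E}_{\mathcal{G}_1}, \mathcal{Y}_{\mathcal{G}_1}((u, v)) = \mathcal{Y}_{\mathcal{G}_2}((f(u), f(v)))$, $\forall (u', v') \in \mathcal{E}_{\mathcal{G}_2}, \mathcal{Y}_{\mathcal{G}_2}((u', v')) = \mathcal{Y}_{\mathcal{G}_1}((f^{-1}(u'), f^{-1}(v')))$, where $\mathcal{X}$ and $\mathcal{Y}$ denote the vertex and edge label functions of the respective graphs.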
We write $\mathcal{G}_1 \simeq \mathcal{G}_2$ for this isomorphic property and call $f$ an isomorphism. For example, there are two different isomorphisms between the two triangles in Figure 0(a). As a special case, the isomorphism between two empty graphs without any vertex is $\{\} \rightarrow \{\}$.
In addition, if a subgraph of a graph is isomorphic to another graph, then the corresponding bijection is called a subgraph isomorphism. The formal definition is:
Definition 2 (Subgraph isomorphism).
If a subgraph $\mathcal{G}_{1_S}$ of $\mathcal{G}_1$ is isomorphic to a graph $\mathcal{G}_2$ with a bijection $f$, we say $\mathcal{G}_1$ contains a subgraph isomorphic to $\mathcal{G}_2$ and call $f$ a subgraph isomorphism.
Subgraph isomorphism related problems commonly refer to two kinds of subgraphs: node-induced subgraphs and edge-induced subgraphs. In node-induced subgraph related problems, all edges of the original graph whose endpoints both lie in the subgraph must appear in the subgraph; in edge-induced subgraph related problems, the subgraph is determined by a set of edges, and its vertices are exactly those incident to these edges. Connected node-induced subgraphs are special cases of edge-induced subgraphs. Hence, we assume all subgraphs mentioned in the following are edge-induced for better generalization. Figure 0(b) shows an example in which a graph with four vertices contains a subgraph isomorphic to the triangle pattern.
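The difference between the two kinds of subgraphs can be made concrete with a brute-force counter. The following is a minimal sketch, assuming graphs are stored as plain dictionaries; the function name, the data layout, and the toy graphs are illustrative and not part of the original work, and the exhaustive search is only practical for very small inputs.

```python
from itertools import permutations

def count_subgraph_isomorphisms(p_nodes, p_edges, g_nodes, g_edges, induced=False):
    """Count injective maps f from pattern vertices to graph vertices that
    preserve vertex labels and edge label sets.

    p_nodes / g_nodes: dict vertex -> vertex label
    p_edges / g_edges: dict (source, target) -> frozenset of edge labels
    induced=False: edge-induced (extra graph edges among matched vertices are allowed)
    induced=True:  node-induced (extra graph edges among matched vertices are forbidden)
    """
    p_vertices = list(p_nodes)
    count = 0
    for image in permutations(g_nodes, len(p_vertices)):
        f = dict(zip(p_vertices, image))
        # Vertex labels must agree.
        if any(p_nodes[v] != g_nodes[f[v]] for v in p_vertices):
            continue
        # Every pattern edge must exist in the graph with the same label set.
        if any(g_edges.get((f[u], f[v])) != labels for (u, v), labels in p_edges.items()):
            continue
        if induced:
            matched = set(image)
            inverse = {w: v for v, w in f.items()}
            if any((inverse[a], inverse[b]) not in p_edges
                   for (a, b) in g_edges if a in matched and b in matched):
                continue
        count += 1
    return count

x = frozenset({"x"})
pattern_nodes = {0: "a", 1: "a", 2: "a"}
pattern_edges = {(0, 1): x, (1, 2): x, (2, 0): x}          # directed triangle
graph_nodes = {0: "a", 1: "a", 2: "a", 3: "a"}
graph_edges = {**pattern_edges, (2, 3): x, (3, 0): x, (1, 0): x}

print(count_subgraph_isomorphisms(pattern_nodes, pattern_edges, graph_nodes, graph_edges))        # 3
print(count_subgraph_isomorphisms(pattern_nodes, pattern_edges, graph_nodes, graph_edges, True))  # 0
```

In this toy graph the extra edge from vertex 1 to vertex 0 does not affect the edge-induced count (the three rotations of the triangle), but it rules out every node-induced match.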
In graph theory, the line graph of an undirected graph is another undirected graph that represents the adjacencies between edges of , e.g., Figure 0(c). We extend line graphs to directed heterogeneous multigraphs.
Definition 3 (Edge-to-vertex transform).
A line graph (also known as edge-to-vertex dual graph) of a graph is obtained by associating a vertex with each edge and connecting two vertices with an edge from to if and only if the destination of the corresponding edge is exactly the source of . Formally, we have: , , , .
We call the bijection the edge-to-vertex map, and write as where corresponds to the edge-to-vertex transform. There are several differences between undirected line graphs and directed line graphs. As shown in Figure 0(c) and Figure 0(d), apart from the directions of edges, an edge together with its inverse in the original graph introduces two corresponding vertices and a pair of reversed edges between them in the line graph.
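The structural part of the edge-to-vertex transform can be sketched in a few lines. This is a minimal illustration under simplifying assumptions: labels are ignored, edges are given as a list, and the list index plays the role of the edge-to-vertex map; the function name is hypothetical.

```python
def edge_to_vertex_transform(edges):
    """Build the directed line graph of a directed graph.

    edges: list of (source, target) pairs; the position of an edge in the list
    is used as the id of its line-graph vertex.
    Returns the set of line-graph edges (i, j), where the destination of
    edge i is exactly the source of edge j.
    """
    return {
        (i, j)
        for i, (_, target_i) in enumerate(edges)
        for j, (source_j, _) in enumerate(edges)
        if target_i == source_j
    }

# A path a -> b -> c yields a single line-graph edge.
print(edge_to_vertex_transform([("a", "b"), ("b", "c")]))

# An edge together with its inverse yields two line-graph vertices
# connected by a pair of reversed edges.
print(edge_to_vertex_transform([("a", "b"), ("b", "a")]))
```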
There are many properties in the edge-to-vertex graph. As the vertices of the line graph correspond to the edges of the original graph , some properties of that depend only on adjacency between edges may be preserved as equivalent properties in that depend on adjacency between vertices. For example, an independent set in corresponds to a matching (also known as independent edge set) in . However, the edge-to-vertex transform may lose information about the original graph. For example, two different graphs may have the same line graph. We observe that if two graphs are isomorphic, their line graphs are also isomorphic; nevertheless, the converse does not always hold. We will discuss the isomorphism and the edge-to-vertex transform in the next section.
Isomorphisms vs. Edge-to-vertex Transforms
The edge-to-vertex transform can preserve adjacency relevant properties of graphs. In this section, we discuss isomorphisms and the edge-to-vertex transform. Particularly, we analyze the symmetry of isomorphisms in special situations transforming edges to vertices, and we further extend all graphs into this particular kind of structure for searching.
Proposition 4.
If two graphs and are isomorphic with an isomorphism , then their line graphs and are also isomorphic with an isomorphism such that and .
The proof is shown in Appendix A. Furthermore, we conclude that the dual isomorphism satisfies , . We denote for Proposition 4.
The relation between the isomorphism and its dual is non-injective: the two line graphs in Figure 1(a) are isomorphic but their original graphs are not, which also indicates that one dual isomorphism may correspond to multiple different original isomorphisms (or to none at all). That is to say, the edge-to-vertex transform cannot preserve all graph adjacency information or guarantee isomorphisms in some situations.
Theorem 5 (Whitney isomorphism theorem).
For connected simple graphs with more than four vertices, there is a one-to-one correspondence between isomorphisms of the graphs and isomorphisms of their line graphs.
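The size condition in the theorem is needed because of a classical small exception: the triangle and the claw (a star with three leaves) are not isomorphic, yet their line graphs are. The check below is a minimal sketch for simple undirected graphs; the helper names are illustrative, and the brute-force isomorphism test is only meant for tiny graphs.

```python
from itertools import combinations, permutations

def undirected_line_graph(edges):
    """Line graph of a simple undirected graph: one vertex per edge (its list
    index), two vertices adjacent iff the original edges share an endpoint."""
    return {
        frozenset((i, j))
        for (i, e), (j, f) in combinations(enumerate(edges), 2)
        if set(e) & set(f)
    }

def are_isomorphic(n1, edges1, n2, edges2):
    """Brute-force isomorphism test for simple undirected graphs on vertices
    0..n-1, with edges given as sets of frozensets."""
    if n1 != n2 or len(edges1) != len(edges2):
        return False
    return any(
        {frozenset(perm[v] for v in e) for e in edges1} == edges2
        for perm in permutations(range(n1))
    )

triangle = [(0, 1), (1, 2), (0, 2)]
claw = [(0, 1), (0, 2), (0, 3)]

same_line_graphs = are_isomorphic(3, undirected_line_graph(triangle),
                                  3, undirected_line_graph(claw))
same_originals = are_isomorphic(3, {frozenset(e) for e in triangle},
                                4, {frozenset(e) for e in claw})
print(same_line_graphs, same_originals)  # True False
```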
Theorem 5 whitney1932congruent concludes the condition for simple graphs. Inspired by it, we add reversed edges associated with special labels for directed graphs so that graphs can be regarded as undirected (Figure 1(b)). Theorem 6 is the extension for directed heterogeneous multigraphs.
Theorem 6.
For connected directed heterogeneous multigraphs with reversed edges (the reverse of a self-loop is itself), there is a one-to-one correspondence between isomorphisms of the graphs and isomorphisms of their line graphs.
The detailed proof is listed in Appendix B. Moreover, we have Corollary 7 for subgraph isomorphisms and their duals.
Corollary 7.
For connected directed heterogeneous multigraphs with reversed edges and more than one vertex, there is a one-to-one correspondence between subgraph isomorphisms of the graphs and subgraph isomorphisms of their line graphs.
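The preprocessing assumed by these statements, adding a reversed edge with a special label for every edge, can be sketched as follows. This is a minimal illustration: the dictionary layout, the function name, and the `rev_` label prefix are assumptions made for the example, not the labeling scheme of the original implementation.

```python
def add_reversed_edges(edges, reverse_prefix="rev_"):
    """edges: dict (source, target) -> set of edge labels (strings).

    Returns a new dict in which every non-loop edge (u, v) also has a reversed
    edge (v, u) carrying specially marked labels. A self-loop is its own
    reverse and is left unchanged. If (v, u) already exists, the reversed
    labels are merged into its label set (one edge with multiple labels).
    """
    result = {edge: set(labels) for edge, labels in edges.items()}
    for (u, v), labels in edges.items():
        if u == v:
            continue
        result.setdefault((v, u), set()).update(reverse_prefix + label for label in labels)
    return result

print(add_reversed_edges({(0, 1): {"a"}, (1, 1): {"b"}}))
```

Using a distinct label for the reversed direction keeps the original direction recoverable, so the graph can be treated as undirected for connectivity purposes without discarding orientation.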
Dual Message Passing Neural Networks
The edge-to-vertex transform and the duality property indicate that searching isomorphisms on the original graph is equivalent to searching on its line graph. Hence, we design the dual message passing to model nodes with original structure and model edges with the line graph structure. Moreover, we extend the dual message passing to heterogeneous multi-graphs.
Conventional Graph Convolutions
kipf2017semi (kipf2017semi) proposed parameterized conventional graph convolutions as the first-order approximation of spectral convolutions, in which a filter in the Fourier domain is applied to the scalar feature vector for the vertices of the graph. In practice, the filter is a diagonal matrix as a function of eigenvalues of the (normalized) graph Laplacian. Considering that the computational cost of eigendecomposition is cubic in the number of vertices, the filter is approximated by shifted Chebyshev polynomials hammond2011wavelets (Eq. (1)). The eigenvalues are rescaled using an identity matrix and the largest eigenvalue so that the input of the Chebyshev polynomials is located in $[-1, 1]$. Therefore, the convolution becomes a first-order polynomial of the rescaled graph Laplacian applied to the feature vector (Eq. (2)).
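The Chebyshev approximation can be illustrated with a small pure-Python routine. This is a minimal sketch under stated assumptions: it uses the standard recurrence $T_{k+1}(z) = 2zT_k(z) - T_{k-1}(z)$, an undirected toy graph, and the common convention of setting the largest eigenvalue to 2; the function name and coefficients are illustrative and are not taken from the original equations.

```python
def chebyshev_filter(laplacian, x, thetas, lambda_max=2.0):
    """Apply sum_k thetas[k] * T_k(L_scaled) to x, where
    L_scaled = (2 / lambda_max) * L - I and T_k is the k-th Chebyshev polynomial."""
    n = len(x)

    def scaled_matvec(v):
        # Multiply by the rescaled Laplacian without forming it explicitly.
        return [
            (2.0 / lambda_max) * sum(laplacian[i][j] * v[j] for j in range(n)) - v[i]
            for i in range(n)
        ]

    t_prev, t_curr = x, scaled_matvec(x)            # T_0 x and T_1 x
    out = [thetas[0] * a for a in t_prev]
    for k in range(1, len(thetas)):
        out = [o + thetas[k] * c for o, c in zip(out, t_curr)]
        t_next = [2.0 * s - p for s, p in zip(scaled_matvec(t_curr), t_prev)]
        t_prev, t_curr = t_curr, t_next
    return out

# Normalized Laplacian of a single undirected edge between two vertices;
# its eigenvalues are 0 and 2, so lambda_max = 2 is exact here.
laplacian = [[1.0, -1.0], [-1.0, 1.0]]
first_order = chebyshev_filter(laplacian, [1.0, 2.0], thetas=[0.5, 0.25])
print(first_order)  # [0.0, 0.75]
```

With two coefficients the filter reduces to a first-order expression with one term acting on the features themselves and one term acting on their neighborhood aggregate, which is the form that graph convolutions build on.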
Dual Message Passing Mechanism
This convolution can also be applied to the line graph, in which case the convolutional operation in the line graph is given by Eq. (3).
However, Eq. (3) results in a new problem: the computation cost is linear in where . To tackle this issue, we combine the two convolutions in an asynchronous manner in .
Proposition 8.
If is a directed graph with vertices and edges, then , where is the adjacency matrix, are the out-degree and in-degree diagonal matrices respectively, and is the oriented incidence matrix where if vertex is the destination of edge , if is the source of , otherwise. In particular, if is with reversed edges, then we have , where is the Laplacian matrix.
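A relation of this kind between the oriented incidence matrix, the degree matrices, and the adjacency matrix can be checked numerically. The sketch below verifies the standard identity $BB^\top = D_{out} + D_{in} - A - A^\top$ on a small directed multigraph; the sign convention (+1 for the destination, -1 for the source) and the exact form of the identity are assumptions made for illustration and may differ from the statement in the proposition.

```python
def oriented_incidence(n, edges):
    """n x m matrix with +1 where the vertex is the destination of the edge,
    -1 where it is the source, and 0 otherwise (assumed sign convention)."""
    b = [[0] * len(edges) for _ in range(n)]
    for e, (u, v) in enumerate(edges):
        b[u][e] -= 1
        b[v][e] += 1
    return b

def gram(b):
    """Return B B^T as a nested list."""
    return [[sum(p * q for p, q in zip(row_u, row_v)) for row_v in b] for row_u in b]

def degree_minus_adjacency(n, edges):
    """Return D_out + D_in - A - A^T, where A counts parallel edges."""
    m = [[0] * n for _ in range(n)]
    for u, v in edges:
        m[u][u] += 1   # contributes to the out-degree of u
        m[v][v] += 1   # contributes to the in-degree of v
        m[u][v] -= 1   # -A
        m[v][u] -= 1   # -A^T
    return m

edges = [(0, 1), (1, 2), (2, 0), (0, 1)]   # includes a parallel edge 0 -> 1
lhs = gram(oriented_incidence(3, edges))
rhs = degree_minus_adjacency(3, edges)
print(lhs == rhs)  # True
```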
Proposition 9.
If is a directed graph with vertices and edges and is the line graph of , then , where is the adjacency matrix of , is an identity matrix, and is the unoriented incidence matrix where if vertex is incident to edge , otherwise. In particular, if is with reversed edges, then is also with reversed edges and . Furthermore, we have , where is the Laplacian matrix of .
We use Proposition 8 to inspect the graph convolutions. The second term of Eq. (2) can be written as , and corresponds to the computation in the edge space. We can design a better filter to replace this subtraction operation so that , where is the result of some specific computation in the edge space, which is straightforward to involve Eq. (3). We are able to generalize Eq. (3) by the same idea, but it does not help to reduce the complexity. The second term of Eq. (3) is equivalent to obtained from Proposition 9. Moreover, corresponds to the computation . We can also enhance this computation by introducing , e.g., . We can get the degree matrix without constructing the line graph because it depends on the vertex degrees of : . We manually set .
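The remark that line-graph degrees can be obtained without constructing the line graph follows directly from the edge-to-vertex transform: for an edge from u to v, the in-degree of its line-graph vertex equals the in-degree of u, and its out-degree equals the out-degree of v. The sketch below checks this against an explicit construction; the function names are hypothetical, and which of these degrees enters the degree matrix used by the model is not specified here.

```python
from collections import Counter

def line_graph_degrees(edges):
    """For each directed edge (u, v), return the (in-degree, out-degree) of its
    line-graph vertex using only vertex degrees of the original graph."""
    in_deg = Counter(v for _, v in edges)
    out_deg = Counter(u for u, _ in edges)
    return [(in_deg[u], out_deg[v]) for u, v in edges]

def line_graph_degrees_explicit(edges):
    """Same quantities, obtained by explicitly building the line graph."""
    m = len(edges)
    line_edges = [(i, j) for i in range(m) for j in range(m) if edges[i][1] == edges[j][0]]
    return [
        (sum(1 for _, j in line_edges if j == k), sum(1 for i, _ in line_edges if i == k))
        for k in range(m)
    ]

edges = [(0, 1), (1, 2), (2, 0), (1, 0)]
print(line_graph_degrees(edges))  # [(2, 2), (1, 1), (1, 1), (1, 1)]
```

The first function needs only two passes over the edge list, whereas the explicit construction is quadratic in the number of edges.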
Finally, the asynchronous updates are defined as follows:
Heterogeneous Multi-graph Extensions
Different relational message passing variants have been proposed to model heterogeneous graphs.
Nevertheless, our dual message passing naturally handles complex edge types and even edge features.
Each edge not only carries the edge-level property, but also stores the local structural information in the corresponding line graph.
However, Eq. (5) does not reflect the edge direction since regards the source and the target of one edge as the same.
Therefore, we extend Eqs. (4-5) and propose dual message passing neural networks (DMPNNs) to support the mixture of various properties:
We evaluate DMPNNs on the challenging subgraph isomorphism counting and matching tasks.
Besides, we also learn embeddings and classify nodes without any label or attribute on heterogeneous graphs to verify the generalization and the necessity of explicit edge modeling.
Training and testing of DMPNNs and baselines were conducted on a single NVIDIA V100 GPU under the PyTorch paszke2019pytorch and DGL wang2019dgl frameworks.
Subgraph Isomorphism Counting and Matching
DMPNNs are designed based on the duality of isomorphisms, so evaluation on isomorphism related tasks is the most straightforward. Given a pair of pattern and graph , subgraph isomorphism counting aims to count all different subgraph isomorphisms in , and matching aims to seek out which nodes and edges belong to those isomorphic subgraphs. We report the root mean square error (RMSE) and the mean absolute error (MAE) between global counting predictions and the ground truth, and evaluate the graph edit distance (GED) between predicted subgraphs and all isomorphic subgraphs. However, computing GED is NP-hard, so we consider a lower bound of GED in continuous space. We use DMPNN and baselines to predict the frequency of each node or edge appearing in isomorphic subgraphs. For example, models are expected to return for nodes and for edges given the pair in Figure 0(a), and return for nodes and for edges given Figure 0(b). The MAE between node predictions and node frequencies, or the MAE between edge predictions and edge frequencies, is regarded as the lower bound of GED. We run experiments with three different seeds and report the best results.
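The evaluation quantities described above can be sketched in plain Python. This is a minimal illustration with hypothetical helper names; in particular, counting a node once for every subgraph isomorphism whose image contains it is an assumption about how the target frequencies are defined.

```python
import math

def rmse(predictions, targets):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets))

def mae(predictions, targets):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)

def node_frequencies(graph_vertices, isomorphisms):
    """Number of subgraph isomorphisms whose image contains each graph vertex.

    isomorphisms: list of dicts mapping pattern vertex -> graph vertex.
    """
    return [sum(1 for f in isomorphisms if v in f.values()) for v in graph_vertices]

# Three isomorphisms of a directed triangle into a graph with four vertices.
isomorphisms = [{0: 0, 1: 1, 2: 2}, {0: 1, 1: 2, 2: 0}, {0: 2, 1: 0, 2: 1}]
targets = node_frequencies([0, 1, 2, 3], isomorphisms)
print(targets)                                 # [3, 3, 3, 0]
print(mae([2.5, 3.0, 3.5, 0.0], targets))      # 0.25
```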
We compare with three sequence models and three graph models, including CNN kim2014convolutional, LSTM hochreiter1997long, TXL transformerxl2019dai, RGCN schlichtkrull2018modeling, RGIN liu2020neural, and CompGCN vashishth2020composition.
Sequence models embed edges, and we calculate the MAE over edges as the GED.
On the contrary, graph models embed nodes so that we consider the MAE over nodes.
We jointly train counting and matching prediction modules of DMPNN and other graph baselines:
Table 1 shows the statistics of two synthetic homogeneous datasets with 3-stars, triangles, tailed triangles, and chordal cycles as patterns chen2020can, one synthetic heterogeneous dataset with 75 random patterns (Complex), and one mutagenic compound dataset MUTAG with 24 patterns liu2020neural. The Complex dataset corresponds to the Small dataset in the original paper; however, we found that some ground truth counts were incorrect because VF2 does not check self-loops, so we removed all self-loops from patterns and graphs to obtain the correct ground truth. In traditional algorithms, adding reversed edges increases the search space dramatically, but it does not take much extra time for neural methods. Thus, we also conduct experiments on patterns and graphs with reversed edges associated with specific edge labels , which doubles the number of edges and the number of edge labels.
Counting and matching results are reported in Table 2. We find that graph models perform better than sequence models, and DMPNN surpasses almost all message passing based networks in counting and matching. RGIN extends RGCN with the sum aggregator followed by an MLP to make full use of the neighborhood information, and it improves the original RGCN significantly. CompGCN is designed to leverage vertex-edge composition operations to predict potential links, which is contrary to the goal of accurate matching. In contrast, DMPNN learns both node embeddings and edge embeddings in an aligned space from different but dual structures. We also observe that local relational pooling can significantly decrease errors on homogeneous data by explicitly permuting neighbor subsets. However, Deep-LRP is designed for patterns within three nodes and for simple graphs, so it inherently cannot handle multi-edges, let alone complex structures in randomly generated data and real-life data. One advantage of DMPNN is that it models heterogeneous nodes and edges in the same space. We can see the success of DMPNN-LRP on the three datasets with a maximum pattern size of 4, but it struggles on the Complex dataset, where patterns contain at most 8 nodes.
We also evaluate baselines with additional reversed edges on the Complex and MUTAG datasets. From the results in Table 3, we see that graph convolutions consistently reduce errors with reversed edges, but sequence models usually become worse. LRP is designed for simple graphs, so it inherently cannot handle heterogeneous edges, but DMPNN generalizes it. This observation also indicates that one of the challenges of neural subgraph isomorphism counting and matching is the complex local structure of graphs rather than the number of edges; otherwise, reversed edges would be harmful. We compare the efficiency in Appendix D.
In the joint learning, we hope models can learn the mutual supervision that node weights determine the global count and the global count is the upper bound of node weights. We also conduct experiments on single task learning to examine whether models can benefit from this mutual supervision. As shown in Table 4, graph models consistently achieve further performance gains from multi-task learning, while sequence models cannot. Moreover, improvement is more notable if the dataset is more complicated, e.g., patterns with more edges and graphs with non-trivial structures.
Unattributed Unsupervised Node Classification
Unattributed unsupervised node classification focuses on local structures instead of node features and attributes. Node embeddings are learned with the link prediction loss, then linear support vector machines are trained on 80% of the labeled node embeddings to predict the remaining 20%. We report the average Macro-F1 and Micro-F1 over five runs.
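The two reported metrics differ in how they average over classes, which matters when label frequencies are imbalanced. The following is a minimal sketch for the single-label case only (multi-label data would require per-label binary decisions); the function name and the toy labels are illustrative, and the classifier itself is not shown.

```python
def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class p, but it was wrong
            fn[t] += 1   # true class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro-F1 averages per-class F1; Micro-F1 pools the counts first.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro

macro_f1, micro_f1 = f1_scores([0, 0, 1, 1], [0, 1, 1, 1])
print(macro_f1, micro_f1)
```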
We follow the setting of RGCN and CompGCN: graph neural networks first learn the node representations, and then DistMult models embedding2015yang take pairs of node hidden representations to produce a score for a triplet, where are the source, the edge type, and the target, respectively. Eq. (9) is the objective function, where is the triplet collection of graph , is the score for , and is one of the negative triplets sampled from by replacing with or with uniformly:
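The scoring and negative sampling steps can be sketched as follows. This is a minimal illustration: DistMult scores a triplet by a three-way element-wise product summed over dimensions, and a negative triplet is drawn by replacing either the source or the target with a uniformly sampled node. The function names and the equal probability of corrupting either side are assumptions, and the objective function itself is not reproduced here.

```python
import random

def distmult_score(h_source, w_relation, h_target):
    """DistMult score: sum_i h_source[i] * w_relation[i] * h_target[i]."""
    return sum(s * r * t for s, r, t in zip(h_source, w_relation, h_target))

def corrupt(triplet, num_nodes, rng):
    """Negative sample: replace the source or the target (chosen with equal
    probability, an assumption) by a node drawn uniformly at random.
    The result may occasionally coincide with a true triplet."""
    s, r, t = triplet
    replacement = rng.randrange(num_nodes)
    return (replacement, r, t) if rng.random() < 0.5 else (s, r, replacement)

score = distmult_score([1.0, 2.0], [0.5, 1.0], [2.0, 3.0])
print(score)  # 7.0

negative = corrupt((0, 1, 2), num_nodes=5, rng=random.Random(0))
print(negative)
```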
Table 5: Dataset statistics. Columns: Dataset, #Label type, #Labeled node.
yang2020heterogeneous (yang2020heterogeneous) collected and processed two heterogeneous networks to evaluate graph embedding algorithms. PubMed is a biomedical network constructed by text mining and manual processing where nodes are labeled as one of eight types of diseases; Yelp is a business network where nodes may have multiple labels (businesses, users, locations, and reviews). Statistics are summarized in Table 5.
In Table 6, we observe low F1 scores on both datasets, which reflects the difficulty of this task. Traditional KG embedding methods perform very similarly, but graph neural networks vary dramatically. RGCN and RGIN adopt the same relational transformations, but RGIN surpasses RGCN because of sum aggregation and MLPs. HAN and MAGNN explicitly learn the node representations from meta-paths and meta-path neighbors, but these models are prone to overfitting the training data because they predict the connectivity with leaked edge type information. In contrast, CompGCN and HGT obtain better scores since CompGCN incorporates semantics by node-relation composition, and HGT captures semantic relations and injects edge dependencies by relation-specific matrices. Our DMPNN outperforms all baselines by asynchronously learning node embeddings and edge representations in the same aligned space. Even for the challenging 16-way multi-label classification, DMPNN works without any node attributes.
The isomorphism search aims to find all bijections between two graphs. The subgraph isomorphism search is more challenging, and it has been proven to be an NP-complete problem. Most subgraph isomorphism algorithms are based on backtracking or graph indexing ullmann1976an; he2008graphs. However, these algorithms are hard to apply to complex patterns and large data graphs. The search space of backtracking methods grows exponentially, and graph-index methods require a large amount of disk space for the index. Some methods introduce weak rules to reduce the search space in most cases, such as candidate region filtering, partial matching enumeration, and ordering carletti2018challenging. On the other hand, there are many approximate techniques for subgraph counting, such as path sampling jha2015path and color coding bressan2019bressan. But most approaches are hard to generalize to complex heterogeneous multi-graphs sun2020in.
In recent years, graph neural networks (GNNs) and message passing networks (MPNNs) have achieved success in graph data modeling. There are also some discussions about isomorphisms. xu2019how (xu2019how) and morris2019weisfeiler (morris2019weisfeiler) showed that neighborhood-aggregation schemes are at most as powerful as the Weisfeiler-Leman (1-WL) test. chen2020can (chen2020can) proved that $k$-WL cannot count all patterns more than nodes accurately, but the bound of iterations of $k$-WL grows quickly to . These conclusions encourage researchers to empower message passing and explore the possibilities of neural subgraph counting. Empirically, liu2020neural (liu2020neural) combined graph encoding and dynamic memory networks to count subgraph isomorphisms in an end-to-end way. They showed that the memory with linear-complexity read-write operations can significantly improve all encoding models. A more challenging problem is subgraph isomorphism matching. NeuralMatch ying2020neural utilizes neural methods and a voting method to detect subgraph matching. However, it only returns whether one pattern is included in the data graph instead of specific isomorphisms. Neural subgraph matching is still under discussion. Besides, graph learning has also been applied to maximum common subgraph detection bai2021glsearch, providing another possible solution for isomorphisms.
In this paper, we theoretically analyze the connection between the edge-to-vertex transform and the duality of isomorphisms in heterogeneous multi-graphs. We design dual message passing neural networks (DMPNNs) based on the equivalence of isomorphism searching over original graphs and line graphs. Experiments on subgraph isomorphism counting and matching as well as unsupervised node classification support our theoretical exposition and demonstrate effectiveness. We also see a large performance boost on small patterns by stacking dual message passing and local relational pooling. We defer a better integration to future work.
The authors of this paper were supported by the NSFC Fund (U20B2053) from the NSFC of China, the RIF (R6020-19 and R6021-20) and the GRF (16211520) from RGC of Hong Kong, the MHKJFS (MHP/001/19) from ITC of Hong Kong with special thanks to HKMAAC and CUSBLT, and the Jiangsu Province Science and Technology Collaboration Fund (BZ2021065). We thank Dr. Xin Jiang for his valuable comments and the Gift Fund from Huawei Noah’s Ark Lab.
Appendix A Proof of Proposition 4
Assume the line graph is transformed from by and the line graph is transformed from by , then
Moreover, based on the isomorphism , we get
, . We find the bijection mapping to for any , which is a bijection from to .
Similarly, from the two necessary conditions of isomorphism about with and with and the definition , we have
, . Therefore, we conclude that the two line graphs and are isomorphic, where the dual isomorphism satisfies , . We denote . ∎
Appendix B Proof of Theorem 6
Assume and are two connected directed heterogeneous multigraphs with reversed edges and their isomorphisms are , and are their line graphs with isomorphisms , Let , , , . To prove Theorem 6, we show .
The first step is to prove is equivalent to . The necessary conditions of are and ; the necessary conditions of are and . We know that