Graphs are ubiquitous due to their accurate depiction of relational data. Graph representation learning emerged to alleviate the sparsity and irregularity of graphs by projecting nodes to vector spaces while preserving graph properties. Since vector spaces are regular, graph representation learning serves as a versatile tool that accommodates numerous prediction tasks on graphs.
More recently, success in extending deep learning to graphs brought about Graph Neural Networks (GNNs), which have achieved impressive performance. Many GNNs follow a recursive scheme called neighborhood aggregation, where the representation vectors of nodes are computed by aggregating and transforming the features within their neighborhoods. In doing so, a computation tree is constructed and evaluated in a bottom-up manner. Being powerful yet efficient, GNNs based on neighborhood aggregation have attracted the attention of numerous research works [18, 8]. (In this paper we focus on GNNs based on neighborhood aggregation and leave other architectures for future work.)
In addition to the node-level features that GNNs aggregate, features of other scales also prevail in graphs. Among them, frequently recurring structural patterns of varying scales are typical and indicative of node and graph properties, such as functions in molecular networks, patterns of information flow (Granovetter granovetter1977strength, Paranjape paranjape2017motifs), and social phenomena. These are often global insights that node-level features fail to provide.
Yet, although GNNs do encode the neighborhoods of nodes, they are not guaranteed to generate distinctive results for nodes with different structural patterns. Specifically, distinctions between local structural patterns can be minuscule, one or two links for example, which makes it hard for GNNs to generate distinctive embeddings for structural patterns, even those with wildly different semantics.
We take triadic closures, patterns characteristic of strong ties in social networks, as an example. We show the computation tree of a triadic closure in a 2-layer GNN in Fig. 1. As can be seen, the only difference the triadic closure makes is the existence of first-order neighbors (green nodes) on the second layer of the tree, whose impact tends to diminish as their neighborhood (red nodes) expands. We thus conclude that GNNs based on neighborhood aggregation can, in some cases, fail to generate distinctive embeddings for structural patterns that are topologically similar but carry wildly different semantics.
As the counterpart of CNNs on images, GNNs would be expected to capture graph features of varying levels and scales, at both the node level and the local structural level. Hence, one question arises: how can we enable GNNs to more adequately capture and leverage multi-scale structural and node features? One straightforward way is to first measure the structural properties of each node and concatenate them with the node features as inputs. Yet, easy as it is, two challenges remain to be solved.
Incorporation of structural properties. Challenges also lie in the incorporation of such properties. On one hand, structural features convey rich semantics that shed light on graph properties, which cannot be captured by statistics only. On the other hand, structural properties may indicate roles of nodes in graphs and hence guide the aggregation of features [8, 19].
Consequently, to complement GNNs for better addressing these structural patterns, we propose Graph Neural Network with Local Structural Patterns, abbreviated GraLSP, a GNN framework incorporating local structural patterns into the aggregation of neighbors. Specifically, we capture local structural patterns via random anonymous walks, variants of random walks that are able to capture local structures in a general manner. The walks are then projected to vectors to preserve their underlying structural semantics. We design neighborhood aggregation schemes with multiple elaborate techniques to reflect the impact of structures on feature aggregation. In addition, we propose objectives to jointly optimize the vectors for walks and nodes based on their pairwise proximity. Extensive experiments show that due to our elaborate incorporation of structural patterns, our model outperforms competitive counterparts in various tasks.
To summarize, we make the following contributions.
We analyze the neighborhood aggregation scheme and conclude that common GNNs suffer from defects in identifying some common structural patterns.
We propose that random walks can be used to capture structural patterns with analyses on them.
We propose a novel neighborhood aggregation scheme that combines the structural and node properties through adaptive receptive radius, attention and amplification.
We carry out extensive experiments with their results showing that our model, incorporating structural patterns into GNNs, attains satisfactory performances.
2 Related Work
There are generally two types of graph representation learning (GRL) methods, as defined by different notions of node similarity. On one hand, methods like DeepWalk perozzi2014deepwalk and GraphSAGE hamilton2017inductive adopt the notion of homophily, i.e. similarity defined by close connections. On the other hand, methods like struc2vec ribeiro2017struc2vec and Graphwave donnat2018learning define similarity as possessing similar topological structures. It should be noted that although our method captures structural patterns, it, like most GNNs, falls into the former type, adopting the idea of homophily instead of structural similarity. We elaborate on the two notions of node similarity in the experiments.
Graph Neural Networks (GNNs). GNNs (Scarselli scarselli2008graph, Bruna bruna2013spectral, Niepert niepert2016learning, Kipf kipf2016semi) have gained tremendous popularity in recent years. Recent works generally adopt neighborhood aggregation, i.e. merging node features within neighborhoods to represent central nodes (Hamilton hamilton2017inductive).
Identifying the connection between GNNs and graph structures has also been popular. Previous works demonstrated the equivalence between GNNs and the 1-WL isomorphism test, and showed the connection between GNNs and Laplacian smoothing. Compared to these works, which focus more on global graph structures, e.g. isomorphism, our work focuses more on "local" structures.
Measuring Structural Patterns. Previous works on measuring structural patterns focus on characteristic structures including shortest paths and graphlets (Shervashidze shervashidze2009efficient, Borgwardt borgwardt2005shortest). In addition, Micali showed that it is possible to reconstruct a local neighborhood via anonymous random walks on graphs, a result surprising and inspiring to our model. Such notions of anonymous walks are extended by Ivanov ivanov2018anonymous, who proposed graph embedding methods based on them.
3 Model: GraLSP
In this section we introduce the design of our model, GraLSP, with a brief overview illustrated in Fig. 2.
We begin by introducing the background of our problem, including graph representation learning, random and anonymous walks, and graph neural networks.
Definition 1 (Graph & Graph Representation Learning).
Given a graph $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges, graph representation learning learns a mapping function
$$f: V \to \mathbb{R}^d$$
where $d \ll |V|$, with $f(v)$ maintaining properties of node $v$.
Specifically, GNNs characterize their mapping functions as iterative: a node's representation vector is computed by aggregating the features within its neighborhood.
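The iterative scheme can be illustrated with a minimal sketch. The code below is a simplification under stated assumptions, not the model's actual layer: it uses plain mean pooling, a fixed equal-weight combination in place of trainable weights, and no nonlinearity. The names `mean_aggregate_layer` and `run_gnn` are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def mean_aggregate_layer(
    adj: Dict[int, List[int]], h: Dict[int, Vector]
) -> Dict[int, Vector]:
    """One round of neighborhood aggregation without trainable parameters.

    Each node averages its neighbors' vectors, then combines the result
    with its own vector using fixed 0.5 / 0.5 weights (an assumption made
    only to keep the sketch short).
    """
    new_h: Dict[int, Vector] = {}
    for v, neighbors in adj.items():
        dim = len(h[v])
        if neighbors:
            agg = [sum(h[u][d] for u in neighbors) / len(neighbors) for d in range(dim)]
        else:
            agg = [0.0] * dim  # isolated node: nothing to aggregate
        new_h[v] = [0.5 * (h[v][d] + agg[d]) for d in range(dim)]
    return new_h


def run_gnn(
    adj: Dict[int, List[int]], features: Dict[int, Vector], num_layers: int = 2
) -> Dict[int, Vector]:
    """Stack several aggregation rounds; k rounds reach the k-hop neighborhood."""
    h = features
    for _ in range(num_layers):
        h = mean_aggregate_layer(adj, h)
    return h
```

Stacking two rounds corresponds to the 2-layer computation tree discussed in the introduction: information from second-order neighbors reaches the central node only through its first-order neighbors.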
Definition 2 (Random Anonymous Walks ).
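Given a random walk $w = (w_1, w_2, \ldots, w_l)$ where $\langle w_i, w_{i+1} \rangle \in E$, the anonymous walk for $w$ is defined as
$$aw(w) = (\gamma(w_1), \gamma(w_2), \ldots, \gamma(w_l))$$
where $\gamma(w_i)$ denotes the number of distinct nodes in $w$ when $w_i$ first appears in $w$, i.e.
$$\gamma(w_i) = |\{w_1, w_2, \ldots, w_p\}|, \quad p = \min \{ j : w_j = w_i \}$$
We denote anonymous walks of length $l$ as $\omega^l_1, \omega^l_2, \ldots$ according to their lexicographical order.
The key difference between anonymous walks and random walks is that anonymous walks depict the underlying "patterns" of random walks, regardless of the exact nodes visited. For example, both $w = (v_1, v_2, v_3, v_1)$ and $w' = (v_2, v_4, v_5, v_2)$ correspond to the same anonymous walk $(1, 2, 3, 1)$, even though $w$ and $w'$ visited different nodes.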
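The mapping from a random walk to its anonymous walk can be sketched in a few lines. The function name `anonymize` is hypothetical, and numbering first appearances from 1 is an assumption that follows the verbal definition (the count of distinct nodes seen so far); some presentations start from 0 instead.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def anonymize(walk: Sequence[Hashable]) -> Tuple[int, ...]:
    """Replace each node by the number of distinct nodes seen when it first appeared."""
    first_seen: Dict[Hashable, int] = {}
    pattern: List[int] = []
    for node in walk:
        if node not in first_seen:
            # The node is new: it becomes the (k+1)-th distinct node of the walk.
            first_seen[node] = len(first_seen) + 1
        pattern.append(first_seen[node])
    return tuple(pattern)


# Two walks over different nodes that share one underlying pattern.
walk_a = ["a", "b", "c", "a"]
walk_b = ["x", "y", "z", "x"]
same_pattern = anonymize(walk_a) == anonymize(walk_b)
```

Because node identities are discarded, the output depends only on the order of first visits and revisits, which is what lets anonymous walks describe structure independently of the specific nodes involved.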
3.2 Extracting Structural Patterns
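We start by introducing our extraction of structural patterns through anonymous walks. For each node $v$, a set $\mathcal{W}_v$ of random walk sequences of length $l$ is sampled. Alias sampling is used to keep the sampling efficient. We then compute the empirical distribution of their underlying anonymous walks as
$$\hat{p}(\omega^l_i \mid v) = \frac{1}{|\mathcal{W}_v|} \sum_{w \in \mathcal{W}_v} \mathbb{I}\big(aw(w) = \omega^l_i\big)$$
where $aw(w)$ is the anonymous walk of $w$ and $\omega^l_i$ is the $i$-th anonymous walk of length $l$. In addition, we take the mean empirical distribution over the whole graph $G = (V, E)$ as
$$\hat{p}(\omega^l_i \mid G) = \frac{1}{|V|} \sum_{v \in V} \hat{p}(\omega^l_i \mid v)$$
as estimates of the true distributions $p(\omega^l_i \mid v)$ and $p(\omega^l_i \mid G)$.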
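A minimal sketch of this estimation step follows. It is written under assumptions: the graph is unweighted and given as an adjacency dictionary, neighbors are drawn uniformly with `random.Random.choice` rather than with alias sampling, and all function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Tuple

Graph = Dict[Hashable, List[Hashable]]


def anonymize(walk: List[Hashable]) -> Tuple[int, ...]:
    """Map a walk to its anonymous pattern (first appearances numbered from 1)."""
    first_seen: Dict[Hashable, int] = {}
    for node in walk:
        first_seen.setdefault(node, len(first_seen) + 1)
    return tuple(first_seen[node] for node in walk)


def sample_walk(adj: Graph, start: Hashable, steps: int, rng: random.Random) -> List[Hashable]:
    """Uniform random walk with `steps` transitions (stops early at a dead end)."""
    walk = [start]
    for _ in range(steps):
        neighbors = adj[walk[-1]]
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk


def node_distribution(adj: Graph, start: Hashable, num_walks: int, steps: int, seed: int = 0):
    """Empirical distribution of anonymous walks starting from one node."""
    rng = random.Random(seed)
    counts = Counter(
        anonymize(sample_walk(adj, start, steps, rng)) for _ in range(num_walks)
    )
    return {pattern: c / num_walks for pattern, c in counts.items()}


def graph_distribution(adj: Graph, num_walks: int, steps: int, seed: int = 0):
    """Mean of the per-node empirical distributions over all nodes."""
    total: Counter = Counter()
    for i, v in enumerate(adj):
        for pattern, p in node_distribution(adj, v, num_walks, steps, seed + i).items():
            total[pattern] += p
    return {pattern: p / len(adj) for pattern, p in total.items()}
```

For weighted graphs, the uniform `rng.choice` call would be the place to substitute an alias table so that each step stays constant-time.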
Rationale of Anonymous Walks
There are works exploring properties of anonymous walks. Micali showed that one can reconstruct a local sub-graph using anonymous walks. We present the theorem here.
Theorem 1. Let $B(v, r)$ be the subgraph induced by all nodes $u$ such that $\mathrm{dist}(v, u) \le r$, and let $P_L$ be the distribution of anonymous walks of length $L$ starting from $v$. One can reconstruct $B(v, r)$ using $(P_1, \ldots, P_L)$, where $L = 2(m + 1)$ and $m$ is the number of edges in $B(v, r)$.
This theorem underscores the ability of anonymous walks to capture structures in a highly general manner, in that they capture the complete $r$-hop neighborhood. (Although we do not explicitly reconstruct the neighborhood, the theorem demonstrates the ability of anonymous walks to represent structural properties.) Yet the theorem is impractical for representing structural patterns in GNNs. For example, on the Cora dataset the required walk length is impossible to deal with, since the number of anonymous walks grows exponentially with the walk length. Instead, we propose an alternative that is more suitable for our task.
One can reconstruct with anonymous walk of length where is the number of edges in an ego-network of , if one can de-anonymize the first elements in each anonymous walk starting from .
Given that the graph follows power-law degree distribution, the expected number of edges in an ego-network of would be
where denotes average degree, maximum degree, clustering coefficient of .
These corollaries show the rationale of using reasonably long anonymous walks to depict general local structural patterns. Specifically, for citation graphs including Cora, Citeseer and AMiner, Eqn. 2 evaluates to about . We omit the detailed proofs due to space constraints.
In addition, we provide intuitive explanations of anonymous walks which we find appealing. Intuitively, an anonymous walk with distinct nodes induces a graph with and or . In this sense, a single anonymous walk is a partial reconstruction of the underlying graph, which is able to indicate certain structures, such as triads. We show the intuition with walks on a triadic closure as an example in Fig. 3.
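The intuition that a single anonymous walk partially reconstructs the underlying graph can be sketched as follows. The helper names `induced_edges` and `has_triangle` are hypothetical; the sketch treats consecutive entries of the walk as undirected edges and checks whether they close a triad.

```python
from itertools import combinations
from typing import FrozenSet, Sequence, Set


def induced_edges(anon_walk: Sequence[int]) -> Set[FrozenSet[int]]:
    """Undirected edges implied by consecutive positions of an anonymous walk."""
    edges: Set[FrozenSet[int]] = set()
    for a, b in zip(anon_walk, anon_walk[1:]):
        if a != b:  # skip self-loops, which carry no edge between distinct nodes
            edges.add(frozenset((a, b)))
    return edges


def has_triangle(anon_walk: Sequence[int]) -> bool:
    """True if the edges induced by the walk contain a closed triad."""
    edges = induced_edges(anon_walk)
    nodes = sorted(set(anon_walk))
    return any(
        frozenset((a, b)) in edges
        and frozenset((b, c)) in edges
        and frozenset((a, c)) in edges
        for a, b, c in combinations(nodes, 3)
    )


closed = has_triangle((1, 2, 3, 1))  # the walk returns to its start across a third edge
opened = has_triangle((1, 2, 3, 2))  # the walk only backtracks along an existing edge
```

A walk that never revisits a node induces a simple path, while each revisit across a new edge adds a cycle, which is why short walks are already enough to reveal triads.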
3.3 Aggregation of Structural Patterns
In this section we introduce our incorporation of structural patterns into the representation of nodes.
Representing Anonymous Walks
Denoting anonymous walks as statistics is insufficient, as walks represent structural patterns with varying similarities to each other. For example, we would intuitively believe that two anonymous walks that both indicate an underlying triad are highly similar to each other, but dissimilar to a walk in which no triad is indicated.
Consequently, as we would like to capture the properties of varying walk sequences, we represent each anonymous walk as a vector through an embedding table lookup
to capture the properties of varying walks and structures.
In this part we introduce how we aggregate structures along with node-level features. Specifically, we focus on how to aggregate node features under the impact of their local structural patterns, instead of plainly aggregating them together using concatenation.
Intuitively, we consider structures to have the following impacts on the aggregation of information on graphs:
Defining Receptive Paths. Random walks can be seen as receptive paths, showing how information flows over the graph. Hence, we would like to define flexible receptive paths, or "neighbors" of a node according to its random walks, instead of fixed 1-hop neighbors.
Importance of Neighbors. Generally neighbors do not exert impact uniformly but exhibit varying strength. It has been studied that structures including cliques or dense clusters generally indicate strong social impact, which should be captured by our model.
Selective Gathering of Information. Structural patterns may also characterize selection towards the information to gather. For example, enzymes in biological networks share distinctive structures such that selective catalysis towards biological reactions is enabled.
To address the above impacts, we design our aggregation formula as follows.
where denotes a walk starting from , denotes the -th node of walk .
is the ReLU activation andis the mean pooling. In addition, denotes element-wise multiplication, while , denotes receptive radius of , attention and amplification coefficients, respectively, which we will introduce in detail later that correspond to the above impacts. Moreover, denote trainable weight matrices.
Adaptive Receptive Radius
While each random walk can be seen as a receptive path, properties of the walk imply different radii of reception. For example, if a walk visits many distinct nodes, it may span to nodes far away, which may not exert impact on the central node. On the other hand, a walk visiting few distinct nodes indicates an underlying cluster whose nodes are all close to the central node. Hence, we propose the adaptive receptive radius for neighborhood sampling to address this. Specifically, the receptive radius, or "window size", of a walk is negatively correlated to its span, i.e.
where denotes the number of distinct nodes visited by walk . We build the neighborhood of such that for each , only nodes within the radius are included, which forms an adaptive neighborhood of node .
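The idea of an adaptive receptive radius can be sketched as below. The exact formula is not recoverable from the text here, so the rule used in `receptive_radius` (walk length times a constant, integer-divided by the number of distinct nodes, with a floor of 1) is an assumption chosen only because it decreases as the walk spans more distinct nodes. Both function names and the `scale` parameter are hypothetical.

```python
from typing import Hashable, List, Sequence


def receptive_radius(walk: Sequence[Hashable], scale: int = 2) -> int:
    """Window size that shrinks as the walk visits more distinct nodes.

    Assumed rule: floor(scale * steps / distinct_nodes), at least 1.
    """
    steps = len(walk) - 1  # number of transitions in the walk
    distinct = len(set(walk))
    return max(1, (scale * steps) // distinct)


def adaptive_neighborhood(walks: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Collect, for each walk from a central node, the nodes within its radius.

    The result is a multiset (a list with repeats); the central node itself
    may reappear if a walk returns to it within the window.
    """
    neighbors: List[Hashable] = []
    for walk in walks:
        r = receptive_radius(walk)
        neighbors.extend(walk[1 : r + 1])  # skip position 0, the central node
    return neighbors
```

Under this assumed rule, a walk bouncing inside a small cluster keeps its whole length as the window, while a walk that never revisits a node contributes only its first step.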
We introduce an amplification module for channel-wise amplification, or "gating", to model the selective aggregation of node features in the neighborhood. Formally, we model the amplification coefficients similarly as:
is the sigmoid function to control the scale of amplification, and, are trainable parameters.
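A channel-wise gate of this kind can be sketched as follows. The parameterization is an assumption: the gate is taken to be a sigmoid of an affine transform of the walk vector, with hypothetical names `weight` and `bias` for the two trainable parameters, and it is applied to a node feature vector by element-wise multiplication.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def amplification(
    walk_vec: Sequence[float],
    weight: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Gate vector in (0, 1): one coefficient per feature channel."""
    return [
        sigmoid(sum(w * u for w, u in zip(row, walk_vec)) + b)
        for row, b in zip(weight, bias)
    ]


def gate_features(features: Sequence[float], gate: Sequence[float]) -> List[float]:
    """Element-wise product of node features with the gate."""
    return [g * x for g, x in zip(gate, features)]


# With all-zero parameters every channel is scaled by sigmoid(0) = 0.5.
gate = amplification([0.3, -0.7], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
gated = gate_features([2.0, 4.0], gate)
```

Because the gate depends on the walk vector, two walks with different structural patterns can emphasize different feature channels of the same neighbor.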
3.4 Model Learning
In this section we introduce the objectives guiding the learning of our model. Specifically, we design a multi-task objective function to simultaneously preserve proximity between both pairwise nodes and pairwise walks.
Preserving Proximity of Walks
Intuitively, if two anonymous walks both appear frequently within the same neighborhood, they are supposed to depict similar structural information, namely that of the same neighborhood, and vice versa. Hence, we design our walk proximity objective as follows,
such that highly co-appearing walks are mapped with similar vectors. By constraining walk vectors in this way, we are endowing walk vectors with semantics which can interpret similarities, such that our operations of incorporating walk vectors into aggregation are sound.
Preserving Proximity of Nodes
An objective preserving node proximity is required so as to preserve node properties. We adopt the unsupervised objective of DeepWalk perozzi2014deepwalk, although other objectives are not ruled out.
We combine the above two objectives together by summing them up
to obtain a multi-task objective preserving both proximity between pairwise walks and nodes. We adopt the Adam Optimizer to optimize the objective using TensorFlow.
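The structure of such a multi-task objective can be sketched as below. Two points are assumptions rather than the model's stated form: the node term is written as a skip-gram loss with negative sampling, and the two terms are combined with a hypothetical weight `mu`. The walk proximity loss is passed in as a plain number, since its exact form is not reproduced in this sketch.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def skipgram_ns_loss(
    center: Sequence[float],
    context: Sequence[float],
    negatives: Sequence[Sequence[float]],
) -> float:
    """Pull a co-occurring pair together and push sampled negatives apart."""
    loss = -log_sigmoid(dot(center, context))
    for neg in negatives:
        loss -= log_sigmoid(-dot(center, neg))
    return loss


def multitask_loss(node_loss: float, walk_loss: float, mu: float = 1.0) -> float:
    """Weighted sum of the node proximity and walk proximity objectives."""
    return node_loss + mu * walk_loss
```

Setting `mu` to zero recovers training on the node objective alone, which gives a natural baseline for judging whether the walk objective helps.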
4 Experiments
In this section we introduce our experimental evaluations of our model GraLSP.
4.1 Experimental Setup
We use the following datasets for the experiments. Unless otherwise specified, we take nodes as papers, edges as citations, labels as research fields and word vectors as features.
Cora and Citeseer are citation datasets used in GCN (Kipf kipf2016semi). We reduce the feature dimensions from about 2000 to 300 and 500 through PCA, respectively.
AMiner is used in DANE . We reduce the feature dimensions from over 10000 to 1000.
We summarize the statistics of the datasets in Table 1.
Table 1: Statistics of the datasets (columns: Dataset, Feature Dims, # Labels).
We take the following novel approaches in representation learning as baselines.
Skip-gram models, including DeepWalk perozzi2014deepwalk and LINE tang2015line, which optimize proximity between nodes.
Structure models, focusing on topological similarity instead of connections, including struc2vec and Graphwave.
GNNs, including GraphSAGE, GCN and GAT. We use unsupervised GraphSAGE with mean aggregator and semi-supervised GCN, GAT with 6% labeled nodes.
As for parameter settings, we take 32-dimensional embeddings for all methods, and adopt the Adam optimizer with learning rate 0.005. For GNNs, we take 2-layer networks with a hidden layer of size 100. For models involving skip-gram optimization, including DeepWalk, GraphSAGE and GraLSP, we take , , window size as 5 and the number of negative samples as 8. For models involving neighborhood sampling, we take the number of sampled neighbors as 20. In addition, we take , and for GraLSP, and keep the other parameters of the baselines at their defaults.
4.2 Visualization as a Proof-of-Concept
We first carry out visualization on an artificial dataset as a proof-of-concept, to test GNNs' ability to identify local structural patterns. We build the graph from a circle of nodes, where each node is surrounded by either two open or two closed triads, alternating along the circle. In addition, for each triad, there are 4 additional nodes linked to it. The nodes on the circle thus possess two distinct structural properties: those surrounded by closed triads and those surrounded by open ones. We show the illustration of the graph and its building blocks in Fig. 4.
We visualize the representations from GraphSAGE and GraLSP in Fig. 4. As shown, GraLSP generates a clearer boundary between the two types of nodes, while GraphSAGE fails to draw as distinctive a boundary. This not only demonstrates the inability of current GNNs to generate distinctive embeddings for different local structural patterns, but also underscores the ability of anonymous walks and GraLSP to remedy this drawback.
4.3 Node Classification
We carry out node classification on the four datasets. We learn representation vectors using the whole graph, which are then fed into Logistic Regression in Sklearn. We take 20% of all nodes as the test set and 80% as the training set. We take the macro and micro F1-scores for evaluation. In addition, all results are averaged over 10 independent runs.
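The two evaluation metrics can be made concrete with a small sketch. It covers only the scoring step, given true and predicted labels from any classifier; the function name `f1_scores` is hypothetical, and classes with no true or predicted instances are assigned an F1 of 0 by assumption.

```python
from typing import Hashable, List, Sequence, Tuple


def f1_scores(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (macro-F1, micro-F1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class: List[float] = []
    tp_sum = fp_sum = fn_sum = 0
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    # Macro-F1 averages per-class scores, so rare classes weigh as much as common ones.
    macro = sum(per_class) / len(per_class)
    # Micro-F1 pools the counts over classes before computing a single score.
    micro_denom = 2 * tp_sum + fp_sum + fn_sum
    micro = 2 * tp_sum / micro_denom if micro_denom else 0.0
    return macro, micro


macro, micro = f1_scores([0, 0, 1, 1], [0, 1, 1, 1])
```

Reporting both is informative on datasets with imbalanced labels, where a model can score well on micro-F1 while neglecting small classes.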
The results are shown in Table 2. As shown, the performance gain from original GNNs towards GraLSP is considerable, which demonstrates GraLSP is able to complement the drawbacks of identifying local structures. In addition, struc2vec and Graphwave perform poorly on academic datasets, but impressively on US-Airport, which can be attributed to the label definitions. In academic datasets, labels are defined as fields, where connected papers tend to have the same field and label, while in US-Airport, labels are taken as activity levels with less significant homophily but more related to structural properties. Nonetheless, we can see that generally GraLSP produces satisfactory results.
4.4 Link Prediction
We then carry out link prediction under the same settings. We generate the test set by sampling 10% of the edges as positive edges, which are removed during training, together with an identical number of random negative edges. For an edge, we take the inner product of the representation vectors of its two endpoints as the score for ranking. We take AUC and recall at 50% (equal to the number of positive edges) as metrics.
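The scoring and ranking procedure can be sketched as follows. All function names are hypothetical, and the tie handling is an assumption: ties count as half a win in `auc`, and in `recall_at_k` tied scores are ordered with positives first because of the stable sort.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]


def edge_scores(emb: Dict[Hashable, Sequence[float]], edges: Sequence[Edge]) -> List[float]:
    """Inner product of the two endpoint vectors for each edge."""
    return [sum(a * b for a, b in zip(emb[u], emb[v])) for u, v in edges]


def auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Probability that a positive edge outscores a negative one (ties count half)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def recall_at_k(pos: Sequence[float], neg: Sequence[float], k: int) -> float:
    """Fraction of positive edges found among the k highest-scoring candidates."""
    ranked = sorted(
        [(s, 1) for s in pos] + [(s, 0) for s in neg], key=lambda item: -item[0]
    )
    hits = sum(label for _, label in ranked[:k])
    return hits / len(pos)


emb = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
pos_scores = edge_scores(emb, [(0, 1)])
neg_scores = edge_scores(emb, [(0, 2)])
```

With equal numbers of positive and negative edges, choosing `k = len(pos_scores)` corresponds to recall at 50% of the candidate list.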
The results are shown in Table 3. It can be seen that our model achieves gains compared to GCN and GraphSAGE, which should not be surprising given that structural patterns shed light on possible edges and are better captured by our model. Again, it is not surprising that struc2vec and Graphwave fail to generate satisfactory performances, in that they assign similar representations to structurally similar nodes instead of connected nodes. As for the US-Airport dataset, it is likely that local proximity is sufficient to reconstruct the graph, as shown by the fact that all baselines except LINE fail to perform well.
4.5 Model Analysis
We carry out tests on our model itself, including parameter analysis and scalability. Unless specified, we use node classification on Cora to reflect the performance of the model. All parameters are fixed as mentioned except those tested.
Number of Walks Sampled
We run GraLSP with varying numbers of walks sampled per node, and report the performances in Fig. 4(a). It can be seen that the more walks sampled per node, the better the performance. Empirically, as increasing the number of walks from 100 to 200 yields no significant gain, we conclude that 100 walks per node is reasonable in practice.
Length of Anonymous Walks
As longer walks are considered, more complex structural patterns are incorporated. We take and show the performances in Fig. 4(b). As shown, performance improves along with , with decreasing marginal gains. As the number of anonymous walks grows exponentially with , we conclude that would be sufficient in balancing efficiency and performance.
Weight of Objective Functions
We analyze the weight of losses , which determines the trade-off between the multi-task objective. We take , and plot the performances in Fig. 4(c). It can be observed that starting from , using only the objective in Eqn. 13, the performance peaks at , before taking a plunge afterwards. We hence conclude that, the incorporation of our multi-task objective does enhance the performance of the model.
Study of Aggregation Scheme
We analyze our aggregation scheme to verify that it enhances the aggregation of features. We compare our model with ordinary GraphSAGE, along with another GraphSAGE whose node features are concatenated with the distribution of anonymous walks, which has proved to be a valid measure of structures (Ivanov ivanov2018anonymous). We refer to this variant as GraphSAGE-concat.
We show the results on Cora and Citeseer in Fig. 4(d). As shown, with node features concatenated with structural features, the GraphSAGE-concat did not even outperform GraphSAGE, which demonstrates that simply combining them would compromise both. By comparison, our model with adaptive receptive radius, attention and amplification outperforms both GraphSAGE and GraphSAGE-concat.
We finally analyze the scalability of our model. We run our model on Erdős-Rényi random graphs of varying sizes. We measure the time needed for preprocessing (i.e. sampling random walks) and for training to converge, where convergence is defined as the loss not decreasing for 10 consecutive iterations.
We plot the time needed with respect to the number of nodes in log-log scale in Fig. 4(e). As can be seen, both the preprocessing and the training time are bounded by an complexity, which endorses the scalability of our model.
4.6 Visualization on Real World Datasets
We finally carry out visualization on real world datasets to qualitatively evaluate our model. We learn the representation vectors on Cora, which are then reduced to 2-dimensional vectors using PCA. We select three representative models: DeepWalk (skip-gram), GraphSAGE (GNNs) and struc2vec (structure models), along with our model to compare.
The plots are shown in Fig. 6, where yellow, green, blue and red dots correspond to 4 labels within Cora. As shown, struc2vec (Fig. 5(b)) illustrates no clusters as connected nodes do not share similar representations. In addition, while DeepWalk (Fig. 5(a)), GraphSAGE (Fig. 5(c)) and GraLSP (Fig. 5(d)) all illustrate clustering among nodes with the same label, GraLSP generates clearer boundaries than DeepWalk and GraphSAGE (between blue and red dots).
5 Conclusion
In this paper we present a GNN framework incorporating local structural patterns into current GNNs, called GraLSP. We start by analyzing the drawbacks of current GNNs in identifying certain structural patterns, such as triadic closures. We then show that anonymous walks are effective alternatives for measuring local structural patterns. We represent them with vectors and incorporate them into neighborhood aggregation with multiple modules. In addition, we present a multi-task objective function preserving proximity between both pairwise nodes and walks to preserve the semantics underlying certain structures. By adequately taking local structural patterns into account, our method outperforms various competitive baselines.
For future work, we plan to extend this paper to GNNs with more sophisticated architectures, such as RNNs. In addition, interpretations of structures in GNNs will improve our insight into various network phenomena.
-  (2018) PME: projected metric embedding on heterogeneous networks for link prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1177–1186. Cited by: §2.
-  (2018) A survey on network embedding. IEEE Transactions on Knowledge and Data Engineering. Cited by: §1.
-  (1977) The strength of weak ties. In Social networks, pp. 347–367. Cited by: 2nd item.
-  (2015) Triadic closure pattern analysis and prediction in social networks. IEEE Transactions on Knowledge and Data Engineering 27 (12), pp. 3374–3389. Cited by: §1, §4.4.
-  (2018) Anonymous walk embeddings. In International Conference on Machine Learning, pp. 2191–2200. Cited by: §2, §3.2.
-  (2013) Temporal motifs reveal homophily, gender-specific patterns, and group talk in call sequences. Proceedings of the National Academy of Sciences 110 (45), pp. 18070–18075. Cited by: §1.
-  (2018) Deeper insights into graph convolutional networks for semi-supervised learning. In Thirty-Second AAAI Conference on Artificial Intelligence, Cited by: §2.
-  (2019) Geniepath: graph neural networks with adaptive receptive paths. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 4424–4431. Cited by: 2nd item, §1, 1st item.
-  (2019) Hierarchical community structure preserving network embedding: a subspace approach. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 409–418. Cited by: §2.
-  (2016) Reconstructing markov processes from independent and anonymous experiments. Discrete Applied Mathematics 200, pp. 108–122. Cited by: §2, §3.2, Definition 2, Theorem 1.
-  (2019) Weisfeiler and leman go neural: higher-order graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 4602–4609. Cited by: §2.
-  (2000) A heuristic graph comparison algorithm and its application to detect functionally related enzyme clusters. Nucleic Acids Research 28 (20), pp. 4021–4028. Cited by: 3rd item.
-  (2014) Deepwalk: online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 701–710. Cited by: §3.4.
-  (2007) Biological network comparison using graphlet degree distribution. Bioinformatics 23 (2), pp. e177–e183. Cited by: §1.
-  (2009) Efficient graphlet kernels for large graph comparison. In Artificial Intelligence and Statistics, pp. 488–495. Cited by: 1st item, §3.2.
-  (2017) Graph attention networks. arXiv preprint arXiv:1710.10903. Cited by: §3.3.
-  (2017) Community preserving network embedding. In Thirty-First AAAI Conference on Artificial Intelligence, Cited by: §2.
-  (2018) How powerful are graph neural networks?. arXiv preprint arXiv:1810.00826. Cited by: §1, §1, §2, §3.1, footnote 1.
-  (2018) Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 974–983. Cited by: 2nd item.
-  (2019-07) DANE: domain adaptive network embedding. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pp. 4362–4368. External Links: Cited by: 2nd item.
-  (2018) Graph neural networks: a review of methods and applications. arXiv preprint arXiv:1812.08434. Cited by: §1.