Modeling Attention Flow on Graphs

11/01/2018 · Xiaoran Xu, et al. · Hulu, Peking University

Real-world scenarios demand reasoning about process, more than final outcome prediction, to discover latent causal chains and better understand complex systems. This requires learning algorithms to offer both accurate predictions and clear interpretations. We design a set of trajectory reasoning tasks on graphs where only the source and the destination are observed. We present the attention flow mechanism to explicitly model the reasoning process, leveraging relational inductive biases by basing our models on graph networks. We study how attention flow can effectively act on the underlying information flow implemented by message passing. Experiments demonstrate that an attention flow driven by and interacting with graph networks provides higher prediction accuracy and better interpretability for trajectory reasoning.


1 Introduction

Many practical applications need to infer latent causal chains or to construct interpretations for observations or predicted results. For example, in a physical world, we want to reason about the trajectories of moving objects given very few observed frames; in a video streaming system, we wish recommendation models to track evolving user interests and provide personalized recommendation reasons linking users' watched videos to the recommended ones. Here, we focus on graph-based scenarios and aim to infer latent chains that might cause observed results described by nodes and edges in a given graph.

Graph networks [1, 2, 3, 4, 5] are a family of neural networks that operate on graphs and carry strong relational inductive biases. Graph networks are believed to have a powerful fitting capacity for graph-structured data. However, their black-box nature makes them less competitive than differentiable logic-based reasoning [6, 7, 8] when the reasoning process must be modeled with interpretations provided. In this work, we develop a new attention mechanism on graphs, called attention flow, to model the reasoning process and predict the final outcome with interpretability. We use the message passing algorithm in graph networks to derive a transition matrix, evolving with time steps, that drives the attention flow. We also let the attention flow act back on the passed messages, which we call the information flow. To evaluate the models, we design a set of trajectory reasoning tasks where only the source and destination ends of trajectories are observed.

Our contributions are two-fold. First, our attention flow mechanism, built on graph networks, introduces a new way to construct interpretations and increase transparency when applying graph networks. Second, we show how attention flow can effectively intervene back in the message passing conducted by graph networks, analogous to reinforcement learning, where actions taken by agents affect the state of the environment. Experiments demonstrate that graph network models with the explicit, backward-acting attention flow compare favorably in both prediction accuracy and interpretability against those without it.

Related Works. Graph networks [3, 4, 5], dating back a decade [1, 2], are thought to support relational reasoning and combinatorial generalization over graph-structured representations. Recently, this area has grown rapidly and many versions of graph networks have been proposed, including Gated Graph Neural Networks [9], Interaction Networks [10], Relation Networks [11], Message Passing Neural Networks [12], Graph Attention Networks [13], Non-Local Neural Networks [14], and the graph convolutional network family [15], spectral [16, 17, 18, 19] or non-spectral [20, 21, 22, 23, 24]. From a unified perspective, [12] introduces the message passing mechanism to generalize computation frameworks on graphs, and [3] uses the term graph networks to generalize and extend several lines of work in this area. While graph networks give reasoning over graphs more fitting capacity, we look back at old-fashioned logic- and rule-based reasoning to seek interpretability. Recent probabilistic logic programming, such as TensorLog [6, 7] and NeuralLP [8], develops differentiable reasoning over a knowledge graph, learning soft logic rules in an end-to-end style; the process resembles a rooted random walk computing conditional probabilities along paths. Reasoning over paths or graphs has also been studied with reinforcement learning to handle the discrete actions of choosing nodes or edges, as in MINERVA [25], Structure2Vec Deep Q-learning [26], and Neural Combinatorial Optimization [27]. Attention mechanisms, originating in sequence-based tasks [28] and developed in [29] as self-attention, have recently been brought to graphs by attending over each node's neighbors [13, 30] or non-local areas [14]. Here, we present the attention flow mechanism not only for computation but also for interpretation.

2 Tasks

Real-world scenarios often demand reasoning about process, that is, constructing interpretations by listing a series of causal connections linking an opening to an outcome. We need a simulation system to generate a trajectory of events whose dynamics are governed by latent factors, such as the trajectory of a moving object controlled by an external force. Instead of full observation, we only allow the events at the source and the destination to be observed, treating the task as a trajectory reasoning problem.

We build a corrupted grid world with a small fraction of nodes or edges randomly removed. Each node has at most 8 types of directed edges connecting it to its neighbors, such as east (E) and northeast (NE). Picking an arbitrary node as the source, we draw a sequence of consecutive nodes to construct a trajectory and take the final node as the destination. Each node on the trajectory except the source is chosen from the neighborhood of the previous node by drawing one of the 8 edge types from the distribution below, driven by a latent direction function that varies with time and location:

(1)

The trajectory terminates when a non-existent edge is chosen or the maximal number of steps is reached. To be specific, we generate four types of trajectories governed by the following latent direction functions (a generation sketch is given after the list):

  • a straight line with a constant slope;
  • a sine curve, with directions varying with time;
  • directions varying with the current location;
  • directions depending on the location history.
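The sketch below illustrates how such trajectories can be generated. Since Eq. (1) is not reproduced here, it assumes a Gaussian-like weighting of the 8 edge directions around the latent direction; the names sample_trajectory, grid_edges and latent_dir are illustrative placeholders rather than the paper's exact implementation.

```python
import numpy as np

EDGE_ANGLES = np.arange(8) * np.pi / 4          # E, NE, N, NW, W, SW, S, SE

def sample_trajectory(grid_edges, source, latent_dir, sigma=0.2, max_steps=16, rng=None):
    """grid_edges[node] -> {edge_type: neighbor} for edges that survived corruption."""
    rng = np.random.default_rng() if rng is None else rng
    node, traj = source, [source]
    for t in range(max_steps):
        d = latent_dir(t, node)                              # latent direction in radians
        diff = np.angle(np.exp(1j * (EDGE_ANGLES - d)))      # wrapped angle differences
        probs = np.exp(-0.5 * (diff / sigma) ** 2)
        probs /= probs.sum()
        edge_type = int(rng.choice(8, p=probs))
        if edge_type not in grid_edges[node]:                # blocked by a removed node/edge
            break
        node = grid_edges[node][edge_type]
        traj.append(node)
    return traj                                              # only (traj[0], traj[-1]) is observed
```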

Instead of learning a latent model to solve the trajectory reasoning problem, we use a supervised setting. Since only the source and the destination are available, we train a discriminative model to predict the destination node given the source node. We leverage the graph structure of the corrupted grid world, as the problem implies strong inductive biases on graphs, both relational (sequences of consecutive nodes) and non-relational (latent direction functions depending on time, location and history). The trajectory reasoning problem is difficult because many candidate paths link the source to the destination. The only clues we can observe are the destination nodes resulting from trajectories blocked by removed nodes or edges. Note that we do not look for the shortest paths but for the true trajectory pattern governed by the latent dynamics. The evaluation criteria should therefore cover both prediction accuracy and the human readability of interpretations.

3 Models

Modeling attention flow on graphs. We view the problem of predicting the destination given the source as predicting an output attention distribution over nodes given an input attention distribution. We use the term attention distribution for the probability distribution of attending over nodes. For each source-destination pair, the input attention distribution has all its probability mass concentrated on the source node. After a series of computations on the graph, the resulting output attention distribution predicts the most likely node to be the destination. Attention transfers from the source to the destination, implying a flow through the graph that mimics latent causal chains. The following paragraphs model the attention flow on graphs from three different perspectives.

Implicit attention flow in graph networks. Modern graph networks (GNs) mostly employ the message passing mechanism implemented with neural network building blocks, such as MLP and GRU modules. Representations in GNs include node-level states, edge-level messages, and sometimes a graph-level global state. A GN framework has three phases: the initialization phase, the propagation phase, and the output phase. The propagation model includes:

  • a message function with edge type-specific parameters;
  • a message aggregation operation, aggregating all messages received from neighbors;
  • a node update function that also takes stationary node embeddings as input;
  • a global update function.

To model the attention flow, we modify the initialization phase by splitting the node state into attention channels and auxiliary channels. The attention channels of the source node are initialized with a reference vector used for computing attention distributions, while those of the other nodes are set to zero, so that the initial attention score is concentrated on the source node. At the output phase, we compute the output attention distribution from the attention channels of the final node states.

This model wraps the attention flow into the message passing process at the beginning and reads it out at the end. The neural network-based computation makes the propagation model a black box, lacking an explicit way to depict the attention flow and offering little help for interpretation.
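For concreteness, here is a minimal PyTorch sketch of this implicit variant. It is an illustration rather than the authors' exact architecture: the per-edge-type linear messages, the GRU update, and the dot-product readout against the reference vector are assumptions consistent with the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImplicitFlowGN(nn.Module):
    def __init__(self, n_nodes, d_attn=8, d_aux=56, n_edge_types=9, n_steps=16):
        super().__init__()
        d = d_attn + d_aux
        self.d_attn, self.n_steps = d_attn, n_steps
        self.node_emb = nn.Embedding(n_nodes, d_aux)     # stationary node embeddings
        self.init_aux = nn.Linear(d_aux, d_aux)          # single-layer init network
        self.msg = nn.ModuleList([nn.Linear(2 * d, d) for _ in range(n_edge_types)])
        self.upd = nn.GRUCell(d, d)
        self.ref = nn.Parameter(torch.randn(d_attn))     # reference vector

    def forward(self, src, edges):
        """src: source node id; edges: list of (u, v, edge_type) tuples."""
        n = self.node_emb.num_embeddings
        attn = torch.zeros(n, self.d_attn)
        attn[src] = self.ref                             # seed attention at the source
        h = torch.cat([attn, torch.tanh(self.init_aux(self.node_emb.weight))], dim=1)
        for _ in range(self.n_steps):
            agg = torch.zeros_like(h)
            for u, v, r in edges:                        # edge type-specific messages
                agg[v] = agg[v] + self.msg[r](torch.cat([h[u], h[v]]))
            h = self.upd(agg, h)                         # GRU node update
        scores = h[:, :self.d_attn] @ self.ref           # assumed readout
        return F.softmax(scores, dim=0)                  # output attention over nodes
```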

Explicit attention flow by random walks. To explicitly model the attention flow, we use random walks with a learnable transition matrix. The dynamics of the attention flow are driven by applying the transition to the current attention distribution to obtain the next one. Here, we consider two model settings:

  • Stationary transition setting: we learn edge-level transition logits and apply a row-level softmax to obtain the transition entries; the transition is stationary across inputs and steps (a sketch follows this list).
  • Dynamic transition setting: the transition logits depend on the node states and a global state, with no message passing applied; the update of the global state is based on the attention-weighted sum of node states. We emphasize that attention distributions can act as more than an output of internal states and can effectively act back on them. However, the graph context captured this way is still limited without message passing.
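A minimal sketch of the stationary setting, assuming one learnable logit per directed edge normalized by a row-level softmax (the exact parameterization is not given above; self-loop edges guarantee every row has at least one edge):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StationaryRandomWalk(nn.Module):
    def __init__(self, n_nodes, edges, n_steps=16):
        super().__init__()
        self.n_nodes, self.edges, self.n_steps = n_nodes, edges, n_steps
        self.logits = nn.Parameter(torch.zeros(len(edges)))   # one logit per directed edge

    def transition(self):
        T = torch.full((self.n_nodes, self.n_nodes), -1e9)    # mask for non-edges
        for k, (u, v) in enumerate(self.edges):
            T[u, v] = self.logits[k]
        return F.softmax(T, dim=1)        # row u: distribution over u's neighbors

    def forward(self, src):
        a = torch.zeros(self.n_nodes)
        a[src] = 1.0                      # all attention starts at the source node
        T = self.transition()
        for _ in range(self.n_steps):
            a = T.t() @ a                 # a_{t+1}(v) = sum_u T[u, v] a_t(u)
        return a                          # predicted attention over destinations
```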

Figure 1: The two-flow model architecture.

Explicit attention flow with graph networks. The interpretability benefit of explicit attention flow can be gained while still enjoying the expressivity of graph networks. We present the attention flow mechanism by introducing a node-level attention, called the focused attention, and an edge-level attention, called the flowing attention. Over the steps, the dynamics are driven by:

(2)

where the transition relies on the rich context carried by the underlying message passing in GNs, which we therefore call the information flow, in addition to the attention flow. See the two-flow model architecture in Figure 1 and the sketch of one attention flow step below.
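The sketch below shows one step of these dynamics, under the assumption that the transition is obtained by normalizing per-edge logits (computed from the passed messages) over each node's outgoing edges; edge_logits stands in for the output of the transition logits function.

```python
import torch

def attention_flow_step(a_node, edge_index, edge_logits, n_nodes):
    """a_node: (N,) focused attention; edge_index: (2, E) source/target node ids;
    edge_logits: (E,) transition logits derived from the passed messages."""
    src, dst = edge_index
    # softmax over each source node's outgoing edges (global max shift for stability)
    exp = torch.exp(edge_logits - edge_logits.max())
    denom = torch.zeros(n_nodes).scatter_add(0, src, exp)
    transition = exp / denom[src]                               # entry for edge (u, v)
    a_edge = a_node[src] * transition                           # flowing attention on edges
    a_next = torch.zeros(n_nodes).scatter_add(0, dst, a_edge)   # next focused attention
    return a_next, a_edge
```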

The information flow clearly determines the attention flow, but we are more interested in how the attention flow can act back on the information flow. We have seen the node-level backward acting of the focused attention on the weighted sum of node states above. Here, we study on the edge level how the flowing attention acts on the information flow, by defining a message-attending function that produces an attended message to replace the original message:

  • no acting;
  • multiplying;
  • non-linearly acting after multiplying.

To design a meaningful message-attending function, when the flowing attention on an edge is zero, we let the attended message signal that no attention is paid to this piece of message without necessarily being zero itself. We find that non-linear acting after multiplying performs the best in most cases, revealing not only the importance of backward acting but also the necessity of keeping the information flow even when it is not attended.
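The three message-attending variants can be written as small modules, sketched below; the layer sizes and the choice of a tanh activation are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NoAct(nn.Module):
    def forward(self, msg, a_edge):
        return msg                                      # attention does not act back

class Mul(nn.Module):
    def forward(self, msg, a_edge):
        return a_edge.unsqueeze(-1) * msg               # scale each message by its attention

class MulMlp(nn.Module):
    def __init__(self, d_msg):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(d_msg, d_msg), nn.Tanh())
    def forward(self, msg, a_edge):
        # non-linear acting after multiplying: an unattended message (a_edge = 0)
        # is not forced to zero, thanks to the bias terms of the MLP
        return self.mlp(a_edge.unsqueeze(-1) * msg)
```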

Connections to reinforcement learning and probabilistic latent models. If we inject noise and then pick the top-1 attended node at each step, the process becomes similar to reinforcement learning. If we inject noise but keep the selection soft, as in the Gumbel-Softmax [31] or Concrete [32] distribution, it turns into a probabilistic latent model. Attention flow can thus be viewed as graph-level computation operating directly and numerically in a probability space rather than in a discrete sample space.
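As an illustration of this connection (not part of the proposed model), the same per-node transition logits can be perturbed with Gumbel noise; the hard top-1 choice resembles a discrete RL action, while the soft relaxation keeps the step differentiable:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8)                                 # logits over a node's 8 outgoing edges
soft = F.gumbel_softmax(logits, tau=0.5, hard=False)    # Concrete / Gumbel-Softmax sample
hard = F.gumbel_softmax(logits, tau=0.5, hard=True)     # one-hot top-1 choice (RL-like action)
```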

4 Experiments

4.1 Experimental Procedure

Dataset generation and statistics. We generate a number of dataset groups, each representing a randomized grid world with trajectories (sequences of nodes) driven by a specific setting of the latent dynamics. More specifically, for each dataset group, we build a corrupted grid world and then apply a latent direction function to draw multiple trajectories starting from each node. The data generation parameters include:

  • Grid size: the size of the grid map. Without nodes or edges dropped, each internal node is connected to 8 incoming edges as well as 8 outgoing edges. In the experiments, we test models on two sizes, 32x32 and 64x64.
  • Node and edge dropping probabilities: the probabilities of randomly removing a node or an edge. If a node is removed, all its connected edges are removed as well; if a node is left with no edges, that is, an isolated point, we remove it. When we remove an edge between a pair of nodes, we drop the edges in both directions. In the experiments, we try two settings: dropping nodes only, with probability 0.1, and dropping edges only, with probability 0.2.
  • Maximal steps: the maximal number of steps of a trajectory. Without being blocked, a trajectory ends at this maximal length. In the experiments, we use 16 steps for the 32x32 maps and 32 steps for the 64x64 maps. The total problem scale therefore depends on the grid size, the dropping probabilities, and the maximal steps.
  • Standard deviation: the standard deviation of sampling an edge around the latent direction. A larger standard deviation increases the chance of bypassing gaps caused by removed nodes and edges, and also increases the spread of destination positions given the same source, leading to a larger exploring area and more uncertainty for prediction. In the experiments, we pick two values, 0.2 and 0.5.
  • Number of rollouts: the number of trajectories drawn from each node. With each node as a source, we roll out this many trajectories and then remove duplicated source-destination pairs.
  • Latent direction function: in the experiments, we try the four settings of Section 2: (1) a straight line with a constant sloped direction; (2) a sine curve with time-dependent varying directions; (3) a curve with location-dependent varying directions; (4) a curve with location history-dependent varying directions.
  • Random seed: the datasets in a group share the same generation parameters listed above except the random seed, so that experimental results are less affected by accidental factors. We use five random seeds in the experiments.

We generate the dataset groups with their names and generation parameters listed in Table 4 in the appendix. Each dataset group contains five datasets generated with different random seeds. The observed part of each dataset includes a grid map containing all edge information and a list of source-destination pairs used for training, validation and test. We make the training, validation and test sets by splitting on source nodes, so that we can assess models on their ability to handle pairs with unseen source nodes (a splitting sketch is given below). The statistics of the datasets are given in Table 5 in the appendix.
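A sketch of the split on source nodes follows; the exact ratio is not stated here, so an 80/10/10 split is assumed purely for illustration.

```python
import numpy as np

def split_pairs(pairs, seed=0, ratios=(0.8, 0.1, 0.1)):
    """pairs: list of (source, destination) node ids. Split by source node so that
    validation/test sources never appear in training."""
    rng = np.random.default_rng(seed)
    sources = np.unique([s for s, _ in pairs])
    rng.shuffle(sources)
    n_train = int(ratios[0] * len(sources))
    n_val = int(ratios[1] * len(sources))
    train_src = set(sources[:n_train])
    val_src = set(sources[n_train:n_train + n_val])
    test_src = set(sources[n_train + n_val:])
    select = lambda keep: [(s, d) for s, d in pairs if s in keep]
    return select(train_src), select(val_src), select(test_src)
```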

Models in comparison. To fully evaluate the attention flow mechanism, we choose three types of graph networks plus two random walk-based models as the benchmark. The graph networks include a full Graph Network (FullGN) [3], a Gated Graph Neural Network (GGNN) [9], and a Graph Attention Network (GAT) [13]. Note that the regular versions of these graph networks are incapable of explicitly modeling attention flow and fulfilling the trajectory reasoning purpose, though they can still make predictions. We remodel each of them with the attention flow mechanism, implemented in three different ways according to how the flowing attention acts back on the passed messages.

Regular graph networks. Node states have a fixed number of channels (or dimensions), of which a few are attention channels and the rest carry auxiliary messages. We tried several combinations of channel numbers and found that 8 attention channels performed the best. At initialization, the attention channels are set to be non-zero only for the source node, and the auxiliary channels are initialized from stationary node embeddings through a single-layer feedforward network with a non-linear activation. The loss is the cross entropy between the one-hot true destination label and the predicted probability distribution.

  • FullGN: this model maintains a global state, and both the message function and the node update function take it as one of their inputs. Each of the message, node update, and global update functions is a single-layer feedforward network.
  • GGNN: this model computes messages in a non-pairwise linear manner that depends on the sending nodes and the edge types. It ignores the global state and uses a gated recurrent unit (GRU) as the node update function.
  • GAT: this model uses multi-head self-attention layers. Here, we take edge types into account when defining the attention weights. We also apply a GRU as the node update function, taking the concatenation of all multi-head aggregated messages and the node embedding as input. We use multiple heads, each with a lower-dimensional self-attention, so that the concatenated message keeps the original number of dimensions.

Remodeled graph networks with explicit attention flow. We add our attention flow module onto the computation framework of graph networks as shown in Figure 1. At the initialization phase, the focused attention is set to 1 on the source node and 0 on the rest; at the output phase, we take the final attention distribution to compute the loss. For the propagation phase, we need to compute two more functions:

  • Transition logits function, used to compute the transition matrix that drives the attention flow (a possible form is sketched after this list).
  • Message-attending function, which produces the attended message in place of the original message. We study three ways to implement it: (1) no acting, (2) multiplying, (3) non-linearly acting after multiplying. For (3), we simply use a single-layer feedforward network. This yields three explicit attention flow models based on each graph network:
    • {FullGN, GGNN, GAT}-NoAct.
    • {FullGN, GGNN, GAT}-Mul.
    • {FullGN, GGNN, GAT}-MulMlp.
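One possible form of the transition logits function is sketched below. The exact parameterization is not specified above, so the small MLP over the attended message, the two endpoint states, and the global state is an assumption.

```python
import torch
import torch.nn as nn

class TransitionLogits(nn.Module):
    def __init__(self, d_msg, d_node, d_global, d_hidden=32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_msg + 2 * d_node + d_global, d_hidden), nn.Tanh(),
            nn.Linear(d_hidden, 1))

    def forward(self, msg, h_src, h_dst, g):
        # msg: (E, d_msg), h_src/h_dst: (E, d_node), g: (d_global,)
        g = g.expand(msg.size(0), -1)                     # broadcast global state per edge
        return self.mlp(torch.cat([msg, h_src, h_dst, g], dim=-1)).squeeze(-1)
```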

Random walk-based models. If we model the attention flow without considering message passing conducted in graph networks, the method falls into the family of differentiable random walk models with a learned transition matrix. Here, we try two types of transition, stationary and dynamic.

  • RW-Stationary:
    • Transition logits function: a set of free parameters shared across inputs and steps.
  • RW-Dynamic:
    • Transition logits function: depends on the node states and the global state.
    • Node update function: applied to each node without message passing.
    • Global update function: based on the attention-weighted sum of node states.

 

LINE SINE LOCATION HISTORY
Model H@1(%) MRR H@1(%) MRR H@1(%) MRR H@1(%) MRR

 

RW-Stationary 15.80 0.3409 8.56 0.2177 50.06 0.6625 16.44 0.3215
RW-Dynamic 16.64 0.3562 19.15 0.3418 45.94 0.6320 20.41 0.3656

 

FullGN 15.13 0.3451 51.71 0.6665 25.67 0.4393 16.61 0.3095
FullGN-NoAct 16.65 0.3574 30.10 0.4476 46.47 0.6337 20.80 0.3729
FullGN-Mul 16.69 0.3636 37.49 0.4915 43.44 0.6029 21.35 0.3618
FullGN-MulMlp 16.99 0.3662 39.91 0.5195 50.93 0.6598 23.94 0.3850

 

GGNN 15.49 0.3493 51.02 0.6611 29.20 0.4699 22.56 0.3677
GGNN-NoAct 16.64 0.3570 25.14 0.3918 49.50 0.6621 21.69 0.3818
GGNN-Mul 16.95 0.3610 23.62 0.3776 45.22 0.6185 23.81 0.3824
GGNN-MulMlp 17.08 0.3673 34.75 0.4699 50.28 0.6637 26.06 0.4001

 

GAT 16.01 0.3469 43.19 0.5566 18.18 0.3583 12.11 0.2333
GAT-NoAct 16.02 0.3536 15.77 0.3221 46.10 0.6356 23.17 0.3818
GAT-Mul 15.86 0.3501 20.14 0.3429 45.83 0.6208 22.70 0.3762
GAT-MulMlp 17.07 0.3646 30.64 0.4390 47.52 0.6371 20.71 0.3655

 

Table 1: Comparison results on dataset groups {LINE,SINE,LOCATION,HISTORY}-SZ32-STP16-NDRP-STD0.2. This table focuses on comparative evaluation among the explicit attention flow models, which offer clear interpretations as well as prediction results, so we gray out the results of the implicit attention flow models, that is, the regular graph networks. Each column reports one metric on the same dataset group. We mark the highest metric score acquired by the random walk-based models, as well as the graph network-based explicit attention flow models that beat the best random walk-based model. Furthermore, among the explicit attention flow models based on the same graph network, we compare the three message-attending approaches and highlight the best in bold.

Training and evaluation details. Considering that trajectories might terminate before reaching the maximal number of steps, we add a self-loop edge to each node so that all trajectories can be treated as having a fixed number of steps; thus, there are 9 types of edges during training. Our training hyperparameters include the batch size, the representation dimensions, the weight decay on node embeddings, learning rates decayed every 10 epochs, and the total number of training epochs. We use the Adam optimizer for all models. When dealing with the larger datasets, we reduce the batch size and the representation dimensions. For each model, we conducted 10 runs on each dataset group, using five different generation seeds and two different input shufflings. We saved one model snapshot per epoch, chose the best three according to validation performance, and then computed the mean and standard deviation of their evaluation metrics on the test set.
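The snapshot-selection protocol can be summarized as in the sketch below; the number of epochs and the top-k value here are placeholders rather than the paper's settings, and train_epoch, valid_metric and test_metric are hypothetical callables.

```python
import numpy as np

def evaluate_run(train_epoch, valid_metric, test_metric, n_epochs=100, top_k=3):
    """Train, keep one snapshot per epoch, pick the best top_k snapshots on
    validation, and report the mean and std of their test metrics."""
    snapshots = []
    for epoch in range(n_epochs):
        params = train_epoch(epoch)                     # one epoch of Adam updates
        snapshots.append((valid_metric(params), params))
    best = sorted(snapshots, key=lambda s: s[0], reverse=True)[:top_k]
    scores = np.array([test_metric(p) for _, p in best])
    return scores.mean(), scores.std()
```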

Evaluation metrics. We use Hits@1, Hits@5, Hits@10, the mean rank (MR), and the mean reciprocal rank (MRR) to evaluate these models. Hits@k is the proportion of test source-destination pairs for which the target destination is ranked in the top k predictions; thus, Hits@1 is the prediction accuracy. Compared with Hits@k, MR and MRR can evaluate predictions even when the target destination is ranked outside the top k. MR gives a more intuitive sense of how many candidates are ranked before the target on average, but it is unstable, being susceptible to the worst examples and thus sometimes becoming very large. MRR scores always range from 0 to 1. For MR a lower score reflects better prediction, whereas for MRR a higher score is better.
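For reference, the ranking metrics for a single test pair can be computed as in the sketch below, where scores is the predicted attention distribution over all nodes and target is the true destination node id.

```python
import numpy as np

def ranking_metrics(scores, target, ks=(1, 5, 10)):
    # rank = 1 + number of nodes scored strictly higher than the target
    rank = 1 + int(np.sum(scores > scores[target]))
    hits = {f"Hits@{k}": float(rank <= k) for k in ks}
    return {**hits, "MR": rank, "MRR": 1.0 / rank}
```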

 

LINE SINE LOCATION HISTORY
Model H@1(%) MRR H@1(%) MRR H@1(%) MRR H@1(%) MRR

 

RW-Stationary 15.74 0.3348 9.07 0.2267 19.40 0.3860 11.72 0.2547
RW-Dynamic 15.48 0.3429 13.38 0.2905 17.91 0.3722 12.25 0.2820

 

FullGN 14.61 0.3355 17.10 0.3525 16.09 0.3476 13.79 0.3051
FullGN-NoAct 15.50 0.3410 16.60 0.3360 18.83 0.3816 13.90 0.3059
FullGN-Mul 16.21 0.3498 16.93 0.3283 18.50 0.3787 12.93 0.2807
FullGN-MulMlp 16.07 0.3502 17.31 0.3389 19.64 0.3991 14.89 0.3145

 

GGNN 14.53 0.3344 17.45 0.3555 17.11 0.3689 14.83 0.3217
GGNN-NoAct 15.58 0.3415 16.51 0.3262 19.66 0.3912 13.84 0.2957
GGNN-Mul 15.79 0.3448 16.03 0.3226 17.84 0.3723 14.16 0.2971
GGNN-MulMlp 15.99 0.3497 17.31 0.3370 19.39 0.3911 14.80 0.3053

 

GAT 14.79 0.3300 16.48 0.3338 14.51 0.3227 10.80 0.2538
GAT-NoAct 15.83 0.3414 15.15 0.3161 17.82 0.3702 12.60 0.2829
GAT-Mul 15.01 0.3351 14.85 0.3070 18.27 0.3749 13.49 0.2885
GAT-MulMlp 16.25 0.3493 16.45 0.3292 18.93 0.3843 13.75 0.2933

 

Table 2: Comparison results on dataset groups {LINE,SINE,LOCATION,HISTORY}-SZ32-STP16-NDRP-STD0.5. The marks in this table have the same meanings as in Table 1.

 

LINE SINE LOCATION HISTORY
Std Model H@1(%) MRR H@1(%) MRR H@1(%) MRR H@1(%) MRR

 

0.2 GGNN 12.36 0.2918 27.08 0.4047 20.98 0.4323 11.47 0.2785
GGNN-MulMlp 15.56 0.3335 39.98 0.5317 47.56 0.6534 23.49 0.3888

 

0.5 GGNN 11.04 0.2759 10.53 0.2616 9.74 0.2347 8.67 0.2222
GGNN-MulMlp 14.53 0.3179 16.08 0.3137 18.39 0.3782 13.60 0.2902

 

Table 3: Comparison results on the larger datasets, {LINE,SINE,LOCATION,HISTORY}-SZ64-STP32-NDRP-STD{0.2,0.5}.

4.2 Experimental Results

Figure 2: The true trajectory vs. the latent directions (arrows) vs. the learned attention flow. The first row reflects a constant sloped direction, the second row time-dependent directions, the third row location-dependent directions, and the last row history-dependent directions. The drawn attention flow is based on the max aggregation of the normalized attention distributions over the steps.

Objectives of comparison. We list our objectives of comparative evaluation from three aspects.


  • To test how well the explicit attention flow modeling can leverage rich context carried by message passing in graph networks, compared to the modeling purely based on random walks.

  • To test whether the backward acting of attention flow on message passing is useful, and which way can be the most effective.

  • To test whether the explicit attention flow can improve the prediction accuracy.

Discussion on comparison results. First, we compare the models that explicitly model attention flow: the random walk-based against the graph network-based. From Tables 1 and 2, we see that in most cases the models backed by graph networks surpass the random walk-based models, often by a large margin. Although RW-Stationary performs strongly in the location-dependent cases, probably because little context is needed beyond the current location, the best graph network-based models, such as FullGN-MulMlp, can still beat it. Second, we compare the backward acting mechanisms: no acting, multiplying, and non-linear acting. Non-linear acting after multiplying performs the best in almost all cases. What surprises us is that simply multiplying may degrade the performance, making it worse than no acting; how to design an effective backward acting mechanism is worth further study in future work. Last, we compare our remodeled graph networks with explicit attention flow against the regular graph networks. On the smaller datasets, {FullGN,GGNN,GAT}-MulMlp exceed their regular graph network counterparts, except on the SINE-SZ32-STP16-* dataset groups. On the larger datasets, we test GGNN and GGNN-MulMlp and find that GGNN-MulMlp performs significantly better than GGNN on every evaluation metric, as shown in Table 3.

Discussion on visualization results. We visualize the learned attention flow in a corrupted grid map, compared with the true trajectories and latent directions, taking one example for each direction setting as shown in Figure 2. At first glance, the drawn attention flow over the steps forms a belt linking the source node to the destination node, almost matching the true trajectories, especially in the first and second rows of Figure 2. On closer inspection, we find that the attention flow does not necessarily follow a single-path pattern but instead branches to enlarge the explored area, which is more likely to contain the destination node, especially near gaps, as shown in the last two rows.

5 Conclusion

In this paper, we introduce the attention flow mechanism to explicitly model the reasoning process on graphs, leveraging the rich context carried by message passing in graph networks. We treat this mechanism as a way to offer accurate predictions as well as clear interpretations. In addition, we study the backward acting of attention flow on the information flow implemented by message passing and report several interesting findings from the experimental results. The interaction between the two flows, one favoring the fitting capacity and the other offering interpretability, is worth further study in future work.

References

A Experiments (cont'd)

Dataset Group Direction Size Max-steps Std Node-drop Edge-drop
LINE-SZ32-STP16-NDRP-STD0.2 Line 32 16 0.2 0.1 0.0
LINE-SZ32-STP16-NDRP-STD0.5 Line 32 16 0.5 0.1 0.0
SINE-SZ32-STP16-NDRP-STD0.2 Sine 32 16 0.2 0.1 0.0
SINE-SZ32-STP16-NDRP-STD0.5 Sine 32 16 0.5 0.1 0.0
LOCATION-SZ32-STP16-NDRP-STD0.2 Location 32 16 0.2 0.1 0.0
LOCATION-SZ32-STP16-NDRP-STD0.5 Location 32 16 0.5 0.1 0.0
HISTORY-SZ32-STP16-NDRP-STD0.2 History 32 16 0.2 0.1 0.0
HISTORY-SZ32-STP16-NDRP-STD0.5 History 32 16 0.5 0.1 0.0
LINE-SZ32-STP16-EDRP-STD0.2 Line 32 16 0.2 0.0 0.2
LINE-SZ32-STP16-EDRP-STD0.5 Line 32 16 0.5 0.0 0.2
SINE-SZ32-STP16-EDRP-STD0.2 Sine 32 16 0.2 0.0 0.2
SINE-SZ32-STP16-EDRP-STD0.5 Sine 32 16 0.5 0.0 0.2
LOCATION-SZ32-STP16-EDRP-STD0.2 Location 32 16 0.2 0.0 0.2
LOCATION-SZ32-STP16-EDRP-STD0.5 Location 32 16 0.5 0.0 0.2
HISTORY-SZ32-STP16-EDRP-STD0.2 History 32 16 0.2 0.0 0.2
HISTORY-SZ32-STP16-EDRP-STD0.5 History 32 16 0.5 0.0 0.2
LINE-SZ64-STP32-NDRP-STD0.2 Line 64 32 0.2 0.1 0.0
LINE-SZ64-STP32-NDRP-STD0.5 Line 64 32 0.5 0.1 0.0
SINE-SZ64-STP32-NDRP-STD0.2 Sine 64 32 0.2 0.1 0.0
SINE-SZ64-STP32-NDRP-STD0.5 Sine 64 32 0.5 0.1 0.0
LOCATION-SZ64-STP32-NDRP-STD0.2 Location 64 32 0.2 0.1 0.0
LOCATION-SZ64-STP32-NDRP-STD0.5 Location 64 32 0.5 0.1 0.0
HISTORY-SZ64-STP32-NDRP-STD0.2 History 64 32 0.2 0.1 0.0
HISTORY-SZ64-STP32-NDRP-STD0.5 History 64 32 0.5 0.1 0.0
Table 4: Parameters of generating the datasets
Dataset Group #Nodes #Edges #Trajs #Trajs-per-node Traj-length
LINE-SZ32-STP16-NDRP-STD0.2 921 6319 4829 5.2 9.1
LINE-SZ32-STP16-NDRP-STD0.5 921 6319 5029 5.5 9.0
SINE-SZ32-STP16-NDRP-STD0.2 921 6319 1555 1.7 9.0
SINE-SZ32-STP16-NDRP-STD0.5 921 6319 4380 4.8 9.4
LOCATION-SZ32-STP16-NDRP-STD0.2 921 6319 1440 1.6 7.9
LOCATION-SZ32-STP16-NDRP-STD0.5 921 6319 4098 4.4 8.8
HISTORY-SZ32-STP16-NDRP-STD0.2 921 6319 1541 1.7 8.5
HISTORY-SZ32-STP16-NDRP-STD0.5 921 6319 4418 4.8 9.2
LINE-SZ32-STP16-EDRP-STD0.2 1023 6248 4828 4.7 6.8
LINE-SZ32-STP16-EDRP-STD0.5 1023 6248 5051 4.9 6.7
SINE-SZ32-STP16-EDRP-STD0.2 1023 6248 1441 1.4 6.3
SINE-SZ32-STP16-EDRP-STD0.5 1023 6248 4238 4.1 6.8
LOCATION-SZ32-STP16-EDRP-STD0.2 1023 6248 1576 1.5 6.0
LOCATION-SZ32-STP16-EDRP-STD0.5 1023 6248 4327 4.2 6.6
HISTORY-SZ32-STP16-EDRP-STD0.2 1023 6248 1586 1.5 6.0
HISTORY-SZ32-STP16-EDRP-STD0.5 1023 6248 4441 4.3 6.6
LINE-SZ64-STP32-NDRP-STD0.2 3686 25891 21390 5.8 11.4
LINE-SZ64-STP32-NDRP-STD0.5 3686 25891 22106 6.0 11.1
SINE-SZ64-STP32-NDRP-STD0.2 3686 25891 6691 1.8 11.4
SINE-SZ64-STP32-NDRP-STD0.5 3686 25891 19233 5.2 11.7
LOCATION-SZ64-STP32-NDRP-STD0.2 3686 25891 6522 1.8 10.1
LOCATION-SZ64-STP32-NDRP-STD0.5 3686 25891 17482 4.7 10.8
HISTORY-SZ64-STP32-NDRP-STD0.2 3686 25891 6987 1.9 11.0
HISTORY-SZ64-STP32-NDRP-STD0.5 3686 25891 19303 5.2 11.7
Table 5: Dataset statistics (all numbers represent average results; note that the LINE-*-STD0.2 dataset groups produce more trajectories per node than their counterparts, because the slope we choose places the latent direction exactly between two candidate edges, introducing more randomness into trajectory generation).

A.1 Discussion about Results

During the experiments, we found some results in line with our expectations, as discussed in the model section, as well as some unexpected results that surprised us and are probably worth further study in future work. We summarize them as follows:


  • Backward acting of attention flow on information flow is useful, better than no acting in most cases.


    • FullGN-{Mul,MulMlp} both perform better than FullGN-NoAct on Hits@1, Hits@5, Hits@10, MR and MRR for dataset groups LINE-SZ32-STP16-*, on Hits@1, Hits@5, Hits@10 and MRR for dataset groups SINE-SZ32-STP16-NDRP-STD0.2, SINE-SZ32-STP16-EDRP-*, LOCATION-SZ32-STP16-EDRP-STD0.5, on Hits@1 for dataset groups SINE-SZ32-STP16-NDRP-STD0.5, HISTORY-SZ32-STP16-*-STD0.2.

    • GGNN-{Mul,MulMlp} both perform better than GGNN-NoAct on Hits@1, Hits@5, Hits@10, MR and MRR for dataset group LINE-SZ32-STP16-NDRP-STD0.2, on Hits@1, MR and MRR for dataset group LINE-SZ32-STP16-NDRP-STD0.5, on Hits@1 and MRR for dataset groups HISTORY-SZ32-STP16-NDRP-*, on Hits@1 for dataset group SINE-SZ32-STP16-NDRP-STD0.5.

    • GAT-{Mul,MulMlp} both perform better than GAT-NoAct on Hits@1, Hits@5 and MRR for dataset groups SINE-SZ32-STP16-NDRP-STD0.2, LOCATION-SZ32-STP16-NDRP-STD0.5, HISTORY-SZ32-STP16-NDRP-STD0.5.

  • Simply applying multiplying to backward acting might cause degradation.


    • {FullGN,GGNN,GAT}-Mul perform poorly on dataset groups LOCATION-SZ32-STP16-*.

    • {GGNN,GAT}-Mul perform poorly on dataset group SINE-SZ32-STP16-NDRP-STD0.5.

    • GGNN-Mul performs poorly on dataset group SINE-SZ32-STP16-NDRP-STD0.2.

  • Non-linear backward acting after multiplying works consistently well, always performing the best among the three backward-acting variants (NoAct, Mul and MulMlp), and often the best among all models.


    • {FullGN,GGNN,GAT}-MulMlp perform the best on Hits@1, Hits@5, MR and MRR for dataset groups LINE-SZ32-STP16-*-STD0.2, and on Hits@1 and MRR for dataset groups LOCATION-SZ32-STP16-*-STD0.2.

    • {GGNN,GAT}-MulMlp perform the best on Hits@1, Hits@5 and MR for dataset groups LINE-SZ32-STP16-NDRP-STD0.5, HISTORY-SZ32-STP16-NDRP-STD0.5, on MR for dataset group SINE-SZ32-STP16-NDRP-STD0.2

    • {FullGN,GAT}-MulMlp perform the best on Hits@1, Hits@5, Hits@10 and MR for dataset group LOCATION-SZ32-STP16-NDRP-STD0.5, on MR for dataset group SINE-SZ32-STP16-NDRP-STD0.2, on MR and MRR for dataset group LINE-SZ32-STP16-NDRP-STD0.5

    • {FullGN,GGNN}-MulMlp perform the best on Hits@1 and MRR for dataset group HISTORY-SZ32-STP16-NDRP-STD0.2.

    • FullGN-MulMlp performs the best on Hits@1, Hits@5 and MRR for dataset groups LINE-SZ32-STP16-EDRP-STD0.2, LOCATION-SZ32-STP16-EDRP-*, on Hits@1 on dataset groups LINE-SZ32-STP16-EDRP-*, LOCATION-SZ32-STP16-EDRP-*, HISTORY-SZ32-STP16-EDRP-STD0.2, SINE-SZ32-STP16-EDRP-STD0.5, on MRR for dataset groups LINE-SZ32-STP16-EDRP-STD0.5, SINE-SZ32-STP16-EDRP-STD0.5, LOCATION-SZ32-STP16-EDRP-STD0.5

  • Reasoning purely based on random walks with no message passing may be the worst in most cases, but in a few cases it can work surprisingly well.


    • RW-Stationary performs very poorly on dataset groups LINE-SZ32-STP16-NDRP-*, SINE-SZ32-STP16-NDRP-*, HISTORY-SZ32-STP16-NDRP-* but surprisingly well on dataset groups LOCATION-SZ32-STP16-NDRP-*, probably because, under the location-dependent latent directions, little context beyond the current location is needed for trajectory reasoning.

    • When considering global context information, RW-Dynamic works better than RW-Stationary in the cases where RW-Stationary performs poorly, but it obtains lower scores than RW-Stationary on dataset groups LOCATION-SZ32-STP16-NDRP-*, demonstrating again that little context is needed here.

  • Regular graph networks work extremely well in the cases with the time-dependent latent directions, but they might not be suitable for other cases, such as the location-dependent and the history-dependent latent directions.


    • FullGN, GGNN and GAT obtain the highest scores, exceeding the second best by a large margin on dataset groups SINE-SZ32-STP16-*, but still get large MR scores.

    • FullGN, GGNN, GAT perform very poorly on dataset groups LOCATION-SZ32-STP16-*, HISTORY-SZ32-STP16-* except GGNN on HISTORY-SZ32-STP16-NDRP-STD0.5.

  • Models with attention flow taking non-linear backward acting might perform significantly better on a larger scale than those without.


    • Due to limited computation resources, we did not take multiple runs on the larger dataset groups but only ran GGNN and GGNN-MulMlp once on dataset groups {LINE,SINE,LOCATION,HISTORY}-SZ64-STP32-NDRP-STD{0.2,0.5}. From the results, we can see that GGNN-MulMlp surpasses GGNN by a large margin on every evaluation metric.

Model Hits@1 (%) Hits@5 (%) Hits@10 (%) MR MRR
RW-Stationary 15.80 0.56 56.95 1.96 78.25 2.31 9.857 1.167 0.3409 0.0098
RW-Dynamic 16.64 0.65 59.65 1.89 82.13 1.77 7.061 0.594 0.3562 0.0084
FullGN 15.13 0.74 59.75 2.24 83.83 1.99 6.800 0.993 0.3451 0.0082
FullGN-NoAct 16.65 0.96 59.35 2.08 82.50 1.82 6.628 0.392 0.3574 0.0102
FullGN-Mul 16.69 0.86 61.20 1.66 83.58 2.24 6.399 0.609 0.3636 0.0108
FullGN-MulMlp 16.99 0.91 61.59 2.05 83.73 1.73 6.296 0.615 0.3662 0.0109
GGNN 15.49 0.98 60.30 1.80 83.60 1.96 6.605 0.580 0.3493 0.0100
GGNN-NoAct 16.64 0.86 59.62 2.02 82.52 1.53 6.760 0.429 0.3570 0.0095
GGNN-Mul 16.95 0.70 59.80 1.65 82.61 2.05 6.477 0.532 0.3610 0.0089
GGNN-MulMlp 17.08 0.77 61.45 1.82 83.54 1.89 6.316 0.684 0.3673 0.0111
GAT 16.01 1.06 58.86 1.82 80.78 1.91 18.306 16.927 0.3469 0.0121
GAT-NoAct 16.02 0.65 59.42 1.95 82.29 1.70 6.707 0.397 0.3536 0.0091
GAT-Mul 15.86 0.99 58.77 2.64 81.67 1.98 6.518 0.474 0.3501 0.0122
GAT-MulMlp 17.07 0.79 60.60 1.70 83.22 2.18 6.488 0.670 0.3646 0.0100
Table 6: Comparison results on dataset group LINE-SZ32-STP16-NDRP-STD0.2
Model Hits@1 (%) Hits@5 (%) Hits@10 (%) MR MRR
RW-Stationary 15.74 0.54 55.28 1.15 76.40 1.24 10.087 0.610 0.3348 0.0040
RW-Dynamic 15.48 0.82 57.07 1.30 79.19 1.04 7.735 0.345 0.3429 0.0071
FullGN 14.61 0.90 57.97 1.20 80.90 1.02 8.472 1.331 0.3355 0.0090
FullGN-NoAct 15.50 0.48 57.09 1.31 79.55 1.10 7.761 0.416 0.3410 0.0064
FullGN-Mul 16.21 0.35 58.18 1.45 80.73 0.93 7.463 0.284 0.3498 0.0047
FullGN-MulMlp 16.07 0.56 58.16 1.02 80.59 1.29 7.431 0.283 0.3502 0.0046
GGNN 14.53 0.72 57.33 1.01 80.34 1.25 7.856 0.669 0.3344 0.0073
GGNN-NoAct 15.58 0.53 57.44 1.00 79.62 1.17 7.789 0.349 0.3415 0.0055
GGNN-Mul 15.79 0.63 57.36 1.92 80.13 1.16 7.387 0.303 0.3448 0.0060
GGNN-MulMlp 15.99 0.59 58.17 1.26 80.79 0.98 7.391 0.325 0.3497 0.0052
GAT 14.79 1.17 56.51 2.26 79.20 1.79 16.323 12.684 0.3300 0.0147
GAT-NoAct 15.83 0.39 56.95 1.25 79.28 1.31 7.631 0.309 0.3414 0.0047
GAT-Mul 15.01 0.84 55.95 1.22 78.55 1.62 7.793 0.530 0.3351 0.0071
GAT-MulMlp 16.25 0.57 57.61 1.35 80.60 0.80 7.292 0.190 0.3493 0.0056
Table 7: Comparison results on dataset group LINE-SZ32-STP16-NDRP-STD0.5
Model Hits@1 (%) Hits@5 (%) Hits@10 (%) MR MRR
RW-Stationary 8.56 1.76 35.81 1.81 51.33 2.47 23.883 2.757 0.2177 0.0103
RW-Dynamic 19.15 3.02 50.69 4.20 67.13 3.58 13.554 2.560 0.3418 0.0266
FullGN 51.71 3.46 87.46 3.93 93.20 4.78 17.588 19.805 0.6665 0.0348
FullGN-NoAct 30.10 6.69 61.67 7.22 76.88 5.48 9.329 2.224 0.4476 0.0597
FullGN-Mul 37.49 1.97 63.60 2.72 76.91 2.31 9.800 1.077 0.4915 0.0212
FullGN-MulMlp 39.91 2.80 66.15 3.01 78.19 2.33 9.377 0.824 0.5195 0.0242
GGNN 51.02 2.10 87.15 3.44 93.51 2.92 12.796 18.578 0.6611 0.0208
GGNN-NoAct 25.14 6.53 54.30 6.68 69.91 4.57 12.509 2.212 0.3918 0.0604
GGNN-Mul 23.62 2.30 53.62 2.64 67.37 3.12 15.002 1.354 0.3776 0.0168
GGNN-MulMlp 34.75 2.93 60.76 4.26 73.72 3.33 12.392 1.525 0.4699 0.0264
GAT 43.19 3.24 73.32 3.67 79.30 3.32 62.170 12.886 0.5566 0.0294
GAT-NoAct 15.77 3.97 50.17 4.34 67.36 3.93 13.988 1.933 0.3221 0.0336
GAT-Mul 20.14 2.10 50.70 2.94 66.65 2.81 15.725 1.922 0.3429 0.0193
GAT-MulMlp 30.64 2.37 58.59 3.91 73.46 3.34 12.022 1.639 0.4390 0.0216
Table 8: Comparison results on dataset group SINE-SZ32-STP16-NDRP-STD0.2
Model Hits@1 (%) Hits@5 (%) Hits@10 (%) MR MRR
RW-Stationary 9.07 0.60 34.71 1.38 50.39 3.15 19.937 1.535 0.2267 0.0090
RW-Dynamic 13.38 1.06 46.26 1.98 64.54 2.05 12.380 1.032 0.2905 0.0103
FullGN 17.10 0.67 57.91 1.90 76.37 1.58 9.517 1.175 0.3525 0.0069
FullGN-NoAct 16.60 1.13 52.81 1.95 71.91 1.58 9.542 0.790 0.3360 0.0106
FullGN-Mul 16.93 0.99 50.41 1.98 67.74 1.89 12.163 0.526 0.3283 0.0133
FullGN-MulMlp 17.31 0.79 52.74 1.68 70.69 2.67 11.328 1.042 0.3389 0.0072
GGNN 17.45 1.03 57.80 1.59 76.72 2.14 10.011 1.730 0.3555 0.0098
GGNN-NoAct 16.51 1.51 50.63 3.06 69.49 2.48 10.611 1.031 0.3262 0.0163
GGNN-Mul 16.03 1.21 50.61 1.80 69.17 2.34 11.253 0.514 0.3226 0.0156
GGNN-MulMlp 17.31 1.55 53.07 2.75 70.65 1.08 11.500 0.705 0.3370 0.0177
GAT 16.48 0.91 53.47 1.60 71.89 1.70 21.869 12.869 0.3338 0.0070
GAT-NoAct 15.15 1.34 49.65 3.17 68.24 2.86 11.026 1.152 0.3161 0.0153
GAT-Mul 14.85 0.90 48.34 2.27 67.67 2.24 11.681 0.363 0.3070 0.0097
GAT-MulMlp 16.45 1.18 51.23 2.81 68.86 2.24 11.701 0.526 0.3292 0.0169
Table 9: Comparison results on dataset group SINE-SZ32-STP16-NDRP-STD0.5
Model Hits@1 (%) Hits@5 (%) Hits@10 (%) MR MRR
RW-Stationary 50.06 3.23 84.70 5.23 91.99 3.76 5.997 2.266 0.6625 0.0403
RW-Dynamic 45.94 6.63 85.81 5.03 93.05 2.62 6.036 2.205 0.6320 0.0544
FullGN 25.67 8.66 69.33 10.43 80.63 7.18 18.589 9.211 0.4393 0.0884
FullGN-NoAct 46.47 3.86 84.30 3.88 92.22 2.64 5.797 2.316 0.6337 0.0346
FullGN-Mul 43.44 6.02 82.88 5.66 89.49 4.43 7.395 2.546 0.6029 0.0511
FullGN-MulMlp 50.93 3.21 85.46 4.93 91.03 3.72 7.649 3.921 0.6598 0.0362
GGNN 29.20 7.85 70.59 8.83 79.83 5.75 16.471 6.033 0.4699 0.0742
GGNN-NoAct 49.50 3.91 87.84 4.27 93.48 3.09 5.850 2.498 0.6621 0.0383
GGNN-Mul 45.22 4.67 83.48 5.20 90.46 3.57 8.098 3.410 0.6185 0.0437
GGNN-MulMlp 50.28 3.36 86.41 4.16 91.55 3.27 6.634 2.951 0.6637 0.0342
GAT 18.18 4.96 59.84 6.92 73.31 5.37 21.609 3.249 0.3583 0.0579
GAT-NoAct 46.10 4.34 85.67 4.78 92.54 3.61 5.332 1.829 0.6356 0.0400
GAT-Mul 45.83 2.74 82.99 3.29 89.72 2.19 7.059 1.654 0.6208 0.0232
GAT-MulMlp 47.52 7.39 85.68 4.04 92.97 2.80 6.276 2.278 0.6371 0.0597
Table 10: Comparison results on dataset group LOCATION-SZ32-STP16-NDRP-STD0.2
Model Hits@1 (%) Hits@5 (%) Hits@10 (%) MR MRR
RW-Stationary 19.40 1.03 64.37 1.68 80.30 1.63 9.569 1.020 0.3860 0.0113
RW-Dynamic 17.91 0.75 62.47 1.60 80.55 1.87 9.379 1.392 0.3722 0.0101
FullGN 16.09 1.51 58.99 4.85 78.30 4.49 16.608 6.832 0.3476 0.0256
FullGN-NoAct 18.83 1.17 63.27 1.71 82.63 1.61 8.140 0.986 0.3816 0.0111
FullGN-Mul 18.50 1.10 62.79 1.96 81.34 1.55 8.928 1.070 0.3787 0.0099
FullGN-MulMlp 19.64 1.02 66.33 1.75 83.87 1.57 8.235 1.025 0.3991 0.0100
GGNN 17.11 1.07 63.48 2.29 81.83 1.95 14.964 3.551 0.3689 0.0100
GGNN-NoAct 19.66 0.81 64.74 1.20 82.97 0.98 8.118 0.762 0.3912 0.0086
GGNN-Mul 17.84 1.11 62.36 1.50 81.60 2.31 9.220 1.276 0.3723 0.0114
GGNN-MulMlp 19.39 0.60 65.81 1.65 83.57 1.40 8.375 0.746 0.3911 0.0061
GAT 14.51 1.17 55.19 2.14 72.87 2.42 28.074 4.756 0.3227 0.0110
GAT-NoAct 17.82 0.96 61.62 1.47 80.28 2.40 9.147 1.037 0.3702 0.0082
GAT-Mul 18.27 0.73 62.42 2.34 80.82 1.26 9.260 1.066 0.3749 0.0093
GAT-MulMlp 18.93 1.18 64.53 1.90 83.06 1.92 8.283 0.988 0.3843 0.0114
Table 11: Comparison results on dataset group LOCATION-SZ32-STP16-NDRP-STD0.5
Model Hits@1 (%) Hits@5 (%) Hits@10 (%) MR MRR
RW-Stationary 16.44 3.08 47.22 3.67 59.40 4.04 34.573 4.230 0.3215 0.0319
RW-Dynamic 20.41 2.07 55.07 2.87 68.41 2.77 24.968 3.194 0.3656 0.0206
FullGN 16.61 4.79 48.68 12.52 59.51 14.14 63.999 22.762 0.3095 0.0765
FullGN-NoAct 20.80 2.37 57.16 3.94 70.07 4.00 21.744 3.576 0.3729 0.0243
FullGN-Mul 21.35 3.31 52.83 3.56 64.58 3.05 30.538 3.866 0.3618 0.0286
FullGN-MulMlp 23.94 1.75 54.98 3.86 65.76 3.01 30.251 5.933 0.3850 0.0230
GGNN 22.56 2.43 53.52 3.10 62.88 2.97 70.509 11.580 0.3677 0.0179
GGNN-NoAct 21.69 2.26 57.50 4.18 69.54 3.70 24.676 2.777 0.3818 0.0186
GGNN-Mul 23.81 2.21 54.66 2.18 65.62 2.62 29.538 3.694 0.3824 0.0175
GGNN-MulMlp 26.06 1.51 56.04 2.94 66.18 3.54 32.089 5.656 0.4001 0.0173
GAT 12.11 1.71 36.94 3.05 47.97 4.17 94.705 8.713 0.2333 0.0184
GAT-NoAct 23.17 2.02 56.06 3.09 68.00 2.89 27.490 3.953 0.3818 0.0186
GAT-Mul 22.70 2.58 56.12 3.22 67.75 3.05 30.113 3.024 0.3762 0.0188
GAT-MulMlp 20.71 2.98 54.86 2.76 66.01 3.23 29.533 5.556 0.3655 0.0225
Table 12: Comparison results on dataset group HISTORY-SZ32-STP16-NDRP-STD0.2
Model Hits@1 (%) Hits@5 (%) Hits@10 (%) MR MRR
RW-Stationary 11.72 2.02 41.07 3.66 54.65 4.07 34.876 1.670 0.2547 0.0231
RW-Dynamic 12.25 1.39 46.04 3.81 64.03 3.78 20.600 2.735 0.2820 0.0191
FullGN 13.79 1.65 51.04 3.10 69.30 2.87 31.874 4.526 0.3051 0.0207
FullGN-NoAct 13.90 1.29 49.87 2.94 67.39 2.87 15.716 1.772 0.3059 0.0161
FullGN-Mul 12.93 0.90 45.60 2.75 61.88 2.12 21.881 2.319 0.2807 0.0120
FullGN-MulMlp 14.89 0.96 51.20 2.36 68.48 2.88 17.917 2.680 0.3145 0.0103
GGNN 14.83 1.27 53.28 3.22 70.76 2.63 33.821 7.460 0.3217 0.0153
GGNN-NoAct 13.84 1.21 48.51 2.39 65.86 2.34 20.799 1.546 0.2957 0.0126
GGNN-Mul 14.16 1.06 47.27 3.53 63.62 3.58 23.457 2.224 0.2971 0.0168
GGNN-MulMlp 14.80 1.26 49.47 2.74 66.35 3.00 22.230 2.410 0.3053 0.0142
GAT 10.80 1.14 43.44 2.81 60.12 3.54 73.118 7.201 0.2538 0.0105
GAT-NoAct 12.60 1.48 45.84 3.56 62.84 2.95 21.903 1.581 0.2829 0.0179
GAT-Mul 13.49 1.28 46.23 3.15 62.68 2.85 23.887 2.012 0.2885 0.0149
GAT-MulMlp 13.75 1.19 47.93 2.72 64.23 2.23 22.535 1.538 0.2933 0.0132
Table 13: Comparison results on dataset group HISTORY-SZ32-STP16-NDRP-STD0.5
Model Hits@1 (%) Hits@5 (%) Hits@10 (%) MR MRR
LINE FullGN 17.59 1.17 65.27 2.71 85.47 1.96 8.239 1.769 0.3780 0.0141
-NoAct 19.39 1.14 64.42 2.06 83.84 1.67 7.189 0.458 0.3887 0.0132
-Mul 20.20 1.28 65.51 2.77 84.56 2.30 6.260 0.499 0.3985 0.0145
-MulMlp 20.26 1.26 66.35 2.87 84.88 2.40 6.185 0.501 0.3999 0.0161
SINE FullGN 63.08 2.03 90.32 2.38 93.16 2.21 18.052 5.502 0.7464 0.0187
-NoAct 52.87 2.48 72.77 4.75 81.82 3.51 10.496 2.357 0.6193 0.0247
-Mul 58.74 2.14 75.56 4.72 83.33 3.38 7.518 1.472 0.6681 0.0281
-MulMlp 56.94 1.79 76.97 2.38 83.37 2.73 7.710 1.647 0.6606 0.0162
LOCA FullGN 22.74 5.22 68.86 7.39 81.88 4.49 20.447 6.473 0.4213 0.0554
-NoAct 41.18 2.98 82.45 1.74 91.31 1.61 5.390 0.849 0.5881 0.0249
-Mul 37.69 4.07 77.94 4.28 87.52 2.72 8.093 1.841 0.5501 0.0401
-MulMlp 43.39 3.26 84.32 3.50 90.11 3.15 6.249 1.747 0.6081 0.0297
HIST FullGN 18.62 2.96 61.12 3.74 74.94 3.95 32.250 12.497 0.3648 0.0264
-NoAct 27.96 2.04 65.73 3.95 76.76 3.44 13.564 3.033 0.4494 0.0200
-Mul 28.40 3.76 61.72 5.35 72.94 3.07 19.180 3.192 0.4332 0.0338
-MulMlp 29.13 2.91 62.26 3.71 72.31 3.65 18.469 2.971 0.4416 0.0297
Table 14: Comparison results on dataset groups {LINE,SINE,LOCATION,HISTORY}-SZ32-STP16-EDRP-STD0.2
Model Hits@1 (%) Hits@5 (%) Hits@10 (%) MR MRR
LINE FullGN 16.87 1.85 63.11 2.29 82.59 1.53 9.296 1.374 0.3669 0.0177
-NoAct 18.75 1.67 62.33 2.36 81.39 2.00 8.564 0.822 0.3764 0.0189
-Mul 19.42 1.79 63.04 2.25 81.63 1.75 7.811 0.684 0.3857 0.0193
-MulMlp 19.63 1.85 63.03 2.29 81.56 1.59 8.029 0.526 0.3867 0.0198
SINE FullGN 19.08 1.11 62.90 1.61 80.16 1.61 18.303 9.662 0.3759 0.0117
-NoAct 20.41 1.61 56.18 2.94 73.19 3.01 11.914 1.874 0.3656 0.0160
-Mul 21.53 1.42 59.52 2.30 75.77 1.73 11.893 1.124 0.3829 0.0132
-MulMlp 22.12 1.21 60.09 2.66 75.74 1.96 12.262 1.532 0.3884 0.0134
LOCA FullGN 15.95 1.08 56.61 2.39 76.61 1.77 15.732 4.803 0.3392 0.0100
-NoAct 18.55 1.54 59.81 2.67 78.41 2.44 9.751 1.260 0.3703 0.0183
-Mul 19.32 1.14 60.34 1.82 79.69 1.92 10.382 1.586 0.3759 0.0136
-MulMlp 19.36 1.21 60.92 2.14 80.31 1.84 9.352 1.572 0.3783 0.0132
HIST FullGN 14.00 1.52 51.60 2.51 70.76 3.17 25.511 6.818 0.3091 0.0145
-NoAct 16.61 1.49 54.18 2.26 70.87 2.95 13.968 1.772 0.3343 0.0149
-Mul 16.97 0.94 50.46 2.38 67.69 1.77 16.933 2.255 0.3252 0.0123
-MulMlp 16.27 1.31 51.63 3.22 68.57 3.77 16.647 2.403 0.3258 0.0158
Table 15: Comparison results on dataset groups {LINE,SINE,LOCATION,HISTORY}-SZ32-STP16-EDRP-STD0.5
Model Hits@1 (%) Hits@5 (%) Hits@10 (%) MR MRR
LINE GGNN 12.36 48.34 71.44 15.348 0.2918
GGNN-MulMlp 15.56 55.01 75.34 9.050 0.3335
SINE GGNN 27.08 53.67 76.19 73.990 0.4047
GGNN-MulMlp 39.98 69.10 78.52 12.078 0.5317
LOCATION GGNN 20.98 74.25 92.26 19.753 0.4323
GGNN-MulMlp 47.56 91.17 97.20 3.471 0.6534
HISTORY GGNN 11.47 46.08 71.66 67.990 0.2785
GGNN-MulMlp 23.49 58.55 71.43 36.734 0.3888
Table 16: Comparison results on dataset groups {LINE,SINE,LOCATION,HISTORY}-SZ64-STP32-NDRP-STD0.2
Model Hits@1 (%) Hits@5 (%) Hits@10 (%) MR MRR
LINE GGNN 11.04 46.54 71.39 15.953 0.2759
GGNN-MulMlp 14.53 52.56 72.32 11.489 0.3179
SINE GGNN 10.53 43.77 65.82 36.048 0.2616
GGNN-MulMlp 16.08 49.77 66.53 20.024 0.3137
LOCATION GGNN 9.74 37.14 58.94 42.255 0.2347
GGNN-MulMlp 18.39 62.45 82.35 7.931 0.3782
HISTORY GGNN 8.67 35.42 55.62 65.156 0.2222
GGNN-MulMlp 13.60 47.06 64.77 33.193 0.2902
Table 17: Comparison results on dataset groups {LINE,SINE,LOCATION,HISTORY}-SZ64-STP32-NDRP-STD0.5