1. Introduction
A graph describes a set of objects and their pairwise relations. Many kinds of real-life data, such as social networks, transportation networks and e-commerce user-item graphs, can naturally be represented as graphs. Recent years have witnessed increasing efforts to generalize neural network models to graphs; such models are known as graph neural networks (Gori et al., [n. d.]; Scarselli et al., 2009). Graph neural networks have been applied to reason about the dynamics of physical systems (Battaglia et al., 2016; Chang et al., 2016; Sanchez-Gonzalez et al., 2018). Graph convolutional neural networks, which extend convolutional neural networks to graph-structured data, have been shown to improve the performance of graph classification (Defferrard et al., 2016; Bruna et al., 2013) and node-level semi-supervised classification (Kipf and Welling, 2016; Hamilton et al., 2017). A general framework for graph neural networks is proposed in (Battaglia et al., 2018).
Most of the aforementioned models are designed for static graphs. However, graphs in many real-world applications are inherently dynamic. For example, new users join a social network and existing users create new relations, users of an e-commerce platform keep interacting with new items, and new connections are established in a communication network over time. To apply existing graph neural network models to dynamic graphs, we have to ignore their evolving structure entirely and treat them as static graphs. Yet dynamic information has been shown to benefit a variety of graph analytic tasks, such as community detection (Lin et al., 2008), link prediction (Goyal et al., 2018; Li et al., 2018) and network embedding (Goyal et al., 2018; Li et al., 2018). Considering the dynamic nature of graphs therefore has great potential to advance graph neural networks, and it calls for dedicated efforts.
Meanwhile, designing graph neural networks for dynamic graphs faces tremendous challenges. From the global perspective, the structure of a dynamic graph keeps evolving as new nodes and edges are constantly introduced, and graph neural networks need to capture this evolving structure. From the local perspective, a node can keep establishing new edges, and the order in which these edges are established is important for understanding the node's properties. For example, in an e-commerce user-item graph, new interactions are more likely to reflect a user's latest preferences. Moreover, the introduction of a new edge (interaction) can change the properties of the node, so the node information must be updated whenever a new interaction happens. In addition, edges are introduced unevenly, i.e., their distribution over the timeline is uneven. For example, a user in a social network could create edges very frequently in certain periods while establishing only a few edges in others, so the time intervals between interactions of a specific node can vary dramatically. These time intervals matter for two reasons. First, the time interval between interactions of a specific node can affect how we update the node information. For example, if a new interaction is distant in time from the node's previous interaction, we should focus more on the new interaction, since the node's properties may have changed. Second, a new interaction not only affects the two nodes directly involved in it but can also influence other nodes that are “close” to them, and the time interval can affect how we propagate the interaction information to these influenced nodes. For example, if the new interaction is distant in time from the latest interaction between the node and an influenced node, the new interaction may have little effect on the influenced node.
In this paper, we embrace the opportunities and challenges of studying graph neural networks for dynamic graphs. In essence, we aim to answer the following questions: 1) how to keep node information up to date as new interactions happen; 2) how to propagate the interaction information to the influenced nodes; and 3) how to incorporate the time intervals between interactions during update and propagation. We propose a dynamic graph neural network (DGNN) that answers these three questions simultaneously. Our contributions can be summarized as follows:

We provide a principled approach for the node information update and propagation when new edges are introduced;

We propose a novel graph neural network for dynamic graphs (DGNN), which models the establishing order and the time intervals of edges in a coherent framework; and

We demonstrate the effectiveness of the proposed model on several graph-related tasks using various real-world dynamic graphs.
The rest of this paper is organized as follows. In Section 2, we introduce the proposed framework, with details about its update and propagation components and the approaches to learn the model parameters. In Section 3, we present experimental results on two graph mining tasks, link prediction and node classification. We review related work in Section 4 and conclude with future work in Section 5.
2. The proposed framework
In this section, we introduce the graph neural network framework designed for dynamic networks. We first introduce the notations and definitions used in this work, then give an overview of the model, and finally describe the components of the framework in detail.
A dynamic graph consists of a set of nodes $\mathcal{V} = \{v_1, \ldots, v_N\}$, and we assume that $N$ nodes have been introduced up to our latest observation of the graph. The graph evolves as new edges and nodes emerge. An example of a dynamic graph is shown in the left side of Figure 1, where interactions (edges) between the nodes emerge over time. Note that, in this work, we only consider the emergence of new edges and nodes, leaving the deletion of existing edges and nodes as future work. A directed edge can be represented as $(v_s, v_g, t)$, describing an interaction from $v_s$ to $v_g$ at time $t$; each interaction in Figure 1 can be denoted in this way. For convenience, we call the two nodes involved in an interaction the “interacting nodes”. As mentioned before, a new interaction not only affects the two interacting nodes but can also influence other nodes that are “close” to them, which we call the “influenced nodes”. Thus, we need to update the two interacting nodes with the information of the new interaction and also propagate this information to the “influenced nodes”.
To achieve this goal, we introduce a dynamic graph neural network (DGNN); an overview of the DGNN framework is shown in Figure 1. It consists of two major components: 1) the update component and 2) the propagation component. We briefly describe the operations of the two components when a new interaction $(v_s, v_g, t)$ is introduced. The update component involves nodes $v_s$ and $v_g$ and updates both of them with the interaction information. For example, for the new interaction in Figure 1, only its two interacting nodes are involved in the update component. The propagation component involves the two interacting nodes and the “influenced nodes”, as it propagates the information of the interaction to the “influenced nodes”. The “influenced nodes” can be defined in different ways, which we discuss in later subsections. In Figure 1, we define the “influenced nodes” as all the nodes that have interacted with the two “interacting nodes”, i.e., the 1-hop “neighbors” of $v_s$ and the 1-hop “neighbors” of $v_g$. Next, we detail each component.
2.1. The update component
In this subsection, we discuss the update component for the interacting nodes. We first give an overview of the update component, focusing on a single node of the dynamic graph illustrated in the left of Figure 1 that is involved in three interactions. Interactions between nodes naturally affect the properties of the nodes. For example, as suggested by homophily, users with similar interests are likely to create connections in social networks (McPherson et al., 2001). Thus, the update component should update the two interacting nodes with the interaction information. As shown in Figure 2, there are three update components, which process the three interactions involving this node. Each update component takes an interaction as input and updates the node with the interaction information. Note that Figure 2 only shows the update components for this node; for each interaction, another update component updates the other interacting node. Furthermore, the order of the interactions is also important for understanding the nodes' properties. For example, in an e-commerce user-item graph, a user's latest preference is better captured by recent interactions than by old ones. It is therefore natural to view the interactions involving the same node as a “sequence” and to apply the update component recurrently to them. Although the interactions can be viewed as a “sequence”, we do not need to store the whole “sequence”; we only store the most recent information of each node. As shown in Figure 2, the three update components are chained: each component takes the output of the previous one as input. Hence, we model the update component on the long short-term memory (LSTM) unit (Hochreiter and Schmidhuber, 1997). As discussed before, the time interval information is also important, so we also incorporate it into the update component. As shown in Figure 2, a single update component consists of three units: the interact unit, the update unit and the merge unit. Next, we describe these three units in detail.

Before proceeding to the details of the units, we first introduce the information stored for each node $v$. In a directed graph, a node can play the roles of both source node and target node. Thus, we keep two separate sets of cell memories and hidden states for the two roles of each node. The cell memory and hidden state for the source role of node $v$ right before time $t$ are denoted as $\mathbf{C}^s_v(t^-)$ and $\mathbf{h}^s_v(t^-)$, respectively, while those for the target role are denoted as $\mathbf{C}^g_v(t^-)$ and $\mathbf{h}^g_v(t^-)$. Here, $t^-$ denotes a time that is infinitely close to, but prior to, $t$, such that all the interactions before time $t$ have been processed. For example, in Figure 2, the memory of the node right before one of its interactions is in fact equal to its memory right after its previous interaction (here we ignore the propagation component, for the purpose of illustrating the update component). The source and target hidden states are combined by the merge unit to generate the general features $\mathbf{u}_v(t^-)$ of node $v$, which describe its general properties. The cell memories $\mathbf{C}^s_v(t^-)$ and $\mathbf{C}^g_v(t^-)$, the hidden states $\mathbf{h}^s_v(t^-)$ and $\mathbf{h}^g_v(t^-)$, and the general features $\mathbf{u}_v(t^-)$ are the information stored for each node, which must be updated when a new interaction happens. For example, Figure 3 shows the operations of the two update components that update nodes $v_s$ and $v_g$ when an interaction $(v_s, v_g, t)$ happens. The information stored for the two nodes right before time $t$ is shown in Figure 3 (a).
2.1.1. The interact unit
The interact unit generates the interaction information for $(v_s, v_g, t)$ from the node information. The generated interaction information is later used as the input of the update unit. We model the interact unit with a feed-forward neural network as follows:

$$\mathbf{e}(t) = \mathrm{act}\left(\mathbf{W}_1 \mathbf{u}_{v_s}(t^-) + \mathbf{W}_2 \mathbf{u}_{v_g}(t^-) + \mathbf{b}_e\right) \tag{1}$$

where $\mathbf{u}_{v_s}(t^-)$ and $\mathbf{u}_{v_g}(t^-)$ are the general features of nodes $v_s$ and $v_g$ right before time $t$; $\mathbf{W}_1$, $\mathbf{W}_2$ and $\mathbf{b}_e$ are the parameters of the neural network; and $\mathrm{act}(\cdot)$ is an activation function such as sigmoid or tanh. The output $\mathbf{e}(t)$ contains the information of the interaction $(v_s, v_g, t)$. As an example, Figure 3 (b) shows how the interact unit works for an interaction.
2.1.2. The update unit
As mentioned before, the interactions involving the same node can be viewed as a “sequence”, and the information of this node gradually evolves as these interactions happen one after another. Thus, to capture this interaction information, we apply the update component recurrently. The update unit is the part that updates the interacting nodes with the interaction information generated by the interact unit. Recall that interactions do not emerge evenly in time, so the time interval between interactions involving the same node can vary dramatically. The time interval determines how much old information should be forgotten. Intuitively, interactions that happened in the distant past should have less influence on the node's current information, so they should be “heavily” forgotten, whereas recent interactions should matter more. It is therefore desirable to incorporate the time interval into the update component. To build the update unit, we modify the LSTM unit, similarly to (Baytas et al., 2017), so that the time interval controls the “magnitude” of forgetting.
An update unit is shown in Figure 4. Its input includes the most recent cell memory $\mathbf{C}_v(t^-)$, the hidden state $\mathbf{h}_v(t^-)$, the time interval $\Delta_t$ since the node's previous interaction, and the interaction information $\mathbf{e}(t)$ calculated by the interact unit. Its outputs are the updated cell memory $\mathbf{C}_v(t)$ and hidden state $\mathbf{h}_v(t)$. For illustration purposes, Figure 4 does not distinguish the source and target cell memories and hidden states. In practice, we have two types of update units, the SUpdate unit and the GUpdate unit, which share the same structure but have different parameters. For an interaction $(v_s, v_g, t)$, we use the SUpdate unit to update the information of the source node $v_s$ and the GUpdate unit to update the information of the target node $v_g$. The update unit is based on an LSTM unit; the only difference from a standard LSTM unit lies in the blue dashed box in Figure 4. The corresponding formulations for this part are as follows:
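$$\mathbf{C}^I_v(t^-) = \tanh\left(\mathbf{W}_d \mathbf{C}_v(t^-) + \mathbf{b}_d\right) \tag{2}$$

$$\hat{\mathbf{C}}^I_v(t^-) = \mathbf{C}^I_v(t^-) * g(\Delta_t) \tag{3}$$

$$\mathbf{C}^T_v(t^-) = \mathbf{C}_v(t^-) - \mathbf{C}^I_v(t^-) \tag{4}$$

$$\mathbf{C}^*_v(t^-) = \mathbf{C}^T_v(t^-) + \hat{\mathbf{C}}^I_v(t^-) \tag{5}$$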
In this part, the old cell memory $\mathbf{C}_v(t^-)$ is adjusted according to the time interval to generate the adjusted old cell memory $\mathbf{C}^*_v(t^-)$. It is first decomposed into two components, the short-term memory $\mathbf{C}^I_v(t^-)$ and the long-term memory $\mathbf{C}^T_v(t^-)$, where $\mathbf{C}^I_v(t^-)$ is generated by a neural network and the long-term memory is $\mathbf{C}^T_v(t^-) = \mathbf{C}_v(t^-) - \mathbf{C}^I_v(t^-)$. The long-term memory is kept untouched, while the short-term memory is discounted (forgotten) according to the time interval $\Delta_t$ between the events with a discount function $g(\cdot)$. The discount function is decreasing: the larger the time interval, the less of the short-term memory is kept. This is how our model forgets old information. The discounted short-term memory $\hat{\mathbf{C}}^I_v(t^-)$ and the long-term memory $\mathbf{C}^T_v(t^-)$ are then combined into the adjusted old cell memory $\mathbf{C}^*_v(t^-)$, which is the output of the dashed box and is fed into the standard LSTM unit (the rest of the update unit). The decomposition and recombination ensure that the information in the old cell memory is never entirely lost during this procedure. The rest of the update unit is the same as a standard LSTM unit:
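$$\mathbf{f}_t = \sigma\left(\mathbf{W}_f \mathbf{e}(t) + \mathbf{U}_f \mathbf{h}_v(t^-) + \mathbf{b}_f\right) \tag{6}$$

$$\mathbf{i}_t = \sigma\left(\mathbf{W}_i \mathbf{e}(t) + \mathbf{U}_i \mathbf{h}_v(t^-) + \mathbf{b}_i\right) \tag{7}$$

$$\mathbf{o}_t = \sigma\left(\mathbf{W}_o \mathbf{e}(t) + \mathbf{U}_o \mathbf{h}_v(t^-) + \mathbf{b}_o\right) \tag{8}$$

$$\tilde{\mathbf{C}}_v(t) = \tanh\left(\mathbf{W}_c \mathbf{e}(t) + \mathbf{U}_c \mathbf{h}_v(t^-) + \mathbf{b}_c\right) \tag{9}$$

$$\mathbf{C}_v(t) = \mathbf{f}_t * \mathbf{C}^*_v(t^-) + \mathbf{i}_t * \tilde{\mathbf{C}}_v(t) \tag{10}$$

$$\mathbf{h}_v(t) = \mathbf{o}_t * \tanh\left(\mathbf{C}_v(t)\right) \tag{11}$$

where $\sigma(\cdot)$ is the sigmoid function and $*$ denotes element-wise multiplication.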
For convenience, we summarize the procedure of the update unit in Figure 4 (eq. (2) to eq. (11)) as
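$$\mathbf{C}_v(t), \mathbf{h}_v(t) = \mathrm{update}\left(\mathbf{C}_v(t^-), \mathbf{h}_v(t^-), \Delta_t, \mathbf{e}(t)\right) \tag{12}$$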
Examples of the operations of the update units are shown in Figure 3 (c), where, for the interaction $(v_s, v_g, t)$, we use the SUpdate unit to update the information of the source node $v_s$ and the GUpdate unit to update the information of the target node $v_g$. Note that the SUpdate unit only updates the source information (cell memory and hidden state) of the source node and keeps its target information untouched. Similarly, the GUpdate unit only updates the target information of the target node and keeps its source information untouched. In Figure 2, for convenience of illustration, we slightly “abuse” the outputs of the SUpdate and GUpdate units by treating the untouched target information of the source node as part of the output of the SUpdate unit, and the untouched source information of the target node as part of the output of the GUpdate unit.
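To make eqs. (1) to (11) concrete, here is a minimal pure-Python sketch of one interact-unit call followed by one update-unit step. It is an illustration under stated assumptions, not the authors' implementation: $\mathrm{act}(\cdot)$ is taken to be tanh, the discount function $g(\Delta) = 1/\log(e + \Delta)$ is an assumed choice (picked only because it is decreasing with $g(0) = 1$), and the helper names (`interact`, `update`, `random_params`) and the random initialization are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(vals) for vals in zip(*vectors)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def decay(delta):
    """Assumed discount g(delta) = 1 / log(e + delta): decreasing, with g(0) = 1."""
    return 1.0 / math.log(math.e + delta)


def interact(u_s, u_g, p):
    """Eq. (1) with act = tanh: e(t) = tanh(W1 u_s + W2 u_g + b_e)."""
    pre = add(matvec(p["W1"], u_s), matvec(p["W2"], u_g), p["b_e"])
    return [math.tanh(z) for z in pre]


def update(C_prev, h_prev, delta, e, p):
    """Eqs. (2)-(11): one time-aware LSTM step, returning (C(t), h(t))."""
    # Eq. (2): short-term part of the old cell memory.
    C_short = [math.tanh(z) for z in add(matvec(p["W_d"], C_prev), p["b_d"])]
    # Eq. (3): discount the short-term part by the elapsed time.
    C_short_hat = [c * decay(delta) for c in C_short]
    # Eq. (4): the long-term part is the remainder and is kept untouched.
    C_long = [c - s for c, s in zip(C_prev, C_short)]
    # Eq. (5): adjusted old cell memory C*.
    C_star = add(C_long, C_short_hat)

    def gate(name, act):
        pre = add(matvec(p["W_" + name], e), matvec(p["U_" + name], h_prev), p["b_" + name])
        return [act(z) for z in pre]

    f = gate("f", sigmoid)          # Eq. (6) forget gate
    i = gate("i", sigmoid)          # Eq. (7) input gate
    o = gate("o", sigmoid)          # Eq. (8) output gate
    C_tilde = gate("c", math.tanh)  # Eq. (9) candidate memory
    C_new = [fk * ck + ik * tk for fk, ck, ik, tk in zip(f, C_star, i, C_tilde)]  # Eq. (10)
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, C_new)]                       # Eq. (11)
    return C_new, h_new


def random_params(d, seed):
    """Hypothetical random initialisation of one parameter set (no training here)."""
    rng = random.Random(seed)

    def mat():
        return [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(d)]

    p = {"W1": mat(), "W2": mat(), "b_e": [0.0] * d, "W_d": mat(), "b_d": [0.0] * d}
    for g in "fioc":
        p["W_" + g], p["U_" + g], p["b_" + g] = mat(), mat(), [0.0] * d
    return p


# Usage: one interaction (v_s, v_g, t) with separate SUpdate / GUpdate parameters.
d = 4
shared, s_upd, g_upd = random_params(d, 0), random_params(d, 1), random_params(d, 2)
u_s, u_g = [0.1] * d, [-0.2] * d                        # general features at t^-
e = interact(u_s, u_g, shared)
C_s, h_s = update([0.0] * d, [0.0] * d, 3.0, e, s_upd)  # source role of v_s
C_g, h_g = update([0.0] * d, [0.0] * d, 0.5, e, g_upd)  # target role of v_g
```

With $\Delta_t = 0$ the adjusted memory $\mathbf{C}^*$ equals the old memory, so the step reduces to a standard LSTM step; as $\Delta_t$ grows, only the short-term part is shrunk, which is what the decomposition is for.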
2.1.3. The merge unit
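The merge unit combines the source hidden state and the target hidden state of a given node to generate the general features of this node. As mentioned in the last subsection, given an interaction $(v_s, v_g, t)$, the SUpdate unit only updates the source information of the source node $v_s$, and the GUpdate unit only updates the target information of the target node $v_g$. Hence, for node $v_s$, we have $\mathbf{h}^s_{v_s}(t)$ and $\mathbf{h}^g_{v_s}(t^-)$ as the output of the SUpdate unit. The merge unit takes these two hidden states as input and generates the new general features of node $v_s$ as follows:

$$\mathbf{u}_{v_s}(t) = \mathbf{W}^s \mathbf{h}^s_{v_s}(t) + \mathbf{W}^g \mathbf{h}^g_{v_s}(t^-) + \mathbf{b}_u \tag{13}$$

where $\mathbf{W}^s$, $\mathbf{W}^g$ and $\mathbf{b}_u$ are the parameters of the merge unit. Similarly, the merge unit generates the new general features of node $v_g$ as follows:

$$\mathbf{u}_{v_g}(t) = \mathbf{W}^s \mathbf{h}^s_{v_g}(t^-) + \mathbf{W}^g \mathbf{h}^g_{v_g}(t) + \mathbf{b}_u \tag{14}$$

The two merge units that generate the new general features of nodes $v_s$ and $v_g$ after the interaction are shown in Figure 3 (d).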
Finally, the output of the update component is the updated information of the interacting nodes. For the source node $v_s$ of the interaction $(v_s, v_g, t)$, the updated information includes $\mathbf{C}^s_{v_s}(t)$, $\mathbf{h}^s_{v_s}(t)$ and $\mathbf{u}_{v_s}(t)$. For the target node $v_g$, it includes $\mathbf{C}^g_{v_g}(t)$, $\mathbf{h}^g_{v_g}(t)$ and $\mathbf{u}_{v_g}(t)$. The operations of the two update components for the interaction are shown in Figure 3.
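The bookkeeping in this subsection, namely which role slots each update unit writes and which hidden states each merge unit reads, can be sketched as below. It is a sketch under assumptions: the update-unit outputs are taken as already computed (for example by a time-aware LSTM step like eqs. (2) to (11)), `NodeState`, `finish_update` and the field names are hypothetical, and the merge unit is the plain linear map of eqs. (13) and (14).

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Per-node information: source-role and target-role memories, plus general features."""
    C_src: list
    h_src: list
    C_tgt: list
    h_tgt: list
    u: list


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def merge(h_src, h_tgt, p):
    """Eqs. (13)/(14): u = W^s h^s + W^g h^g + b_u (no nonlinearity)."""
    return [a + b + c for a, b, c in
            zip(matvec(p["W_s"], h_src), matvec(p["W_g"], h_tgt), p["b_u"])]


def finish_update(src, tgt, new_src, new_tgt, p):
    """Write SUpdate/GUpdate outputs into the correct role slots, then re-merge.

    new_src: (C, h) produced by SUpdate for the source node v_s.
    new_tgt: (C, h) produced by GUpdate for the target node v_g.
    """
    src.C_src, src.h_src = new_src          # only the source role of v_s changes
    tgt.C_tgt, tgt.h_tgt = new_tgt          # only the target role of v_g changes
    src.u = merge(src.h_src, src.h_tgt, p)  # Eq. (13): h^g of v_s keeps its t^- value
    tgt.u = merge(tgt.h_src, tgt.h_tgt, p)  # Eq. (14): h^s of v_g keeps its t^- value


# Usage with identity merge weights, so that u is simply h^s + h^g.
identity = [[1.0, 0.0], [0.0, 1.0]]
params = {"W_s": identity, "W_g": identity, "b_u": [0.0, 0.0]}
vs = NodeState([0.0, 0.0], [0.0, 0.0], [0.2, 0.2], [0.1, 0.3], [0.0, 0.0])
vg = NodeState([0.5, 0.5], [0.4, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
finish_update(vs, vg, ([1.0, 1.0], [0.5, 0.5]), ([2.0, 2.0], [0.25, 0.25]), params)
```

Keeping the two roles in separate slots is what lets a single set of general features $\mathbf{u}_v$ reflect both how a node acts as a source and how it is targeted.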
2.2. The propagation component
In the previous subsection, we introduced the component that updates the two interacting nodes when a new interaction happens. The update component only considers the two nodes directly affected by the new interaction. However, a newly emerging interaction changes the existing local structure of the graph and can therefore influence other nodes. In this work, we choose the current neighbors of the two “interacting nodes” as the “influenced nodes”, for three reasons. First, as observed in mining streaming graphs, the impact of a new edge on the whole graph is often local (Chang et al., 2017). Second, after we propagate information to the neighbors, the information will be propagated further once the influenced nodes interact with other nodes. Third, we empirically found that propagating over more hops does not significantly improve performance, and can even degrade it, since propagation may also introduce noise. To update the influenced nodes, the interaction information should be propagated to their cell memories. As the interaction does not directly involve the influenced nodes, we assume that it does not disturb their history but only brings new information. Thus, unlike in the update component, we do not decay the history information (cell memory) but only add new information to it incrementally. Similar to the intuition that older interactions should have less impact on recent node information, an interaction should have less impact on influenced nodes whose last interaction with the interacting node is older. Thus, it is also desirable to consider the time intervals of interactions in the propagation component. In addition, the influence can vary with tie strength (e.g., strong and weak ties are mixed together) (Xiang et al., 2010): nodes are more likely to influence others through strong ties than through weak ties. Therefore, it is important to model heterogeneous influence.
With these intuitions, next we illustrate the operations of the propagation component.
The propagation component consists of three units: the interact unit, the prop unit and the merge unit. The interact unit and the merge unit are the same as those in the update component, so we focus on the prop unit.
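Let $(v_s, v_g, t)$ be the new interaction, where $v_s$ is the source node and $v_g$ is the target node. The influenced nodes are the neighbors of these two nodes up to time $t$, denoted as $\mathcal{N}(v_s)$ and $\mathcal{N}(v_g)$. In a directed graph, we can further decompose the two sets of neighbors as $\mathcal{N}(v_s) = \mathcal{N}^s(v_s) \cup \mathcal{N}^g(v_s)$ and $\mathcal{N}(v_g) = \mathcal{N}^s(v_g) \cup \mathcal{N}^g(v_g)$, where $\mathcal{N}^s(\cdot)$ denotes the set of source neighbors and $\mathcal{N}^g(\cdot)$ denotes the set of target neighbors. Note that there are, in total, four types of prop units with the same structure but different parameters. They propagate the interaction information 1) from the source node $v_s$ to its source neighbors $\mathcal{N}^s(v_s)$; 2) from the source node $v_s$ to its target neighbors $\mathcal{N}^g(v_s)$; 3) from the target node $v_g$ to its source neighbors $\mathcal{N}^s(v_g)$; and 4) from the target node $v_g$ to its target neighbors $\mathcal{N}^g(v_g)$. We only describe the first one, from the source node to its source neighbors, as the others have the same structure. For each node $v_x \in \mathcal{N}^s(v_s)$, we propagate the interaction information as follows:

$$\mathbf{C}^s_{v_x}(t) = \mathbf{C}^s_{v_x}(t^-) + f_a\left(\mathbf{u}_{v_x}(t^-), \mathbf{u}_{v_s}(t^-)\right) \cdot g(\Delta^s_t) \cdot h(\Delta^s_t) \cdot \mathbf{W}^s_s \mathbf{e}(t) \tag{15}$$

$$\mathbf{h}^s_{v_x}(t) = \tanh\left(\mathbf{C}^s_{v_x}(t)\right) \tag{16}$$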
where $\Delta^s_t$ is the time interval between the current time $t$ and the last time node $v_x$ interacted with node $v_s$, and $g(\cdot)$ is the same decay function as in the update component. Intuitively, propagating the interaction information to “extremely old neighbors” may introduce noise. Hence, we introduce a function $h(\cdot)$ to filter out some “influenced nodes”, defined as follows:

$$h(\Delta^s_t) = \begin{cases} 1, & \Delta^s_t \le \tau \\ 0, & \text{otherwise} \end{cases}$$

where $\tau$ is a predefined threshold. This means that if the time interval is too large ($\Delta^s_t > \tau$), we stop propagating information to such neighbors. One advantage of this operation is that it makes the propagation step much more efficient. We will demonstrate more details about this filtering step in the experiment section.
$\mathbf{W}^s_s$ is a linear transformation that projects the interaction information for propagation to source neighbors; the other types of prop units have their own transformation matrices. The function $f_a(\cdot, \cdot)$ is an attention function that captures the tie strength between nodes, defined as:

$$f_a\left(\mathbf{u}_{v_x}(t^-), \mathbf{u}_{v_s}(t^-)\right) = \frac{\exp\left(\mathbf{u}_{v_x}(t^-)^\top \mathbf{u}_{v_s}(t^-)\right)}{\sum_{v \in \mathcal{N}^s(v_s)} \exp\left(\mathbf{u}_{v}(t^-)^\top \mathbf{u}_{v_s}(t^-)\right)} \tag{17}$$
Figure 5 illustrates an example of propagating the interaction information to the source neighbor of the source node when an interaction happens. The prop unit is shown in Figure 5 (c). Note that, for compactness of Figure 5, we do not include the attention mechanism in it. The interact unit is shown in Figure 5 (b), and the merge unit is shown in Figure 5 (d).
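A minimal sketch of one prop unit (the source node $v_s$ to its source neighbors), following the propagation equations and the time filter $h(\cdot)$ described above. Assumptions: vectors are plain Python lists, the discount is $g(\Delta) = 1/\log(e + \Delta)$ (any decreasing function would do), the softmax attention is normalized over all source neighbors before filtering, and names such as `propagate_to_source_neighbors` are hypothetical. The subsequent merge-unit call that refreshes each neighbor's general features is omitted.

```python
import math


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def decay(delta):
    """Assumed discount g(delta) = 1 / log(e + delta)."""
    return 1.0 / math.log(math.e + delta)


def propagate_to_source_neighbors(e, u_s, t, neighbors, W_prop, tau):
    """One prop-unit type: push e(t) from v_s into its source neighbours' source memories.

    neighbors maps node id -> {"C": source-role cell memory, "u": general features,
    "last": time of the node's last interaction with v_s}.
    Returns {node id: (C(t), h(t))} for neighbours that pass the filter h(.).
    """
    if not neighbors:
        return {}
    # Softmax attention over all source neighbours (max subtracted for stability).
    scores = {x: dot(n["u"], u_s) for x, n in neighbors.items()}
    top = max(scores.values())
    weights = {x: math.exp(s - top) for x, s in scores.items()}
    total = sum(weights.values())
    message = matvec(W_prop, e)  # W^s_s e(t)
    out = {}
    for x, n in neighbors.items():
        delta = t - n["last"]
        if delta > tau:  # filter h(delta) = 0: neighbour too old, skip it
            continue
        scale = weights[x] / total * decay(delta)
        # Old memory is not decayed; new information is only added.
        C_new = [c + scale * m for c, m in zip(n["C"], message)]
        h_new = [math.tanh(c) for c in C_new]
        out[x] = (C_new, h_new)
    return out


# Usage: two neighbours with equal attention; "b" is older than tau and is filtered out.
neighbors = {
    "a": {"C": [0.0, 0.0], "u": [1.0, 0.0], "last": 10.0},
    "b": {"C": [0.0, 0.0], "u": [1.0, 0.0], "last": 1.0},
}
result = propagate_to_source_neighbors(e=[1.0, 0.0], u_s=[0.0, 0.0], t=10.0,
                                       neighbors=neighbors,
                                       W_prop=[[1.0, 0.0], [0.0, 1.0]], tau=5.0)
```

The early `continue` mirrors the efficiency argument in the text: neighbors beyond the threshold cost only an attention score, not a memory update.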
2.3. Parameter learning
In this section, we introduce the parameter learning procedure of the dynamic graph neural network model. The proposed framework DGNN is general and can be utilized for a variety of network analytic tasks. Next we will use link prediction and node classification as examples to illustrate how to use DGNN for network analysis and its corresponding algorithm for parameter learning.
2.3.1. Parameter learning for link prediction
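To train the dynamic graph neural network model for the link prediction task, we design a specific training schedule. In DGNN, each node has only one set of general features, while it can act as either a source node or a target node. Thus, for the link prediction task, we first project the general features of the two interacting nodes into their corresponding roles in the interaction with two projection matrices $\mathbf{P}^s$ and $\mathbf{P}^g$. We then adapt a widely used graph-based loss function to include temporal information. For an interaction $(v_s, v_g, t)$, we project the most recent general features $\mathbf{u}_{v_s}(t^-)$ and $\mathbf{u}_{v_g}(t^-)$ to $\tilde{\mathbf{u}}_{v_s}(t^-)$ and $\tilde{\mathbf{u}}_{v_g}(t^-)$, respectively, as follows:

$$\tilde{\mathbf{u}}_{v_s}(t^-) = \mathbf{P}^s \mathbf{u}_{v_s}(t^-), \qquad \tilde{\mathbf{u}}_{v_g}(t^-) = \mathbf{P}^g \mathbf{u}_{v_g}(t^-)$$

Then the probability of an interaction from $v_s$ to $v_g$ is modeled as $\sigma\left(\tilde{\mathbf{u}}_{v_s}(t^-)^\top \tilde{\mathbf{u}}_{v_g}(t^-)\right)$, where $\sigma(\cdot)$ is the sigmoid function. The loss for the interaction can then be written as

$$\ell(v_s, v_g, t) = -\log \sigma\left(\tilde{\mathbf{u}}_{v_s}(t^-)^\top \tilde{\mathbf{u}}_{v_g}(t^-)\right) - Q \cdot \mathbb{E}_{v_n \sim P_n(v)} \log \sigma\left(-\tilde{\mathbf{u}}_{v_s}(t^-)^\top \tilde{\mathbf{u}}_{v_n}(t^-)\right) \tag{18}$$

where $Q$ is the number of negative samples, $P_n(v)$ is a negative sampling distribution, and $\tilde{\mathbf{u}}_{v_n}(t^-) = \mathbf{P}^g \mathbf{u}_{v_n}(t^-)$ is the projected feature of a negative node $v_n$ in the target role. The total loss up to time $T$ is

$$\mathcal{L} = \sum_{(v_s, v_g, t) \in \mathcal{E}(T)} \ell(v_s, v_g, t) \tag{19}$$

where $\mathcal{E}(T)$ denotes all the interactions up to time $T$.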
We then adopt mini-batch gradient descent to optimize the loss function. Note that in our case, the mini-batches of edges are not randomly sampled from the entire set of edges; instead, each mini-batch is a consecutive segment of the interaction sequence, so the temporal order is maintained. The loss of a mini-batch is computed from all the interactions in it. The negative sampling distribution $P_n(v)$ is a uniform distribution over all the nodes involved in the mini-batch, which include the interacting nodes and the influenced nodes of each interaction.
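The link-prediction objective and the temporally ordered mini-batches can be sketched as follows. Assumptions to note: the expectation over negatives is estimated by summing over $Q$ sampled nodes, negatives are drawn uniformly with replacement from the nodes involved in the mini-batch, a negative node is projected with $\mathbf{P}^g$ because it replaces the target, and the function names are hypothetical. Gradients and the DGNN state updates are omitted.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


def edge_loss(u_s, u_g, negative_feats, P_s, P_g):
    """Loss of one interaction; Q * E[.] is estimated by summing over Q sampled negatives."""
    z_s, z_g = matvec(P_s, u_s), matvec(P_g, u_g)
    loss = -log_sigmoid(dot(z_s, z_g))
    for u_n in negative_feats:
        loss -= log_sigmoid(-dot(z_s, matvec(P_g, u_n)))  # negative plays the target role
    return loss


def temporal_minibatches(edges, batch_size):
    """Split (v_s, v_g, t) edges into consecutive, time-ordered mini-batches (no shuffling)."""
    ordered = sorted(edges, key=lambda edge: edge[2])
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


def sample_negatives(batch_nodes, Q, rng):
    """Uniform sampling (with replacement, an assumption) over nodes involved in the batch."""
    pool = sorted(batch_nodes)
    return [rng.choice(pool) for _ in range(Q)]


# Usage
eye = [[1.0, 0.0], [0.0, 1.0]]
batches = temporal_minibatches([("a", "b", 5.0), ("b", "c", 1.0), ("c", "a", 3.0)], batch_size=2)
negs = sample_negatives({"a", "b", "c"}, Q=2, rng=random.Random(0))
loss = edge_loss([0.0, 0.0], [0.0, 0.0], [[0.0, 0.0]] * 2, eye, eye)  # equals 3 * log(2)
```

Sorting by timestamp and slicing, rather than shuffling, is the whole difference from the usual random mini-batching: the recurrent node states must see interactions in the order they happened.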
2.3.2. Learning parameters for node classification
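To train the dynamic graph neural network model for node classification, we adopt the cross-entropy loss. For a node $v$ with general features $\mathbf{u}_v(t)$ and label $\mathbf{y}_v \in \mathbb{R}^{N_c}$ immediately after time $t$, where $N_c$ is the number of classes, we first project $\mathbf{u}_v(t)$ to $\hat{\mathbf{y}}_v(t) = \mathrm{softmax}\left(\mathbf{W}_c \mathbf{u}_v(t)\right)$. Then, the loss for node $v$ at time $t$ is defined as

$$\ell_v(t) = -\sum_{i=1}^{N_c} \mathbf{y}_v[i] \log \hat{\mathbf{y}}_v(t)[i]$$

where $\mathbf{y}_v[i]$ and $\hat{\mathbf{y}}_v(t)[i]$ denote the $i$-th element of $\mathbf{y}_v$ and $\hat{\mathbf{y}}_v(t)$, respectively.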
The training schedule is semi-supervised: only some nodes are labeled, but the unlabeled nodes are also involved in the update and propagation components of the dynamic graph neural network model. We adopt a mini-batch (of edges) procedure similar to that used for link prediction. Let $T_B$ be the end time of a mini-batch, i.e., the time of the last interaction in the mini-batch. After the mini-batch of interactions is processed by the update and propagation components of DGNN, we collect all the nodes involved in the mini-batch and denote them as $\mathcal{V}_B$. Let $\mathcal{V}_L$ denote the set of all the labeled nodes. We then use $\mathcal{V}_B \cap \mathcal{V}_L$ as the training samples of this mini-batch and form its loss function as

$$\mathcal{L}_B = \sum_{v \in \mathcal{V}_B \cap \mathcal{V}_L} \ell_v(T_B) \tag{20}$$
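A small sketch of the per-mini-batch node-classification objective. It assumes that the projection of $\mathbf{u}_v(t)$ to class probabilities is a linear map $\mathbf{W}_c$ followed by softmax, that labels are one-hot lists, and that `features` already holds each node's general features at the batch end time $T_B$; `minibatch_loss` and the other names are hypothetical.

```python
import math


def softmax(z):
    """Softmax with the maximum subtracted for numerical stability."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def node_loss(u, y, W_c):
    """Cross-entropy -sum_i y[i] log y_hat[i], with y_hat = softmax(W_c u)."""
    y_hat = softmax([sum(w * x for w, x in zip(row, u)) for row in W_c])
    return -sum(yi * math.log(pi) for yi, pi in zip(y, y_hat) if yi > 0)


def minibatch_loss(batch_nodes, labels, features, W_c):
    """Sum of node losses over V_B ∩ V_L (nodes in the batch that also carry labels)."""
    return sum(node_loss(features[v], labels[v], W_c)
               for v in batch_nodes.intersection(labels))


# Usage: only node 2 is both in the mini-batch and labelled.
W_c = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # 3 classes, 2-dimensional features
features = {1: [0.3, 0.1], 2: [0.5, -0.2], 3: [0.0, 0.9]}
labels = {2: [0, 1, 0], 9: [1, 0, 0]}
loss = minibatch_loss({1, 2, 3}, labels, features, W_c)  # uniform predictions -> log(3)
```

Unlabeled nodes in the batch contribute nothing to the loss here, but in the full model they still shape it indirectly through the update and propagation steps that produced `features`.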
3. Experiments
In this section, we perform two graph-based tasks to demonstrate the effectiveness of the proposed dynamic graph neural network model. We first introduce the three datasets used in the experiments, then present the two tasks, link prediction and node classification, with experimental details and discussions, and finally study the key components of the proposed framework.
3.1. Datasets
We conduct the experiments on the following three datasets. Important statistics of the three datasets are listed in Table 1.
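Table 1. Statistics of the three datasets.

| | UCI | DNC | Epinions |
|---|---|---|---|
| number of nodes | 1,899 | 2,029 | 6,224 |
| number of edges | 59,835 | 39,264 | 19,496 |
| time duration | 194 days | 982 days | 936 days |
| number of labels | - | - | 15 |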

UCI (Kunegis, 2013) is a directed graph which denotes the message communications between the users of an online community of students from the University of California, Irvine. A node in this graph represents a user. There are edges between users if they have message communications where the time associated with each edge indicates when they communicated. In this dataset, the graph structure and edge creation time are available; hence we use this dataset to evaluate link prediction performance.

DNC (Kunegis, 2013) is a directed graph of email communications in the 2016 Democratic National Committee email leak. Nodes in this graph represent persons, and a directed edge indicates that an email was sent from one person to another. The graph structure and the time at which each edge was established are available, so we use this dataset for the link prediction task.

Epinions (Tang et al., 2012) is a directed graph of trust relations between users of the product review platform Epinions. A node represents a user, and a directed edge represents a trust relation from one user to another. Node labels are available in this dataset: the label of each user is the category to which the majority of the user's reviewed products belong. Since the graph structure, edge creation times and node labels are all available, we use this dataset for both the link prediction and node classification tasks.
3.2. Link prediction
In this section, we conduct the link prediction experiments to evaluate the performance of the DGNN model. We first introduce the baselines. We then describe the experimental setting of the link prediction task and the evaluation metrics. Finally, we present the experimental results with discussions.
3.2.1. Baselines
We carefully choose representative baselines from two groups. One group includes existing graph neural network models; the other contains state-of-the-art graph representation learning methods, given their promising performance in link prediction. Details about the baselines are as follows:

GCN (Kipf and Welling, 2016) is a state-of-the-art graph convolutional network model that learns node features by aggregating information from each node's neighbors. It cannot use temporal information, so we treat the dynamic graphs as static graphs for this method by ignoring the edge creation times.

GraphSage (Hamilton et al., 2017) also aggregates information from neighbors, but it samples neighbors instead of using all of them. It cannot use temporal information either, so, as for GCN, we treat the dynamic graphs as static graphs.

node2vec (Grover and Leskovec, 2016) is a state-of-the-art graph representation learning method. It uses random walks to capture the proximity in the network and maps all the nodes into a low-dimensional representation space that preserves this proximity. It cannot utilize temporal information, so we convert the dynamic graphs into static graphs for node2vec.

DynGEM (Goyal et al., 2018) is a graph representation learning method designed for dynamic graphs. However, it can only be applied to discrete-time data with snapshots, so in our experiments we split each dataset into snapshots for this baseline.

CPTM (Dunlavy et al., 2011) is a tensor-based model. It treats the dynamic graph as a three-dimensional tensor, where two dimensions describe the interactions between nodes and the third dimension is time, and decomposes the tensor to obtain the node features. It can only be applied to discrete-time data with snapshots; hence, as for DynGEM, we split each dataset into snapshots.
DANE (Li et al., 2017) is a recently proposed eigen-decomposition-based node representation learning algorithm for attributed dynamic graphs. It updates the node representations over time by perturbation analysis of eigenvectors. It can only be applied to discrete-time data with snapshots; hence, as for DynGEM, we split each dataset into snapshots. Since this work does not focus on attributed networks, we use a variant of DANE that only considers the structural information of dynamic networks.
DynamicTriad (Zhou et al., 2018) is a recently proposed node representation learning algorithm for dynamic graphs. As its name suggests, it models the triadic closure process between snapshots of the dynamic graphs. It can only be applied to discrete-time data with snapshots; hence, as for DynGEM, we split each dataset into snapshots.
In summary, our baselines include representative graph neural network models (GCN and GraphSage), a state-of-the-art static node embedding method (node2vec), three recent dynamic network embedding methods (DynGEM, DANE and DynamicTriad) and a traditional dynamic network embedding method (CPTM).
3.2.2. Experimental setting
In the link prediction task, we are given a fraction of the interactions in the graph as the history and must predict which new edges will emerge in the future. In this experiment, we use the first of the edges as the history (training set) to train the dynamic graph neural network model, of the edges as the validation set and the next edges as the testing set. All the baselines and our model return node features after training, and we use the node features learned on the training set for link prediction. For each edge $(v_s, v_g, t)$ in the testing set, we first fix $v_s$, replace $v_g$ with every node in the graph, and rank the nodes by their cosine similarity to $v_s$. We then fix $v_g$, replace $v_s$ with every node in the graph, and rank the nodes in the same way. For all the models, we tune the parameters on the validation set. For our model, the cosine similarity is computed on the projected features for the UCI and DNC datasets and on the original features for the Epinions dataset, as chosen by performance on the validation set. In the link prediction task, we randomly initialize the cell memories, hidden states and general features of all nodes.
3.2.3. Evaluation metrics
We use two different metrics to evaluate the performance of the link prediction task. One of them is mean reciprocal rank (MRR) (Voorhees et al., 1999), which is defined as
$$\mathrm{MRR} = \frac{1}{H}\sum_{i=1}^{H}\frac{1}{\mathrm{rank}_i} \qquad (21)$$
where $H$ is the number of testing pairs. Note that each test edge corresponds to two testing pairs: one for the source node and one for the target node. $\mathrm{rank}_i$ is the rank of the ground-truth node among all the nodes for the $i$-th testing pair. The MRR metric is the mean of the reciprocal ranks of the ground-truth nodes in the testing set; it is higher when more ground-truth nodes are ranked near the top.
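The ranking protocol of the experimental setting and Equation (21) can be sketched in pure Python as follows. Node features are assumed to be plain lists of floats keyed by node id; `cosine`, `rank_of`, `link_prediction_ranks` and `mrr` are hypothetical helper names. Excluding the fixed node from its own candidate list and breaking ties in favour of the ground truth are our assumptions, not details stated in the text.

```python
import math


def cosine(a, b):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_of(features, fixed, truth):
    """1-based rank of `truth` among all candidates by similarity to `fixed`."""
    target = cosine(features[fixed], features[truth])
    # Assumption: the fixed node is not its own candidate; ties favour the truth.
    better = sum(
        1
        for n in features
        if n != fixed and cosine(features[fixed], features[n]) > target
    )
    return better + 1


def link_prediction_ranks(features, test_edges):
    """Each test edge (s, g) yields two testing pairs, one per endpoint."""
    ranks = []
    for s, g in test_edges:
        ranks.append(rank_of(features, s, g))  # fix s, replace the other end
        ranks.append(rank_of(features, g, s))  # fix g, replace the other end
    return ranks


def mrr(ranks):
    """Equation (21): mean reciprocal rank over the H testing pairs."""
    return sum(1.0 / r for r in ranks) / len(ranks)


feats = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0], 3: [-1.0, 0.0]}
ranks = link_prediction_ranks(feats, [(0, 1), (0, 2)])
print(ranks, mrr(ranks))  # [1, 1, 2, 2] 0.75
```

This recomputes every similarity for each testing pair, which costs O(N) per pair; a real evaluation over large graphs would batch these computations.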
The other metric we use is Recall@k, which is defined as:
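$$\mathrm{Recall@}k = \frac{1}{H}\sum_{i=1}^{H}\mathbb{1}[\mathrm{rank}_i \le k] \qquad (22)$$
where $\mathbb{1}[\mathrm{rank}_i \le k] = 1$ only when $\mathrm{rank}_i \le k$, and $0$ otherwise. Recall@k is the fraction of testing pairs whose ground-truth node is ranked in the top $k$ among all the nodes; the larger it is, the better the performance. In this work, we use Recall@20 and Recall@50.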
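Given the ranks $\mathrm{rank}_i$, Equation (22) reduces to counting hits. A minimal sketch, using a toy list of ranks (hypothetical values, not results from the paper):

```python
def recall_at_k(ranks, k):
    """Equation (22): fraction of testing pairs whose ground truth ranks in the top k."""
    hits = sum(1 for r in ranks if r <= k)  # indicator is 1 only when rank_i <= k
    return hits / len(ranks)


ranks = [1, 3, 25, 60]
print(recall_at_k(ranks, 20))  # 0.5
print(recall_at_k(ranks, 50))  # 0.75
```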
3.2.4. Experimental results
In this subsection, we present the experimental results. The link prediction results on the three datasets are shown in Table 2. From these results, we can make the following observations:

As expected, DANE does not perform well, since it was originally designed for attributed networks.

DynGEM and DynamicTriad outperform node2vec in most cases. All three methods are embedding algorithms: node2vec is designed for static networks, while DynGEM and DynamicTriad capture dynamics. These results suggest the importance of the dynamic information in graphs.

The proposed dynamic graph neural network model outperforms two representative existing GNNs, GCN and GraphSage, in all but one setting (GCN achieves a higher Recall@20 on DNC). Our model is designed for dynamic networks, while GCN and GraphSage ignore the dynamic information, which further supports the importance of capturing dynamics.

The proposed model DGNN outperforms all the baselines in most cases on all three datasets. DGNN includes model components that capture time intervals, propagation and tie strength. In the following subsections, we study the impact of these components on the performance of the proposed framework.
Method        UCI                           DNC                           Epinions
              MRR       Recall@20 Recall@50 MRR       Recall@20 Recall@50 MRR       Recall@20 Recall@50
DGNN          0.0342    0.1284    0.2547    0.0536    0.1852    0.3884    0.0204    0.0848    0.1894
GCN           0.0138    0.0632    0.1176    0.0447    0.2032    0.3291    0.0045    0.0071    0.0119
GraphSage     0.0060    0.0161    0.0578    0.0167    0.0576    0.1781    0.0035    0.0072    0.0108
node2vec      0.0056    0.0184    0.0309    0.0202    0.0719    0.178     0.0135    0.0571    0.1240
DynGEM        0.0146    0.0773    0.1455    0.0271    0.0971    0.2356    0.0150    0.0657    0.1233
CPTM          0.0138    0.0921    0.1082    0.0109    0.0072    0.0108    0.0036    0.0060    0.0125
DANE          0.0040    0.0110    0.0233    0.0128    0.0270    0.0432    0.0040    0.0100    0.0120
DynamicTriad  0.0150    0.0610    0.1236    0.0146    0.0414    0.0665    0.0170    0.0729    0.1629
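To check the "most of the cases" observation against Table 2, the snippet below transcribes the table and reports the top-scoring method for each dataset and metric. It only re-reads the published numbers; it does not reproduce the experiments.

```python
# Link prediction results transcribed from Table 2. Column order:
# UCI (MRR, Recall@20, Recall@50), DNC (same), Epinions (same).
TABLE2 = {
    "DGNN":         [0.0342, 0.1284, 0.2547, 0.0536, 0.1852, 0.3884, 0.0204, 0.0848, 0.1894],
    "GCN":          [0.0138, 0.0632, 0.1176, 0.0447, 0.2032, 0.3291, 0.0045, 0.0071, 0.0119],
    "GraphSage":    [0.0060, 0.0161, 0.0578, 0.0167, 0.0576, 0.1781, 0.0035, 0.0072, 0.0108],
    "node2vec":     [0.0056, 0.0184, 0.0309, 0.0202, 0.0719, 0.178, 0.0135, 0.0571, 0.1240],
    "DynGEM":       [0.0146, 0.0773, 0.1455, 0.0271, 0.0971, 0.2356, 0.0150, 0.0657, 0.1233],
    "CPTM":         [0.0138, 0.0921, 0.1082, 0.0109, 0.0072, 0.0108, 0.0036, 0.0060, 0.0125],
    "DANE":         [0.0040, 0.0110, 0.0233, 0.0128, 0.0270, 0.0432, 0.0040, 0.0100, 0.0120],
    "DynamicTriad": [0.0150, 0.0610, 0.1236, 0.0146, 0.0414, 0.0665, 0.0170, 0.0729, 0.1629],
}
COLUMNS = [f"{d} {m}" for d in ("UCI", "DNC", "Epinions")
           for m in ("MRR", "Recall@20", "Recall@50")]


def best_method_per_column(table):
    """Name of the highest-scoring method in each column."""
    names = list(table)
    return [max(names, key=lambda n: table[n][c]) for c in range(len(COLUMNS))]


winners = best_method_per_column(TABLE2)
for col, name in zip(COLUMNS, winners):
    print(f"{col:20s} {name}")
print(sum(w == "DGNN" for w in winners), "of", len(COLUMNS))  # 8 of 9
```

The single column DGNN does not win is DNC Recall@20, where GCN scores higher.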
3.3. Node classification
In this subsection, we conduct the node classification task to evaluate the performance of the dynamic graph neural network model. We first introduce the baselines. We then describe the experimental setting and the evaluation metrics. Finally, we present the experimental results.
3.3.1. Baselines
The node classification task is a semi-supervised learning task, where some nodes are labeled and we aim to infer the labels of the unlabeled nodes in the graph. Therefore, we carefully choose two groups of baselines. One consists of GNNs for semi-supervised learning, including GCN and GraphSage. The other consists of traditional semi-supervised learning methods, for which we choose LP, a state-of-the-art traditional semi-supervised method based on label propagation (Zhu et al., 2003). Note that, for a fair comparison, we do not choose node embedding algorithms such as node2vec as baselines, since they are designed for the unsupervised setting.
3.3.2. Experimental setting
In the node classification task, we randomly sample a fraction of all the nodes and hide their labels. These nodes with hidden labels are used as the validation and testing sets, and the remaining nodes are treated as the training set. Of the nodes with hidden labels, we use a portion as the validation set and the rest as the testing set. Of the remaining nodes with labels, we choose a fraction as labeled nodes and treat the others as unlabeled nodes. In this experiment, we vary this labeled fraction over three settings. We use micro-F1 and macro-F1 as the metrics to measure the performance of the node classification task.
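Micro-F1 and macro-F1 can be computed from per-class counts as sketched below, assuming each node carries exactly one label. `micro_macro_f1` is a hypothetical helper; scikit-learn's `f1_score` with `average="micro"` or `average="macro"` computes the same quantities.

```python
from collections import Counter


def micro_macro_f1(y_true, y_pred):
    """Micro-F1 pools TP/FP/FN over all classes; macro-F1 averages per-class F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


print(micro_macro_f1([0, 0, 1, 1, 2], [0, 1, 1, 1, 2]))  # micro 0.8, macro 37/45
```

With one label per node, micro-F1 equals accuracy, while macro-F1 weights every class equally and is therefore more sensitive to rare classes.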
3.3.3. Experimental results
Among the three datasets, only Epinions has label information. Hence, we conduct the node classification task on it, and the results are presented in Figure 6. We can make the following observations:

As the number of labeled nodes increases, the classification performance tends to increase.

GraphSage, GCN and DGNN outperform LP in all settings, which indicates the power of GNNs in semi-supervised learning.

DGNN outperforms GraphSage and GCN under all three settings, which shows the importance of temporal information in node classification.
3.4. Model Component analysis
In the last two subsections, we have demonstrated the effectiveness of the proposed framework on two graph mining tasks: link prediction and node classification. In this subsection, we conduct experiments to understand the effect of the key components of our proposed model. More specifically, we form the following variants of our model by removing some of its components:

DGNNprop: In this variant, we remove the entire propagation component from the model. This variant only performs the update procedure when a new edge emerges.

DGNNti: In this variant, we use the time interval information in neither the update component nor the propagation component. Thus, we treat the interactions as a sequence without temporal information.

DGNNatt: In this variant, we remove the attention mechanism from the propagation component and assign equal influence to all influenced nodes.
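The DGNNatt variant replaces attention-weighted propagation with equal weights. The sketch below contrasts the two under a stated assumption: the attention scores are taken as given inputs (for example, dot products between node features), because the exact scoring function of the model is not restated in this section. `aggregate` is a hypothetical helper, not the model's actual propagation code.

```python
import math


def aggregate(messages, scores=None):
    """Combine per-neighbour message vectors into a single update vector.

    scores=None mimics the DGNNatt "equal influence" setting (weight 1/n each);
    otherwise weights are a softmax over the given (hypothetical) attention scores.
    Returns (combined_vector, weights).
    """
    n = len(messages)
    if scores is None:
        weights = [1.0 / n] * n
    else:
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
    dim = len(messages[0])
    combined = [sum(w * msg[d] for w, msg in zip(weights, messages)) for d in range(dim)]
    return combined, weights


msgs = [[1.0, 0.0], [0.0, 1.0]]
print(aggregate(msgs))                               # equal influence: [0.5, 0.5]
print(aggregate(msgs, scores=[math.log(3.0), 0.0]))  # weights about [0.75, 0.25]
```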
We use the task of link prediction to illustrate the impact of the model components. The performance of these variants on the link prediction task is shown in Table 3. As we can observe from the results, all three components are important to our model: removing any of them reduces link prediction performance in nearly every setting (the only exception is the MRR of DGNNatt on DNC). From this study, we can conclude that (1) it is necessary to propagate interaction information to the influenced nodes; (2) it is important to consider the time interval information; and (3) capturing varied influence can improve the performance.
Method        UCI                           DNC                           Epinions
              MRR       Recall@20 Recall@50 MRR       Recall@20 Recall@50 MRR       Recall@20 Recall@50
DGNN          0.0342    0.1284    0.2547    0.0536    0.185     0.3884    0.0204    0.0848    0.1894
DGNNprop      0.0103    0.0444    0.1087    0.0046    0         0         0.0171    0.0633    0.1514
DGNNti        0.0174    0.0918    0.2118    0.0050    0         0.0054    0.0157    0.0591    0.1589
DGNNatt       0.0200    0.0844    0.2235    0.0562    0.1547    0.3219    0.0177    0.0651    0.1655
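The ablation conclusion can be checked directly against the table above. This snippet, which only re-reads the published numbers, lists for each variant the columns in which removing the component did not lower the score.

```python
# Ablation results transcribed from the table above. Column order:
# UCI (MRR, Recall@20, Recall@50), DNC (same), Epinions (same).
FULL = [0.0342, 0.1284, 0.2547, 0.0536, 0.185, 0.3884, 0.0204, 0.0848, 0.1894]
VARIANTS = {
    "DGNNprop": [0.0103, 0.0444, 0.1087, 0.0046, 0, 0, 0.0171, 0.0633, 0.1514],
    "DGNNti":   [0.0174, 0.0918, 0.2118, 0.0050, 0, 0.0054, 0.0157, 0.0591, 0.1589],
    "DGNNatt":  [0.0200, 0.0844, 0.2235, 0.0562, 0.1547, 0.3219, 0.0177, 0.0651, 0.1655],
}
COLUMNS = [f"{d} {m}" for d in ("UCI", "DNC", "Epinions")
           for m in ("MRR", "Recall@20", "Recall@50")]


def non_degraded_columns(full, variant):
    """Columns where the variant scores at least as well as the full model."""
    return [col for col, f, v in zip(COLUMNS, full, variant) if v >= f]


for name, scores in VARIANTS.items():
    print(name, non_degraded_columns(FULL, scores))
# DGNNprop []
# DGNNti []
# DGNNatt ['DNC MRR']
```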
3.5. Parameter Analysis
The proposed framework introduces one parameter in the propagation component, a time threshold that filters the “influenced nodes”. In this subsection, we analyze how different values of this threshold affect the performance of the DGNN model. We perform the analysis for the link prediction task on the UCI dataset with a single metric, since we observe similar trends in the other settings and on the other datasets.
Table 1 gives the duration of this dataset in days; we vary the threshold (in days) across this range with a fixed step size. The resulting performance is shown in Figure 7. The performance of DGNN first increases as the threshold gets larger, since a larger threshold allows the interaction information to be propagated to more influenced nodes. After the threshold reaches a certain value, the performance becomes stable or even slightly decreases. These observations suggest that 1) the propagation procedure does help to broadcast necessary information to the “influenced nodes”, as the performance first improves when the threshold increases; and 2) propagating the interaction information to “extremely old neighbors” may not be helpful and may even introduce noise. These observations have practical significance: we can choose a proper threshold in the propagation component, which can remarkably boost the efficiency of the proposed framework, since we only need to perform the propagation for a small number of “influenced nodes”.
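One way to read the threshold-based filtering, assuming a neighbor counts as an “influenced node” only when its most recent interaction with the node lies within the threshold of the new interaction's timestamp. This is an assumption: the exact rule is defined with the propagation component, not in this section, and `influenced_nodes` and the `last_interaction` map are hypothetical.

```python
def influenced_nodes(last_interaction, now, threshold):
    """Neighbours whose latest interaction is within `threshold` time units of `now`.

    last_interaction: hypothetical map neighbour -> timestamp of its most recent
    interaction with the node receiving the new edge.
    """
    return {n for n, t in last_interaction.items() if now - t <= threshold}


last = {"a": 10.0, "b": 3.0, "c": 9.5}
for thr in (1.0, 2.5, 10.0):
    # A larger threshold propagates to more (and older) neighbours.
    print(thr, sorted(influenced_nodes(last, 12.0, thr)))
# 1.0 []
# 2.5 ['a', 'c']
# 10.0 ['a', 'b', 'c']
```

Under this reading, a small threshold keeps the propagation set small, which is where the efficiency gain described above would come from.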
4. Related work
In this section, we briefly review two streams of research related to our work: graph neural networks and dynamic graph analysis.
In recent years, many efforts have been made to extend deep neural network models to graph-structured data. These neural network models that are applied to graphs are known as graph neural network models (Gori et al., [n. d.]; Scarselli et al., 2009). They have been applied to various tasks in many areas. Various graph neural network models have been designed to reason about the dynamics of physical systems, where previous states of the nodes are given as history to predict future states of the nodes (Battaglia et al., 2016; Chang et al., 2016; Sanchez-Gonzalez et al., 2018). Neural message passing networks have been designed to predict the properties of molecules (Gilmer et al., 2017). Graph convolutional neural networks, which perform convolution operations on graph-structured data, have been shown to advance many tasks such as graph classification (Defferrard et al., 2016; Bruna et al., 2013), node classification (Kipf and Welling, 2016; Hamilton et al., 2017; Veličković et al., 2017) and recommendation (Ying et al., 2018). A comprehensive survey on graph convolutional neural networks can be found in (Bronstein et al., 2017). A general framework of graph neural networks was recently proposed in (Battaglia et al., 2018).
Most of the current graph neural network models are designed for static graphs where nodes and edges are fixed. However, many real-world graphs are evolving. For example, social networks naturally evolve as new nodes join the graph and new edges are created. It has been of great interest to study the properties of dynamic graphs (Holme and Saramäki, 2012; Harary and Gupta, 1997; Casteigts et al., 2012; Zhang et al., 2017). Many graph-based tasks such as community detection (Lin et al., 2008), link prediction (Goyal et al., 2018; Li et al., 2018), node classification (Jian et al., 2018), knowledge graph mining (Trivedi et al., 2017) and network embedding (Li et al., 2017; Zhou et al., 2018; Ma et al., 2018) have been shown to benefit from including and modeling the temporal information in dynamic graphs. In this work, we propose a dynamic graph neural network model that incorporates and models the temporal information in dynamic graphs.
5. Conclusion
In this paper, we propose a novel graph neural network, DGNN, for dynamic graphs. It consists of two key components: the update component and the propagation component. When a new edge is introduced, the update component keeps node information up to date by capturing the order in which edges are created and the time intervals between interactions. The propagation component propagates the new interaction information to the influenced nodes while taking influence strengths into account. We use link prediction and node classification as examples to illustrate how to leverage DGNN to advance graph mining tasks. We conduct experiments on three real-world dynamic graphs, and the experimental results on link prediction and node classification suggest the importance of dynamic information and the effectiveness of the proposed update and propagation components in capturing it.
In the current model, we choose a node’s neighbors as the set of influenced nodes. Though this choice is reasonable and works well in practice, we would like to provide a theoretical analysis of it and to investigate alternative approaches. Currently, we illustrate how to use the proposed framework for link prediction and node classification; we also want to investigate how to use the framework for other graph mining tasks, especially those under unsupervised settings such as community detection.
References
 Battaglia et al. (2016) Peter Battaglia, Razvan Pascanu, Matthew Lai, Danilo Jimenez Rezende, et al. 2016. Interaction networks for learning about objects, relations and physics. In Advances in neural information processing systems. 4502–4510.
 Battaglia et al. (2018) Peter W Battaglia, Jessica B Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, et al. 2018. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261 (2018).
 Baytas et al. (2017) Inci M Baytas, Cao Xiao, Xi Zhang, Fei Wang, Anil K Jain, and Jiayu Zhou. 2017. Patient subtyping via time-aware LSTM networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 65–74.
 Bronstein et al. (2017) Michael M Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst. 2017. Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine 34, 4 (2017), 18–42.
 Bruna et al. (2013) Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. 2013. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203 (2013).
 Casteigts et al. (2012) Arnaud Casteigts, Paola Flocchini, Walter Quattrociocchi, and Nicola Santoro. 2012. Time-varying graphs and dynamic networks. International Journal of Parallel, Emergent and Distributed Systems 27, 5 (2012), 387–408.
 Chang et al. (2016) Michael B Chang, Tomer Ullman, Antonio Torralba, and Joshua B Tenenbaum. 2016. A compositional object-based approach to learning physical dynamics. arXiv preprint arXiv:1612.00341 (2016).
 Chang et al. (2017) Shiyu Chang, Yang Zhang, Jiliang Tang, Dawei Yin, Yi Chang, Mark A Hasegawa-Johnson, and Thomas S Huang. 2017. Streaming recommender systems. In Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 381–389.
 Defferrard et al. (2016) Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems. 3844–3852.
 Dunlavy et al. (2011) Daniel M Dunlavy, Tamara G Kolda, and Evrim Acar. 2011. Temporal link prediction using matrix and tensor factorizations. ACM Transactions on Knowledge Discovery from Data (TKDD) 5, 2 (2011), 10.
 Gilmer et al. (2017) Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. 2017. Neural message passing for quantum chemistry. arXiv preprint arXiv:1704.01212 (2017).
 Gori et al. ([n. d.]) Marco Gori, Gabriele Monfardini, and Franco Scarselli. [n. d.]. A new model for learning in graph domains. In Neural Networks, 2005. IJCNN’05. Proceedings. 2005 IEEE International Joint Conference on, Vol. 2. IEEE, 729–734.
 Goyal et al. (2018) Palash Goyal, Nitin Kamra, Xinran He, and Yan Liu. 2018. DynGEM: Deep Embedding Method for Dynamic Graphs. arXiv preprint arXiv:1805.11273 (2018).
 Grover and Leskovec (2016) Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 855–864.
 Hamilton et al. (2017) Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems. 1024–1034.
 Harary and Gupta (1997) Frank Harary and Gopal Gupta. 1997. Dynamic graph models. Mathematical and Computer Modelling 25, 7 (1997), 79–87.
 Hochreiter and Schmidhuber (1997) Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation 9, 8 (1997), 1735–1780.
 Holme and Saramäki (2012) Petter Holme and Jari Saramäki. 2012. Temporal networks. Physics reports 519, 3 (2012), 97–125.
 Jian et al. (2018) Ling Jian, Jundong Li, and Huan Liu. 2018. Toward online node classification on streaming networks. Data Mining and Knowledge Discovery 32, 1 (2018), 231–257.
 Kipf and Welling (2016) Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
 Kunegis (2013) Jérôme Kunegis. 2013. Konect: the koblenz network collection. In Proceedings of the 22nd International Conference on World Wide Web. ACM, 1343–1350.
 Li et al. (2018) Jundong Li, Kewei Cheng, Liang Wu, and Huan Liu. 2018. Streaming link prediction on dynamic attributed networks. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 369–377.
 Li et al. (2017) Jundong Li, Harsh Dani, Xia Hu, Jiliang Tang, Yi Chang, and Huan Liu. 2017. Attributed network embedding for learning in a dynamic environment. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. ACM, 387–396.
 Lin et al. (2008) Yu-Ru Lin, Yun Chi, Shenghuo Zhu, Hari Sundaram, and Belle L Tseng. 2008. Facetnet: a framework for analyzing communities and their evolutions in dynamic networks. In Proceedings of the 17th international conference on World Wide Web. ACM, 685–694.
 Ma et al. (2018) Jianxin Ma, Peng Cui, and Wenwu Zhu. 2018. DepthLGP: Learning Embeddings of Out-of-Sample Nodes in Dynamic Networks. AAAI.
 McPherson et al. (2001) Miller McPherson, Lynn Smith-Lovin, and James M Cook. 2001. Birds of a feather: Homophily in social networks. Annual review of sociology 27, 1 (2001), 415–444.
 Sanchez-Gonzalez et al. (2018) Alvaro Sanchez-Gonzalez, Nicolas Heess, Jost Tobias Springenberg, Josh Merel, Martin Riedmiller, Raia Hadsell, and Peter Battaglia. 2018. Graph networks as learnable physics engines for inference and control. arXiv preprint arXiv:1806.01242 (2018).
 Scarselli et al. (2009) Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. 2009. The graph neural network model. IEEE Transactions on Neural Networks 20, 1 (2009), 61–80.
 Tang et al. (2012) J. Tang, H. Gao, and H. Liu. 2012. mTrust: Discerning multifaceted trust in a connected world. In Proceedings of the fifth ACM international conference on Web search and data mining. ACM, 93–102.
 Trivedi et al. (2017) Rakshit Trivedi, Hanjun Dai, Yichen Wang, and Le Song. 2017. Know-Evolve: Deep temporal reasoning for dynamic knowledge graphs. arXiv preprint arXiv:1705.05742 (2017).
 Veličković et al. (2017) Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2017. Graph Attention Networks. arXiv preprint arXiv:1710.10903 (2017).
 Voorhees et al. (1999) Ellen M Voorhees et al. 1999. The TREC-8 Question Answering Track Report. In TREC, Vol. 99. 77–82.
 Xiang et al. (2010) Rongjing Xiang, Jennifer Neville, and Monica Rogati. 2010. Modeling relationship strength in online social networks. In Proceedings of the 19th international conference on World wide web. ACM, 981–990.
 Ying et al. (2018) Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. 2018. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. arXiv preprint arXiv:1806.01973 (2018).
 Zhang et al. (2017) Ziwei Zhang, Peng Cui, Jian Pei, Xiao Wang, and Wenwu Zhu. 2017. TIMERS: Error-Bounded SVD Restart on Dynamic Networks. arXiv preprint arXiv:1711.09541 (2017).
 Zhou et al. (2018) Lekui Zhou, Yang Yang, Xiang Ren, Fei Wu, and Yueting Zhuang. 2018. Dynamic Network Embedding by Modeling Triadic Closure Process.
 Zhu et al. (2003) Xiaojin Zhu, Zoubin Ghahramani, and John D Lafferty. 2003. Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of the 20th International Conference on Machine Learning (ICML-03). 912–919.