1. Introduction
Network embedding has shed light on network analysis due to its capability of encoding the structures and properties of networks into latent representations (Cui et al., 2018; Cai et al., 2018). Though state-of-the-art methods (Perozzi et al., 2014; Tang et al., 2015; Wang et al., 2016; Grover and Leskovec, 2016; Dong et al., 2017; Qiu et al., 2018) have achieved promising performance in many data mining tasks, most of them focus on static networks with fixed structures. In reality, networks usually exhibit complex temporal properties, meaning that network structures are not achieved overnight but evolve over time. In these so-called temporal networks (Li et al., 2017a), edges between nodes are established chronologically and the network scale grows with some evident distribution. For example, researchers collaborate with others in different years, leading to sequential co-author events and continued growth of the network scale. Therefore, a temporal network naturally represents the evolution of a network, including not only the fine-grained network structure but also the macroscopic network scale. Embedding a temporal network into a latent representation space is of great importance for applications in practice.
Basically, one requirement for temporal network embedding is that the learned embeddings should preserve the network structure and reflect its temporal evolution. Temporal network evolution usually follows two dynamic processes, i.e., the microscopic and macroscopic dynamics. At the microscopic level, the temporal network structure is driven by the establishment of edges, which is actually a sequence of chronological events, each involving two nodes. Taking Figure 1 as an example, the formation of the structure can be described as a chronological sequence of temporal edges, in which the same two nodes may build a link again at a later time. Usually, an edge generated at time $t$ is inevitably related to the historical neighbors before $t$, and the influence of the neighborhood structures on the edge formation comes from both of the involved nodes, not just a single one. Besides, different neighbors may have distinct influences. For example, in Figure 1, the establishment of an edge at time $t$ is influenced by the historical neighbors of both endpoints, and a neighbor that forms a closed triad with the two nodes should have a larger influence than one that does not (Zhou et al., 2018; Huang et al., 2015). Such a micro-dynamic evolution process describes the edge formation between nodes at different timestamps in detail and explains "why the network evolves into such structures at time $t$". Modeling the micro-dynamics enables the learned node embeddings to capture the evolution of a temporal network more accurately, which benefits downstream temporal network tasks. We notice that temporal network embedding has been studied by some works (Du et al., 2018; Zhou et al., 2018; Trivedi et al., 2018; Zuo et al., 2018). However, they either simplify the evolution process as a series of network snapshots, which cannot truly reveal the formation order of edges, or model neighborhood structures using stochastic processes, which ignores the fine-grained structural and temporal properties.
More importantly, at the macroscopic level, another salient property of a temporal network is that the network scale evolves with an evident distribution over time, e.g., an S-shaped sigmoid curve (Leskovec et al., 2005) or a power-law-like pattern (Zang et al., 2016). As the network evolves over time, edges are continuously built and form the network structure at each timestamp. Thus, the network scale, i.e., the number of edges, grows with time and obeys a certain underlying principle rather than being randomly generated. Such macro-dynamics reveal the inherent evolution pattern of the temporal network and impose constraints at a higher structural level on the network embedding, i.e., they determine how many edges should be generated in total by the micro-dynamics as the network evolves. Incorporating macro-dynamics provides valuable evolutionary information that enhances the capability of network embedding to preserve the network structure and evolution pattern, which largely strengthens its generalization ability. Therefore, whether the learned embedding space can encode the macro-dynamics of a temporal network should be a critical requirement for temporal network embedding methods. Unfortunately, none of the existing temporal network embedding methods takes them into account, although macro-dynamics are closely related to temporal networks.
In this paper, we propose a novel temporal Network Embedding method with Micro- and Macro-Dynamics, named M²DNE. In particular, to model the chronological events of edge establishment in a temporal network (i.e., micro-dynamics), we elaborately design a temporal attention point process that parameterizes the conditional intensity function with node embeddings and captures the fine-grained structural and temporal properties with a hierarchical temporal attention. To model the evolution pattern of the temporal network scale (i.e., macro-dynamics), we define a general dynamics equation as a non-linear function of the network embedding, which imposes constraints on the network embedding at a higher structural level and couples dynamics analysis with representation learning on temporal networks. Finally, we combine the micro- and macro-dynamics preserved embeddings and optimize them jointly. As micro- and macro-dynamics mutually evolve and alternately influence the process of learning node embeddings, the proposed M²DNE is able to capture the formation process of topological structures and the evolutionary pattern of network scale in a unified manner. We will make our code and data publicly available after the review.
The major contributions of this work can be summarized as follows:


For the first time, we study the important problem of incorporating both micro-dynamics and macro-dynamics into temporal network embedding.

We propose a novel temporal network embedding method (M²DNE), which microscopically models the formation process of the network structure with a temporal attention point process, and macroscopically constrains the network structure to obey a certain evolutionary pattern with a dynamics equation.

We conduct comprehensive experiments to validate the benefits of M²DNE on traditional applications (e.g., network reconstruction and temporal link prediction), as well as some novel applications related to temporal networks (e.g., network scale prediction).
2. Related Work
Recently, network embedding has attracted considerable attention (Cui et al., 2018). Inspired by word2vec (Mikolov et al., 2013), random walk based methods (Perozzi et al., 2014; Grover and Leskovec, 2016) have been proposed to learn node embeddings via the skip-gram model. After that, several methods (Wang et al., 2017; Qiu et al., 2018) were designed to better preserve network properties, e.g., high-order proximity. There are also deep neural network based methods, such as autoencoder based methods (Wang et al., 2016, 2018) and graph neural network based methods (Kipf and Welling, 2017; Velickovic et al., 2018). Besides, some models are designed for heterogeneous information networks (Dong et al., 2017; Shi et al., 2018a; Lu et al., 2019) or attributed networks (Zhang et al., 2018). However, all the aforementioned methods focus only on static network embedding.

There are some attempts at temporal network embedding, which can be broadly classified into two categories: embedding snapshot networks (Li et al., 2017b; Zhu et al., 2018; Du et al., 2018; Goyal et al., 2018; Zhou et al., 2018) and modeling temporal evolution (Nguyen et al., 2018; Trivedi et al., 2018; Zuo et al., 2018). The basic idea of the former is to learn node embeddings for each network snapshot. Specifically, DANE (Li et al., 2017b) and DHPE (Zhu et al., 2018) present efficient algorithms based on perturbation theory. Du et al. extend skip-gram based models and propose a dynamic network embedding framework (Du et al., 2018). DynamicTriad (Zhou et al., 2018) models the triadic closure process to capture dynamics and learns node embeddings at each time step. The latter type of methods tries to capture the evolution pattern of the network in the latent embeddings. Trivedi et al. (2018) describe temporal evolution over graphs as association and communication processes and propose a deep representation learning framework for dynamic graphs. HTNE (Zuo et al., 2018) proposes a Hawkes process based network embedding method, which models the neighborhood formation sequence to learn node embeddings. Besides, there are some task-specific temporal network embedding methods; for example, NetWalk (Yu et al., 2018) is an anomaly detection framework which detects network deviations based on a dynamic clustering algorithm.
All the above-mentioned methods either learn node embeddings on snapshots, or model the temporal process of networks with limited dynamics and structures. None of them integrates both micro- and macro-dynamics into temporal network embedding.
3. Preliminaries
3.1. Dynamics in Temporal Networks
Definition 3.1.
Temporal Network. A temporal network is a sequence of timestamped edges, where each edge connects two nodes at a certain time. Formally, a temporal network can be denoted as $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathcal{T})$, where $\mathcal{V}$ and $\mathcal{E}$ are the sets of nodes and edges, and $\mathcal{T}$ is the sequence of timestamps. Each temporal edge $(v_i, v_j, t)$ refers to an event involving nodes $v_i$ and $v_j$ at time $t$.
Please notice that nodes $v_i$ and $v_j$ may build multiple edges at different timestamps; we consider each $(v_i, v_j, t)$ as a distinct temporal edge, while $(v_i, v_j)$ denotes a static edge.
Definition 3.2.
Micro-dynamics. Given a temporal network $\mathcal{G}$, micro-dynamics describe the formation process of the network structures, denoted as $\mathcal{O} = \{(v_i, v_j, t)\}$, where each $(v_i, v_j, t)$ represents a temporal event in which nodes $v_i$ and $v_j$ establish an edge at time $t$, and $\mathcal{O}$ is the complete sequence of observed events ordered by time within the observation window.
Definition 3.3.
Macro-dynamics. Given a temporal network $\mathcal{G}$, macro-dynamics refer to the evolution process of the network scale, denoted as $\{e(t),\, t \in \mathcal{T}\}$, where $e(t)$ is the cumulative number of edges by time $t$.
In fact, macro-dynamics represent the change of both edges and nodes. Since new nodes inevitably lead to new edges, we focus on the growth of edges here. Intuitively, micro-dynamics determine which edges will be built (i.e., which events occur), while macro-dynamics constrain the scale of newly added edges.
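To make the definitions above concrete, the toy sketch below (with hypothetical node names and timestamps) represents a temporal network as a chronological list of $(v_i, v_j, t)$ events, and derives from it the micro-level event sequence, the static edge set, and the macro-level cumulative edge count $e(t)$:

```python
from collections import defaultdict

# Hypothetical toy temporal network: chronological (v_i, v_j, t) events.
events = [("a", "b", 1), ("a", "c", 1), ("b", "c", 2), ("a", "b", 3)]

# Micro-dynamics: the complete time-ordered event sequence (repeats allowed).
micro = sorted(events, key=lambda e: e[2])

# Static edges collapse repeated temporal edges between the same node pair.
static_edges = {frozenset((u, v)) for u, v, _ in events}

# Macro-dynamics: cumulative number of temporal edges e(t) per timestamp.
def cumulative_edges(evts):
    per_t = defaultdict(int)
    for _, _, t in evts:
        per_t[t] += 1
    e, total = {}, 0
    for t in sorted(per_t):
        total += per_t[t]
        e[t] = total
    return e

e_t = cumulative_edges(events)
```

Note how the pair (a, b) appears twice: it contributes two temporal edges but only one static edge, matching the distinction drawn in Definition 3.1.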
3.2. Temporal Point Process
Temporal point processes have previously been used to model dynamics in networks (Farajtabar et al., 2015); they assume that an event happens in a tiny time window with a conditional probability given the historical events. The self-exciting multivariate point process, or Hawkes process (Mei and Eisner, 2017), is a well-known temporal point process with the conditional intensity function defined as follows:
(1) $\lambda(t) = \mu(t) + \sum_{t_h < t} \kappa(t - t_h)$
where $\mu(t)$ is the base intensity, describing the arrival of spontaneous events. The kernel $\kappa(t - t_h)$ models the time decay effect of a past event at time $t_h$ on the current event, and is usually in the form of an exponential function; the sum runs over all events observed before time $t$.
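As a quick numeric illustration of Eq. (1) with an exponential kernel (the constants below are arbitrary, not the model's learned parameters):

```python
import math

def hawkes_intensity(t, history, mu=0.1, alpha=0.8, delta=1.0):
    """Conditional intensity lambda(t) = mu + sum over past events of an
    exponentially decaying kernel kappa(t - t_h) = alpha * exp(-delta * (t - t_h)).
    `history` is a list of past event times t_h < t."""
    return mu + sum(alpha * math.exp(-delta * (t - t_h))
                    for t_h in history if t_h < t)

# Self-excitation: intensity right after a burst of events is higher
# than long after it, and both exceed the base intensity alone.
burst = [1.0, 1.1, 1.2]
```

This self-exciting behavior is exactly what makes the process suitable for modeling edge events: each new edge temporarily raises the likelihood of related edges.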
3.3. Problem Definition
Our goal is to learn node embeddings by capturing the formation process of the network structures and the evolution pattern of the network scale. We can formally define the problem as follows:
Definition 3.4.
Temporal Network Embedding. Given a temporal network $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathcal{T})$, temporal network embedding aims to learn a mapping function $\phi: \mathcal{V} \rightarrow \mathbb{R}^d$, where $d$ is the number of embedding dimensions and $d \ll |\mathcal{V}|$. The objective of the function is to model the evolution pattern of the network, including both micro- and macro-dynamics in a temporal network.
4. The Proposed Model
4.1. Model Overview
Different from conventional methods that only consider the evolution of network structures, we incorporate both micro- and macro-dynamics into temporal network embedding. As illustrated in Figure 2, from a microscopic perspective (i.e., Figure 2(a)), we consider the establishment of each edge as a chronological event and propose a temporal attention point process to capture the fine-grained structural and temporal properties for network embedding. The establishment of an edge is determined by the two nodes themselves and their historical neighbors, whose distinct influences are captured with a hierarchical temporal attention. From a macroscopic perspective (i.e., Figure 2(b)), the inherent evolution pattern of the network scale constrains the network structures at a higher level, which is defined as a dynamics equation parameterized with the network embedding and the timestamp $t$. Micro- and macro-dynamics evolve and derive node embeddings in a mutual manner (i.e., Figure 2(c)): the micro prediction from the historical structures may indicate that a node will link with three nodes (i.e., three new temporal edges to be established) at time $t$, while the macro-dynamics preserved embedding limits the number of new edges to only two, according to the evolution pattern of the network scale. Therefore, the network embedding learned with our proposed M²DNE captures more precise structural and temporal properties.
4.2. Microdynamics Preserved Embedding
With the evolution of the network, new edges are constantly established, which can be regarded as a series of observed events. Intuitively, the occurrence of an event is influenced not only by the event participants but also by past events. Moreover, past events affect the current event to varying degrees. Thus, we propose a temporal attention point process to preserve micro-dynamics in a temporal network.
Formally, given a temporal edge $(v_i, v_j, t)$ (i.e., an observed event), we parameterize the intensity of the event with the node embeddings $\mathbf{u}_i$ and $\mathbf{u}_j$. Since similar nodes are more likely to establish an edge, the similarity between nodes $v_i$ and $v_j$ should be proportional to the intensity of the event that $v_i$ and $v_j$ build a link at time $t$. On the other hand, the similarity between the historical neighbors and the current node indicates the degree of past impact on the event, which should decrease with time and differ across distinct neighbors.
To this end, we define the occurrence intensity of the event $(v_i, v_j, t)$, consisting of the base intensity from the nodes themselves and the historical influences from the two-way neighbors, as follows:
(2) $\tilde{\lambda}_{i,j}(t) = g(\mathbf{u}_i, \mathbf{u}_j) + \beta_i(t) \sum_{p \in \mathcal{H}^i(t)} \alpha_{p,i}(t)\, g(\mathbf{u}_p, \mathbf{u}_j)\, \kappa(t - t_p) + \beta_j(t) \sum_{q \in \mathcal{H}^j(t)} \alpha_{q,j}(t)\, g(\mathbf{u}_q, \mathbf{u}_i)\, \kappa(t - t_q)$
where $g(\cdot, \cdot)$ is a function measuring the similarity of two nodes; here we define $g(\mathbf{u}_i, \mathbf{u}_j) = -\|\mathbf{u}_i - \mathbf{u}_j\|_2^2$, though other measurements, such as cosine similarity, can also be used.¹ $\mathcal{H}^i(t)$ and $\mathcal{H}^j(t)$ are the historical neighbors of nodes $v_i$ and $v_j$ before time $t$, respectively. The term $\kappa(t - t_p) = \exp(-\delta_i (t - t_p))$ is the time decay function with a node-dependent and learnable decay rate $\delta_i$, where $t_p$ is the time of the past event. Here $\alpha$ and $\beta$ are two attention coefficients determined by a hierarchical temporal attention mechanism, which will be introduced later. As the current event is stochastically excited or inhibited by past events and Eq. (2) may take negative values, we apply a non-linear transfer function $f(\cdot)$ (i.e., the exponential function) to ensure that the intensity of an event is a positive real number:
(3) $\lambda_{i,j}(t) = f\big(\tilde{\lambda}_{i,j}(t)\big) = \exp\big(\tilde{\lambda}_{i,j}(t)\big)$
¹ Since our idea is to make the intensity larger when nodes $v_i$ and $v_j$ are more similar, the negative Euclidean distance is applied here, as it satisfies the triangle inequality and can thus preserve the first- and second-order proximities naturally (Danielsson, 1980).
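A minimal numeric sketch of this intensity computation is given below; for simplicity it replaces the learned hierarchical attention coefficients with uniform weights and uses a fixed decay rate, so it illustrates only the shape of the computation, not the trained model:

```python
import numpy as np

def g(u, v):
    # Node similarity as negative squared Euclidean distance.
    return -np.sum((u - v) ** 2)

def event_intensity(u_i, u_j, nbrs_i, nbrs_j, t, delta=1.0):
    """Base similarity of the two endpoints plus time-decayed influence of
    each side's historical neighbors, given as (u_p, t_p) pairs; attention
    weights are replaced by uniform ones for simplicity."""
    lam = g(u_i, u_j)
    for nbrs, target in ((nbrs_i, u_j), (nbrs_j, u_i)):
        for u_p, t_p in nbrs:
            lam += (1.0 / len(nbrs)) * g(u_p, target) * np.exp(-delta * (t - t_p))
    return np.exp(lam)  # exponential transfer keeps the intensity positive
```

With no history and identical embeddings the intensity is exp(0) = 1; any dissimilarity between the endpoints or their neighbors lowers it, as intended by the negative-distance similarity.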
4.2.1. Hierarchical Temporal Attention
As mentioned before, past events have an impact on the occurrence of the current event, and this impact may vary across past events. For instance, whether two researchers collaborate on a neural network related paper at time $t$ is usually related to their respective historical collaborators. Intuitively, a researcher who has collaborated with either of them on neural network related papers in the past has a larger local influence on the current event. Besides, if one researcher's collaborators are experts in neural networks as a whole, his neighbors will have a larger global impact on the current event. Since a researcher's interests change with the research hotspots, the influence of his neighbors is not static but dynamic. Hence, we propose a hierarchical temporal attention mechanism to capture such non-uniform and dynamic influences of historical structures.
For the local influence from each neighbor, the term $g(\mathbf{u}_p, \mathbf{u}_j)$ in Eq. (2) makes it likely to form an edge between nodes $v_i$ and $v_j$ if $v_i$'s neighbor $v_p$ is similar to $v_j$. The importance of $v_p$ to the event depends on node $v_i$ and changes as the neighborhood structures evolve. Hence, the attention coefficient $\alpha_{p,i}(t)$ is defined as follows:
(4) $e_{p,i}(t) = \sigma\big(\kappa(t - t_p)\, \mathbf{a}^\top [\mathbf{W}\mathbf{u}_p \oplus \mathbf{W}\mathbf{u}_i]\big)$
(5) $\alpha_{p,i}(t) = \frac{\exp(e_{p,i}(t))}{\sum_{p' \in \mathcal{H}^i(t)} \exp(e_{p',i}(t))}$
where $\oplus$ is the concatenation operation, $\mathbf{a}$ serves as the attention vector, and $\mathbf{W}$ represents the local weight matrix. Here we incorporate the time decay $\kappa(t - t_p)$ so that if the timestamp $t_p$ is close to $t$, node $v_p$ has a larger impact on the event. Similarly, we can get $\alpha_{q,j}(t)$, which captures the distinctive local influence of the neighbors of node $v_j$ on the event at time $t$. For the global impact of the whole neighborhood, we represent the historical neighbors as a whole with the aggregation of each neighbor's information, $\mathbf{h}^i(t)$. Considering the global decay of influence, we average the time decay of past events as $\bar{\kappa}^i(t) = \frac{1}{|\mathcal{H}^i(t)|} \sum_{p \in \mathcal{H}^i(t)} \kappa(t - t_p)$. Thus, we capture the global attention of $v_i$'s whole neighborhood on the current event as follows:
(6) $\mathbf{h}^i(t) = \sum_{p \in \mathcal{H}^i(t)} \alpha_{p,i}(t)\, \mathbf{W}\mathbf{u}_p$
(7) $\beta_i(t) = \sigma\big(\mathbf{w}^\top [\mathbf{h}^i(t) \oplus \bar{\kappa}^i(t)]\big)$
where $\sigma(\mathbf{w}^\top[\cdot])$ is a single-layer neural network, which takes the embedding aggregated from the neighbors and the average time decay of past events as input.
Combining the two levels of attention, we preserve the structural and temporal properties in a coupled way, as the attention itself evolves with the micro-dynamics in the temporal network.
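The two attention levels can be sketched as follows; this is a simplified stand-in in which the learnable parameters `a`, `W` and `w_g` are supplied as fixed arrays, whereas the real model learns them jointly with the embeddings:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def local_global_attention(u_i, nbrs, t, a, W, w_g, delta=1.0):
    """Hierarchical temporal attention sketch: local per-neighbor weights
    (softmax over time-decayed scores) and a single global sigmoid gate
    over the aggregated neighborhood. `nbrs` holds (u_p, t_p) pairs."""
    decays = np.array([np.exp(-delta * (t - t_p)) for _, t_p in nbrs])
    scores = np.array([sigmoid(k * (a @ np.concatenate([W @ u_p, W @ u_i])))
                       for k, (u_p, _) in zip(decays, nbrs)])
    alpha = softmax(scores)                         # local attention weights
    h = sum(al * (W @ u_p) for al, (u_p, _) in zip(alpha, nbrs))
    beta = sigmoid(w_g @ np.concatenate([h, [decays.mean()]]))  # global gate
    return alpha, beta
```

The local weights always sum to one over the neighborhood, and (other factors being equal) a more recent neighbor receives a larger weight through the decay term.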
4.2.2. Micro Prediction
So far, we can define the probability of establishing an edge between nodes $v_i$ and $v_j$ at time $t$ as follows:
(8) $p(v_i, v_j \mid t) = \frac{\lambda_{i,j}(t)}{\sum_{i'} \sum_{j'} \lambda_{i',j'}(t)}$
Hence, we can minimize the following objective function to capture the micro-dynamics in a temporal network:
(9) $\mathcal{L}_{mi} = -\sum_{(v_i, v_j, t) \in \mathcal{E}} \log p(v_i, v_j \mid t)$
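This normalized probability and its negative log-likelihood amount to a cross-entropy over candidate events; a toy sketch with hypothetical intensity values:

```python
import numpy as np

def micro_nll(intensities, observed_idx):
    """Probability of the observed edge = its intensity normalized over the
    candidate pairs; the micro loss accumulates the negative log-likelihood
    of each observed event."""
    p = intensities / intensities.sum()
    return -np.log(p[observed_idx])

# Hypothetical intensities of three candidate node pairs; pair 0 was observed.
lam = np.array([2.0, 1.0, 1.0])
```

The loss is smaller when the observed pair already carries the largest intensity, which is what drives the embeddings of co-occurring nodes together.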
4.3. Macrodynamics Preserved Embedding
Unlike micro-dynamics, which drive the formation of individual edges, macro-dynamics describe the evolution pattern of the network scale, which usually obeys an evident distribution, i.e., the network scale can be described with a certain dynamics equation. Furthermore, macro-dynamics constrain the formation of the internal structure of the network at a higher level, i.e., they determine how many edges should be generated in total by each time. Encoding such high-level structures can largely strengthen the capability of network embeddings. Hence, we propose to define a dynamics equation parameterized with the network embedding, which bridges dynamics analysis and representation learning on temporal networks.
Given a temporal network $\mathcal{G}$, let $n(t)$ be the cumulative number of nodes by time $t$. Each node links to other nodes with a linking rate $r(t)$ at time $t$. According to the densification power law in network evolution (Leskovec et al., 2005; Zang et al., 2016), each node has on average $\zeta n(t)^{\gamma}$ accessible neighbors, with linear sparsity coefficient $\zeta$ and power-law sparsity exponent $\gamma$. Hence, we define the macro-dynamics, i.e., the number of new edges at time $t$, as follows:
(10) $\Delta e'(t) = n(t)\, r(t)\, \zeta\, n(t)^{\gamma}$
where $n(t)$ can be obtained as the network evolves by time $t$, while $\zeta$ and $\gamma$ are learnable with model optimization.
In other words, as the network evolves by time $t$, $n(t)$ nodes have joined the network; at the next time step, each node tries to establish edges with its accessible nodes at the linking rate $r(t)$.
4.3.1. Linking Rate.
Since the linking rate $r(t)$ plays a vital role in driving the evolution of the network scale (Leskovec et al., 2005), it depends not only on temporal information but also on the structural properties of the network. On the one hand, many more edges are built at the inception of the network, while the growth rate decays as the network densifies; therefore, the linking rate should decay with a temporal term. On the other hand, the establishment of edges promotes the evolution of network structures, so the linking rate should be associated with the structural properties of the network. Hence, in order to capture such temporal and structural information in network embeddings, we parameterize the linking rate of the network with a temporal fizzling term and the node embeddings:
(11) $r(t) = \frac{1}{t^{\theta}} \cdot \frac{\sum_{i} \sum_{j \neq i} \sigma\big(-\|\mathbf{u}_i - \mathbf{u}_j\|_2^2\big)}{n(t)\,(n(t) - 1)}$
where $\theta$ is the temporal fizzling exponent and $\sigma(\cdot)$ is the sigmoid function. As the learned embeddings should well encode the network structures, the numerator in Eq. (11) models the maximum linking rate of the network with node embeddings, which decays over time with $t^{\theta}$. Hence, with Eq. (11), we combine the representation learning and the macro-dynamics of the temporal network.
4.3.2. Macro Constraint.
As the network evolves, we observe the sequence of real cumulative edge numbers $\{e(t_1), e(t_2), \dots\}$; hence we can get the changed numbers of edges, denoted as $\{\Delta e(t_1), \Delta e(t_2), \dots\}$, where $\Delta e(t_i) = e(t_i) - e(t_{i-1})$. Then, we learn the parameters in Eq. (10) by minimizing the sum of squared errors:
(12) $\mathcal{L}_{ma} = \sum_{t \in \mathcal{T}} \big(\Delta e(t) - \Delta e'(t)\big)^2$
where $\Delta e'(t)$ is the predicted number of new edges at time $t$.
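The macro component is essentially a parametric growth curve fitted by least squares. The sketch below generates synthetic edge-increment counts from assumed parameters and recovers them with a coarse grid search (a stand-in for the gradient-based optimization used here, with the embedding-dependent numerator of the linking rate frozen to a constant `avg_sim`):

```python
import numpy as np

def predicted_new_edges(n_t, t, avg_sim, zeta, gamma, theta):
    """Delta e'(t) = n(t) * r(t) * zeta * n(t)^gamma, with the linking rate
    r(t) = avg_sim / t^theta; avg_sim stands in for the frozen
    embedding-similarity numerator."""
    return n_t * (avg_sim / t ** theta) * zeta * n_t ** gamma

def macro_loss(params, n, t, d_e, avg_sim=0.5):
    """Sum of squared errors between observed and predicted edge increments."""
    zeta, gamma, theta = params
    pred = predicted_new_edges(n, t, avg_sim, zeta, gamma, theta)
    return float(np.sum((d_e - pred) ** 2))

# Synthetic "observed" increments from assumed parameters (0.1, 0.8, 1.2),
# then recovered by a coarse grid search over (zeta, gamma, theta).
n = np.array([10.0, 20.0, 40.0])
t = np.array([1.0, 2.0, 3.0])
d_e = predicted_new_edges(n, t, 0.5, 0.1, 0.8, 1.2)
best = min(((z, g, th) for z in (0.05, 0.1, 0.2)
            for g in (0.5, 0.8, 1.1) for th in (1.0, 1.2, 1.5)),
           key=lambda p: macro_loss(p, n, t, d_e))
```

Because the synthetic data were generated from a point on the grid, the search recovers the generating parameters exactly; in the model itself, the parameters are learned jointly with the embeddings by gradient descent.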
4.4. The Unified Model
As micro- and macro-dynamics mutually drive the evolution of the temporal network and alternately influence the learning of network embeddings, we formulate the following model to capture the formation process of topological structures and the evolutionary pattern of the network scale in a unified manner:
(13) $\mathcal{L} = \mathcal{L}_{mi} + \epsilon\, \mathcal{L}_{ma}$
where $\epsilon$ is the weight of the constraint imposed by macro-dynamics on representation learning.
Optimization. As the second term (i.e., $\mathcal{L}_{ma}$) is actually a non-linear least-squares problem, we can solve it with gradient descent (Marquardt, 1963). However, optimizing the first term (i.e., $\mathcal{L}_{mi}$) is computationally expensive due to the normalization over all node pairs in Eq. (8). To address this problem, since the transfer function in Eq. (3) is the exponential function, Eq. (8) is a softmax unit applied to $\tilde{\lambda}_{i,j}(t)$, which can be optimized approximately via negative sampling (Mikolov et al., 2013; Shi et al., 2018b). Specifically, we sample an edge with probability proportional to its weight at each time, and then sample negative node pairs for its two endpoints. Hence, the loss function in Eq. (9) can be rewritten as follows:
(14) $\mathcal{L}_{mi} = -\log \sigma\big(\tilde{\lambda}_{i,j}(t)\big) - \sum_{k=1}^{s} \mathbb{E}_{v_k \sim P_n(v)} \big[\log \sigma\big(-\tilde{\lambda}_{i,k}(t)\big) + \log \sigma\big(-\tilde{\lambda}_{k,j}(t)\big)\big]$
where $\sigma(\cdot)$ is the sigmoid function and $P_n(v)$ is the noise distribution for negative sampling. Note that we fix the maximum number of historical neighbors and retain the most recent ones. We adopt mini-batch gradient descent with a PyTorch implementation to minimize the loss function.
The time complexity of M²DNE is $O\big(I\,(d\,(h + s)\,|\mathcal{E}| + |\mathcal{T}|)\big)$, where $I$ is the number of iterations and $|\mathcal{T}|$ is the number of timestamps; $d$ is the embedding dimension, and $h$ and $s$ are the numbers of historical neighbors and negative samples, respectively.
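The negative-sampling objective can be sketched as follows; this simplified version operates directly on raw intensity values and sums over an explicit list of negatives instead of taking an expectation over the noise distribution:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(lam_pos, lam_negs):
    """Push the raw intensity of the observed edge up and the intensities
    of sampled non-edges down, avoiding the full softmax normalization
    over all candidate node pairs."""
    loss = -np.log(sigmoid(lam_pos))
    loss += -np.sum(np.log(sigmoid(-np.asarray(lam_negs, dtype=float))))
    return float(loss)
```

A well-separated case (high intensity on the observed edge, low on the negatives) yields a near-zero loss, while the reversed case is heavily penalized, mirroring the full softmax objective at a fraction of its cost.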
5. Experiments
In this section, we evaluate the proposed method on three datasets. We report experiments that answer the following questions:
Q1. Accuracy. Can M²DNE accurately embed networks into a latent representation space that preserves network structures?
Q2. Dynamics. Can M²DNE effectively capture temporal information in networks for dynamic prediction tasks?
Q3. Tendency. Can M²DNE forecast the evolutionary tendency of the network scale via the macro-dynamics preserved embedding?
5.1. Experimental Setup
5.1.1. Datasets.
We adopt three datasets from different domains, namely Eucore (https://snap.stanford.edu/data/), DBLP (https://dblp.uni-trier.de) and Tmall (https://tianchi.aliyun.com/dataset/). Eucore is generated from email data; the communications between people are treated as edges and the five departments as labels. DBLP is a co-author network, and we take ten research areas as labels. Tmall is extracted from sales data; we take users and items as nodes and purchases as edges, and the five most purchased categories are retained as labels. The detailed statistics of the datasets are shown in Table 1.
5.1.2. Baselines.
We compare the performance of M²DNE against the following seven network embedding methods and one variant of our model.


DeepWalk (Perozzi et al., 2014) performs random walks on networks and then learns node vectors via the skip-gram model. We tune the number of walks per node and the walk length.

node2vec (Grover and Leskovec, 2016) generalizes DeepWalk with biased random walks. We tune the return and in-out parameters $p$ and $q$ with a grid search.

LINE (Tang et al., 2015) considers the first- and second-order proximities in networks. We employ the second-order proximity and set the number of edge samples to 100 million.

SDNE (Wang et al., 2016) uses deep autoencoders to capture non-linear dependencies in networks. We tune its hyperparameters with a grid search.

TNE (Zhu et al., 2016) is a dynamic network embedding model based on matrix factorization. We tune its parameter with a grid search and take the node embeddings of the last timestamp for evaluations.

DynamicTriad (Zhou et al., 2018) models the triadic closure of the network evolution. We tune its two weight parameters with a grid search and take the node embeddings of the last timestamp for evaluations.

HTNE (Zuo et al., 2018) learns node representations via a Hawkes process. We set the neighbor history length to be the same as ours.

MDNE is a variant of our proposed model, which captures only the micro-dynamics in networks (i.e., it is trained without the macro constraint).
Datasets  Eucore  DBLP  Tmall 
# nodes  986  28,085  577,314 
# static edges  24,929  162,451  2,992,964 
# temporal edges  332,334  236,894  4,807,545 
# time steps  526  27  186 
# labels  5  10  5 
Methods  Eucore  DBLP  Tmall  
Pre.@100  Pre.@1000  AUC  Pre.@100  Pre.@1000  AUC  Pre.@100  Pre.@1000  AUC  
DeepWalk  0.56  0.576  0.7737  0.92  0.321  0.9617  0.55  0.455  0.8852 
node2vec  0.73  0.535  0.8157  0.81  0.248  0.9833  0.58  0.493  0.9755 
LINE  0.58  0.487  0.7711  0.89  0.524  0.9859  0.11  0.183  0.8355 
SDNE  0.77  0.747  0.8833  0.88  0.278  0.8945  0.24  0.387  0.8934 
TNE  0.89  0.778  0.6803  0.03  0.013  0.9003  0.01  0.062  0.7278 
DynamicTriad  0.91  0.745  0.7234  0.93  0.469  0.7464  0.27  0.324  0.9534 
HTNE  0.84  0.776  0.9215  0.95  0.528  0.9944  0.40  0.404  0.9804 
MDNE  0.94  0.809  0.9217  0.97  0.543  0.9953  0.25  0.412  0.9853 
M²DNE  0.96  0.823  0.9276  0.99  0.553  0.9964  0.30  0.431  0.9865 
Table 2: Evaluation of network reconstruction. Pre.@K means Precision@K.
Datasets  Metrics  Tr.Ratio  DeepWalk  node2vec  LINE  SDNE  TNE  DynamicTriad  HTNE  MDNE  M²DNE 
Eucore  MacroF1  40%  0.1878  0.1575  0.1765  0.1723  0.0954  0.1486  0.1319  0.1598  0.1365 
60%  0.1934  0.1869  0.1777  0.1834  0.1272  0.1796  0.1731  0.1855  0.1952  
80%  0.2049  0.2022  0.1278  0.1987  0.1389  0.1979  0.1927  0.1948  0.2057  
MicroF1  40%  0.2089  0.2133  0.2266  0.2129  0.2298  0.2310  0.2200  0.2273  0.2311  
60%  0.2245  0.2400  0.1933  0.2321  0.2377  0.2333  0.2400  0.2501  0.2533  
80%  0.2400  0.2660  0.1466  0.2543  0.2432  0.2400  0.2672  0.2702  0.2800  
DBLP  MacroF1  40%  0.6708  0.6607  0.6393  0.5225  0.0580  0.6045  0.6768  0.6883  0.6902 
60%  0.6717  0.6681  0.6499  0.5498  0.1429  0.6477  0.6824  0.6915  0.6948  
80%  0.6712  0.6693  0.6513  0.5998  0.1488  0.6642  0.6836  0.6905  0.6975  
MicroF1  40%  0.6653  0.6680  0.6437  0.5517  0.2872  0.6513  0.6853  0.6892  0.6923  
60%  0.6689  0.6737  0.6507  0.5932  0.2931  0.6680  0.6857  0.6922  0.6947  
80%  0.6638  0.6731  0.6474  0.6423  0.2951  0.6695  0.6879  0.6924  0.6971  
Tmall  MacroF1  40%  0.4909  0.5437  0.4371  0.4845  0.1069  0.4498  0.5481  0.5648  0.5775 
60%  0.4929  0.5455  0.4376  0.4989  0.1067  0.4897  0.5489  0.5681  0.5799  
80%  0.4953  0.5458  0.4397  0.5312  0.1068  0.5116  0.5493  0.5728  0.5847  
MicroF1  40%  0.5711  0.6041  0.5367  0.5734  0.3647  0.5324  0.6253  0.6344  0.6421  
60%  0.5734  0.6056  0.5392  0.5788  0.3638  0.5688  0.6259  0.6369  0.6438  
80%  0.5778  0.6066  0.5428  0.5832  0.3642  0.6072  0.6264  0.6401  0.6465 
5.1.3. Parameter Settings
For a fair comparison, we set the same embedding dimension for all methods. The number of negative samples is set to 5. For M²DNE, we set the number of historical neighbors to 2, 5 and 2, and the balance factor to 0.3, 0.4 and 0.3, for Eucore, DBLP and Tmall, respectively.
5.2. Q1: Accuracy
We evaluate the accuracy of the learned embeddings with two tasks, including network reconstruction and node classification.
5.2.1. Network Reconstruction.
In this task, we train embeddings on the fully evolved network and reconstruct edges based on the proximity between nodes. Following (Chen et al., 2018), for DeepWalk, node2vec, LINE and TNE, we take the inner product between node embeddings as the proximity, due to their inner-product-based optimization objectives. For SDNE, DynamicTriad, HTNE and our models (i.e., MDNE and M²DNE), we calculate the negative squared Euclidean distance. Then, we rank node pairs according to the proximity. As the number of possible node pairs (i.e., $|\mathcal{V}|(|\mathcal{V}|-1)/2$) is too large in DBLP and Tmall, we randomly sample about 1% and 0.1% of the node pairs, respectively, for evaluation, as in (Ou et al., 2016). We report the performance in terms of Precision@K and AUC.
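For reference, the ranking metric used here can be sketched as follows (with hypothetical proximity scores and edge labels):

```python
import numpy as np

def precision_at_k(scores, labels, k):
    """Rank node pairs by proximity score (descending) and return the
    fraction of true edges among the top-K pairs."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    return float(np.mean(np.asarray(labels, dtype=float)[order[:k]]))

# Hypothetical proximities and ground-truth edge indicators.
scores = [0.9, 0.8, 0.1, 0.7]
labels = [1, 0, 0, 1]  # 1 = the pair is a real edge
```

An embedding that places true edges ahead of non-edges in this ranking achieves high Precision@K, which is what the reconstruction task measures.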
Table 2 shows that our proposed MDNE and M²DNE consistently outperform the baselines on AUC. On Precision@K, M²DNE achieves the best performance on Eucore and DBLP, improving over the best competitors by 5.78% and 4.73% in terms of Precision@1000. We believe this significant improvement arises because modeling macro-dynamics constrains the establishment of noisy edges, so the real edges in the network are reconstructed more accurately. Though MDNE only models the micro-dynamics, it captures fine-grained structural properties with the temporal attention point process and hence still outperforms all baselines on Eucore and DBLP. One potential reason why M²DNE does not perform as well on Tmall is that the evolutionary pattern of short-term purchase behaviors is not significant, which leads to slightly worse performance for the temporal network embedding models (i.e., TNE, DynamicTriad, HTNE, MDNE and M²DNE). However, M²DNE still performs better than the other temporal network embedding methods in most cases, which indicates the necessity of jointly capturing micro- and macro-dynamics for temporal network embedding.
5.2.2. Node Classification.
After learning the node embeddings on the fully evolved network, we train a logistic regression classifier that takes the node embeddings as input features. The ratio of the training set is set to 40%, 60%, and 80%. We report the results in terms of Macro-F1 and Micro-F1 in Table 3.
As we can observe, our MDNE and M²DNE achieve better performance than all baselines in all cases except one. Specifically, compared with methods for static networks (i.e., DeepWalk, node2vec, LINE and SDNE), the good performance of MDNE and M²DNE suggests that the formation process of network structures preserved in our models provides effective information that makes the embeddings more discriminative. Compared with methods for temporal networks (i.e., TNE, DynamicTriad and HTNE), our MDNE and M²DNE capture the local and global structures aggregated from neighbors via the hierarchical temporal attention mechanism, which enhances the accuracy of the structure-preserving embeddings. Besides, M²DNE encodes high-level structures in the latent embedding space, which further improves the classification performance. From a vertical comparison, MDNE and M²DNE continue to perform best across different sizes of training data in almost all cases, which implies the stability and robustness of our models.
5.3. Q2: Dynamics
We study the effectiveness of M²DNE in capturing temporal information in networks via temporal node recommendation and link prediction. As temporal evolution is a long-term process, while edges in Tmall exhibit a less significant evolution pattern, we conduct these experiments on Eucore and DBLP. Specifically, given a test timestamp $t$, we train the node embeddings on the network before time $t$ (exclusive) and evaluate the prediction performance from time $t$ (inclusive) onward. For Eucore, we use the first 500 timestamps as training data due to its long evolution time. For DBLP, we train the embeddings on the first 26 timestamps.
5.3.1. Temporal Node Recommendation.
For each node in the network before time , we predict the top possible neighbors of at . We calculate the ranking score as the setting in network reconstruction task, and then derive the top nodes with the highest score as candidates. This task is mainly used to evaluate the performance of temporal network embedding methods. However, in order to provide a more comprehensive result, we also compare our method against one popular static method, i.e., DeepWalk.
The experimental results with respect to Recall@K and Precision@K are reported in Figure 3. We can see that our models MDNE and M²DNE perform better than all the baselines in terms of both metrics. Compared with the best competitor (i.e., HTNE), the recommendation performance of M²DNE improves by 10.88% and 8.34% in terms of Recall@10 and Precision@10 on Eucore. On DBLP, the improvement is 6.05% and 11.69% with respect to Recall@10 and Precision@10. These significant improvements verify that the temporal attention point process proposed in MDNE and M²DNE is capable of modeling fine-grained structures and the dynamic pattern of the network. Additionally, the significant improvement of M²DNE benefits from the high-level constraints of macro-dynamics on network embeddings, which encode the inherent evolution of the network structure and are thus helpful for temporal prediction tasks.
Table 4. Temporal link prediction performance (ACC and F1) on Eucore and DBLP.

Methods        Eucore            DBLP
               ACC      F1       ACC      F1
DeepWalk       0.8444   0.8430   0.7776   0.7778
node2vec       0.8591   0.8583   0.8128   0.8059
LINE           0.7837   0.7762   0.6711   0.6756
SDNE           0.7533   0.7908   0.6971   0.6867
TNE            0.6932   0.6691   0.5027   0.4799
DynamicTriad   0.6775   0.6611   0.6189   0.6199
HTNE           0.8539   0.8498   0.8123   0.8157
MDNE           0.8649   0.8585   0.8292   0.8239
M²DNE          0.8734   0.8681   0.8336   0.8341
Table 5. Scale prediction results: predicted number of edges |Ê|, real number of edges |E|, and absolute error (A.E.).

Methods        Eucore                         DBLP                                     Tmall
               |Ê|       |E|       A.E.       |Ê|           |E|      A.E.              |Ê|               |E|        A.E.
DeepWalk       444,539   24,929    419,610    335,916,746   162,451  335,754,295       118,381,361,880   2,992,964  118,378,368,916
node2vec       479,583   24,929    454,654    363,253,815   162,451  363,091,364       135,349,949,950   2,992,964  135,346,956,986
LINE           278,175   24,929    253,246    363,567,406   162,451  363,404,955       135,763,029,298   2,992,964  135,760,036,334
SDNE           396,752   24,929    371,823    361,748,486   162,451  361,586,035       134,748,693,450   2,992,964  134,745,700,486
TNE            485,584   332,334   153,250    389,257,712   236,894  389,020,818       166,630,196,186   4,807,545  166,625,388,641
DynamicTriad   485,605   332,334   163,271    394,369,570   236,894  394,132,676       165,467,872,223   4,807,545  165,463,064,678
HTNE           203,012   332,334   129,322    173,501,036   236,894  173,264,142       82,716,705,256    4,807,545  82,711,897,711
MDNE           203,776   332,334   128,558    173,205,229   236,894  172,968,335       82,702,894,887    4,807,545  82,698,087,342
M²DNE          349,157   332,334   16,823     222,993       236,894  13,901            3,855,548         4,807,545  951,997
5.3.2. Temporal Link Prediction.
We learn node embeddings on the network before time t and predict the edges established at time t. Given nodes u and v, we define the edge's representation as a combination of their node embeddings. We take edges built at time t as positive examples, and randomly sample the same number of negative edges (i.e., node pairs sharing no link). We then train a logistic regression classifier on these datasets.
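The evaluation pipeline above can be sketched as follows. Two caveats: the Hadamard product used for the edge representation is an assumption (the paper's exact combination is not reproduced here), and the tiny gradient-descent logistic regression merely stands in for any off-the-shelf classifier.

```python
import numpy as np

def edge_features(emb, pairs):
    """Edge representation from the two endpoint embeddings.
    Hadamard product is one common choice -- an assumption here."""
    return np.array([emb[u] * emb[v] for u, v in pairs])

def train_logreg(X, y, lr=0.1, epochs=500):
    """Minimal logistic regression via gradient descent (a sketch)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
        grad = p - y                             # gradient of log loss
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

rng = np.random.default_rng(1)
emb = rng.normal(size=(20, 8))                   # toy node embeddings
pos = [(i, i + 1) for i in range(10)]            # toy positive edges
neg = [(i, i + 10) for i in range(10)]           # toy negative samples
X = edge_features(emb, pos + neg)
y = np.array([1] * 10 + [0] * 10)
w, b = train_logreg(X, y)
preds = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(int)
acc = (preds == y).mean()
```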
As shown in Table 4, our methods MDNE and M²DNE consistently outperform all the baselines on both metrics (accuracy and F1). We believe the worse performance of TNE and DynamicTriad is due to the fact that they model the dynamics with network snapshots, which cannot reveal the temporal process of edge formation. Though HTNE takes the dynamic process into account, it ignores the evolutionary pattern of the network size and only retains network structures with unilateral neighborhood structures. In contrast, our proposed M²DNE captures dynamics from both microscopic and macroscopic perspectives, which embeds more temporal and structural information into the node representations.
5.4. Q3: Tendency
As M²DNE captures both the micro- and macro-dynamics, our model can be applied not only to traditional applications but also to applications related to macro-dynamics analysis. Here, we perform the scale prediction and trend forecast tasks.
5.4.1. Scale Prediction
In this task, we predict the number of edges at a certain time. We train node embeddings on the networks before the first 500, 26 and 180 timestamps on Eucore, DBLP and Tmall, respectively, and then predict the cumulative number of edges at the next timestamp. Since none of the baselines can predict the number of edges directly, we compute a similarity score for each node pair from its learned embeddings, and count an edge connecting nodes u and v at the next timestamp whenever the score exceeds a threshold. In contrast, our method can directly calculate the number of edges with Eq. (10).
The edge prediction results with respect to absolute error (i.e., A.E.) are reported in Table 5, where |Ê| is the predicted number of edges and |E| is the real number. Note that we only count static edges for models designed for static network embedding, while for models designed for dynamic networks we count temporal edges, which is why the values of |E| differ in Table 5. Apparently, M²DNE predicts the number of edges much more accurately, which verifies that the embedding-parameterized dynamics equation (i.e., Eq. (10)) is capable of modeling the evolution pattern of the network scale. The prediction errors of the comparison methods are very large, because they cannot capture, or can only capture, the local evolution pattern of networks while ignoring the global evolution trend.
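The absolute-error metric in Table 5 is simply the gap between predicted and real cumulative edge counts, as this small check against one of the reported Eucore rows shows:

```python
def absolute_error(predicted_edges, real_edges):
    """A.E. as used in Table 5: |predicted - real| cumulative edge counts."""
    return abs(predicted_edges - real_edges)

# One Eucore-style row: a prediction of 349,157 temporal edges
# against 332,334 real ones.
ae = absolute_error(349_157, 332_334)
```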
5.4.2. Trend Forecast
Different from scale prediction at a certain time, in this task we forecast the overall trend of the network scale. Given a timestamp t, we learn the node embeddings and the dynamics parameters on the network before time t (not included). Based on the learned embeddings and parameters, we forecast the evolution trend of the network scale and draw the evolution curves. Since none of the baselines can forecast the network scale, we only perform this task with our method under different training timestamps. We set the training timestamp to different fractions of the timestamp sequence and forecast the remaining timestamps with Eq. (10).
As shown in Figure 4, M²DNE fits the number of edges well using the network embeddings and parameters learned via the dynamics equation (i.e., Eq. (10)), which proves that the designed linking rate (i.e., Eq. (11)) effectively bridges the dynamics of network evolution and the embeddings of network structures. Moreover, as the training timestamp increases, the prediction errors decrease, which is consistent with common sense: more training data helps the learned node embeddings better capture the evolutionary pattern. We also notice that the forecast accuracy is slightly worse on Tmall, owing to the less significant short-term evolutionary pattern of purchase behaviors.
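The fit-then-extrapolate procedure can be sketched with a generic growth-curve fit. To be clear, the power-law form e(t) = a·t^b below is a stand-in assumption, not the paper's embedding-parameterized dynamics equation (Eq. (10)); the sketch only illustrates fitting on an early prefix of timestamps and forecasting the rest.

```python
import numpy as np

def fit_power_law(ts, counts):
    """Fit e(t) = a * t^b by linear least squares in log space --
    a generic stand-in for the dynamics equation, which is not
    reproduced here."""
    log_t, log_e = np.log(ts), np.log(counts)
    b, log_a = np.polyfit(log_t, log_e, 1)   # slope = b, intercept = log a
    return np.exp(log_a), b

def forecast(a, b, ts):
    """Extrapolate the fitted curve to later timestamps."""
    return a * np.asarray(ts, dtype=float) ** b

ts = np.arange(1, 11, dtype=float)
counts = 3.0 * ts ** 1.5                 # synthetic cumulative edge counts
a, b = fit_power_law(ts[:6], counts[:6]) # train on an early prefix only
pred = forecast(a, b, ts[6:])            # forecast the remaining timestamps
err = np.max(np.abs(pred - counts[6:]))
```

Training on a longer prefix (a larger training timestamp) generally tightens such a fit, mirroring the observation above that more training data reduces prediction error.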
5.5. Parameter Analysis
Since the number of historical neighbors determines the captured neighborhood structures and the number of negative samples affects model optimization, we explore the sensitivity of these two parameters on the DBLP dataset.
5.5.1. Number of Historical Neighbors
From Figure 5(a), we notice that M²DNE performs stably and achieves the highest F1 score when the number of historical neighbors is 5. Since the historical neighbors model the formation of network structures, their number influences the performance of the model. To strike a balance between performance and complexity, we set the number of neighbors to a small value.
5.5.2. Number of Negative Samples.
As shown in Figure 5(b), the performance of M²DNE improves with the increase in the number of negative samples, and achieves the best performance once the number of negative samples reaches around 5. Overall, the performance of M²DNE is stable, which shows that our model is not very sensitive to the number of negative samples.
6. Conclusion
In this paper, we make the first attempt to explore temporal network embedding from both microscopic and macroscopic perspectives. We propose M²DNE, a novel temporal network embedding method with micro- and macro-dynamics, where a temporal attention point process is designed to capture structural and temporal properties at a fine-grained level, and a general dynamics equation parameterized with network embeddings is presented to encode high-level structures by constraining the inherent evolution pattern. Experimental results demonstrate that M²DNE outperforms state-of-the-art baselines in various tasks. One future direction is to generalize our model to incorporate the shrinking dynamics of temporal networks.
7. Acknowledgments
This work is supported in part by the National Natural Science Foundation of China (No. 61772082, 61702296, 61806020), the National Key Research and Development Program of China (2017YFB0803304), the Beijing Municipal Natural Science Foundation (4182043), and the 2018 and 2019 CCF-Tencent Open Research Fund. This work is also supported in part by NSF under grants III-1526499, III-1763325, III-1909323, SaTC-1930941 and CNS-1626432, and in part by NSF grants CNS-1618629, CNS-1814825, CNS-1845138, OAC-1839909 and III-1908215, and NIJ grant 2018-75-CX-0032.
References
 A comprehensive survey of graph embedding: problems, techniques and applications. IEEE Transactions on Knowledge and Data Engineering. Cited by: §1.
 PME: projected metric embedding on heterogeneous networks for link prediction. In Proceedings of SIGKDD, pp. 1177–1186. Cited by: §5.2.1.
 A survey on network embedding. IEEE Transactions on Knowledge and Data Engineering. Cited by: §1, §2.
 Euclidean distance mapping. Computer Graphics and image processing 14 (3), pp. 227–248. Cited by: footnote 1.
 Metapath2vec: scalable representation learning for heterogeneous networks. In Proceedings of SIGKDD, pp. 135–144. Cited by: §1, §2.
 Dynamic network embedding: an extended approach for skip-gram based network embedding. In Proceedings of IJCAI, pp. 2086–2092. Cited by: §1, §2.
 COEVOLVE: a joint point process model for information diffusion and network coevolution. In Proceedings of NIPS, pp. 1954–1962. Cited by: §3.2.
 DynGEM: deep embedding method for dynamic graphs. In Proceedings of IJCAI, Cited by: §2.
 Node2vec: scalable feature learning for networks. In Proceedings of SIGKDD, pp. 855–864. Cited by: §1, §2, 2nd item.
 Triadic closure pattern analysis and prediction in social networks. IEEE Transactions on Knowledge and Data Engineering 27 (12), pp. 3374–3389. Cited by: §1.
 Semi-supervised classification with graph convolutional networks. In Proceedings of ICLR, Cited by: §2.
 Graphs over time: densification laws, shrinking diameters and possible explanations. In Proceedings of SIGKDD, pp. 177–187. Cited by: §1, §4.3.1, §4.3.
 The fundamental advantages of temporal networks. Science 358 (6366), pp. 1042–1046. Cited by: §1.
 Attributed network embedding for learning in a dynamic environment. In Proceedings of CIKM, pp. 387–396. Cited by: §2.

 Relation structure-aware heterogeneous information network embedding. In Proceedings of AAAI, pp. 4456–4463. Cited by: §2.
 An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics 11 (2), pp. 431–441. Cited by: §4.4.
 The neural hawkes process: a neurally self-modulating multivariate point process. In Proceedings of NIPS, pp. 6754–6764. Cited by: §3.2.
 Distributed representations of words and phrases and their compositionality. In Proceedings of NIPS, pp. 3111–3119. Cited by: §2, §4.4.
 Continuous-time dynamic network embeddings. In Proceedings of WWW, pp. 969–976. Cited by: §2.
 Asymmetric transitivity preserving graph embedding. In Proceedings of SIGKDD, pp. 1105–1114. Cited by: §5.2.1.
 Deepwalk: online learning of social representations. In Proceedings of SIGKDD, pp. 701–710. Cited by: §1, §2, 1st item.
 Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. In Proceedings of WSDM, pp. 459–467. Cited by: §1, §2.
 Heterogeneous information network embedding for recommendation. IEEE Transactions on Knowledge and Data Engineering. Cited by: §2.
 Easing embedding learning by comprehensive transcription of heterogeneous information networks. In Proceedings of SIGKDD, pp. 2190–2199. Cited by: §4.4.
 Line: large-scale information network embedding. In Proceedings of WWW, pp. 1067–1077. Cited by: §1, 3rd item.
 Representation learning over dynamic graphs. arXiv preprint arXiv:1803.04051. Cited by: §1, §2.
 Graph attention networks. In Proceedings of ICLR, Cited by: §2.
 Structural deep network embedding. In Proceedings of SIGKDD, pp. 1225–1234. Cited by: §1, §2, 4th item.
 SHINE: signed heterogeneous information network embedding for sentiment link prediction. In Proceedings of WSDM, pp. 592–600. Cited by: §2.
 Community preserving network embedding.. In Proceedings of AAAI, pp. 203–209. Cited by: §2.
 Netwalk: a flexible deep embedding approach for anomaly detection in dynamic networks. In Proceedings of SIGKDD, pp. 2672–2681. Cited by: §2.
 Beyond sigmoids: the nettide model for social network growth, and its applications. In Proceedings of SIGKDD, pp. 2015–2024. Cited by: §1, §4.3.
 ANRL: attributed network representation learning via deep neural networks. In Proceedings of IJCAI, pp. 3155–3161. Cited by: §2.
 Dynamic network embedding by modeling triadic closure process. In Proceedings of AAAI, Cited by: §1, §2, 6th item.
 High-order proximity preserved embedding for dynamic networks. IEEE Transactions on Knowledge and Data Engineering. Cited by: §2.
 Scalable temporal latent space inference for link prediction in dynamic social networks. IEEE Transactions on Knowledge and Data Engineering 28 (10), pp. 2765–2777. Cited by: 5th item.
 Embedding temporal network via neighborhood formation. In Proceedings of SIGKDD, pp. 2857–2866. Cited by: §1, §2, 7th item.