1 Introduction
Network-structured data, such as social networks and transportation networks, is now ubiquitous in daily life and research. Real-world networks are often large and complicated, which makes them expensive to use directly.
Network embedding means learning a low-dimensional representation, e.g., a numerical vector, for every node in a network. After embedding, other data-driven algorithms that need node features as input can be run directly in the low-dimensional space. Network embedding is essential to traditional tasks such as link prediction, recommendation and classification.
There are mainly two approaches to network embedding: a) Singular Value Decomposition (SVD) based methods [SVD], which have proven successful in many important network applications. They decompose the adjacency matrix or Laplacian matrix to obtain the node representations. b) Deep learning based methods. Many deep learning based algorithms try to merge structural information into the nodes to obtain the low-dimensional representation [SDNE, deepwalk, line, node2vec]. The algorithms mentioned above are suitable for static networks, in which all the nodes, edges and features are known and fixed before learning. However, many networks are highly dynamic in nature. For example, social networks, financial transaction networks, telephone call networks, etc., change all the time and carry much information during network evolution. So when the nodes or edges of the network change, these algorithms need to be rerun on the whole network data, and it usually takes a long time to learn the embedding again. Online learning of network embedding involves temporal analysis, which is similar to dynamic system modelling [li2018symbolic, gong2018sequential, chen2014cognitive] and its further analysis and extensions [ChenTRY14, chen2013model, gong2016model, chen2015model].
Most dynamic network embedding algorithms are built on static network algorithms, and they encounter, to varying degrees, the following challenges:

Network structure preservation: some algorithms learn representations of new nodes by performing information propagation [propagation], or by optimizing a loss that encourages smooth changes between linked nodes [harmonic, nonparametric]. There are also methods that aim to learn a mapping from node features to representations by imposing a manifold regularizer derived from the graph [Manifold]. But these methods do not preserve intricate network properties when inferring representations of new nodes.

Growing graphs: the Structural Deep Network Embedding (SDNE) [SDNE] method and the Deeply Transformed High-order Laplacian Gaussian Process (DepthLGP) [DepthLGP] both use a deep neural network to learn representations while considering the network structure. But SDNE cannot handle node changes and DepthLGP cannot handle edge changes. The SVD based algorithms cannot handle growing graphs either. Incremental SVD methods [fastSVD1, fastSVD2] have been proposed to update previous SVD results to incorporate changes without restarting the algorithm. But they can only deal with edge changes, and when errors accumulate, they still need to rerun SVD to correct them.

Information of evolving graphs: the Dynamic Graph Embedding Model (DynGem) [dyngem] uses a dynamically expanding deep autoencoder to preserve the network structure and deal with growing graphs. However, it only trains on the current network, starting from the old parameters, and discards the information produced by the network during its evolution.
To improve dynamic network embedding, we propose Recurrent Neural Network Embedding (RNNE), a neural network model, which is shown in Figure 1. In response to the three challenges mentioned above, RNNE adopts the following approaches in the three main parts of the model (Pretreatment, Training Window and Training Model):

Network structure preservation: in Pretreatment, RNNE calculates the node features from multi-step probability transition matrices, trying to preserve the structural characteristics of larger neighborhoods of each node than the adjacency matrix alone provides. In Training Model, the loss function considers the first-order proximity and the high-order proximity together. (The first-order proximity is determined by whether there is a link between two nodes; the high-order proximity means the similarity between the neighborhood structures of two nodes.)
Growing graphs: RNNE first puts some virtual nodes into the network. When new nodes arrive, RNNE replaces virtual nodes with the new nodes in Pretreatment. Similarly, if a node is deleted, RNNE replaces it with a virtual node.

Information of evolving graphs: the overall structure of Training Model is an RNN. The previous node representations are fed as the hidden state to the RNNE cell, so more information of the evolving graphs can be used during embedding. Considering that the representations of one node at different times should be close if the node's characteristics do not change, RNNE also adds a corresponding term to the loss function to maintain the stability of the embedding. (Stability means reducing the effects of noise from network fluctuation over time.)
The main contributions of this paper are listed as follows:

RNNE considers the first-order proximity and the high-order proximity during training, so it can preserve the original network structure.

With virtual nodes, RNNE can unify the sizes of networks at different times and easily extract the changing part of the network.

RNNE takes graph sequences as input and can integrate information of evolving graphs when embedding. This helps mitigate the effects of network fluctuation over time.
2 The RNNE model
2.1 Problem Definition
Given a dynamic network G whose nodes and edges may change as time goes on, and a series of graphs G_1, G_2, ..., G_T, where G_t is the state of G at time t, learn for each node v of G_t a representation y_v^{(t)} in R^d, where d is a positive integer given in advance.
2.2 Model Description
First, RNNE assumes that the network series is stable: there will not be too many nodes changing at the same time, and the increase in edge weights is nearly linear. Second, the size of the model is limited, so RNNE also assumes that the network will not become too large as time goes on.
RNNE does not process the whole series at once, because old networks may be invalid and a long series would take a lot of time. RNNE maintains a fixed-length window of networks to train on, and a concept drift checking step excludes the nodes whose properties may have changed.
A node in a dynamic network cannot be represented only by its state at one moment. So RNNE uses not only the current state but also the previous states when learning the embedding. Borrowing from recurrent neural networks (RNN) [RNN], RNNE uses a hidden state to represent the previous states of a node. In general, RNNE first chooses suitable nodes, then uses the hidden state and the node feature as input to minimize the loss of node proximity at neighboring time points. The entire process is explained in detail in the following subsections.
2.3 Pretreatment
Each node has an attribute named "state". At the beginning, every node's "state" is "normal", which means it is an ordinary node:

v_i["state"] = "normal",

where v_i is the i-th node in G_t. Then, in order to keep the input size fixed, we define a type of node named virtual node. A virtual node is not connected to any other node, and

v_i["state"] = "virtual", if v_i is a virtual node.
If the number of nodes in G_t does not reach the limit of the model, which is N, then we add virtual nodes into G_t until |G_t| = N.
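As a sketch (the function name and API are our own illustration, not the paper's code), padding a snapshot's adjacency matrix with isolated virtual nodes up to a fixed limit N might look like:

```python
import numpy as np

def pad_with_virtual_nodes(adj, limit):
    """Pad an adjacency matrix with isolated 'virtual' nodes up to `limit`.

    Returns the padded matrix and a state list: real nodes are 'normal',
    padding nodes are 'virtual'. Virtual nodes have no edges, so the new
    rows and columns stay all-zero.
    """
    n = adj.shape[0]
    if n > limit:
        raise ValueError("snapshot exceeds the model's node limit")
    padded = np.zeros((limit, limit), dtype=adj.dtype)
    padded[:n, :n] = adj
    states = ["normal"] * n + ["virtual"] * (limit - n)
    return padded, states

# Example: a 3-node snapshot padded to the limit N = 5.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
P, states = pad_with_virtual_nodes(A, 5)
```

When a new node arrives, one virtual slot is re-labeled "normal" and its row/column filled in; deletion reverses the step.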
Before training starts, we add the training networks to the training window one by one. When a new network arrives, if it is the first network of the series, it can be put into the training window directly. Otherwise, RNNE puts in the subgraph of the part that changed relative to the last network. If the training window is full, RNNE removes the earliest network from the window and then checks every node in the window to keep dangerous nodes out of training.
Assume the window size is 5 and there are 4 networks G_1, G_2, G_3, G_4 in the window when G_5 arrives. For every node v_i in G_5, if the states of v_i^{(4)} and v_i^{(5)} are both not "virtual", we calculate a change score for v_i using the corresponding rows of the two adjacency matrices. Most of the time, v_i^{(4)} and v_i^{(5)} are the same node at different times. Finally, we use the Grubbs test [grubuustest1, grubuustest2] to find the dangerous nodes, whose properties have most likely changed:

v_i["state"] = "dangerous", if v_i is a dangerous node.
The process described above is presented in Algorithm 1.
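The Grubbs statistic behind this drift check is easy to compute. The sketch below is our own illustration: the critical value is supplied by the caller (in practice it comes from the t-distribution for the chosen significance level and sample size) rather than derived here.

```python
import numpy as np

def grubbs_statistic(scores):
    """Return (G, index) for the sample furthest from the mean.

    G = max|x_i - mean| / std is the Grubbs test statistic; comparing it
    against a critical value decides whether the most extreme sample
    is an outlier.
    """
    scores = np.asarray(scores, dtype=float)
    mean, std = scores.mean(), scores.std(ddof=1)
    devs = np.abs(scores - mean)
    idx = int(devs.argmax())
    return devs[idx] / std, idx

def flag_dangerous(scores, critical):
    """Flag the node whose change score is a Grubbs outlier, if any."""
    g, idx = grubbs_statistic(scores)
    return [idx] if g > critical else []

# Node change scores, e.g. distances between consecutive adjacency rows.
scores = [0.1, 0.12, 0.09, 0.11, 2.5]
```

For n = 5 samples at significance 0.05, the one-pass critical value is roughly 1.715; the last node above would be flagged "dangerous".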
In order to preserve the high-order proximity, we calculate the node features as follows. Assume A is the adjacency matrix of G_t. First we define a function g:

g(M)_{ij} = M_{ij} / max_k M_{ik},   (1)

i.e., each element in M is divided by the largest element in the same row. The feature matrix X is then calculated from the multi-step transition matrices:

X = g(A) + g(A^2) + ... + g(A^K).   (2)
2.4 Training
The RNNE model is shown in Figure 1. Suppose there are n networks G_{t-n+1}, ..., G_t in the window; the symbols are explained in Table 1.
Symbol  Definition

v_i^{(t)}, v_i^{(t')}  the same node at different time points
N  the node size of G_t after adding virtual nodes
d  the embedding size
m  the batch size of training
x_i^{(t)}  the feature of the i-th node in G_t, calculated as Eq. 2
x̂_i^{(t)}  the reconstructed data of x_i^{(t)}
A^{(t)}  the adjacency matrix of G_t
h_i^{(t)}  the hidden state of v_i^{(t)}
y_i^{(t)}  the representation of v_i^{(t)}
s_i  the state of v_i: "normal", "virtual" or "dangerous"
First, RNNE samples a batch of m nodes from the N nodes; a node can be chosen only when its state is "normal". Assume node indices i_1, ..., i_m are selected; then we get the input matrix series X^{(t-n+1)}, ..., X^{(t)}.
The RNN cell of our model has an encoder-decoder structure. The encoder consists of multiple non-linear functions that map the input data to the representation space. The decoder also consists of multiple non-linear functions mapping the representations back to the reconstruction space. Given the input x_i^{(t)} and the hidden state h_i^{(t-1)}, the calculation is as follows:

y_i^{(t),1} = σ(W^{(1)} [x_i^{(t)}; h_i^{(t-1)}] + b^{(1)}),
y_i^{(t),k} = σ(W^{(k)} y_i^{(t),k-1} + b^{(k)}), k = 2, ..., K,   (3)

where y_i^{(t)} = y_i^{(t),K} is the embedding and the decoder mirrors the encoder to produce x̂_i^{(t)}.
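A minimal single-layer sketch of such a cell follows, assuming the hidden state is concatenated with the feature before encoding; the weight shapes and the exact wiring are our assumptions, not the paper's definitive architecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnne_cell_forward(x, h_prev, params):
    """One encode-decode pass: feature + previous hidden state -> embedding,
    embedding -> reconstructed feature."""
    W_enc, b_enc, W_dec, b_dec = (params[k] for k in ("W_enc", "b_enc", "W_dec", "b_dec"))
    inp = np.concatenate([x, h_prev])   # [feature; previous hidden state]
    y = sigmoid(W_enc @ inp + b_enc)    # embedding y_i^{(t)}
    x_hat = sigmoid(W_dec @ y + b_dec)  # reconstruction of x_i^{(t)}
    return y, x_hat

rng = np.random.default_rng(0)
N, d = 6, 3                              # feature dim, embedding dim (illustrative)
params = {
    "W_enc": rng.standard_normal((d, N + d)) * 0.1,
    "b_enc": np.zeros(d),
    "W_dec": rng.standard_normal((N, d)) * 0.1,
    "b_dec": np.zeros(N),
}
x, h = rng.random(N), np.zeros(d)
y, x_hat = rnne_cell_forward(x, h, params)
```

In the full model the encoder and decoder each stack several such layers, with y then fed forward as the next hidden state.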
The goal of the autoencoder is to minimize the reconstruction error between the output and the input. The loss is calculated as below:

L_recon = Σ_i ||x̂_i^{(t)} − x_i^{(t)}||_2^2.   (4)
As [hashing] mentioned, although minimizing the reconstruction loss does not explicitly preserve the similarity between samples, the reconstruction criterion can smoothly capture the data manifolds and thus preserve that similarity. It means that if the inputs are similar, the outputs will likely be similar. In other words, if the features of two nodes are similar, the embeddings of the nodes are similar. Simultaneously, by reconstructing the node feature from the embedding, we can make sure that the embedding vector contains enough information to represent the node.
There are many zero elements in x_i^{(t)} and x̂_i^{(t)}, but we are in fact more concerned with the non-zero part. Learning from SDNE [SDNE], when calculating the reconstruction error we give different weights to zero and non-zero elements. The new loss function is shown below:

L_2nd = Σ_i ||(x̂_i^{(t)} − x_i^{(t)}) ⊙ b_i||_2^2,   (5)

where ⊙ means the Hadamard product and b_i = {b_{ij}}: if x_{ij} = 0 then b_{ij} = 1, else b_{ij} = β > 1. Using this loss function, nodes that have similar neighborhood structures will be mapped close together. It means that our model can keep the global network structure by keeping high-order proximity between nodes.
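Under the SDNE-style convention above (weight 1 for zero entries, β for non-zero ones), the weighted loss for one node can be sketched as:

```python
import numpy as np

def weighted_recon_loss(x, x_hat, beta):
    """Reconstruction loss with extra weight beta (> 1) on non-zero entries:
    zero entries get weight 1, non-zero entries weight beta."""
    b = np.where(x == 0, 1.0, beta)
    return float(np.sum(((x_hat - x) * b) ** 2))

# A toy feature row and its reconstruction.
x = np.array([0.0, 1.0, 0.0, 0.5])
x_hat = np.array([0.1, 0.8, 0.0, 0.5])
```

With beta = 5, the error on the second (non-zero) entry dominates the total, which is exactly the intended emphasis on observed links.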
In addition to the neighborhood structure of different nodes, we should also pay attention to the local structure, i.e., the direct links between nodes. We use the first-order proximity to measure the local structure of the network. The loss function is shown below:

L_1st = Σ_{i,j} a_{ij} ||y_i^{(t)} − y_j^{(t)}||_2^2.   (6)

If a_{ij} > 0, there is a direct link between nodes i and j, and we hope they can be mapped near each other in the embedding space.
So far we have only considered one network in the series, though the hidden state transfers information between them. When we sample the training nodes, all of their states are "normal", so the representations of one node at different times should be as close as possible, without the influence of noise. The loss function of this part is shown as follows:

L_t = Σ_i ||y_i^{(t)} − y_i^{(t−1)}||_2^2.   (7)
In summary, to keep the first-order proximity, the second-order proximity and the stability in the time series, we combine Eqs. 5-7 to get the integrated final loss function:

L = L_2nd + α L_1st + γ L_t,   (8)

where α and γ, together with the β in Eq. 5, are the hyperparameters of the model.
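A hedged sketch of the remaining two terms and their combination, with `alpha` and `gamma` as the weights named above (function names are ours):

```python
import numpy as np

def first_order_loss(Y, A):
    """Sum of a_ij * ||y_i - y_j||^2 over all node pairs (first-order proximity)."""
    loss = 0.0
    n = A.shape[0]
    for i in range(n):
        for j in range(n):
            if A[i, j] != 0:
                diff = Y[i] - Y[j]
                loss += A[i, j] * float(diff @ diff)
    return loss

def temporal_loss(Y_now, Y_prev):
    """Sum of ||y_i^{(t)} - y_i^{(t-1)}||^2, penalizing embedding drift over time."""
    return float(np.sum((Y_now - Y_prev) ** 2))

def total_loss(recon, first, temp, alpha, gamma):
    """Combined objective: reconstruction + alpha * first-order + gamma * temporal."""
    return recon + alpha * first + gamma * temp

# Two nodes with a single link between them, compared against zero embeddings.
Y = np.array([[1.0, 0.0], [0.0, 1.0]])
A = np.array([[0, 1], [1, 0]])
```

Linked nodes with distant embeddings inflate the first-order term, while embeddings that move between snapshots inflate the temporal term.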
Finally, the model parameters θ can be adjusted by gradient descent:

θ ← θ − η ∂L/∂θ,   (9)

where η is the learning rate of the model.
The whole training process can be seen in Algorithm 2.
2.5 Analysis and Discussions
In this section, some analysis and discussions of RNNE are presented.
RNNE assumes a limit N on the node size in each snapshot of the network; usually the node size is less than N. If the node size becomes larger than N during network evolution, we can expand the layer sizes of the RNNE cell while retaining the old parameters, an approach learned from DynGem [dyngem].
The training complexity of RNNE in one iteration is O(nmNh), where n is the size of the training window, m is the batch size, N is the limit of the node size in the network, d is the embedding size, and h is the maximum size of the hidden layers. Usually n, d and h are constants given in advance, m is linear in the true node size of the network, and h is related to the embedding size but not to the node size. So the training complexity of RNNE in one iteration is linear in the node size of the network.
3 Experiments
In this section, we introduce the methods and datasets which are used to evaluate the RNNE algorithm.
3.1 Dataset
We use static networks and dynamic networks to evaluate the RNNE algorithm. Some of the datasets do not have label information, so they are not used in the classification experiment. For static networks, we randomly change some of the nodes and edges to generate a series of networks. Every dataset series has length 14.
dataset  nodes  edges

Wiki  2405-2724  17981-27754
email-Eu-core  1005-1242  25571-43249
blogCatalog  10312-10651  333983-624250
CA-CondMat  23133-23252  93468-176492
CA-HepPh  12008-12337  118505-209384
Wiki: It is a reference network of Wikipedia pages and each node has a label. There are 17 categories in this dataset.

blogCatalog [blog], email-Eu-core [CA]: They are social networks of people. There are 39 categories in blogCatalog and 42 in email-Eu-core.

CA-CondMat, CA-HepPh [CA]: They are collaboration networks from Arxiv. These two datasets are only used for reconstruction and link prediction because we have no label information for their nodes.
3.2 Baselines and Parameters
We use the following algorithms as the baselines of the experiments. Static network embedding algorithms are applied to each snapshot of the dynamic network.

SDNE [SDNE]: It also uses an autoencoder structure, and learns embeddings by minimizing a loss over the first-order and second-order proximity.
Line [line]: It does not define a function to calculate the network embedding but learns a map from node to embedding directly. Its loss function also considers the first-order and second-order proximity.

GraRep [grarep]: It considers high-order proximity and uses SVD to get the network embedding.
Hope [hope]: It constructs an asymmetric relation matrix from the adjacency matrix and then uses JDGSVD [jdgsvd] to get the low-dimensional representation.
For RNNE, the layer size (the layer size is a list of the numbers of neurons at each encoder layer; the last number is the dimension of the output node representation) of the autoencoder is different for each dataset. It is shown in Table 3.

dataSet  layer size

Wiki  5128-200-128
email-Eu-core  2128-128
blogCatalog  15128-1000-128
CA-CondMat  25128-2500-1000-128
CA-HepPh  15128-1500-128
The hyperparameters α, β and γ are adjusted by grid search.
3.3 Evaluation Metrics
In our experiments, we test three tasks of reconstruction, classification and link prediction.
In reconstruction and link prediction, we use precision@k, whose definition is shown below, to measure the performance of the algorithms.

For a graph G = (V, E):

precision@k = |{(i, j) : (i, j) ∈ E, rank(i, j) ≤ k}| / k,

where rank(i, j) is the ranked index of the pair (i, j) among the predicted edges.
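A minimal implementation of precision@k as defined above (the ranking of candidate pairs is assumed to be done by the caller):

```python
def precision_at_k(ranked_pairs, true_edges, k):
    """Fraction of the top-k predicted node pairs that are true edges."""
    hits = sum(1 for pair in ranked_pairs[:k] if pair in true_edges)
    return hits / k

# Candidate pairs sorted by predicted score, and the ground-truth edge set.
ranked = [(0, 1), (1, 2), (0, 2), (2, 3)]
truth = {(0, 1), (0, 2)}
```

With this example, the top-2 list contains one true edge, so precision@2 is 0.5.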
In classification, we use Macro-F1 and Micro-F1 to measure the performance of the algorithms. For a label l, TP(l), FP(l) and FN(l) are the numbers of true positives, false positives and false negatives among the instances which are predicted as l, and C is the label set:

Macro-F1 = (1/|C|) Σ_{l∈C} F1(l),
Micro-F1 = the F1 computed from Σ_{l∈C} TP(l), Σ_{l∈C} FP(l) and Σ_{l∈C} FN(l).
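These two averages can be computed from per-label counts as follows (a generic implementation, not tied to LIBLINEAR's output format):

```python
from collections import Counter

def micro_macro_f1(y_true, y_pred):
    """Compute (Micro-F1, Macro-F1) from per-label TP/FP/FN counts."""
    labels = set(y_true) | set(y_pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted label p was wrong
            fn[t] += 1   # true label t was missed
    def f1(t, false_pos, false_neg):
        return 2 * t / (2 * t + false_pos + false_neg) if t else 0.0
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro

# A small multi-class example.
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
micro, macro = micro_macro_f1(y_true, y_pred)
```

Micro-F1 pools the counts over all labels (so frequent labels dominate), while Macro-F1 averages the per-label scores equally.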
Considering that each dataset has 14 snapshots and the algorithms are applied to each snapshot, we report the average performance as the final result.
4 Results and Analysis
4.1 Reconstruction
Reconstruction means restoring the original network information from the embedding. In this task, we calculate the distance of each pair of nodes in the embedding space to measure the first-order proximity of the nodes, and then infer the edges from that proximity. The results on the datasets CA-CondMat and CA-HepPh are shown in Figure 2.
From these results, we can see that RNNE achieves better precision@k than SDNE, Hope and Line on these two datasets, and when k is not large, RNNE also does better than GraRep. The algorithms that consider the first-order or high-order proximity (RNNE, SDNE, GraRep, Line) clearly perform better than the one that does not (Hope). This shows that high-order proximity is very helpful for preserving the original network structure. In fact, the historical network information RNNE uses is actually noise for the reconstruction of the current network, so RNNE is theoretically at a disadvantage in network reconstruction.
4.2 Classification
Classification is a very common and important task in research and practice. In this experiment, we use the node embedding as the feature to classify each node into a label and then compare with the ground truth. Specifically, we use LIBLINEAR [LIBLINEAR] as the solver for the classifiers. When training the classifiers, we randomly choose a part of the nodes and their labels for training and use the rest for testing. For Wiki, blogCatalog and email-Eu-core, we randomly choose 10% to 90% of the nodes for training. The results are shown in Figure 3. From these results, we can see that RNNE generally performs better than the other four algorithms. The autoencoder structure of the RNNE model makes nodes that are close in the feature space stay close in the embedding space. And when calculating node features, we use the high-order proximity, which is more expressive than the adjacency matrix.
4.3 Link Predictions
Link prediction is somewhat similar to reconstruction, because both need to judge whether an edge exists. Before this experiment, we randomly hide 15% of the edges in the test networks, and then use the embeddings to predict the hidden edges. Note that when calculating precision@k, we ignore predicted edges that already exist in the after-hiding network. The results on the datasets CA-CondMat and CA-HepPh are shown in Figure 4.
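The edge-hiding split can be sketched as below; the function name, the seed and the exact sampling procedure are our assumptions, since the paper specifies only the 15% fraction.

```python
import random

def hide_edges(edges, fraction=0.15, seed=0):
    """Split an edge list into (visible, hidden).

    `fraction` of the edges is hidden for evaluation; embeddings are
    trained on the visible part and scored on how well they recover
    the hidden part.
    """
    rng = random.Random(seed)
    edges = list(edges)
    rng.shuffle(edges)
    cut = max(1, int(len(edges) * fraction))
    return edges[cut:], edges[:cut]

# Example: a 20-edge path graph with 15% of its edges hidden.
edges = [(i, i + 1) for i in range(20)]
visible, hidden = hide_edges(edges, 0.15)
```

The predicted ranking over non-visible pairs is then compared against `hidden` with precision@k.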
As k becomes larger, RNNE gets higher precision@k than the others at first, and afterwards GraRep may do a little better. In most real-world tasks such as recommendation, it is not necessary to predict very many links. On the one hand, as prediction goes on, the accuracy inevitably decreases; on the other hand, people pay more attention to the pairs of nodes that are most likely to have a link. So it is important to get higher precision when k is small.
4.4 Parameter Influence
In this section, we investigate the influence of the parameters to verify that they are really effective for our tasks. Specifically, we evaluate the parameters α, β and γ on the email-Eu-core dataset. The results are shown in Figure 5.
In Figure 5(a), we can see the performance in classification and reconstruction when α varies and β = 5, γ = 5. It is very obvious that as α becomes larger, the precision@k becomes higher. But when α is too large, the quality of classification becomes significantly worse. α is the weight of the first-order proximity in the loss function, so the larger α is, the more the model concentrates on the direct links between nodes. It is important to find a balance between the first-order and high-order proximity.
In Figure 5(b), we can see the performance when β varies in [1, 20] and α = 0.1, γ = 0. β is the weight of the non-zero part when reconstructing the node feature in the autoencoder. When β = 1, which means the non-zero elements and zero elements have the same weight, the results are not good. However, when β is too large, the precision@k in the reconstruction task is still not good enough (in this experiment, β = 5 is the best), since too large a β makes the autoencoder ignore the information in the zero elements. Thus, we should pay more attention to the non-zero elements while still attending to the zero elements properly.
In Figure 5(c), we can see the performance when γ varies in [0, 20] and α = 0.001, β = 5. γ is used to reduce the difference between the same node's representations at different times. That means the embedding results depend not only on the current network but also on the previous ones. We can see that when γ = 0, the precision@k is the best, though the embedding still has "noise" because of the RNN structure. On the other hand, larger γ gains better performance than γ = 0 in classification. So the choice of γ depends on whether we focus more on the network structure or on the node features.
5 Conclusion
In this paper, we propose Recurrent Neural Network Embedding (RNNE), an algorithm for dynamic network embedding with deep neural networks. In order to unify the input network structure at different times, we add virtual nodes and replace virtual nodes with real nodes when node changes happen. For the embedding itself, RNNE not only keeps the local and global network structure via first-order and high-order proximity, but also reduces the influence of noise by transferring previous embedding information. We compare RNNE with several other algorithms on various datasets and tasks, and then show the influence of the parameters on the embedding performance. The results show that our method is effective and can achieve better performance than the other algorithms on the tested datasets.
Future work will try to use probabilistic models [chen2009pcvm, chen2014epcvm], their multi-objective version [lyu2019multiclass] and large-scale version [jiang2017scalable] in combination with dynamic network embedding. In addition, ensemble methods [chen2009regularized, chen2009predictive, chen2010multiobjective] could be employed to improve the performance of the embedding and the downstream applications.