Network-structured data, such as social networks and transportation networks, are now common in daily life and research. Real-world networks are often large and complicated, so they are expensive to process directly.
Network embedding means learning a low-dimensional representation, e.g., a numerical vector, for every node in a network. After embedding, other data-driven algorithms that need node features as input can operate directly in the low-dimensional space. Network embedding is essential in traditional tasks such as link prediction, recommendation and classification.
There are mainly two approaches to network embedding: a) Singular Value Decomposition (SVD) [SVD] based methods, which have proven successful in many important network applications. They decompose the adjacency matrix or Laplacian matrix to obtain the node representations. b) Deep learning based methods. Many deep learning based algorithms try to merge structural information into nodes to obtain the low-dimensional representation [SDNE, deepwalk, line, node2vec].
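As a rough illustration of the SVD-based approach in (a), the sketch below embeds the nodes of a toy graph by truncating the SVD of its adjacency matrix. The function name `svd_embedding`, the toy graph, and the scaling by the square root of the singular values are illustrative assumptions, not the procedure of any cited method.

```python
import numpy as np

def svd_embedding(adjacency: np.ndarray, dim: int) -> np.ndarray:
    """Return a (num_nodes, dim) embedding from a truncated SVD (illustrative)."""
    # numpy returns singular values in descending order.
    u, s, _ = np.linalg.svd(adjacency, full_matrices=False)
    # Keep the `dim` leading singular directions, scaled by sqrt(singular value).
    return u[:, :dim] * np.sqrt(s[:dim])

# Toy undirected path graph 0-1-2-3 (hypothetical data).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
emb = svd_embedding(A, dim=2)
```

Any change to `A` requires recomputing the decomposition from scratch in this naive form, which is the cost that motivates incremental and dynamic methods.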
The network embedding algorithms mentioned above are suitable for static networks, in which all the nodes, edges and features are known and fixed before learning. However, many networks are highly dynamic in nature. For example, social networks, financial transaction networks and telephone call networks change all the time and carry much information in their evolution. When the nodes or edges of the network change, these algorithms need to be re-run on the whole network, and learning the embedding again usually takes a long time. Online learning of network embeddings involves temporal analysis, which is similar to dynamic system modelling [li2018symbolic, gong2018sequential, chen2014cognitive] and its further analysis and work [ChenTRY14, chen2013model, gong2016model, chen2015model].
Most dynamic network embedding algorithms are based on static network algorithms, and they encounter the following challenges to varying degrees:
Network structure preservation: some algorithms learn representation of new nodes by performing information propagation [propagation], or optimizing a loss that encourages smooth changes between linked nodes [harmonic, non-parametric]. There are also methods that aim to learn a mapping from node features to representations, by imposing a manifold regularizer derived from the graph [Manifold]. But these methods do not preserve intricate network properties when inferring representation of new nodes.
Growing graphs: the Structural Deep Network Embedding (SDNE) [SDNE] method and Deeply Transformed High-order Laplacian Gaussian Process (DepthLGP) [DepthLGP] both use a deep neural network to learn representations that take the network structure into account. But SDNE cannot handle changes in nodes and DepthLGP cannot handle changes in edges. The SVD based algorithms cannot handle growing graphs either. Incremental SVD methods [fastSVD1, fastSVD2] have been proposed to update previous SVD results to incorporate the changes without restarting the algorithm. But they can only deal with changes in edges, and when errors accumulate, SVD still needs to be re-run to correct them.
Information of evolving graphs: Dynamic Graph Embedding Model (DynGem) [dyngem] uses a dynamically expanding deep autoencoder to keep network structure and deal with growing graphs. However, it only trains the current network on the basis of the old parameters and abandons the information contained in the network during the evolution.
To improve the embedding of dynamic networks, we propose Recurrent Neural Network Embedding (RNNE), a neural network model shown in Figure 1. In response to the three challenges above, RNNE adopts the following approaches in the three main parts of the model (Pretreatment, Training Window and Training Model):
Network structure preservation: RNNE calculates the node features from multi-step probability transition matrices in Pretreatment, which preserves the structural characteristics of a larger neighborhood of each node than the adjacency matrix alone. In Training Model, the loss function considers the first-order proximity and the high-order proximity together. (The first-order proximity is determined by whether there is a link between two nodes; the high-order proximity is the similarity between the neighborhood structures of two nodes.)
Growing graphs: RNNE first adds some virtual nodes to the network. When new nodes arrive, RNNE replaces virtual nodes with the new nodes in Pretreatment. Similarly, if a node is deleted, RNNE replaces it with a virtual node.
Information of evolving graphs: The overall structure of Training Model is an RNN. The previous node representations are fed to the RNNE cell as the hidden state, so more information about the evolving graphs can be used during embedding. Since the representations of one node at different times should be close if the node's characteristics do not change, RNNE also adds a corresponding term to the loss function to maintain the stability of the embedding. (Stability means reducing the effects of noise from network fluctuation over time.)
The main contributions of this paper are listed as follows:
RNNE considers the first-order proximity and high-order proximity during training, so it can preserve the original network structure.
With virtual nodes, RNNE can unify the sizes of networks at different time and easily extract the changing part of the network.
RNNE takes graph sequences as input and can integrate information of evolving graphs when embedding. It is helpful to mitigate the effects of network fluctuation over time.
2 The RNNE model
2.1 Problem Definition
Given a dynamic network whose nodes and edges may change over time, and a series of graphs , , …, where is the state of at a series of time points, for each node of , learn , where is a positive integer given in advance.
2.2 Model Description
First, RNNE assumes that the network series is stable: not too many nodes change at the same time, and the increase in edge weights is nearly linear. Second, the size of the model is limited, so RNNE also assumes that the network will not become too large over time.
RNNE does not process the whole series at once, because old networks may no longer be valid and an overly long series takes a lot of time. RNNE maintains a fixed-length window of networks to train on, and a concept drift check then excludes the nodes whose properties may have changed.
A node in a dynamic network cannot be represented by its state at a single moment. So RNNE uses not only the current state but also the previous states when learning the embedding. Borrowing from recurrent neural networks (RNN) [RNN], RNNE uses a hidden state to represent the previous state of the node.
In general, RNNE first chooses suitable nodes, then uses the hidden state and node features as input to minimize the loss of node proximity at neighboring time points. The entire process is explained in detail in the following subsections.
Each node has an attribute named “state”. At the beginning all the nodes’ “state” is “normal” which means it is only a normal node:
[“state” ] = “normal” ,
where is the -th node in
Then, to keep the input size fixed, we define a type of node named virtual node. It is not connected to any other node. And
[“state” ] = “virtual” ,
if is a virtual node
If the number of nodes in doesn’t reach the limit of the model which is , then we add virtual nodes into until = .
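The padding step above can be sketched as follows. This is a minimal sketch that assumes each snapshot is stored as a dense adjacency matrix; `pad_with_virtual_nodes` and `node_limit` are hypothetical names introduced for illustration.

```python
import numpy as np

def pad_with_virtual_nodes(adjacency: np.ndarray, node_limit: int):
    """Pad a snapshot with isolated virtual nodes up to `node_limit` nodes."""
    n = adjacency.shape[0]
    if n > node_limit:
        raise ValueError("snapshot has more nodes than the model limit")
    padded = np.zeros((node_limit, node_limit), dtype=adjacency.dtype)
    padded[:n, :n] = adjacency  # real nodes keep their edges
    # Virtual nodes occupy the remaining rows and columns and have no edges.
    states = ["normal"] * n + ["virtual"] * (node_limit - n)
    return padded, states

A = np.array([[0, 1], [1, 0]], dtype=float)  # toy snapshot with two nodes
padded, states = pad_with_virtual_nodes(A, node_limit=4)
```

Because every snapshot then has the same shape, a new node can later take over a virtual slot without changing the model's input size.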
Before training starts, networks are added to the training window one by one. When a new network arrives, if it’s the first type dynamic network, it can be put into the training window directly. Otherwise, RNNE puts in the subgraph consisting of the part added since the last network. If the training window is full, RNNE removes the earliest network from the window and then checks every node in the window to keep dangerous nodes out of training.
Assume that the window size is 5 and there are 4 networks , , , in the window, then arrives. For every node in , if the states of and are both not “virtual”, we calculate using the rows of their adjacency matrices. Most of the time, and are the same node at different times. Finally, we use the Grubbs test [grubuustest1, grubuustest2] to find the dangerous nodes whose properties have most likely changed:
[“state” ] = “dangerous” ,
if is a dangerous node
The process described above is presented in Algorithm 1.
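The concept drift check can be illustrated with a small sketch. Several choices here are assumptions rather than details from the text: the per-node change is taken to be the Euclidean distance between adjacency rows in consecutive snapshots, the Grubbs check is one-sided and flags at most one node per call, and the critical value is supplied by the caller (it normally comes from a Grubbs table for the given sample size and significance level).

```python
import numpy as np

def row_changes(adj_prev, adj_curr, states_prev, states_curr):
    """Distance between each node's adjacency rows; NaN if virtual in either snapshot."""
    change = np.full(adj_prev.shape[0], np.nan)
    for i in range(adj_prev.shape[0]):
        if states_prev[i] != "virtual" and states_curr[i] != "virtual":
            # Assumed measure: Euclidean distance between the two rows.
            change[i] = np.linalg.norm(adj_curr[i] - adj_prev[i])
    return change

def grubbs_flag(values, critical_value):
    """Return the index of the largest value if it is a Grubbs outlier, else None."""
    idx = np.flatnonzero(~np.isnan(values))
    x = values[idx]
    if x.size < 3 or x.std(ddof=1) == 0:
        return None  # too few samples or no spread to test
    g = (x.max() - x.mean()) / x.std(ddof=1)  # Grubbs statistic for the maximum
    return int(idx[np.argmax(x)]) if g > critical_value else None

# Hypothetical per-node changes; node 9 changes far more than the others.
changes = np.array([0.1, 0.2, 0.15, 0.1, 0.2, 0.12, 0.18, 0.11, 0.16, 5.0])
# 2.29 is a placeholder critical value; look it up for the real n and alpha.
flagged = grubbs_flag(changes, critical_value=2.29)
```

To flag several dangerous nodes, the check could be repeated after removing each flagged value, with the critical value updated for the smaller sample.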
To preserve the high-order proximity, we calculate the node features as follows, assuming that is the adjacency matrix of :
First we define a function :
each element in will be divided by the largest element in the same row.
The feature matrix is calculated as follows:
|are the same node at different time points|
|the node size of after adding virtual nodes,|
|the embedding size|
|the batch size of training|
|the feature of the -th node in which is calculated as Eq.2,|
|the reconstructed data of ,|
|the adjacency matrix for the ,|
|the hidden state of ,|
|the representation of ,|
|the state of : “normal”, “virtual” or “dangerous”|
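The row-wise normalization and the multi-step transition features described before the notation table can be sketched as follows. The normalization follows the text (each element divided by the largest element in its row). How the multi-step matrices are combined is an assumption made here for illustration: the sketch sums the first `steps` powers of the row-stochastic transition matrix before normalizing. The names `row_max_normalize`, `multi_step_features` and `steps` are hypothetical.

```python
import numpy as np

def row_max_normalize(m: np.ndarray) -> np.ndarray:
    """Divide each element by the largest element in its row; zero rows stay zero."""
    row_max = m.max(axis=1, keepdims=True)
    return m / np.where(row_max > 0, row_max, 1.0)

def multi_step_features(adjacency: np.ndarray, steps: int) -> np.ndarray:
    """Assumed combination: normalize the sum of P, P^2, ..., P^steps."""
    degree = adjacency.sum(axis=1, keepdims=True)
    # One-step transition probabilities; isolated (virtual) nodes keep zero rows.
    transition = adjacency / np.where(degree > 0, degree, 1.0)
    power = np.eye(adjacency.shape[0])
    total = np.zeros_like(transition)
    for _ in range(steps):
        power = power @ transition  # next power of the transition matrix
        total += power
    return row_max_normalize(total)

# Path graph 0-1-2 plus one isolated virtual node (index 3).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 0]], dtype=float)
X = multi_step_features(A, steps=3)
```

With more than one step, node 0 receives a non-zero feature for node 2 even though they are not directly linked, which is the larger-neighborhood information the adjacency matrix alone does not carry.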
First, RNNE samples a batch of nodes; a node can be chosen only when “normal” . Assuming that node indices are selected, we get the input matrix series .
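A minimal sketch of this sampling step, assuming the states are stored as a (window length, number of nodes) array of strings and that a node is eligible only if it is "normal" in every snapshot of the window. `sample_normal_nodes` and `states_window` are hypothetical names.

```python
import numpy as np

def sample_normal_nodes(states_window, batch_size, rng):
    """Sample node indices whose state is "normal" in every snapshot of the window."""
    states = np.asarray(states_window)
    eligible = np.flatnonzero((states == "normal").all(axis=0))
    size = min(batch_size, eligible.size)
    return rng.choice(eligible, size=size, replace=False)

states_window = [
    ["normal", "virtual", "normal", "normal", "normal"],
    ["normal", "normal", "normal", "dangerous", "normal"],
]
batch = sample_normal_nodes(states_window, batch_size=2, rng=np.random.default_rng(0))
```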
The RNN cell of our model has an encoder-decoder structure. The encoder consists of multiple non-linear functions that map the input data to the representation space. The decoder also consists of multiple non-linear functions, mapping the representations in the representation space to the reconstruction space. Given the input , the calculation is shown as follows:
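The encoder-decoder cell can be sketched in NumPy as below. This is a simplified illustration, not the exact architecture: it assumes a single encoder layer and a single decoder layer with sigmoid activations, and it passes the previous representation in as the hidden state. All parameter names (`W_in`, `W_h`, `W_out`, `b_enc`, `b_dec`) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_cell(n_features, dim, rng, scale=0.1):
    """Randomly initialize a one-layer encoder and a one-layer decoder."""
    return {
        "W_in": rng.normal(0.0, scale, (n_features, dim)),
        "W_h": rng.normal(0.0, scale, (dim, dim)),
        "b_enc": np.zeros(dim),
        "W_out": rng.normal(0.0, scale, (dim, n_features)),
        "b_dec": np.zeros(n_features),
    }

def cell_step(params, x, h_prev):
    """Encode features plus previous hidden state, then decode back to feature space."""
    y = sigmoid(x @ params["W_in"] + h_prev @ params["W_h"] + params["b_enc"])
    x_hat = sigmoid(y @ params["W_out"] + params["b_dec"])
    return y, x_hat

def run_window(params, x_series):
    """Run the cell over the window; each representation becomes the next hidden state."""
    h = np.zeros((x_series[0].shape[0], params["b_enc"].size))
    outputs = []
    for x in x_series:
        h, x_hat = cell_step(params, x, h)
        outputs.append((h, x_hat))
    return outputs

rng = np.random.default_rng(0)
params = init_cell(n_features=6, dim=2, rng=rng)
x_series = [rng.random((3, 6)) for _ in range(4)]  # 4 snapshots, batch of 3 nodes
outputs = run_window(params, x_series)
```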
The goal of the autoencoder is to minimize the reconstruction error of the output and the input. The loss is calculated as below:
As mentioned in [hashing], although minimizing the reconstruction loss does not explicitly preserve the similarity between samples, the reconstruction criterion can smoothly capture the data manifolds and thus preserve the similarity between samples. This means that if the inputs are similar, the outputs are likely to be similar. In other words, if the features of two nodes are similar, the embeddings of the nodes are similar. At the same time, by reconstructing the node feature from the embedding, we can make it likely that the embedding vector contains enough information to represent the node.
There are many zero elements in and , but we are more concerned about their non-zero part. Following SDNE [SDNE], when calculating the reconstruction error, we put different weights on zero and non-zero elements. The new loss function is shown below:
where means the Hadamard product, . If , , else . With this loss function, nodes that have similar neighborhood structures are mapped close together, so the model preserves the global network structure by preserving the high-order proximity between nodes.
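The weighted reconstruction error can be sketched as follows. It assumes the SDNE-style weighting in which zero entries of the input get weight 1 and non-zero entries get a larger weight, and it uses a plain sum of squares over the batch. `nonzero_weight` is a hypothetical name for that larger weight.

```python
import numpy as np

def weighted_reconstruction_loss(x, x_hat, nonzero_weight):
    """Squared error with the Hadamard-product weights applied element-wise."""
    # Weight matrix: `nonzero_weight` where the input is non-zero, 1 elsewhere.
    weights = np.where(x != 0, nonzero_weight, 1.0)
    return float(np.sum(((x_hat - x) * weights) ** 2))

x = np.array([[0.0, 1.0], [1.0, 0.0]])
x_hat = np.full((2, 2), 0.5)
loss_weighted = weighted_reconstruction_loss(x, x_hat, nonzero_weight=5.0)
loss_uniform = weighted_reconstruction_loss(x, x_hat, nonzero_weight=1.0)
```

In this toy case every entry is off by 0.5, yet the weighted loss is dominated by the two non-zero entries, so errors on existing structure are penalized much more than errors on zeros.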
In addition to considering the neighborhood structure of different nodes, we should also pay attention to the local structure, that is, the direct links between nodes. We use the first-order proximity to measure the local structure of the network. The loss function is shown below:
If , there is a direct link between nodes and , and we want them to be mapped close together in the embedding space.
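A minimal sketch of a first-order proximity term, assuming the SDNE-style form in which the squared distance between two embeddings is weighted by the corresponding adjacency entry. The function name and the toy data are illustrative.

```python
import numpy as np

def first_order_loss(adjacency, y):
    """Sum over node pairs of a_ij * ||y_i - y_j||^2."""
    diff = y[:, None, :] - y[None, :, :]   # pairwise embedding differences
    sq_dist = np.sum(diff ** 2, axis=-1)   # pairwise squared distances
    return float(np.sum(adjacency * sq_dist))

A = np.array([[0.0, 1.0], [1.0, 0.0]])
y = np.array([[0.0, 0.0], [3.0, 4.0]])
loss_first = first_order_loss(A, y)
```

Only linked pairs contribute, so minimizing this term pulls the embeddings of directly connected nodes together.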
So far we have considered only one network in the series at a time, although the hidden state transfers information between them. Since all sampled training nodes are in the “normal” state, the representations of one node at different times should be as close as possible, free of the influence of noise. The loss function of this part is shown as follows:
, and in are hyper-parameters of the model.
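The stability term and the way the loss terms might be combined can be sketched as below. The sketch assumes a squared distance between representations of the same nodes in consecutive snapshots and a plain weighted sum of the three terms; any regularization term is omitted. `first_order_weight` and `stability_weight` are hypothetical names for the corresponding hyper-parameters.

```python
import numpy as np

def stability_loss(y_series):
    """Sum of squared changes in representation between consecutive snapshots."""
    return float(sum(np.sum((curr - prev) ** 2)
                     for prev, curr in zip(y_series, y_series[1:])))

def total_loss(reconstruction, first_order, stability,
               first_order_weight, stability_weight):
    """Assumed combination: weighted sum of the three loss terms."""
    return (reconstruction
            + first_order_weight * first_order
            + stability_weight * stability)

y_series = [np.zeros((2, 2)), np.ones((2, 2)), np.ones((2, 2))]
loss_stab = stability_loss(y_series)
loss_total = total_loss(13.0, 50.0, loss_stab,
                        first_order_weight=0.1, stability_weight=2.0)
```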
At last the model parameters can be adjusted by:
where is the learning rate of the model.
The whole training process can be seen in Algorithm 2.
2.5 Analysis and Discussions
In this section, some analysis and discussions of RNNE are presented.
RNNE assumes a limit on the node size in each snapshot of the network. Usually the node size is less than . If the node size becomes larger than during network evolution, we can expand the layer size of the RNNE cell while retaining the old parameters, an idea borrowed from DynGem [dyngem].
The training complexity of RNNE in one iteration is , where n is the size of the training window, is the batch size, is the limit of node size in the network, is the embedding size, and is the maximum size of the hidden layer. Usually , and are constants given in advance. is linear in the true node size of the network. is related to the embedding size but not to the node size. So the training complexity of RNNE in one iteration is and linear in the node size of the network.
In this section, we introduce the methods and datasets which are used to evaluate the RNNE algorithm.
We use both static networks and dynamic networks to evaluate the RNNE algorithm. Some of the datasets do not have label information, so they are not used in the classification experiment. For static networks, we randomly change some of the nodes and edges to generate a series of networks. Every dataset is a series of 14 snapshots.
Wiki : It is a reference network in wiki and each node has a label. There are 17 categories in total in this dataset.
blogCatalog [blog], email-Eu-core [CA] : They are social networks of people. There are 39 categories in blogCatalog and 42 in email-Eu-core.
CA-CondMat, CA-HepPh [CA]: They are collaboration networks of Arxiv. These two datasets are only used for reconstruction and link prediction because we have no label information for the nodes.
3.2 Baselines and Parameters
We use the following algorithms as the baselines of the experiments. The static network embedding algorithms are applied to each snapshot of the dynamic network.
SDNE [SDNE] : It also uses an autoencoder structure, and learns the embedding by minimizing the loss of first-order and second-order proximity.
, , , , .
Line [line] : It does not define a function to calculate the network embedding but learns a map from node to embedding directly. Its loss function also considers the first-order and second-order proximity.
GraRep [grarep] : It considers high-order proximity and uses SVD to get the network embedding.
Hope [hope] : It constructs an asymmetric relation matrix from the adjacency matrix and then uses JDGSVD [jdgsvd] to get the low-dimensional representation.
For RNNE, the layer sizes of the autoencoder differ across datasets and are shown in Table 3. (The layer size is a list giving the number of neurons at each encoder layer; the last number is the dimension of the output node representation.)
The hyper-parameters of , , are adjusted by grid search : , ,
3.3 Evaluation Metrics
In our experiments, we test three tasks: reconstruction, classification and link prediction.
In reconstruction and link prediction, we use precision@k, whose definition is shown below, to measure the performance of the algorithms.
where is the ranked index of which is predicted in .
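A ranking-based precision measure of this kind can be sketched as follows. The sketch assumes the metric is the fraction of the top-ranked predicted node pairs that are true edges, computed over undirected pairs; the function name and the toy ranking are illustrative.

```python
def precision_at_k(ranked_pairs, true_edges, k):
    """Fraction of the k highest-ranked predicted pairs that are true edges."""
    top = ranked_pairs[:k]
    # Store undirected edges as (smaller index, larger index).
    hits = sum(1 for u, v in top if (min(u, v), max(u, v)) in true_edges)
    return hits / k

ranked_pairs = [(0, 1), (2, 3), (1, 2), (0, 3)]  # most likely pair first
true_edges = {(0, 1), (1, 2)}
p_at_2 = precision_at_k(ranked_pairs, true_edges, k=2)
p_at_3 = precision_at_k(ranked_pairs, true_edges, k=3)
```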
In classification, we use Micro-F1 and Macro-F1 to measure the performance of the algorithms. For a label , , and are the numbers of true positives, false positives and false negatives among the instances predicted as , and is the label set:
Each dataset has 14 snapshots and the algorithms are applied to each snapshot, so we report the average performance as the final result.
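The two F1 scores can be computed from per-label counts of true positives, false positives and false negatives, as in the sketch below. It assumes one label per node and the standard definitions: the micro score pools the counts over all labels, while the macro score averages the per-label F1 values.

```python
def f1_scores(y_true, y_pred, labels):
    """Return (micro_f1, macro_f1) for single-label predictions."""
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p wrongly
            fn[t] += 1  # missed the true label t

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro

micro_f1, macro_f1 = f1_scores([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], labels=[0, 1, 2])
```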
4 Results and Analysis
Reconstruction means recovering the original network information from the embedding. In this task, we calculate the distance between each pair of nodes in the embedding space to measure the first-order proximity of the nodes, and then infer the edges from this proximity. The results on CA-CondMat and CA-HepPh are shown in Figure 2.
From this result, we can see that RNNE performs better than SDNE, Hope and Line on these two datasets, and when k is not large, RNNE also does better than GraRep. The algorithms that consider the first-order or high-order proximity (RNNE, SDNE, GraRep, Line) clearly perform better than the one that does not (Hope). This result shows that high-order proximity is very helpful for preserving the original network structure. Note that the historical network information used by RNNE acts as noise when reconstructing the current network, so RNNE is theoretically at a disadvantage on network reconstruction.
Classification is a very common and important task in daily research and work. In this experiment, we use the node embedding as the feature to classify each node into a label and then compare the result with its ground truth. Specifically, we use LIBLINEAR [LIBLINEAR] as the solver of the classifiers. When training the classifiers, we randomly choose a portion of the nodes and their labels for training and use the rest for testing. For Wiki, blogCatalog and email-Eu-core, we randomly choose 10% to 90% of the nodes for training. The results are shown in Figure 3.
From this result, we can see that RNNE generally performs better than the other four algorithms. The autoencoder structure of the RNNE model tends to keep nodes that are close in the feature space close in the embedding space. In addition, the node features are calculated from the high-order proximity, which is more expressive than the adjacency matrix.
4.3 Link Predictions
Link prediction is similar to reconstruction, because both need to judge whether an edge exists. Before this experiment, we first randomly hide 15% of the edges in the test networks, and then use the embeddings to predict the hidden edges. Note that when calculating precision@k, we ignore predicted edges that already exist in the network after hiding. The results on CA-CondMat and CA-HepPh are shown in Figure 4.
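The evaluation protocol above can be sketched as follows, assuming an undirected graph stored as a dense adjacency matrix and Euclidean distance in the embedding space as the ranking score. The function names are hypothetical, and the random embedding stands in for a learned one.

```python
import numpy as np

def hide_edges(adjacency, fraction, rng):
    """Remove a random fraction of undirected edges; return the observed graph and hidden set."""
    iu = np.triu_indices_from(adjacency, k=1)
    edges = np.column_stack(iu)[adjacency[iu] > 0]
    n_hide = int(round(fraction * len(edges)))
    chosen = rng.choice(len(edges), size=n_hide, replace=False)
    hidden = {(int(u), int(v)) for u, v in edges[chosen]}
    observed = adjacency.copy()
    for u, v in hidden:
        observed[u, v] = observed[v, u] = 0
    return observed, hidden

def rank_candidate_pairs(embedding, observed):
    """Rank pairs without an observed edge by embedding distance, closest first."""
    n = embedding.shape[0]
    # Pairs that already have an edge in the observed graph are ignored.
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)
             if observed[i, j] == 0]
    pairs.sort(key=lambda p: np.linalg.norm(embedding[p[0]] - embedding[p[1]]))
    return pairs

rng = np.random.default_rng(0)
full = np.ones((7, 7)) - np.eye(7)       # complete graph: 21 edges
observed, hidden = hide_edges(full, fraction=0.15, rng=rng)
embedding = rng.random((7, 2))           # placeholder for a learned embedding
ranked = rank_candidate_pairs(embedding, observed)
```

The ranked list can then be scored against `hidden` with a precision measure over the top-ranked pairs.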
As k grows, RNNE at first achieves higher precision than the others, and afterwards GraRep may do slightly better. In most real-world tasks such as recommendation, there is no need to predict too many links. On the one hand, accuracy inevitably decreases as more links are predicted. On the other hand, people pay more attention to the pairs of nodes that are most likely to have a link. So it is very important to achieve higher precision when k is small.
4.4 Parameter Influence
In this section, we investigate the influence of the parameters to show that they are effective for our tasks. Specifically, we evaluate the parameters , and on the email-Eu-core dataset. The results are shown in Figure 5.
Figure 5(a) shows the performance in classification and reconstruction when varies in and = 5, = 5. As becomes larger, the clearly becomes higher. But when , the quality of classification becomes significantly worse. is the weight of the first-order proximity in the loss function, so the larger is, the more the model concentrates on the direct links between nodes. It is important to find a balance between first-order and high-order proximity.
Figure 5(b) shows the performance when varies in [1, 20] and = 0.1, = 0. is the weight of the non-zero part when reconstructing the node feature in the autoencoder. When = 1, which means the non-zero and zero elements have the same weight, the results are not good. However, when is too large, the in the reconstruction task is still not good enough (in this experiment, = 5 is the best), since too large a weight makes the autoencoder ignore the information in the zero elements. Thus, we should pay more attention to the non-zero elements while still giving the zero elements proper weight.
Figure 5(c) shows the performance when varies in [0, 20] and = 0.001, = 5. is used to reduce the difference between the representations of the same node at different times. That means the embedding results depend not only on the current network but also on the previous ones. So when =0, the is the best, though it still has “noise” because of the RNN structure. Of course, a larger gives better performance than = 0 in classification. So the choice of depends on whether we focus more on network structure or node feature.
In this paper, we propose Recurrent Neural Network Embedding (RNNE), an algorithm for dynamic network embedding with a deep neural network. To unify the input network structure at different times, we add virtual nodes and replace them with real nodes when the node set changes. When embedding, RNNE not only keeps the local and global network structure via first-order and high-order proximity, but also reduces the influence of noise by transferring the previous embedding information. We compare RNNE with several other algorithms on various datasets and tasks, and then show the influence of the parameters on the performance of the embedding. The results show that our method is effective and can achieve better performance than other algorithms on the tested datasets.
Future work will try to combine probabilistic models [chen-2009-pcvm, chen-2014-epcvm], their multi-objective version [lyu2019multiclass] and large-scale version [jiang2017scalable] with dynamic network embedding. In addition, ensemble methods [chen2009regularized, chen2009predictive, chen2010multiobjective] could be employed to improve the performance of the embedding and of the downstream applications.