Online Dynamic Network Embedding

06/30/2020
by   Haiwei Huang, et al.
USTC

Network embedding is an important method for analyzing network data. However, most existing algorithms can only deal with static networks. In this paper, we propose Recurrent Neural Network Embedding (RNNE), an algorithm for dynamic networks, which can typically be divided into two categories: a) topologically evolving graphs, whose nodes and edges increase (or decrease) over time; b) temporal graphs, whose edges contain time information. In order to handle the changing size of dynamic networks, RNNE adds virtual nodes, which are not connected to any other nodes, to the networks and replaces them when new nodes arrive, so that the network size can be unified across time. On the one hand, RNNE pays attention to the direct links between nodes and the similarity between the neighborhood structures of two nodes, trying to preserve both the local and global network structure. On the other hand, RNNE reduces the influence of noise by transferring the previous embedding information. Therefore, RNNE can take into account both static and dynamic characteristics of the network. We evaluate RNNE on five networks and compare it with several state-of-the-art algorithms. The results demonstrate that RNNE has advantages over the other algorithms in reconstruction, classification and link prediction.


1 Introduction

Network-structured data, such as social networks and transportation networks, is now ubiquitous in daily life and research. Real-world networks are often large and complicated, which makes them expensive to use directly.

Network embedding means learning a low-dimensional representation, e.g., a numerical vector, for every node in a network. After embedding, other data-driven algorithms that need node features as input can be run in the low-dimensional space directly. Network embedding is essential for traditional tasks such as link prediction, recommendation and classification.

There are mainly two approaches to network embedding: a) Singular Value Decomposition (SVD) based methods [SVD], which have proven successful in many important network applications. They decompose the adjacency matrix or Laplacian matrix to obtain the node representations. b) Deep learning based methods. Many deep learning based algorithms merge structural information into the nodes to obtain the low-dimensional representations [SDNE, deepwalk, line, node2vec].

The network embedding algorithms mentioned above are suitable for static networks, in which all the nodes, edges and features are known and fixed before learning. However, many networks are highly dynamic in nature. For example, social networks, financial transaction networks and telephone call networks change all the time and retain much information during network evolution. So when the nodes or edges of the network change, these algorithms need to be re-run on the whole network data, and it usually takes a long time to learn the embedding again. Online learning of network embedding involves temporal analysis, which is similar to dynamic system modelling [li2018symbolic, gong2018sequential, chen2014cognitive] and its further analysis and extensions [ChenTRY14, chen2013model, gong2016model, chen2015model].

Most dynamic network embedding algorithms are based on static network algorithms. They will, to a greater or lesser extent, encounter the following challenges:

  • Network structure preservation: some algorithms learn the representations of new nodes by performing information propagation [propagation], or by optimizing a loss that encourages smooth changes between linked nodes [harmonic, non-parametric]. There are also methods that aim to learn a mapping from node features to representations by imposing a manifold regularizer derived from the graph [Manifold]. But these methods do not preserve intricate network properties when inferring the representations of new nodes.

  • Growing graphs: the Structural Deep Network Embedding (SDNE) [SDNE] method and the Deeply Transformed High-order Laplacian Gaussian Process (DepthLGP) [DepthLGP] both use a deep neural network to learn representations while considering the network structure. But SDNE cannot handle node changes and DepthLGP cannot handle edge changes. The SVD based algorithms cannot handle growing graphs either. Incremental SVD methods [fastSVD1, fastSVD2] have been proposed to update previous SVD results to incorporate the changes without restarting the algorithm, but they can only deal with edge changes, and when errors accumulate they still need to re-run SVD to correct them.

  • Information of evolving graphs: the Dynamic Graph Embedding Model (DynGem) [dyngem] uses a dynamically expanding deep autoencoder to preserve the network structure and deal with growing graphs. However, it only trains on the current network on the basis of the old parameters, and abandons the information contained in the network during its evolution.

To improve the embedding of dynamic networks, we propose Recurrent Neural Network Embedding (RNNE), a neural network model, which is shown in Figure 1. In response to the three challenges mentioned above, RNNE adopts the following approaches in the three main parts of the model (Pretreatment, Training Window and Training Model):

  • Network structure preservation: RNNE calculates the node features from multi-step probability transition matrices in Pretreatment, trying to preserve the structural characteristics of a larger neighborhood of each node than the adjacency matrix alone. In Training Model, the loss function considers the first-order proximity and the high-order proximity together. (The first-order proximity is determined by whether there is a link between two nodes; the high-order proximity means the similarity between the neighborhood structures of two nodes.)

  • Growing graphs: RNNE first puts some virtual nodes into the network. When new nodes arrive, RNNE replaces the virtual nodes with the new nodes in Pretreatment. Similarly, if a node is deleted, RNNE replaces it with a virtual node.

  • Information of evolving graphs: the overall structure of Training Model is an RNN model. The previous node representations are fed as the hidden state to the RNNE cell, so more information of the evolving graphs can be used during embedding. Considering that the representations of one node at different times should be close if the node's characteristics do not change, RNNE also adds a corresponding part to the loss function to maintain the stability of the embedding. (Stability here means reducing the effects of noise from network fluctuation over time.)

(a) RNNE cell
(b) RNNE structure
Figure 1: (a) is the structure of the RNNE cell that is used in (b). (b) shows the components and processes of the RNNE model; all of the RNNE cells share the same parameters. $G_1, G_2, \ldots, G_T$ is the series of networks.

The main contributions of this paper are listed as follows:

  • RNNE considers the first-order proximity and high-order proximity during training, so it can preserve the original network structure.

  • With virtual nodes, RNNE can unify the sizes of the networks at different times and easily extract the changing part of the network.

  • RNNE takes graph sequences as input and can integrate information of evolving graphs when embedding. This helps mitigate the effects of network fluctuation over time.

This paper is organized as follows. Section 2 introduces and explains the RNNE model in detail. Section 3 describes the experiments and datasets. The experimental results and analysis are presented in Section 4. Finally, Section 5 concludes the paper.

2 The RNNE model

2.1 Problem Definition

Given a dynamic network $G$ whose nodes and edges may change as time goes on, and a series of graphs $G_1, G_2, \ldots, G_T$, where $G_t$ is the state of $G$ at time $t$, learn for each node $v_i^t$ of $G_t$ a representation $y_i^t \in \mathbb{R}^d$, where $d$ is a positive integer given in advance.

2.2 Model Description

First, RNNE assumes that the network series is stable. This means that not too many nodes change at the same time, and the increase in edge weights is nearly linear. Second, the size of the model is limited, so RNNE also assumes that the network will not become too large as time goes on.

RNNE does not process the whole series at the same time, because the old networks may be invalid and a very long series would take a lot of time. RNNE maintains a fixed-length window of networks to train on, and a concept-drift checking step then excludes the nodes whose properties may have changed.
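For concreteness, a minimal Python sketch of such a fixed-length window is given below; the class name and the default window size are illustrative, not taken from the paper.

from collections import deque

# Hypothetical fixed-length training window: keeps the most recent
# `window_size` snapshots and drops the oldest one when a new snapshot arrives.
class TrainingWindow:
    def __init__(self, window_size=5):
        self.snapshots = deque(maxlen=window_size)  # the oldest entry is evicted automatically

    def push(self, adjacency_matrix):
        self.snapshots.append(adjacency_matrix)

    def is_full(self):
        return len(self.snapshots) == self.snapshots.maxlen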

A node in a dynamic network cannot be represented only by its state at a single moment. So RNNE not only uses the current state but also considers the previous states when learning the embedding. Borrowing from recurrent neural networks (RNN) [RNN], RNNE uses a hidden state to represent the previous state of a node.

In general, RNNE first chooses suitable nodes, then uses the hidden state and the node feature as input to minimize the loss of node proximity at neighboring time points. The entire process is explained in detail in the following subsections.

2.3 Pretreatment

Each node has an attribute named "state". At the beginning, every node's "state" is "normal", which means it is an ordinary node:

$v_i^t$["state"] = "normal",

where $v_i^t$ is the $i$-th node in $G_t$.

Then, in order to keep the size of the input fixed, we define a type of node named virtual node. It is not connected to any other node, and

$v_i^t$["state"] = "virtual", if $v_i^t$ is a virtual node.

If the number of nodes in $G_t$ does not reach the limit of the model, which is $N$, then we add virtual nodes into $G_t$ until $|V_t| = N$.
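The padding step can be sketched as follows, assuming the snapshot is given as a dense adjacency matrix; the function name and the state encoding are illustrative.

import numpy as np

def pad_with_virtual_nodes(A, N):
    """Pad adjacency matrix A (n x n, n <= N) to N x N with isolated virtual nodes.

    Returns the padded matrix and a state list: 'normal' for real nodes,
    'virtual' for padding nodes (which have no edges).
    """
    n = A.shape[0]
    padded = np.zeros((N, N), dtype=A.dtype)
    padded[:n, :n] = A
    states = ['normal'] * n + ['virtual'] * (N - n)
    return padded, states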

Before training starts, we add the training networks to the training window one by one. When a new network arrives, if the dynamic network is of the first type (a topologically evolving graph), it can be put into the training window directly. Otherwise, RNNE puts in the subgraph corresponding to the part that has grown relative to the last network. If the training window is full, RNNE removes the earliest network from the window, and then checks every node in the window to keep dangerous nodes out of the training.

Assume that the window size is 5 and there are 4 networks $G_1$, $G_2$, $G_3$, $G_4$ in the window, and then $G_5$ arrives. For every node $v_i^5$ in $G_5$, if the states of $v_i^5$ and $v_i^4$ are both not "virtual", we calculate the distance $d_i$ between the corresponding rows of their adjacency matrices. Most of the time, $v_i^5$ and $v_i^4$ are the same node at different times. Finally, we use the Grubbs test [grubuustest1, grubuustest2] to find the dangerous nodes whose properties have most likely changed:

$v_i^t$["state"] = "dangerous", if $v_i^t$ is a dangerous node.

The process described above is presented in Algorithm 1.

In order to keep the high-order proximity, we calculate the node features as below. Assume that $A_t$ is the adjacency matrix of $G_t$. First we define a function $\mathrm{norm}(\cdot)$:

$\mathrm{norm}(M)_{ij} = \dfrac{M_{ij}}{\max_k M_{ik}}, \qquad (1)$

i.e., each element of $M$ is divided by the largest element in the same row.
The feature matrix $X_t$ is calculated from the multi-step probability transition matrices as follows:

$X_t = \mathrm{norm}\!\left(\hat{A}_t + \hat{A}_t^2 + \cdots + \hat{A}_t^P\right), \qquad (2)$

where $\hat{A}_t$ is the one-step probability transition matrix obtained by row-normalizing $A_t$ and $P$ is the number of transition steps.
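A small Python sketch of this feature computation, under the reconstruction of Eqs. (1)-(2) above; the number of transition steps is an assumed parameter, not a value given in the paper.

import numpy as np

def row_max_normalize(M):
    """Eq. (1): divide each element by the largest element in its row."""
    row_max = M.max(axis=1, keepdims=True)
    row_max[row_max == 0] = 1.0          # leave all-zero rows (virtual nodes) unchanged
    return M / row_max

def node_features(A, steps=3):
    """Sketch of Eq. (2): accumulate multi-step transition probabilities.

    `steps` is an illustrative choice; the paper does not state the exact
    number of transition steps used.
    """
    deg = A.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                  # avoid dividing by zero for isolated (virtual) nodes
    P = A / deg                          # one-step probability transition matrix
    acc, P_k = np.zeros_like(P), np.eye(A.shape[0])
    for _ in range(steps):
        P_k = P_k @ P                    # P, P^2, ..., P^steps
        acc += P_k
    return row_max_normalize(acc)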
0:  new network $G_t$ with adjacency matrix $A_t$, the last network $G_{t-1}$ with adjacency matrix $A_{t-1}$, the size limit $N$, the significance level
0:  the marked state of every node in $G_t$
1:  add virtual nodes into $G_t$ until $|V_t| = N$
2:  for each virtual node $v$ in $G_t$ do
3:     $v$["state"] = "virtual"
4:  end for
5:  $D = \{\, d_i = \| A_t[i,:] - A_{t-1}[i,:] \| : v_i^t$ and $v_i^{t-1}$ are both not virtual $\,\}$
6:  use the Grubbs test on $D$, ignoring the virtual node data, to find the target node set $S$ with the given significance level
7:  for each $v$ in $S$ do
8:     $v$["state"] = "dangerous"
9:  end for
10:  for each remaining $v$ in $G_t$ do
11:     $v$["state"] = "normal"
12:  end for
Algorithm 1 Mark the state of a new network
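A rough Python sketch of Algorithm 1's dangerous-node detection is given below; the Euclidean row distance and the iterative two-sided Grubbs test are reasonable readings of the description, not the authors' exact code.

import numpy as np
from scipy.stats import t as student_t

def grubbs_outliers(values, alpha=0.05):
    """Indices of values flagged by an iterative two-sided Grubbs test (a sketch)."""
    idx, vals, outliers = list(range(len(values))), list(values), []
    while len(vals) > 2:
        mean, std = np.mean(vals), np.std(vals, ddof=1)
        if std == 0:
            break
        i = int(np.argmax(np.abs(np.array(vals) - mean)))
        G = abs(vals[i] - mean) / std
        n = len(vals)
        t_crit = student_t.ppf(1 - alpha / (2 * n), n - 2)
        G_crit = (n - 1) / np.sqrt(n) * np.sqrt(t_crit**2 / (n - 2 + t_crit**2))
        if G <= G_crit:
            break
        outliers.append(idx.pop(i))      # flag the most extreme value and repeat
        vals.pop(i)
    return outliers

def mark_states(A_new, A_old, states_new, states_old, alpha=0.05):
    """Mark nodes whose neighborhood changed abnormally as 'dangerous' (Algorithm 1 sketch)."""
    real = [i for i in range(A_new.shape[0])
            if states_new[i] != 'virtual' and states_old[i] != 'virtual']
    dists = [np.linalg.norm(A_new[i] - A_old[i]) for i in real]
    for j in grubbs_outliers(dists, alpha):
        states_new[real[j]] = 'dangerous'
    return states_new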

2.4 Training

The RNNE model is shown in Figure 1. Suppose there are $n$ networks $G_1, G_2, \ldots, G_n$ in the window; the symbols are explained in Table 1.

Symbol  Definition
$v_i^{t_1}$, $v_i^{t_2}$  the same node at different time points
$N$  the node size of $G_t$ after adding virtual nodes
$d$  the embedding size
$m$  the batch size of training
$x_i^t \in \mathbb{R}^N$  the feature of the $i$-th node in $G_t$, calculated as Eq. 2
$\hat{x}_i^t \in \mathbb{R}^N$  the reconstructed data of $x_i^t$
$A_t = (s_{ij}^t)$  the adjacency matrix of $G_t$
$h_i^t$  the hidden state of $v_i^t$
$y_i^t \in \mathbb{R}^d$  the representation of $v_i^t$
$v_i^t$["state"]  the state of $v_i^t$: "normal", "virtual" or "dangerous"
Table 1: Symbol Explanation

First, RNNE samples a batch of $m$ nodes; a node can be chosen only when its state is "normal" in every network of the window. Assume that the node indices $i_1, i_2, \ldots, i_m$ are selected; then we get the input matrix series $X_1, X_2, \ldots, X_n$, where the rows of $X_t$ are the features of the selected nodes in $G_t$.

The RNN cell of our model has an encoder-decoder structure. The encoder consists of multiple non-linear functions that map the input data to the representation space. The decoder also consists of multiple non-linear functions that map the representations back to the reconstruction space. Given the input $x_i^t$ and the previous hidden state $h_i^{t-1}$, the calculation is as follows:

$(y_i^t, h_i^t) = f_{\mathrm{enc}}(x_i^t, h_i^{t-1}), \qquad \hat{x}_i^t = f_{\mathrm{dec}}(y_i^t), \qquad (3)$

where $f_{\mathrm{enc}}$ and $f_{\mathrm{dec}}$ denote the stacked non-linear layers of the encoder and the decoder.

The goal of the autoencoder is to minimize the reconstruction error between the output and the input. The loss is calculated as follows:

$L_{\mathrm{rec}} = \sum_{t=1}^{n} \sum_{i=1}^{m} \left\| \hat{x}_i^t - x_i^t \right\|_2^2. \qquad (4)$

As [hashing] mentioned, although minimizing the reconstruction loss does not explicitly preserve the similarity between samples, the reconstruction criterion can smoothly capture the data manifolds and thus preserve it. That is, if the inputs are similar, the outputs are likely to be similar. In other words, if the features of two nodes are similar, the embeddings of the nodes will be similar. At the same time, by reconstructing the node feature from the embedding, we can make it likely that the embedding vector contains enough information to represent the node.
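To make the cell concrete, one plausible PyTorch realization of this encoder-decoder with a hidden state is sketched below; the layer sizes, the activations and the choice of reusing the representation as the new hidden state are assumptions, not the authors' exact architecture.

import torch
import torch.nn as nn

class RNNECell(nn.Module):
    """Illustrative RNNE cell: an autoencoder whose encoder also sees the previous hidden state."""

    def __init__(self, feature_dim, hidden_dim, embed_dim):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feature_dim + embed_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, embed_dim), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, feature_dim), nn.Sigmoid(),  # features lie in [0, 1] after row-max normalization
        )

    def forward(self, x, h_prev):
        y = self.encoder(torch.cat([x, h_prev], dim=-1))  # representation, Eq. (3)
        x_hat = self.decoder(y)                           # reconstruction of the node feature
        return y, x_hat, y   # the new hidden state is taken to be the representation (an assumption)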

Considering that there are many zero elements in $x_i^t$ and $\hat{x}_i^t$, but we are in fact more concerned with the non-zero part, we follow SDNE [SDNE] and assign different weights to the zero and non-zero elements when calculating the reconstruction error. The new loss function is shown below:

$L_{\mathrm{2nd}} = \sum_{t=1}^{n} \sum_{i=1}^{m} \left\| (\hat{x}_i^t - x_i^t) \odot b_i^t \right\|_2^2, \qquad (5)$

where $\odot$ denotes the Hadamard product and $b_i^t = (b_{i,1}^t, \ldots, b_{i,N}^t)$. If $x_{i,j}^t = 0$, then $b_{i,j}^t = 1$; otherwise $b_{i,j}^t = \beta > 1$. Using this loss function, nodes that have similar neighborhood structures will be mapped close to each other. This means that our model can keep the global network structure by keeping the high-order proximity between nodes.

In addition to considering the neighborhood structures of different nodes, we should also pay attention to the local structure, i.e., the direct links between nodes. We use the first-order proximity to measure the local structure of the network. The loss function is shown below:

$L_{\mathrm{1st}} = \sum_{t=1}^{n} \sum_{i,j=1}^{m} s_{ij}^t \left\| y_i^t - y_j^t \right\|_2^2, \qquad (6)$

where $s_{ij}^t$ is the corresponding entry of $A_t$. If $s_{ij}^t > 0$, there is a direct link between nodes $i$ and $j$, and we hope they can be mapped near each other in the embedding space.

Above, we only considered each network in the series separately, although the hidden state transfers information between them. Since the states of the sampled training nodes are all "normal", the representations of one node at different times should be as close as possible, without the influence of noise. The loss function of this part is shown below:

$L_{\mathrm{st}} = \sum_{t=2}^{n} \sum_{i=1}^{m} \left\| y_i^t - y_i^{t-1} \right\|_2^2. \qquad (7)$

In summary, to keep the first-order proximity, the high-order proximity and the stability over the time series, we combine Eq. 5-7 and get the integrated final loss function:

$L = L_{\mathrm{2nd}} + \alpha L_{\mathrm{1st}} + \gamma L_{\mathrm{st}}, \qquad (8)$

where $\alpha$, $\gamma$ and the weight $\beta$ inside $L_{\mathrm{2nd}}$ are the hyper-parameters of the model.
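A sketch of the combined objective of Eqs. (5)-(8) in PyTorch, under the notation above; the variable names and the batched pairwise-distance formulation are illustrative.

import torch

def rnne_loss(x, x_hat, y, y_prev, S_batch, alpha, beta, gamma):
    """x, x_hat: (m, N) features and reconstructions for one snapshot; y: (m, d) representations;
    y_prev: representations of the same nodes at the previous time step;
    S_batch: (m, m) adjacency weights between the sampled nodes."""
    # Eq. (5): weighted reconstruction error, non-zero entries weighted by beta > 1
    B = torch.where(x == 0, torch.ones_like(x), torch.full_like(x, beta))
    loss_2nd = (((x_hat - x) * B) ** 2).sum()

    # Eq. (6): first-order proximity, linked nodes pulled together in the embedding space
    dist = torch.cdist(y, y) ** 2
    loss_1st = (S_batch * dist).sum()

    # Eq. (7): stability of the same node's representation across time
    loss_st = ((y - y_prev) ** 2).sum()

    # Eq. (8): combined objective
    return loss_2nd + alpha * loss_1st + gamma * loss_st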

Finally, the model parameters can be adjusted by:

$\theta \leftarrow \theta - \eta \dfrac{\partial L}{\partial \theta}, \qquad (9)$

where $\theta$ denotes the model parameters and $\eta$ is the learning rate of the model.

The whole training process can be seen in Algorithm 2.

0:  the list of networks in the training window with their adjacency matrices $A_t$ and feature matrices $X_t$, the hidden states $H$ from the previous window
0:  the network embeddings $Y$ and the hidden states $H$
1:  if it is the first time to train then
2:     initialize the parameters $\theta$
3:  end if
4:  repeat
5:     sample a minibatch of $m$ nodes such that for every sampled node $v$ and every $G_t$ in the window, $v^t$["state"] = "normal"
6:     get the slices of the sampled nodes' parts in $X_t$, $A_t$, $H$
7:     calculate the loss $L$ using Eq. 3 and Eq. 8
8:     update the parameters $\theta$ with Eq. 9
9:  until converge
10:  use Eq. 3 to get $Y$ and $H$
Algorithm 2 Train the RNNE model
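Tying the pieces together, a compressed sketch of this training loop is shown below, reusing the RNNECell and rnne_loss sketches above; the optimizer, the batch sampling and the tensor layout are assumptions.

import numpy as np
import torch

def train_window(cell, snapshots, alpha, beta, gamma, m=64, epochs=50, lr=1e-3):
    """Sketch of Algorithm 2. `cell` is an RNNECell; `snapshots` is a list of dicts
    with torch tensors 'X' (features), 'A' (adjacency), 'H' (previous hidden states)
    and a list 'states' of per-node state strings."""
    opt = torch.optim.Adam(cell.parameters(), lr=lr)
    # only nodes that are "normal" in every snapshot of the window are used for training
    normal = [i for i in range(len(snapshots[0]['states']))
              if all(s['states'][i] == 'normal' for s in snapshots)]
    for _ in range(epochs):
        idx = np.random.choice(normal, size=min(m, len(normal)), replace=False)
        batch = torch.as_tensor(idx, dtype=torch.long)
        loss = 0.0
        y_prev = snapshots[0]['H'][batch].detach()   # hidden state carried over from the previous window
        for s in snapshots:
            x = s['X'][batch]
            y, x_hat, h_new = cell(x, y_prev)
            loss = loss + rnne_loss(x, x_hat, y, y_prev,
                                    s['A'][batch][:, batch], alpha, beta, gamma)
            y_prev = h_new                           # pass the representation on as the next hidden state
        opt.zero_grad()
        loss.backward()
        opt.step()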

2.5 Analysis and Discussions

In this section, some analysis and discussions of RNNE are presented.

RNNE assumes a limit $N$ on the node size of each snapshot of the network. Usually the node size is less than $N$. If the node size becomes larger than $N$ during network evolution, we can expand the layer sizes of the RNNE cell while retaining the old parameters, a technique learned from DynGem [dyngem].

The training complexity of RNNE in one iteration is $O(nmNd')$, where $n$ is the size of the training window, $m$ is the batch size, $N$ is the limit on the node size of the network, $d$ is the embedding size, and $d'$ is the maximum size of the hidden layers. Usually $n$, $m$ and $d$ are constants given in advance, $N$ is linear in the true node size of the network, and $d'$ is related to the embedding size but not to the node size. So the training complexity of RNNE in one iteration is linear in the node size of the network.

3 Experiments

In this section, we introduce the methods and datasets which are used to evaluate the RNNE algorithm.

3.1 Dataset

We use static networks and dynamic networks to evaluate the RNNE algorithm. Some of the datasets do not have label information, so they are not used in the classification experiment. For the static networks, we randomly change some of the nodes and edges to generate a series of networks. Every dataset series has a length of 14.

Dataset  Nodes (range)  Edges (range)
Wiki 2405-2724 17981-27754
email-Eu-core 1005-1242 25571-43249
blogCatalog 10312-10651 333983-624250
CA-CondMat 23133-23252 93468-176492
CA-HepPh 12008-12337 118505-209384
Table 2: Dataset information (node and edge counts range over the 14 snapshots)
  • Wiki: a reference network within Wikipedia in which each node has a label. There are 17 categories in total in this dataset.

  • blogCatalog [blog], email-Eu-core [CA]: social networks of people. There are 39 categories in blogCatalog and 42 in email-Eu-core.

  • CA-CondMat, CA-HepPh [CA]: collaboration networks from arXiv. These two datasets are only used for reconstruction and link prediction because we have no label information for their nodes.

3.2 Baselines and Parameters

We use the following algorithms as the baselines in the experiments. For the static network embedding algorithms, we apply them to each snapshot of the dynamic network.

  • SDNE [SDNE]: it also uses an autoencoder structure, and learns embeddings by minimizing a loss over the first-order and second-order proximity.

  • Line [line] : It doesn’t define a function to calculate the network embedding but learning a map of node to embedding directly. It’s loss fuction also consider the first-order and second-order proximity.

  • GraRep [grarep]: it considers high-order proximity and uses SVD to get the network embedding.

  • Hope [hope]: it constructs an asymmetric relation matrix from the adjacency matrix and then uses JDGSVD [jdgsvd] to get the low-dimensional representation.

For RNNE, the layer size of the autoencoder is different for each dataset; it is shown in Table 3. (The layer size is the list of the numbers of neurons at each encoder layer; the last number is the dimension of the output node representation.)

Dataset  Layer size
Wiki 5128-200-128
email-Eu-core 2128-128
blogCatalog 15128-1000-128
CA-CondMat 25128-2500-1000-128
CA-HepPh 15128-1500-128
Table 3: autoencoder size

The hyper-parameters $\alpha$, $\beta$ and $\gamma$ are adjusted by grid search.

3.3 Evaluation Metrics

In our experiments, we test three tasks: reconstruction, classification and link prediction.

In reconstruction and link prediction, we use precision@$k$, whose definition is shown below, to measure the performance of the algorithms:

$\text{precision@}k = \dfrac{\left| \{ (i,j) : (i,j) \in E,\ \mathrm{rank}(i,j) \le k \} \right|}{k},$

where $E$ is the set of true edges and $\mathrm{rank}(i,j)$ is the ranked index of the pair $(i,j)$ among the predicted candidate edges.

In classification, we use Micro-F1 and Macro-F1 to measure the performance of the algorithms. For a label $A$, $TP(A)$, $FP(A)$ and $FN(A)$ are the numbers of true positives, false positives and false negatives among the instances which are predicted as $A$, and $\mathcal{C}$ is the label set:

$\text{Micro-F1} = \dfrac{2 \sum_{A \in \mathcal{C}} TP(A)}{\sum_{A \in \mathcal{C}} \left( 2\,TP(A) + FP(A) + FN(A) \right)}, \qquad \text{Macro-F1} = \dfrac{1}{|\mathcal{C}|} \sum_{A \in \mathcal{C}} \mathrm{F1}(A),$

where $\mathrm{F1}(A)$ is the F1-score of label $A$ computed from $TP(A)$, $FP(A)$ and $FN(A)$.

Considering that each dataset has 14 snapshots and the algorithms are applied to each snapshot, we take the average performance as the final result.
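For reference, the two kinds of metrics can be computed as in the following sketch; the ranking and averaging conventions are assumptions consistent with the definitions above.

import numpy as np

def precision_at_k(ranked_pairs, true_edges, k):
    """Fraction of the top-k ranked node pairs that are true edges."""
    hits = sum(1 for pair in ranked_pairs[:k] if pair in true_edges)
    return hits / k

def micro_macro_f1(tp, fp, fn):
    """tp, fp, fn: dicts mapping each label to its counts over the test instances."""
    labels = tp.keys()
    P = sum(tp[a] for a in labels) / max(sum(tp[a] + fp[a] for a in labels), 1)
    R = sum(tp[a] for a in labels) / max(sum(tp[a] + fn[a] for a in labels), 1)
    micro = 2 * P * R / max(P + R, 1e-12)
    per_label = []
    for a in labels:
        p = tp[a] / max(tp[a] + fp[a], 1)
        r = tp[a] / max(tp[a] + fn[a], 1)
        per_label.append(2 * p * r / max(p + r, 1e-12))
    return micro, float(np.mean(per_label))   # Micro-F1, Macro-F1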

4 Results and Analysis

4.1 Reconstruction

Reconstruction means recovering the original network information from the embedding. In this task, we calculate the distance between each pair of nodes in the embedding space to measure the first-order proximity of the nodes, and then infer the edges from the proximity calculated above. The results on the CA-CondMat and CA-HepPh datasets are shown in Figure 2.
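This evaluation can be sketched as ranking all node pairs by their embedding distance and scoring the top-$k$ pairs with precision_at_k from Section 3.3; the Euclidean distance is an assumption, as the paper only states that distances in the embedding space are used.

import numpy as np
from itertools import combinations

def rank_pairs_by_distance(Y):
    """Rank all node pairs by Euclidean distance in the embedding space (closest first).

    Suitable for small graphs; a real evaluation would batch or prune this computation.
    """
    pairs = list(combinations(range(Y.shape[0]), 2))
    dists = [np.linalg.norm(Y[i] - Y[j]) for i, j in pairs]
    order = np.argsort(dists)
    return [pairs[o] for o in order]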

(a) CA-Condmat
(b) CA-HepPh
Figure 2: average precision@$k$ on CA-CondMat and CA-HepPh in reconstruction.

From these results, we can see that RNNE achieves better precision@$k$ than SDNE, Hope and Line on these two datasets. And when $k$ is not large, RNNE also does better than GraRep. The algorithms that consider the first-order or high-order proximity (RNNE, SDNE, GraRep, Line) obviously perform better than the one that does not (Hope). This result shows that high-order proximity is very helpful for preserving the original network structure. In fact, the historical network information used by RNNE is actually noise for the reconstruction of the current network, so RNNE is theoretically at a disadvantage in network reconstruction.

4.2 Classification

Classification is a very common and important task in research and practice. In this experiment, we use the node embeddings as features to classify each node into a label and then compare the prediction with the ground truth. Specifically, we use LIBLINEAR [LIBLINEAR] as the solver of the classifiers. When training the classifiers, we randomly choose a portion of the nodes and their labels for training and use the rest for testing. For Wiki, blogCatalog and email-Eu-core, we randomly choose 10% to 90% of the nodes for training. The results are shown in Figure 3.
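A minimal sketch of this protocol using scikit-learn's liblinear-backed logistic regression as a stand-in for LIBLINEAR; the one-vs-rest setup, the split and the random seed are assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

def classify(Y, labels, train_ratio=0.5, seed=0):
    """Y: node embeddings of shape (n, d); labels: one integer class per node."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        Y, labels, train_size=train_ratio, random_state=seed, stratify=labels)
    clf = LogisticRegression(solver='liblinear', max_iter=1000)  # liblinear handles multiclass one-vs-rest
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    return f1_score(y_te, pred, average='micro'), f1_score(y_te, pred, average='macro')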

(a) blogCatalog
(b) Wiki
(c) email-Eu-core
Figure 3: Micro-F1 and Macro-F1 on the three datasets as the training percentage changes.

From these results, we can see that RNNE generally performs better than the other four algorithms. The autoencoder structure of the RNNE model can keep nodes that are close in the feature space close in the embedding space as well. And when calculating the node features, we use the high-order proximity, which is more expressive than the adjacency matrix.

4.3 Link Predictions

Link prediction is somewhat similar to reconstruction, because both need to judge whether an edge exists. Before this experiment, we first randomly hide 15% of the edges in the test networks, and then use the embeddings to predict the hidden edges. Note that when calculating precision@$k$, we ignore predicted edges that already exist in the network after hiding. The results on CA-CondMat and CA-HepPh are shown in Figure 4.
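The setup can be sketched as follows; the uniform edge sampling and the candidate filtering are assumptions consistent with the description above.

import numpy as np

def hide_edges(A, ratio=0.15, seed=0):
    """Return a copy of A with `ratio` of its edges removed, plus the set of hidden edges."""
    rng = np.random.default_rng(seed)
    edges = np.transpose(np.nonzero(np.triu(A, k=1)))       # undirected edges (i < j)
    hidden_idx = rng.choice(len(edges), size=int(ratio * len(edges)), replace=False)
    hidden = {tuple(edges[i]) for i in hidden_idx}
    A_obs = A.copy()
    for i, j in hidden:
        A_obs[i, j] = A_obs[j, i] = 0
    return A_obs, hidden

def link_prediction_precision(ranked_pairs, A_obs, hidden, k):
    """precision@k over candidate pairs (e.g. from rank_pairs_by_distance in Section 4.1),
    skipping edges that are still present after hiding."""
    candidates = [p for p in ranked_pairs if A_obs[p[0], p[1]] == 0]
    hits = sum(1 for p in candidates[:k] if p in hidden)
    return hits / k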

(a) CA-Condmat
(b) CA-HepPh
Figure 4: average precision@$k$ on CA-CondMat and CA-HepPh in link prediction.

As $k$ becomes larger and larger, RNNE gets a higher precision@$k$ than the others in the beginning, and afterwards GraRep may do a little better. In most real-world tasks such as recommendation, it is not necessary to predict too many links. On the one hand, as the prediction goes on, the accuracy will inevitably decrease; on the other hand, people pay more attention to the pairs of nodes that are most likely to have a link. So it is very important to get a higher precision when $k$ is small.

4.4 Parameter Influence

In this section, we investigate the influence of the parameters to show that they are really effective for our tasks. Specifically, we evaluate the parameters $\alpha$, $\beta$ and $\gamma$ on the email-Eu-core dataset. The results are shown in Figure 5.

(a) Influence of $\alpha$ ($\beta$=5, $\gamma$=5)
(b) Influence of $\beta$ ($\alpha$=0.1, $\gamma$=0)
(c) Influence of $\gamma$ ($\alpha$=0.001, $\beta$=5)
Figure 5: The influence of the parameters $\alpha$, $\beta$ and $\gamma$ on the email-Eu-core dataset.

In Figure 5(a), we can see the performance in classification and reconstruction when $\alpha$ varies with $\beta$ = 5 and $\gamma$ = 5. It is obvious that when $\alpha$ becomes larger, the precision@$k$ is higher. But when $\alpha$ is too large, the quality of classification becomes significantly worse. $\alpha$ is the weight of the first-order proximity in the loss function, so the larger $\alpha$ is, the more the model is concerned with the direct links between nodes. It is important to find a balance between the first-order and high-order proximity.

In Figure 5(b), we can see the performance when $\beta$ varies in [1, 20] with $\alpha$ = 0.1 and $\gamma$ = 0. $\beta$ is the weight of the non-zero elements when reconstructing the node feature in the autoencoder. When $\beta$ = 1, which means the non-zero and zero elements have the same weight, the results are not good. However, when $\beta$ is too large, the precision@$k$ in the reconstruction task is still not good enough (in this experiment, $\beta$ = 5 is the best), since a too large $\beta$ makes the autoencoder ignore the information in the zero elements. Thus, we should pay more attention to the non-zero elements while still attending to the zero elements properly.

In Figure 5(c), we can see the performance when $\gamma$ varies in [0, 20] with $\alpha$ = 0.001 and $\beta$ = 5. $\gamma$ is used to reduce the difference between the representations of the same node at different times. That means the embedding results depend not only on the current network but also on the previous ones. So we can see that when $\gamma$ = 0, the precision@$k$ is the best, though it still contains "noise" because of the RNN structure. Of course, a larger $\gamma$ gives better performance than $\gamma$ = 0 in classification. So the choice of $\gamma$ depends on whether we focus more on the network structure or on the node features.

5 Conclusion

In this paper, we propose Recurrent Neural Network Embedding (RNNE), an algorithm for dynamic network embedding with a deep neural network. In order to unify the input network structure at different times, we add virtual nodes and replace virtual nodes with real nodes when node changes happen. In the embedding method, RNNE not only keeps the local and global network structure via the first-order and high-order proximity, but also reduces the influence of noise by transferring the previous embedding information. We compare RNNE with several other algorithms on various datasets and tasks, and then show the influence of the parameters on the embedding performance. The results show that our method is effective and can achieve better performance than the other algorithms on the tested datasets.

Future work will try to use probabilistic models [chen-2009-pcvm, chen-2014-epcvm], their multi-objective version [lyu2019multiclass] and large-scale version [jiang2017scalable] in combination with dynamic network embedding. In addition, ensemble methods [chen2009regularized, chen2009predictive, chen2010multiobjective] could be employed to improve the performance of the embedding and the downstream applications.

References