### GloDyNE

GloDyNE: Global Topology Preserving Dynamic Network Embedding


Learning low-dimensional topological representation of a network in dynamic environments is attracting much attention due to the time-evolving nature of many real-world networks. The main and common objective of Dynamic Network Embedding (DNE) is to efficiently update node embeddings while preserving network topology at each time step. The idea of most existing DNE methods is to capture the topological changes at or around the most affected nodes (instead of all nodes) and accordingly update node embeddings. Unfortunately, this kind of approximation, although it can improve efficiency, cannot effectively preserve the global topology of a dynamic network at each time step, because it does not consider the inactive sub-networks that receive accumulated topological changes propagated via the high-order proximity. To tackle this challenge, we propose a novel node selecting strategy to diversely select the representative nodes over a network, which is coordinated with a new incremental learning paradigm of a Skip-Gram based embedding approach. Extensive experiments show that GloDyNE, with only a small fraction of nodes selected, can already achieve superior or comparable performance w.r.t. the state-of-the-art DNE methods in three typical downstream tasks. In particular, GloDyNE significantly outperforms other methods in the graph reconstruction task, which demonstrates its ability to preserve global topology. The source code is available at https://github.com/houchengbin/GloDyNE



The interactions or connectivities between entities of a real-world complex system can be naturally represented as a network (or graph), e.g., social networks, biological networks, and sensor networks. Learning topological representation of a network, especially low-dimensional node embeddings that encode the network topology so as to facilitate downstream tasks, has achieved great success in the past few years [cui2018survey, hamilton2017representation, goyal2018graph].

Most previous Network Embedding methods such as [perozzi2014deepwalk, tang2015line, cao2015grarep, grover2016node2vec, ou2016asymmetric] are designed for static networks. However, many real-world networks are dynamic by nature, i.e., edges might be added or removed between seen and/or unseen nodes as time goes on. For instance, in a wireless sensor network, devices regularly connect to or accidentally disconnect from routers; in a social network, new friendships are established between new and/or existing users. Due to the time-evolving nature of many real-world networks, Dynamic Network Embedding (DNE) is now attracting much attention [zhu2016scalable, li2017attributed, goyal2017dyngem, zhu2018high, zhang2018timers, du2018dynamic, zhou2018dynamic, chen2018scalable, mahdavi2018dynnode2vec, singer2019node, trivedi2019dyrep]. The main and common objective of DNE is to efficiently update node embeddings while preserving network topology at each time step. Most existing DNE methods try to strike a compromise between effectiveness (evaluated by downstream tasks) and efficiency (of obtaining node embeddings). The idea is to capture the topological changes at or around the most affected nodes (instead of all nodes), and promptly update node embeddings based on an efficient incremental learning paradigm.

Unfortunately, this kind of approximation, although it can improve efficiency, cannot effectively preserve the global topology of a dynamic network at each time step. Specifically, any change, i.e., an edge added or removed between nodes, would affect all nodes in a connected network via the high-order proximity, as illustrated in Figure 1 a). On the other hand, as observed from Figure 1 b-d), real-world dynamic networks usually have some inactive sub-networks where no change occurs for several time steps. Putting both together, the existing DNE methods, which focus on the most affected nodes (belonging to the active sub-networks) but do not consider the inactive sub-networks, would overlook the accumulated topological changes that propagate to the inactive sub-networks via the high-order proximity.

To tackle this challenge, the proposed DNE method, Global topology preserving Dynamic Network Embedding (GloDyNE), first partitions a current network into smaller sub-networks and selects one representative node in each sub-network, so as to ensure the diversity of selected nodes. The representative node for each sub-network is sampled via a probability distribution over all nodes within that sub-network, such that a higher probability is assigned to a node with larger accumulated topological changes. After that, GloDyNE captures the latest topologies around the selected nodes by truncated random walks [perozzi2014deepwalk], and then promptly updates node embeddings based on the Skip-Gram Negative Sampling (SGNS) model [mikolov2013distributed] and an incremental learning paradigm.

The contributions of this work are as follows: 1) We demonstrate the existence of inactive sub-networks in real-world dynamic networks. Together with the propagation of topological changes via the high-order proximity, this reveals an issue of global topology preservation in many existing DNE methods. 2) To better preserve the global topology, unlike all previous DNE methods, we propose to also consider the accumulated topological changes in inactive sub-networks. A novel node selecting strategy is thus proposed to diversely select the representative nodes over a network. 3) We further develop a new DNE method or framework, namely GloDyNE, which extends the random walk and Skip-Gram based network embedding approach to an incremental learning paradigm with a free hyper-parameter for controlling the number of selected nodes at each time step. 4) Extensive empirical studies show the superiority of GloDyNE over the state-of-the-art DNE methods in terms of both effectiveness and efficiency, and verify the usefulness of some special designs of GloDyNE, such as the node selecting strategy and the free hyper-parameter.

The remainder of the paper is organized as follows. We first review the related works in Section 2, and then formally define the DNE problem in Section 3. In Section 4, we present GloDyNE step by step, together with its pseudocode and time complexity. The empirical studies are reported and discussed in Section 5. In particular, Section 5.2 compares GloDyNE with other DNE methods, whereas Section 5.3 investigates GloDyNE itself. Finally, we conclude this work in Section 6.

To learn low-dimensional topological representation of a network in dynamic environments, one naive solution is to treat the snapshot of a dynamic network at each time step as a static network, so that a static Network Embedding method such as [perozzi2014deepwalk, tang2015line, grover2016node2vec, ou2016asymmetric] can be directly applied to learn node embeddings for each snapshot.

As reported in recent DNE works [zhang2018timers, zhu2018high, du2018dynamic, zhu2016scalable], this naive solution obtains superior results compared with some DNE methods. One possible reason is that it does not suffer from the issue of global topology preservation introduced in Section 1. However, it is time-consuming [du2018dynamic, zhu2018high], and thus may not satisfy the requirement of promptly updating embeddings in some DNE downstream tasks [chen2018scalable, yu2018netwalk].

To strike a compromise between effectiveness and efficiency, most existing DNE methods try to capture the topological changes at or around the most affected nodes (instead of all nodes or edges), and promptly update node embeddings based on an incremental learning paradigm. BCGD [zhu2016scalable] aims to minimize the loss of reconstructing the network proximity matrix using the node embedding matrix with a temporal regularization term, and it is optimized by the Block-Coordinate Gradient Descent algorithm. In particular, this work further offers an efficient solution, BCGD-incremental, that only updates the most affected nodes' embeddings based on their previous embeddings. DynAE [goyal2017dyngem] and NetWalk [yu2018netwalk] both utilize an auto-encoder with some regularization terms. They continuously train the model inherited from the last time step, so that the model converges within a few iterations thanks to the knowledge transferred from previous models. To efficiently cope with dynamic changes at each time step, some DNE methods [du2018dynamic, mahdavi2018dynnode2vec] propose an incremental version of the Skip-Gram model [mikolov2013distributed] to update embeddings based on the most affected nodes. Likewise, DHPE [zhu2018high] extends HOPE [ou2016asymmetric] to an incremental version by modifying the most affected eigenvectors using matrix perturbation theory.

Apart from the above DNE methods that trade off effectiveness (evaluated by downstream tasks) against efficiency (of obtaining node embeddings), some DNE methods [zhou2018dynamic, singer2019node, trivedi2019dyrep] aim to further improve effectiveness without considering efficiency. For example, tNodeEmbed [singer2019node] runs a static Network Embedding method to obtain node embeddings at each available time step, and then employs Recurrent Neural Networks over them (to better exploit the temporal dependence and hence possibly further improve effectiveness) to obtain the final node embeddings at each time step.

The proposed method, GloDyNE, considers both effectiveness and efficiency. However, unlike the DNE methods that focus on the most affected nodes or sub-networks, GloDyNE additionally considers the accumulated topological changes in inactive sub-networks (i.e., sub-networks where no change occurs for several time steps) for better global topology preservation of a dynamic network at each time step.

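Definition 1. A Static Network. Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ be a static network where $\mathcal{V}$ denotes a set of $|\mathcal{V}|$ nodes or vertices, and $\mathcal{E}$ denotes a set of $|\mathcal{E}|$ edges or links. The adjacency matrix of $\mathcal{G}$ is denoted as $\mathbf{A} \in \mathbb{R}^{|\mathcal{V}| \times |\mathcal{V}|}$, where $\mathbf{A}_{ij}$ is the weight of edge $e_{ij}$ between a pair of nodes $(v_i, v_j)$, and if $\mathbf{A}_{ij} = 0$, there is no edge between the two nodes. For an unweighted and undirected network, $\mathbf{A}_{ij} \in \{0, 1\}$ and $\mathbf{A}_{ij} = \mathbf{A}_{ji}$.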

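Definition 2. A Dynamic Network. A dynamic network $\mathcal{DG}$ is represented by a sequence of snapshots taken at each time step $t$, i.e., $\mathcal{DG} = (\mathcal{G}^0, \mathcal{G}^1, \ldots, \mathcal{G}^t)$. Each snapshot $\mathcal{G}^t = (\mathcal{V}^t, \mathcal{E}^t)$ can be treated as a static network.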

Definition 3. Static Network Embedding. Static network embedding aims to find a mapping function $\mathbf{Z} = f(\mathcal{G})$, where $\mathbf{Z} \in \mathbb{R}^{|\mathcal{V}| \times d}$, $d \ll |\mathcal{V}|$, and each row vector $\mathbf{z}_i \in \mathbb{R}^{d}$ of $\mathbf{Z}$ is the node embedding for $v_i$, such that the pairwise similarity of node embeddings in $\mathbf{Z}$ best preserves the pairwise topological similarity of the nodes in $\mathcal{G}$.

Definition 4. Dynamic Network Embedding. The DNE problem, under an incremental learning paradigm, can be defined as $\mathbf{Z}^t = f^t(\mathcal{G}^t, f^{t-1}, \mathbf{Z}^{t-1})$, where $\mathbf{Z}^t$ is the latest node embedding matrix, and $f^{t-1}$ and $\mathbf{Z}^{t-1}$ are the model and embeddings from the last time step, respectively. The main objective of DNE in this work is to efficiently update node embeddings at each current time step $t$, such that the pairwise similarity of node embeddings in $\mathbf{Z}^t$ best preserves the pairwise topological similarity of the nodes in $\mathcal{G}^t$.

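Definition 5. Sub-networks of a Snapshot. Let $\mathcal{G}^t_k = (\mathcal{V}^t_k, \mathcal{E}^t_k)$ denote a sub-network of a snapshot $\mathcal{G}^t$. All $K$ sub-networks of a snapshot $\mathcal{G}^t$, after network partition [bulucc2016recent], should be non-overlapping, i.e., $\mathcal{V}^t_p \cap \mathcal{V}^t_q = \emptyset$ for all $p \neq q$. And their node sets should satisfy $\bigcup_{k=1}^{K} \mathcal{V}^t_k = \mathcal{V}^t$.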

The proposed DNE method, GloDyNE, consists of four components, which are introduced step by step in Section 4.1. Intuitively, Steps 1 and 2 ensure that the selected nodes are diversely distributed over a network while, within each sub-network, being biased toward the nodes with larger accumulated topological changes. Step 3 encodes the latest topologies around the selected nodes into random walks, which are then decoded by a sliding window and the SGNS model to incrementally train node embeddings, as described in Step 4. Note that these four steps are repeated at each time step. The implementation details (via pseudocode) and the complexity analysis are presented in Sections 4.2 and 4.3, respectively.

In order to take the inactive sub-networks of a snapshot $\mathcal{G}^t$ into account, $\mathcal{G}^t$ is divided into $K$ sub-networks $\{\mathcal{G}^t_1, \mathcal{G}^t_2, \ldots, \mathcal{G}^t_K\}$, where $K$ is the number of sub-networks of a snapshot. The sub-networks should be non-overlapping and cover all nodes in the original snapshot, as defined in Definition 5, so that Step 2 can select unique nodes from each sub-network and Step 3 can more easily explore the whole snapshot starting from the nodes selected in each sub-network. A network partition algorithm [bulucc2016recent] is therefore used to achieve these goals. The most common objective is to minimize the edge cut, which can be formulated as

$$
\min \; \Big| \big\{ e^t_{ij} \in \mathcal{E}^t \;\big|\; v_i \in \mathcal{V}^t_p,\ v_j \in \mathcal{V}^t_q,\ p \neq q \big\} \Big| \tag{1}
$$

where the subscripts $i, j$ indicate node IDs and $p, q$ indicate sub-network IDs. Note that Eq. (1) is subject to two constraints, $\mathcal{V}^t_p \cap \mathcal{V}^t_q = \emptyset$ for all $p \neq q$, and $\bigcup_{k=1}^{K} \mathcal{V}^t_k = \mathcal{V}^t$, for the reasons discussed above.

Moreover, an additional constraint of balanced sub-networks is introduced so that the numbers of nodes are similar across all sub-networks, which lets the later steps explore all sub-networks fairly and hence better preserve the global topology. This third constraint of balanced sub-networks can be defined as

$$
|\mathcal{V}^t_k| \leq (1 + \epsilon)\, \frac{|\mathcal{V}^t|}{K}, \quad \forall k \in \{1, \ldots, K\} \tag{2}
$$

where $|\mathcal{V}^t_k|$ is the number of nodes in $\mathcal{G}^t_k$ and $\epsilon$ is the tolerance parameter. Note that if $\epsilon$ is 0, the network partition is perfectly balanced. In practice, $\epsilon$ is set to a small number to allow a slight violation. However, such a balanced network partition is an NP-hard problem [bulucc2016recent]. To address this problem, the METIS algorithm [karypis1998fast] is employed, which works in roughly three phases. First, in the coarsening phase, the original network is recursively transformed into a series of smaller and smaller abstract networks by collapsing nodes with common neighbors into one collapsed node, until the abstract network is small enough. Second, in the partition phase, a $K$-way partition algorithm is applied to the smallest abstract network to obtain the initial partition into sub-networks. Third, in the uncoarsening phase, the smallest abstract network is recursively expanded back to the original network, while the collapsed nodes (or, at the last level, the original nodes) at the border of two neighboring sub-networks are recursively swapped between them, so as to minimize the edge cut described in Eq. (1).
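METIS itself is not reproduced here. As a minimal sketch, assuming a partition is already available as a Python dict that maps each node to a sub-network ID (for example, produced by some METIS binding), the plain-Python helpers below evaluate the edge-cut objective of Eq. (1), the non-overlap and full-cover requirements of Definition 5, and the balance constraint of Eq. (2). The function names are hypothetical and not taken from the GloDyNE code.

```python
def edge_cut(edges, part):
    """Eq. (1): count undirected edges whose endpoints lie in different sub-networks.

    edges: iterable of (u, v) pairs; part: dict mapping node -> sub-network ID.
    """
    return sum(1 for u, v in edges if part[u] != part[v])


def covers_all_nodes(nodes, part):
    """Definition 5: a dict assigns each node exactly one ID, so sub-networks cannot
    overlap; it only remains to check that every node of the snapshot is covered."""
    return set(part) == set(nodes)


def is_balanced(part, num_parts, eps):
    """Eq. (2): all num_parts sub-networks are non-empty and each holds
    at most (1 + eps) * |V| / K nodes."""
    sizes = {}
    for k in part.values():
        sizes[k] = sizes.get(k, 0) + 1
    limit = (1 + eps) * len(part) / num_parts
    return len(sizes) == num_parts and all(size <= limit for size in sizes.values())


# Two triangles joined by the bridge edge (2, 3).
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
part = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(edge_cut(edges, part), covers_all_nodes(range(6), part), is_balanced(part, 2, 0.03))
# 1 True True
```

Cutting only the bridge gives the minimum edge cut while keeping both sub-networks at three nodes; a skewed split such as four versus two nodes would both raise the cut and violate Eq. (2) for a small tolerance.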

In order to ensure that the selected nodes are diversely distributed over a snapshot $\mathcal{G}^t$, one natural idea is to select one representative node from each sub-network. As a result, the total number of selected nodes is $K$. To increase the total number of selected nodes, one can simply increase the number of sub-networks in the network partition. We let $K = \alpha |\mathcal{V}^t|$, so that $\alpha$ can freely control the total number of selected nodes for the trade-off between effectiveness and efficiency.

The problem now becomes how to select one representative node from a sub-network. In recent DNE works such as [du2018dynamic, mahdavi2018dynnode2vec, yu2018netwalk], the nodes most affected by edge streams are selected for updating their embeddings, since their topologies are altered the most. Similarly, in this work, the selection of the representative node is biased toward the node with larger topological changes. Motivated by the concept of inertia from physics (here the degree of a node is regarded as its inertia), an efficient scoring function is designed to evaluate the accumulated topological changes of a node $v_i$ in the current snapshot $\mathcal{G}^t$ as follows

$$
S(v_i^t) = \frac{\mathcal{R}^{t-1}(v_i) + \Delta E(v_i^t)}{\mathrm{Deg}(v_i^{t-1})}, \qquad
\Delta E(v_i^t) = \big|\mathcal{N}(v_i^t) \cup \mathcal{N}(v_i^{t-1})\big| - \big|\mathcal{N}(v_i^t) \cap \mathcal{N}(v_i^{t-1})\big| \tag{3}
$$

where the reservoir $\mathcal{R}^{t-1}(v_i)$ stores the accumulated changes of $v_i$ up to $t-1$. (The reservoir handles the case in which a node has small changes at each time step over a long period: such changes can greatly affect the network topology in total, but may be ignored if they are not recorded.) For simplicity, we treat $\mathcal{G}^t$ as an undirected and unweighted network, so that the current changes of $v_i$ at $t$, denoted as $\Delta E(v_i^t)$, can be easily obtained by set operations on the neighbors $\mathcal{N}(\cdot)$ of $v_i$ as shown in Eq. (3), which is equivalent to counting the edges incident to $v_i$ in the current edge streams $\Delta\mathcal{E}^t$. (To take edge weights into account in Eq. (3), one can let $\Delta E(v_i^t) = \sum_{v_j \in \mathcal{N}(v_i^t)} |\mathbf{A}^t_{ij} - \mathbf{A}^{t-1}_{ij}| + \sum_{v_j \in \mathcal{N}(v_i^{t-1}) \setminus \mathcal{N}(v_i^t)} |\mathbf{A}^{t-1}_{ij}|$, where the first term gives the total weight changes of $v_i$'s neighbors present at both $t$ and $t-1$, or present at $t$ but not at $t-1$, while the second term gives the total weight changes of $v_i$'s neighbors present at $t-1$ but not at $t$. The operator $|\cdot|$ on a set gives its cardinality, and on a scalar gives its absolute value.) The representative node of a sub-network $\mathcal{G}^t_k$ is then selected based on the probability distribution over its node set $\mathcal{V}^t_k$, i.e.,

$$
P(v_i^t) = \frac{e^{S(v_i^t)}}{\sum_{v_j^t \in \mathcal{V}^t_k} e^{S(v_j^t)}} \tag{4}
$$

where $e$ is Euler's number and $S(v_i^t)$ is the score of the accumulated topological changes of node $v_i$ given by Eq. (3). Note that if $S(v_i^t) = 0$ for all $v_i^t \in \mathcal{V}^t_k$, then $e^{0} = 1$ and $P(v_i^t) = 1/|\mathcal{V}^t_k|$, so that even for an inactive sub-network with no change at any node, the probability distribution over this sub-network is still a valid (uniform) distribution. Intuitively, within a sub-network, the higher the score of a node given by Eq. (3), the higher the probability that this node is selected as the representative node of this sub-network. Because one representative node is selected from each sub-network, all the selected nodes are diversely distributed over the whole snapshot while, within each sub-network, being biased toward larger accumulated topological changes.

Given the representative nodes selected in Step 2, this step explains how to capture the topological changes based on them. As the topological changes at the selected nodes can propagate to other nodes via the high-order proximity, the truncated random walk sampling strategy [perozzi2014deepwalk] (instead of edge sampling [tang2015line]) is employed to capture the topological changes around (instead of only at) the selected nodes. Concretely, for each selected node, $r$ truncated random walks of length $l$ are conducted starting from that node. In a random walk, the next node $v_j$ is sampled based on the probability distribution over the neighbors $\mathcal{N}(v_i^t)$ of the previous node $v_i$, i.e.,

$$
P(v_j \mid v_i) =
\begin{cases}
\dfrac{\mathbf{A}^t_{ij}}{\sum_{v_{j'} \in \mathcal{N}(v_i^t)} \mathbf{A}^t_{ij'}}, & v_j \in \mathcal{N}(v_i^t) \\[2ex]
0, & \text{otherwise}
\end{cases} \tag{5}
$$
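Steps 2 and 3 can be sketched as follows for the unweighted case. This is a minimal illustration, not the GloDyNE repository's API: it assumes adjacency is stored as a dict from node to a set of neighbors at $t-1$ and at $t$, and that the caller maintains the reservoir as a dict. How the reservoir is updated or reset after a node is selected is not specified in the text above, so it is left out, and guarding new nodes (previous degree 0) against division by zero is an assumption of this sketch. All helper names are hypothetical.

```python
import math
import random


def change_score(v, nbrs_now, nbrs_prev, reservoir):
    """Eq. (3), unweighted case: (reservoir + current changes) / previous degree."""
    now = nbrs_now.get(v, set())
    prev = nbrs_prev.get(v, set())
    delta = len(now | prev) - len(now & prev)   # edges added or removed at v
    inertia = max(len(prev), 1)                 # assumption: avoid dividing by 0 for new nodes
    return (reservoir.get(v, 0) + delta) / inertia


def select_representative(sub_nodes, scores, rng):
    """Eq. (4): softmax over the scores inside one sub-network, then draw one node."""
    top = max(scores[v] for v in sub_nodes)     # shifting by the max leaves the softmax unchanged
    weights = [math.exp(scores[v] - top) for v in sub_nodes]
    return rng.choices(sub_nodes, weights=weights, k=1)[0]


def truncated_walk(start, nbrs_now, length, rng):
    """Eq. (5), unweighted case: each next node is uniform over the current neighbors."""
    walk = [start]
    while len(walk) < length:
        nbrs = nbrs_now.get(walk[-1])
        if not nbrs:                            # dead end: stop the walk early
            break
        walk.append(rng.choice(sorted(nbrs)))   # sorted() makes seeded runs reproducible
    return walk


# Edge (0, 2) is added between t-1 and t; the sub-network {3, 4} is inactive.
nbrs_prev = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}}
nbrs_now = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4}, 4: {3}}
scores = {v: change_score(v, nbrs_now, nbrs_prev, {}) for v in nbrs_now}
rng = random.Random(0)
selected = [select_representative(sub, scores, rng) for sub in ([0, 1, 2], [3, 4])]
walks = [truncated_walk(v, nbrs_now, 5, rng) for v in selected]
```

In this toy snapshot, nodes 0 and 2 get score 1.0 and node 1 gets 0.0, so node 1 is the least likely pick in the active sub-network, while the inactive sub-network {3, 4} still yields a representative drawn uniformly.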

After Step 3, the latest topological information around the selected nodes is encoded in random walks. Step 4 aims to utilize these random walks to update node embeddings. Following [perozzi2014deepwalk] and [grover2016node2vec], a sliding window of size $s$ (i.e., covering up to $s$ nodes on each side of the center node) is slid along each walk (i.e., node sequence), and the set $\mathcal{D}$ of positive node-pair samples is built from all pairs $(v_i, v_j)$ in which $v_j$ lies within the window around the center node $v_i$. As a result, the node-pair samples can encode the $1$- to $s$-order proximity between a given center node and another node. Note that several network embedding works have shown the advantage of using the high-order proximity [cao2015grarep, ou2016asymmetric, zhang2018arbitrary].
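The sliding-window construction of positive pairs can be illustrated with a short sketch. It assumes a fixed, symmetric window; word2vec-style implementations often shrink the window randomly per position, which is not modeled here. Returning a `Counter` also yields the co-occurrence count of each pair, which is what weights the pairs in the overall SGNS objective. The function name is hypothetical.

```python
from collections import Counter


def window_pairs(walks, window):
    """Positive (center, context) pairs within `window` positions on either side.

    The Counter value of a pair is how often that pair co-occurs across all walks.
    """
    pairs = Counter()
    for walk in walks:
        for i, center in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs[(center, walk[j])] += 1
    return pairs


pairs = window_pairs([[0, 1, 2], [0, 1]], window=2)
print(pairs[(0, 1)], pairs[(0, 2)], sum(pairs.values()))  # 2 1 8
```

With a window of 2, nodes two hops apart along a walk (here 0 and 2) already form a positive pair, which is how the pairs encode proximity beyond direct neighbors.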

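Assuming the observations of node pairs in $\mathcal{D}$ are mutually independent [grover2016node2vec], the objective function that maximizes the log probability of node co-occurrence over all node pairs in $\mathcal{D}$ can be written as

$$
\max_{f} \sum_{(v_i, v_j) \in \mathcal{D}} \log p(v_j \mid v_i) \tag{6}
$$

where $v_i$ is the center node and $v_j$ is another node with $1$- to $s$-order proximity to $v_i$. Unlike [perozzi2014deepwalk], which defines $p(v_j \mid v_i)$ as a softmax, we follow [levy2014neural] to treat it as a binary classification problem, so as to further reduce the complexity. Concretely, it aims to distinguish a positive sample $(v_i, v_j)$ from $m$ negative samples $(v_i, v_k)$. The probability of observing a positive sample can be defined as

$$
P(D = 1 \mid v_i, v_j) = \sigma(\mathbf{z}_i \cdot \mathbf{z}_j) = \frac{1}{1 + e^{-\mathbf{z}_i \cdot \mathbf{z}_j}} \tag{7}
$$

where $\mathbf{z}_i$ is the embedding vector of node $v_i$ parameterized by the mapping function $f$, the operator $\cdot$ represents the dot product between two vectors, and $P(D = 1 \mid v_i, v_j)$ gives the probability of a positive prediction given a positive sample $(v_i, v_j)$. Likewise, the probability of observing a negative sample can be defined as

$$
P(D = 0 \mid v_i, v_k) = 1 - \sigma(\mathbf{z}_i \cdot \mathbf{z}_k) = \sigma(-\mathbf{z}_i \cdot \mathbf{z}_k) \tag{8}
$$

where $P(D = 0 \mid v_i, v_k)$ gives the probability of a negative prediction given a negative sample $(v_i, v_k)$. The above Skip-Gram Negative Sampling (SGNS) model [levy2014neural] then tries to maximize $P(D = 1 \mid v_i, v_j)$ for each positive sample in $\mathcal{D}$ and $P(D = 0 \mid v_i, v_k)$ for the $m$ negative samples corresponding to each positive sample, i.e.,

$$
\ell(v_i, v_j) = \log \sigma(\mathbf{z}_i \cdot \mathbf{z}_j) + m \cdot \mathbb{E}_{v_k \sim P_n(v)} \big[ \log \sigma(-\mathbf{z}_i \cdot \mathbf{z}_k) \big] \tag{9}
$$

where the negative samples $v_k$ are drawn from a unigram distribution $P_n(v)$ [levy2014neural]. The overall objective of SGNS sums over all positive samples and their corresponding negative samples, i.e.,

$$
\max_{f} \sum_{(v_i, v_j) \in \mathcal{D}} \#(v_i, v_j)\, \ell(v_i, v_j) \tag{10}
$$

where $\#(v_i, v_j)$ denotes the number of times the positive sample $(v_i, v_j)$ occurs in $\mathcal{D}$. Intuitively, the more frequently a pair of nodes co-occurs, the closer their embeddings should be.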
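To make Eqs. (7)-(9) concrete, here is a small plain-Python sketch of the per-pair SGNS loss and one stochastic gradient step. It assumes, as in word2vec, separate center (input) and context (output) vectors per node, which the equations above fold into a single notation. Drawing negatives from the unigram distribution raised to the 3/4 power follows the word2vec default and is an assumption rather than a detail stated here; random initialization of both tables is likewise only for the demo. All names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def draw_negatives(counts, m, rng, power=0.75):
    """Draw m negative nodes from the unigram distribution raised to `power`."""
    nodes = list(counts)
    return rng.choices(nodes, weights=[counts[v] ** power for v in nodes], k=m)


def pair_loss(emb, ctx, center, pos, negs):
    """Negative of Eq. (9) for one positive pair and its sampled negatives."""
    loss = -math.log(sigmoid(dot(emb[center], ctx[pos])))
    for n in negs:
        loss -= math.log(sigmoid(-dot(emb[center], ctx[n])))
    return loss


def sgd_step(emb, ctx, center, pos, negs, lr):
    """One gradient step on pair_loss. As in word2vec, the center vector's update is
    accumulated while the context vectors are updated and applied at the end."""
    grad_center = [0.0] * len(emb[center])
    for node, label in [(pos, 1.0)] + [(n, 0.0) for n in negs]:
        g = lr * (label - sigmoid(dot(emb[center], ctx[node])))
        for d in range(len(grad_center)):
            grad_center[d] += g * ctx[node][d]
            ctx[node][d] += g * emb[center][d]
    for d in range(len(grad_center)):
        emb[center][d] += grad_center[d]


rng = random.Random(0)
dim = 8
emb = {v: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for v in range(5)}
ctx = {v: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for v in range(5)}
node_counts = {0: 3, 1: 2, 2: 1, 3: 4, 4: 4}   # toy occurrence counts in the walks
print(draw_negatives(node_counts, m=5, rng=rng))

# Negatives are fixed to [3, 4] below so the demo is deterministic.
before = pair_loss(emb, ctx, 0, 1, [3, 4])
for _ in range(50):
    sgd_step(emb, ctx, 0, 1, [3, 4], lr=0.1)
after = pair_loss(emb, ctx, 0, 1, [3, 4])
print(after < before)  # True
```

Repeated steps pull the center vector of node 0 toward the context vector of its positive partner and away from those of the negatives, which is exactly what maximizing Eq. (9) asks for.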

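Finally, we extend the SGNS model described above to an incremental learning paradigm. The overall framework of GloDyNE can be formalized as

$$
f^t =
\begin{cases}
\mathrm{SGNS}\big(\mathcal{G}^0\big), & t = 0 \\
\mathrm{SGNS}\big(f^{t-1}, \mathcal{G}^{t-1}, \mathcal{G}^{t}\big), & t \geq 1
\end{cases},
\qquad
\mathbf{Z}^t = f^t[\mathcal{V}^t] \tag{11}
$$

where $f^{t-1}$ is the SGNS model trained at the last time step, $\mathbf{Z}^t$ is the current embedding matrix directly taken from the newly trained $f^t$ via an index operator $[\cdot]$, and $\mathcal{G}^{t-1}$ and $\mathcal{G}^t$ are the two consecutive snapshots used to generate the edge streams if they are not directly given. The implementation details are presented in Sections 4.2 and 4.3.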
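The incremental paradigm relies on warm-starting the model from the last time step and then reading out only the current nodes. A minimal sketch, assuming the model is a dict from node to vector: vectors of previously seen nodes are kept as they are, unseen nodes get a small random initialization (the word2vec-style scale is an assumption), and deleted nodes stay in the model but are not returned by the index operation. Training itself is omitted, and the helper names are hypothetical.

```python
import random


def extend_model(model, current_nodes, dim, rng):
    """Warm start: keep every learned vector and add small random vectors for unseen nodes."""
    for v in current_nodes:
        if v not in model:
            model[v] = [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]


def index_embeddings(model, current_nodes):
    """Z^t = f^t[V^t]: read out the vectors of the current snapshot's nodes only."""
    return {v: model[v] for v in current_nodes}


rng = random.Random(0)
model = {}
extend_model(model, [0, 1, 2], dim=4, rng=rng)     # t = 0
vec0 = list(model[0])
# ... train on walks started from all nodes of the first snapshot (omitted) ...
extend_model(model, [0, 2, 3], dim=4, rng=rng)     # t = 1: node 1 deleted, node 3 new
# ... continue training the same model on walks from the selected nodes (omitted) ...
Z1 = index_embeddings(model, [0, 2, 3])
```

Keeping deleted nodes in the model is a design choice of this sketch: it avoids reallocating the parameter table and lets a node that reappears later reuse its old vector.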

The pseudocode of GloDyNE is summarized in Algorithm 1, and the open source code is provided at https://github.com/houchengbin/GloDyNE.

According to Eq. (11) and Algorithm 1, GloDyNE consists of two stages. During the offline stage, i.e., $t = 0$, Step 3 (specifically, with every node of $\mathcal{G}^0$ as a starting node) and Step 4 are employed to obtain the initial SGNS model and node embeddings. Lines 2-5 are in fact a static network embedding method, a modified version of DeepWalk [perozzi2014deepwalk] that trains an SGNS model instead of a Skip-Gram Hierarchical Softmax (SGHS) model. As such, the time complexity of lines 2-5 is further reduced to $O\big(|\mathcal{V}^0|\, r\, l\, s\, (m+1)\big)$ [grover2016node2vec], where $(m+1)$ is due to one positive sample corresponding to $m$ negative samples.

During the online stage, i.e., $t \geq 1$, Steps 1-4 are employed to incrementally update the SGNS model and node embeddings. For lines 7-8, corresponding to Step 1, the time complexity is $O(|\mathcal{V}^t| + |\mathcal{E}^t| + K \log K)$ [karypis1998fast], where $K = \alpha |\mathcal{V}^t|$. For lines 9-14, corresponding to Step 2, the time complexity of lines 9-10 is $O(|\mathcal{V}^t| + |\Delta\mathcal{E}^t|)$; lines 11-13, which use the alias sampling method [grover2016node2vec], require $O(|\mathcal{V}^t|)$; and line 14 requires $O(K)$, since each of the $K$ draws costs $O(1)$ with alias sampling. For lines 14-18, corresponding to Steps 3 and 4, similarly to lines 2-5 above, the complexity is $O\big(K\, r\, l\, s\, (m+1)\big)$ with $K = \alpha |\mathcal{V}^t|$. Because most real-world networks are sparse, the number of edges in a snapshot satisfies $|\mathcal{E}^t| \ll |\mathcal{V}^t|^2$, such that the average degree $|\mathcal{E}^t|/|\mathcal{V}^t|$ is very small compared with $|\mathcal{V}^t|$. Besides, since the edge streams between two consecutive snapshots are often much fewer than the edges in a snapshot, $|\Delta\mathcal{E}^t| \ll |\mathcal{E}^t|$. Under these real-world assumptions, the overall complexity of the online stage at each time step can be approximated as $O\big(|\mathcal{V}^t| + \alpha|\mathcal{V}^t| \log(\alpha|\mathcal{V}^t|) + \alpha\, r\, l\, s\, (m+1)\, |\mathcal{V}^t|\big)$, where $\alpha$ controls the number of selected nodes, $|\mathcal{V}^t|$ denotes the number of nodes at $t$, and $r$, $l$, $s$, and $m$ are constants that are negligible compared with $|\mathcal{V}^t|$. Consequently, GloDyNE is scalable w.r.t. $|\mathcal{V}^t|$, as there is no quadratic or higher-order term.
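As a rough worked example of why the online stage scales with the number of selected nodes, the sketch below counts SGNS dot products for Steps 3-4, using the default hyper-parameters reported in the experimental setup (10 walks per node, walk length 80, window size 10, 5 negative samples, selection ratio 0.1) and the final FBW snapshot size (43889 nodes). It ignores the embedding dimension and the partitioning and scoring costs, and rounding 0.1 × 43889 down to an integer is an assumption. The function name is hypothetical.

```python
def sgns_dot_products(num_start_nodes, walks_per_node=10, walk_length=80,
                      window=10, negatives=5):
    """Approximate SGNS dot products for Steps 3-4 (embedding dimension ignored)."""
    # Positions near either end of a walk have fewer context nodes; this closed
    # form counts (center, context) pairs exactly when walk_length > window.
    pairs_per_walk = 2 * window * walk_length - window * (window + 1)
    walks = num_start_nodes * walks_per_node
    return walks * pairs_per_walk * (negatives + 1)  # 1 positive + negatives per pair


nodes_fbw = 43889                        # final FBW snapshot size
selected = int(0.1 * nodes_fbw)          # selection ratio 0.1, rounded down (assumption)
offline = sgns_dot_products(nodes_fbw)   # walks from every node (t = 0)
online = sgns_dot_products(selected)     # walks from selected nodes only (t >= 1)
print(selected, offline, online, round(online / offline, 3))
# 4388 3923676600 392287200 0.1
```

The online cost is simply the offline cost scaled by the selection ratio, which is the linear dependence on $|\mathcal{V}^t|$ claimed above.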

In this work, six real-world datasets are employed to empirically evaluate the effectiveness and efficiency of the proposed method. To construct the dynamic networks, except for AS733 (which is given in snapshot representation), all other datasets (which are given as edge streams) are constructed by continuously adding edge streams to the existing network in ascending order of timestamps. The snapshots of a dynamic network are then taken successively at specified timestamps, and the gap between consecutive specified timestamps is identical. For each of the six dynamic networks, the largest connected component is finally used in experiments, and the details are as follows.
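The snapshot construction described above can be sketched as follows, assuming the edge stream is a list of `(u, v, timestamp)` tuples treated as undirected and that the snapshot cut times are given. Edge deletions are not modeled, and dropping self-loops and extracting the largest connected component per snapshot are assumptions of this sketch. The helper names are hypothetical.

```python
from collections import defaultdict, deque


def snapshots_from_stream(stream, cut_times):
    """Add timestamped edges in ascending time order; copy the edge set at each cut time.

    stream: iterable of (u, v, timestamp); cut_times: ascending, equally spaced timestamps.
    Edges are undirected and stored as frozensets; self-loops are dropped (assumption).
    """
    events = sorted(stream, key=lambda e: e[2])
    snaps, edges, i = [], set(), 0
    for cut in cut_times:
        while i < len(events) and events[i][2] <= cut:
            u, v, _ = events[i]
            if u != v:
                edges.add(frozenset((u, v)))
            i += 1
        snaps.append(set(edges))
    return snaps


def largest_connected_component(edges):
    """Keep only the edges of the largest connected component (found by BFS)."""
    adj = defaultdict(set)
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), set()
    for source in adj:
        if source in seen:
            continue
        comp, queue = {source}, deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in comp:
                    comp.add(y)
                    queue.append(y)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return {e for e in edges if e <= best}


stream = [(1, 2, 1), (2, 3, 2), (7, 8, 2), (3, 4, 5), (8, 9, 6)]
snaps = snapshots_from_stream(stream, cut_times=[2, 4, 6])
lcc = [largest_connected_component(s) for s in snaps]
```

Because edges are only ever added, each snapshot is a superset of the previous one; a window with no new events (here between timestamps 2 and 4) simply repeats the previous snapshot.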

AS733 contains 733 daily instances of the Autonomous System of routers exchanging traffic flows with neighbors. Since AS733 is directly given in snapshot representation, we directly take the most recent 21 snapshots (13/Dec./1999–02/Jan./2000) to form its dynamic network. The initial snapshot has 1476 nodes and 3123 edges, and the final snapshot has 3570 nodes and 7033 edges. The original dataset comes from https://snap.stanford.edu/data/as-733.html.

Elec is the network of English Wikipedia users who vote for and against each other in admin elections. The gap between the specified timestamps for taking snapshots is set to one calendar day. We take the most recent 21 snapshots (16/Dec./2007–05/Jan./2008) to form its dynamic network. The initial snapshot has 6968 nodes and 98947 edges, and the final snapshot has 7058 nodes and 100521 edges. The original dataset comes from http://konect.uni-koblenz.de/networks/elec.

HepPh is a co-author network extracted from the papers of High Energy Physics Phenomenology in the arXiv. The gap between the specified timestamps for taking snapshots is set to one calendar day. We take the most recent 21 snapshots (11/Dec./1991–31/Dec./1991) to form its dynamic network. The initial snapshot has 16729 nodes and 1170677 edges, and the final snapshot has 16910 nodes and 1194275 edges. The original dataset comes from http://konect.uni-koblenz.de/networks/ca-cit-HepPh.

FBW is a social network of Facebook Wall posts where nodes are the users and edges are built based on the interactions in their wall posts. The gap between the specified timestamps for taking snapshots is set to one calendar day. We take out the 21 snapshots (01/Jan./2009–21/Jan./2009) to form its dynamic network. The initial snapshot has 41603 nodes and 169427 edges, and the final snapshot has 43889 nodes and 181974 edges. The original dataset comes from http://konect.uni-koblenz.de/networks/facebook-wosn-wall.

Cora is a citation network where each node represents a paper and an edge between two nodes represents a citation. Each paper is assigned a label (from xx different labels) based on its field of publication. Following [singer2019node], the gap between the specified timestamps for taking snapshots is set to one year. The 11 snapshots (1989–1999) are taken to form its dynamic network. The initial snapshot has 348 nodes and 481 edges, and the final snapshot has 12022 nodes and 45421 edges. The original dataset comes from https://people.cs.umass.edu/~mccallum/data.html.

DBLP is a co-author network in the computer science field. Each author is associated with a label (from 15 different labels), defined as the field in which the author has the most publications. Following [singer2019node], the gap between the specified timestamps for taking snapshots is set to one year. The 12 snapshots (1984–1995) are taken to form its dynamic network. The initial snapshot has 391 nodes and 848 edges, and the final snapshot has 24157 nodes and 53431 edges. The original dataset comes from https://dblp.org/xml/release/.

The proposed DNE method, GloDyNE, is compared with five state-of-the-art DNE methods to demonstrate its effectiveness and efficiency. All the compared methods can be regarded as unsupervised approaches, because no explicit node label is used to learn node embeddings, and the learned node embeddings are not dedicated to a specific downstream task. The details of each method are as follows.

BCGD$_g$ (2016) [zhu2016scalable]: The general objective of BCGD is to minimize the quadratic loss of reconstructing the network proximity matrix using the node embedding matrix with a temporal regularization term. BCGD$_g$ (or BCGD-global) employs all historical snapshots to jointly and cyclically update embeddings for all time steps.

BCGD$_l$ (2016) [zhu2016scalable]: Unlike BCGD$_g$ but following the same general objective of BCGD described above, BCGD$_l$ (or BCGD-local) iteratively employs the previous snapshot and initializes the current embeddings with the previous embeddings to update embeddings for the current time step.

DynAE (2017) [goyal2017dyngem]: This work proposes a strategy to modify the structure of a deep auto-encoder model based on the size of a current snapshot. At each time step, the auto-encoder model is initialized by its previous model. DynAE continuously trains the adaptive auto-encoder model based on the existing edges in a current snapshot.

DynTriad (2018) [zhou2018dynamic]: DynTriad models the triadic closure process, social homophily, and temporal smoothness in its objective function to learn node embeddings at each time step. It optimizes the objective function according to the existing edges of each snapshot respectively.

tNodeEmbed (2019) [singer2019node]: tNodeEmbed runs a static Network Embedding method to obtain node embeddings at each time step, and then exploits the temporal dependence among all currently available static node embeddings using Recurrent Neural Networks to obtain the final node embeddings for current time step.

The original open source code, with default settings, of BCGD (https://github.com/linhongseba/Temporal-Network-Embedding), DynAE (https://github.com/palash1992/DynamicGEM), DynTriad (https://github.com/luckiezhou/DynamicTriad), and tNodeEmbed (https://github.com/urielsinger/tNodeEmbed) is adopted in the experiments. Note that BCGD$_g$ and BCGD$_l$ are the two algorithms proposed in BCGD, corresponding to its Algorithm 2 and Algorithm 4, respectively. Moreover, we adopt the link prediction architecture of tNodeEmbed to obtain node embeddings, so that all methods only use network linkage information as the supervised signal to learn node embeddings. Furthermore, for a fair comparison, the dimensionality of node embeddings is set to 128 for all methods.

For our method, GloDyNE, following [perozzi2014deepwalk] and [grover2016node2vec], the hyper-parameters walks per node, walk length, window size, and negative samples are set to 10, 80, 10, and 5, respectively. The hyper-parameter $\alpha$, which controls the number of selected nodes and thus freely trades off effectiveness against efficiency, is set to 0.1 unless otherwise specified.

In this section, three typical types of downstream tasks are employed to evaluate the quality of obtained node embeddings by the six methods on the six datasets. In particular, the graph reconstruction task is used to demonstrate the ability of global topology preservation, while the link prediction task and node classification task are used to show the benefit of global topology preservation. For fairness, we first take out the node embeddings obtained by each method respectively, and then feed them to exactly the same downstream tasks with the same training and testing sets. The above process is repeated for 10 runs, and we report their average results in Section 5.2.1, 5.2.2, and 5.2.3. Moreover, the average results of the wall-clock time to obtain node embeddings by each method, are reported in Section 5.2.4 for comparing the efficiency of the implementation of the six methods.

All experiments in Section 5.2 are conducted on the following hardware. For all methods, we enable 32 Intel-Xeon-E5-2.2GHz CPUs and 512 GB of memory. In addition, for DynAE, DynTriad, and tNodeEmbed, which can use a GPU for acceleration, we also enable one Nvidia-Tesla-P100 GPU with 16 GB of memory. The N/A values for tNodeEmbed on AS733 are because tNodeEmbed cannot handle node deletions. The N/A values for DynAE on HepPh, DBLP, and FBW are because it runs out of GPU memory.

In order to demonstrate the ability of the global topology preservation of each method, one possible way is to use the obtained node embeddings to reconstruct the original network. For this purpose, precision at or is used as the metric to evaluate how well the top- similar nodes of each node in the embedding space can match the ground-truth neighbors of each node in the original network [zhang2018arbitrary, zhu2018high, cao2015grarep]. Concretely, , where gives a set of the top- similar nodes of a queried node

based on the cosine similarity between node embeddings, and

denotes a set of the ground-truth neighbors of . To show the ability of global topology preservation, we further calculate the mean of over all nodes in a current snapshot, i.e., Mean where is a set of all nodes in a current snapshot , and counts the number of nodes in . Note that, each result shown in Table I is calculated by the mean of Mean over all time steps and over 10 runs, and finally, Mean, Mean, Mean, and Mean are employed.AS733 | Elec | Cora | HepPh | DBLP | FBW | |
---|---|---|---|---|---|---|

Mean P@5 | ||||||

BCGD | 1.62 | 8.32 | 11.78 | 28.06 | 5.92 | 0.14 |

BCGD | 38.63 | 17.35 | 8.12 | 58.95 | 2.62 | 4.80 |

DynAE | 0.55 | 3.36 | 10.17 | N/A | N/A | N/A |

DynTriad | 58.99 | 60.44 | 51.77 | 62.06 | 69.33 | 54.15 |

tNodeEmbed | N/A | 5.62 | 58.86 | 50.12 | 64.76 | 25.28 |

GloDyNE | 65.54 | 57.63 | 76.60 | 65.11 | 77.00 | 80.03 |

Mean | ||||||

BCGD | 2.51 | 9.32 | 21.06 | 31.50 | 16.95 | 0.12 |

BCGD | 49.55 | 17.62 | 9.69 | 61.44 | 5.20 | 4.40 |

DynAE | 0.59 | 3.62 | 10.39 | N/A | N/A | N/A |

DynTriad | 65.80 | 64.55 | 55.09 | 66.43 | 74.79 | 54.39 |

tNodeEmbed | N/A | 5.81 | 69.44 | 50.12 | 80.09 | 26.40 |

GloDyNE | 76.56 | 68.39 | 88.03 | 69.32 | 93.02 | 87.06 |

Mean | ||||||

BCGD | 43.70 | 9.23 | 29.55 | 32.85 | 30.56 | 0.11 |

BCGD | 66.52 | 18.42 | 15.55 | 59.76 | 13.36 | 3.86 |

DynAE | 0.63 | 3.64 | 11.81 | N/A | N/A | N/A |

DynTriad | 72.38 | 67.92 | 60.19 | 66.95 | 79.26 | 56.60 |

tNodeEmbed | N/A | 5.90 | 79.36 | 47.20 | 88.39 | 28.85 |

GloDyNE | 84.64 | 73.20 | 96.27 | 69.92 | 98.85 | 91.24 |

Mean | ||||||

BCGD | 89.29 | 8.64 | 40.15 | 32.39 | 43.73 | 0.12 |

BCGD | 80.51 | 26.39 | 24.87 | 57.45 | 29.39 | 4.11 |

DynAE | 0.78 | 3.60 | 14.17 | N/A | N/A | N/A |

DynTriad | 78.74 | 71.74 | 65.92 | 66.75 | 83.26 | 60.68 |

tNodeEmbed | N/A | 6.36 | 84.72 | 43.22 | 91.40 | 32.61 |

GloDyNE | 90.51 | 76.59 | 98.76 | 70.40 | 99.86 | 94.84 |

In general, GloDyNE substantially outperforms all other methods on all datasets, except that it obtains the second-best result on the Elec dataset under the Mean P@5 metric (the tendency of recall at $k$, or $R@k$, is exactly the same as that of $P@k$, since the two differ only in the denominator: for $R@k$, the denominator is the number of ground-truth neighbors of $v$ instead of $k$). Specifically, Mean P@5 measures how well the top-5 similar nodes of each node in the embedding space match the ground-truth neighbors of that node in the original network. On Elec, DynTriad outperforms GloDyNE by 2.81 under Mean P@5; however, GloDyNE outperforms DynTriad by 3.84, 5.28, and 4.85 under the other three Mean $P@k$ metrics, respectively. On the other five datasets, GloDyNE consistently achieves the best results under all four metrics, often by a large margin (over 10) compared with all other methods.

The main reason for this superiority of GloDyNE in the GR task is that GloDyNE is designed to better preserve the global topology of a dynamic network at each time step, which is exactly what the GR task is used to measure.
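A minimal sketch of Mean $P@k$ as defined above may help make the metric concrete. It assumes embeddings stored in a dict mapping each node to a list of floats and ground-truth neighbors stored as sets; all names and the toy data are hypothetical, and excluding the queried node from its own top-$k$ list is an assumption the text does not spell out.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def precision_at_k(v, emb, neighbors, k):
    """P@k(v): share of v's k most similar nodes that are true neighbors of v."""
    # Rank every other node by cosine similarity to v; v itself is excluded (assumption).
    candidates = [u for u in emb if u != v]
    candidates.sort(key=lambda u: cosine(emb[v], emb[u]), reverse=True)
    top_k = set(candidates[:k])
    return len(top_k & neighbors[v]) / k

def mean_precision_at_k(emb, neighbors, k):
    """Mean P@k: average of P@k(v) over all nodes of the current snapshot."""
    return sum(precision_at_k(v, emb, neighbors, k) for v in emb) / len(emb)

# Toy snapshot: two linked pairs, a-b and c-d, with matching embeddings.
emb = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}
neighbors = {"a": {"b"}, "b": {"a"}, "c": {"d"}, "d": {"c"}}
print(mean_precision_at_k(emb, neighbors, 1))  # 1.0
print(mean_precision_at_k(emb, neighbors, 2))  # 0.5
```

Because every node here has exactly one true neighbor, P@2 is capped at 0.5, which illustrates how the denominator $k$ bounds precision for low-degree nodes.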

The (dynamic) LP task aims to predict the edges at time step $t+1$ using the node embeddings obtained at time step $t$. The testing edges include both the added and the removed edges from $t$ to $t+1$, plus other edges randomly sampled from the snapshot at $t+1$, so as to balance existent edges (positive samples) and non-existent edges (negative samples). The LP task is then evaluated by the Area Under the ROC Curve (AUC) score based on the cosine similarity between node embeddings [zhu2016scalable, Fu2019Learning, liao2018attributed]. Each result shown in Table II is the mean of AUC scores over all time steps and over 10 runs.
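The evaluation step can be sketched as follows, assuming each test pair is labeled by whether the edge exists in the snapshot at $t+1$ and scored by the cosine similarity of the embeddings from $t$. The AUC is computed here with the Mann-Whitney formulation (the probability that a random positive pair outscores a random negative pair), which is equivalent to the area under the ROC curve; the function names and toy pairs are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def auc(pos_scores, neg_scores):
    """ROC AUC as the chance a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))

def link_prediction_auc(emb, pos_pairs, neg_pairs):
    """Score node pairs by embedding cosine similarity and compute the AUC."""
    pos = [cosine(emb[u], emb[v]) for u, v in pos_pairs]
    neg = [cosine(emb[u], emb[v]) for u, v in neg_pairs]
    return auc(pos, neg)

# Embeddings from time step t; labels come from the snapshot at t+1.
emb = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}
pos_pairs = [("a", "b"), ("c", "d")]  # edges present at t+1
neg_pairs = [("a", "c"), ("b", "d")]  # node pairs not linked at t+1
print(link_prediction_auc(emb, pos_pairs, neg_pairs))  # 1.0
```

The pairwise loop is quadratic in the number of test pairs; a rank-based computation would scale better, but the quadratic form keeps the definition visible.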

Method | AS733 | Elec | Cora | HepPh | DBLP | FBW |
---|---|---|---|---|---|---|

BCGD | 69.96 | 82.65 | 68.37 | 79.74 | 66.71 | 82.26 |

BCGD | 62.24 | 87.69 | 80.94 | 89.23 | 88.25 | 83.41 |

DynAE | 59.70 | 60.56 | 59.09 | N/A | N/A | N/A |

DynTriad | 64.69 | 94.87 | 62.74 | 87.60 | 59.99 | 77.58 |

tNodeEmbed | N/A | 80.88 | 51.62 | 85.04 | 56.82 | 73.37 |

GloDyNE | 81.94 | 87.20 | 93.59 | 88.84 | 77.05 | 88.00 |

GloDyNE obtains either the best result (on three datasets) or a competitive result compared with other methods. Specifically, GloDyNE outperforms the second-best method on AS733, Cora, and FBW by 11.98, 12.65, and 4.59, respectively. On the other three datasets, GloDyNE obtains the third-best result on Elec (only 0.49 below the second-best result), the second-best result on HepPh (only 0.39 below the best result), and the second-best result on DBLP.
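The ranks and margins quoted above can be recomputed directly from Table II. The sketch below copies the AUC scores from the table, leaves out N/A entries, and labels the two BCGD rows by their position since the table does not distinguish them; the dictionary and function names are hypothetical. Note that on Cora the table gives 93.59 - 80.94 = 12.65.

```python
# AUC scores (%) copied from Table II; N/A entries are simply left out.
# The two BCGD rows are labeled by their position in the table.
TABLE_II = {
    "BCGD (row 1)": {"AS733": 69.96, "Elec": 82.65, "Cora": 68.37,
                     "HepPh": 79.74, "DBLP": 66.71, "FBW": 82.26},
    "BCGD (row 2)": {"AS733": 62.24, "Elec": 87.69, "Cora": 80.94,
                     "HepPh": 89.23, "DBLP": 88.25, "FBW": 83.41},
    "DynAE": {"AS733": 59.70, "Elec": 60.56, "Cora": 59.09},
    "DynTriad": {"AS733": 64.69, "Elec": 94.87, "Cora": 62.74,
                 "HepPh": 87.60, "DBLP": 59.99, "FBW": 77.58},
    "tNodeEmbed": {"Elec": 80.88, "Cora": 51.62, "HepPh": 85.04,
                   "DBLP": 56.82, "FBW": 73.37},
    "GloDyNE": {"AS733": 81.94, "Elec": 87.20, "Cora": 93.59,
                "HepPh": 88.84, "DBLP": 77.05, "FBW": 88.00},
}

def rank_and_gap(dataset, method="GloDyNE"):
    """Return (rank, gap): the lead over the runner-up if ranked first,
    otherwise the deficit to the method ranked just above."""
    scores = sorted(
        ((row[dataset], name) for name, row in TABLE_II.items() if dataset in row),
        reverse=True,
    )
    position = [name for _, name in scores].index(method)
    own = TABLE_II[method][dataset]
    if position == 0:
        gap = own - scores[1][0]
    else:
        gap = scores[position - 1][0] - own
    return position + 1, round(gap, 2)

for dataset in ("AS733", "Elec", "Cora", "HepPh", "DBLP", "FBW"):
    print(dataset, rank_and_gap(dataset))
```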

Overall, GloDyNE is also a good method for the (dynamic) LP task on most datasets, thanks to the high-order proximities used for better preserving the global topology [cao2015grarep]. In fact, the high-order proximity between nodes is an important temporal feature for predicting future edges. For example, the triadic closure process, which predicts the third edge among three nodes if two edges already exist among them, as modelled in DynTriad [zhou2018dynamic], can be easily realized by considering the second-order proximity through an appropriate setting of the random-walk and Skip-Gram parameters (see Sections 4.1.3 and 4.1.4). In the experiments, these parameters are set to much larger values, so that much higher-order proximities (up to the 10th order) are considered for better preserving the global topology. This provides more advanced temporal features (analogous to the triadic closure process) that improve the performance of GloDyNE in the LP task on most datasets. However, such temporal features may not always be useful; e.g., both GloDyNE and DynTriad perform worse on the DBLP dataset.
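The link between Skip-Gram context windows and high-order proximity can be illustrated with a toy example. The sketch below assumes the standard Skip-Gram convention that two nodes form a (center, context) pair when they appear within `window` positions of each other on a random walk; it is not GloDyNE's training code, and the function name is hypothetical. On a walk along an open triad a, b, c, the pair (a, c) only becomes a training pair once the window reaches 2, i.e., once second-order proximity is captured.

```python
def skipgram_pairs(walk, window):
    """All (center, context) pairs produced by a Skip-Gram window over one walk."""
    pairs = set()
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.add((center, walk[j]))
    return pairs

# A walk along the open triad a - b - c (a and c are not yet linked).
walk = ["a", "b", "c"]
print(("a", "c") in skipgram_pairs(walk, window=1))  # False: a and c are 2 hops apart
print(("a", "c") in skipgram_pairs(walk, window=2))  # True: second-order proximity captured
```

Pulling a and c together in the embedding space in this way is what lets the model score the closing edge a-c highly, in the spirit of triadic closure.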

The NC task aims to infer the most likely label for nodes without labels. Specifically, 50%, 70%, and 90% of the nodes are randomly picked, respectively, to train a one-vs-rest logistic regression classifier based on their embeddings and labels, and the remaining 50%, 30%, and 10% of the nodes, respectively, are treated as the testing set. At each time step, the latest node embeddings are employed as the input features of the logistic regression classifier. The predictions of the trained classifier on the testing set are evaluated by Micro-F1 and Macro-F1 [perozzi2014deepwalk, grover2016node2vec, singer2019node]. Each result shown in Table III is the mean of an F1 metric over all time steps and over 10 runs.

Method | Cora 0.5 | Cora 0.7 | Cora 0.9 | DBLP 0.5 | DBLP 0.7 | DBLP 0.9 |
---|---|---|---|---|---|---|

Micro-F1 | ||||||

BCGD | 32.12 | 32.82 | 31.99 | 49.63 | 49.41 | 50.35 |

BCGD | 36.76 | 37.15 | 37.28 | 50.91 | 49.96 | 51.03 |

DynAE | 33.74 | 34.83 | 34.64 | N/A | N/A | N/A |

DynTriad | 35.91 | 35.85 | 36.17 | 50.84 | 50.91 | 51.17 |

tNodeEmbed | 66.28 | 65.53 | 65.19 | 58.82 | 58.01 | 58.45 |

GloDyNE | 73.88 | 73.87 | 73.89 | 59.49 | 59.27 | 59.93 |

Macro-F1 | ||||||

BCGD | 7.95 | 8.48 | 8.02 | 10.29 | 10.15 | 10.19 |

BCGD | 12.20 | 12.38 | 12.59 | 11.36 | 11.24 | 11.27 |

DynAE | 7.04 | 7.49 | 7.61 | N/A | N/A | N/A |

DynTriad | 15.61 | 15.92 | 16.23 | 14.39 | 14.87 | 14.46 |

tNodeEmbed | 52.00 | 51.40 | 51.81 | 23.91 | 23.69 | 23.96 |

GloDyNE | 60.76 | 62.20 | 61.09 | 26.60 | 27.39 | 26.57 |

GloDyNE obtains the best result in all cases on both the Cora and DBLP datasets, which shows the benefit of global topology preservation in the NC task. Comparing the two datasets, GloDyNE performs better on Cora than on DBLP. The reason is that Cora is a citation network whose node (paper) labels, i.e., research fields, contain less noise (the field of a journal or conference often remains the same), while DBLP is a co-author network whose node (author) labels contain more noise (the field of an author may vary over time, and the label of an author with few papers may be inaccurate). Note that the construction of the dynamic networks of Cora and DBLP and the generation of node labels are described in Section 5.1.1.
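Table III shows Macro-F1 far below Micro-F1 for every method. The sketch below illustrates how the two averages can diverge on imbalanced labels; it only computes the metrics (not the one-vs-rest logistic regression), assumes single-label multi-class predictions, and uses hypothetical names and toy labels. In this single-label setting Micro-F1 equals accuracy, whereas Macro-F1 gives every class equal weight, so rare classes that are never predicted pull it down sharply.

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p wrongly
            fn[t] += 1  # missed the true class t

    def f1(tp_c, fp_c, fn_c):
        denom = 2 * tp_c + fp_c + fn_c
        return 2 * tp_c / denom if denom else 0.0

    labels = sorted(set(y_true) | set(y_pred))
    # Micro-F1 pools the counts over all classes; Macro-F1 averages per-class F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro

# Imbalanced toy labels: a classifier that always predicts the majority class "x".
y_true = ["x"] * 8 + ["y", "z"]
y_pred = ["x"] * 10
micro, macro = f1_scores(y_true, y_pred)
print(round(micro, 3), round(macro, 3))  # 0.8 0.296
```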

To conduct the downstream tasks in Sections 5.2.1, 5.2.2, and 5.2.3, the common first step is to obtain node embeddings, which serve as the low-dimensional hidden features of each node in the downstream tasks. In this section, the wall-clock time (or running time) of obtaining node embeddings over all time steps is reported, and each result shown in Table IV is the mean over 10 runs.

Method | AS733 | Elec | Cora | HepPh | DBLP | FBW |
---|---|---|---|---|---|---|

BCGD | 2402 | 6111 | 3868 | 25972 | 6682 | 40975 |

BCGD | 704 | 2314 | 1543 | 15228 | 2878 | 13569 |

DynAE | 536 | 9473 | 1126 | N/A | N/A | N/A |

DynTriad | 156 | 2546 | 233 | 34718 | 377 | 4458 |

tNodeEmbed | N/A | 3734 | 892 | 9224 | 1377 | 48631 |

GloDyNE | 62 | 205 | 105 | 1254 | 155 | 923 |

No. of nodes | 45k | 147k | 66k | 353k | 92k | 900k |

No. of edges | 91k | 2093k | 216k | 24888k | 202k | 3690k |

According to Table IV, GloDyNE is the most efficient of all methods on all datasets. In addition, the efficiency advantage of GloDyNE grows as the size of a dynamic network (given by the number of nodes or edges over all snapshots) grows. There are two reasons for these observations: 1) GloDyNE is scalable, since no quadratic or higher-order term appears in its time and space complexity, as analyzed in Section 4.3; and 2) Steps 3 and 4 in Section 4.1 are parallelized in the implementation of GloDyNE.

To better visualize the comparison among the six methods in terms of both effectiveness and efficiency, we further make the scatter plots shown in Figure 2 based on the quantitative results in Sections 5.2.1, 5.2.2, 5.2.3, and 5.2.4. Note that the experiments in these sections are conducted in the same hardware environment, as mentioned in Section 5.2. Besides, the N/A values in the tables are placed at the bottom-left corner (i.e., the origin), such as tNodeEmbed on AS733 in the GR-Mean P@5 task and DynAE on HepPh in the LP-AUC task.

It is worth noting that Figure 2 only shows two tasks, i.e., GR-Mean P@5 and LP-AUC, for further illustration, since these are the two tasks in which GloDyNE does not obtain the best effectiveness on all datasets. Concretely, we make the following observations from Figure 2, Table I, Table II, and Table IV. Firstly, GloDyNE is overall the best choice in all twelve sub-figures if one prefers efficiency (ranked first in all cases) while also considering effectiveness (ranked first in most cases and at least third in all cases). Secondly, in the GR-Mean P@5 task, GloDyNE is outperformed by DynTriad on Elec by 2.81; however, GloDyNE is 12.42× faster than DynTriad. Thirdly, in the LP-AUC task, GloDyNE is outperformed by DynTriad on Elec by 7.67, and by BCGD (the second BCGD row in Tables II and IV) on HepPh and DBLP by 0.39 and 11.20, respectively; however, GloDyNE is 12.42×, 12.14×, and 18.57× faster than these methods on Elec, HepPh, and DBLP, respectively.
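The score gaps and speed-ups in the cases where GloDyNE is not the most effective method follow directly from Tables I, II, and IV. The sketch below recomputes them from values copied out of those tables; the GR scores come from the first block of Table I, "BCGD (row 2)" refers to the second BCGD row, and the variable and function names are hypothetical.

```python
# Scores and wall-clock times copied from Tables I, II and IV.
COMPARISONS = [
    # (task, dataset, rival, rival score, GloDyNE score, rival time, GloDyNE time)
    ("GR", "Elec", "DynTriad", 60.44, 57.63, 2546, 205),
    ("LP", "Elec", "DynTriad", 94.87, 87.20, 2546, 205),
    ("LP", "HepPh", "BCGD (row 2)", 89.23, 88.84, 15228, 1254),
    ("LP", "DBLP", "BCGD (row 2)", 88.25, 77.05, 2878, 155),
]

def summarize(rows):
    """For each case, return how far the rival leads and how much faster GloDyNE runs."""
    out = []
    for task, dataset, rival, r_score, g_score, r_time, g_time in rows:
        gap = round(r_score - g_score, 2)       # effectiveness lead of the rival
        speedup = round(r_time / g_time, 2)     # rival time / GloDyNE time
        out.append((task, dataset, rival, gap, speedup))
    return out

for row in summarize(COMPARISONS):
    print(row)
```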

Last but not least, for the other nine tasks, such as GR under the other Mean P@k metrics and NC-0.5-Micro-F1, although their plots are omitted, one can easily imagine that GloDyNE is always located at the top-left corner, i.e., it is always the best choice in terms of both effectiveness and efficiency on all datasets in all nine tasks.

In this section, we further investigate the proposed method, GloDyNE. Because GloDyNE is proposed to better preserve the global topology, we focus on the ability of global topology preservation and thus adopt the graph reconstruction task to quantify effectiveness. Besides, owing to the good time and space efficiency of GloDyNE and its variants, all experiments in this section are conducted on a less expensive hardware specification: 16 Intel Xeon E5 2.2 GHz CPUs and 8 GB of memory. All experiments are repeated for 20 runs, and the average results are reported.

One advantage of DNE is that it promptly updates node embeddings at each time step, so that the latest node embeddings better reflect the network topology at that time step. To demonstrate this point, two variants of GloDyNE based on the SGNS model, namely SGNS-static and SGNS-retrain, are used for comparison. For SGNS-static, we only perform the first-time-step part of Algorithm 1, and the node embeddings obtained at the first time step are reused unchanged in the downstream task at every time step. For SGNS-retrain, we repeatedly perform this first-time-step part of Algorithm 1 (i.e., training from scratch) at each time step, and the node embeddings obtained at each time step are used in the downstream task at that time step.

According to Figure 3, SGNS-retrain outperforms SGNS-static on both tested datasets. On AS733, SGNS-retrain maintains superior performance all the time under both metrics, whereas the performance of SGNS-static drops suddenly at one time step and then stays at a poor level afterward. On Elec, SGNS-retrain maintains superior performance all the time, whereas the performance of SGNS-static gradually decreases. The difference between the sudden drop on AS733 and the gradual drop on Elec is because the network topology between two consecutive time steps varies more severely on AS733 than on Elec (see Section 5.1.1), so that the node embeddings obtained at the first time step become less useful afterward. Consequently, node embeddings need to be promptly updated at each time step (i.e., the necessity of DNE), as SGNS-retrain, the naive DNE method, does.

Instead of SGNS-retrain, recent DNE methods often adopt the incremental learning paradigm, continuously training the previous model on a new training set. Accordingly, we introduce another baseline, SGNS-increment, which follows Algorithm 1 but replaces all operations in lines 4-17 with selecting all nodes in the current snapshot. The difference between SGNS-increment and SGNS-retrain is whether the previous model is reused as the initialization of the next model.
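The effect of reusing the previous model as initialization can be illustrated with a deliberately simple toy. The sketch below is not SGNS: it replaces model training with a fixed budget of gradient steps on a quadratic loss whose optimum drifts slowly from one snapshot to the next, which is an assumption standing in for a slowly evolving network. The three modes mirror SGNS-static (train once), SGNS-retrain (train from scratch each time step), and SGNS-increment (warm start); all names are hypothetical.

```python
import math

def train(w, target, steps=3, lr=0.3):
    """A fixed budget of gradient steps on 0.5 * ||w - target||^2 (toy stand-in for SGNS)."""
    w = list(w)
    for _ in range(steps):
        w = [wi - lr * (wi - ti) for wi, ti in zip(w, target)]
    return w

def run(targets, mode):
    """Distance to each snapshot's optimum for mode 'static', 'retrain' or 'increment'."""
    zeros = [0.0] * len(targets[0])
    w = train(zeros, targets[0])          # every variant trains once at the first step
    errors = [math.dist(w, targets[0])]
    for target in targets[1:]:
        if mode == "retrain":
            w = train(zeros, target)      # fresh initialization at every time step
        elif mode == "increment":
            w = train(w, target)          # warm start from the previous model
        # mode == "static": keep the first-step model unchanged
        errors.append(math.dist(w, target))
    return errors

# The optimum drifts slowly from one snapshot to the next.
targets = [[1.0, 1.0], [1.1, 0.9], [1.2, 0.8], [1.3, 0.7]]
for mode in ("static", "retrain", "increment"):
    print(mode, [round(e, 3) for e in run(targets, mode)])
```

Under the same training budget, the warm-started model starts close to the new optimum, so it ends closer than the retrained one, and both beat the frozen model; this matches the ordering reported for the three baselines, but only as an analogy.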

According to Figure 4, SGNS-increment outperforms SGNS-retrain on both tested datasets. The general tendency on the two datasets under the two metrics is the same, although the performances of both SGNS-increment and SGNS-retrain are less stable on AS733 than on Elec, due to the larger variations between two consecutive snapshots on AS733 (see Section 5.1.1). These observations suggest that reusing the previous model as the initialization of the next model may be useful not only for a dynamic network with small changes (e.g., Elec), but also for one with large changes (e.g., AS733).

According to Figures 3 and 4, the three baselines rank as follows (from high to low performance): SGNS-increment, SGNS-retrain, and SGNS-static. Although SGNS-increment (i.e., GloDyNE with all nodes selected) achieves the best performance, it is not efficient enough, since all nodes in the current snapshot are selected for conducting random walks and then training the SGNS model. One natural idea to further improve efficiency is to select some representative nodes as an approximate solution, which can significantly reduce the running time while still retaining good performance. Consequently, in this work, we propose a node selecting strategy, as described in Sections 4.1.1 and 4.1.2.

To show the advantage of the node selecting strategy used in GloDyNE, the following baselines with different node selecting strategies are used for comparison. For fairness, the number of selected nodes at each time step is the same for all strategies. Concretely, the first baseline selects nodes randomly with replacement from the reservoir that records the most affected nodes (see Section 4.1.2); the second selects nodes randomly without replacement from the reservoir and then, if the reservoir contains too few nodes, from all nodes in the current snapshot; and the third selects nodes randomly without replacement from all nodes in the current snapshot. Intuitively, in terms of the diversity of selected nodes, the proposed strategy is expected to rank highest, followed by sampling from all nodes, with the two reservoir-based strategies being the least diverse, because 1) sampling nodes from the reservoir cannot be aware of the inactive sub-networks that exist in many real-world dynamic networks; 2) sampling nodes from all nodes in the current snapshot cannot guarantee that the selected nodes are sufficiently far from each other; and 3) sampling one node from each sub-network after network partition, as in the proposed strategy, does ensure that the selected nodes are sufficiently far from each other.
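The four strategies can be sketched side by side on a toy snapshot. In this sketch, `s1` to `s4` follow the order in which the strategies are described above, with `s4` the partition-based one; all names are hypothetical. The partition is simply given as input (in practice it would come from a graph partitioning step), and the node inside each sub-network is chosen uniformly as a simplification; Sections 4.1.1 and 4.1.2 describe the actual rule. The toy makes the inactive sub-network problem visible: when recent changes are concentrated in one sub-network, the reservoir-based strategies never reach the others.

```python
import random

def s1_reservoir_with_replacement(reservoir, n, rng):
    """Draw n nodes uniformly, with replacement, from the reservoir of most affected nodes."""
    return [rng.choice(reservoir) for _ in range(n)]

def s2_reservoir_then_all(reservoir, nodes, n, rng):
    """Draw without replacement from the reservoir; top up from all nodes if it is too small."""
    picked = rng.sample(reservoir, min(n, len(reservoir)))
    rest = [v for v in nodes if v not in picked]
    return picked + rng.sample(rest, n - len(picked))

def s3_all_nodes(nodes, n, rng):
    """Draw n nodes uniformly, without replacement, from all nodes of the snapshot."""
    return rng.sample(nodes, n)

def s4_one_per_part(partition, rng):
    """Draw one node from each sub-network of a given partition (node -> part id)."""
    parts = {}
    for v, p in partition.items():
        parts.setdefault(p, []).append(v)
    return [rng.choice(members) for _, members in sorted(parts.items())]

# Toy snapshot: 12 nodes in three sub-networks; only sub-network 0 changed recently.
nodes = list(range(12))
partition = {v: v // 4 for v in nodes}
reservoir = [0, 1, 2]
rng = random.Random(0)
for name, picked in [
    ("s1", s1_reservoir_with_replacement(reservoir, 3, rng)),
    ("s2", s2_reservoir_then_all(reservoir, nodes, 3, rng)),
    ("s3", s3_all_nodes(nodes, 3, rng)),
    ("s4", s4_one_per_part(partition, rng)),
]:
    print(name, picked, "sub-networks covered:", sorted({partition[v] for v in picked}))
```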

To compare the performance of GloDyNE with different node selecting strategies, the length of random walks (see Section 4.1.3) should also be considered. This is because, as the walk length increases, the generated random walks (or node sequences) become less distinguishable. In the extreme case where the walk length goes to infinity, a random walker starting from any node in a network can fully explore its global topology. Therefore, we compare the four node selecting strategies under different walk lengths, as shown in Table V.
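The claim that longer walks become less distinguishable can be checked on a toy graph. The sketch below assumes a 20-node cycle, counts walk length as the number of nodes in the walk, and uses the Jaccard overlap of the node sets visited by walks from two opposite nodes as a rough proxy for how indistinguishable the walks are; the graph, the proxy, and all names are illustrative choices, not part of GloDyNE.

```python
import random

def ring(n):
    """Adjacency lists of a cycle graph with n nodes."""
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}

def random_walk(adj, start, length, rng):
    """Uniform random walk containing `length` nodes, including the start node."""
    walk = [start]
    while len(walk) < length:
        walk.append(rng.choice(adj[walk[-1]]))
    return walk

def mean_overlap(adj, a, b, length, trials=50, seed=0):
    """Average Jaccard overlap of the node sets visited by walks from a and from b."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        va = set(random_walk(adj, a, length, rng))
        vb = set(random_walk(adj, b, length, rng))
        total += len(va & vb) / len(va | vb)
    return total / trials

g = ring(20)
for length in (3, 10, 40, 1000):
    # Short walks from opposite sides of the ring never meet; long walks cover it all.
    print(length, round(mean_overlap(g, 0, 10, length), 3))
```

As the walk length grows, walks from distant start nodes visit nearly the same node set, so the choice of start nodes, and hence the node selecting strategy, matters less.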

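Walk length | AS733: reservoir (with replacement) | AS733: reservoir, then all nodes | AS733: all nodes | AS733: proposed | Elec: reservoir (with replacement) | Elec: reservoir, then all nodes | Elec: all nodes | Elec: proposed |
---|---|---|---|---|---|---|---|---|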

Mean | ||||||||

3 | 17.71 | 22.78 | 24.74 | 25.74 | 4.17 | 5.83 | 6.05 | 6.11 |

5 | 45.52 | 46.84 | 47.49 | 47.85 | 6.12 | 10.57 | 11.01 | 11.27 |

8 | 51.47 | 51.72 | 51.81 | 51.88 | 8.31 | 13.54 | 14.16 | 14.41 |

10 | 52.02 | 52.30 | 52.54 | 52.56 | 9.68 | 16.54 | 17.31 | 17.65 |

15 | 56.79 | 57.23 | 57.58 | 57.72 | 17.09 | 26.69 | 27.47 | 27.70 |

20 | 63.34 | 63.82 | 64.38 | 64.40 | 28.71 | 37.35 | 37.84 | 37.97 |

30 | 73.04 | 73.31 | 73.66 | 73.69 | 49.09 | 52.90 | 53.04 | 53.16 |

40 | 77.79 | 78.19 | 78.63 | 78.45 | 58.93 | 60.80 | 60.89 | 60.90 |

50 | 80.49 | 80.68 | 81.14 | 81.13 | 64.40 | 65.49 | 65.58 | 65.64 |

60 | 82.05 | 82.34 | 82.80 | 82.70 | 68.08 | 68.81 | 68.84 | 68.89 |

70 | 83.24 | 83.61 | 83.93 | 83.94 | 70.80 | 71.34 | 71.38 | 71.39 |

80 | 84.34 | 84.62 | 84.92 | 84.92 | 72.99 | 73.34 | 73.35 | 73.40 |

Mean | ||||||||

3 | 21.38 | 27.29 | 29.38 | 30.35 | 4.90 | 6.49 | 6.73 | 6.79 |

5 | 53.06 | 54.28 | 54.67 | 55.02 | 5.81 | 11.30 | 11.83 | 12.14 |

8 | 58.60 | 58.96 | 59.08 | 59.15 | 7.42 | 14.30 | 15.07 | 15.37 |

10 | 60.01 | 60.38 | 60.51 | 60.64 | 9.17 | 18.34 | 19.31 | 19.71 |

15 | 66.21 | 66.68 | 67.36 | 67.36 | 19.69 | 31.60 | 32.46 | 32.73 |

20 | 73.31 | 73.65 | 74.37 | 74.31 | 35.03 | 44.23 | 44.64 | 44.75 |

30 | 81.62 | 81.99 | 82.50 | 82.52 | 56.99 | 60.14 | 60.24 | 60.32 |

40 | 85.37 | 85.87 | 86.36 | 86.31 | 65.81 | 67.06 | 67.12 | 67.13 |

50 | 87.48 | 87.79 | 88.19 | 88.21 | 70.27 | 70.85 | 70.86 | 70.92 |

60 | 88.83 | 89.09 | 89.35 | 89.29 | 73.11 | 73.39 | 73.38 | 73.40 |

70 | 89.75 | 89.98 | 90.07 | 90.11 | 75.13 | 75.24 | 75.25 | 75.27 |

80 | 90.57 | 90.64 | 90.73 | 90.75 | 76.71 | 76.70 | 76.69 | 76.71 |

The results in Table V are averaged over all time steps and over 20 runs. The two-tailed, two-sample equal-variance t-test is applied to uniform sampling from all nodes and to the proposed strategy, which is the node selecting strategy of this work.

There are three important observations from Table V. Firstly, under the same walk length, the overall performance ranking of the four strategies exactly matches the ranking of the diversity of their selected nodes, as discussed above. Secondly, as the walk length increases, the four strategies become less distinguishable, which verifies the above analysis of the four strategies w.r.t. the walk length. Thirdly, under shorter walk lengths, the proposed strategy statistically significantly outperforms uniform sampling from all nodes on both datasets (two-tailed, two-sample equal-variance t-test, with the null hypothesis that there is no difference in the mean over 20 runs between the two strategies); as the walk length increases, the proposed strategy is still a better choice than uniform sampling on Elec, although the two become less distinguishable on AS733. This suggests that using the proposed strategy with GloDyNE on a larger dataset (Elec is larger than AS733, as shown in Table IV) may bring more benefits.

The hyper-parameter that determines the number of selected nodes of GloDyNE at each time step (as a fraction of all nodes) is designed to freely trade off effectiveness against efficiency. We vary it from 0.1 to 1.0 with a step of 0.1, together with four additional small values: 0.001, 0.005, 0.01, and 0.05. Each bar in Figure 5 has two results: the blue one shows the effectiveness, measured by the mean of Mean P@k over all time steps and over 20 runs, while the red one shows the efficiency, measured by the mean over 20 runs of the total wall-clock time over all time steps.

Figure 5 demonstrates that this hyper-parameter can be used to freely balance effectiveness and efficiency. Note that all experiments in the above sections use a fixed value of this hyper-parameter, so one can obtain better results by increasing it, at the cost of more running time. Besides, with this free hyper-parameter, one can balance effectiveness and efficiency to fulfill real-world requirements, e.g., when node embeddings must be updated within a specified period.

Furthermore, an interesting observation is that increasing this hyper-parameter to a certain level already yields performance very competitive with setting it to 1.0 (GloDyNE with 1.0 is equivalent to SGNS-increment), while consuming much less running time. This observation also supports that GloDyNE, especially the proposed node selecting strategy that selects only part of the nodes, makes a good approximation to SGNS-increment, which selects all nodes for further computation.

This work proposed a new DNE method, GloDyNE, which aims to efficiently update node embeddings while better preserving the global topology of a dynamic network at each time step, by extending the SGNS model to an incremental learning paradigm. In particular, unlike all previous DNE methods, a novel node selecting strategy is proposed to diversely select representative nodes over a network, so as to additionally consider the inactive sub-networks for better global topology preservation. The extensive experiments not only confirmed the effectiveness and efficiency of GloDyNE w.r.t. the other five state-of-the-art DNE methods, but also verified the usefulness of specific designs and considerations in GloDyNE.

From a high-level view, GloDyNE can also be seen as a general DNE framework based on the incremental learning paradigm of the SGNS model. Under this framework, one may design a different node selecting strategy to preserve other desirable topological features in node embeddings for a specific application. On the other hand, the idea of selecting diverse nodes could be adapted to other existing DNE methods for better global topology preservation. Besides, another direction for future work, motivated by Figure 5, is to investigate why selecting only part of the nodes can achieve almost the same or even superior performance compared with selecting all nodes.

This work was supported in part by the National Key Research and Development Program of China (Grant No. 2017YFB1003102), the Natural Science Foundation of China (Grant No. 61672478), the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (Grant No. 2017ZT07X386), the Shenzhen Peacock Plan (Grant No. KQTD2016112514355531), and the National Leading Youth Talent Support Program of China.
