HONEM: Network Embedding Using Higher-Order Patterns in Sequential Data

08/15/2019 ∙ by Mandana Saebi, et al. ∙ University of Notre Dame ∙ University of South Florida

Representation learning offers a powerful alternative to the often painstaking process of manual feature engineering, and as a result has enjoyed considerable success in recent years. This success is especially striking in the context of graph mining, since networks can take advantage of vast troves of sequential data to encode information about interactions between entities of interest. But how do we learn embeddings on networks that have higher-order and sequential dependencies? Existing network embedding methods naively assume the Markovian property (first-order dependency) for node interactions, which may fail to capture the time-dependent, longer-range complex interactions underlying the raw data. To address this limitation, we propose a network embedding method for higher-order networks (HON). We demonstrate that the higher-order network embedding (HONEM) method is able to extract higher-order dependencies from HON to construct the higher-order neighborhood matrix of the network, while existing methods are not able to capture these higher-order dependencies. We show that our method outperforms other state-of-the-art methods in node classification, network reconstruction, link prediction, and visualization.




I Introduction

Networks are ubiquitous, representing interactions between the components of a complex system. Applying machine learning algorithms to such networks has typically required a painstaking feature engineering task to develop feature vectors for the learning algorithms. Motivated by this burden, recent research has focused on developing representation learning methods for networks. These methods strive to discover appropriate lower-dimensional embedding vectors, which are then used as feature vectors for machine learning tasks such as node classification and link prediction. However, research on network representation learning has largely focused on first-order networks (FON), that is, networks where only first-order or dyadic interactions among the nodes are captured in network construction (a Markovian process). Several embedding approaches strive to preserve the network structure and connectivity of the nodes [7, 28, 6, 36, 18, 22]. While preserving higher-order proximity patterns in the network structure results in improved performance on several tasks, including link prediction, network reconstruction, and community detection [6], these methods do not extend to higher-order networks that functionally incorporate the higher and variable orders of dependencies in the raw data.

Several recent papers have pointed out the weaknesses and limitations of FONs, and proposed higher-order network construction algorithms that have been demonstrated to be more accurate and effective in capturing the trends in the underlying raw data of a complex system [33, 25, 23, 26, 13]. However, higher-order modeling of complex networks brings forth further challenges, as there is no representation learning framework for such HONs. Thus we ask the following questions in this paper: 1) Are the existing embedding methods able to capture the higher-order network representation? 2) How to learn embeddings on higher-order networks?


To address the limitation of existing methods, we propose an embedding method, HONEM, for the higher-order network (HON) [33] representation of the raw data. The main idea of HONEM is to generate a low-dimensional embedding of the network such that the temporal higher-order dependencies (represented in HON) are accurately preserved. Consider the following scenario. We are provided human trajectory/traffic data of the area around a university campus. Suppose that, from the trajectory data, we observe that students who live on campus are more likely to visit the central library after visiting the downtown area, while people living in a certain residential area are more likely to go to the business area of the city after passing through the downtown area (assuming none of the four locations overlap with each other). In Figure 1, the nodes represent the following: C: on-campus dorm, B: residential area, A: downtown area, E: library, D: business area. Suppose we model such time-ordered dependencies as a FON (Figure 1 (b)), and then try to infer second-order dependencies from the FON structure to derive the node embeddings (as done in [7, 28, 36, 18]). In the FON structure, both the library and the business area are two steps away from the campus dorms (or the residential area). Therefore, we might conclude that students living on campus have an equal probability of visiting the library and the business area through downtown. As a result, all the above-mentioned methods based on FON will miss important higher-order dependencies or infer higher-order dependencies that do not exist in the original raw data. By modeling these interactions as HON (Figure 1 (c)), we observe that nodes C and E have a second-order dependency through the higher-order node A|C. Similarly, nodes B and D have a second-order dependency through the higher-order node A|B. There is no second-order dependency between B and E, or between C and D.

To summarize, the key contributions and properties of HONEM are as follows:

  1. Data-agnostic: HONEM extracts the actual order of dependency from the temporal patterns in raw data by allowing for variable orders of dependencies rather than a fixed order for the entire network, as used in prior work [28, 7, 3, 18].

  2. Scalable and parameter-free: HONEM does not require a sweep through the parameter space of window length. HONEM also does not require any hyper-parameter tuning or sampling, as is often the case with deep learning or random walk based embedding methods.

  3. Generalizable: HONEM embeddings are directly applicable to a variety of network analytics tasks such as network reconstruction, link prediction, node classification, and visualization.

Fig. 1: A toy example showing how the higher-order neighborhood can be inferred from HON. From the sequential data provided in (a), we can construct both the FON (b) and the HON (c). From the FON, it is not clear that only nodes C and E have a second-order dependency through node A|C; similarly, only nodes B and D have a second-order dependency through node A|B. There is no second-order dependency between B and E, or C and D. The neighborhood information is inferred from HON.

II Related work

Recent advances in graph mining have motivated the need to automate feature engineering from networks. This problem finds its roots in traditional dimensionality reduction techniques [24, 31, 11]. For example, LLE [24] represents each node as a linear combination of its immediate neighbors, and LE [2] uses the spectral properties of the Laplacian matrix of the network to derive node embeddings.

More recently, methods based on random walks, matrix factorization, and deep learning have been proposed as well, albeit applicable to FONs. DeepWalk [19] learns node embeddings by combining random walks with the skip-gram model [17]. Node2Vec [7] extended this approach, using biased random walks to capture homophily and structural equivalence present in the network. A random walk based method for knowledge graph embedding is proposed in [35]. In contrast, factorization methods derive embeddings by factorizing a matrix that represents the connections between nodes. GF [1] explicitly factorizes the adjacency matrix of the FON. LINE [28] attempts to preserve both first-order and second-order proximities by defining explicit functions. GraRep [3] and HOPE [18] go beyond second order, and factorize a similarity matrix containing higher-order proximities. Walklets [20] approximates the higher-order proximity matrix by skipping over some nodes in the network. Qiu et al. [21] show that LINE, Node2Vec, DeepWalk, and PTE [27] implicitly factorize a higher-order proximity matrix of the network. A new crop of methods proposed recently allows for dependencies of arbitrary order [36, 3]. However, this order needs to be set by the user beforehand. Therefore, these methods are unable to extract the order of the system from raw sequential data and fail to identify the higher-order dependencies of the network without trial and error. HONE [22] uses motifs as higher-order structures; however, these motifs do not capture temporal higher-order dependencies. In addition, several deep learning-based methods have been proposed. SDNE [29] uses auto-encoders to preserve first-order and second-order proximities. DNGR [4] combines auto-encoders with random surfing to capture higher-order proximities beyond second order. However, both methods suffer from high computational complexity. Models based on Convolutional Neural Networks (CNN) were proposed to address this complexity issue [10, 9, 14, 8].

Finally, dynamic approaches have recently been proposed to capture the evolution of the network with embeddings [15, 34, 38, 37]. These methods still involve the computationally demanding task of dynamic network modeling. Furthermore, they are developed based on FON and require the specification of a time window, making them data dependent.

To the best of our knowledge, there is a gap in the literature when it comes to approaches to representation learning that capture the higher-order dependencies over time without dynamic network modeling. HONEM fills an important and critical gap in the literature by addressing the challenges of learning embeddings from the higher-order dependencies in a network, thereby providing a more accurate and effective embedding.

III Higher-Order Network Embedding: HONEM

In summary, the HONEM algorithm comprises the following steps:

  1. Extraction of the higher-order dependencies from the raw data.

  2. Generation of a higher-order neighborhood matrix given the extracted dependencies.

  3. Application of truncated SVD to the higher-order neighborhood matrix of the network to discover embeddings, which can then be used by machine learning algorithms.

III-A Preliminaries

Let us consider a set of interacting entities and a set of variable-length sequences of interactions between these entities. Given the raw sequential data, the HON can be represented as G_HON = (V_HON, E_HON), with edges and nodes of various orders, in which a node can represent a sequence of interactions (a path). For example, a higher-order node a|b represents the fact that node a is being visited given that node b was previously visited, while a higher-order node a|b,c represents node a given the previously visited nodes b and c. In this context, a first-order node is shown by a|·, in which the notation "·" indicates that no previous step information is included in the data.

Using these higher-order nodes and edges in G_HON, our goal is to learn embeddings for the nodes in the first-order network, G_FON = (V_FON, E_FON). Keep in mind that |V_FON| ≤ |V_HON|, as several nodes in G_HON may correspond to a single node in G_FON. For example, the HON nodes a|b, a|c, and a|b,c all represent node a in the FON. It is important to highlight this connection between HON nodes and their FON counterparts. Indeed, we are interested in evaluating our embeddings in a number of machine learning tasks — such as node classification and link prediction — that are formulated in terms of FON nodes; for example, the class label information is available on V_FON (and not V_HON). Therefore, it is important to eventually obtain embeddings for FON nodes.

One approach to address the above challenge is to learn embeddings for the higher-order nodes using existing network embedding methods, and then combine them to derive the embedding of the corresponding FON node. We experimented with this approach using different methods of combining HON embeddings (max, mean, weighted mean) and found that it does not scale to large networks, as the number of higher-order nodes can be much higher than the number of first-order nodes. We therefore refrain from constructing the HON directly, and instead modify the "rule extraction" step of the HON algorithm to generate the higher-order dependencies and the higher-order neighborhood matrix.

III-B Extracting higher-order dependencies

The first step of the HONEM framework is to extract higher-order dependencies from the raw sequential data. To accomplish this task, we modify the rule extraction step in the HON construction algorithm [33]. We briefly explain the rule extraction in the HON algorithm below:

Rule Extraction (HON): In the first-order network, all the nodes are assumed to be connected through pairwise interactions.

In order to discover the higher-order dependencies in the sequential data, given a pathway of order k, we follow the steps below:

  1. Step 1: Count all the observed paths of length up to the MaxOrder in the sequential data.

  2. Step 2: Calculate the probability distribution of the next step in each path, given the current and previous steps.

  3. Step 3: Extend the current path by checking whether including a previous step and extending the path from order k to order k+1 will significantly change the normalized count of movements (i.e., the probability distribution of the next step). To detect a significant change, the Kullback-Leibler divergence [12] of the extended distribution D_ext with respect to the current distribution D, D_KL(D_ext || D), is compared with a dynamic threshold δ. If D_KL(D_ext || D) is larger than δ, order k+1 is assumed as the new order of dependency, and the path is extended accordingly.

This procedure is repeated recursively until a pre-defined parameter, MaxOrder, is reached. However, the new parameter-free version of the algorithm (which is used in this paper) does not require setting a pre-defined MaxOrder, and extracts the MaxOrder automatically for each sequence. The Support of a path refers to the number of times the path appears in the raw trajectories. The threshold δ assures that higher orders are only considered if they have sufficient Support, which is set with the parameter MinSupport. Patterns less frequent than MinSupport are discarded. For an example of this procedure, refer to the supplementary materials in section VII-A.
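The divergence test in Step 3 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the support-dependent threshold form δ = k_ext / log2(1 + Support) is an assumption borrowed from HON-style rule extraction [33], and the function names and toy distributions (echoing the campus example of Figure 1) are ours.

```python
import math

def kl_divergence(p, q):
    """KL divergence D(p || q) over dicts mapping next-step -> probability.
    Assumes q[k] > 0 wherever p[k] > 0."""
    return sum(pv * math.log2(pv / q[k]) for k, pv in p.items() if pv > 0)

def has_higher_order(dist_ext, dist_cur, support, k_ext):
    """Accept the extended (higher-order) path if its next-step distribution
    diverges from the current one by more than a support-dependent threshold.
    The threshold form is an assumed sketch, not the paper's exact formula."""
    threshold = k_ext / math.log2(1 + support)
    return kl_divergence(dist_ext, dist_cur) > threshold

# Toy next-step distributions: from downtown A in general, vs. from A given
# that the previous stop was the campus dorm C (extended path C -> A).
dist_cur = {"D": 0.5, "E": 0.5}    # from A, business area D and library E are equally likely
dist_ext = {"D": 0.05, "E": 0.95}  # coming from C, the library E dominates
print(has_higher_order(dist_ext, dist_cur, support=100, k_ext=2))  # -> True
```

Here a well-supported, strongly shifted distribution passes the test, so the second-order dependency C → A → E would be kept as a rule.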

The above method only accepts dependencies that are significant and that have occurred a sufficient number of times. This is required to ensure that a random pattern in the data does not appear as a spurious dependency rule. Furthermore, this method admits dependencies of variable order for different paths. Using this approach, we extract all possible higher-order dependencies from the sequential data. These dependencies are then used to construct the HON. For example, an edge from the second-order node a|b to node c in the HON corresponds to the rule b → a → c; in other words, b and c are connected through a second-order path.

Modified Rule Extraction for HONEM: In the HONEM framework, we modify the standard HON rule extraction approach by preserving all lower orders when including any higher-order dependency. This is motivated by a limitation of the previously proposed HON algorithm [33]. In the original HON rule extraction algorithm, after extracting all dependencies, the HON is constructed with the assumption that if higher orders are discovered, all the lower orders (except the first order) are ignored. However, discovering a higher-order path between two nodes does not imply that the nodes cannot be connected through shorter pathways. For example, if two nodes are connected through a third-order path as well as a second-order path, they have a second-order dependency in addition to the third-order dependency.

Note that in HONEM we extract the higher-order dependencies from the sequential data and not from the first-order network topology, as is done by other methods in the literature [18, 7, 19, 28]. Therefore, our notion of "higher-order dependencies" refers to dependencies that are extracted from sequential data over time. Although these methods are able to improve performance by preserving higher-order distances between nodes given the topology of the first-order network, they are unable to capture dependencies over time. This is important because not all connections through higher-order pathways will occur if they do not exist in the raw sequential data in the first place.

III-C Higher-order neighborhood matrix

In the second step of our framework, we design a mechanism for encoding these higher-order dependencies into a neighborhood matrix. In this context, we refer to higher-order dependencies as higher-order distances. We define the k-th order neighborhood matrix S^(k), in which the element S^(k)_ij represents the k-th order distance between nodes i and j. Intuitively, S^(1) is the first-order adjacency matrix. We derive the neighborhood matrices of various orders until the maximum existing order in the network, K_max, is reached. The maximum order is determined by finding the nodes of highest order in the network. For each node pair, the distance is obtained from the edge weights of HON (or the corresponding higher-order dependencies). For example, in Figure 1, S^(2)_CE and S^(2)_BD are given by the weights of the HON edges A|C → E and A|B → D, respectively.

It is possible, however, that two given nodes are connected through multiple higher-order paths. In this case, the average probability over all such paths (i.e., the average of the corresponding edge weights in HON) is taken as the higher-order distance. For example, suppose node d can be reached from node a via either path 1: a → b → d (with probability p1) or path 2: a → c → d (with probability p2). The second-order distance between node a and node d is then the average of the edge weights of b|a → d and c|a → d, corresponding to path 1 and path 2, respectively. Both of these connections have a second-order dependency. Note that node a (or d) may participate in dependencies of other orders as well, but only second-order ones are included in S^(2). Once distances for all desired orders are obtained, we derive the higher-order neighborhood matrix as:

S = Σ_{k=1}^{K_max} e^(-(k-1)) S^(k)

For K_max = 1, S equals the conventional first-order adjacency matrix. The exponentially decaying weights are chosen to prefer lower-order distances over higher-order ones, since higher-order paths are generally less frequent in the sequential data [33]. We experimented with increasing and constant weights, and found decaying weights to work best with our method. We leave the exploration of other potential weighting mechanisms to future work.
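Assembling the combined neighborhood matrix can be sketched as below. This is a minimal illustration assuming exponentially decaying weights of the form e^(-(k-1)), which reduce to the plain adjacency matrix when only the first order exists; the function name and toy matrices are ours.

```python
import numpy as np

def higher_order_neighborhood(S_k_list):
    """Combine k-th order distance matrices S^(1..K) into one neighborhood
    matrix with exponentially decaying weights: S = sum_k exp(-(k-1)) S^(k).
    The exact weight form is an assumption consistent with the text."""
    S = np.zeros_like(S_k_list[0], dtype=float)
    for k, S_k in enumerate(S_k_list, start=1):
        S += np.exp(-(k - 1)) * S_k
    return S

A = np.array([[0, 1], [1, 0]], dtype=float)  # first-order adjacency S^(1)
S2 = np.array([[0, 0.5], [0.5, 0]])          # toy second-order distances S^(2)
S = higher_order_neighborhood([A, S2])
# With one matrix, S reduces to the first-order adjacency matrix A.
```

Note that the second-order entries only dampen, never dominate, the first-order ones, matching the preference for lower-order distances.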

It is worth mentioning that the higher-order neighborhood matrix provides a richer and more accurate representation of node interactions in FON, and can thus be viewed as a means of connecting the HON and FON representations. In many network analysis and machine learning applications, such as node classification and link prediction, working with the HON representation is inconvenient and requires some form of transformation. HONEM provides a more convenient and generalizable interpretation of HON, while preserving the benefits of the more accurate HON representation.

III-D Higher-order embeddings

In the third step, the higher-order embeddings are obtained by preserving the higher-order neighborhood in vector space. A popular way to accomplish this is to obtain the embedding matrix using matrix factorization, in which the objective is to minimize the loss function:

L(Y^s, Y^t) = || S - Y^s (Y^t)^T ||_F^2

The widely-adopted method for solving the above equation is SVD (Singular Value Decomposition). Formally, we can factorize the matrix S as below:

S = U Σ V^T

where U and V are the orthogonal matrices containing the content and context embedding vectors, and Σ is a diagonal matrix containing the singular values in decreasing order.

However, this solution is not scalable to sparse, large networks. Therefore, we use truncated SVD [5] to approximate S by a rank-d factorization (d ≪ N) as below:

S ≈ U_d Σ_d V_d^T

where U_d and V_d contain the first d columns of U and V, respectively, and Σ_d contains the top-d singular values. The embedding vectors can then be obtained by means of the following equations:

Y^s = U_d √Σ_d,   Y^t = V_d √Σ_d

Without loss of generality, we use Y^s as the embedding matrix.
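The truncated-SVD step can be sketched with SciPy's sparse SVD. Splitting √Σ_d between the content and context factors follows the convention of HOPE-style methods and is assumed here; the function name `honem_embed` and the random test matrix are illustrative, not from the paper.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

def honem_embed(S, d):
    """Rank-d truncated SVD of the higher-order neighborhood matrix S.
    Returns content embeddings Y_s = U_d sqrt(Sigma_d) and context
    embeddings Y_t = V_d sqrt(Sigma_d)."""
    U, sigma, Vt = svds(csr_matrix(S, dtype=float), k=d)
    # svds returns singular values in ascending order; flip to descending
    order = np.argsort(sigma)[::-1]
    U, sigma, Vt = U[:, order], sigma[order], Vt[order, :]
    Y_s = U * np.sqrt(sigma)     # content embeddings
    Y_t = Vt.T * np.sqrt(sigma)  # context embeddings
    return Y_s, Y_t

rng = np.random.default_rng(0)
S = rng.random((20, 20))         # stand-in for a neighborhood matrix
Y_s, Y_t = honem_embed(S, d=4)   # 4-dimensional node embeddings
```

The product Y_s Y_t^T recovers the best rank-d approximation of S, which is exactly the objective minimized above.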

                     Rome     Bari     Shipping  Wiki
FON nodes            477      522      3,058     4,043
FON edges            5,614    5,916    52,366    38,580
FON avg in-degree    11.76    11.33    17.12     9.54
HON nodes            19,403   13,893   59,779    67,907
HON edges            119,566  88,594   311,691   255,672
HON avg in-degree    6.16     6.37     5.214     3.76
|E_HON| / |E_FON|    21.29    14.97    5.95      6.62
TABLE I: Basic properties of each dataset. The gap between the number of first-order and higher-order nodes and edges in each dataset indicates the density of higher-order dependencies in the data.
Fig. 2: (a) Reconstruction results. The x-axis represents the number of evaluated edge pairs. HONEM performs better than the other baselines by a large margin. (b) Link prediction results. The x-axis indicates the embedding dimension. HONEM provides the best performance on all datasets at dimension 64 or more. On the traffic datasets, even though Node2Vec provides better MAP scores in lower dimensions, HONEM provides the best precision for the top-k predictions (refer to Table II).

IV Experiments

We used four real-world datasets representing transportation and information networks. We used these data to assess performance on the following tasks: 1) network reconstruction; 2) link prediction; 3) node classification; and 4) visualization. We compared HONEM to a number of baselines representing popular deep learning and matrix factorization based methods. We provide details on the data and benchmarks first, before presenting the performance results on the aforementioned tasks. We also provide a complexity analysis of HONEM in the next section.

IV-A Datasets

The HONEM framework can be applied to any sequential dataset describing interacting entities to extract latent higher-order dependencies among them. To validate our method, we use four different datasets for which raw sequential data is available. Table I summarizes the basic FON and HON network properties for each dataset. To emphasize the versatility of HONEM, these datasets are drawn from three different domains: vehicular traffic flows from two Italian cities (Rome and Bari), Web browsing patterns on Wikipedia, and global freight shipping. Specifically, the four datasets are:

  • Traffic data of Rome: This is car-sharing data provided by the Telecom Italia Big Data Challenge 2015 (https://bit.ly/2UGcEoN), which contains the trajectories of unique vehicles over 30 days. We divided the city into a grid of first-order nodes; each taxi location is mapped to a node in the grid, and the edges are derived from the number of taxis traveling between the nodes. This dataset contains higher-order dependencies up to a maximum order. With the inclusion of higher-order patterns, the number of nodes and edges increases by 39.67% and 20.29%, respectively. This dataset also contains locations of accident claims, which are used for node labeling.

  • Traffic data of Bari: This is another car-sharing dataset (provided by the Telecom Italia Big Data Challenge 2015) containing trajectories of taxis over 30 days. We divided the city into a grid of first-order nodes (obtained using the same approach as for the Rome traffic data). This dataset contains higher-order dependencies up to a maximum order. With the inclusion of higher-order patterns, the number of nodes and edges increases by 25.61% and 13.97%, respectively. This dataset also contains locations of accident claims, which are used for node labeling.

  • Global shipping data: Provided by Lloyd’s Maritime Intelligence Unit (LMIU), this dataset contains voyages over a span of 15 years (1997-2012). Applying the rule extraction step to this network yields higher-order dependencies up to a maximum order. The number of nodes and edges increases by 18.54% and 4.95%, respectively, after including the higher-order patterns in HON.

  • Wikipedia game: Available from West et al. [30], this dataset contains human navigation paths on Wikipedia. In this game, users start at a Wikipedia entry and are asked to reach a target entry by following only hyperlinks to other Wikipedia entries. The data includes both incomplete and complete paths. We discarded incomplete paths of length 3 or shorter. This dataset contains higher-order dependencies up to a maximum order. The inclusion of higher-order patterns results in an increase in the number of nodes and edges by 15.79% and 5.62%, respectively.

Even though we do not directly use HON, a comparison of FON and HON for each dataset provides some insight into the higher-order structure of these networks. We define the ratio |E_HON| / |E_FON| as a measure of the density of higher-order dependencies: a denser set of higher-order dependencies results in a larger gap between FON and HON. The two traffic datasets show the largest gap between FON and HON in terms of the number of nodes and edges. Specifically, the gap is largest in the traffic data of Rome.
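The density ratio can be checked directly from the edge counts in Table I. This is a small sanity-check script (not part of the original pipeline); the dataset labels follow Table I.

```python
# Edge counts from Table I: (FON edges, HON edges) per dataset.
edges = {
    "Rome":     (5_614, 119_566),
    "Bari":     (5_916, 88_594),
    "Shipping": (52_366, 311_691),
    "Wiki":     (38_580, 255_672),
}

# |E_HON| / |E_FON| measures the density of higher-order dependencies.
ratios = {name: hon / fon for name, (fon, hon) in edges.items()}

# Rome has the largest gap between FON and HON.
print(max(ratios, key=ratios.get))  # -> Rome
```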

IV-B Baselines

We compare our method with the following state-of-the-art embedding algorithms:

  • DeepWalk [19]: This algorithm uses uniform random walks to generate node similarities and learns embeddings by preserving the higher-order proximity of nodes. It is equivalent to Node2Vec with p = 1 and q = 1.

  • Node2Vec [7]: This method is a generalized version of DeepWalk, allowing biased random walks. We used 0.5, 1, and 2 for the p and q values and report the best performing results.

  • LINE [28]: This algorithm derives the embeddings by preserving the first and second-order proximities (and a combination of the two). We ran the experiments for both the second-order and combined proximity, but did not notice a major improvement with the combined one. Thus we report results only for the embeddings derived from second-order proximity.

  • Graph Factorization (GF) [1]: This method generates the embeddings by factorizing the adjacency matrix of the network. HONEM will reduce to GF if it only uses the first-order adjacency matrix.

  • LAP [2]: This method generates the embeddings by performing eigen-decomposition of the Laplacian matrix of the network. In this framework, if two nodes are connected with a large weight, their embeddings are expected to be close to each other.

Among the above baselines, Node2Vec, DeepWalk, and LINE learn embeddings using higher-order proximities. We also used Locally Linear Embedding (LLE) as a baseline in our early experiments. However, LLE failed to converge at several dimensions in the link prediction and network reconstruction experiments; therefore, we did not include it in the final results. All the baselines learn embeddings based on FON.

IV-C Network Reconstruction

We posit that any embedding method should allow one to reconstruct the original network with sufficient accuracy. This provides insight into the quality of the embeddings generated by the method. We measure the reconstruction precision for the top-k evaluated edge pairs as follows:

precision@k = (1/k) Σ_{i=1}^{k} δ_i

where δ_i = 1 when the i-th reconstructed edge is correctly recovered, and δ_i = 0 otherwise.
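This precision measure can be sketched in a few lines. The pair names and the ranking below are a toy illustration, not data from the paper.

```python
def precision_at_k(ranked_pairs, true_edges, k):
    """Fraction of the top-k highest-scoring node pairs that are actual
    edges of the original network (delta_i = 1 for a recovered edge)."""
    hits = sum(1 for pair in ranked_pairs[:k] if pair in true_edges)
    return hits / k

# Toy example: candidate pairs ranked by embedding similarity,
# and the ground-truth edge set of the original network.
ranked = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
truth = {("a", "b"), ("a", "c")}
print(precision_at_k(ranked, truth, k=2))  # -> 0.5
```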

Figure 2 (a) displays the network reconstruction results with varying k. We notice that although the performance of the other baselines is data dependent, HONEM performs significantly better on all datasets. Results on both traffic datasets display similar trends, and methods like LINE, which perform relatively well on these datasets, fail on the larger datasets (Shipping and Wikipedia). HONEM not only performs better than GF, which preserves the first-order proximity, but also outperforms Node2Vec, DeepWalk, and LINE, which preserve higher-order proximity based on FON. With the increase in k, all of the actual edges are recovered but the number of possible edge pairs becomes too large, and thus almost all methods converge to a small value. However, there is still a large gap between HONEM and the other baselines even at the largest k on all datasets.

Fig. 3: Variation of node precision with embedding dimension for Rome. The highlighted green lines indicate the major traffic routes of the city. The node color intensity indicates the link prediction precision for each node. In lower dimensions, higher-order nodes have the best precision using HONEM. The precision of the other nodes increases in higher dimensions, eventually outperforming Node2Vec at Dim=32 and Dim=64. Node2Vec does not differentiate between higher-order and first-order nodes in lower dimensions.

IV-D Link Prediction

Dimension  4      8      16     32     64     128     256
Node2Vec   0.079  0.152  0.165  0.184  0.195  0.1536  0.141
HONEM      0.316  0.364  0.409  0.528  0.529  0.543   0.591
TABLE II: Comparison of Precision@k (k = 1024) for link prediction using Node2Vec and HONEM over various dimensions. Even though Node2Vec provides better MAP scores in lower dimensions, it fails to accurately predict the top-k links.

We posit that embeddings derived from HON perform better in the link prediction task. Methods based on FON do not capture the temporal higher-order distances between nodes, which create a potential for link formation. For example, suppose there is a directed edge in HON from i|j to k (corresponding to the path j → i → k) and another directed edge from the second-order node k|i to z (corresponding to the path i → k → z). In this structure, node z can be reached within three steps from node j. In FON, however, we only have the edges j → i, i → k, and k → z. Therefore, FON might miss the potentially interesting edge between j and k, or between i and z. To validate our argument, we remove 20% of the edges from the current network, and derive node embeddings on the remaining network using HONEM. We then predict the missing edges by calculating the pairwise distance between embedding vectors and selecting the top-k highest values as potential edges.

We use MAP (Mean Average Precision) as the link prediction evaluation metric. MAP is the average precision over all the nodes, and is defined as:

MAP = (1/|V|) Σ_i AP(i)

where AP(i) is the average precision of node i:

AP(i) = Σ_k precision@k(i) · δ_i(k) / Σ_k δ_i(k),   precision@k(i) = (1/k) Σ_{j=1}^{k} δ_i(j)

Here k is the number of evaluated edges, δ_i(j) = 1 when the j-th reconstructed edge for node i exists in the original network, and δ_i(j) = 0 otherwise. We evaluated link prediction using the precision@k measure at dim=128 as well (refer to the supplementary materials for details). However, since we are interested in analyzing the effect of dimension, we provide MAP as a precision measure over all nodes. The results are displayed in Figure 2 (b). We notice that the MAP score is generally lower in the larger datasets, namely Shipping and Wiki (due to sparsity). In the traffic datasets (Bari and Rome), HONEM shows monotonically increasing performance with increasing embedding dimension, while the performance of other methods either saturates after a certain dimension or deteriorates.
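The MAP computation can be sketched as follows. This is a standard average-precision sketch under the definitions above; the node names and rankings are toy data, not from the paper.

```python
def average_precision(ranked, truth):
    """AP for one node: mean of precision@k taken at each rank k where a
    true (held-out) edge is recovered."""
    hits, precisions = 0, []
    for k, pair in enumerate(ranked, start=1):
        if pair in truth:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(truth) if truth else 0.0

def mean_average_precision(per_node):
    """MAP: average of AP over all evaluated nodes."""
    return sum(average_precision(r, t) for r, t in per_node) / len(per_node)

# Two nodes, each with ranked candidate edges and its held-out true edges.
node1 = ([("u", "a"), ("u", "b"), ("u", "c")], {("u", "a"), ("u", "c")})
node2 = ([("v", "a"), ("v", "b")], {("v", "b")})
print(mean_average_precision([node1, node2]))
```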

Effect of dimension: Overall, HONEM provides superior performance at dimensions of 64 or larger. We notice that while Node2Vec provides a better MAP score on the traffic datasets in lower dimensions (smaller than 64 in Bari and smaller than 32 in Rome), it fails to improve at higher dimensions. We further investigated our results by visualizing the node precision over various dimensions on the Rome city map. The results are shown in Figure 3. We find that nodes with the highest precision (darker color) are located in the high-traffic city zones (green lines show the major highways of the city). Based on our analysis, nodes located in the high-traffic zones are 80.56% more likely to have a dependency of second order or more. As a result, we observe that in lower dimensions, HONEM consistently exhibits high precision for these higher-order nodes. As the dimension increases, the precision of the lower-order nodes also increases. On the other hand, the node precision obtained by Node2Vec is not related to the node location. At dim=32 and dim=64, HONEM provides overall better coverage and better precision than Node2Vec. A comparison of the top-k (k=1024) predictions between Node2Vec and HONEM is provided in Table II. Even though Node2Vec provides better MAP scores in lower dimensions, HONEM provides better precision for the top-k predictions (for more details, refer to the supplementary materials, section VII-B). Looking back at the data characteristics, we notice that this phenomenon only happens for the traffic datasets, where the ratio of HON to FON edges is significantly larger than in the other two datasets. Therefore, in datasets with significant higher-order dependencies, resulting in a large gap between HON and FON, our method provides the best precision for the potentially most important nodes (i.e., those of higher order).

IV-E Node classification

We hypothesize that higher-order dependencies can reveal important node structural roles. In this section, we validate this hypothesis using experiments on real-world datasets. Our goal is to find out whether HONEM can improve the node classification accuracy by encoding the knowledge of higher-order dependencies.

We answer the above question by comparing state-of-the-art node embedding methods based on FON against our proposed method, HONEM, which captures temporal higher-order dependencies, on four different datasets. In the traffic data, nodes are labeled by their likelihood of having accidents (i.e., “Low” or “High”). In Wikipedia, nodes are labeled based on whether or not they are reachable within fewer than 5 clicks in the network. In the shipping data, nodes are labeled by the volume of shipping traffic (i.e., “Low” or “High”). Our experiments show that, compared to five state-of-the-art embedding methods, HONEM yields significantly more accurate results across all datasets regardless of the type of classifier used.

We evaluated the node classification performance using AUROC across four different classifiers: Logistic Regression, Random Forest, Decision Tree, and AdaBoost. The results are shown in figure 4. We observe that HONEM performs consistently better than the other embedding methods. Specifically, we analyzed the HONEM advantage on each dataset. In the traffic datasets, nodes with more higher-order dependencies are more likely to have an accident (Pearson correlation: 0.7535, significant p-value). In the Wikipedia data, reachable nodes are more likely to have higher-order dependencies (Pearson correlation: 0.6845, significant p-value). In the shipping data, nodes with higher shipping traffic contain more higher-order dependencies (Pearson correlation: 0.8612, significant p-value). Such time-dependent signals do not emerge in methods based on FON, regardless of the method's complexity. Furthermore, we notice that HONEM is fairly robust to the type of classifier. However, Decision Tree performs poorly regardless of the embedding method, as it picks a subset of features that does not fully capture the node representation in the network. In line with expectations, ensemble methods perform better overall, even though Logistic Regression offers competitive performance on the Wikipedia dataset.
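This evaluation protocol can be sketched as follows with scikit-learn; the train/test split ratio and classifier hyperparameters shown here are illustrative defaults, not necessarily those used in our experiments.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def evaluate_embeddings(X, y, seed=0):
    """AUROC of four classifiers trained on node embeddings X with binary labels y."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    classifiers = {
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "RandomForest": RandomForestClassifier(random_state=seed),
        "DecisionTree": DecisionTreeClassifier(random_state=seed),
        "AdaBoost": AdaBoostClassifier(random_state=seed),
    }
    scores = {}
    for name, clf in classifiers.items():
        clf.fit(X_tr, y_tr)
        proba = clf.predict_proba(X_te)[:, 1]  # probability of the "High" class
        scores[name] = roc_auc_score(y_te, proba)
    return scores
```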

Fig. 4: Node classification results. HONEM performs better across all datasets and is fairly robust to the type of the classifier.
Fig. 5: Visualization of Mathematics and Geography topics in the Wikipedia data

IV-F Visualization

To provide a more intuitive interpretation of the improvement offered by HONEM, we compare visualizations of the produced embeddings against those of the baseline methods. As a case example, we visualize the subgraphs corresponding to two different topics from the Wikipedia dataset, shown in figure 5. Topics were selected from standard Wikipedia categories. Here we show results for Mathematics and Geography, as they arguably represent two topics that are comparable in terms of generality but also distinct enough to allow for meaningful interpretation. We use t-SNE [16] to map the 128-dimensional embeddings to 2-dimensional coordinates. Figure 5 shows two separate clusters for the embeddings derived from HONEM. However, a number of Mathematics entries are interspersed with Geography entries. These are the nodes of encyclopedic entries such as Sphere, Quantity, Arithmetic, and Measurement which, albeit primarily categorized under Mathematics, are also related to many other topics, including Geography.

Figure 5 also shows the visualization results for the baselines. We observe that for many methods the clusters are not as neatly separated as those produced by HONEM. Specifically, DeepWalk, Node2Vec, and LINE display separate clusters, but there are many misplaced nodes within each cluster. With GF and LAP it is even more difficult to identify proper clusters among the articles. This indicates that higher-order patterns are important for distinguishing clusters and capturing node concepts within the network.
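The projection step above can be sketched with scikit-learn's t-SNE implementation; the perplexity value below is illustrative and would be tuned to the subgraph size.

```python
import numpy as np
from sklearn.manifold import TSNE

def project_2d(embeddings, seed=0):
    """Map d-dimensional node embeddings to 2-D coordinates with t-SNE [16]."""
    tsne = TSNE(n_components=2, perplexity=5, random_state=seed, init="random")
    return tsne.fit_transform(np.asarray(embeddings))
```

The resulting coordinates can then be scattered and colored by Wikipedia category to reproduce a figure in the style of figure 5.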

V Analysis of running time

The running time of HONEM consists of the time required to extract the higher-order dependencies and the time required to factorize the higher-order local neighborhood matrix. In practice, it is dominated by the former. To analyze this complexity, suppose the size of the raw sequential data is L and the number of unique entities in the raw data is N. All observations are traversed at least once, so the time complexity of the algorithm is O(L + R_k), where R_k is the actual number of higher-order dependencies for order k. Testing whether adding a previous step significantly changes the probability distribution of the next step (using the Kullback-Leibler divergence) takes up to O(N) time per candidate rule, since each next-step distribution has support of size at most N [32].
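The dependency test described above can be sketched as follows; the fixed threshold is a simplification, as the actual HON algorithm [32, 33] uses a dynamic, order-dependent threshold, and the sketch assumes both distributions share the same support.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in bits; p and q map next-step symbols to probabilities.

    Assumes every symbol with p > 0 also appears in q (shared support).
    """
    return sum(pv * math.log2(pv / q[s]) for s, pv in p.items() if pv > 0)

def has_higher_order_dependency(p_extended, p_base, threshold):
    """Test whether conditioning on one more previous step changes the
    next-step distribution enough to justify a higher-order rule."""
    return kl_divergence(p_extended, p_base) > threshold
```

For example, if knowing the extra step collapses a 50/50 next-step distribution to a deterministic one, the divergence is 1 bit and the rule is kept for any threshold below that.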

Fig. 6: Comparison of the running time on the global shipping data. HONEM provides the best running time after GF. Both methods are robust to embedding dimension.

We compare the running time of HONEM with the state-of-the-art baselines on the shipping data. We tested the running time on the other datasets as well and found the shipping data to be the most challenging, both in terms of the number of nodes and edges and in terms of network density. All experiments were run on the same machine (Intel(R) Xeon(R) CPU E7-4850 v4 @ 2.10GHz). The results are shown in figure 6. The running time of HONEM is robust to the embedding dimension. GF is the only method with a better running time than HONEM. This is understandable, since GF directly factorizes the first-order adjacency matrix of the network, while HONEM requires extra time to extract the higher-order neighborhood. However, the difference in running time between HONEM and GF translates into significantly better performance in link prediction, network reconstruction, and node classification. Moreover, higher-order dependencies only need to be extracted once per dataset, regardless of the embedding dimension; for a fair comparison, however, we included this time in the experiments over all dimensions.

VI Conclusion

In this paper, we developed HONEM, a network embedding method that captures higher-order patterns in the underlying sequential data. We showed that current embedding methods fail to capture temporal higher-order dependencies, which leads to missing important information or to misleading conclusions based on the first-order network (FON). HONEM, on the other hand, extracts the significant higher-order proximities from the sequential data to construct the higher-order neighborhood matrix of the network. The node embeddings are obtained by applying truncated SVD to this matrix. We demonstrated that, compared to five state-of-the-art methods, HONEM performs better in node classification, link prediction, network reconstruction, and visualization tasks. We evaluated model robustness against different classifiers and across different dimensions. Finally, we compared the running time of our method against the other baselines using the global shipping data.
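The factorization step can be sketched as follows. The paper specifies truncated SVD of the higher-order neighborhood matrix S; the sqrt-of-singular-value scaling of both factors below follows the common HOPE-style convention and is our assumption, as is the function name.

```python
import numpy as np

def honem_embed(S, dim):
    """Node embeddings from a higher-order neighborhood matrix S via truncated SVD.

    Factorizes S ~= U Sigma V^T and keeps the top-`dim` singular directions,
    scaling each factor by sqrt of the singular values (assumed convention).
    """
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    scale = np.sqrt(sigma[:dim])
    source = U[:, :dim] * scale      # out-going (source) embeddings
    target = Vt[:dim, :].T * scale   # in-coming (target) embeddings
    return source, target
```

By construction, source @ target.T recovers the best rank-`dim` approximation of S.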

There are several directions for future improvement. In particular, different weighting mechanisms for modeling the effect of the distance matrix at various orders can be explored. Furthermore, HONEM can be generalized as a bridge between first-order and higher-order networks. The HONEM framework opens a new path for the exploration of higher-order networks. In the context of network embedding, various decomposition methods, other than truncated SVD, can be applied to learn node embeddings from the proposed higher-order neighborhood matrix.


  • [1] A. Ahmed, N. Shervashidze, S. Narayanamurthy, V. Josifovski, and A. J. Smola (2013) Distributed large-scale natural graph factorization. In Proceedings of the 22nd international conference on World Wide Web, pp. 37–48. Cited by: §II, 4th item.
  • [2] M. Belkin and P. Niyogi (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural computation 15 (6), pp. 1373–1396. Cited by: §II, 5th item.
  • [3] S. Cao, W. Lu, and Q. Xu (2015) Grarep: learning graph representations with global structural information. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 891–900. Cited by: item 1, §II.
  • [4] S. Cao, W. Lu, and Q. Xu (2016) Deep neural networks for learning graph representations.. In AAAI, pp. 1145–1152. Cited by: §II.
  • [5] C. Eckart and G. Young (1936) The approximation of one matrix by another of lower rank. Psychometrika 1 (3), pp. 211–218. Cited by: §III-D.
  • [6] P. Goyal and E. Ferrara (2018) Graph embedding techniques, applications, and performance: a survey. Knowledge-Based Systems 151, pp. 78–94. Cited by: §I.
  • [7] A. Grover and J. Leskovec (2016) Node2vec: scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 855–864. Cited by: item 1, §I, §I, §II, §III-B, 2nd item.
  • [8] W. Hamilton, Z. Ying, and J. Leskovec (2017) Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pp. 1024–1034. Cited by: §II.
  • [9] M. Henaff, J. Bruna, and Y. LeCun (2015) Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163. Cited by: §II.
  • [10] T. N. Kipf and M. Welling (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §II.
  • [11] J. B. Kruskal and M. Wish (1978) Multidimensional scaling. number 07–011 in sage university paper series on quantitative applications in the social sciences. Sage Publications, Beverly Hills. Cited by: §II.
  • [12] S. Kullback and R. A. Leibler (1951) On information and sufficiency. The annals of mathematical statistics 22 (1), pp. 79–86. Cited by: item 3.
  • [13] R. Lambiotte, M. Rosvall, and I. Scholtes (2018) Understanding complex systems: from networks to optimal higher-order models. arXiv preprint arXiv:1806.05977. Cited by: §I.
  • [14] Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel (2015) Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493. Cited by: §II.
  • [15] J. Ma, P. Cui, and W. Zhu (2018) DepthLGP: learning embeddings of out-of-sample nodes in dynamic networks. Cited by: §II.
  • [16] L. v. d. Maaten and G. Hinton (2008) Visualizing data using t-sne. Journal of machine learning research 9 (Nov), pp. 2579–2605. Cited by: §IV-F.
  • [17] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean (2013) Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pp. 3111–3119. Cited by: §II.
  • [18] M. Ou, P. Cui, J. Pei, Z. Zhang, and W. Zhu (2016) Asymmetric transitivity preserving graph embedding. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1105–1114. Cited by: item 1, §I, §I, §II, §III-B.
  • [19] B. Perozzi, R. Al-Rfou, and S. Skiena (2014) Deepwalk: online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 701–710. Cited by: §II, §III-B, 1st item.
  • [20] B. Perozzi, V. Kulkarni, and S. Skiena (2016) Walklets: multiscale graph embeddings for interpretable network classification. arXiv preprint arXiv:1605.02115. Cited by: §II.
  • [21] J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, and J. Tang (2018) Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 459–467. Cited by: §II.
  • [22] R. A. Rossi, N. K. Ahmed, and E. Koh (2018) Higher-order network representation learning. In Companion of the The Web Conference 2018 on The Web Conference 2018, pp. 3–4. Cited by: §I, §II.
  • [23] M. Rosvall, A. V. Esquivel, A. Lancichinetti, J. D. West, and R. Lambiotte (2014) Memory in network flows and its effects on spreading dynamics and community detection. Nature communications 5, pp. 4630. Cited by: §I.
  • [24] S. T. Roweis and L. K. Saul (2000) Nonlinear dimensionality reduction by locally linear embedding. science 290 (5500), pp. 2323–2326. Cited by: §II.
  • [25] I. Scholtes, N. Wider, R. Pfitzner, A. Garas, C. J. Tessone, and F. Schweitzer (2014) Causality-driven slow-down and speed-up of diffusion in non-markovian temporal networks. Nature communications 5, pp. 5024. Cited by: §I.
  • [26] I. Scholtes (2017) When is a network a network?: multi-order graphical model selection in pathways and temporal networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1037–1046. Cited by: §I.
  • [27] J. Tang, M. Qu, and Q. Mei (2015) Pte: predictive text embedding through large-scale heterogeneous text networks. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1165–1174. Cited by: §II.
  • [28] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei (2015) Line: large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, pp. 1067–1077. Cited by: item 1, §I, §I, §II, §III-B, 3rd item.
  • [29] D. Wang, P. Cui, and W. Zhu (2016) Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1225–1234. Cited by: §II.
  • [30] R. West and J. Leskovec (2012) Human wayfinding in information networks. In Proceedings of the 21st international conference on World Wide Web, pp. 619–628. Cited by: 4th item.
  • [31] S. Wold, K. Esbensen, and P. Geladi (1987) Principal component analysis. Chemometrics and intelligent laboratory systems 2 (1-3), pp. 37–52. Cited by: §II.
  • [32] J. Xu, M. Saebi, B. Ribeiro, L. M. Kaplan, and N. V. Chawla (2017) Detecting anomalies in sequential data with higher-order networks. arXiv preprint arXiv:1712.09658. Cited by: §V.
  • [33] J. Xu, T. L. Wickramarathne, and N. V. Chawla (2016) Representing higher-order dependencies in networks. Science advances 2 (5), pp. e1600028. External Links: Link Cited by: §I, §I, §III-B, §III-B, §III-C, §VII-A.
  • [34] H. Yin, A. R. Benson, J. Leskovec, and D. F. Gleich (2017) Local higher-order graph clustering. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 555–564. Cited by: §II.
  • [35] H. Yu, V. Kulkarni, and W. Wang (2018) MOHONE: modeling higher order network effects in knowledgegraphs via network infused embeddings. arXiv preprint arXiv:1811.00198. Cited by: §II.
  • [36] Z. Zhang, P. Cui, X. Wang, J. Pei, X. Yao, and W. Zhu (2018) Arbitrary-order proximity preserved network embedding. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2778–2786. Cited by: §I, §I, §II.
  • [37] L. Zhou, Y. Yang, X. Ren, F. Wu, and Y. Zhuang (2018) Dynamic network embedding by modeling triadic closure process.. In AAAI, Cited by: §II.
  • [38] Y. Zuo, G. Liu, H. Lin, J. Guo, X. Hu, and J. Wu (2018) Embedding temporal network via neighborhood formation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2857–2866. Cited by: §II.

VII Supplementary materials

VII-A Extracting higher-order dependencies

Below we provide an example to clarify the higher-order rule extraction step. We encourage readers to review the HON paper [33] for more details. Given raw sequential data, we can extract the higher-order dependencies using the procedure explained in Section III-B; an example of this procedure is provided in table III. In this example, the probability distribution of the next steps from a node changes significantly when the immediately preceding step is known, but knowing further previous steps does not make a difference; such paths therefore demonstrate second-order dependencies. For other paths, the probability distribution of the next steps does not change no matter how many previous steps are known, so only a first-order dependency exists. As mentioned in the main manuscript, the higher-order dependencies are interpreted as higher-order distances for the neighborhood calculation: the extracted first- and second-order distances are used to populate the higher-order neighborhood matrix.

[Table III, not reproduced legibly here: for first-, second-, and third-order contexts, it lists (1) the observation counts, (2) the resulting next-step probability distributions, and (3) the higher-order dependencies extracted from the raw sequential data.]

TABLE III: An example of extracting higher-order dependencies from the raw sequential data
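The first two steps of the table (counting observations and forming next-step distributions) can be sketched as follows, assuming toy sequences; the function name is ours, and the full HON algorithm additionally applies the KL-divergence significance test described in Section V before accepting a higher-order rule.

```python
from collections import defaultdict

def next_step_distributions(sequences, order):
    """Empirical distribution of the next step, conditioned on the
    last `order` steps observed in the raw sequential data."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i in range(order, len(seq)):
            context = tuple(seq[i - order:i])
            counts[context][seq[i]] += 1
    dists = {}
    for context, nxt in counts.items():
        total = sum(nxt.values())
        dists[context] = {s: c / total for s, c in nxt.items()}
    return dists
```

For instance, with the sequences A,C,D and B,C,E, the first-order distribution from C is split 50/50 between D and E, but conditioning on the step before C makes the next step deterministic, i.e., a second-order dependency.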

VII-B Supplementary results on link prediction

In this section, we provide additional material for our link prediction experiments. These results support our claim regarding HONEM's link prediction performance on the top predictions, shown in figure 7. We fixed the dimension at 128 and analyzed link prediction precision as a function of the number of evaluated edge pairs. We observe that while the other baselines perform poorly on larger and sparser networks (the Shipping and Wikipedia networks), HONEM provides significantly better results on all datasets.

Fig. 7: Link prediction precision. The x-axis represents the number of evaluated edge pairs. HONEM performs better than the other baselines by a large margin.