I Introduction
Networks are ubiquitous, representing interactions between the components of a complex system. Applying machine learning algorithms to such networks has typically required a painstaking feature engineering task to develop feature vectors for the learning algorithms. Motivated by this burden, recent research has focused on developing representation learning methods for networks. These methods strive to discover appropriate lower-dimensional embedding vectors, which are then used as feature vectors in machine learning algorithms for tasks such as node classification and link prediction. However, research on network representation learning has largely focused on first-order networks (FON), that is, networks where only first-order, dyadic interactions among the nodes are captured in network construction (a Markovian process). Several embedding approaches strive to preserve the network structure and connectivity of the nodes
[7, 28, 6, 36, 18, 22]. While preserving higher-order proximity patterns in the network structure improves performance on several tasks, including link prediction, network reconstruction, and community detection [6], these methods do not extend to higher-order networks that functionally incorporate the higher and variable orders of dependencies present in the raw data. Several recent papers have pointed out the weaknesses and limitations of FONs, and proposed higher-order network construction algorithms that are demonstrably more accurate and effective in capturing the trends in the underlying raw data of a complex system [33, 25, 23, 26, 13]. However, higher-order modeling of complex networks brings further challenges, as there is no representation learning framework for such HONs. We therefore ask the following questions in this paper: 1) Are the existing embedding methods able to capture the higher-order network representation? 2) How can we learn embeddings on higher-order networks?
Contributions.
To address the limitations of existing methods, we propose an embedding method, HONEM, for the higher-order network (HON) [33] representation of the raw data. The main idea of HONEM is to generate a low-dimensional embedding of the network such that the temporal higher-order dependencies (represented in HON) are accurately preserved. Consider the following scenario. We are given human trajectory/traffic data of the area around a university campus. Suppose that, from the trajectory data, we observe that students who live on campus are more likely to visit the central library after visiting the downtown area, while people living in a certain residential area are more likely to go to the business area of the city after passing through the downtown area (assuming none of the four locations overlap with each other). In Figure 1, the nodes represent the following: C: on-campus dorm, B: residential area, A: downtown area, E: library, D: business area. Suppose we model such time-ordered dependencies as a FON (Figure 1 (b)), and then try to infer second-order dependencies from the FON structure to derive the node embeddings (as done in [7, 28, 36, 18]
). In the FON structure, both the library and the business area are two steps away from the campus dorms (or the residential area). Therefore, we may conclude that students living on campus have an equal probability of visiting the library and the business area through downtown. As a result,
all the above-mentioned methods based on FON will miss important higher-order dependencies or infer higher-order dependencies that do not exist in the original raw data. By modeling these interactions as a HON (Figure 1 (c)), we observe that nodes C and E have a second-order dependency through the second-order node A|C. Similarly, nodes B and D have a second-order dependency through the second-order node A|B. There is no second-order dependency between B and E, or between C and D. To summarize, the key contributions and properties of HONEM are as follows:

Scalable and parameter-free: HONEM does not require a sweep through the parameter space of window lengths. HONEM also does not require any hyperparameter tuning or sampling, as is often the case with deep learning or random walk based embedding methods.

Generalizable: HONEM embeddings are directly applicable to a variety of network analytics tasks such as network reconstruction, link prediction, node classification, and visualization.
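The campus scenario above can be reproduced with a short sketch. The node names and trajectory counts below are hypothetical stand-ins following the Figure 1 scenario; the point is that a first-order transition table loses the dependency that conditioning on the previous step recovers:

```python
from collections import Counter, defaultdict

# Hypothetical trajectories following the Figure 1 scenario:
# C (dorm) -> A (downtown) -> E (library), B (residential) -> A -> D (business)
trajectories = [["C", "A", "E"]] * 10 + [["B", "A", "D"]] * 10

# First-order (Markovian) transition counts: next step given current node only.
first_order = defaultdict(Counter)
for traj in trajectories:
    for cur, nxt in zip(traj, traj[1:]):
        first_order[cur][nxt] += 1

# Second-order counts: next step given the current AND previous node.
second_order = defaultdict(Counter)
for traj in trajectories:
    for prev, cur, nxt in zip(traj, traj[1:], traj[2:]):
        second_order[(cur, prev)][nxt] += 1

def probs(counter):
    total = sum(counter.values())
    return {k: v / total for k, v in counter.items()}

# FON: after A, the library and business area look equally likely.
print(probs(first_order["A"]))           # {'E': 0.5, 'D': 0.5}
# HON: conditioning on the previous step recovers the true pattern.
print(probs(second_order[("A", "C")]))   # {'E': 1.0}
print(probs(second_order[("A", "B")]))   # {'D': 1.0}
```

A first-order model would predict spurious C-to-D and B-to-E paths; the second-order table rules them out.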
II Related Work
Recent advances in graph mining have motivated the need to automate feature engineering from networks. This problem finds its roots in traditional dimensionality reduction techniques [24, 31, 11]. For example, LLE [24] represents each node as a linear combination of its immediate neighbors, and LE [2] uses the spectral properties of the Laplacian matrix of the network to derive node embeddings.
More recently, methods based on random walks, matrix factorization, and deep learning have been proposed as well, albeit applicable only to FONs. DeepWalk [19] learns node embeddings by combining random walks with the skip-gram model [17]. Node2Vec [7] extends this approach further, proposing biased random walks to capture the homophily and structural equivalence present in the network. A random walk based method for knowledge graph embedding is proposed in [35]. In contrast, factorization methods derive embeddings by factorizing a matrix that represents the connections between nodes. GF [1] explicitly factorizes the adjacency matrix of the FON. LINE [28] attempts to preserve both first-order and second-order proximities by defining explicit objective functions. GraRep [3] and HOPE [18] go beyond second order, and factorize a similarity matrix containing higher-order proximities. Walklets [20] approximates the higher-order proximity matrix by skipping over some nodes in the network. Qiu et al. [21] show that LINE, Node2Vec, DeepWalk, and PTE [27] implicitly factorize a higher-order proximity matrix of the network. A new crop of methods allows for dependencies of arbitrary order [36, 3]; however, this order needs to be set by the user beforehand. Therefore, these methods are unable to extract the order of the system from raw sequential data, and fail to identify the higher-order dependencies of the network without trial and error. HONE [22] uses motifs as higher-order structures; however, these motifs do not capture temporal higher-order dependencies. In addition, several deep learning based methods have been proposed. SDNE [29] uses autoencoders to preserve first-order and second-order proximities. DNGR [4] combines autoencoders with random surfing to capture higher-order proximities beyond second order. However, both methods suffer from high computational complexity. Models based on Convolutional Neural Networks (CNN) were proposed to address this complexity issue [10, 9, 14, 8]. Finally, dynamic approaches have recently been proposed to capture the evolution of the network with embeddings [15, 34, 38, 37]. These methods still involve the computationally demanding task of dynamic network modeling. Furthermore, they are developed based on FON and require the specification of a time window, making them data dependent.
To the best of our knowledge, no existing representation learning approach captures the higher-order dependencies over time without dynamic network modeling. HONEM fills this critical gap in the literature by learning embeddings from the higher-order dependencies in a network, thereby providing more accurate and effective embeddings.
III Higher-Order Network Embedding: HONEM
In summary, the HONEM algorithm comprises the following steps:

Extraction of the higher-order dependencies from the raw data.

Generation of a higher-order neighborhood matrix given the extracted dependencies.

Application of truncated SVD to the higher-order neighborhood matrix of the network to discover the embeddings, which can then be used by machine learning algorithms.
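The three steps can be sketched end to end on toy data. The dependency values, their dictionary format, and the exponential weight e^{-(k-1)} below are illustrative assumptions, not the paper's exact choices:

```python
import numpy as np

# Step 1 (assumed output format): dependencies per order as
# (source, target) -> transition probability. Values are toy numbers.
deps = {
    1: {(0, 1): 1.0, (1, 2): 0.5, (1, 3): 0.5},  # first-order edges
    2: {(0, 2): 0.5, (0, 3): 0.5},               # second-order distances
}

# Step 2: combine per-order distance matrices with decaying weights.
n = 4
S = np.zeros((n, n))
for k, edges in deps.items():
    for (i, j), w in edges.items():
        S[i, j] += np.exp(-(k - 1)) * w  # assumed weight e^{-(k-1)}

# Step 3: truncated SVD of the neighborhood matrix yields the embeddings.
d = 2
U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
Y = U[:, :d] * np.sqrt(sigma[:d])  # content embeddings, one row per node
print(Y.shape)  # (4, 2)
```

Each step is detailed in the subsections that follow.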
III-A Preliminaries
Let us consider a set of interacting entities and a set of variable-length sequences of interactions between these entities. Given the raw sequential data, the HON can be represented as a graph G_h = (V_h, E_h), with edges and nodes of various orders, in which a node can represent a sequence of interactions (a path). For example, a higher-order node b|a represents the fact that node b is visited given that node a was visited previously, while a higher-order node c|b,a represents node c given the previously visited nodes b and a. In this context, a first-order node is denoted by a|., in which the notation "." indicates that no previous-step information is included in the data.
Using these higher-order nodes and edges in G_h, our goal is to learn embeddings for the nodes of the first-order network, G = (V, E). Keep in mind that |V_h| >= |V|, as several nodes in G_h may correspond to a single node in G. For example, the HON nodes a|., a|b, and a|b,c all represent node a in the FON. It is important to highlight this connection between HON nodes and their FON counterparts. Indeed, we are interested in evaluating our embeddings on a number of machine learning tasks, such as node classification and link prediction, that are formulated in terms of FON nodes; for example, the class label information is available on V (and not on V_h). Therefore, it is important to eventually obtain embeddings for FON nodes.
One approach to address the above challenge is to learn embeddings on the higher-order nodes using existing network embedding methods, and then combine them to derive the embedding of each FON node. We experimented with this approach using different methods of combining the HON embeddings (max, mean, weighted mean), and found that it does not scale to large networks, as the number of higher-order nodes can be much larger than the number of first-order nodes. We therefore refrain from constructing the HON directly, and instead modify the "rule extraction" step of the HON algorithm to generate the higher-order dependencies and the higher-order neighborhood matrix.
III-B Extracting Higher-Order Dependencies
The first step of the HONEM framework is to extract higher-order dependencies from the raw sequential data. To accomplish this task, we modify the rule extraction step of the HON construction algorithm [33]. We briefly explain rule extraction in the HON algorithm below:
Rule Extraction (HON): In the first-order network, all nodes are assumed to be connected through pairwise interactions. In order to discover the higher-order dependencies in the sequential data, given a path of a given order, we follow the steps below:

Step 1: Count all the observed paths of length up to the MaxOrder in the sequential data.

Step 2: Calculate the probability distribution of the next step for each path, given the current and previous steps.
Step 3: Extend the current path by checking whether including one more previous step (yielding a path of one order higher) significantly changes the probability distribution of the next step. To detect a significant change, the Kullback-Leibler divergence [12] between the next-step distribution of the extended path and that of the current path is compared with a dynamic threshold. If the divergence is larger than the threshold, the higher order is assumed as the new order of dependency, and the path is extended accordingly.
This procedure is repeated recursively until a predefined parameter, MaxOrder, is reached. However, the new parameter-free version of the algorithm (which is used in this paper) does not require setting a predefined MaxOrder, and extracts the maximum order automatically for each sequence. The support of a path refers to the number of times the path appears in the raw trajectories. The dynamic threshold assures that higher orders are only considered if they have sufficient support, which is controlled with the MinSupport parameter: patterns less frequent than MinSupport are discarded. For an example of this procedure, refer to the supplementary materials in section VII-A.
The above method only accepts dependencies that are significant and that have occurred sufficiently many times. This is required to ensure that random patterns in the data do not appear as spurious dependency rules. Furthermore, this method admits dependencies of variable order for different paths. Using this approach, we extract all possible higher-order dependencies from the sequential data. These dependencies are then used to construct the HON. For example, an edge from node A|C to node E|A in the HON corresponds to the rule (C, A -> E); in other words, C and E are connected through a second-order path.
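The significance test at the heart of rule extraction can be sketched as follows. The dynamic-threshold formula below (order divided by log2(1 + support)) is an assumed stand-in for illustration; the source only states that the threshold is dynamic and tied to support:

```python
import math
from collections import Counter

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) in bits over the union of supports (smoothed)."""
    keys = set(p) | set(q)
    return sum(p.get(k, eps) * math.log2(p.get(k, eps) / q.get(k, eps))
               for k in keys)

def significant_extension(counts_short, counts_long, order, min_support=5):
    """Decide whether extending a path by one previous step is justified.

    The threshold form order / log2(1 + support) is an assumption for
    illustration, not the paper's exact dynamic threshold."""
    support = sum(counts_long.values())
    if support < min_support:  # discard infrequent patterns
        return False
    def norm(c):
        total = sum(c.values())
        return {k: v / total for k, v in c.items()}
    threshold = order / math.log2(1 + support)
    return kl_divergence(norm(counts_long), norm(counts_short)) > threshold

# Next-step counts observed after node A alone vs. after the path C -> A.
after_A = Counter({"E": 10, "D": 10})
after_CA = Counter({"E": 10})
print(significant_extension(after_A, after_CA, order=2))  # True
```

If the extended distribution barely differs from the shorter one, the divergence falls below the threshold and the lower order is kept.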
Modified Rule Extraction for HONEM: In the HONEM framework, we modify the standard HON rule extraction approach by preserving all lower orders when including any higher-order dependency. This is motivated by a limitation of the previously proposed HON algorithm [33]. In the original HON rule extraction algorithm, after extracting all dependencies, the HON is constructed under the assumption that if higher orders are discovered, all lower orders (except the first order) can be ignored. However, discovering a higher-order path between two nodes does not imply that the nodes cannot also be connected through shorter paths. For example, if two nodes are connected through both a third-order path and a second-order path, they have a second-order dependency as well as a third-order dependency.
Note that in HONEM we extract the higher-order dependencies from the sequential data, and not from the first-order network topology, as is done by other methods in the literature [18, 7, 19, 28]. Therefore, our notion of "higher-order dependencies" refers to dependencies that are extracted from sequential data over time. Although those methods are able to improve performance by preserving higher-order distances between nodes given the topology of the first-order network, they are unable to capture dependencies over time. This matters because connections through higher-order pathways should only be modeled if they actually occur in the raw sequential data in the first place.
III-C Higher-Order Neighborhood Matrix
In the second step of our framework, we design a mechanism for encoding these higher-order dependencies into a neighborhood matrix. In this context, we refer to higher-order dependencies as higher-order distances. We define the k-th order neighborhood matrix S^(k), in which the element S^(k)_ij represents the k-th order distance between nodes i and j. Intuitively, S^(1) is the first-order adjacency matrix. We derive the neighborhood matrices of all orders up to the maximum order existing in the network, K_max, which is determined by finding the nodes of highest order in the network. For each node pair, the distance is given by the corresponding edge weight of the HON (i.e., the probability of the corresponding higher-order dependency). For example, in Figure 1, the second-order distances between C and E, and between B and D, are given by the weights of the HON edges through the second-order nodes A|C and A|B, respectively.
It is possible, however, that two given nodes are connected through multiple paths of the same order. In this case, the average probability over all such paths (i.e., the average of the corresponding HON edge weights) is taken as the higher-order distance. For example, suppose a node j can be reached from a node i via either of two second-order paths, with probabilities w_1 and w_2. The second-order distance between i and j is then the average of the edge weights w_1 and w_2. Note that nodes i and j may also be connected through dependencies of other orders, but only the second-order ones are included in S^(2). Once the distances for all orders are obtained, we derive the higher-order neighborhood matrix as:
S = \sum_{k=1}^{K_{max}} e^{-(k-1)} S^{(k)}    (1)
For k = 1, S^(1) equals the conventional first-order adjacency matrix. The exponentially decaying weights are chosen to favor lower-order distances over higher-order ones, since higher-order paths are generally less frequent in the sequential data [33]. We experimented with increasing and constant weights, and found decaying weights to work best with our method. We leave the exploration of other potential weighting mechanisms to future work.
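Building S from the per-order distance matrices, including the averaging rule for node pairs connected by several paths of the same order, can be sketched as follows (the edge weights and the e^{-(k-1)} decay are illustrative assumptions):

```python
import numpy as np

n = 3  # nodes: 0, 1, 2
# A node pair may be reached via several second-order paths, so we keep a
# list of HON edge weights per pair (two hypothetical paths from 0 to 2).
second_order_paths = {(0, 2): [0.6, 0.2]}

A = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [0., 0., 0.]])  # first-order adjacency, S^(1)

S2 = np.zeros((n, n))
for (i, j), weights in second_order_paths.items():
    S2[i, j] = np.mean(weights)  # average over parallel same-order paths

# Combine orders with exponentially decaying weights (assumed e^{-(k-1)};
# the text only states that the weights decay with the order k).
S = sum(np.exp(-(k - 1)) * Sk for k, Sk in enumerate([A, S2], start=1))
print(round(S[0, 2], 4))  # 0.1472
```

First-order entries keep weight 1, while the averaged second-order distance is discounted by e^{-1}.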
It is worth mentioning that the higher-order neighborhood matrix provides a richer and more accurate representation of node interactions than the FON alone, and can thus be viewed as a bridge between the HON and FON representations. In many network analysis and machine learning applications, such as node classification and link prediction, working with the HON representation directly is inconvenient and requires some form of transformation. HONEM provides a more convenient and generalizable interpretation of HON, while preserving the benefits of the more accurate HON representation.
III-D Higher-Order Embeddings
In the third step, the higher-order embeddings are obtained by preserving the higher-order neighborhood structure in vector space. A popular way to accomplish this is matrix factorization, in which the objective is to minimize the loss function:

\min_{Y_s, Y_t} \| S - Y_s Y_t^T \|_F^2    (2)
A widely adopted method for solving the above problem is Singular Value Decomposition (SVD). Formally, we can factorize the matrix S as:

S = U \Sigma V^T    (3)

where U and V are the orthogonal matrices containing the content and context embedding vectors, respectively, and \Sigma is a diagonal matrix containing the singular values in decreasing order.
However, this solution does not scale to large, sparse networks. Therefore, we use truncated SVD [5] to approximate the matrix S by a rank-d matrix S_d as:

S \approx S_d = U_d \Sigma_d V_d^T    (4)

where U_d and V_d contain the first d columns of U and V, respectively, and \Sigma_d contains the top d singular values. The embedding vectors can then be obtained by means of the following equations:
Y_s = U_d \Sigma_d^{1/2}    (5)

Y_t = V_d \Sigma_d^{1/2}    (6)

Without loss of generality, we use Y_s as the embedding matrix.
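Equations (4) to (6) can be sketched with SciPy's sparse truncated SVD; the random sparse matrix below stands in for the higher-order neighborhood matrix:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
# Stand-in sparse neighborhood matrix (about 10% nonzero entries).
S = csr_matrix(rng.random((50, 50)) * (rng.random((50, 50)) < 0.1))

d = 8
U, sigma, Vt = svds(S, k=d)        # truncated SVD of the sparse matrix
Y_s = U * np.sqrt(sigma)           # content embeddings, eq. (5)
Y_t = Vt.T * np.sqrt(sigma)        # context embeddings, eq. (6)

# Y_s @ Y_t.T recovers the rank-d approximation S_d of eq. (4).
approx = Y_s @ Y_t.T
print(Y_s.shape)  # (50, 8)
```

Note that `svds` returns the singular values in ascending order; since the square roots are applied elementwise, the resulting factors are unaffected.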
Table I: FON and HON statistics for each dataset.

                     Rome      Bari     Shipping   Wiki
FON nodes             477       522       3,058     4,043
FON edges           5,614     5,916      52,366    38,580
FON avg in-degree   11.76     11.33       17.12      9.54
HON nodes          19,403    13,893      59,779    67,907
HON edges         119,566    88,594     311,691   255,672
HON avg in-degree    6.16      6.37       5.214      3.76
|E_HON| / |E_FON|   21.29     14.97        5.95      6.62
IV Experiments
We used four different real-world datasets representing transportation and information networks. We used these data to assess performance on the following tasks: 1) network reconstruction; 2) link prediction; 3) node classification; and 4) visualization. We compared HONEM to a number of baselines representing the popular deep learning and matrix factorization based methods. We provide details on the data and benchmarks first, before presenting the performance results on the aforementioned tasks. We also provide a running-time analysis of HONEM in Section V.
IV-A Datasets
The HONEM framework can be applied to any sequential dataset describing interacting entities to extract latent higherorder dependencies among them. To validate our method, we use four different datasets for which raw sequential data is available. Table I summarizes the basic FON and HON network properties for each dataset. To emphasize the versatility of HONEM, these datasets are drawn from three different domains: vehicular traffic flows from two Italian cities (Rome and Bari), Web browsing patterns on Wikipedia, and global freight shipping. Specifically, the four datasets are:

Traffic data of Rome: This is car-sharing data provided by the Telecom Italia Big Data Challenge 2015 (https://bit.ly/2UGcEoN), containing the trajectories of individual vehicles over 30 days. We divided the city into a grid, mapped each vehicle location to a node in the grid, and derived the edges from the number of vehicles traveling between the nodes. This dataset contains higher-order dependencies. With the inclusion of higher-order patterns, the number of nodes and edges increases by 39.67x and 20.29x, respectively. This dataset also contains the locations of accident claims, which are used for node labeling.

Traffic data of Bari: This is another car-sharing dataset (provided by the Telecom Italia Big Data Challenge 2015), containing the trajectories of taxis over 30 days. We divided the city into a grid obtained using the same approach as for the Rome traffic data. This dataset contains higher-order dependencies. With the inclusion of higher-order patterns, the number of nodes and edges increases by 25.61x and 13.97x, respectively. This dataset also contains the locations of accident claims, which are used for node labeling.

Global shipping data: Provided by Lloyd's Maritime Intelligence Unit (LMIU), this dataset contains voyages spanning 15 years (1997-2012). Applying the rule extraction step to this data yields higher-order dependencies. The number of nodes and edges increases by 18.54x and 4.95x, respectively, after including the higher-order patterns in HON.

Wikipedia game: Available from West et al. [30], this dataset contains human navigation paths on Wikipedia. In this game, users start at a Wikipedia entry and are asked to reach a target entry by following only hyperlinks to other Wikipedia entries. The data includes both complete and incomplete paths; we discarded incomplete paths of length 3 or shorter. This dataset contains higher-order dependencies. The inclusion of higher-order patterns results in an increase in the number of nodes and edges by 15.79x and 5.62x, respectively.
Even though we do not directly use HON, comparing FON and HON for each dataset provides some insight into the higher-order structure of these networks. We define the ratio |E_HON| / |E_FON| (last row of Table I) as a measure of the density of higher-order dependencies: the larger the ratio, the larger the gap between FON and HON. The two traffic datasets show the highest gap between FON and HON in terms of the number of nodes and edges, with the traffic data of Rome exhibiting the largest gap.
IV-B Baselines
We compare our method with the following state-of-the-art embedding algorithms:

DeepWalk [19]: This algorithm uses uniform random walks to generate node similarities and learns embeddings by preserving the higher-order proximity of nodes. It is equivalent to Node2Vec with p = 1 and q = 1.

Node2Vec [7]: This method is a generalized version of DeepWalk that allows biased random walks. We used 0.5, 1, and 2 for the p and q values, and report the best-performing results.

LINE [28]: This algorithm derives the embeddings by preserving the first-order and second-order proximities (and a combination of the two) through explicit objective functions. We ran the experiments for both the second-order and the combined proximity, but did not observe a major improvement with the combined one. We therefore report results only for the embeddings derived from the second-order proximity.

Graph Factorization (GF) [1]: This method generates the embeddings by factorizing the adjacency matrix of the network. HONEM reduces to GF if only the first-order adjacency matrix is used.

LAP [2]: This method generates the embeddings by performing eigendecomposition of the Laplacian matrix of the network. In this framework, if two nodes are connected with a large weight, their embeddings are expected to be close to each other.
Among the above baselines, Node2Vec, DeepWalk, and LINE learn embeddings using higher-order proximities. We also used Locally Linear Embedding (LLE) as a baseline in our early experiments. However, LLE failed to converge for several dimensions in the link prediction and network reconstruction experiments, so we did not include it in the final results. All baselines learn embeddings based on FON.
IV-C Network Reconstruction
We posit that any embedding method should allow one to reconstruct the original network with sufficient accuracy. This provides insight into the quality of the embeddings generated by the method. We measure the reconstruction precision for the top k evaluated edge pairs as follows:

precision@k = \frac{1}{k} \sum_{i=1}^{k} \delta_i    (7)

where \delta_i = 1 when the i-th reconstructed edge is correctly recovered, and \delta_i = 0 otherwise.
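Equation (7) amounts to ranking all candidate node pairs by their reconstruction score and counting how many of the top k are true edges. A minimal sketch with a hypothetical score matrix (in practice the scores would come from the embeddings, e.g. Y_s Y_t^T):

```python
import numpy as np

def precision_at_k(scores, true_adj, k):
    """Fraction of the top-k highest-scoring node pairs that are real edges."""
    n = true_adj.shape[0]
    # Rank all directed pairs (i != j) by predicted score, descending.
    pairs = [(scores[i, j], true_adj[i, j])
             for i in range(n) for j in range(n) if i != j]
    pairs.sort(key=lambda t: -t[0])
    hits = sum(is_edge for _, is_edge in pairs[:k])
    return hits / k

A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])
# Hypothetical reconstruction scores for each directed pair.
scores = np.array([[0.0, 0.9, 0.1],
                   [0.2, 0.0, 0.8],
                   [0.7, 0.3, 0.0]])
print(precision_at_k(scores, A, k=3))  # 1.0
```

Here the three highest-scoring pairs are exactly the three true edges, so precision@3 is 1.0.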
Figure 2 (b) displays the network reconstruction results with varying k. We notice that although the performance of the other baselines is data dependent, HONEM performs significantly better on all datasets. Results on both traffic datasets display similar trends, and methods such as LINE, which perform relatively well on these datasets, fail on the larger datasets (shipping and Wikipedia). HONEM not only performs better than GF, which preserves first-order proximity, but also outperforms Node2Vec, DeepWalk, and LINE, which preserve higher-order proximity based on FON. As k increases, all of the actual edges are eventually recovered, but the number of possible edge pairs becomes very large, and thus almost all methods converge to a small value. However, there remains a large gap between HONEM and the other baselines even at the largest k on all datasets.
IV-D Link Prediction
Table II: Top-k (k = 1024) link prediction precision across embedding dimensions.

dim         4      8      16     32     64     128     256
Node2Vec  0.079  0.152  0.165  0.184  0.195  0.1536  0.141
HONEM     0.316  0.364  0.409  0.528  0.529  0.543   0.591
We posit that embeddings derived from HON perform better in the link prediction task. Methods based on FON do not capture the temporal higher-order distance between nodes, which creates a potential for link formation. For example, suppose there is a directed edge in HON from node a|. to node b|a (corresponding to the path a -> b), and another directed edge from the second-order node b|a to node c|b,a (corresponding to the path a -> b -> c). In this structure, node c can be reached from node a through the higher-order path. In FON, however, we only have the edges a -> b and b -> c. Therefore, FON might miss the potentially interesting edge between a and c. To validate our argument, we remove 20% of the edges from the current network, and derive node embeddings on the remaining network using HONEM. We then predict the missing edges by calculating the pairwise distances between embedding vectors and selecting the top-k closest pairs as potential edges.
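This evaluation protocol can be sketched as follows, with random vectors standing in for HONEM embeddings and a hypothetical observed-edge set; pairs are ranked by embedding distance and the closest k are predicted as links:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy embedding vectors for 6 FON nodes (stand-ins for HONEM output).
Y = rng.normal(size=(6, 4))

# Edges still present after the 20% removal (hypothetical).
observed = {(0, 1), (1, 2), (2, 3)}

# Score every non-observed directed pair by pairwise embedding distance;
# the k closest pairs are predicted as the held-out edges.
candidates = [(i, j) for i in range(6) for j in range(6)
              if i != j and (i, j) not in observed]
dist = {(i, j): np.linalg.norm(Y[i] - Y[j]) for i, j in candidates}
top_k = sorted(candidates, key=dist.get)[:3]  # 3 most similar pairs
print(top_k)
```

The predicted pairs are then compared against the held-out edges to compute the precision metrics below.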
We use the mean average precision (MAP) as the link prediction evaluation metric. MAP averages the prediction precision over all nodes, and is defined as:

MAP = \frac{\sum_{i \in V} AP(i)}{|V|}    (8)

where:

AP(i) = \frac{\sum_k precision@k(i) \cdot \delta_i(k)}{|\{k : \delta_i(k) = 1\}|}    (9)

Here, precision@k(i) is the fraction of the top-k predicted edges for node i that exist in the original network, and \delta_i(k) = 1 when the k-th predicted edge for node i exists in the original network, and \delta_i(k) = 0 otherwise. We also evaluated link prediction using the precision@k measure at dim = 128 (refer to the supplementary materials for details). However, since we are interested in analyzing the effect of dimension, we report MAP as a precision measure over all nodes. The results are displayed in Figure 2 (a). We notice that the MAP score is generally lower for the larger datasets, namely shipping and Wiki (due to sparsity). On the traffic datasets (Bari and Rome), HONEM shows monotonically increasing performance as the embedding dimension increases, while the performance of the other methods either saturates after a certain dimension or deteriorates.
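Equations (8) and (9) can be computed directly from per-node ranked predictions; the hit lists below are hypothetical:

```python
def average_precision(ranked_hits):
    """AP for one node: ranked_hits[k-1] is 1 if the k-th predicted edge
    for that node exists in the original network, else 0."""
    hits, score = 0, 0.0
    for k, hit in enumerate(ranked_hits, start=1):
        if hit:
            hits += 1
            score += hits / k  # precision@k at each correct prediction
    return score / max(hits, 1)

def mean_average_precision(per_node_hits):
    """MAP: average of AP(i) over all nodes."""
    return sum(average_precision(h) for h in per_node_hits) / len(per_node_hits)

# Hypothetical ranked predictions for three nodes.
per_node = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
print(round(mean_average_precision(per_node), 4))  # 0.8056
```

Unlike the global precision@k of equation (7), MAP weights every node equally, so sparsely connected nodes are not drowned out by hubs.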
Effect of dimension: Overall, HONEM provides superior performance at dimensions of 64 or larger. We notice that while Node2Vec provides a better MAP score on the traffic datasets in lower dimensions (smaller than 64 in Bari and smaller than 32 in Rome), it fails to improve at higher dimensions. We further investigated our results by visualizing the per-node precision, AP(i), over various dimensions on the Rome city map. The results are shown in Figure 3. We find that the nodes with the highest precision (darker color) are located in the high-traffic city zones (green lines show the major highways of the city). Based on our analysis, nodes located in the high-traffic zones are 80.56% more likely to have a dependency of second order or higher. As a result, we observe that in lower dimensions, HONEM consistently exhibits high precision for these higher-order nodes. As the dimension increases, the precision of the lower-order nodes also increases. By contrast, the node precision obtained by Node2Vec is not related to node location. At dim = 32 and dim = 64, HONEM provides overall better coverage and better precision than Node2Vec. A comparison of the top-k (k = 1024) predictions of Node2Vec and HONEM is provided in Table II. Even though Node2Vec provides better MAP scores in lower dimensions, HONEM provides better precision for the top-k predictions (for more details refer to the supplementary materials, section VII-B). Looking back at the data characteristics, we notice that this phenomenon only occurs for the traffic datasets, where the ratio of HON to FON edges is significantly larger than in the other two datasets. Therefore, in datasets with significant higher-order dependencies, resulting in a large gap between HON and FON, our method provides the best precision for the potentially most important nodes (i.e., those of higher order).
IV-E Node Classification
We hypothesize that higher-order dependencies can reveal important node structural roles. In this section, we validate this hypothesis using experiments on real-world datasets. Our goal is to find out whether HONEM can improve node classification accuracy by encoding the knowledge of higher-order dependencies.
We answer the above question by comparing HONEM, which captures temporal higher-order dependencies, against state-of-the-art node embedding methods based on FON, across four different datasets. In the traffic data, nodes are labeled by the likelihood of having accidents (i.e., "Low" or "High"). In Wikipedia, nodes are labeled based on whether or not they are reachable within less than 5 clicks in the network. In the shipping data, nodes are labeled by the volume of shipping traffic (i.e., "Low" or "High"). Our experiments show that compared to five state-of-the-art embedding methods, HONEM yields significantly more accurate results across all datasets, regardless of the type of classifier used.
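The evaluation protocol is straightforward to sketch: train an off-the-shelf classifier on the embedding vectors and score it with AUROC. The embeddings and labels below are synthetic stand-ins, not the paper's data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins: 16-d "embeddings" for 200 nodes and binary labels
# (e.g. low/high accident likelihood) correlated with the first coordinate.
X = rng.normal(size=(200, 16))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(round(auc, 3))
```

Swapping in Random Forest, Decision Tree, or AdaBoost requires only changing the classifier line, which is how robustness to the classifier type can be checked.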
We evaluated the node classification performance using AUROC across four different classifiers: Logistic Regression, Random Forest, Decision Tree, and AdaBoost. The results are shown in Figure 4. We observe that HONEM performs consistently better than the other embedding methods. Specifically, we analyzed the HONEM advantage in each dataset. We noticed that in the traffic datasets, nodes with more higher-order dependencies are more likely to have an accident (Pearson correlation: 0.7535). In the Wikipedia data, reachable nodes are more likely to have higher-order dependencies (Pearson correlation: 0.6845). In the shipping data, nodes with higher shipping traffic contain more higher-order dependencies (Pearson correlation: 0.8612). Such time-dependent signals do not emerge in methods based on FON (regardless of the method's complexity). Furthermore, we notice that HONEM is fairly robust to the type of classifier. However, Decision Tree performs poorly regardless of the embedding method, as it picks a subset of features that does not fully capture the node representation in the network. In line with expectations, ensemble methods perform better overall, even though Logistic Regression offers competitive performance on the Wikipedia dataset.

IV-F Visualization
To provide a more intuitive interpretation of the improvement offered by HONEM, we compare visualizations of the produced embeddings against those of the baseline methods. As a case example, we visualize the subgraphs corresponding to two different topics from the Wikipedia dataset, shown in Figure 5. Topics were selected from standard Wikipedia categories. Here we show results for Mathematics and Geography, as they arguably represent two topics that are comparable in terms of generality but distinct enough to allow for meaningful interpretation. We use t-SNE [16] to map the 128-dimensional embeddings to 2-dimensional coordinates. Figure 5 shows two separate clusters for the embeddings derived from HONEM. However, a number of Mathematics entries are interspersed with Geography entries. These are the nodes of encyclopedic entries such as Sphere, Quantity, Arithmetic, and Measurement which, albeit primarily categorized under Mathematics, are also related to many other topics, including Geography.
Figure 5 also shows the visualization results for the baselines. We observe that for many methods the clusters are not as neatly separated as those produced by HONEM. Specifically, DeepWalk, Node2Vec, and LINE display separate clusters, but with many misclassified nodes within each cluster. With GF and LAP, it is even more difficult to identify a proper clustering of the articles. This indicates that higher-order patterns are important for distinguishing clusters and capturing node concepts within the network.
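The visualization step can be sketched with scikit-learn's t-SNE; two synthetic Gaussian blobs stand in for the Mathematics and Geography embeddings here:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(1)

# Stand-in 128-d embeddings for two topical clusters (Mathematics vs.
# Geography in the paper; here just two synthetic Gaussian blobs).
math_nodes = rng.normal(loc=0.0, size=(40, 128))
geo_nodes = rng.normal(loc=3.0, size=(40, 128))
X = np.vstack([math_nodes, geo_nodes])

# Map 128 dimensions down to 2-d coordinates for plotting.
coords = TSNE(n_components=2, perplexity=15, random_state=1).fit_transform(X)
print(coords.shape)  # (80, 2)
```

The resulting coordinates can be scatter-plotted with one color per category to reproduce a Figure 5 style comparison.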
V Analysis of running time
The running time of HONEM is the sum of the time required to extract the higher-order dependencies and the time required to factorize the higher-order local neighborhood matrix. In practice, the extraction step dominates. Its cost scales with the size of the raw sequential data and with the actual number of higher-order dependencies found at each order: all observations must be traversed at least once, and testing whether adding a previous step significantly changes the probability distribution of the next step (using Kullback-Leibler divergence) adds a bounded cost per test [32].
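The KL-divergence test mentioned above can be sketched as follows. This is a simplified version: the threshold used here scales with the order and the support of the history, in the spirit of the rule in [33], but the exact form is our illustrative assumption, as are the toy distributions.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) over dicts; assumes q[k] > 0 wherever p[k] > 0."""
    return sum(pk * math.log2(pk / q[k]) for k, pk in p.items() if pk > 0)

def extend_order(dist_short, dist_long, order, support, tol=None):
    """Keep the longer history if it shifts the next-step distribution enough.

    `tol` defaults to a support-scaled threshold (a simplification of the
    rule in the HON paper [33]).
    """
    if tol is None:
        tol = order / math.log2(1 + support)
    return kl_divergence(dist_long, dist_short) > tol

# Toy next-step distributions out of a node C: unconditional vs. conditioned
# on having arrived from A (numbers are illustrative).
p_c = {"D": 0.66, "E": 0.34}
p_ac = {"D": 1.0}
print(extend_order(p_c, p_ac, order=2, support=10))
```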
We compare the running time of HONEM with the state-of-the-art baselines on the shipping data. We tested the running time on the other datasets as well and found the shipping data to be the most challenging, both in terms of the number of nodes and edges and in terms of network density. All experiments were run on the same machine (Intel(R) Xeon(R) CPU E7-4850 v4 @ 2.10GHz). The results are shown in figure 6. The running time of HONEM is robust to the embedding dimension. We notice that GF is the only method with a better running time than HONEM. This is understandable, since GF directly factorizes the first-order adjacency matrix of the network, while HONEM requires extra time to extract the higher-order neighborhood. However, the difference in running time between HONEM and GF translates into significantly better performance in link prediction, network reconstruction, and node classification. Moreover, the higher-order dependencies only need to be extracted once per dataset, regardless of the embedding dimension; for a fair comparison, however, we included this time in the experiments over all dimensions.
VI Conclusion
In this paper, we developed HONEM, a network embedding method that captures higher-order patterns of the underlying sequential data. We showed that current embedding methods fail to capture temporal higher-order dependencies, resulting in missing important information or drawing misleading conclusions based on the first-order network (FON). HONEM, on the other hand, extracts the significant higher-order proximities from the sequential data to construct the higher-order neighborhood matrix of the network. The node embeddings are obtained by applying truncated SVD to this matrix. We demonstrated that, compared with five state-of-the-art methods, HONEM performs better in node classification, link prediction, network reconstruction, and visualization tasks. We evaluated model robustness against different classifiers and across different dimensions. Finally, we compared the running time of our method against the other baselines using the global shipping data.
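The final embedding step can be sketched in a few lines. The matrix `S` below is a random symmetric stand-in for the actual higher-order neighborhood matrix, and the `U * sqrt(sigma)` scaling is one common convention for deriving embeddings from a truncated SVD, not necessarily the paper's exact choice.

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
# Stand-in for the higher-order neighborhood matrix S of a 20-node network
# (the real S is built from the extracted higher-order distances).
A = rng.random((20, 20))
S = (A + A.T) / 2

d = 4  # embedding dimension
U, sigma, Vt = svds(S, k=d)   # truncated SVD keeps the top-d factors
emb = U * np.sqrt(sigma)      # one common scaling: U * sqrt(Sigma)
print(emb.shape)  # (20, 4)
```

Each row of `emb` is then used as the feature vector of the corresponding node in downstream tasks.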
There are several directions for future improvement. In particular, different weighting mechanisms for modeling the effect of the distance matrix at various orders can be explored. Furthermore, HONEM can be generalized as a bridge between first-order and higher-order networks. The HONEM framework opens a new path for the exploration of higher-order networks: in the context of network embedding, various decomposition methods other than truncated SVD can be applied to learn node embeddings from the proposed higher-order neighborhood matrix.
References
 [1] (2013) Distributed large-scale natural graph factorization. In Proceedings of the 22nd International Conference on World Wide Web, pp. 37–48.
 [2] (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 15 (6), pp. 1373–1396.
 [3] (2015) GraRep: learning graph representations with global structural information. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, pp. 891–900.
 [4] (2016) Deep neural networks for learning graph representations. In AAAI, pp. 1145–1152.
 [5] (1936) The approximation of one matrix by another of lower rank. Psychometrika 1 (3), pp. 211–218.
 [6] (2018) Graph embedding techniques, applications, and performance: a survey. Knowledge-Based Systems 151, pp. 78–94.
 [7] (2016) node2vec: scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864.
 [8] (2017) Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pp. 1024–1034.
 [9] (2015) Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163.
 [10] (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
 [11] (1978) Multidimensional Scaling. Number 07-011 in Sage University Paper Series on Quantitative Applications in the Social Sciences. Sage Publications, Beverly Hills.
 [12] (1951) On information and sufficiency. The Annals of Mathematical Statistics 22 (1), pp. 79–86.
 [13] (2018) Understanding complex systems: from networks to optimal higher-order models. arXiv preprint arXiv:1806.05977.
 [14] (2015) Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493.
 [15] (2018) DepthLGP: learning embeddings of out-of-sample nodes in dynamic networks.
 [16] (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9 (Nov), pp. 2579–2605.
 [17] (2013) Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pp. 3111–3119.
 [18] (2016) Asymmetric transitivity preserving graph embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1105–1114.
 [19] (2014) DeepWalk: online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710.
 [20] (2016) Walklets: multiscale graph embeddings for interpretable network classification. arXiv preprint arXiv:1605.02115.
 [21] (2018) Network embedding as matrix factorization: unifying DeepWalk, LINE, PTE, and node2vec. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 459–467.
 [22] (2018) Higher-order network representation learning. In Companion Proceedings of The Web Conference 2018, pp. 3–4.
 [23] (2014) Memory in network flows and its effects on spreading dynamics and community detection. Nature Communications 5, 4630.
 [24] (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290 (5500), pp. 2323–2326.
 [25] (2014) Causality-driven slow-down and speed-up of diffusion in non-Markovian temporal networks. Nature Communications 5, 5024.
 [26] (2017) When is a network a network?: multi-order graphical model selection in pathways and temporal networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1037–1046.
 [27] (2015) PTE: predictive text embedding through large-scale heterogeneous text networks. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1165–1174.
 [28] (2015) LINE: large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, pp. 1067–1077.
 [29] (2016) Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1225–1234.
 [30] (2012) Human wayfinding in information networks. In Proceedings of the 21st International Conference on World Wide Web, pp. 619–628.
 [31] (1987) Principal component analysis. Chemometrics and Intelligent Laboratory Systems 2 (1–3), pp. 37–52.
 [32] (2017) Detecting anomalies in sequential data with higher-order networks. arXiv preprint arXiv:1712.09658.
 [33] (2016) Representing higher-order dependencies in networks. Science Advances 2 (5), e1600028.
 [34] (2017) Local higher-order graph clustering. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 555–564.
 [35] (2018) MOHONE: modeling higher order network effects in knowledge-graphs via network infused embeddings. arXiv preprint arXiv:1811.00198.
 [36] (2018) Arbitrary-order proximity preserved network embedding. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2778–2786.
 [37] (2018) Dynamic network embedding by modeling triadic closure process. In AAAI.
 [38] (2018) Embedding temporal network via neighborhood formation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2857–2866.
VII Supplementary materials
VII-A Extracting higher-order dependencies
Below we provide an example to clarify the higher-order rule extraction step; we encourage readers to review the HON paper [33] for more details. Given raw sequential data, we can extract the higher-order dependencies using the procedure explained in Section III-B. An example of this procedure is provided in Table III. In this example, the probability distribution of the next step out of a node changes significantly once the previous step is known, but knowing additional earlier steps makes no further difference; the corresponding paths therefore demonstrate second-order dependencies. For other nodes, the probability distribution of the next step does not change no matter how many previous steps are known; those nodes only have first-order dependencies. As mentioned in the main manuscript, the higher-order dependencies are interpreted as higher-order distances for the neighborhood calculation, and the resulting first- and second-order distances are used to populate the higher-order neighborhood matrix.
[Table III: higher-order rule extraction on the example data, listing (1) the observed next-step counts, (2) the corresponding next-step probability distributions, and (3) the extracted higher-order dependencies, for conditioning histories of increasing order.]
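Steps (1) and (2) of Table III, counting observed next steps and normalizing them into conditional distributions, can be reproduced on a toy trajectory. The sequence below is our own illustration, not the paper's example: conditioning on one extra step makes the next step out of C deterministic, a second-order dependency.

```python
from collections import Counter, defaultdict

def next_step_distributions(seq, order):
    """Empirical next-step distribution conditioned on the last `order` steps."""
    counts = defaultdict(Counter)
    for i in range(len(seq) - order):
        history = tuple(seq[i:i + order])
        counts[history][seq[i + order]] += 1
    return {h: {s: c / sum(cnt.values()) for s, c in cnt.items()}
            for h, cnt in counts.items()}

# Toy trajectory: C goes to D after A, but to E after B.
seq = list("ACDACDBCEBCE")
print(next_step_distributions(seq, 1)[("C",)])     # {'D': 0.5, 'E': 0.5}
print(next_step_distributions(seq, 2)[("A", "C")]) # {'D': 1.0}
```

Comparing the order-1 and order-2 distributions (e.g. with the KL-divergence test) would flag the path A→C as a significant second-order dependency.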
VII-B Supplementary results on link prediction
In this section we provide additional material for our link prediction experiments. These results support our claim regarding HONEM's link prediction performance on the top predictions, shown in figure 7. We fixed the dimension at 128 and analyzed link prediction precision on the evaluated edge pairs. We observe that while the other baselines perform poorly on larger and sparser networks (the shipping and Wikipedia networks), HONEM provides significantly better results on all datasets.
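The precision-on-top-predictions measure used here can be sketched as follows; the candidate scores and the held-out ground truth below are hypothetical.

```python
def precision_at_k(scored_pairs, true_edges, k):
    """Fraction of the k highest-scored node pairs that are true edges."""
    top = sorted(scored_pairs, key=lambda p: p[1], reverse=True)[:k]
    return sum((u, v) in true_edges for (u, v), _ in top) / k

# Hypothetical predicted scores for candidate edges and the held-out truth.
scores = [(("a", "b"), 0.9), (("a", "c"), 0.8),
          (("b", "d"), 0.4), (("c", "d"), 0.2)]
truth = {("a", "b"), ("b", "d")}
print(precision_at_k(scores, truth, k=2))  # top-2: (a,b) hit, (a,c) miss -> 0.5
```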