1 Introduction
Conventional graph mining algorithms [8] have been designed to learn a set of handcrafted features that performs best on a specific downstream task, e.g., link prediction [14], node classification [4], or recommendation [26]. However, recent research has shifted toward a more effective way to mine large-scale graphs: feature learning [3]. That is, it exploits a unified set of features that generalizes effectively across distinct graph mining tasks. To this end, recent research efforts have focused on designing unsupervised or semi-supervised algorithms to learn node representations. Such efforts originated in the domain of natural language processing (NLP) [17, 13, 16], where two word2vec models [16] were proposed: continuous bag of words (CBOW) and SkipGram. Inspired by these advances and by the analogy between the two contexts, various algorithms have been developed to learn graph representations [18, 23, 9]. However, since real-world networks convey more complex relationships than those that emerge in text corpora, some recent representation learning algorithms [18, 19, 9] fail to preserve network structure well. This in turn degrades the quality of node representations and compromises the performance of downstream processes.

In addition, state-of-the-art algorithms share a major stability issue that makes them less robust and less applicable, especially to multi-graph problems [11, 10]. While these baseline representation learning algorithms strive to preserve similarities among nodes when learning node representations, they fail to maintain similarities across runs of the same algorithm, even on the same dataset [10], which matters for multi-graph problems such as graph similarity [12] and network alignment [2].

The quality of the learned representations is heavily influenced by how well local and global structure is preserved, so node neighborhoods must be identified carefully. To this end, we develop what is, to the best of our knowledge, the first robust graph embedding algorithm that preserves connectivity patterns unique to undirected and (un)weighted graphs. It employs the concept of network flow represented by connection subgraphs [7], which rely on an analogy with electrical circuits: a node serves as a voltage source, and an edge is a resistor whose conductance is the weight of the edge.
Forming a connection subgraph allows us to: (1) concurrently capture the local and global connections of a node; (2) account for node degree imbalances by down-weighting the importance of paths through high-degree nodes (hops); (3) take into account both low- and high-weight edges; and (4) account for metadata that existing embedding algorithms largely neglect. To summarize, our contributions are:

Flow-based Formulation. We propose a graph embedding approach that robustly preserves network local and global structure by leveraging the notion of network flow to produce approximate but high-quality connection subgraphs between pairs of non-adjacent nodes in undirected and (un)weighted large-scale graphs. We use the resulting connection subgraphs to identify node neighborhoods, rather than restricting ourselves to one- or two-hop neighbors.

Algorithm for Stable Representations. Unlike state-of-the-art methods, which involve randomness that is reflected in the embeddings and their quality, our proposed robust graph embedding method, RECS, produces consistent embeddings across independent runs.

Experiments. We extensively evaluate RECS empirically and demonstrate that it improves on state-of-the-art algorithms in two aspects: (1) Effectiveness: RECS outperforms state-of-the-art algorithms by up to 36.85% on multi-label classification; and (2) Robustness: unlike the baseline algorithms, RECS is completely stable, as shown by a per-dimension comparison of the embeddings obtained from two runs of the same algorithm on an identical dataset.
2 Related Work
Representation Learning. Recent work in network representation learning has been largely motivated by progress in natural language processing (NLP) [17, 13, 16], owing to the analogy between the two fields, in which a network is treated as a document. One of the leading NLP advances is the SkipGram model, which is efficient enough to scale to large-scale networks. However, adopting the SkipGram model as-is for graph representation learning appears insufficient to capture local and global connectivity patterns [18, 23, 9], because networks exhibit sophisticated connectivity patterns that do not emerge in text corpora. DeepWalk [18], for instance, employs short truncated random walks to approximate the neighborhood of a node in a graph. LINE [23] preserves the network local and global structure using first- and second-order proximities, respectively. A more recent approach, node2vec [9], preserves two connectivity patterns unique to graphs, homophily and structural equivalence, using biased random walks. Unlike these works, we define node neighborhoods by preserving linear and non-linear proximities while generating them, before they are learned by the SkipGram model.
Connection Subgraphs. There is a significant body of work on finding the relationships among a set of given nodes in a network. For instance, Akoglu et al. [1] find simple pathways between a small set of marked nodes by leveraging the Minimum Description Length principle, while Tong and Faloutsos [25] define the center-piece subgraph problem as finding the most central node with strong connections to a small set of input nodes. The work most relevant to ours is that on connection subgraphs [7], which capture proximity between any two non-adjacent nodes in arbitrary undirected and (un)weighted graphs. In a nutshell, the algorithm of [7] has two main phases: candidate generation and display generation. The candidate generation phase performs a distance-driven extraction of a much smaller candidate subgraph. At a high level, the candidate subgraph is formed by gradually 'expanding' the neighborhoods of the two non-adjacent nodes until they 'significantly' overlap, so it contains the most prominent paths connecting the pair in the original undirected and (un)weighted graph. The candidate subgraph then serves as the input to the display generation phase, which removes any remaining spurious regions. This removal process is current-oriented: it adds one end-to-end path at a time between the two selected non-adjacent nodes, choosing the path that maximizes the delivered current (network flow) over all paths of its length. Typically, for a large-scale graph, the display subgraph is expected to have 20 to 30 nodes. Connection subgraphs have also been employed for graph visualization [20]. Our work is the first to leverage connection subgraphs to define appropriate node neighborhoods for representation learning.
3 Proposed Method: RECS
In this section, we describe our proposed method, RECS, a deterministic algorithm that preserves both local and global (beyond two hops) connectivity patterns. It consists of two main steps: (1) neighborhood definition via connection subgraphs, and (2) node representation vector update, which we discuss in Sections 3.1 and 3.2, respectively. Because RECS is deterministic, it can be applied to multi-graph problems, unlike previous works [18, 9, 19] that employ random processes such as random walks.

Our method operates on an undirected, (un)weighted graph $G(V, E)$ with $n = |V|$ nodes and $m = |E|$ edges. For a given node $u$, we define its 1-hop neighborhood as $\mathcal{N}(u)$, i.e., the set of nodes directly connected to $u$.
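Table 1: Qualitative comparison of the connection subgraph algorithm [7] and RECS.

| | Connection Subgraph [7] | RECS |
|---|---|---|
| Purpose | Node proximity (for only 2 nodes) | Neighborhood definition (for the whole graph) |
| Step 1 | Candidate generation (distance-driven) | Neighborhood expansion (distance-driven) |
| Step 2 | Display generation (delivered-current-driven) | Neighborhood refinement (current-driven) |
| Efficiency | Inefficient (for the whole graph) | More efficient (for the whole graph) |
| Source | One of the two given nodes | The node whose neighborhood is being defined |
| Target | The other given node | Universal sink node |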
3.1 RECS Step 1: Neighborhood Definition
The heart of learning node representations is obtaining representative node neighborhoods that preserve local and global connections simultaneously. Inspired by [7], we define node neighborhoods by leveraging the analogy between graphs and electrical circuits and adapting the connection subgraph algorithm (discussed in Section 2) to our setting. In Table 1, we give a qualitative comparison of RECS and the connection subgraph algorithm [7], highlighting our major contributions.
The notion of connection subgraphs is beneficial in our setting, since it allows us to: (1) better control the search space; (2) benefit from the actual flow (metadata) that state-of-the-art algorithms neglect; (3) exploit the strength of weak ties; (4) avoid the randomness introduced by random or biased walks; (5) integrate two extreme search strategies, breadth-first search (BFS) and depth-first search (DFS) [27]; (6) address the issue of high-degree nodes; and (7) better handle non-adjacent nodes, which are ubiquitous in real-world large-scale graphs.
The neighborhood definition step consists of two phases: (A) Neighborhood expansion, and (B) Neighborhood refinement. We provide an overview of each phase next, and an illustration in Fig. 1. The overall computational complexity of RECS is .
Phase A: Neighborhood Expansion. Given a source node $s$, we gradually expand its neighborhood on a distance basis. Specifically, we use the analogy with electrical circuits to capture the distances between $s$ and the other nodes in the network, and then leverage these distances to guide the neighborhood expansion.
Graph Construction. We first construct a modified network $G'$ from $G$ by introducing a universal sink node $z$ (grounded, with voltage $V(z) = 0$) and connecting every node except $s$ to it, as shown in Fig. 1(a). Each newly added edge $(u, z)$ in $G'$ is weighted by the following conductance (based on the circuit analogy):
$$C(u, z) = \alpha \sum_{v \in \mathcal{N}(u)} C(u, v) \tag{1}$$
where $C(u, v)$ is the weight or conductance of the edge connecting nodes $u$ and $v$, $\mathcal{N}(u)$ is the set of 1-hop neighbors of $u$, and $\alpha$ is a scalar (a fixed constant for unweighted graphs).
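A minimal sketch of this graph construction, assuming an undirected edge list as input: the Python function below links every node except the source to a universal sink whose conductance is alpha times the node's weighted degree. The function name `add_universal_sink`, the sink label `"z"`, and the default `alpha=1.0` are illustrative assumptions rather than details taken from RECS.

```python
from collections import defaultdict

SINK = "z"  # hypothetical label for the universal (grounded) sink node


def add_universal_sink(edges, source, alpha=1.0):
    """Build G' as {node: {neighbor: conductance}} following Eq. (1).

    edges  -- iterable of (u, v, weight); each undirected edge listed once
    source -- the node whose neighborhood is being defined (not linked to SINK)
    alpha  -- the scalar of Eq. (1); 1.0 is only an illustrative default
    """
    adj = defaultdict(dict)
    for u, v, w in edges:
        # Undirected graph: store the conductance in both directions.
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w

    for u in list(adj):  # snapshot of the original nodes
        if u == source:
            continue
        # alpha times the weighted degree in G, computed before the sink edge exists
        c = alpha * sum(adj[u].values())
        adj[u][SINK] = c
        adj[SINK][u] = c
    return dict(adj)
```

For example, on the edges a-b, b-c, a-c, c-d with unit weights and source `"a"`, node `"c"` (weighted degree 3) receives a sink edge of conductance 3.0, while `"a"` receives none. The sketch assumes no original node is already named `"z"`.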
In the modified network $G'$, the distance, or proximity, between the source node $s$ and every other node is defined as:
(2)
where $\deg(u)$ denotes the weighted degree of a node $u$ (i.e., the sum of the weights of its incident edges), and the distance between non-neighboring nodes $u$ and $v$ is defined as the distance from each of them to their nearest common neighbor $w$. This distance computation addresses the issue of high-degree nodes (which could make 'unrelated' nodes seem 'close') by significantly penalizing their effect through the numerator.
Distance-based Expansion. After constructing the circuit-based graph, we use it to expand $s$'s neighborhood. Let $EX$ be the set of expanded nodes that will form the expansion subgraph (initialized to $\{s\}$), and let $P$ be the set of pending nodes, initialized to $s$'s neighbors, $\mathcal{N}(s)$. At each step of the expansion, we move the pending node closest to $s$ (excluding the sink $z$), according to the distance function in Eq. (2), from $P$ to $EX$. Intuitively, the closer the expanded node is to the source node $s$, the less information flow we lose. Once a node is added to the expansion subgraph, we add its immediate neighbors to $P$, and we repeat the process until $EX$ reaches a predetermined size, a constant that represents the desired size of the expanded subgraph. We show the neighborhood expansion pseudocode in Algorithm 1a. The procedure of computing the takes time.
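The distance-driven expansion can be sketched as a best-first search. The sketch below substitutes a hypothetical edge distance for Eq. (2), namely the squared weighted degree of the current node divided by the edge conductance, so that high-degree nodes are penalized in the numerator as described above; it also assumes distances add along paths, so a non-neighbor is reached through an intermediate node. The function name, the `size` parameter, and tie-breaking by node label are illustrative choices, not part of RECS as stated.

```python
import heapq


def expand_neighborhood(adj, source, size, sink="z"):
    """Distance-driven expansion of `source` (a sketch of Phase A).

    adj  -- {node: {neighbor: conductance}}; may already contain the sink
    size -- hypothetical name for the desired expanded-subgraph size
    Returns the expanded nodes in the order they were added (source first).
    """
    def weighted_degree(u):
        return sum(adj[u].values())

    def edge_distance(u, v):
        # ASSUMPTION: stands in for Eq. (2); the weighted degree of u in the
        # numerator makes paths through high-degree nodes look "far".
        return weighted_degree(u) ** 2 / adj[u][v]

    expanded = [source]
    done = {source}
    best = {}      # best known distance from the source for each pending node
    pending = []   # min-heap of (distance, label, node)

    def push_neighbors(u, base):
        for v in adj[u]:
            if v == sink or v in done:
                continue  # the sink is never part of the neighborhood
            d = base + edge_distance(u, v)
            if d < best.get(v, float("inf")):
                best[v] = d
                heapq.heappush(pending, (d, str(v), v))

    push_neighbors(source, 0.0)
    while pending and len(expanded) < size:
        d, _, v = heapq.heappop(pending)
        if v in done or d > best[v]:
            continue  # stale heap entry
        done.add(v)
        expanded.append(v)
        push_neighbors(v, d)
    return expanded
```

On a small graph where the source `"s"` touches a hub `"h"` with four leaves and a path `"x"`-`"y"`, a budget of four nodes yields `["s", "h", "x", "y"]`: the hub's leaves are deferred because their distance is inflated by the hub's degree.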
Example 1
Figure 2 shows an example of generating the expanded neighborhood of a source node $s$ in an undirected, unweighted graph $G$, in which the original edges have conductance (weight) equal to 1 and the size of the expanded neighborhood is fixed in advance. The conductances of the new edges in $G'$ (red dotted lines), computed via Eq. (1), are shown in Fig. 2a. Based on the distances between $s$ and every other node, defined by Eq. (2) and shown in Fig. 2f, the neighborhood of $s$ is expanded on a distance basis.
Phase B: Neighborhood Refinement. As shown in Figure 1b, the neighborhood refinement phase takes an expanded subgraph as input and returns a refined neighborhood subgraph that is free of spurious graph regions. Unlike the previous phase, which is distance-based, this phase generates the refined subgraph on a network flow (current) basis.
In a nutshell, we first link the nodes of the expansion subgraph (except for the source $s$) to the previously introduced grounded node $z$. We then build the refined neighborhood subgraph by adding end-to-end paths from $s$ to $z$ one at a time, in decreasing order of total current. The underlying intuition is to maximize the current that reaches $z$ from the source $s$: maximizing the current maximizes the information flow from $s$, which serves our goal of including nodes proximate to $s$ in its refined neighborhood. The process stops once the predetermined maximum size of the refined subgraph is reached. Each time a path is added to the refined subgraph, only the nodes not already included in it are added. We implement the refinement process with dynamic programming, in a manner similar to a slightly modified depth-first search (DFS).
To that end, we need the current flow between every pair of neighboring nodes in the expanded subgraph; in our context, the current $I(u, v)$ represents the metadata, or network flow, that we aim to exploit. We compute the current flowing from node $u$ to node $v$ using Ohm's law:
$$I(u, v) = C(u, v)\,\big(V(u) - V(v)\big) \tag{3}$$
where $V(u)$ and $V(v)$ are the voltages of $u$ and $v$, which must satisfy the downhill constraint $V(u) > V(v)$ (otherwise, the current would flow in the opposite direction). To guarantee this, we sort the subgraph's nodes in descending order of their computed voltages before computing the currents. The voltage of every node $u$ other than the source $s$ and the grounded sink $z$ (with $V(z) = 0$) is defined as:
$$V(u) = \frac{\sum_{v} C(u, v)\, V(v)}{\sum_{v} C(u, v)} \tag{4}$$
where $C(u, v)$ is the conductance or weight of the edge between nodes $u$ and $v$ (for edges to the sink, as defined in Eq. (1)), and both sums run over all neighbors $v$ of $u$ in $G'$.
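The voltage definition is a linear system. A minimal sketch, assuming the source is held at voltage 1 while the sink is grounded at 0, solves it by Gauss-Seidel iteration and then applies Ohm's law, keeping only downhill edges. The function names, the source voltage of 1, and the tolerance are assumptions made for illustration; a sparse linear solver is a natural alternative for large subgraphs.

```python
def solve_voltages(adj, source, sink="z", tol=1e-12, max_iter=10_000):
    """Gauss-Seidel solution of Eq. (4) on G' given as {node: {neighbor: conductance}}.

    ASSUMPTION: the source is held at V = 1 and the sink is grounded at V = 0.
    Every other node gets the conductance-weighted average of its neighbors'
    voltages. The sink edge adds to the denominator but contributes 0 to the
    numerator, which keeps the system strictly diagonally dominant (so the
    iteration converges) when every free node has a positive sink edge.
    """
    volt = {u: 0.0 for u in adj}
    volt[source] = 1.0
    free = sorted((u for u in adj if u not in (source, sink)), key=str)
    for _ in range(max_iter):
        largest_change = 0.0
        for u in free:
            total = sum(adj[u].values())
            new = sum(c * volt[v] for v, c in adj[u].items()) / total
            largest_change = max(largest_change, abs(new - volt[u]))
            volt[u] = new
        if largest_change < tol:
            break
    return volt


def downhill_currents(adj, volt):
    """Ohm's law, Eq. (3): I(u, v) = C(u, v) * (V(u) - V(v)), kept only if V(u) > V(v)."""
    return {(u, v): c * (volt[u] - volt[v])
            for u in adj
            for v, c in adj[u].items()
            if volt[u] > volt[v]}
```

Calling `sorted(volt, key=volt.get, reverse=True)` lists the nodes in the descending-voltage order used before computing currents. On a path s-a-b whose non-source nodes are tied to the sink with alpha = 1, the solver gives V(a) = 2/7 and V(b) = 1/7, and the current entering `a` equals the current leaving it.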
Example 2
Given the expanded neighborhood from Example 1, the second phase of RECS gradually refines it on a current basis, as shown in Fig. 3. We first compute the voltages by solving the linear system in Eq. (4), and show them on the nodes in Fig. 3(b). Then, the current flow on each edge connecting nodes in the expanded neighborhood is computed using Eq. (3) such that the 'downhill constraint' is satisfied (current flows from high to low voltage), as shown over the red dotted edges in Fig. 3(b). Given the current values, we enumerate all possible paths between nodes $s$ and $z$ and give their total current flows in Fig. 3(f). The paths are then added to the refined neighborhood in descending order of total current until the stopping criterion is satisfied. Fig. 3(c) shows the first such path, and Fig. 3(d) shows the final neighborhood for the refined-neighborhood size used in this example.
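The refinement in Example 2 can be sketched as follows, assuming the downhill currents are already available as a map from directed edges to current values. The sketch enumerates every downhill path from the source to the sink by depth-first search (exhaustive enumeration, as in the example, rather than the dynamic-programming implementation the text mentions), scores a path by the sum of its edge currents (an assumption about how "total current" is computed), and adds new nodes path by path until a hypothetical size budget is met. Because current only flows from higher to lower voltage, the downhill edges form a directed acyclic graph, so the search terminates.

```python
def refine_neighborhood(currents, source, size, sink="z"):
    """Sketch of Phase B: grow the refined neighborhood path by path.

    currents -- {(u, v): current} for downhill edges only (V(u) > V(v))
    size     -- hypothetical budget for the refined neighborhood
    """
    out = {}
    for (u, v), i in currents.items():
        out.setdefault(u, []).append((v, i))

    paths = []  # (total current, path) for every source-to-sink path

    def dfs(u, path, total):
        if u == sink:
            paths.append((total, list(path)))
            return
        for v, i in out.get(u, []):
            path.append(v)
            dfs(v, path, total + i)  # ASSUMPTION: total = sum of edge currents
            path.pop()

    dfs(source, [source], 0.0)
    # Highest total current first; ties broken by node labels for determinism.
    paths.sort(key=lambda tp: (-tp[0], [str(n) for n in tp[1]]))

    refined = [source]
    for _, path in paths:
        for n in path:
            if len(refined) >= size:
                return refined
            if n != sink and n not in refined:
                refined.append(n)  # only nodes not already included are added
    return refined
```

With currents s→a = 5/7, a→z = 4/7, a→b = 1/7, and b→z = 1/7, the path s-a-z (total 9/7) is taken before s-a-b-z (total 1), so a budget of three nodes returns `["s", "a", "b"]`.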
Remark 1: RECS neighborhood vs. context in baseline methods. Unlike existing representation learning methods: (1) we preserve the local and global structure of the network by accounting for both the immediate neighbors of the source node and the nodes at increasing distances from it when identifying its neighborhood; (2) we generate neighborhoods on distance and network flow bases; (3) we address the issue of high-degree nodes; (4) we identify neighborhoods while maximizing proximity among non-adjacent nodes, which are abundant in real-world networks; and (5) we design our algorithm to yield consistent, stable representations that suit single- and multi-graph problems.
Remark 2: RECS vs. connection subgraph algorithm [7]. It is important to note that the computations of 'current' (in RECS) and 'delivered current' (in [7]) are different. Current is not as informative as delivered current, but it is more efficient to compute. Delivered current was not a major burden in [7], because that algorithm processes only one subgraph. However, we find that it is problematic for generating many neighborhoods due to: (1) the large size of the expanded subgraph; (2) the large size of the refined subgraph (on the order of 800 nodes), compared with the capped size of the display generation subgraph in [7]; and (3) the extremely large number of subgraphs (one per node, i.e., $n$ in total) that must be processed to generate all node neighborhoods.
3.2 RECS Step 2: Node Representation Vector Update
After identifying node neighborhoods in a graph, we learn node representations via the standard SkipGram model [16]. To keep RECS completely deterministic, we remove the randomness of the SkipGram model by using the same random seed every time we employ it. The SkipGram objective maximizes the log-probability of observing the neighborhood generated during the neighborhood definition step, given each node's feature representation:
$$\max_{f} \sum_{u \in V} \log \Pr\big(\mathcal{N}_R(u) \mid f(u)\big) \tag{5}$$
where $\mathcal{N}_R(u)$ is the refined neighborhood of node $u$, and $f(u)$ is its feature representation.
Following common practice, we make the maximum likelihood optimization tractable by making two assumptions:
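Assumption 1 – Conditional independence. We assume that, given the feature representation $f(u)$, the likelihood of observing a node in $u$'s refined neighborhood is independent of observing any other node in that neighborhood:
$$\Pr\big(\mathcal{N}_R(u) \mid f(u)\big) = \prod_{v \in \mathcal{N}_R(u)} \Pr\big(v \mid f(u)\big) \tag{6}$$
where $v$ is any node that belongs to $u$'s refined neighborhood $\mathcal{N}_R(u)$.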
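Assumption 2 – Symmetry in feature space. The source node $u$ and any node $v$ in its refined neighborhood $\mathcal{N}_R(u)$ have a symmetric impact on each other in the continuous feature space. Therefore, the conditional probability $\Pr(v \mid f(u))$ is modeled using the softmax function:
$$\Pr\big(v \mid f(u)\big) = \frac{\exp\big(f(v) \cdot f(u)\big)}{\sum_{w \in V} \exp\big(f(w) \cdot f(u)\big)} \tag{7}$$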
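Based on these two assumptions, the objective in Eq. (5) simplifies to:
$$\max_{f} \sum_{u \in V} \Big[ -|\mathcal{N}_R(u)| \log Z_u + \sum_{v \in \mathcal{N}_R(u)} f(v) \cdot f(u) \Big] \tag{8}$$
where $Z_u = \sum_{w \in V} \exp\big(f(u) \cdot f(w)\big)$ is the per-node partition function.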
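To make the softmax model and the simplified objective concrete, the pure-Python sketch below evaluates the objective with a full softmax and runs plain stochastic gradient ascent on it, using a single embedding table `f` for both roles, as the symmetry assumption implies. Seeding `random.Random` and visiting nodes in a fixed order makes two runs bit-for-bit identical, which is the property RECS relies on. This is an illustrative toy under stated assumptions (every node is a key of `neigh`, and no node lists itself as a neighbor), not the SkipGram implementation used by RECS; practical implementations approximate the softmax (e.g., with negative sampling), and fixing their seed may not suffice on its own when training is multi-threaded.

```python
import math
import random


def init_embeddings(nodes, dim, seed):
    rng = random.Random(seed)  # fixed seed: identical initialization on every run
    return {u: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for u in nodes}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax_row(f, u):
    """Pr(w | f(u)) for every node w (the softmax of Eq. (7)), computed stably."""
    scores = {w: dot(f[u], f[w]) for w in f}
    m = max(scores.values())
    exps = {w: math.exp(s - m) for w, s in scores.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}


def objective(f, neigh):
    """Eq. (8): sum over u and v in N_R(u) of log Pr(v | f(u)); always <= 0."""
    return sum(math.log(softmax_row(f, u)[v]) for u, ctx in neigh.items() for v in ctx)


def train(neigh, dim=4, epochs=20, lr=0.1, seed=0):
    """Gradient ascent on each (u, v) term; assumes v != u for every pair."""
    f = init_embeddings(sorted(neigh, key=str), dim, seed)
    for _ in range(epochs):
        for u in sorted(neigh, key=str):  # fixed visiting order
            for v in neigh[u]:
                p = softmax_row(f, u)
                fu = list(f[u])
                # d/d f(u) of [f(v).f(u) - log Z_u]; the w = u term of Z_u
                # contributes 2 * p[u] * f(u), hence the extra -p[u] * fu[k].
                grad_u = [f[v][k] - sum(p[w] * f[w][k] for w in f) - p[u] * fu[k]
                          for k in range(dim)]
                for w in f:
                    if w != u:  # d/d f(w) = ([w == v] - p[w]) * f(u)
                        coef = (1.0 if w == v else 0.0) - p[w]
                        f[w] = [f[w][k] + lr * coef * fu[k] for k in range(dim)]
                f[u] = [fu[k] + lr * grad_u[k] for k in range(dim)]
    return f
```

Training twice on the same neighborhoods with the same seed returns identical embeddings, whereas changing the seed changes them; that contrast is what the stability experiments probe.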
4 Experiments
In this section, we aim to answer the following questions: (Q1) How does RECS perform in multi-label classification compared to baseline representation learning approaches? (Q2) How stable are the representations that RECS and the baseline methods learn? (Q3) How sensitive is RECS to its hyperparameters? Before answering these questions, we give an overview of the datasets and the baseline representation learning algorithms used in our evaluation.
Datasets. To showcase the generalization capability of RECS over distinct domains, we use a variety of datasets, which we briefly describe in Table 2.
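Table 2: Datasets used in our evaluation.

| Dataset | # Vertices | # Edges | # Labels | Network Type |
|---|---|---|---|---|
| PPI [5] | 3,890 | 76,584 | 50 | Biological |
| Wikipedia [15] | 4,777 | 184,812 | 40 | Language |
| BlogCatalog [24] | 10,312 | 333,983 | 39 | Social |
| CiteSeer [22] | 3,312 | 4,660 | 6 | Citation |
| Flickr [24] | 80,513 | 5,899,882 | 195 | Social |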
Baseline Algorithms. We compare RECS with three state-of-the-art baselines: DeepWalk [18], node2vec [9], and Walklets [19]. We chose these methods because they define neighborhoods randomly, using random walks. In contrast, RECS is completely deterministic, which makes it applicable to both single- and multi-graph problems. For all of the methods, we set the number of walks per node to , walk length to , the neighborhood size to , and number of dimensions of the feature representation . For node2vec, we set the return parameter , and the in-out parameter , in order to capture the homophily and the structural equivalence connectivity patterns, respectively. For Walklets, we set the feature representation scale, , which captures relationships at scale .
Experimental Setup. For RECS parameter settings, we set the expansion neighborhood subgraph size . In order to compare with the baseline methods, we set the refinement neighborhood subgraph size, , and the number of dimensions of the feature representation, , in line with the values used for DeepWalk, node2vec, and Walklets.
4.1 Q1. Multi-label Classification
Setup. Multi-label classification is a canonical single-graph task in which each node in a graph is assigned one or more labels from a finite label set. We input the learned node representations to a one-vs-rest logistic regression classifier with L2 regularization. We perform fold cross-validation and report the mean Micro-F1 score. We omit the results of other evaluation metrics (e.g., Macro-F1 score) because they follow the same trend. Multi-label classification is a challenging task, especially when the label set is large or the fraction of labeled vertices is small [21].

Results. In Table 3, we report the performance of RECS and compare it to the three state-of-the-art representation learning methods. Our results are statistically significant with a p-value . Overall, RECS outperforms or is competitive with the baseline methods, while also generalizing to the multi-network problems that the other methods fail to address. Below we discuss the experimental results by dataset.
PPI: RECS outperforms all the baselines for every percentage of labeled nodes. For instance, RECS improves on DeepWalk by 36.85% when labeled nodes are sparse (10%), by 19.08% with 50% of labeled nodes, and by 17.55% when the percentage of labeled nodes is 90%.
Wikipedia: RECS outperforms the three baseline algorithms by up to 10.48% when using 90% of labeled nodes. In the only case where RECS does not beat node2vec (50% of labeled nodes), it ranks second.
BlogCatalog: RECS performs comparably to or better than DeepWalk and Walklets for various percentages of labeled nodes. Specifically, it outperforms DeepWalk by up to and Walklets by up to , when the percentage of labeled nodes is . For more labeled nodes, RECS achieves performance similar to node2vec.
CiteSeer: Similar to Wikipedia, RECS outperforms the state-of-the-art algorithms, achieving a maximum gain of 7.13% with 90% of labeled nodes.
Flickr: RECS outperforms the other three baselines by up to 6.79% when using 50% of labeled nodes.
Discussion: The results show that RECS mostly outperforms the baseline techniques on the PPI, Wikipedia, CiteSeer, and Flickr networks, with a few exceptions in which RECS is very close to the best method. This may be because RECS better preserves the global structure of such networks. On the other hand, although RECS performs very comparably to node2vec on the BlogCatalog dataset, the second-order biased random walks of node2vec might be slightly better at preserving the homophily and structural equivalence connectivity patterns in social networks.
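Table 3: Micro-F1 scores (%) for multi-label classification with 10%, 50%, and 90% of nodes labeled. The G.O. rows give the relative gain (%) of RECS over DeepWalk (DWalk), node2vec (N2vec), and Walklets (Walk); blank cells mark settings where RECS does not outperform node2vec.

| Algorithm | PPI 10% | PPI 50% | PPI 90% | Wikipedia 10% | Wikipedia 50% | Wikipedia 90% | BlogCatalog 10% | BlogCatalog 50% | BlogCatalog 90% | CiteSeer 10% | CiteSeer 50% | CiteSeer 90% | Flickr 10% | Flickr 50% | Flickr 90% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DeepWalk | 12.35 | 18.23 | 20.39 | 42.33 | 44.57 | 46.19 | 30.12 | 34.28 | 34.83 | 46.56 | 52.01 | 53.32 | 37.70 | 39.62 | 42.36 |
| node2vec | 16.19 | 20.64 | 21.75 | 44.38 | 48.37 | 48.85 | 34.53 | 36.94 | 37.99 | 50.92 | 52.49 | 56.72 | 38.90 | 41.39 | 43.91 |
| Walklets | 16.07 | 21.44 | 22.10 | 43.69 | 44.68 | 45.17 | 26.90 | 29.09 | 30.41 | 47.89 | 52.73 | 54.83 | 38.32 | 40.58 | 42.62 |
| RECS | 16.91 | 21.71 | 23.97 | 45.68 | 48.10 | 49.90 | 31.02 | 34.85 | 36.42 | 48.80 | 53.36 | 57.12 | 38.98 | 42.31 | 44.26 |
| G.O. DWalk | 36.85 | 19.08 | 17.55 | 7.90 | 7.91 | 8.03 | 3.00 | 1.63 | 4.55 | 4.80 | 2.59 | 7.13 | 3.40 | 6.79 | 4.49 |
| G.O. N2vec | 4.41 | 5.16 | 10.19 | 2.92 | | 2.14 | | | | | 1.63 | 0.70 | 0.21 | 2.22 | 0.80 |
| G.O. Walk | 5.19 | 1.23 | 8.47 | 4.53 | 7.64 | 10.48 | 15.27 | 19.80 | 19.75 | 1.87 | 1.18 | 4.17 | 1.72 | 4.26 | 3.85 |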
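The Micro-F1 score reported in Table 3 pools true positives, false positives, and false negatives over all nodes and labels before computing F1, so frequent labels weigh more than rare ones. A minimal stdlib sketch, assuming each node's true and predicted labels are given as Python sets (the function name is illustrative):

```python
def micro_f1(true_labels, predicted_labels):
    """Micro-averaged F1 over parallel lists of per-node label sets."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        tp += len(truth & pred)   # labels correctly predicted for this node
        fp += len(pred - truth)   # predicted but not true
        fn += len(truth - pred)   # true but missed
    denom = 2 * tp + fp + fn      # 2TP / (2TP + FP + FN) equals the F1 of pooled P and R
    return 2 * tp / denom if denom else 0.0
```

For example, true labels `[{1, 2}, {3}]` and predictions `[{1}, {3, 4}]` give two true positives, one false positive, and one false negative, hence a Micro-F1 of 2/3. scikit-learn's `f1_score(y_true, y_pred, average="micro")` computes the same quantity from binary indicator matrices.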
4.2 Q2. Representation Learning Stability
Setup. Surveying existing node representation learning methods, we observe that the tasks on which such algorithms are evaluated are limited to single-graph tasks, such as prediction, recommendation, node classification, and visualization. Since many tasks involve multiple networks (e.g., graph similarity [12], graph alignment [2], temporal graph anomaly detection [12], and brain network analysis for a group of subjects [6]), we examine the suitability of representation learning approaches for multi-network settings. Heimann and Koutra [11] state that existing embedding algorithms are inappropriate for multi-graph problems, and attribute this to the fact that different runs of an algorithm yield different representations, even when the same dataset is used. RECS, in contrast, is fully deterministic, with the goal of achieving stable and robust outcomes. We evaluate this stability with respect to two criteria: (1) representation stability, by verifying the similarity of the learned vectors across independent runs of the algorithms; and (2) performance stability, by using embeddings from different runs in a classification task and measuring the variation in classification performance. Ideally, a robust embedding should satisfy both criteria.

Results. Here we list the results of the two stability experiments.
Representation stability. Figure 4 plots the embeddings from two different runs of each approach against each other for a randomly selected set of nodes. We visualize the results for three randomly selected dimensions of node2vec, DeepWalk, and Walklets, and for RECS we deliberately use the same three dimensions selected for each baseline method. In the interest of space, we only show the RECS results for the same three dimensions () used for Walklets; the results are equivalent for all dimensions. If all points fall on (or close to) the diagonal, the embeddings are stable, which is a desirable attribute of a robust graph embedding. Figures 4(a)-(c) show that, as expected, node2vec, DeepWalk, and Walklets suffer from significant variation across runs. In contrast, Figure 4(d) shows that RECS obtains perfectly consistent embeddings across runs and is thus robust.
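The diagonal test above can also be quantified per dimension. A small sketch, assuming each run's embeddings are stored as a dict from node to a list of floats (the function names are hypothetical), reports the largest absolute gap between the two runs for each selected dimension; a gap of 0.0 means every point lies exactly on the diagonal.

```python
def per_dimension_gap(run_a, run_b, dims):
    """Largest |a_k(u) - b_k(u)| over the nodes shared by both runs, per dimension k."""
    nodes = sorted(set(run_a) & set(run_b), key=str)
    return {k: max(abs(run_a[u][k] - run_b[u][k]) for u in nodes) for k in dims}


def is_stable(run_a, run_b, dims, tol=0.0):
    """True if every selected dimension stays within `tol` of the diagonal."""
    return all(gap <= tol for gap in per_dimension_gap(run_a, run_b, dims).values())
```

A strictly deterministic method should pass `is_stable` with `tol=0.0`; for randomized baselines, the reported gaps indicate how far the scatter in each dimension departs from the diagonal.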
Performance stability. The representation learning literature has routinely overlooked the instability (randomness) of learned representations and its effect on the performance of downstream tasks. Our performance stability hypothesis states that, in addition to representation quality, representation stability also matters. To test it, we run node2vec, the approach that sometimes outperformed ours in the classification task, 10 times on the evaluation datasets to see whether unstable embeddings can significantly affect multi-label classification performance. For all the datasets, we get a p-value . Specifically, for Wikipedia, the p-value is , which we show in Figure 5. Therefore, in addition to the quality of the learned representations, their instability can also compromise performance, which emphasizes the importance of learning node representations robustly.
4.3 Q3. Parameter Sensitivity
For sensitivity analysis, we use the Wikipedia dataset with labeled nodes. We perform the following three experiments:
Size of the expansion neighborhood subgraph . First, we demonstrate the impact of varying the size of the expanded neighborhood, , on multi-label classification. To this end, we run RECS while varying the size of from to nodes in increments. We limit the size of the refined neighborhood, . Figure 6a shows the Micro-F1 results. We observe that as the size of increases, the Micro-F1 score increases up to a certain limit (), and then starts to decrease. This can be attributed to the fact that enlarging beyond introduces noise into the generated neighborhood, which ultimately compromises performance.
Size of the refinement neighborhood subgraph . Fixing the size of the expanded neighborhood, , we now examine the impact of altering the size of the refined neighborhood, , on multi-label classification. For that, we run RECS while varying the size of from to nodes in increments. We set the size of the expanded neighborhood, . Figure 6b shows the Micro-F1 results. We observe that increasing is accompanied by an increase in the Micro-F1 score. This is because enlarging includes more useful information in the refined neighborhoods, which the SkipGram model [16] leverages to learn and update the node representations.
Number of dimensions . Fixing the sizes of the expanded subgraph, , and the refined subgraph, , we demonstrate the impact of varying the number of representation dimensions, , on multi-label classification. For that, we run RECS while varying from to . Figure 6c shows the Micro-F1 results. We note that the Micro-F1 score increases steadily with up to , which corresponds to , and then starts to drop. We attribute this to the fact that a higher number of dimensions can introduce unrelated dimensions into the representation space, which eventually hurts performance.
5 Conclusion
We propose RECS, a novel and stable representation learning algorithm based on connection subgraphs. In contrast to baseline representation learning algorithms, RECS generates entirely deterministic representations, which makes it more appealing for single- and multi-graph problems. We empirically demonstrate RECS's efficacy and stability over state-of-the-art approaches: experiments show that RECS is as effective as or more effective than the baselines, and is completely stable. In future work, we will address interpretability, which is not well examined in the representation learning literature. We will also address embedding updates, especially for recently joined nodes that have no evident connections. This problem is closely related to the cold-start problem in recommender systems, where a new user joins the system and external information is sought to properly compute the user's profile. Similarly, we will explore different forms of external context and metadata for recently joined nodes, which can help us address connection sparsity.
References
 [1] Leman Akoglu, Duen Horng Chau, Jilles Vreeken, Nikolaj Tatti, Hanghang Tong, and Christos Faloutsos. Mining Connection Pathways for Marked Nodes in Large Graphs, pages 37–45. 2013.
 [2] Mohsen Bayati, Margot Gerritsen, David F Gleich, Amin Saberi, and Ying Wang. Algorithms for large, sparse network alignment problems. In Data Mining, 2009. ICDM’09. Ninth IEEE International Conference on, pages 705–710. IEEE, 2009.
 [3] Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence, 35(8):1798–1828, 2013.
 [4] Smriti Bhagat, Graham Cormode, and S Muthukrishnan. Node classification in social networks. In Social network data analytics, pages 115–148. Springer, 2011.
 [5] Bobby-Joe Breitkreutz, Chris Stark, Teresa Reguly, Lorrie Boucher, Ashton Breitkreutz, Michael Livstone, Rose Oughtred, Daniel H Lackner, Jürg Bähler, Valerie Wood, et al. The biogrid interaction database: 2008 update. Nucleic acids research, 36(suppl_1):D637–D640, 2007.
 [6] Fabrizio De Vico Fallani, Jonas Richiardi, Mario Chavez, and Sophie Achard. Graph analysis of functional brain networks: practical issues in translational neuroscience. Phil. Trans. R. Soc. B, 369(1653):20130521, 2014.
 [7] Christos Faloutsos, Kevin S McCurley, and Andrew Tomkins. Fast discovery of connection subgraphs. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 118–127. ACM, 2004.
 [8] Palash Goyal and Emilio Ferrara. Graph embedding techniques, applications, and performance: A survey. arXiv preprint arXiv:1705.02801, 2017.
 [9] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 855–864. ACM, 2016.
 [10] M. Heimann, H. Shen, and D. Koutra. Node Representation Learning for Multiple Networks: The Case of Graph Alignment. ArXiv e-prints, 2018.
 [11] Mark Heimann and Danai Koutra. On generalizing neural node embedding methods to multi-network problems. In ACM SIGKDD International Workshop on Mining and Learning with Graphs (MLG), 2017.
 [12] Danai Koutra, Joshua T Vogelstein, and Christos Faloutsos. Deltacon: A principled massive-graph similarity function. In Proceedings of the 2013 SIAM International Conference on Data Mining, pages 162–170. SIAM, 2013.

 [13] Quoc Le and Tomas Mikolov. Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), pages 1188–1196, 2014.
 [14] David Liben-Nowell and Jon Kleinberg. The link-prediction problem for social networks. Journal of the Association for Information Science and Technology, 58(7):1019–1031, 2007.
 [15] Matt Mahoney. Large text compression benchmark, 2011.
 [16] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
 [17] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119, 2013.
 [18] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 701–710. ACM, 2014.
 [19] Bryan Perozzi, Vivek Kulkarni, and Steven Skiena. Walklets: Multiscale graph embeddings for interpretable network classification. arXiv preprint arXiv:1605.02115, 2016.
 [20] José F. Rodrigues, Jr., Hanghang Tong, Agma J. M. Traina, Christos Faloutsos, and Jure Leskovec. Gmine: A system for scalable, interactive graph visualization and mining. In Proceedings of the 32nd International Conference on Very Large Data Bases, VLDB ’06, pages 1195–1198. VLDB Endowment, 2006.
 [21] Ryan A Rossi, Rong Zhou, and Nesreen K Ahmed. Deep feature learning for graphs. arXiv preprint arXiv:1704.08829, 2017.
 [22] Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Galligher, and Tina Eliassi-Rad. Collective classification in network data. AI magazine, 29(3):93, 2008.
 [23] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. Line: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, pages 1067–1077. ACM, 2015.
 [24] Lei Tang, Xufei Wang, and Huan Liu. Scalable learning of collective behavior. IEEE Transactions on Knowledge and Data Engineering, 24(6):1080–1091, 2012.
 [25] Hanghang Tong and Christos Faloutsos. Center-piece subgraphs: Problem definition and fast solutions. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 404–413. ACM, 2006.
 [26] Xiao Yu, Xiang Ren, Yizhou Sun, Quanquan Gu, Bradley Sturt, Urvashi Khandelwal, Brandon Norick, and Jiawei Han. Personalized entity recommendation: A heterogeneous information network approach. In Proceedings of the 7th ACM international conference on Web search and data mining, pages 283–292. ACM, 2014.

 [27] Rong Zhou and Eric A Hansen. Breadth-first heuristic search. Artificial Intelligence, 170(4–5):385–408, 2006.