SURREAL: SUbgraph Robust REpresentAtion Learning

05/03/2018 · Saba A. Al-Sayouri, et al. · Binghamton University, University of Michigan, University of California, Riverside

The success of graph embeddings, or node representation learning, in a variety of downstream tasks, such as node classification, link prediction, and recommendation systems, has led to their popularity in recent years. Representation learning algorithms aim to preserve local and global network structure by identifying node neighborhood notions. However, many existing algorithms generate embeddings that fail to properly preserve the network structure, or lead to unstable representations due to random processes (e.g., random walks to generate context) and, thus, cannot generalize to multi-graph problems. In this paper, we propose SURREAL, a novel, stable graph embedding algorithmic framework that learns robust representations using connection subgraphs. SURREAL learns graph representations by employing the analogy of graphs with electrical circuits. It preserves both local and global connectivity patterns, and addresses the issue of high-degree nodes. Further, it exploits the strength of weak ties and meta-data that have been neglected by baselines. Experiments show that SURREAL outperforms state-of-the-art algorithms by up to 36.85% on the multi-label classification problem. Further, in contrast to the baselines, SURREAL, being deterministic, is completely stable.

1 Introduction

Conventional graph mining algorithms [8] have been designed to learn a set of hand-crafted features that perform best on a specific downstream task, e.g., link prediction [14], node classification [4], and recommendation [26]. However, recent research has steered toward a more effective way to mine large-scale graphs: feature learning [3]. That is, a unified set of features that can effectively generalize over distinct graph mining tasks is exploited. To this end, recent research efforts have focused on designing either unsupervised or semi-supervised algorithms to learn node representations. Such efforts originated in the domain of natural language processing (NLP) [17, 13, 16], where two word2vec [16] models have been proposed, namely continuous bag of words (CBOW) and SkipGram. Inspired by the recent advancements in the NLP domain, and the analogy between the two contexts, various algorithms have been developed to learn graph representations [18, 23, 9]. However, since real-world networks convey more complex relationships compared to those that emerge in corpora, some recent representation learning algorithms [18, 19, 9] fail to properly preserve the network structure. This in turn impacts the quality of node representations, which compromises the performance of downstream processes. In addition, state-of-the-art algorithms share a major stability issue that renders them less robust and applicable, especially for multi-graph problems [11, 10]. In other words, while baseline representation learning algorithms strive to preserve similarities among nodes to generate and learn node representations, they fail to maintain similarities across runs of the same algorithm, even when using the same data set [10], which is problematic for multi-graph tasks such as graph similarity [12] and network alignment [2].
The quality of the learned representations is heavily influenced by how well the local and global structure is preserved. Therefore, we need to carefully identify node neighborhood notions. To the best of our knowledge, we are the first to develop a robust graph embedding algorithm that preserves connectivity patterns unique to undirected and (un)weighted graphs by employing the concept of network flow represented by connection subgraphs [7]. Connection subgraphs exploit the analogy between graphs and electrical circuits: a node is assumed to serve as a voltage source, and an edge is assumed to be a resistor whose conductance is the weight of the edge. Forming a connection subgraph allows us to: (1) Concurrently capture the node's local and global connections; (2) Account for node degree imbalances by downweighting the importance of paths through high-degree nodes (hops); (3) Take into account both low- and high-weight edges; and (4) Account for meta-data that is largely neglected by existing embedding algorithms. To summarize, our contributions are:

  1. Flow-based Formulation. We propose a graph embedding approach that robustly preserves local and global network structure by leveraging the notion of network flow to produce approximate but high-quality connection subgraphs between pairs of non-adjacent nodes in undirected and (un)weighted large-scale graphs. We use the formed connection subgraphs to identify node neighborhoods, rather than restricting ourselves to one- or two-hop neighbors.

  2. Algorithm for Stable Representations. Contrary to state-of-the-art methods, which involve randomness that is reflected in the embeddings and their quality, our proposed robust graph embedding, titled RECS, produces consistent embeddings across independent runs.

  3. Experiments. We extensively evaluate RECS empirically, and demonstrate that it outperforms state-of-the-art algorithms in two aspects. (1) Effectiveness: RECS outperforms state-of-the-art algorithms by up to 36.85% on the multi-label classification problem; and (2) Robustness: in contrast to the baseline algorithms, a per-dimension comparison of embeddings obtained from two runs of the same algorithm on an identical data set shows that RECS is completely stable.

2 Related Work

Representation Learning. Recent work in network representation learning has been largely motivated by progress in the natural language processing (NLP) domain [17, 13, 16], due to the analogy between the two fields, where a network is treated as a document. One of the leading NLP advancements is the SkipGram model, owing to its efficiency in scaling to large corpora. However, merely adopting the SkipGram model for graph representation learning is insufficient for capturing local and global connectivity patterns [18, 23, 9], because of the sophisticated connectivity patterns that emerge in networks but not in text corpora. Specifically, DeepWalk [18] employs short truncated random walks to approximate the neighborhood of a node in a graph. LINE [23] proposes to preserve the network's local and global structure using first- and second-order proximities, respectively. A more recent approach, node2vec [9], proposes to preserve graph-specific connectivity patterns, homophily and structural equivalence, using biased random walks. Unlike these works, to satisfactorily define node neighborhood notions, we propose to preserve linear and non-linear proximities while generating the neighborhoods, before they are learned by the SkipGram model.

Connection Subgraphs. There is a significant body of work addressing the problem of finding the relationships between a set of given nodes in a network. For instance, the authors of [1] find simple pathways between a small set of marked nodes by leveraging the Minimum Description Length principle, while [25] defines the center-piece subgraph problem as finding the most central node with strong connections to a small set of input nodes. The work on connection subgraphs [7], which capture proximity between any two non-adjacent nodes in arbitrary undirected and (un)weighted graphs, is the most relevant to ours. In a nutshell, [7] consists of two main phases: candidate generation and display generation. In the candidate generation phase, a distance-driven extraction of a much smaller subgraph is performed to generate the candidate subgraph. At a high level, the candidate subgraph is formed by gradually 'expanding' the neighborhoods of two non-adjacent nodes until they 'significantly' overlap. Therefore, the candidate subgraph contains the most prominent paths connecting a pair of non-adjacent nodes in the original undirected and (un)weighted graph. The generated candidate subgraph serves as input to the next phase, display generation, which removes any remaining spurious regions. The removal process is current-oriented; it adds one end-to-end path at a time between the two selected non-adjacent nodes, choosing the path that maximizes the delivered current (network flow) over all paths of its length. Typically, for a large-scale graph, the display subgraph is expected to have 20-30 nodes. Connection subgraphs have also been employed for graph visualization [20]. Our work is the first to leverage connection subgraphs to define appropriate neighborhood notions for representation learning.

3 Proposed Method: RECS

In this section, we describe our proposed method, RECS, a deterministic algorithm that preserves local and global (beyond two hops) connectivity patterns. It consists of two main steps: (1) Neighborhood definition via connection subgraphs, and (2) Node representation vector update. We discuss the two steps in Sections 3.1 and 3.2, respectively. We note that RECS is deterministic, and thus can be applied to multi-graph problems, unlike previous works [18, 9, 19] that employ random processes, such as random walks.

Our method operates on an (un)weighted, undirected graph G = (V, E), with n = |V| nodes and |E| edges. For a given node u, we define its 1-hop neighborhood N(u) as the set of nodes that are directly connected to u.

Table 1: Qualitative comparison of the connection subgraph algorithm [7] vs. RECS.
           | Connection Subgraph                            | RECS
Purpose    | Node proximity (for only 2 nodes)              | Neighborhood definition (for the whole graph)
Step 1     | Candidate generation (distance-driven)         | Neighborhood expansion (distance-driven)
Step 2     | Display generation (delivered-current-driven)  | Neighborhood refinement (current-driven)
Efficiency | Inefficient (for the whole graph)              | More efficient (for the whole graph)
Source     | One of the two input nodes                     | Each node in the graph
Target     | The other input node                           | Universal sink node

3.1 RECS Step 1: Neighborhood Definition

The heart of learning node representations is obtaining representative node neighborhoods that preserve local and global connections simultaneously. Inspired by [7], we propose to define node neighborhoods by leveraging the analogy between graphs and electrical circuits, and adapting the connection subgraph algorithm (discussed in Section 2) to our setting. In Table 1, we give a qualitative comparison of RECS and the connection subgraph algorithm [7], highlighting our major contributions.

The notion of connection subgraphs is beneficial in our setting, since they allow us to: (1) Better control the search space; (2) Benefit from the actual flow (meta-data) that is neglected by state-of-the-art algorithms; (3) Exploit the strength of weak ties; (4) Avoid introducing randomness caused by random/biased walks; (5) Integrate two extreme search strategies, breadth-first search (BFS) and depth-first search (DFS) [27]; (6) Address the issue of high-degree nodes; and (7) Better handle non-adjacent nodes, which are ubiquitous in real-world large-scale graphs.

The neighborhood definition step consists of two phases: (A) Neighborhood expansion, and (B) Neighborhood refinement. We provide an overview of each phase next, and an illustration in Fig. 1. The overall computational complexity of RECS is .

Figure 1: The main phases of the neighborhood definition step of RECS: (a) Neighborhood expansion of a source node through its neighbors to generate the expanded subgraph G_exp on a distance basis. Node z indicates the grounded universal sink node. (b) Neighborhood refinement of G_exp to generate the refined subgraph G_ref on a current basis.

Phase A: Neighborhood Expansion (G_exp). Given a node u, we propose to gradually expand its neighborhood on a distance basis. Specifically, we employ the analogy with electrical circuits to capture the distances between u and the other nodes in the network, and then leverage these distances to guide the neighborhood expansion.

Graph Construction. We first construct a modified network G' from G by introducing a universal sink node z (grounded, with voltage 0), and connect all nodes (except the source u) to z, as shown in Fig. 1(a). The newly added edge for every node i in G' is weighted by the following conductance (based on the circuit analogy):

w(i, z) = α · Σ_{j ∈ N(i)} w(i, j)    (1)

where w(i, j) is the weight or conductance of the edge connecting nodes i and j, N(i) is the set of 1-hop neighbors of i, and α is a scalar weighting parameter (a fixed constant for unweighted graphs).

In the modified network G', the distance, or proximity, between the given node u and every other node is defined as:

(2)

where deg(i) is the weighted degree of node i (i.e., the sum of the weights of its incident edges), and the distance between non-neighboring nodes i and j is defined via the distance from each to their nearest common neighbor. This distance computation addresses the issue of high-degree nodes (which could make 'unrelated' nodes seem 'close') by significantly penalizing their effect in the numerator.
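To make the construction concrete, the following Python sketch (using networkx) builds the modified graph G' with a universal sink node and a pairwise distance helper. The grounding weight follows the reading of Eq. (1) given above, and the adjacent-node distance is only an illustrative stand-in consistent with the description of Eq. (2) (degrees in the numerator penalize high-degree endpoints); both are our assumptions, not necessarily the paper's exact formulas.

```python
# Sketch: circuit-style graph construction for RECS Phase A (assumptions noted inline).
import networkx as nx

SINK = "__z__"  # hypothetical label for the universal (grounded) sink node z

def build_circuit_graph(g: nx.Graph, source, alpha: float = 1.0) -> nx.Graph:
    """Return a copy of g with a sink z attached to every node except `source`.
    Assumption: the conductance of the new edge (i, z) is
    alpha * (weighted degree of i), one reading of Eq. (1)."""
    gp = g.copy()
    gp.add_node(SINK, voltage=0.0)  # grounded: voltage 0
    for i in g.nodes():
        if i == source:
            continue
        deg_i = sum(g[i][j].get("weight", 1.0) for j in g.neighbors(i))
        gp.add_edge(i, SINK, weight=alpha * deg_i)
    return gp

def distance(gp: nx.Graph, i, j) -> float:
    """Distance between adjacent nodes i and j.
    Illustrative stand-in for Eq. (2): deg(i) * deg(j) / w(i, j)^2, so that
    high-degree endpoints (appearing in the numerator) are pushed farther away."""
    w = gp[i][j].get("weight", 1.0)
    deg = lambda v: sum(gp[v][t].get("weight", 1.0) for t in gp.neighbors(v))
    return deg(i) * deg(j) / (w * w)
```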

Distance-based Expansion. After constructing the circuit-based graph, we leverage it to expand u's neighborhood. Let E_s be the set of expanded nodes that will form the expansion subgraph G_exp (initialized to {u}), and P be the set of pending nodes, initialized to u's neighbors N(u). During the expansion process, we choose the pending node closest to u (excluding the sink z), as defined by the distance function in Eq. (2). Intuitively, the closer the expanded node is to the source node u, the less information flow we lose. Once a node is added to the expansion subgraph, we add its immediate neighbors to the pending set, and we repeat the process until we have n_exp nodes, where n_exp is a constant that represents the desired size of the expanded subgraph. We show the neighborhood expansion pseudocode in Algorithm 1-a; a code sketch is also given below. The procedure of computing G_exp takes time.
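A minimal sketch of this expansion loop follows, reusing build_circuit_graph and distance from the previous snippet; treating a pending node's distance as the cumulative distance through its already-expanded neighbor is a simplification on our part, since Algorithm 1-a is not reproduced here.

```python
# Sketch: greedy, deterministic distance-based expansion (Phase A).
import heapq
import networkx as nx

def expand_neighborhood(g: nx.Graph, source, n_exp: int) -> set:
    """Repeatedly pull the pending node with the smallest distance until
    n_exp nodes have been collected."""
    gp = build_circuit_graph(g, source)          # from the previous sketch
    expanded = {source}
    heap = []                                    # entries: (distance, tie-break, node)
    for v in g.neighbors(source):
        heapq.heappush(heap, (distance(gp, source, v), str(v), v))
    while heap and len(expanded) < n_exp:
        d, _, v = heapq.heappop(heap)
        if v in expanded:
            continue
        expanded.add(v)
        for t in g.neighbors(v):                 # enqueue v's immediate neighbors
            if t not in expanded:
                heapq.heappush(heap, (d + distance(gp, v, t), str(t), t))
    return expanded
```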

Example 1

Figure 2 shows an example of generating G_exp for an undirected, unweighted graph, in which the original edges have conductance (weight) equal to 1 and the size of the expanded neighborhood is set to n_exp. The conductances of the new edges in G' (red dotted lines), computed via Eq. (1), are shown in Fig. 2-a. Based on the distances between u and every other node, which are defined by Eq. (2) and shown in Fig. 2-f, the neighborhood of u is expanded on a distance basis.

Figure 2: Neighborhood expansion example.

Phase B: Neighborhood Refinement (G_ref). As shown in Fig. 1-b, the neighborhood refinement phase takes the expanded subgraph G_exp as input and returns a refined neighborhood subgraph G_ref as output, which is free of spurious graph regions. Unlike the previous phase, which is based on distances, the refined subgraph is generated on a network-flow (current) basis.

In a nutshell, in this phase we first link the nodes of the expansion subgraph (except for the source u) to the previously introduced grounded node z. Then, we create the refined neighborhood subgraph by adding end-to-end paths from u to z one at a time, in decreasing order of total current. The underlying intuition of the refinement phase is to maximize the current that reaches z from the source node u. By maximizing the current, we maximize the information flow between the source node u and z, which ultimately serves our goal of including nodes proximate to the source node in its refined neighborhood. The process stops when the maximum predetermined refined subgraph size, n_ref, is reached. Each time a path is added to the refined subgraph, only the nodes that are not already included in the subgraph are added. We implement the refinement process with dynamic programming, which resembles a depth-first search (DFS) with a slight modification.

To that end, we need to calculate the current flow between every pair of neighboring nodes in the expanded subgraph. In our context, the current represents the meta-data or network flow that we aim to exploit. We compute the current flow from node i to node j using Ohm's law:

I(i, j) = w(i, j) · (V(i) − V(j))    (3)

where V(i) and V(j) are the voltages of nodes i and j, satisfying the downhill constraint V(i) > V(j) (otherwise, current would flow in the opposite direction). To guarantee this, we sort the subgraph's nodes in descending order of their computed voltage values before we start the current computations. The voltage of a node is defined as:

V(i) = ( Σ_{j} w(i, j) · V(j) ) / ( Σ_{j} w(i, j) )    (4)

where w(i, j) is the conductance or weight of the edge between nodes i and j, as defined in Eq. (1).
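The sketch below illustrates Phase B under the same assumptions, reusing the SINK label and the modified graph from the earlier construction sketch: voltages come from the Kirchhoff-style linear system behind Eq. (4) with the source held at voltage 1 and the sink grounded at 0 (the boundary values are our assumption), currents follow Eq. (3), and source-to-sink paths are added in decreasing order of total current until n_ref nodes are collected. The path enumeration with a small cutoff is a simplification of the dynamic-programming refinement described above.

```python
# Sketch: voltages, currents, and current-based refinement (Phase B).
import numpy as np
import networkx as nx

def solve_voltages(gp: nx.Graph, source, sink=SINK, v_src: float = 1.0) -> dict:
    """Solve V(i) = sum_j w(i,j)V(j) / sum_j w(i,j) for all i except the source
    (fixed at v_src, an assumed boundary value) and the grounded sink (0)."""
    free = [x for x in gp.nodes() if x not in (source, sink)]
    idx = {x: k for k, x in enumerate(free)}
    A, b = np.zeros((len(free), len(free))), np.zeros(len(free))
    for i in free:
        A[idx[i], idx[i]] = sum(gp[i][j].get("weight", 1.0) for j in gp.neighbors(i))
        for j in gp.neighbors(i):
            w = gp[i][j].get("weight", 1.0)
            if j == source:
                b[idx[i]] += w * v_src          # known source voltage
            elif j != sink:                     # the sink contributes 0 (grounded)
                A[idx[i], idx[j]] -= w
    volt = dict(zip(free, np.linalg.solve(A, b)))
    volt[source], volt[sink] = v_src, 0.0
    return volt

def refine_neighborhood(gp: nx.Graph, source, n_ref: int, sink=SINK) -> set:
    """Add simple source->sink paths in decreasing order of total current (Eq. (3))
    until the refined neighborhood reaches n_ref nodes."""
    volt = solve_voltages(gp, source, sink)
    def total_current(path):
        return sum(gp[a][c].get("weight", 1.0) * abs(volt[a] - volt[c])
                   for a, c in zip(path, path[1:]))
    paths = sorted(nx.all_simple_paths(gp, source, sink, cutoff=6),  # cutoff: assumption
                   key=total_current, reverse=True)
    refined = {source}
    for p in paths:
        refined.update(x for x in p if x != sink)
        if len(refined) >= n_ref:
            break
    return refined
```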

Example 2

Given the expanded neighborhood in Example 1, the second phase of RECS gradually refines it on a current basis, as shown in Fig. 3. We first compute the voltages by solving the linear system in Eq. (4), and annotate the nodes with them in (b). Then, the current flow of each edge connecting nodes in the expanded neighborhood is computed using Eq. (3) such that the 'downhill constraint' is satisfied (current flows from high to low voltage), as shown over the red dotted edges in (b). Given the current values, we enumerate all possible paths between u and z, and give their total current flows in (f). The paths are then added into G_ref in descending order of total current until the stopping criterion is satisfied. In (c), we show the first such path. Assuming the size of the refined neighborhood is n_ref, the final neighborhood is given in (d).

Figure 3: Neighborhood refinement example.

Remark 1: RECS neighborhood vs. context in baseline methods. Unlike existing representation learning methods: (1) We preserve the local and global structure of the network by accounting for both the immediate neighbors and neighbors at increasing distances of the source node when identifying its neighborhood; (2) We generate neighborhoods on both distance and network-flow bases; (3) We address the issue of high-degree nodes; (4) We identify neighborhoods while maximizing proximity among non-adjacent nodes, which are abundant in real-world networks; and (5) We design our algorithm such that it yields consistent, stable representations that suit single- and multi-graph problems.

Remark 2: RECS vs. the connection subgraph algorithm [7]. It is important to note that the computations of 'current' (in RECS) and 'delivered current' (in [7]) are different. The computation of current is not as informative as delivered current, but it is more efficient. The use of delivered current was not a major bottleneck in [7], because that algorithm only processes one subgraph. However, we find that it is problematic for generating multiple neighborhoods due to: (1) The large size of the expanded subgraph, n_exp; (2) The large size of the refined subgraph, n_ref (on the order of 800), compared to the display-generation subgraph size, which is capped at a few tens of nodes; and (3) The extremely large number of subgraphs (equal to the number of nodes n) that need to be processed to ultimately generate node neighborhoods.

3.2 RECS Step 2: Node Representation Vector Update

After identifying node neighborhoods in a graph, we aim to learn node representations via the standard SkipGram model [16]. However, since RECS aims to yield completely deterministic representations, we avoid the randomness introduced by the SkipGram model by using the same random seed every time we employ it. The SkipGram objective maximizes the log-probability of observing the neighborhood generated during the neighborhood definition step, given each node's feature representation:

max_f  Σ_{u ∈ V} log Pr( N_R(u) | f(u) )    (5)

where N_R(u) is the refined neighborhood of node u, and f(u) is its feature representation. Following common practice, we make the maximum likelihood optimization tractable by making two assumptions:

Assumption 1 – Conditional independence. We assume that the likelihood of observing a node in u's refined neighborhood is independent of observing any other neighborhood node, given u's feature representation f(u):

Pr( N_R(u) | f(u) ) = Π_{v ∈ N_R(u)} Pr( v | f(u) )    (6)

where v represents any node that belongs to node u's refined neighborhood N_R(u).

Assumption 2 – Symmetry in feature space. The source node u and any node v in its refined neighborhood N_R(u) have a symmetric impact on each other in the continuous feature space. Therefore, the conditional probability Pr(v | f(u)) is modeled using the softmax function:

Pr( v | f(u) ) = exp( f(v) · f(u) ) / Σ_{t ∈ V} exp( f(t) · f(u) )    (7)

Based on the above two assumptions, we can simplify the objective in Eq. (5) as follows:

max_f  Σ_{u ∈ V} [ − log Z_u + Σ_{v ∈ N_R(u)} f(v) · f(u) ],   where  Z_u = Σ_{t ∈ V} exp( f(u) · f(t) )    (8)

It is important to note that computing the per-node partition function Z_u for every node in large-scale graphs is computationally expensive. Therefore, we approximate it using negative sampling [17]. We optimize the objective in Eq. (8) using stochastic gradient descent.
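As a concrete illustration of this step, the refined neighborhoods can be fed to an off-the-shelf SkipGram implementation with negative sampling. The sketch below uses gensim (4.x parameter names) with a fixed seed and a single worker, which is the usual way to obtain fully reproducible training; the hyperparameter values are placeholders, not the ones used in the paper.

```python
# Sketch: learning node representations with SkipGram + negative sampling (gensim 4.x).
from gensim.models import Word2Vec

def learn_representations(neighborhoods, dim=128, seed=42):
    """neighborhoods: list of [source, v1, v2, ...] node-id lists, one per node,
    produced by the neighborhood definition step."""
    sentences = [[str(x) for x in nbh] for nbh in neighborhoods]
    model = Word2Vec(
        sentences,
        vector_size=dim,   # representation dimensionality d
        window=10,         # context window over the neighborhood; placeholder value
        sg=1,              # SkipGram
        negative=5,        # negative sampling instead of the full softmax of Eq. (7)
        min_count=0,
        seed=seed,         # fixed seed ...
        workers=1,         # ... plus a single worker for reproducible training
        epochs=5,          # placeholder value
    )
    return {node: model.wv[node] for node in model.wv.index_to_key}
```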

4 Experiments

In this section, we aim to answer the following questions: (Q1) How does RECS perform in multi-label classification compared to baseline representation learning approaches? (Q2) How stable are the representations that RECS and the baseline methods learn? (Q3) How sensitive is RECS to its hyperparameters? Before we answer these questions, we provide an overview of the datasets and the baseline representation learning algorithms that we use in our evaluation.

Datasets. To showcase the generalization capability of RECS over distinct domains, we use a variety of datasets, which we briefly describe in Table 2.

Dataset          | # Vertices | # Edges   | # Labels | Network Type
PPI [5]          | 3,890      | 76,584    | 50       | Biological
Wikipedia [15]   | 4,777      | 184,812   | 40       | Language
BlogCatalog [24] | 10,312     | 333,983   | 39       | Social
CiteSeer [22]    | 3,312      | 4,660     | 6        | Citation
Flickr [24]      | 80,513     | 5,899,882 | 195      | Social
Table 2: A brief description of evaluation datasets.

Baseline Algorithms. We compare RECS with three state-of-the-art baselines: DeepWalk [18], node2vec [9], and Walklets [19]. We choose these methods because they all define node neighborhoods through random walks, i.e., in a randomized way; in contrast, RECS is completely deterministic, which makes it applicable to single- and multi-graph problems. For all of the methods, we use the same number of walks per node, walk length, neighborhood size, and number of representation dimensions d. For node2vec, we set the return parameter p and the in-out parameter q so as to capture the homophily and the structural equivalence connectivity patterns, respectively. For Walklets, we set the feature representation scale k, which captures the relationships at scale k.

Experimental Setup. For the RECS parameter settings, we fix the expansion neighborhood subgraph size n_exp. To allow a fair comparison with the baseline methods, we set the refinement neighborhood subgraph size n_ref and the number of dimensions d of the feature representation in line with the values used for DeepWalk, node2vec, and Walklets.

4.1 Q1. Multi-label Classification

Setup. Multi-label classification is a canonical single-graph task, where each node in a graph is assigned one or more labels from a finite label set L. We input the learned node representations to a one-vs-rest logistic regression classifier with L2 regularization. We perform cross-validation and report the mean Micro-F1 score. We omit the results of other evaluation metrics (e.g., the Macro-F1 score), because they follow the exact same trend. It is worth mentioning that multi-label classification is a challenging task, especially when the finite set of labels L is large or the fraction of labeled vertices is small [21].
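For reference, the evaluation described above corresponds to the following scikit-learn sketch; the variable names and the single train/test split are illustrative (the paper's cross-validation is replaced here by one split per labeled fraction), and Y is a binary indicator matrix over the label set L.

```python
# Sketch: one-vs-rest logistic regression with Micro-F1 scoring.
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

def evaluate_micro_f1(X, Y, labeled_fraction=0.5, seed=0):
    """X: node embeddings (n x d); Y: binary label indicator matrix (n x |L|)."""
    X_tr, X_te, Y_tr, Y_te = train_test_split(
        X, Y, train_size=labeled_fraction, random_state=seed)
    clf = OneVsRestClassifier(LogisticRegression(penalty="l2", max_iter=1000))
    clf.fit(X_tr, Y_tr)
    return f1_score(Y_te, clf.predict(X_te), average="micro")
```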

Results. In Table 3, we report the performance of RECS and compare it to the three state-of-the-art representation learning methods. Our results are statistically significant. Overall, RECS outperforms or is competitive with the baseline methods, while also having the benefit of generalizing to the multi-network problems that the other methods fail to address. Below we discuss the experimental results by dataset.
PPI: Across the various percentages of labeled nodes, RECS outperforms all the baselines. For instance, RECS is more effective than DeepWalk by 36.85% when the labeled nodes are sparse (10%), by 19.08% for 50% of labeled nodes, and by 17.55% when the percentage of labeled nodes is 90%.
Wikipedia: We observe that RECS outperforms the three baseline algorithms by up to 10.48% when using 90% of labeled nodes. In the only case where RECS does not beat node2vec, it is ranked second.
BlogCatalog: We observe that RECS has comparable or better performance than DeepWalk and Walklets for the various percentages of labeled nodes; specifically, it outperforms DeepWalk by up to 4.55% and Walklets by up to 19.80% (Table 3). For more labeled nodes, RECS achieves similar performance to node2vec.
CiteSeer: Similar to Wikipedia, RECS outperforms the state-of-the-art algorithms in nearly all cases, and achieves a maximum gain of 7.13% with 90% of labeled nodes.
Flickr: RECS outperforms the other three baselines by up to 6.79%, when using 50% of labeled nodes.
Discussion: From the results, it is evident that RECS mostly outperforms the baseline techniques on the PPI, Wikipedia, CiteSeer, and Flickr networks, with a few exceptions where RECS is very close to the best method. This can be attributed to RECS being better at preserving the global structure of such networks. On the other hand, although RECS has very comparable performance with node2vec on the BlogCatalog dataset, it might be that the 2nd-order biased random walks of node2vec are slightly better at preserving the homophily and structural equivalence connectivity patterns in social networks.

Algorithm  | PPI (10% / 50% / 90%) | Wikipedia (10% / 50% / 90%) | BlogCatalog (10% / 50% / 90%) | CiteSeer (10% / 50% / 90%) | Flickr (10% / 50% / 90%)
DeepWalk   | 12.35 / 18.23 / 20.39 | 42.33 / 44.57 / 46.19       | 30.12 / 34.28 / 34.83         | 46.56 / 52.01 / 53.32      | 37.70 / 39.62 / 42.36
node2vec   | 16.19 / 20.64 / 21.75 | 44.38 / 48.37 / 48.85       | 34.53 / 36.94 / 37.99         | 50.92 / 52.49 / 56.72      | 38.90 / 41.39 / 43.91
Walklets   | 16.07 / 21.44 / 22.10 | 43.69 / 44.68 / 45.17       | 26.90 / 29.09 / 30.41         | 47.89 / 52.73 / 54.83      | 38.32 / 40.58 / 42.62
RECS       | 16.91 / 21.71 / 23.97 | 45.68 / 48.10 / 49.90       | 31.02 / 34.85 / 36.42         | 48.80 / 53.36 / 57.12      | 38.98 / 42.31 / 44.26
G.O. DWalk | 36.85 / 19.08 / 17.55 | 7.90 / 7.91 / 8.03          | 3.00 / 1.63 / 4.55            | 4.80 / 2.59 / 7.13         | 3.40 / 6.79 / 4.49
G.O. N2vec | 4.41 / 5.16 / 10.19   | 2.92 / - / 2.14             | - / - / -                     | - / 1.63 / 0.70            | 0.21 / 2.22 / 0.80
G.O. Walk  | 5.19 / 1.23 / 8.47    | 4.53 / 7.64 / 10.48         | 15.27 / 19.80 / 19.75         | 1.87 / 1.18 / 4.17         | 1.72 / 4.26 / 3.85
Table 3: Micro-F1 scores for multi-label classification on various datasets, for 10%, 50%, and 90% of labeled nodes. By "G.O." we denote the gain of RECS over the corresponding baseline (in %); "-" marks the cases where the baseline performs better.

4.2 Q2. Representation Learning Stability

Setup. Surveying the existing node representation learning methods, we observe that the tasks on which such algorithms are evaluated are limited to single-graph tasks, i.e., prediction, recommendation, node classification, and visualization. Since many tasks involve multiple networks (e.g., graph similarity [12], graph alignment [2], temporal graph anomaly detection [12], and brain network analysis for a group of subjects [6]), we seek to examine the suitability of representation learning approaches for multi-network settings. [11] states that existing embedding algorithms are inappropriate for multi-graph problems, and attributes this to the fact that different runs of an algorithm yield different representations, even if the same dataset is used. To that end, RECS is fully deterministic, with the goal of achieving stable and robust outcomes. We evaluate this stability with respect to the following criteria: (1) Representation stability, where we verify the similarity of the learned vectors across different independent runs of the algorithms; and (2) Performance stability, where we use embeddings from different runs in a classification task and measure the variation in classification performance. Ideally, a robust embedding should satisfy both criteria.

Results. Here we list the results of the two stability experiments.
Representation stability. Figure 4 plots the embeddings from two different runs of each approach against each other for a randomly selected set of nodes. We visualize the results for three randomly selected dimensions of node2vec, DeepWalk, and Walklets. For RECS, we intentionally choose the same three dimensions selected for each of the baseline methods; in the interest of space, we only show the RECS visualization using the same three dimensions used for Walklets. The results are equivalent for all dimensions. If all points fall on (or close to) the diagonal, this indicates stability, which is a desirable attribute of a robust graph embedding. Figures 4(a-c) show that, as expected, node2vec, DeepWalk, and Walklets suffer from significant variation across runs. On the contrary, Figure 4d shows that RECS obtains perfectly consistent embeddings across runs, and is thus robust.
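A simple way to reproduce this check is sketched below: embed the same graph twice, pick a few dimensions, and plot the values of run 1 against run 2; for a deterministic method the points fall on the line y = x. The embed(...) callable is a placeholder for RECS or any of the baselines.

```python
# Sketch: per-dimension stability comparison of two embedding runs.
import numpy as np
import matplotlib.pyplot as plt

def stability_scatter(embed, graph, dims=(0, 1, 2), sample=200, seed=0):
    """embed(graph) -> (n x d) array; placeholder for RECS or a baseline."""
    E1, E2 = embed(graph), embed(graph)          # two independent runs
    rng = np.random.default_rng(seed)
    rows = rng.choice(E1.shape[0], size=min(sample, E1.shape[0]), replace=False)
    fig, axes = plt.subplots(1, len(dims), figsize=(4 * len(dims), 4))
    for ax, d in zip(np.atleast_1d(axes), dims):
        ax.scatter(E1[rows, d], E2[rows, d], s=8)
        lim = [min(E1[rows, d].min(), E2[rows, d].min()),
               max(E1[rows, d].max(), E2[rows, d].max())]
        ax.plot(lim, lim, linestyle="--")        # the y = x reference line
        ax.set_xlabel(f"run 1, dim {d}")
        ax.set_ylabel(f"run 2, dim {d}")
    plt.tight_layout()
    plt.show()
    print("identical across runs:", np.allclose(E1, E2))
```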

Figure 4: PPI data: Comparison of embeddings per dimension for a random sample of nodes. Node2vec, DeepWalk, Walklets, and RECS are each run twice. The x-axis shows the representation values from the first run, and the y-axis those from the second run. Three dimensions are selected randomly for each algorithm. The RECS-based representations are robust across runs (they fall perfectly on the diagonal line y = x), which is not the case for node2vec, DeepWalk, and Walklets. The results are consistent across all the datasets.

Performance stability. The representation learning literature has routinely overlooked the instability/randomness of the learned representations and its effect on the performance of downstream tasks. In other words, our performance stability hypothesis states that, in addition to representation quality, representation stability also matters. To test it, we run node2vec, the approach that sometimes outperformed ours in the classification task, 10 times on the evaluation datasets to see whether unstable embeddings can statistically impact multi-label classification performance. For all the datasets, we obtain a statistically significant p-value; we show the Wikipedia results in Figure 5. Therefore, in addition to the quality of the learned representations, performance can be compromised by their instability. This emphasizes the significance of learning node representations robustly.
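The corresponding performance-stability check can be sketched as follows: re-embed the graph several times, score each embedding with the classification sketch from Section 4.1 (evaluate_micro_f1), and inspect the spread of the scores, as in the boxplot of Figure 5. The number of repetitions is a placeholder, and the statistical test applied to the resulting scores is left open here.

```python
# Sketch: variation of downstream Micro-F1 across repeated embedding runs.
import numpy as np
import matplotlib.pyplot as plt

def performance_stability(embed, graph, Y, runs=10, labeled_fraction=0.5):
    """embed(graph) -> (n x d) array (placeholder for node2vec or another method).
    The train/test split is kept fixed so only the embedding varies across runs."""
    scores = [evaluate_micro_f1(embed(graph), Y, labeled_fraction, seed=0)
              for _ in range(runs)]
    print(f"Micro-F1 over {runs} runs: mean={np.mean(scores):.4f}, std={np.std(scores):.4f}")
    plt.boxplot(scores)
    plt.ylabel("Micro-F1")
    plt.show()
    return scores
```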

Figure 5: Micro-F1 score Boxplots of node2vec on the Wikipedia dataset. Unstable embeddings across multiple runs can statistically impact the performance of the classification task.

4.3 Q3. Parameter Sensitivity

For the sensitivity analysis, we use the Wikipedia dataset with a fixed fraction of labeled nodes. We perform the following three experiments:

Size of the expansion neighborhood subgraph n_exp. First, we demonstrate the impact of varying the size of the expanded neighborhood, n_exp, on the multi-label classification task. We run RECS while varying n_exp over a range of values in fixed increments, limiting the size of the refined neighborhood n_ref. Figure 6-a shows the Micro-F1 score results. We observe that the Micro-F1 score increases with n_exp up to a certain limit, and starts to decrease afterwards. This can be attributed to the fact that enlarging n_exp beyond that limit introduces noise into the generated neighborhood, which ultimately compromises the performance.

Figure 6: Performance sensitivity analysis of RECS.

Size of the refinement neighborhood subgraph n_ref. Fixing the size of the expanded neighborhood n_exp, we now examine the impact of altering the size of the refined neighborhood, n_ref, on the multi-label classification task. For that, we run RECS while varying n_ref over a range of values in fixed increments, with the expanded neighborhood size held constant. Figure 6-b shows the Micro-F1 results. We observe that increasing n_ref is accompanied by an increase in the Micro-F1 score. This is rooted in the fact that enlarging n_ref includes more useful information in the refined neighborhoods, which the SkipGram model [16] leverages to learn and update the node representations.

Number of dimensions d. Fixing the sizes of the expanded subgraph n_exp and the refined subgraph n_ref, we demonstrate the impact of varying the number of representation dimensions, d, on the multi-label classification task. For that, we run RECS while varying d over a range of values. Figure 6-c shows the Micro-F1 results. We note that the Micro-F1 score steadily increases with d up to a certain value and starts to drop afterwards. We attribute this to the fact that using a larger number of dimensions can introduce unrelated dimensions into the representation space, which eventually hurts the performance.
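The three experiments above follow the same pattern, sketched below with a hypothetical grid and placeholder defaults (the n_ref default of 800 echoes Remark 2; the other values are illustrative): vary one of n_exp, n_ref, or d while holding the others fixed, re-run the pipeline, and record the Micro-F1 score, as in Figure 6. Here embed_fn stands in for the full RECS pipeline and evaluate_micro_f1 is the classification sketch from Section 4.1.

```python
# Sketch: one-at-a-time hyperparameter sensitivity sweep (cf. Figure 6).
def sensitivity_sweep(embed_fn, graph, Y, param_grid, base=None):
    """embed_fn(graph, n_exp=..., n_ref=..., d=...) -> (n x d) embedding matrix.
    param_grid example: {"d": [16, 32, 64, 128, 256]} -- illustrative values only."""
    base = base or {"n_exp": 500, "n_ref": 800, "d": 128}   # placeholder defaults
    results = {}
    for name, values in param_grid.items():
        for v in values:
            cfg = dict(base, **{name: v})                   # vary one parameter
            results[(name, v)] = evaluate_micro_f1(embed_fn(graph, **cfg), Y, 0.5)
    return results
```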

5 Conclusion

We propose RECS, a novel and stable representation learning algorithm based on connection subgraphs. In contrast to the baseline representation learning algorithms, RECS generates entirely deterministic representations, which makes it appealing for both single- and multi-graph problems. We empirically demonstrate RECS's efficacy and stability relative to state-of-the-art approaches: experiments show that RECS is more effective than, or as effective as, the baselines, and is completely stable. In future work, we will address the interpretability aspect, which is not well examined in the representation learning literature. We will also address the issue of embedding updates, especially for a recently joined node that has no evident connections. This problem is closely related to the "cold-start" problem in recommendation systems, where a new user joins the system and external information is sought in order to properly build their profile. Similarly, we will explore different forms of external context and meta-data for recently joined nodes, which can help us address connection sparsity.

References

  • [1] Leman Akoglu, Duen Horng Chau, Jilles Vreeken, Nikolaj Tatti, Hanghang Tong, and Christos Faloutsos. Mining Connection Pathways for Marked Nodes in Large Graphs, pages 37–45. 2013.
  • [2] Mohsen Bayati, Margot Gerritsen, David F Gleich, Amin Saberi, and Ying Wang. Algorithms for large, sparse network alignment problems. In Data Mining, 2009. ICDM’09. Ninth IEEE International Conference on, pages 705–710. IEEE, 2009.
  • [3] Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence, 35(8):1798–1828, 2013.
  • [4] Smriti Bhagat, Graham Cormode, and S Muthukrishnan. Node classification in social networks. In Social network data analytics, pages 115–148. Springer, 2011.
  • [5] Bobby-Joe Breitkreutz, Chris Stark, Teresa Reguly, Lorrie Boucher, Ashton Breitkreutz, Michael Livstone, Rose Oughtred, Daniel H Lackner, Jürg Bähler, Valerie Wood, et al. The biogrid interaction database: 2008 update. Nucleic acids research, 36(suppl_1):D637–D640, 2007.
  • [6] Fabrizio De Vico Fallani, Jonas Richiardi, Mario Chavez, and Sophie Achard. Graph analysis of functional brain networks: practical issues in translational neuroscience. Phil. Trans. R. Soc. B, 369(1653):20130521, 2014.
  • [7] Christos Faloutsos, Kevin S McCurley, and Andrew Tomkins. Fast discovery of connection subgraphs. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 118–127. ACM, 2004.
  • [8] Palash Goyal and Emilio Ferrara. Graph embedding techniques, applications, and performance: A survey. arXiv preprint arXiv:1705.02801, 2017.
  • [9] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 855–864. ACM, 2016.
  • [10] M. Heimann, H. Shen, and D. Koutra. Node Representation Learning for Multiple Networks: The Case of Graph Alignment. ArXiv e-prints, 2018.
  • [11] Mark Heimann and Danai Koutra. On generalizing neural node embedding methods to multi-network problems. In ACM SIGKDD International Workshop on Mining and Learning with Graphs (MLG), 2017.
  • [12] Danai Koutra, Joshua T Vogelstein, and Christos Faloutsos. Deltacon: A principled massive-graph similarity function. In Proceedings of the 2013 SIAM International Conference on Data Mining, pages 162–170. SIAM, 2013.
  • [13] Quoc Le and Tomas Mikolov. Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), pages 1188–1196, 2014.
  • [14] David Liben-Nowell and Jon Kleinberg. The link-prediction problem for social networks. Journal of the Association for Information Science and Technology, 58(7):1019–1031, 2007.
  • [15] Matt Mahoney. Large text compression benchmark, 2011.
  • [16] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
  • [17] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119, 2013.
  • [18] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 701–710. ACM, 2014.
  • [19] Bryan Perozzi, Vivek Kulkarni, and Steven Skiena. Walklets: Multiscale graph embeddings for interpretable network classification. arXiv preprint arXiv:1605.02115, 2016.
  • [20] José F. Rodrigues, Jr., Hanghang Tong, Agma J. M. Traina, Christos Faloutsos, and Jure Leskovec. Gmine: A system for scalable, interactive graph visualization and mining. In Proceedings of the 32Nd International Conference on Very Large Data Bases, VLDB ’06, pages 1195–1198. VLDB Endowment, 2006.
  • [21] Ryan A Rossi, Rong Zhou, and Nesreen K Ahmed. Deep feature learning for graphs. arXiv preprint arXiv:1704.08829, 2017.
  • [22] Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Galligher, and Tina Eliassi-Rad. Collective classification in network data. AI magazine, 29(3):93, 2008.
  • [23] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. Line: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, pages 1067–1077. ACM, 2015.
  • [24] Lei Tang, Xufei Wang, and Huan Liu. Scalable learning of collective behavior. IEEE Transactions on Knowledge and Data Engineering, 24(6):1080–1091, 2012.
  • [25] Hanghang Tong and Christos Faloutsos. Center-piece subgraphs: Problem definition and fast solutions. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 404–413. ACM, 2006.
  • [26] Xiao Yu, Xiang Ren, Yizhou Sun, Quanquan Gu, Bradley Sturt, Urvashi Khandelwal, Brandon Norick, and Jiawei Han. Personalized entity recommendation: A heterogeneous information network approach. In Proceedings of the 7th ACM international conference on Web search and data mining, pages 283–292. ACM, 2014.
  • [27] Rong Zhou and Eric A Hansen. Breadth-first heuristic search. Artificial Intelligence, 170(4-5):385–408, 2006.