Contrastive learning (CL) is an emerging analysis approach that aims to discover unique patterns in one dataset relative to another. By applying this approach to network analysis, we can reveal unique characteristics in one network by contrasting it with another. For example, with networks of protein interactions obtained from normal and cancer tissues, we can discover unique types of interactions in cancer tissues. However, existing CL methods cannot be directly applied to networks. To address this issue, we introduce a novel approach called contrastive network representation learning (cNRL). This approach embeds network nodes into a low-dimensional space that reveals the uniqueness of one network compared to another. Within this approach, we also design a method, named i-cNRL, that offers interpretability in the learned results, allowing users to understand which specific patterns are found in one network but not the other. We demonstrate the capability of i-cNRL with multiple network models and real-world datasets. Furthermore, we provide quantitative and qualitative comparisons across i-cNRL and other potential cNRL algorithm designs.
Networks are commonly used to model various types of relationships in real-world applications, such as social networks (crnovrsanin2014visualization), cellular networks (chen2004content), and communication networks (bhanot2005optimizing). Comparative analysis of networks is an essential task in practice, where we want to identify differentiating factors between two networks or the uniqueness of one network compared to another (emmert2016fifty; tantardini2019comparing). For instance, a neuroscientist studying the effect of Alzheimer’s disease on the human brain (gaiteri2016genetic) may want to compare the brain network of a patient with Alzheimer’s disease to that of a healthy subject. Also, for collaboration networks of researchers in different fields (lariviere2006canadian), an analyst in a funding agency may want to discover unique collaboration patterns in each field to inform decision making.
Several approaches have been proposed for network comparison (tantardini2019comparing). When two different networks have the same node-set and the pairwise correspondence between nodes is known, we can compute a similarity between two networks (e.g., a Euclidean distance between two adjacency matrices). When the node-correspondence is unknown or does not exist, a network-statistics based approach is commonly used (e.g., the clustering coefficient, network diameter, or node degree distribution). Another popular approach is using graphlets (tantardini2019comparing)—small, connected, and non-isomorphic subgraph patterns in a graph (e.g., the complete graph of three nodes). The similarities of two networks can be characterized by comparing the frequency of appearance of each graphlet in each network.
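As an illustration (ours, not code from the paper), the network-statistics based comparison can be sketched in Python with NumPy, here using the degree distribution as the single selected measure; the function names are our own:

```python
import numpy as np

def degree_distribution(adj):
    """Normalized histogram of node degrees (undirected adjacency matrix)."""
    deg = adj.sum(axis=1)
    hist = np.bincount(deg.astype(int), minlength=int(deg.max()) + 1)
    return hist / hist.sum()

def statistic_distance(adj_a, adj_b):
    """Compare two networks via one statistic: the L1 distance between
    their degree distributions (padded to equal length)."""
    pa, pb = degree_distribution(adj_a), degree_distribution(adj_b)
    n = max(len(pa), len(pb))
    pa = np.pad(pa, (0, n - len(pa)))
    pb = np.pad(pb, (0, n - len(pb)))
    return float(np.abs(pa - pb).sum())

# A triangle vs. a path of three nodes
tri = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
d = statistic_distance(tri, path)
```

A distance of zero only means the two networks are indistinguishable under this one measure, which is exactly the limitation discussed next.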
While the existing approaches can provide a (dis)similarity between different networks, they compare networks based only on one selected measure (e.g., node degree), which is often insufficient. Also, these approaches only provide network-level similarities and thus cannot compare networks at more detailed levels (e.g., the node level). Without such a detailed comparison, it is difficult to find which part of a network relates to its uniqueness.
To address these challenges, we introduce a new approach that integrates the concept of contrastive learning (CL) (zou2013contrastive; abid2018exploring) with network representation learning (NRL), which we call cNRL. Within cNRL, the NRL enables the characterization of networks with comprehensive measures, without overwhelming a user with information, by embedding nodes into a low-dimensional space; the CL allows for discovering unique patterns in one dataset relative to another (abid2018exploring). By leveraging the benefits of both, we can reveal unique patterns in one network by contrasting it with another, in a thorough (i.e., using multiple essential measures to capture network characteristics) and detailed (i.e., at the node or subnetwork level) manner.
With our approach, we consider the generality and interpretability of cNRL, and contribute a method called i-cNRL. First, i-cNRL is designed not to require node-correspondences or network alignment (emmert2016fifty), and thus is applicable to various networks. Also, unlike many other NRL methods (e.g., node2vec (grover2016node2vec)
and graph neural networks (GNNs)
(zhang2018deep)), i-cNRL offers interpretability (adadi2018peeking), providing information about the meaning of an identified pattern and the reason why that pattern is seen in only one network. In summary, our main contributions include:
A new approach, called contrastive network representation learning (cNRL), which aims to reveal unique patterns in one network relative to another network.
A method exemplifying cNRL, called i-cNRL, which (1) offers general applicability, including networks without node-correspondence or network alignment, (2) provides interpretability for helping understand revealed patterns, and (3) equips automatic hyperparameter selection for CL.
Experiments with multiple network models and real-world datasets, which demonstrate i-cNRL’s capability for comparative network analysis.
Quantitative and qualitative comparisons with other potential designs of cNRL methods.
We here define the problem to be addressed by contrastive network representation learning. Given two different networks, a target network and a background network, we want to seek unique patterns in the target relative to the background. Similar to contrastive learning (zou2013contrastive), the unique patterns can be represented as relationships (e.g., the structural differences among network nodes) that appear in the target network but do not appear in the background network.
For example, when finding unique patterns in a scale-free network (i.e., its node-degree distribution follows a power law (barabasi2016network)) relative to a random network (i.e., each node pair is connected with a fixed probability (barabasi2016network)), we should be able to capture unique patterns related to node degrees, since the scale-free network has more variety in node degrees. In practice, the unique patterns could relate to more complicated centralities, other measures, combinations of them, and many more. Note that, as with existing work on contrastive learning (zou2013contrastive; abid2018exploring), cNRL does not aim to discriminate the target from the background, but to identify unique patterns in the target.

To provide an illustrative example of analysis with cNRL, we begin by comparing two different social networks. We use the Dolphin social network (lusseau2003bottlenose) as the target and Zachary’s karate club network (zachary1977information) as the background. Fig. 1 depicts the structures of these networks, and their statistics can be found in Table 1 (see N1 and N2). By comparing these two networks, we want to reveal unique patterns in the Dolphin social network and identify which network characteristics relate to those patterns.
We apply our i-cNRL to the two networks and then plot a 2D embedding result with contrastive PCA (cPCA) (abid2018exploring), as shown in Fig. 1. The x- and y-directions represent the first and second contrastive principal components (cPCs), respectively. Details of i-cNRL and related techniques will be described in Sec. 5. Fig. 1 shows that the target network’s nodes are widely distributed, whereas the background network’s nodes are placed only around the center, revealing patterns specific to the target when compared with the background.
Moreover, since i-cNRL offers interpretability of the learned results, we can analyze why the above patterns appear. As shown in Table 2, the method provides contrastive principal component (cPC) loadings (fujiwara2020supporting), whose absolute values indicate how strongly each learned feature contributes to each cPC direction. Each learned feature can be represented as a combination of a relational function and a base feature (rossi2018deep) (see Sec. 5 for details). Table 2 indicates that feature F1-10 has the highest contribution to cPC1. From its relational function and the base feature ‘eigenvector’ (newman2018networks), this feature is interpreted as “the mean eigenvector centrality of the neighbors of a node.”
ID | Name | # of nodes | # of links | Directed
---|---|---|---|---
N1 | Dolphin (lusseau2003bottlenose) | 62 | 159 | False
N2 | Karate (zachary1977information) | 34 | 78 | False
N3 | Random | 100 | 471 | True
N4 | Price | 100 | 294 | True
N5 | p2p-Gnutella08 (leskovec2007graph; ripeanu2002mapping) | 6,301 | 20,777 | True
N6 | Price 2 | 6,301 | 18,897 | True
N7 | Enhanced Price | 6,301 | 18,281 | True
N8 | Combined-AP/MS (collins2007toward; yu2008high) | 1,622 | 9,070 | False
N9 | LC-multiple (reguly2006comprehensive; yu2008high) | 1,536 | 2,925 | False
N10 | School-Day1 (stehle2011high) | 236 | 5,899 | False
N11 | School-Day2 (stehle2011high) | 238 | 5,539 | False
ID | relational function | base feature | cPC1 | cPC2
---|---|---|---|---
F1-1 | | total-degree | 0.00 | -0.02
F1-2 | | betweenness | -0.00 | -0.00
F1-3 | | closeness | 0.00 | 0.00
F1-4 | | eigenvector | -0.04 | 0.00
F1-5 | | PageRank | 0.04 | 0.04
F1-6 | | Katz | 0.00 | -0.02
F1-7 | | total-degree | -0.06 | -0.08
F1-8 | | betweenness | 0.05 | -0.01
F1-9 | | closeness | -0.08 | 0.01
F1-10 | | eigenvector | 0.26 | 0.02
F1-11 | | PageRank | -0.11 | 0.15
F1-12 | | Katz | -0.08 | -0.09
F1-13 | | PageRank | -0.01 | 0.00
F1-14 | | total-degree | -0.06 | -0.00
F1-15 | | PageRank | 0.01 | -0.00
To investigate the relationships between this feature and the i-cNRL result, we color-code the network nodes in Fig. 1 based on the feature values. We can see that the nodes around the top-left corner of the embedding tend to have smaller feature values, while the nodes around the bottom-right tend to have higher values. By comparing with the colored network layout, we notice that these two node groups correspond to the top-left and bottom-right communities in the Dolphin network. Since the feature value is the mean eigenvector centrality of the neighbors of a node, the nodes in the top-left community, together with their neighbors, tend to have low eigenvector centrality. On the other hand, the nodes in the bottom-right community have neighbors with high eigenvector centrality. In contrast, the background network does not have communities clearly separated by the feature values. Therefore, i-cNRL learns patterns highly related to the eigenvector centralities of each node’s neighbors, which clearly separate the two communities in the Dolphin social network.
Notations for cNRL |
---|---
| target and background networks
| adjacency matrices of the target and background networks
| matrices of node attributes of the target and background networks
| numbers of nodes in the target and background networks
| numbers of attributes in the target and background networks
| numbers of edges in the target and background networks
| numbers of features learned by NRL and CL
| target and background feature matrices
| projection matrix learned by CL
| contrastive representations of the target and background networks
Notations for DeepGL |
| base feature (e.g., in-degree)
| relational function
| relational feature operators for in-, out-, and total neighbors
| summary measure (e.g., mean, sum, and maximum)
| set of learned features with relational feature operators
| set of learned features
| maximum number of relational feature operators to use
Notations for cPCA |
| covariance matrices of the target and background feature matrices
| contrast parameter
Fig. 2 shows the general architecture of cNRL. Notations used in the following sections are listed in Table 3. Current CL methods (zou2013contrastive; abid2018exploring; dirie2019contrastive; abid2019contrastive; severson2019unsupervised) require, as inputs, target and background feature matrices that share the same features. However, the matrices that represent the target and background networks, such as their adjacency matrices, might have different numbers of nodes or no correspondence between the nodes of the two networks. Thus, we cannot directly apply CL methods to the target and background networks. To address this issue, cNRL consists of two main steps: (1) generating feature matrices from the target and background networks by using NRL, and (2) applying CL to these feature matrices.
Below we describe each part of the cNRL architecture, together with the requirements on the inputs and on the NRL and CL algorithms. Here we focus only on node feature learning to keep the explanation simple and clear. However, the architecture is generic enough to be used for link (or edge) feature learning.
Inputs. cNRL takes a target network and a background network as inputs. These networks can be any combination of undirected or directed, unweighted or weighted, and non-attributed or attributed. The two networks do not need to have the same number of nodes. Similarly, their numbers of node attributes may differ.
Network representation learning. The first step in Fig. 2 applies an NRL method to transform the target and background networks into feature matrices. By the nature of its learning purpose, CL requires that the two feature matrices share the same features. Therefore, for this step, we need an NRL method that can produce the same features across networks.
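As a minimal sketch of this requirement (our own illustration, not the paper’s implementation), the following computes the same set of simple node features for any adjacency matrix, so two networks of different sizes yield feature matrices with identical columns:

```python
import numpy as np

def node_features(adj):
    """Compute the same feature columns for any undirected network:
    degree, mean neighbor degree, and local clustering coefficient."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    safe = np.maximum(deg, 1.0)                     # avoid division by zero
    nbr_deg = adj @ deg / safe                      # mean degree of neighbors
    tri = np.diag(adj @ adj @ adj) / 2.0            # triangles through each node
    possible = np.maximum(deg * (deg - 1) / 2.0, 1.0)
    clust = tri / possible                          # local clustering coefficient
    return np.column_stack([deg, nbr_deg, clust])

# Different node counts, but identical feature columns -> CL can be applied
X_t = node_features(np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]]))          # triangle
X_b = node_features(np.array([[0, 1, 0, 0], [1, 0, 1, 0],
                              [0, 1, 0, 1], [0, 0, 1, 0]]))               # 4-node path
```

Any feature definition works as long as it is computed identically for both networks; this is what makes an inductive NRL method suitable here.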
Contrastive learning. Once we obtain the two feature matrices with the same learned features, we can apply any of the CL methods (zou2013contrastive; abid2018exploring; dirie2019contrastive; abid2019contrastive; severson2019unsupervised), using them as the target and background datasets, respectively. CL generates a parametric mapping (or a projection matrix) from the features learned by NRL to contrastive features. With this projection matrix, both feature matrices can be transformed into contrastive representations. As the existing CL works only produced the target representation for their analysis, generating the background representation is optional. However, as demonstrated in Fig. 1, by visualizing both representations in one plot, we can clearly see whether CL has found unique patterns in the target relative to the background.
As a specific method using the architecture above, we describe i-cNRL, which employs DeepGL (rossi2018deep) for NRL and cPCA (abid2018exploring) for CL, along with the design rationale for selecting these algorithms.
As stated in Sec. 4, the NRL step needs to generate feature matrices that share the same features. To achieve this, we can employ any inductive NRL method (rossi2018deep) (e.g., GraphSAGE (hamilton2017inductive) and FastGCN (chen2018fastgcn)). However, we also want to provide interpretability in the contrastive representations obtained by cNRL; thus, the NRL method needs to generate interpretable features. Therefore, we specifically use DeepGL (rossi2018deep) in the first step of i-cNRL.
DeepGL learns node and link features, each consisting of a base feature and a relational function. For a concise explanation, we describe DeepGL for node feature learning only.
A base feature is any simple feature or measure we can obtain for each node. For example, it can be the (weighted) in-, out-, or total-degree, degeneracy (or k-core number) (newman2018networks), PageRank (newman2018networks), or a node attribute (e.g., the gender of a node in a social network).
A relational function is a combination of relational feature operators applied to a base feature. A relational feature operator summarizes the base feature values of the one-hop neighbors of a node; for example, it can compute the mean, sum, or maximum of the base feature values of a node’s one-hop neighbors, and the neighbors can be the in-, out-, or total-neighbors. Together with a summary measure (e.g., mean), an operator is thus defined for each neighbor type; for example, one operator computes the mean base feature value over the in-neighbors of a node. Moreover, relational feature operators can be applied repeatedly: a composed function may first compute the maximum over the in-neighbors of each out-neighbor of a node and then take the mean of these maxima. As these examples show, base features and relational functions are combinations of simple measures and operators; thus, both are interpretable.
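A hedged sketch of such relational feature operators (our simplified version; DeepGL’s actual implementation differs), using NumPy adjacency matrices:

```python
import numpy as np

def rel_mean(adj, x, direction="total"):
    """One relational feature operator: the mean of feature x over each
    node's in-, out-, or total neighbors (adj[i, j] = 1 means an edge i -> j)."""
    adj = np.asarray(adj, dtype=float)
    if direction == "in":
        nbrs = adj.T
    elif direction == "out":
        nbrs = adj
    else:
        nbrs = np.maximum(adj, adj.T)
    counts = np.maximum(nbrs.sum(axis=1), 1.0)   # avoid division by zero
    return nbrs @ np.asarray(x, float) / counts

# Compose operators: mean over out-neighbors of (mean over in-neighbors of degree)
adj = np.array([[0, 1, 1],
                [0, 0, 1],
                [0, 0, 0]])
base = adj.sum(axis=1) + adj.sum(axis=0)   # total degree as the base feature
f = rel_mean(adj, rel_mean(adj, base, "in"), "out")
```

Each composed function remains readable as a chain of simple summaries over neighbor sets, which is exactly the interpretability argument above.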
In DeepGL, we can select as many different base features and relational feature operators as we want to consider. The learning process runs for a user-specified number of iterations, and at the end we obtain all the learned features, each of which is a relational function applied to a base feature. During each iteration, DeepGL prunes redundant features based on the similarities of the obtained feature values. Table 2 shows an example of features learned from the Dolphin social network (lusseau2003bottlenose).
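The pruning step can be sketched as follows (an assumption on our part: we use a pairwise-correlation threshold as the similarity criterion, which may differ from DeepGL’s exact rule):

```python
import numpy as np

def prune_features(X, threshold=0.95):
    """Greedy pruning sketch: keep a feature column only if its absolute
    Pearson correlation with every already-kept column stays below the
    threshold. Returns the indices of kept columns."""
    kept = []
    for j in range(X.shape[1]):
        col = X[:, j]
        if col.std() == 0:                      # constant features carry no signal
            continue
        redundant = any(
            abs(np.corrcoef(col, X[:, k])[0, 1]) >= threshold for k in kept
        )
        if not redundant:
            kept.append(j)
    return kept

X = np.column_stack([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],     # perfectly correlated with column 0 -> pruned
    [4.0, 1.0, 3.0, 2.0],
])
kept = prune_features(X)
```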
As described above, the features learned by DeepGL are combinations of base features and relational functions. Once we obtain the learned features from one network, we can naturally compute the same features for other networks. That is, DeepGL is inductive and can be used for transfer learning (rossi2018deep).

In cNRL, we need to decide which network(s) should be used for learning the features. One possible choice is applying DeepGL to both networks and using the union of the learned features to produce the feature matrices. However, since cNRL aims to identify unique patterns in the target relative to the background, only a set of features capturing the target’s characteristics is required. Thus, we apply DeepGL to the target network and use the learned features for both networks to generate the feature matrices. This also avoids unnecessary computation for learning features from the background network.
The above NRL step generates the target and background feature matrices. The remaining step is learning their contrastive representations through CL. While we can use any CL method, one of our goals is to provide interpretability. Since DeepGL generates interpretable features, we can provide interpretable contrastive representations by using a CL method that reveals interpretable relationships between the features learned by NRL and the features learned by CL. Among the current CL methods (zou2013contrastive; abid2018exploring; dirie2019contrastive; abid2019contrastive; severson2019unsupervised), only contrastive PCA (cPCA) (abid2018exploring) can provide such relationships, by utilizing the linearity of its algorithm in a similar manner to ordinary PCA (jolliffe1986principal). Thus, we select cPCA for the second step of i-cNRL, though it can be replaced with any other interpretable CL method developed in the future.
cPCA (abid2018exploring) is a variant of PCA for CL. Similar to classical PCA (jolliffe1986principal), cPCA first centers each feature of the target and background feature matrices and then obtains the corresponding covariance matrices C_t and C_b. Let v be any unit vector of length d, where d is the number of features. Then, along the direction v, the variances of the target and background can be written as v^T C_t v and v^T C_b v, respectively. The optimization that finds a direction along which the target has high variance but the background has low variance can thus be written as:

max_v (v^T C_t v − α v^T C_b v)    (1)

where α is a contrast parameter (α ≥ 0). Similar to classical PCA, we can obtain the top-k solutions (the eigenvectors of C_t − α C_b with the k largest eigenvalues) as the learned features, called cPCs. With the projection matrix W consisting of these k cPCs (i.e., W is a d × k matrix), we can obtain the contrastive representation of the target feature matrix.
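A minimal NumPy sketch of the cPCA step (our illustration; the published cPCA implementation offers more functionality): the cPCs are the top eigenvectors of the difference of covariance matrices.

```python
import numpy as np

def cpca(X_t, X_b, alpha, k=2):
    """Minimal cPCA sketch: top-k eigenvectors of C_t - alpha * C_b,
    where C_t and C_b are covariances of the centered target and
    background feature matrices."""
    Xt = X_t - X_t.mean(axis=0)
    Xb = X_b - X_b.mean(axis=0)
    Ct = Xt.T @ Xt / (len(Xt) - 1)
    Cb = Xb.T @ Xb / (len(Xb) - 1)
    evals, evecs = np.linalg.eigh(Ct - alpha * Cb)
    order = np.argsort(evals)[::-1]              # descending eigenvalues
    W = evecs[:, order[:k]]                      # projection matrix (d x k)
    return Xt @ W, Xb @ W, W

rng = np.random.default_rng(0)
X_b = rng.normal(size=(200, 3))                     # background varies in all axes
X_t = rng.normal(size=(200, 3)) * [0.1, 0.1, 3.0]   # target varies mostly in axis 2
Y_t, Y_b, W = cpca(X_t, X_b, alpha=1.0, k=1)
```

In the toy data, the background varies equally in all axes while the target varies mostly along the third axis, so the first cPC aligns with that axis.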
The contrast parameter α controls the trade-off between having high target variance and low background variance. When α = 0, the cPCs only maximize the variance of the target, the same as in classical PCA. As α increases, the cPCs place greater emphasis on directions that reduce the variance of the background. Fig. 3 shows the results of cPCA with different α values. Because α has a strong impact on the result, Abid et al. (abid2018exploring) introduced a semi-automatic selection of α utilizing spectral clustering (ng2002spectral). We go one step further and provide a fully automatic selection of α (see Sec. 5.2.3).

By applying cPCA to the target and background feature matrices, we generate the projection matrix and the contrastive representations. Because each feature learned by DeepGL could have a different scale, by default, our method standardizes each feature of both matrices for both learning and projection.
To provide interpretable relationships between the features learned by NRL and the features learned by CL, we compute contrastive PC loadings (cPC loadings), as introduced in (fujiwara2020supporting). The cPC loadings indicate how strongly each input feature contributes to the corresponding cPC. Table 2 shows an example of cPC loadings for the first and second cPCs. As demonstrated in Sec. 3, by referring to the list of features learned via NRL and their cPC loadings, we can interpret the obtained representations.
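As one loading-like proxy (an assumption on our part: we use the correlation between each input feature and each cPC score; the exact definition in (fujiwara2020supporting) may differ):

```python
import numpy as np

def loading_like_scores(X, Y):
    """Correlation between each input feature (columns of X) and each
    component score (columns of Y), as a proxy for cPC loadings."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    num = Xc.T @ Yc
    den = np.outer(np.linalg.norm(Xc, axis=0), np.linalg.norm(Yc, axis=0))
    return num / den

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
Y = X[:, :1] * 2.0                       # a component driven solely by feature 0
L = loading_like_scores(X, Y)
```

Feature 0 should receive a loading of 1, while the unrelated features receive loadings near 0, matching the interpretation used in Sec. 3.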
We now show how to automatically select the contrast parameter α in cPCA. Since we want to maximize the variance of the target feature matrix while simultaneously minimizing the variance of the background feature matrix, we can solve the following ratio problem:

max_v (v^T C_t v) / (v^T C_b v)    (2)

where C_t and C_b are the covariance matrices of the target and background feature matrices.
While directly solving (2) may be difficult, there is a convenient iterative algorithm due to Dinkelbach (Dinkelbach67). The algorithm consists of two steps. Given the current solution v_n, we first compute the ratio α_n = (v_n^T C_t v_n) / (v_n^T C_b v_n), and then find the next solution v_{n+1} = argmax_v (v^T C_t v − α_n v^T C_b v).
Clearly, α_n is just the objective value of our ratio problem (2) evaluated at the current solution v_n. It is easy to show that α_n monotonically increases to the maximum value, and the convergence is usually very quick (e.g., fewer than 10 iterations). Conveniently, the second step, finding the next solution, is just the original cPCA problem (1), where we use α_n as the trade-off parameter. We can also regard cPCA as a (crude) one-shot algorithm for the ratio problem (2), where the user specifies α. One problem with the method above is that α_n approaches infinity when C_b is nearly singular. To avoid this, our method simply adds a small constant value to each diagonal element of C_b. We note that Dinkelbach’s algorithm has also been used in discriminant analysis (GuoLYSW03; JiaNZ09), whose motivation is entirely different from ours.
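The automatic selection can be sketched as follows (our illustration; the regularization constant used here is an assumed default):

```python
import numpy as np

def auto_alpha(Ct, Cb, iters=20, eps=1e-3):
    """Dinkelbach-style iteration sketch for max_v (v' Ct v) / (v' Cb v):
    alternately solve the cPCA problem with the current alpha, then update
    alpha to the achieved ratio. eps regularizes a near-singular Cb."""
    Cb = Cb + eps * np.eye(len(Cb))
    alpha = 0.0
    for _ in range(iters):
        evals, evecs = np.linalg.eigh(Ct - alpha * Cb)
        v = evecs[:, np.argmax(evals)]           # best direction for current alpha
        new_alpha = (v @ Ct @ v) / (v @ Cb @ v)  # objective value at v
        if abs(new_alpha - alpha) < 1e-9:
            break
        alpha = new_alpha
    return alpha, v

# Target varies most along axis 0; the ratio is maximized there
Ct = np.diag([9.0, 1.0, 0.5])
Cb = np.diag([1.0, 1.0, 0.1])
alpha, v = auto_alpha(Ct, Cb)
```

On this diagonal example the iteration converges after one update, reflecting the monotone increase of α_n noted above.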
The time and space complexities of i-cNRL are governed by those of DeepGL and cPCA. Note that the time and space complexities for computing the base features are assumed to be lower than those of DeepGL’s feature learning. For a fixed α, cPCA has time and space complexities similar to those of PCA. Even with the automatic selection of α in Sec. 5.2.3, these complexities stay the same, because the automatic selection usually needs only a small number of iterations (e.g., fewer than 10) and does not require storing additional information. In practice, the number of features learned by NRL should be much smaller than the numbers of nodes and links of the two networks; under this assumption, the computational cost of i-cNRL is dominated by DeepGL.
To the best of our knowledge, our work is the first to propose contrastive learning for networks and to provide a general and interpretable method under this approach. Because little work exists in this exact area, we review typical NRL and CL techniques.
Various NRL methods have been developed for learning latent representations of network nodes and/or links. For a comprehensive overview of NRL methods, refer to recent surveys such as (cai2018comprehensive; zhang2018deep). Here we focus on closely related work on inductive and cross-network embedding methods.
GraphSAGE (hamilton2017inductive) is an inductive NRL method that shares many ideas with DeepGL (rossi2018deep). Analogous to the relational functions in DeepGL, GraphSAGE learns aggregator functions. However, GraphSAGE uses more complex aggregators based on LSTM and max-pooling concepts, compared to DeepGL’s simple aggregators (e.g., mean). Moreover, instead of DeepGL’s feature pruning, GraphSAGE tunes the parameters required by the aggregators and the matrices that weight each learned feature. These differences might enable GraphSAGE to better capture complex characteristics of networks without manual parameter tuning; however, the learned features might be difficult to interpret. FastGCN (chen2018fastgcn) takes a similar approach to GraphSAGE, except that it employs node sampling to save memory. Also, HetGNN (zhang2019heterogeneous) enhances the aggregators to learn representations of heterogeneous networks. These methods, like other GNN variants (zhang2018deep) (e.g., GAT (velivckovic2017graph) and h/cGAO (gao2019graph)), thus suffer from a lack of interpretability in the learned features. Although GNNExplainer (ying2019gnnexplainer) aims to provide interpretable explanations for predictions made by these methods, it does not support explaining the learned features themselves.

The inductive methods learn, from one input network, features that can be generalized to unobserved nodes or other networks. In contrast, cross-network methods generate embeddings directly from multiple input networks. Most cross-network methods focus on finding similarities of nodes across networks, such as for node classification (shen2019network), network similarity calculation (ma2019deep), and network alignment (heimann2018regal). While CrossMVA (chu2019cross) is developed mainly for network alignment, it can produce embeddings that contain both similarity and dissimilarity information. However, a major drawback of CrossMVA is that anchor nodes are required as inputs (i.e., at least a small portion of the node correspondence must be known), which we cannot obtain in many cases (e.g., the example in Sec. 3). Also, CrossMVA’s embeddings of the dissimilarity information only preserve discriminative structures across networks; as a result, it cannot find unique patterns in a specific network.
Unlike discriminant analysis, such as linear discriminant analysis (JiaNZ09), which aims to discriminate data points based on their classes, CL (zou2013contrastive) focuses on finding patterns that contrast one dataset with another (abid2018exploring). Several extended CL methods have been developed; for example, there are contrastive versions of latent Dirichlet allocation (zou2013contrastive) and regression (ge2016rich). More recently, CL methods for representation learning, including cPCA (abid2018exploring), have been introduced (abid2018exploring; dirie2019contrastive; abid2019contrastive; severson2019unsupervised). For example, Dirie et al. (dirie2019contrastive) proposed contrastive multivariate singular spectrum analysis (cMSSA) for decomposing time-series data. Similar to cPCA, cMSSA could provide interpretability by computing the PC loadings; however, cMSSA is not suitable for our case, which handles non-time-series data. On the other hand, the contrastive variational autoencoder (cVAE) (abid2019contrastive; severson2019unsupervised) can be used as the CL method in cNRL. The strength of cVAE over cPCA is that it can find unique patterns in a target dataset even when its data points and latent features have nonlinear relationships. However, cVAE relies on multiple layers of neural networks (NNs), and thus its results are difficult to interpret, as with other NN-based methods. Therefore, to use cVAE for interpretable cNRL, additional effort is needed to help interpret the results.

ID | relational function | base feature | cPC 1 | cPC 2
---|---|---|---|---
Target: Price, Background: Random (Sec. 7.1) | | | |
F2-1 | | total-degree | 0.55 | 0.00
F2-2 | | out-degree | -0.40 | 0.00
F2-3 | | Katz | -0.19 | 0.06
Target: Random, Background: Price (Sec. 7.1) | | | |
F3-1 | | k-core | 1.00 | -0.13
F3-2 | | total-degree | 0.18 | 0.47
F3-3 | | in-degree | -0.10 | -0.25
Target: p2p-Gnutella08, Background: Price 2 (Sec. 7.2.1) | | | |
F4-1 | | k-core | 1.01 | -0.10
F4-2 | | total-degree | 0.22 | 0.30
F4-3 | | in-degree | -0.12 | -0.17
Target: p2p-Gnutella08, Background: Enhanced Price (Sec. 7.2.1) | | | |
F5-1 | | total-degree | -0.23 | 0.00
F5-2 | | in-degree | 0.12 | 0.05
F5-3 | | Katz | 0.10 | -0.05
Target: LC-multiple, Background: Combined-AP/MS (Sec. 7.2.2) | | | |
F6-1 | | Katz | 0.36 | 0.00
F6-2 | | eigenvector | -0.19 | -0.01
F6-3 | | total-degree | -0.14 | 0.02
Target: School-Day2, Background: School-Day1 (Sec. 7.2.3) | | | |
F7-1 | | PageRank | 0.15 | 0.02
F7-2 | | closeness | 0.11 | -0.04
F7-3 | | betweenness | -0.09 | -0.01
In the previous sections, we have introduced the concepts of cNRL and i-cNRL, as well as the related work. We have also demonstrated the effectiveness of i-cNRL in comparing social networks in Sec. 3. To further evaluate the method, we first test i-cNRL with synthetic datasets that are generated with popular network models. Then, we demonstrate several analysis examples using i-cNRL with publicly available real-world datasets (see Table 1). Lastly, we provide quantitative and qualitative comparisons among i-cNRL and other potential cNRL implementations. In each subsection, we list only the information closely related to our findings. Details of learning parameters and results are provided in Appendix C.
We apply i-cNRL to compare two types of synthetic networks: random and scale-free networks (N3 and N4 in Table 1). We generate the random and scale-free networks with Gilbert’s random graph model (barabasi2016network) and Price’s preferential attachment model (newman2018networks), respectively. We produce two 2D embedding results, using one network as the target and the other as the background (Fig. 4). Each result shows unique patterns in the target. The cPC loadings in Table 4 show that the Price network’s unique patterns are related to the degree centralities (e.g., total-degree). This seems to be due to the fact that most nodes have the same number of links in a random network, while a scale-free network contains hubs with a large number of links. In contrast, we can see that the random network’s uniqueness is mostly related to k-core numbers. This is because Price’s model generates a network by adding a new node and then connecting it to a fixed number of existing nodes (e.g., 3 nodes), selected with a certain computed probability. As a result, all nodes in the Price network have the same k-core number (e.g., 3-core).
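For the flavor of these two generators (a sketch with assumed parameters, not the paper’s exact models), both can be written in a few lines of pure Python:

```python
import random

def gilbert_random(n, p, seed=0):
    """Directed G(n, p) sketch: each ordered node pair is linked with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(n)
            if i != j and rng.random() < p]

def price_network(n, m=3, seed=0):
    """Price-style preferential attachment sketch: each new node sends m
    links to existing nodes chosen with probability biased by in-degree."""
    rng = random.Random(seed)
    edges, targets = [], list(range(m))          # seed nodes
    for new in range(m, n):
        chosen = set()
        while len(chosen) < min(m, new):
            chosen.add(rng.choice(targets + list(range(new))))
        for t in chosen:
            edges.append((new, t))
            targets.append(t)                    # bias future choices by in-degree
    return edges

rand_edges = gilbert_random(100, 0.05, seed=1)
price_edges = price_network(100, m=3, seed=1)
```

Because every new node in the Price sketch contributes exactly m out-links, the uniform k-core structure discussed above emerges by construction.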
Designing a network model that can simulate real-world networks is fundamental to understanding network formation mechanisms, performing hypothetical analyses (e.g., what happens if the number of nodes grows?), generating more datasets for machine learning, and so on (goldenberg2010survey). In this case study, we demonstrate the use of i-cNRL to guide the refinement of network models.
Here, we use a peer-to-peer (P2P) network, specifically the Gnutella peer-to-peer file-sharing network (ripeanu2002mapping) available in SNAP (https://snap.stanford.edu/, accessed 2019-2-11) (N5 in Table 1), as the modeling subject. Once we have a P2P network generation model, we can use it for analyzing network robustness, studying effective search strategies on a P2P network (liu2009efficient), etc.
P2P networks are often scale-free (liu2009efficient), so we use Price’s model (newman2018networks) to mimic the P2P network. To identify the characteristics that Price’s model does not simulate well, we set the P2P network (N5) as the target and the Price network (N6) as the background.
The result is shown in Fig. 5. From the cPC loadings in Table 4, we notice that the k-core number (F4-1) has a strong contribution to cPC1. Thus, we color-code the result based on the k-core number, as shown in Fig. 5. We can clearly see that the P2P network has variations in the k-core number, but the Price network does not. Because a k-core number of k indicates that a node connects to at least k other nodes within its core, the Price network differs significantly from the P2P network in terms of network robustness.
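k-core numbers can be computed by iterative peeling; a self-contained sketch (our own, not tied to any particular library):

```python
def core_numbers(n, edges):
    """k-core numbers via iterative peeling (edges treated as undirected):
    repeatedly remove all nodes of degree <= k, raising k when none remain."""
    adj = {i: set() for i in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    deg = {i: len(adj[i]) for i in adj}
    core, remaining, k = {}, set(adj), 0
    while remaining:
        peel = [i for i in remaining if deg[i] <= k]
        if not peel:
            k += 1
            continue
        for i in peel:
            core[i] = k
            remaining.discard(i)
            for j in adj[i]:
                if j in remaining:
                    deg[j] -= 1
    return core

# A 4-clique with one pendant node: clique nodes are 3-core, pendant is 1-core
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
core = core_numbers(5, edges)
```

Comparing the distribution of these values across two networks reproduces the kind of contrast that cPC1 captures here.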
From the result above, we decide to refine Price's model to generate various k-core numbers. As discussed in Sec. 7.1, the problem comes from the fact that Price's model always adds a new node with a fixed number of links. Similar to the dual-Barabási-Albert model (moshiri2018dual), we can avoid this problem by attaching each new node with a variable number of links drawn from a probability distribution. Specifically, we set the model to select the number of links from 1 to 10 with specified probabilities (for details, refer to Sec. C.3.2). Then, we generate a network with this model, referred to as the Enhanced Price network (N7) in Table 1. Next, we apply i-cNRL to the P2P network (as the target) and the Enhanced Price network (as the background). The resultant cPC loadings are listed in Table 4. While the target still seems to have uniqueness in the degree centralities, it no longer does in the k-core number. By iteratively performing refinement procedures such as this one, we can build better network models for simulating real-world networks.

In this case study, we compare "interactome" networks: networks of physical DNA-, RNA-, and protein-protein interactions (yu2008high). Specifically, we compare two interactome networks, Combined-AP/MS (N8 in Table 1) and LC-multiple (N9), available in the CCSB Interactome Database (http://interactome.dfci.harvard.edu/, accessed 2019-1-28). Both networks represent the interactome of the yeast S. cerevisiae; however, they are obtained through different analysis approaches. Combined-AP/MS is generated from two studies using a "high-throughput" approach, specifically affinity purification/mass spectrometry (AP/MS) (collins2007toward). In contrast, LC-multiple is a literature-curated (LC) network derived from multiple "low-throughput" experiments (yu2008high; reguly2006comprehensive). Because each analysis approach has its own strengths in identifying the yeast's interactions, the resulting networks may differ (yu2008high). Comparing these networks is essential to understanding the quality and characteristics of each approach (yu2008high).
Here we analyze the uniqueness of LC-multiple, using LC-multiple as the target and Combined-AP/MS as the background. The 2D embedding result by i-cNRL is shown in Fig. 6(a). We first notice that the target has two distinct regions: one spreading out toward the top-left and the other in the bottom-right quadrant. To understand why this pattern appears, we obtain the cPC loadings (Table 4) and color the nodes by the values of the feature with the top cPC loading for cPC1 (F6-1, a feature derived from the Katz centrality). The result is shown in Fig. 6(b). We observe that moving either left or right along cPC1 tends to correspond to a high value of this feature, as annotated with the green and teal rectangles, respectively. While this feature has a strong positive loading for cPC1, another feature in Table 4 (F6-2, derived from the eigenvector centrality) has a strong negative loading. Therefore, a node with a higher value for F6-2 tends to be placed farther to the left in Fig. 6(b). This indicates that the green rectangle region has high values for both features, while the teal region has low values for the latter (F6-2). This can happen because the eigenvector centrality tends to be low when a node lies in a weakly connected region (newman2018networks), whereas the Katz centrality is high whenever a node is linked by many others.
To visually inspect the above patterns, we draw the network structures of the target and background with SFDP (hu2005efficient) and color them by the values of F6-1 (Fig. 6(c) and (d)). We show only the largest component (newman2018networks) of each network (i.e., nodes outside the largest connected component are filtered out). Fig. 6(d) shows that one strongly connected region around the center contains all the nodes with high feature values. In contrast, in Fig. 6(c), multiple regions contain nodes with high feature values. To investigate further, we select the nodes corresponding to the green and teal regions in Fig. 6(b) and highlight them in Fig. 6(c). We then zoom into the regions around the highlighted nodes. Fig. 6(c)① shows a region related to the nodes in the green rectangle, while Fig. 6(c)② and ③ are two example regions related to the teal rectangle. The nodes in Fig. 6(c)① are strongly connected, but those in Fig. 6(c)② and ③ are not. From these observations, i-cNRL reveals that only the target has two different types of nodes linked to high-Katz-centrality nodes: those in strongly connected regions and those in weakly connected regions.
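The differing behavior of the two centralities can be checked on a toy graph. The sketch below is our own illustration (not the paper's data): it builds a dense 4-node clique with a weakly connected 3-node chain attached, then computes both centralities from their standard linear-algebra definitions.

```python
import numpy as np

# Toy graph: a 4-node clique (dense core) plus a 3-node path
# hanging off node 0 (a weakly connected fringe ending at node 6).
n = 7
A = np.zeros((n, n))
clique = [0, 1, 2, 3]
for i in clique:
    for j in clique:
        if i != j:
            A[i, j] = 1
for u, v in [(0, 4), (4, 5), (5, 6)]:
    A[u, v] = A[v, u] = 1

# Eigenvector centrality: leading eigenvector of the adjacency matrix.
vals, vecs = np.linalg.eigh(A)
ev = np.abs(vecs[:, np.argmax(vals)])

# Katz centrality: x = beta * (I - alpha * A)^{-1} 1, with alpha < 1/lambda_max,
# so every node gets a baseline score regardless of where it sits.
alpha, beta = 0.5 / vals.max(), 1.0
katz = beta * np.linalg.solve(np.eye(n) - alpha * A, np.ones(n))

# The fringe node 6 has a tiny eigenvector centrality relative to the
# core, but its Katz centrality remains a sizable fraction of the core's.
print(ev[6] / ev[0], katz[6] / katz[0])
```

Along the chain the eigenvector score decays geometrically by a factor of roughly the leading eigenvalue per hop, while the Katz score is bounded below by the baseline term, which matches the contrast i-cNRL exposes between F6-1 and F6-2.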
As an example of analyzing dynamic networks, we compare contact networks from two different days in a primary school (stehle2011high), available in SocioPatterns (http://www.sociopatterns.org/, accessed 2019-1-28). The networks represent face-to-face contact patterns between students and teachers, collected with RFID devices. Information on each day's network is listed in Table 1 (N10 and N11). Fig. 7(a) and (b) visualize the network structures drawn with SFDP. These networks also have multiple node attributes, including gender, grade, and class name. In addition to multiple network centralities, we utilize this attribute information by including gender as a base feature, encoding 'male', 'female', and 'unknown' as -1, 1, and 0, respectively.
To analyze changes in contact patterns, we set the second day's network as the target and the first day's as the background. Fig. 7(c) shows the 2D embedding result. To interpret the target's unique patterns, we review the cPC loadings listed in Table 4 and color the nodes in Fig. 7(a), (b), and (c) based on the learned feature F7-1, which is derived from PageRank. The results are shown in Fig. 7(d), (e), and (f). We can see that i-cNRL discovers that the target has both regions strongly connected to others (colored more yellow in Fig. 7(d) and (f)) and regions weakly connected to others (colored more purple), while all of the background's nodes have relatively strong connections to each other, as seen in the laid-out result in Fig. 7(b).
According to the study in (stehle2011high), students tended to have more contact within the same class than between classes. To relate the class information to the found unique patterns, we color-code the nodes (i.e., students) based on their class, as shown in Fig. 7(g), (h), and (i). From these results, we notice that i-cNRL well separates groups of students who have less (e.g., gray, pink, or teal nodes) and more (e.g., orange nodes) contact between classes in the target.
Our i-cNRL utilizes DeepGL and cPCA for cNRL's two essential components, NRL and CL, to provide interpretable results. However, if interpretability is not required, each learning method can be replaced with an alternative. Here we compare three different designs for cNRL: (1) DeepGL & cPCA, (2) GraphSAGE (hamilton2017inductive) & cPCA, and (3) DeepGL & cVAE (abid2019contrastive; severson2019unsupervised).
We first compare the quality of the contrastive representations obtained with each design. A good contrastive representation should distribute the target network's nodes more widely than the background's, and it should also show different patterns in the target and background networks. For example, as shown in Fig. 3, cPCA provides a better contrastive representation than PCA (i.e., cPCA with the contrast parameter set to zero). To compare these aspects, we use three dissimilarity measures: the dispersion ratio, the Bhattacharyya distance (bi2017uncertainty), and the Kullback-Leibler (KL) divergence (wang2009divergence) of the target's node distribution from the background's. The dispersion ratio represents how widely the target's nodes are scattered relative to the background's. The Bhattacharyya distance indicates the closeness or overlap of the target's and background's nodes. The KL divergence shows the difference between their probability distributions of nodes. For all measures, the higher the value, the better the design.
We calculate the dispersion ratio of the target embedding to the background embedding using their scaled matrices, which are obtained by applying standardization to the concatenation of the two embeddings; we use the scaled matrices instead of the raw embeddings to avoid scaling differences in the embeddings' axes across the three designs. For the Bhattacharyya distance and KL divergence, since we do not have the exact probability distributions of the target and background embeddings, we employ the estimation methods described in (bi2017uncertainty) and (wang2009divergence). For GraphSAGE, we specifically select the GraphSAGE-maxpool model because it produces better results (hamilton2017inductive). We use the default parameter values in (hamilton2017inductive; abid2019contrastive) for GraphSAGE and cVAE, except that we set the number of features learned by GraphSAGE to 24 (see Sec. C.4). As the input features of GraphSAGE, we use the same base features used for DeepGL (see Table 6 for details). We obtain 2D embeddings with the cPCs (with cPCA) or the salient latent variables (with cVAE) (abid2019contrastive). Since cVAE relies on probabilistic encoders, the results can differ across trials; we thus report the mean value of each measure over 10 trials.
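The three measures can be sketched as follows, under two assumptions of ours: dispersion is measured as total variance (trace of the covariance matrix), and the Bhattacharyya distance and KL divergence are estimated with the standard closed forms for Gaussian approximations. The paper's exact estimators may differ, and the embeddings here are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical 2D embeddings: the target spreads widely, the background is compact.
Y_t = rng.normal(0, 3.0, size=(200, 2))
Y_b = rng.normal(0, 1.0, size=(200, 2))

# Standardize the concatenated embedding, then split back (as in the paper).
Y = np.vstack([Y_t, Y_b])
Z = (Y - Y.mean(0)) / Y.std(0)
Z_t, Z_b = Z[:200], Z[200:]

# Dispersion ratio (assumed definition: total variance of target over background).
disp_ratio = np.trace(np.cov(Z_t.T)) / np.trace(np.cov(Z_b.T))

# Gaussian approximations of the two distributions.
def gauss(X):
    return X.mean(0), np.cov(X.T)

(mu_t, S_t), (mu_b, S_b) = gauss(Z_t), gauss(Z_b)
S = (S_t + S_b) / 2
d = mu_t - mu_b

# Bhattacharyya distance between two Gaussians.
bhatt = d @ np.linalg.solve(S, d) / 8 + 0.5 * np.log(
    np.linalg.det(S) / np.sqrt(np.linalg.det(S_t) * np.linalg.det(S_b)))

# KL divergence KL(N_t || N_b) between two Gaussians.
k = len(mu_t)
kl = 0.5 * (np.trace(np.linalg.solve(S_b, S_t))
            + d @ np.linalg.solve(S_b, d) - k
            + np.log(np.linalg.det(S_b) / np.linalg.det(S_t)))
print(disp_ratio, bhatt, kl)
```

With the target spread three times wider than the background, all three measures come out clearly positive, matching the intuition that higher values indicate a better contrastive separation.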
Table 5 shows a comparison of the three designs on different networks using the measures above. In general, DeepGL & cPCA and GraphSAGE & cPCA score better than DeepGL & cVAE. Between the former two, DeepGL & cPCA tends to provide better results, except on the Dolphin and Karate networks, which have small numbers of nodes.
Table 5 (columns 3 to 5: dispersion ratio; 6 to 8: Bhattacharyya distance; 9 to 11: KL divergence of the target distribution from the background):

| Target | Background | DG&cPCA | GS&cPCA | DG&cVAE | DG&cPCA | GS&cPCA | DG&cVAE | DG&cPCA | GS&cPCA | DG&cVAE |
|---|---|---|---|---|---|---|---|---|---|---|
| Dolphin | Karate | 174 | 9,754 | 1.78 | 1.40 | 1.73 | 0.93 | 6.82 | 12.76 | 0.83 |
| P2P | Price 2 | 21,744 | 1,801 | 2.46 | 7.52 | 4.72 | 1.00 | 45.73 | 14.09 | 36.13 |
| LC-multi. | C.-AP/MS | 376 | 54 | 2.95 | 1.52 | 1.76 | 0.29 | 18.49 | 16.61 | 15.01 |
| Sch.-Day2 | Sch.-Day1 | 57 | 6 | 1.93 | 1.81 | 0.61 | 0.56 | 5.82 | 1.80 | 0.82 |

*DG=DeepGL, GS=GraphSAGE, P2P=p2p-Gnutella08, C.-AP/MS=Combined-AP/MS
We visually compare the embedding results to review more detailed differences, as shown in Fig. 8. For cVAE, we show the result with the longest Bhattacharyya distance among the 10 trials. Because GraphSAGE and cVAE do not provide interpretable features, for the comparison, we color-code the target network's nodes by feature values from the DeepGL results. Specifically, the left three columns in Fig. 8 are colored based on the values of the feature with the top absolute loading for cPC1, and the far-right column is colored by class name.
We can see that although the quality of the contrastive representations in Table 5 differs, the different designs seem to identify similar unique patterns. For instance, all the results for P2P and Price 2 show a monotonic increase of the feature value (F4-1, the k-core number). Also, for LC-multiple and Combined-AP/MS, both DeepGL & cPCA and DeepGL & cVAE depict clearly separated patterns, as indicated with the green rectangles, while GraphSAGE & cPCA does not show the same pattern. Furthermore, in each result for the school networks, we can see a distinct group of gray nodes, as annotated with the red rectangles.
From the above quantitative and qualitative comparisons, we can see that DeepGL & cPCA (i.e., our i-cNRL) generates results of quality comparable to the alternatives, while the other two designs do not provide interpretable results.
This work introduces contrastive network representation learning (cNRL), which aims to reveal unique patterns in one network relative to another. Furthermore, we demonstrate a method of cNRL, i-cNRL, that is more generic and interpretable. With these contributions, our work provides a new approach for network comparison.
We have demonstrated the usability of i-cNRL with small- and medium-scale networks (fewer than 10,000 nodes) to provide intelligible examples. As a next step, we plan to apply i-cNRL to larger networks (e.g., networks with millions of nodes). When analyzing such large, complex networks, the linearity of cPCA used in i-cNRL might limit its ability to find unique patterns. Therefore, we will investigate how to incorporate nonlinear contrastive learning methods (such as cVAE) into cNRL while retaining interpretability.
For the evaluation, we use datasets from various repositories, including SNAP, the CCSB Interactome Database, and SocioPatterns, as well as synthetic datasets that we generated. To support the reproducibility of this work, we provide links to the original network datasets, the processed datasets, and the feature matrices learned by DeepGL and GraphSAGE at https://takanori-fujiwara.github.io/s/cnrl/.
We have implemented the cNRL architecture in Python 3. The implementation allows the user to plug in any NRL and CL methods that provide "fit" and "transform" methods (similar to the machine learning methods in scikit-learn, https://scikit-learn.org/, accessed 2020-2-10). For i-cNRL, we have integrated DeepGL and cPCA into the cNRL architecture. Because no DeepGL implementation is available for Python (an implementation using Java with the Neo4j database is available at https://github.com/neo4j-graph-analytics/ml-models, accessed 2020-2-10), we have implemented DeepGL with graph-tool (https://graph-tool.skewed.de/, accessed 2020-2-10). For cPCA, we have modified the implementation available online (ccPCA, https://github.com/takanori-fujiwara/ccpca, accessed 2020-2-10) to add the automatic contrastive parameter selection described in Sec. 5.2.3.
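The fit/transform composition described above can be sketched as follows. The class and method names are hypothetical stand-ins (not the authors' actual API), and the toy NRL and CL components exist only to exercise the pipeline.

```python
import numpy as np

class CNRL:
    """Sketch of the two-stage cNRL pipeline: an NRL step learns a
    node-feature matrix for each network, then a contrastive-learning
    step (e.g., cPCA) embeds the target features relative to the
    background features. Any objects exposing scikit-learn-style
    fit/transform methods can be plugged in."""

    def __init__(self, nrl, cl):
        self.nrl, self.cl = nrl, cl

    def fit(self, G_target, G_background):
        self.nrl.fit(G_target)                  # learn features on the target
        X_t = self.nrl.transform(G_target)
        X_b = self.nrl.transform(G_background)  # transfer features (inductive NRL)
        self.cl.fit(X_t, X_b)                   # contrast target vs. background
        return self

    def transform(self, G):
        return self.cl.transform(self.nrl.transform(G))

class DegreeNRL:
    """Toy NRL: degree and 2-hop path counts from an adjacency matrix."""
    def fit(self, A): return self
    def transform(self, A):
        A = np.asarray(A, dtype=float)
        return np.column_stack([A.sum(1), (A @ A).sum(1)])

class MeanShiftCL:
    """Placeholder CL: center target features by the background mean."""
    def fit(self, X_t, X_b):
        self.mu = X_b.mean(0); return self
    def transform(self, X): return X - self.mu

A_t = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])  # triangle (target)
A_b = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])  # path (background)
emb = CNRL(DegreeNRL(), MeanShiftCL()).fit(A_t, A_b).transform(A_t)
print(emb.shape)  # (3, 2)
```

The design point here is the composition: because the NRL step is inductive, the same learned feature extractor can be applied to both networks, which is what makes the downstream contrast between the two feature matrices meaningful.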
The source code for generating the experimental results is available at https://takanori-fujiwara.github.io/s/cnrl/.
Because DeepGL is introduced as a comprehensive inductive NRL framework, there are multiple settings we can adjust. The terminology used here is the same as in (rossi2018deep); refer to (rossi2018deep) for terms not explained in this paper (indicated with italic fonts below). For all the cNRL runs, we have used DeepGL with the logarithmic binning to transform feature values, but without the feature diffusion. For the other settings, we have generally used as many different relational feature operators and base features as possible for each network dataset. As for the relational feature operators, for directed networks, we have used all the combinations of the operators and edge directions (12 operators in total); for undirected networks, we have used the subset applicable to undirected edges. As for the base features, we have used all centralities and measures available in graph-tool. However, for some networks, some of these features produce 'NaN' values (e.g., closeness); in that case, we have excluded such features from the base features. Table 6 shows the base features we used for each analysis. Additionally, for scoring and pruning of the learned features, we have applied the same method used in (rossi2018deep) with the tolerance/feature-similarity threshold λ. As λ becomes larger, the number of features learned by NRL increases. We have set a different λ value for each analysis, as listed in Table 6. In general, for the undirected networks, we have used relatively higher values (0.7) because the number of base features is smaller than for the directed networks.
| Target | Background | base features | λ |
|---|---|---|---|
| Dolphin | Karate | {total-degree, betweenness, closeness, eigenvector, PageRank, Katz} | 0.7 |
| Price | Random | {in-degree, out-degree, total-degree, PageRank, betweenness, Katz, k-core} | 0.3 |
| Random | Price | {in-degree, out-degree, total-degree, PageRank, betweenness, Katz, k-core} | 0.3 |
| p2p-Gnutella08 | Price 2 | {in-degree, out-degree, total-degree, PageRank, betweenness, Katz, k-core} | 0.5 |
| p2p-Gnutella08 | Enhanced Price | {in-degree, out-degree, total-degree, PageRank, betweenness, Katz, k-core} | 0.5 |
| LC-multiple | Combined-AP/MS | {total-degree, betweenness, eigenvector, PageRank, Katz} | 0.7 |
| School-Day2 | School-Day1 | {gender, total-degree, closeness, betweenness, eigenvector, PageRank, Katz} | 0.7 |
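A highly simplified sketch of the feature composition and pruning described above, written as our own approximation of DeepGL's idea rather than the authors' implementation: only mean and sum relational operators are used, and correlation-based pruning with threshold lambda stands in for the paper's scoring method.

```python
import numpy as np

def relational_ops(A, X):
    """Compose deeper features by aggregating each node's neighbor features."""
    deg = np.maximum(A.sum(1, keepdims=True), 1)
    return [A @ X, A @ X / deg]  # sum and mean over neighbors

def deepgl_sketch(A, X_base, n_layers=2, lam=0.9):
    """Layer-by-layer feature composition with similarity-based pruning:
    a newly composed feature is kept only if its absolute correlation
    with every existing feature stays below the threshold lam."""
    feats = [X_base]
    for _ in range(n_layers):
        new = np.hstack(relational_ops(A, feats[-1]))
        kept = []
        for j in range(new.shape[1]):
            f = new[:, j]
            cur = np.hstack([np.hstack(feats)] +
                            ([np.column_stack(kept)] if kept else []))
            corr = np.abs(np.corrcoef(np.column_stack([cur, f]).T)[-1, :-1])
            if np.nanmax(corr) < lam:
                kept.append(f)
        if not kept:
            break
        feats.append(np.column_stack(kept))
    return np.hstack(feats)

# Toy 4-node graph; the base feature is the degree.
A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], float)
X0 = A.sum(1, keepdims=True)
X = deepgl_sketch(A, X0)
print(X.shape)
```

Raising lam keeps more (increasingly redundant) composed features, mirroring the role of the tolerance threshold λ in Table 6.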
For all results, we have used cPCA with the automatic contrastive parameter selection and the default settings. That is, we have applied standardization to each of the target and background feature matrices for both learning and projection, together with the automatic contrastive parameter selection described in Sec. 5.2.3.
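The cPCA step itself reduces to an eigendecomposition of the difference of the two covariance matrices, C_T - α C_B (abid2019contrastive). A minimal sketch with a fixed α follows; the automatic α selection is omitted, and the synthetic feature matrices are our own illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical feature matrices: the target has extra variance along
# axis 1 that the background lacks; axis 0 is high-variance in both.
X_b = rng.normal(size=(2000, 3)) * np.array([3.0, 1.0, 1.0])
X_t = rng.normal(size=(2000, 3)) * np.array([3.0, 2.0, 1.0])

def cpca_components(X_t, X_b, alpha, k=2):
    """Top-k contrastive PCs: eigenvectors of C_t - alpha * C_b."""
    Xt = X_t - X_t.mean(0)
    Xb = X_b - X_b.mean(0)
    C = np.cov(Xt.T) - alpha * np.cov(Xb.T)
    vals, vecs = np.linalg.eigh(C)
    order = np.argsort(vals)[::-1]
    return vecs[:, order[:k]]

W = cpca_components(X_t, X_b, alpha=1.0)
# With alpha = 1, the shared high-variance axis 0 cancels out and the
# top cPC aligns with axis 1, the direction unique to the target.
print(np.abs(W[:, 0]).argmax())  # 1
```

The columns of W are exactly the cPC loadings reported in Tables 7 to 10: each entry ties an input feature to a contrastive direction, which is what makes the result interpretable.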
The full sets of cPC loadings obtained with i-cNRL for each analysis in Sec. 7.1 and Sec. 7.2 are listed in Tables 7–10.
| relational function | base feature | cPC 1 | cPC 2 |
|---|---|---|---|
| Target: Price, Background: Random | | | |
| | in-degree | -0.19 | -0.06 |
| | out-degree | -0.40 | -0.00 |
| | total-degree | 0.55 | 0.00 |
| | PageRank | 0.00 | -0.00 |
| | betweenness | -0.00 | 0.00 |
| | Katz | -0.19 | 0.06 |
| | k-core | -0.00 | -0.00 |
| | in-degree | 0.01 | -0.00 |
| | in-degree | 0.00 | 0.00 |
| Target: Random, Background: Price | | | |
| | in-degree | -0.10 | -0.25 |
| | out-degree | 0.01 | -0.02 |
| | total-degree | 0.18 | 0.47 |
| | PageRank | 0.02 | 0.01 |
| | betweenness | -0.01 | -0.00 |
| | Katz | -0.09 | -0.24 |
| | k-core | 1.00 | -0.13 |
| | in-degree | 0.00 | 0.00 |
| | in-degree | -0.00 | 0.00 |
| relational function | base feature | cPC 1 | cPC 2 |
|---|---|---|---|
| Target: p2p-Gnutella08, Background: Price 2 | | | |
| | in-degree | -0.12 | -0.17 |
| | out-degree | 0.04 | -0.00 |
| | total-degree | 0.22 | 0.30 |
| | PageRank | 0.04 | 0.00 |
| | betweenness | -0.00 | -0.00 |
| | Katz | -0.11 | -0.13 |
| | k-core | 1.01 | -0.10 |
| | in-degree | -0.00 | 0.00 |
| | out-degree | 0.00 | 0.00 |
| | betweenness | 0.00 | -0.00 |
| | out-degree | -0.00 | -0.00 |
| | in-degree | -0.00 | -0.00 |
| | out-degree | 0.00 | -0.00 |
| | out-degree | 0.00 | 0.00 |
| Target: p2p-Gnutella08, Background: Enhanced Price | | | |
| | in-degree | 0.12 | 0.05 |
| | out-degree | 0.05 | -0.00 |
| | total-degree | -0.23 | 0.00 |
| | PageRank | -0.00 | 0.00 |
| | betweenness | 0.00 | -0.00 |
| | Katz | 0.10 | -0.05 |
| | k-core | 0.00 | -0.00 |
| | in-degree | -0.00 | 0.00 |
| | out-degree | 0.00 | -0.00 |
| | betweenness | -0.00 | 0.00 |
| | out-degree | 0.00 | 0.00 |
| | in-degree | -0.00 | -0.00 |
| | out-degree | 0.00 | -0.00 |
| | out-degree | 0.00 | 0.00 |
| relational function | base feature | cPC 1 | cPC 2 |
|---|---|---|---|
| Target: LC-multiple, Background: Combined-AP/MS | | | |
| | total-degree | -0.02 | 0.13 |
| | betweenness | -0.00 | -0.00 |
| | eigenvector | 0.01 | 0.12 |
| | PageRank | 0.01 | 0.00 |
| | Katz | -0.00 | -0.24 |
| | total-degree | -0.14 | 0.02 |
| | betweenness | 0.00 | 0.00 |
| | eigenvector | -0.19 | -0.01 |
| | PageRank | -0.00 | -0.00 |
| | Katz | 0.36 | 0.00 |
| | betweenness | 0.00 | 0.00 |
| | PageRank | -0.01 | 0.00 |
| | total-degree | -0.03 | -0.01 |
| | betweenness | 0.00 | -0.00 |
| | PageRank | -0.01 | -0.01 |
| | betweenness | 0.00 | 0.00 |
| | PageRank | 0.01 | 0.01 |
| | betweenness | -0.00 | -0.00 |
| | PageRank | 0.00 | -0.00 |
| relational function | base feature | cPC 1 | cPC 2 |
|---|---|---|---|
| Target: School-Day2, Background: School-Day1 | | | |
| | total-degree | 0.05 | 0.03 |
| | closeness | -0.01 | 0.00 |
| | betweenness | -0.01 | 0.00 |
| | eigenvector | -0.06 | 0.02 |
| | PageRank | 0.00 | -0.04 |
| | Katz | 0.02 | -0.02 |
| | gender | -0.01 | -0.00 |
| | total-degree | 0.07 | 0.01 |
| | betweenness | -0.02 | -0.01 |
| | gender | -0.01 | -0.00 |
| | gender | 0.01 | 0.00 |
| | total-degree | -0.01 | 0.03 |
| | closeness | 0.03 | -0.00 |
| | betweenness | -0.02 | -0.00 |
| | eigenvector | -0.03 | 0.01 |
| | PageRank | 0.03 | -0.01 |
| | Katz | -0.01 | -0.02 |
| | gender | -0.00 | 0.00 |
| | gender | 0.01 | 0.00 |
| | total-degree | -0.06 | -0.01 |
| | betweenness | 0.04 | 0.00 |
| | gender | 0.02 | 0.01 |
| | total-degree | -0.05 | 0.04 |
| | closeness | 0.11 | -0.04 |
| | betweenness | -0.09 | -0.01 |
| | eigenvector | -0.08 | 0.03 |
| | PageRank | 0.15 | 0.02 |
| | Katz | -0.06 | -0.05 |
| | gender | 0.00 | -0.00 |
| | betweenness | -0.00 | 0.00 |
| | gender | 0.00 | 0.00 |
| | gender | 0.00 | 0.00 |
| | gender | 0.00 | 0.00 |
We have used the Gilbert and Price network models to generate Random (N3), Price (N4), and Price 2 (N6) in Table 1. Also, in Sec. 7.2.1, we have introduced the enhanced Price model as a way to generate a network whose nodes have various k-core numbers: Enhanced Price (N7) in Table 1. In the following, we explain the parameters we used for network generation and the details of the enhanced Price model.
The Gilbert’s model generating a random network requires the fixed probability of a connection of each pair of nodes. We have set the probability to for generating Random (ID 4). The Price’s model requires the fixed number of out-degree of newly added nodes as its parameter. We have set this parameter to 3 for both Price (N4) and Price 2(N6).
For the enhanced Price model, we modify the Price model so that it can generate nodes with various k-core numbers. To achieve this, the enhanced Price model lets the user specify multiple positive integers as candidate out-degrees for newly added nodes. When a new node is added, one of these candidates is selected according to user-specified probabilities, which sum to 1.
To generate Enhanced Price (N7), we have set the candidate out-degrees to {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} and the corresponding probabilities to {0.3, 0.25, 0.15, 0.1, 0.075, 0.05, 0.025, 0.025, 0.0125, 0.0125}.
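Under these parameters, the enhanced Price model can be sketched as follows. This is a simplified re-implementation: the in-degree + 1 attachment smoothing, the initial-node handling (m0 seed nodes), and the network size are our assumptions.

```python
import random

random.seed(1)

# Candidate out-degrees and their selection probabilities (as above).
out_degrees = list(range(1, 11))
probs = [0.3, 0.25, 0.15, 0.1, 0.075, 0.05, 0.025, 0.025, 0.0125, 0.0125]

def enhanced_price(n, out_degrees, probs, m0=10):
    """Directed preferential attachment with variable out-degree.
    Each new node draws its out-degree from the given distribution,
    then links to distinct existing nodes chosen with probability
    proportional to in-degree + 1 (the usual Price smoothing)."""
    in_deg = [0] * m0
    edges = []
    for u in range(m0, n):
        m = random.choices(out_degrees, weights=probs)[0]
        targets = set()
        while len(targets) < min(m, u):
            v = random.choices(range(u), weights=[d + 1 for d in in_deg])[0]
            targets.add(v)
        for v in targets:
            edges.append((u, v))
            in_deg[v] += 1
        in_deg.append(0)
    return edges

edges = enhanced_price(500, out_degrees, probs)
out_counts = {}
for u, _ in edges:
    out_counts[u] = out_counts.get(u, 0) + 1
# Unlike the original Price model, out-degrees now vary across nodes,
# which is what breaks the uniform k-core structure.
print(min(out_counts.values()), max(out_counts.values()))
```

Because the out-degree distribution is heavy on small values (probability 0.3 for a single link), the generated network contains nodes that peel off at low k-cores, removing the uniqueness that i-cNRL exposed in Sec. 7.2.1.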
We describe the detailed settings and parameters of GraphSAGE and cVAE used in Sec. 7.3. We have used the source code provided by the authors of GraphSAGE (https://github.com/williamleif/GraphSAGE, accessed 2020-2-10) and cVAE (https://github.com/abidlabs/contrastive_vae, accessed 2020-2-10). For GraphSAGE, we have used the unsupervised graphsage_maxpool model with 24 as the number of features learned (i.e., dim_1=12 and dim_2=12), while following the default values for the other parameters (e.g., learning_rate=0.00001 and model_size='small'). We have used cVAE with the default parameters (i.e., intermediate_dim=12, latent_dim=2, batch_size=64, and epochs=500).
Fig. 9 shows the transitions of the contrastive parameter α during the automatic selection in i-cNRL. For all the experiments, we can see that α converges within 10 iterations.