1. Introduction
Networks are commonly used to model various types of relationships in real-world applications, such as social networks (crnovrsanin2014visualization), cellular networks (chen2004content), and communication networks (bhanot2005optimizing). Comparative analysis of networks is an essential task in practice, where we want to identify differentiating factors between two networks or the uniqueness of one network compared to another (emmert2016fifty; tantardini2019comparing). For instance, a neuroscientist studying the effect of Alzheimer's disease on the human brain (gaiteri2016genetic) may want to compare the brain network of a patient with Alzheimer's disease to that of a healthy subject. Similarly, given the collaboration networks of researchers in different fields (lariviere2006canadian), an analyst at a funding agency may want to discover collaboration patterns unique to each field to support decision making.
Several approaches have been proposed for network comparison (tantardini2019comparing). When two networks have the same node set and the pairwise correspondence between nodes is known, we can compute a similarity between them (e.g., the Euclidean distance between their adjacency matrices). When the node correspondence is unknown or does not exist, a network-statistics-based approach is commonly used (e.g., comparing the clustering coefficient, network diameter, or node-degree distribution). Another popular approach uses graphlets (tantardini2019comparing), i.e., small, connected, non-isomorphic subgraph patterns in a graph (e.g., the complete graph of three nodes). The similarity of two networks can then be characterized by comparing the frequency of appearance of each graphlet in each network.
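A network-statistics-based comparison of the kind described above can be sketched with networkx; the two graphs here are arbitrary stand-ins (not the datasets used later in this paper), and the chosen statistics are just common examples:

```python
import networkx as nx

# Hypothetical stand-ins for two networks without node correspondence.
g1 = nx.karate_club_graph()
g2 = nx.path_graph(30)

def summary_stats(g):
    """Network-level statistics commonly used for comparison."""
    return {
        "avg_clustering": nx.average_clustering(g),
        "diameter": nx.diameter(g),
        "degree_hist": nx.degree_histogram(g),
    }

s1, s2 = summary_stats(g1), summary_stats(g2)
```

Note that such statistics yield only network-level (dis)similarities, which motivates the finer-grained approach introduced below.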
While the existing approaches can provide a (dis)similarity between networks, they compare networks based on only one selected measure (e.g., node degree), which is often insufficient. Also, these approaches provide only network-level similarities and thus cannot compare networks at finer levels (e.g., the node level). Without such a detailed comparison, it is difficult to find which part of a network relates to its uniqueness.
To address these challenges, we introduce a new approach that integrates the concept of contrastive learning (CL) (zou2013contrastive; abid2018exploring) with network representation learning (NRL), which we call cNRL. Within cNRL, NRL enables the characterization of networks with comprehensive measures, without overwhelming the user with information, by embedding nodes into a low-dimensional space; CL allows for discovering unique patterns in one dataset relative to another (abid2018exploring). By leveraging the benefits of both, we can reveal unique patterns in one network by contrasting it with another, in a manner that is both thorough (i.e., using multiple essential measures to capture network characteristics) and detailed (i.e., analyzing at a node or subnetwork level).
With our approach, we consider the generality and interpretability of cNRL, and contribute a method called i-cNRL. First, i-cNRL is designed not to require node correspondences or network alignment (emmert2016fifty), and thus is applicable to various networks. Also, unlike many other NRL methods (e.g., node2vec (grover2016node2vec) and graph neural networks (GNNs) (zhang2018deep)), i-cNRL offers interpretability (adadi2018peeking), providing information about the meaning of an identified pattern and the reason why that pattern appears in only one network. In summary, our main contributions include:

A new approach, called contrastive network representation learning (cNRL), which aims to reveal unique patterns in one network relative to another network.

A method exemplifying cNRL, called i-cNRL, which (1) offers general applicability, including to networks without node correspondence or network alignment, (2) provides interpretability to help understand revealed patterns, and (3) includes automatic hyperparameter selection for CL.

Experiments with multiple network models and real-world datasets, which demonstrate the capability of i-cNRL for comparative network analysis.

Quantitative and qualitative comparisons with other potential designs of cNRL methods.
2. Problem Definition
We here define the problem addressed by contrastive network representation learning. Given two different networks, a target network and a background network, we want to find unique patterns in the target relative to the background. As in contrastive learning (zou2013contrastive), the unique patterns can be represented as relationships (e.g., the structural differences among network nodes) that appear in the target but do not appear in the background.
For example, when finding unique patterns in a scale-free network (i.e., a network whose node-degree distribution follows a power law (barabasi2016network)) relative to a random network (i.e., a network in which each node pair is connected with a fixed probability (barabasi2016network)), we should be able to capture unique patterns related to node degrees, since the scale-free network has more variety in node degrees. In practical use, the unique patterns could relate to more complicated centralities, other measures, combinations of them, and many more. Note that, as with the existing work on contrastive learning (zou2013contrastive; abid2018exploring), cNRL does not aim to discriminate the target network from the background network, but to identify unique patterns in the target.

3. Analysis Example
To provide an illustrative example of analysis with cNRL, we begin by comparing two different social networks. We use the Dolphin social network (lusseau2003bottlenose) as the target and Zachary's karate club network (zachary1977information) as the background. Fig. 1(a) and 1(b) depict the structures of these networks. The statistics of these networks can be found in Table 1 (see N1 and N2). By comparing the two networks, we want to reveal unique patterns in the Dolphin social network and identify which network characteristics relate to those patterns.
We apply i-cNRL to the two networks and then plot a 2D embedding result with contrastive PCA (cPCA) (abid2018exploring), as shown in Fig. 1(c). The x- and y-directions in Fig. 1(c) represent the first and second contrastive principal components (cPCs), respectively. Details of i-cNRL and related techniques are described in Sec. 5. Fig. 1(c) shows that the nodes of the target are widely distributed, whereas the nodes of the background are placed only around the center, which reveals patterns specific to the target when compared with the background.
Moreover, since i-cNRL offers interpretability of the learned results, we can analyze why the above patterns appear. As shown in Table 2, the method provides contrastive principal component (cPC) loadings (fujiwara2020supporting), whose absolute values indicate how strongly each learned feature contributes to each cPC direction. Each learned feature can be represented as a combination of a relational function and a base feature (rossi2018deep) (see Sec. 5 for details). Table 2 indicates that feature F1-10 has the highest contribution to cPC1. From its relational function and base feature 'eigenvector' (newman2018networks), this feature is interpreted as "the mean eigenvector centrality of the neighbors of a node."
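The interpretation above, the mean eigenvector centrality of a node's neighbors, can be computed directly. A minimal sketch with networkx, using a built-in graph as a stand-in for the Dolphin network:

```python
import networkx as nx

g = nx.karate_club_graph()  # stand-in; the paper analyzes the Dolphin network

# Base feature: eigenvector centrality of every node.
eig = nx.eigenvector_centrality(g, max_iter=1000)

# Relational function: mean of the base feature over each node's neighbors,
# i.e., the interpretation of feature F1-10 in Table 2.
mean_neighbor_eig = {
    v: sum(eig[u] for u in g.neighbors(v)) / g.degree(v) for v in g.nodes
}
```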
ID  Name  # of nodes  # of links  Directed 
N1  Dolphin (lusseau2003bottlenose)  62  159  False 
N2  Karate (zachary1977information)  34  78  False 
N3  Random  100  471  True 
N4  Price  100  294  True 
N5  p2p-Gnutella08 (leskovec2007graph; ripeanu2002mapping)  6,301  20,777  True 
N6  Price 2  6,301  18,897  True 
N7  Enhanced Price  6,301  18,281  True 
N8  Combined-AP/MS (collins2007toward; yu2008high)  1,622  9,070  False 
N9  LC-multiple (reguly2006comprehensive; yu2008high)  1,536  2,925  False 
N10  School-Day1 (stehle2011high)  236  5,899  False 
N11  School-Day2 (stehle2011high)  238  5,539  False 
ID  relational function  base feature  cPC1  cPC2 
F1-1  total-degree  0.00  0.02  
F1-2  betweenness  0.00  0.00  
F1-3  closeness  0.00  0.00  
F1-4  eigenvector  0.04  0.00  
F1-5  PageRank  0.04  0.04  
F1-6  Katz  0.00  0.02  
F1-7  total-degree  0.06  0.08  
F1-8  betweenness  0.05  0.01  
F1-9  closeness  0.08  0.01  
F1-10  eigenvector  0.26  0.02  
F1-11  PageRank  0.11  0.15  
F1-12  Katz  0.08  0.09  
F1-13  PageRank  0.01  0.00  
F1-14  total-degree  0.06  0.00  
F1-15  PageRank  0.01  0.00  
To investigate the relationship between this feature and the i-cNRL result, we color-code the network nodes in Fig. 1(a), (b), and (c) based on the feature values, as shown in Fig. 1(d), (e), and (f). We can see that, in Fig. 1(f), the nodes around the top-left corner tend to have smaller feature values, while the nodes around the bottom-right tend to have higher values. Comparing with Fig. 1(d), we notice that these two node groups correspond to the top-left and bottom-right communities in Fig. 1(d). Since the feature value is the mean eigenvector centrality of a node's neighbors, the nodes in the top-left community, as well as their neighbors, tend to have a low eigenvector centrality. On the other hand, the nodes in the bottom-right community have neighbors with a high eigenvector centrality. Fig. 1(e) indicates that the background network, unlike the target, does not have communities clearly separated by the feature values. Therefore, i-cNRL learns patterns highly related to the eigenvector centralities of each node's neighbors, which clearly separate the two communities in the Dolphin social network.
Notations for cNRL  

,  target and background networks 
,  adjacency matrices of and 
,  matrices of node attributes of and 
,  numbers of nodes in and 
,  numbers of attributes in and 
,  numbers of edges in and 
,  numbers of features learned by NRL and CL 
,  target and background feature matrices 
projection matrix learned by CL  
,  contrastive representations of and 
Notations for DeepGL  
base feature (e.g., indegree)  
relational function  
, ,  relational feature operators for in, out, total neighbors 
summary measure (e.g., mean, sum, and maximum)  
set of learned features with relational feature operators  
set of learned features:  
maximum numbers of relational feature operators to use  
Notations for cPCA  
,  covariance matrices 
contrastive parameter  
4. cNRL Architecture
Fig. 2 shows the general architecture of cNRL. Notations used in the following sections are listed in Table 3. The current CL methods (zou2013contrastive; abid2018exploring; dirie2019contrastive; abid2019contrastive; severson2019unsupervised) require as inputs target and background feature matrices that share the same features. However, matrices representing the target and background networks, such as their adjacency matrices, might have different numbers of nodes or no node correspondence. Thus, we cannot directly apply the CL methods to the networks themselves. To address this issue, cNRL consists of two main steps: (1) generating feature matrices from the target and background networks with NRL, and (2) applying CL to the resulting feature matrices.
Below we describe the details of each part of the cNRL architecture, covering requirements on the inputs, NRL, and CL algorithms. Here we focus only on node feature learning to provide a simple and clear explanation; however, the architecture is generic enough to be used for link (or edge) feature learning.
Inputs. cNRL takes the target and background networks as inputs. These networks can be any combination of undirected or directed, unweighted or weighted, and non-attributed or attributed. The numbers of nodes in the two networks do not have to be the same; similarly, the numbers of node attributes may differ.
Network representation learning. The first step in Fig. 2 applies an NRL method to transform the input networks into feature matrices. By the nature of its learning purpose, CL requires that the target and background feature matrices share the same features. Therefore, for this step, we need an NRL method that can produce the same features across networks.
Contrastive learning. Once we obtain the two feature matrices with the same learned features, we can apply any of the CL methods (zou2013contrastive; abid2018exploring; dirie2019contrastive; abid2019contrastive; severson2019unsupervised), using them as the target and background datasets, respectively. CL generates a parametric mapping (or a projection matrix) from the features learned by NRL to contrastive features. With this projection matrix, both feature matrices can be transformed into contrastive representations. As the existing CL works (zou2013contrastive; abid2018exploring; dirie2019contrastive; abid2019contrastive; severson2019unsupervised) only produced the target's representation for their analyses, generating the background's representation is optional. However, as demonstrated in Fig. 1(c), by visualizing both representations in one plot, we can clearly see whether CL has found unique patterns in the target relative to the background.
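The two-step architecture can be summarized in a short sketch; `learn_features` and `fit_cl` are hypothetical placeholders (not part of the paper) standing in for an inductive NRL method and a CL method:

```python
import numpy as np

def cnrl(g_target, g_background, learn_features, fit_cl):
    """Generic cNRL pipeline (sketch).

    learn_features: inductive NRL returning a feature matrix with the same
                    columns for any input network (e.g., DeepGL).
    fit_cl:         CL method returning a projection matrix learned from the
                    target and background feature matrices (e.g., cPCA).
    """
    x_t = learn_features(g_target)      # step 1: NRL on the target ...
    x_b = learn_features(g_background)  # ... transferred to the background
    w = fit_cl(x_t, x_b)                # step 2: CL on the feature matrices
    return x_t @ w, x_b @ w             # contrastive representations
```

For instance, with a trivial `learn_features` that returns node degrees from an adjacency matrix, the pipeline runs end to end even when the two networks have different numbers of nodes.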
5. Interpretable cNRL Method
As a specific method using the architecture above, we describe i-cNRL, which employs DeepGL (rossi2018deep) for NRL and cPCA (abid2018exploring) for CL, together with the design rationale for selecting these algorithms.
5.1. Network Representation Learning
As stated in Sec. 4, NRL needs to generate target and background feature matrices that share the same features. To achieve this, we can employ any inductive NRL method (rossi2018deep) (e.g., GraphSAGE (hamilton2017inductive) and FastGCN (chen2018fastgcn)). However, we also want to provide interpretability in the contrastive representations obtained by cNRL; thus, the NRL method needs to generate interpretable features. As a result, we specifically use DeepGL (rossi2018deep) in the first step of i-cNRL.
5.1.1. DeepGL
DeepGL learns node and link features, each consisting of a base feature and a relational function. For a concise explanation, we describe DeepGL for node feature learning only.
A base feature is any simple feature or measure we can obtain for each node. For example, it can be the (weighted) in-, out-, or total-degree, degeneracy (or core number) (newman2018networks), PageRank (newman2018networks), or a node attribute (e.g., the gender of a node in a social network).
A relational function is a combination of relational feature operators applied to a base feature. A relational feature operator summarizes the base feature values of the one-hop neighbors of a node; for example, it can compute the mean, sum, or maximum of the base feature values over a node's one-hop neighbors. The neighbors can be the in-, out-, or total-neighbors, yielding a separate operator for each together with the summary measure (e.g., the mean-in-neighbor operator computes the mean base feature value of a node's in-neighbors). Moreover, relational feature operators can be applied repeatedly. For example, a composed operator can first compute the maximum over the in-neighbors of each out-neighbor of a node and then take the mean of these maximum values. As these examples show, base features and relational functions are combinations of simple measures and operators; thus, both are interpretable.
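The operators described above can be sketched as plain functions over a directed graph; a random graph stands in for real data, and the function names are illustrative rather than DeepGL's own notation:

```python
import networkx as nx

g = nx.gnp_random_graph(20, 0.2, directed=True, seed=1)
base = dict(g.in_degree())  # base feature: in-degree

def max_over_in(g, x):
    """Max of feature x over a node's in-neighbors (0 if none)."""
    return {v: max((x[u] for u in g.predecessors(v)), default=0) for v in g}

def mean_over_out(g, x):
    """Mean of feature x over a node's out-neighbors (0 if none)."""
    out = {}
    for v in g:
        nbrs = list(g.successors(v))
        out[v] = sum(x[u] for u in nbrs) / len(nbrs) if nbrs else 0.0
    return out

# Composition: max over in-neighbors first, then mean over out-neighbors,
# mirroring the composed-operator example in the text.
composed = mean_over_out(g, max_over_in(g, base))
```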
In DeepGL, we can select as many different base features and relational feature operators as we want to consider. The learning process runs for a user-specified number of iterations, and in the end we obtain the learned features, each of which is a relational function applied to a base feature. During each iteration, DeepGL prunes redundant features based on the similarities of the obtained feature values. Table 2 shows an example of features learned from the Dolphin social network (lusseau2003bottlenose).
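The pruning step can be approximated with a simple correlation filter; this is a simplification of ours for illustration, as DeepGL's actual pruning criterion differs:

```python
import numpy as np

def prune_features(x, names, threshold=0.9):
    """Keep a feature column only if it is not highly correlated with
    any previously kept feature (simplified redundancy pruning)."""
    corr = np.corrcoef(x, rowvar=False)
    keep = []
    for j in range(x.shape[1]):
        if all(abs(corr[j, i]) < threshold for i in keep):
            keep.append(j)
    return x[:, keep], [names[j] for j in keep]
```

A duplicated feature column is pruned while uncorrelated columns survive.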
5.1.2. Use of Transfer Learning with DeepGL for cNRL
As described above, the features learned by DeepGL are combinations of base features and relational functions. Once we obtain the learned features from one network, we can naturally compute the same features for other networks. That is, DeepGL is inductive and can be used for transfer learning (rossi2018deep). In cNRL, we need to decide which network(s), the target and/or the background, should be used for learning the features. One possible choice is applying DeepGL to both networks and using the union of their learned feature sets to produce the feature matrices. However, since cNRL aims to identify unique patterns in the target relative to the background, only a set of features capturing the target's characteristics is required. Thus, we apply DeepGL to the target network and use its learned features to generate both feature matrices. This also avoids unnecessary computation for learning features from the background network.
5.2. Contrastive Learning
The above NRL step generates the target and background feature matrices. The remaining step is learning their contrastive representations through CL. While we can use any CL method, one of our goals is to provide interpretability. Since DeepGL generates interpretable features, we can provide interpretable representations by using a CL method that reveals interpretable relationships between the features learned by NRL and the features learned by CL. Among current CL methods (zou2013contrastive; abid2018exploring; dirie2019contrastive; abid2019contrastive; severson2019unsupervised), only contrastive PCA (cPCA) (abid2018exploring) provides such relationships, by utilizing the linearity of its algorithm in a manner similar to ordinary PCA (jolliffe1986principal). Thus, we select cPCA for the second step of i-cNRL, though it can be replaced with any other interpretable CL method developed in the future.
5.2.1. Contrastive PCA (cPCA)
cPCA (abid2018exploring) is a variant of PCA for CL. Similar to classical PCA (jolliffe1986principal), cPCA first centers each feature of the target and background feature matrices and then obtains their covariance matrices $C_T$ and $C_B$. Let $v$ be any unit vector. Then, along the direction $v$, the variances of the target and background data can be written as $v^\top C_T v$ and $v^\top C_B v$, respectively. The optimization that finds a direction in which the target has high variance but the background has low variance can thus be written as:

(1)  $\max_{\|v\| = 1} \; v^\top C_T v - \alpha\, v^\top C_B v$

where $\alpha \geq 0$ is a contrast parameter. Similar to classical PCA, we can obtain the top cPCs (the solutions of (1)) as the learned features. With a projection matrix $W$ whose columns are the top cPCs, we can obtain the contrastive representation of the target feature matrix.
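A minimal numpy sketch of cPCA under these definitions, taking the top eigenvectors of the contrast between the target and background covariances (variable and parameter names are ours):

```python
import numpy as np

def cpca(x_t, x_b, alpha, k=2):
    """Minimal cPCA: top-k eigenvectors of C_T - alpha * C_B."""
    xt = x_t - x_t.mean(axis=0)            # center each feature
    xb = x_b - x_b.mean(axis=0)
    c_t = xt.T @ xt / (len(xt) - 1)        # target covariance
    c_b = xb.T @ xb / (len(xb) - 1)        # background covariance
    evals, evecs = np.linalg.eigh(c_t - alpha * c_b)
    w = evecs[:, np.argsort(evals)[::-1][:k]]  # projection matrix (cPCs)
    return xt @ w, xb @ w, w
```

Because `eigh` returns orthonormal eigenvectors, the resulting projection matrix is orthonormal, as in ordinary PCA.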
The contrast parameter $\alpha$ controls the trade-off between having high target variance and low background variance. When $\alpha = 0$, the cPCs only maximize the variance of the target, the same as in classical PCA. As $\alpha$ increases, the cPCs place greater emphasis on directions that reduce the variance of the background. Fig. 3 shows the results of cPCA with different $\alpha$ values. Because $\alpha$ has a strong impact on the result, Abid et al. (abid2018exploring) introduced a semi-automatic selection of $\alpha$ utilizing spectral clustering (ng2002spectral). We go one step further and provide a fully automatic selection of $\alpha$ (see Sec. 5.2.3).

5.2.2. Representation Learning with cPCA in cNRL
By applying cPCA to the target and background feature matrices, we can generate the projection matrix and the contrastive representations of both networks. Because each feature learned by DeepGL could have a different scale, by default our method standardizes each feature of both matrices for both learning and projection.
To provide interpretable relationships between the features learned by NRL and the features learned by CL, we compute contrastive PC loadings (cPC loadings) as introduced in (fujiwara2020supporting). These cPC loadings indicate how strongly each input feature contributes to the corresponding cPC. Table 2 shows an example of cPC loadings for the first and second cPCs. As demonstrated in Sec. 3, by referring to the list of features learned via NRL and the cPC loadings, we can interpret the obtained representations.
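One simple way to obtain loading-like values is the correlation between each standardized input feature and each cPC score; this is our approximation for illustration, while (fujiwara2020supporting) describes the exact computation:

```python
import numpy as np

def cpc_loadings(x, y):
    """Correlation of each input feature (column of x) with each
    cPC score (column of y); returns an (n_features, n_cpcs) matrix."""
    xc = (x - x.mean(axis=0)) / x.std(axis=0)
    yc = (y - y.mean(axis=0)) / y.std(axis=0)
    return xc.T @ yc / len(x)
```

A feature identical to a cPC score has a loading of 1, and unrelated features have loadings near 0.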
5.2.3. Automatic Contrastive Parameter Selection
We now show how to automatically select the contrast parameter $\alpha$ in cPCA. Since we want to maximize the variance of the target feature matrix while simultaneously minimizing the variance of the background feature matrix, we can solve the following ratio problem over the target and background covariance matrices $C_T$ and $C_B$:

(2)  $\max_{\|v\| = 1} \; \dfrac{v^\top C_T v}{v^\top C_B v}$
While directly solving (2) may be difficult, there is a convenient iterative algorithm due to Dinkelbach (Dinkelbach67). The algorithm alternates two steps. Given the current solution $v_t$, we perform:

1. $\alpha_t = \dfrac{v_t^\top C_T v_t}{v_t^\top C_B v_t}$;
2. $v_{t+1} = \arg\max_{\|v\| = 1} \; v^\top C_T v - \alpha_t\, v^\top C_B v$.

Clearly, $\alpha_t$ is just the objective value of our ratio problem (2) evaluated at the current solution $v_t$. It is easy to show that $\alpha_t$ monotonically increases to the maximum value, and the convergence is usually very quick (e.g., fewer than 10 iterations). Conveniently, the second step, which finds the next solution, is just the original cPCA problem with $\alpha_t$ as the trade-off parameter. We can also regard cPCA as a (crude) one-shot algorithm for the ratio problem (2) in which the user specifies $\alpha$. One problem with the method above is that $\alpha_t$ approaches infinity when $C_B$ is nearly singular. To avoid this, our method simply adds a small constant value to each diagonal element of $C_B$. We note that this algorithm of Dinkelbach (Dinkelbach67) has been used in discriminant analysis (GuoLYSW03; JiaNZ09), whose motivation is entirely different from ours.
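The iteration can be sketched in a few lines of numpy; the notation and default values here are ours, not the paper's:

```python
import numpy as np

def auto_alpha(c_t, c_b, eps=1e-3, max_iter=20, tol=1e-6):
    """Dinkelbach-style iteration for the ratio of target to
    background variance; returns the converged alpha."""
    c_b = c_b + eps * np.eye(len(c_b))  # guard against a singular C_B
    alpha = 0.0
    for _ in range(max_iter):
        # cPCA step: best direction for the current alpha
        _, evecs = np.linalg.eigh(c_t - alpha * c_b)
        v = evecs[:, -1]
        # alpha update: ratio objective evaluated at v
        new_alpha = (v @ c_t @ v) / (v @ c_b @ v)
        if abs(new_alpha - alpha) < tol:
            return new_alpha
        alpha = new_alpha
    return alpha
```

On a toy example with diagonal covariances, the iteration converges to the expected maximum variance ratio in two steps.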
5.3. Complexity Analysis
The time and space complexities of i-cNRL are those of DeepGL and cPCA combined. DeepGL's time and space complexities for learning features from the target network grow with the number of links in the network; the time and space complexities for computing the base features are assumed to be lower than these. When including the transfer-learning step to obtain the background feature matrix, the complexities additionally depend on the number of links in the background network. For a fixed contrast parameter, cPCA has time and space complexities similar to those of PCA. Even with the automatic selection of the contrast parameter in Sec. 5.2.3, we can assume that these complexities stay the same: the automatic selection usually needs only a small number of iterations (e.g., fewer than 10) and does not require storing additional information. In practice, the number of features learned by NRL should be much smaller than the numbers of nodes and links of the two networks. Under this assumption, the computational cost of i-cNRL is largely due to DeepGL.
6. Related Work
To the best of our knowledge, our work is the first to apply contrastive learning to networks and to provide a general and interpretable method under this approach. Since little prior work exists in this exact area, we here review representative NRL and CL techniques.
6.1. Network Representation Learning (NRL)
Various NRL methods have been developed for learning latent representations of network nodes and/or links. For comprehensive descriptions of NRL methods, refer to recent survey papers, such as (cai2018comprehensive; zhang2018deep). Here we focus on closely related work on inductive and cross-network embedding methods.
6.1.1. Inductive NRL
GraphSAGE (hamilton2017inductive) is an inductive NRL method that shares many ideas with DeepGL (rossi2018deep). Analogous to the relational functions in DeepGL, GraphSAGE learns aggregator functions. However, GraphSAGE uses more complex aggregators based on LSTM and max-pooling concepts, compared with DeepGL's simple aggregators (e.g., mean). Moreover, instead of DeepGL's feature pruning, GraphSAGE tunes the parameters required by the aggregators and the matrices that weight each learned feature. These differences might enable GraphSAGE to better capture complex network characteristics without manual parameter tuning; however, the learned features are difficult to interpret. FastGCN (chen2018fastgcn) takes a similar approach to GraphSAGE, except that it employs node sampling to save memory. Also, HetGNN (zhang2019heterogeneous) enhances the aggregators to learn representations of heterogeneous networks. These methods, including other GNN variants (zhang2018deep) (e.g., GAT (velivckovic2017graph) and h/cGAO (gao2019graph)), thus still lack interpretability in the learned features. Although GNNExplainer (ying2019gnnexplainer) aims to provide interpretable explanations for predictions made by these methods, it does not support explaining the learned features themselves.

6.1.2. Cross-Network Embedding
The inductive methods learn, from one input network, features that can be generalized to unobserved nodes or other networks. In contrast, cross-network methods generate embeddings directly from multiple input networks. Most cross-network methods focus on finding similarities of nodes across networks, such as for node classification (shen2019network), network-similarity calculation (ma2019deep), and network alignment (heimann2018regal). While CrossMVA (chu2019cross) is developed mainly for network alignment, it can produce embeddings that contain both similarity and dissimilarity information. However, a major drawback of CrossMVA is that it requires anchor nodes as inputs (i.e., we need to know at least a small portion of the node correspondence), which we cannot obtain in many cases (e.g., the example in Sec. 3). Also, CrossMVA's embeddings of the dissimilarity information only preserve discriminative structures across networks; as a result, it cannot find unique patterns in a specific network.
6.2. Contrastive Learning (CL)
Unlike discriminant analysis, such as linear discriminant analysis (JiaNZ09), which aims to discriminate data points based on their classes, CL (zou2013contrastive) focuses on finding patterns that contrast one dataset with another (abid2018exploring). Several CL extensions of machine-learning methods have been developed, for example, contrastive versions of latent Dirichlet allocation (zou2013contrastive) and regressions (ge2016rich). More recently, CL methods for representation learning, including cPCA (abid2018exploring), have been introduced (abid2018exploring; dirie2019contrastive; abid2019contrastive; severson2019unsupervised). For example, Dirie et al. (dirie2019contrastive) proposed contrastive multivariate singular spectrum analysis (cMSSA) for the decomposition of time-series data. Similar to cPCA, cMSSA could provide interpretability by computing PC loadings; however, cMSSA is not suitable for our case, which handles non-time-series data. On the other hand, contrastive variational autoencoders (cVAE) (abid2019contrastive; severson2019unsupervised) can be used as a CL method in cNRL. The strength of cVAE over cPCA is that it can find unique patterns in a target dataset even when its data points and latent features have nonlinear relationships. However, cVAE relies on multiple layers of neural networks (NNs), and thus its results are difficult to interpret, as with other NN-based methods. Therefore, to use cVAE for interpretable cNRL, additional effort is needed to help interpret the results.

ID  relational function  base feature  cPC1  cPC2 
Target: Price, Background: Random (Sec. 7.1)  
F2-1  total-degree  0.55  0.00  
F2-2  out-degree  0.40  0.00  
F2-3  Katz  0.19  0.06  
Target: Random, Background: Price (Sec. 7.1)  
F3-1  core  1.00  0.13  
F3-2  total-degree  0.18  0.47  
F3-3  in-degree  0.10  0.25  
Target: p2p-Gnutella08, Background: Price 2 (Sec. 7.2.1)  
F4-1  core  1.01  0.10  
F4-2  total-degree  0.22  0.30  
F4-3  in-degree  0.12  0.17  
Target: p2p-Gnutella08, Background: Enhanced Price (Sec. 7.2.1)  
F5-1  total-degree  0.23  0.00  
F5-2  in-degree  0.12  0.05  
F5-3  Katz  0.10  0.05  
Target: LC-multiple, Background: Combined-AP/MS (Sec. 7.2.2)  
F6-1  Katz  0.36  0.00  
F6-2  eigenvector  0.19  0.01  
F6-3  total-degree  0.14  0.02  
Target: School-Day2, Background: School-Day1 (Sec. 7.2.3)  
F7-1  PageRank  0.15  0.02  
F7-2  closeness  0.11  0.04  
F7-3  betweenness  0.09  0.01  
7. Experimental Evaluation
In the previous sections, we have introduced the concepts of cNRL and i-cNRL, as well as related work. We have also demonstrated the effectiveness of i-cNRL in comparing social networks in Sec. 3. To further evaluate the method, we first test i-cNRL with synthetic datasets generated with popular network models. Then, we demonstrate several analysis examples using i-cNRL with publicly available real-world datasets (see Table 1). Lastly, we provide quantitative and qualitative comparisons between i-cNRL and other potential cNRL implementations. In each subsection, we list only the information closely related to our findings; details of the learning parameters and results are provided in Appendix C.
7.1. Evaluation with Network Models
We apply i-cNRL to compare two types of synthetic networks: random and scale-free networks (N3 and N4 in Table 1). We generate the random and scale-free networks with Gilbert's random graph model (barabasi2016network) and Price's preferential attachment model (newman2018networks), respectively. We produce two 2D embedding results, using each network once as the target and once as the background (Fig. 4(a) and 4(b)). Each result shows unique patterns in the target. The cPC loadings in Table 4 show that the Price network's unique patterns relate to the degree centralities (e.g., total-degree). This seems to be due to the fact that most nodes have similar numbers of links in a random network, while a scale-free network contains hubs with a large number of links. In contrast, the random network's uniqueness mostly relates to core numbers. This is because Price's model generates a network by adding a new node and then connecting it to a fixed number of existing nodes (e.g., 3 nodes), selected with preferential-attachment probabilities. As a result, all nodes in the network have the same core number (e.g., 3-core).
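The two kinds of synthetic networks can be generated as follows. The Price-style generator is a simplified sketch of our own (each new node attaches with probability roughly proportional to in-degree plus one); the paper's exact model parameters are in Appendix C:

```python
import random
import networkx as nx

def price_network(n, m=3, seed=0):
    """Simplified Price-style preferential attachment: each new node
    links to m distinct earlier nodes, chosen with probability roughly
    proportional to in-degree + 1 (repetitions in `pool` act as weights)."""
    rng = random.Random(seed)
    g = nx.DiGraph()
    g.add_nodes_from(range(m))
    pool = list(range(m))
    for v in range(m, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))
        g.add_edges_from((v, u) for u in chosen)
        pool.extend(chosen)   # attachment weight grows with in-degree
        pool.append(v)        # the "+1" term for the new node
    return g

g_price = price_network(100, m=3, seed=0)                       # scale-free
g_rand = nx.gnp_random_graph(100, 0.05, directed=True, seed=0)  # random
```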
7.2. Case Studies
7.2.1. Case Study 1: Network Model Refinement
Designing a network model that can simulate real-world networks is fundamental for understanding network-formation mechanisms, performing hypothetical analyses (e.g., what happens if the number of nodes grows?), generating additional datasets for machine learning, and so on (goldenberg2010survey). In this case study, we demonstrate the use of i-cNRL to guide the refinement of network models.
Here, we use a peer-to-peer (P2P) network, specifically the Gnutella peer-to-peer file-sharing network (ripeanu2002mapping) available in SNAP (https://snap.stanford.edu/, accessed: 2019-2-11) (N5 in Table 1), as a modeling subject. Once we have a P2P network generation model, we can use it for analyzing network robustness, studying effective search strategies on a P2P network (liu2009efficient), etc.
P2P networks are often scale-free (liu2009efficient), so we use Price's model (newman2018networks) to mimic a P2P network. To identify the characteristics that Price's model does not simulate well, we set the P2P network (N5) as the target and the Price network (N6) as the background.
The result is shown in Fig. 5(a). From the cPC loadings in Table 4, we notice that the core number (F4-1) has a strong contribution to cPC1. Thus, we color-code the result based on the core number, as shown in Fig. 5(b). We can clearly see that the P2P network has variations in the core number, but the Price network does not. Because a core number of k indicates that a node connects to at least k other nodes within the k-core, the Price network differs significantly from the P2P network in network robustness.
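The core-number gap can be checked directly with networkx; here an undirected Barabási–Albert graph stands in for a fixed-m Price network, and an Erdős–Rényi-style graph for the random baseline:

```python
import networkx as nx

# Fixed-m preferential attachment: every new node brings exactly 3 links,
# so the graph is 3-degenerate and core numbers cannot exceed 3.
g_pa = nx.barabasi_albert_graph(1000, 3, seed=42)
g_er = nx.gnp_random_graph(1000, 0.006, seed=42)

cores_pa = set(nx.core_number(g_pa).values())
cores_er = set(nx.core_number(g_er).values())
```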
From the result above, we decide to refine the Price model to generate varied core numbers. As discussed in Sec. 7.1, the problem comes from the fact that Price's model always adds a new node with a fixed number of links. Similar to the dual Barabási–Albert model (moshiri2018dual), we can avoid the problem by attaching each new node with a variable number of links drawn from a probability distribution. Specifically, we set the model to select the number of links from 1 to 10 with specified probabilities (for details, refer to Sec. C.3.2). Then, we generate a network with this model, referred to as the Enhanced Price network (N7) in Table 1. Next, we apply i-cNRL to the P2P network (as the target) and the Enhanced Price network (as the background). The resultant cPC loadings are listed in Table 4. While the target still seems to have uniqueness in the degree centralities, it no longer does in the core number. By iteratively performing refinement procedures such as the one above, we can build better network models to simulate real-world networks.

7.2.2. Case Study 2: Comparison of Two Networks
In this case study, we compare “interactome” networks—networks of physical DNA, RNA, and protein–protein interactions (yu2008high). Specifically, we compare two interactome networks, Combined-AP/MS (N8 in Table 1) and LC-multiple (N9), available in the CCSB Interactome Database (http://interactome.dfci.harvard.edu/, accessed: 2019-1-28). Both networks represent the interactome of the yeast S. cerevisiae; however, they are obtained through different analysis approaches. Combined-AP/MS is generated from two studies using a “high-throughput” approach, specifically affinity purification/mass spectrometry (AP/MS) (collins2007toward). In contrast, LC-multiple is a literature-curated (LC) network assembled from multiple “low-throughput” experiments (yu2008high; reguly2006comprehensive). Because each analysis approach has its own strengths in identifying the yeast's interactions, the generated networks may vary (yu2008high). Comparing these networks is essential to understand the quality and characteristics of each approach (yu2008high).
Here we analyze the uniqueness of LC-multiple by using LC-multiple and Combined-AP/MS as the target and background networks, respectively. The 2D embedding result by i-cNRL is shown in Fig. 6(a). We first notice that, in the target network, there are two distinct regions: one spreading out toward the top-left and the other in the bottom-right quadrant. To understand why this pattern appears, we obtain the cPC loadings (Table 4) and color the nodes based on the values of the feature with the top cPC loading for cPC1 (i.e., F61, a learned feature derived from the Katz centrality). The result is shown in Fig. 6(b). We observe that going toward either the left or right side along cPC1 tends to produce a high value of this feature, as annotated with the green and teal rectangles, respectively. While this feature has a strong positive loading for cPC1, another feature in Table 4—F62, a learned feature derived from the eigenvector centrality—has a strong negative loading. Therefore, if a node has a higher value for F62, it tends to be placed more toward the left side in Fig. 6(b). This indicates that the green rectangle region in Fig. 6(b) seems to have high values for both of these features, while the teal region has low values for the latter feature (F62). This could happen because the eigenvector centrality tends to be low when a node is in a weakly connected region (newman2018networks), while the Katz centrality is high whenever a node is linked by many others.
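The contrast between the Katz and eigenvector centralities noted above can be reproduced on a toy graph. The sketch below (pure Python, on a hypothetical graph unrelated to the interactome networks) shows that a hub linked only by poorly connected leaves gets a high Katz score but a low eigenvector score:

```python
def build_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def eigenvector_centrality(adj, iters=200):
    # power iteration on the adjacency matrix, normalized by the max entry
    x = {v: 1.0 for v in adj}
    for _ in range(iters):
        nxt = {v: sum(x[u] for u in adj[v]) for v in adj}
        norm = max(nxt.values())
        x = {v: val / norm for v, val in nxt.items()}
    return x

def katz_centrality(adj, alpha=0.1, iters=200):
    # fixed-point iteration of x = alpha * A x + 1 (beta = 1)
    x = {v: 1.0 for v in adj}
    for _ in range(iters):
        x = {v: 1.0 + alpha * sum(x[u] for u in adj[v]) for v in adj}
    return x

# a dense 4-clique (nodes 0-3) bridged to a hub (9) that only leaves (5-8) link to
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (3, 9), (9, 5), (9, 6), (9, 7), (9, 8)]
adj = build_adj(edges)
ev, kz = eigenvector_centrality(adj), katz_centrality(adj)
# hub 9 has the highest degree, so its Katz centrality exceeds the clique's,
# yet its eigenvector centrality is lower because its neighbors are leaves
```

This mirrors the observed pattern: nodes in weakly connected regions can still score high on Katz centrality while scoring low on eigenvector centrality.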
To visually observe the above patterns, we draw the network structures of the target and background networks with SFDP (hu2005efficient) and then color them based on the values of F61 (Fig. 6(c) and (d)). We show only the largest component (newman2018networks) of each network (i.e., nodes in small disconnected components are filtered out). Fig. 6(d) shows that one strongly connected region around the center contains all nodes with high feature values. On the other hand, in Fig. 6(c), multiple regions contain nodes with high feature values. To further investigate this pattern, we select the nodes corresponding to the green and teal regions in Fig. 6(b) and then highlight these nodes in Fig. 6(c). Afterward, we zoom into the regions containing the highlighted nodes. Fig. 6(c)① shows a region related to the nodes in the green rectangle, while Fig. 6(c)② and ③ are two example regions related to the teal rectangle region. We can see that the nodes in Fig. 6(c)① are strongly connected, but those in Fig. 6(c)② and ③ are not. From these observations, i-cNRL reveals that only the target network has two different types of nodes linked to high-Katz-centrality nodes: those in strongly connected regions and those in weakly connected regions.
7.2.3. Case Study 3: Analysis of Network Changes
As an example of analyzing dynamic networks, we compare contact networks in a primary school on two different days (stehle2011high), available in SocioPatterns (http://www.sociopatterns.org/, accessed: 2019-1-28). The networks represent face-to-face contact patterns between students and teachers, which are collected with RFID devices. Information on the network of each day is listed in Table 1 (N10 and N11). Fig. 7(a) and (b) visualize the network structures drawn with SFDP. These networks also have multiple node attributes, including gender, grade, and class name. In addition to multiple network centralities, we utilize the attribute information by including gender as a base feature, i.e., encoding ‘male’, ‘female’, and ‘unknown’ as 1, −1, and 0, respectively.
To analyze changes in contact patterns, we set the networks of the second day and the first day as the target and background, respectively. Fig. 7(c) shows the 2D embedding result. To interpret the target network's unique patterns, we review the cPC loadings listed in Table 4 and color the nodes in Fig. 7(a), (b), and (c) based on the learned feature F71, which is derived from PageRank. The results are shown in Fig. 7(d), (e), and (f). We can see that i-cNRL discovers that the target network has both regions strongly connected to others (colored more yellow in Fig. 7(d) and (f)) and regions weakly connected to others (colored more purple), while all of the background network's nodes have relatively strong connections with each other, as seen in the laid-out result in Fig. 7(b).
According to the study in (stehle2011high), the students tended to have more contact within the same class than between classes. To relate the class information to the found unique patterns, we color-code the nodes (i.e., students) based on their class, as shown in Fig. 7(g), (h), and (i). From these results, we notice that i-cNRL well separates groups of students who have less (e.g., gray, pink, or teal nodes) and more (e.g., orange nodes) contact between classes in the target network.
7.3. Comparison with Other Potential Designs
Our i-cNRL utilizes DeepGL and cPCA for cNRL's two essential components, NRL and CL, to provide interpretable results. However, if interpretability is not required, we can replace each of the learning methods with other alternatives. Here we compare three different designs for cNRL: (1) DeepGL & cPCA, (2) GraphSAGE (hamilton2017inductive) & cPCA, and (3) DeepGL & cVAE (abid2019contrastive; severson2019unsupervised).
7.3.1. Quantitative Results
Here we compare the quality of the contrastive representations obtained with each design. A good contrastive representation should distribute the nodes of the target network more widely than those of the background network, and it should also show different patterns in the target and background networks. For example, as shown in Fig. 3, cPCA provides a better contrastive representation than PCA. To compare the aspects above, we use three different dissimilarity measures: the dispersion ratio, the Bhattacharyya distance (bi2017uncertainty), and the Kullback–Leibler (KL) divergence (wang2009divergence) from the set of nodes in the target network to that in the background network. The dispersion ratio represents how widely the target network's nodes are scattered relative to the background network's. The Bhattacharyya distance indicates the closeness or overlap of the nodes of the two networks. The KL divergence shows the difference between their probability distributions of nodes. For all the above measures, the higher the value, the better the design.
We calculate the dispersion ratio of the target embedding to the background embedding using the scaled embedding matrices obtained by applying standardization to the concatenation of the two embeddings; we use the scaled matrices, instead of the raw embeddings, to avoid scaling differences in the embedding axes across the three designs. For the Bhattacharyya distance and KL divergence, since we do not have the exact probability distributions of the two embeddings, we employ the estimation methods described in (bi2017uncertainty) and (wang2009divergence). For GraphSAGE, we specifically select the GraphSAGE-maxpool model because it produces better results (hamilton2017inductive). We use the default parameter values in (hamilton2017inductive; abid2019contrastive) for GraphSAGE and cVAE, except that we set 24 as the number of features learned by GraphSAGE (see Sec. C.4). For the input features of GraphSAGE, we use the same base features used for DeepGL (see Table 6 for details). We obtain 2D embeddings with the cPCs (with cPCA) or salient latent variables (with cVAE) (abid2019contrastive). Since cVAE relies on probabilistic encoders, the results could differ for each trial, and thus we compute the mean value of each measure over 10 trials.
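As one concrete instance of such an estimation, the Bhattacharyya distance between two 2D point sets has a closed form when each set is approximated by a Gaussian. The sketch below uses that standard formula; it is an illustration and not necessarily the exact estimator of (bi2017uncertainty):

```python
import math

def mean_cov(points):
    """Sample mean and (biased) 2x2 covariance of 2D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return (mx, my), [[sxx, sxy], [sxy, syy]]

def bhattacharyya(points_a, points_b):
    """Closed-form Bhattacharyya distance between Gaussian fits of two point sets."""
    det = lambda m: m[0][0] * m[1][1] - m[0][1] * m[1][0]
    (m1, c1), (m2, c2) = mean_cov(points_a), mean_cov(points_b)
    # averaged covariance and its inverse
    c = [[(c1[i][j] + c2[i][j]) / 2 for j in range(2)] for i in range(2)]
    inv = [[c[1][1] / det(c), -c[0][1] / det(c)],
           [-c[1][0] / det(c), c[0][0] / det(c)]]
    dx, dy = m1[0] - m2[0], m1[1] - m2[1]
    mahalanobis = (dx * (inv[0][0] * dx + inv[0][1] * dy)
                   + dy * (inv[1][0] * dx + inv[1][1] * dy))
    return mahalanobis / 8 + 0.5 * math.log(det(c) / math.sqrt(det(c1) * det(c2)))
```

Identical point sets give a distance of zero, and well-separated sets give a large distance, matching the "higher is better" interpretation used above.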
Table 5 compares the three designs on different networks using the measures above. We can see that, in general, DeepGL & cPCA and GraphSAGE & cPCA achieve better scores than DeepGL & cVAE. Between DeepGL & cPCA and GraphSAGE & cPCA, DeepGL & cPCA tends to provide better results, except for the Dolphin and Karate networks, which have small numbers of nodes.
Target – Background | dispersion ratio (DG&cPCA / GS&cPCA / DG&cVAE) | Bhattacharyya (DG&cPCA / GS&cPCA / DG&cVAE) | KL of target from background (DG&cPCA / GS&cPCA / DG&cVAE)
Dolphin – Karate | 174 / 9,754 / 1.78 | 1.40 / 1.73 / 0.93 | 6.82 / 12.76 / 0.83
P2P – Price 2 | 21,744 / 1,801 / 2.46 | 7.52 / 4.72 / 1.00 | 45.73 / 14.09 / 36.13
LC-multi. – C.AP/MS | 376 / 54 / 2.95 | 1.52 / 1.76 / 0.29 | 18.49 / 16.61 / 15.01
Sch.Day2 – Sch.Day1 | 57 / 6 / 1.93 | 1.81 / 0.61 / 0.56 | 5.82 / 1.80 / 0.82
*DG=DeepGL, GS=GraphSAGE, P2P=p2p-Gnutella08, C.AP/MS=Combined-AP/MS
7.3.2. Qualitative Results
We visually compare the embedding results to review more detailed differences, as shown in Fig. 8. For cVAE, we show the result with the longest Bhattacharyya distance among the 10 trials. Because GraphSAGE and cVAE do not provide interpretable features, for the comparison, we color-code the nodes of the target network by the feature values from the DeepGL results. Specifically, the left three columns in Fig. 8 are colored based on the values of the feature that has the top absolute loading for cPC1, and the far-right column is colored by class name.
We can see that, although the qualities of the contrastive representations in Table 5 differ, these designs seem to identify similar unique patterns. For instance, all the results of P2P and Price 2 show a monotonic increase of the feature value (F41—core number). Also, for LC-multiple and Combined-AP/MS, both DeepGL & cPCA and DeepGL & cVAE depict clearly separated patterns, as indicated with the green rectangles, while GraphSAGE & cPCA does not show the same pattern. Furthermore, in each result of the school networks, we can see a distinct group consisting of gray nodes, as annotated with the red rectangles.
From the above quantitative and qualitative comparisons, we can see that DeepGL & cPCA (i.e., our i-cNRL) generates results of similar quality to the alternatives. However, the other two designs do not provide interpretable results.
8. Conclusion and Future Work
This work introduces contrastive network representation learning (cNRL), which aims to reveal unique patterns in one network relative to another. Furthermore, we demonstrate i-cNRL, a cNRL method that is both generic and interpretable. With these contributions, our work provides a new approach for network comparison.
We have demonstrated the usability of i-cNRL with small- or medium-scale networks (fewer than 10,000 nodes) to provide intelligible examples. As a next step, we plan to apply i-cNRL to larger networks (e.g., networks with millions of nodes). When analyzing such large, complex networks, the linearity of cPCA used in i-cNRL might limit its capability of finding unique patterns. Therefore, we will investigate how to incorporate nonlinear contrastive learning methods (such as cVAE) into cNRL while retaining interpretability.
Acknowledgements.
This research is sponsored in part by the U.S. National Science Foundation through grants IIS-1741536 and IIS-1528203.

References
Appendix A Datasets
For the evaluation, we use datasets from various data repositories, including SNAP, the CCSB Interactome Database, and SocioPatterns, as well as synthetic datasets that we generated. To allow reproducibility of this work, we provide links to the original network datasets, the processed datasets, and the feature matrices learned by DeepGL and GraphSAGE at https://takanorifujiwara.github.io/s/cnrl/.
Appendix B Implementation Details
We have implemented the cNRL architecture with Python 3. The implemented architecture allows the user to apply any NRL and CL methods that provide “fit” and “transform” methods (similar to the machine learning methods supported in scikit-learn, https://scikit-learn.org/, accessed: 2020-2-10). For the implementation of i-cNRL, we have integrated DeepGL and cPCA into the cNRL architecture. Because there is no implementation of DeepGL available for Python (an implementation using Java with the Neo4j database is available from https://github.com/neo4jgraphanalytics/mlmodels, accessed: 2020-2-10), we have implemented DeepGL with graph-tool (https://graph-tool.skewed.de/, accessed: 2020-2-10). For cPCA, we have modified the ccPCA implementation available online (https://github.com/takanorifujiwara/ccpca, accessed: 2020-2-10) to add the automatic contrastive parameter selection described in Sec. 5.2.3.
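The fit/transform plumbing of the architecture can be sketched as follows. DegreeNRL and CenterCL are hypothetical toy stand-ins (one degree feature per node, and centering by the background mean), not DeepGL or cPCA; the sketch only illustrates how any pair of fit/transform objects plugs into the two-stage pipeline:

```python
class DegreeNRL:
    """Toy NRL stage: one feature per node, its degree (stand-in for DeepGL)."""
    def fit(self, *graphs):
        return self
    def transform(self, adj):  # adj: dict node -> set of neighbors
        return [[len(nbrs)] for nbrs in adj.values()]

class CenterCL:
    """Toy CL stage: center target features by the background mean (stand-in for cPCA)."""
    def fit(self, feats_target, feats_background):
        self.mean = sum(r[0] for r in feats_background) / len(feats_background)
        return self
    def transform(self, feats):
        return [[r[0] - self.mean] for r in feats]

class CNRL:
    """Two-stage pipeline: any NRL and CL objects exposing fit/transform."""
    def __init__(self, nrl, cl):
        self.nrl, self.cl = nrl, cl
    def fit(self, target_graph, background_graph):
        # learn one feature space over both networks, then learn the
        # contrastive representation of target vs. background features
        self.nrl.fit(target_graph, background_graph)
        self.cl.fit(self.nrl.transform(target_graph),
                    self.nrl.transform(background_graph))
        return self
    def transform(self, graph):
        return self.cl.transform(self.nrl.transform(graph))

target = {0: {1, 2}, 1: {0}, 2: {0}}  # a 3-node star
background = {0: {1}, 1: {0}}         # a single edge
model = CNRL(DegreeNRL(), CenterCL()).fit(target, background)
```

With this interface, swapping DeepGL for GraphSAGE or cPCA for cVAE is a one-line change, as long as each replacement exposes fit and transform.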
Appendix C Experiment Details
The source code for generating the experimental results is available at https://takanorifujiwara.github.io/s/cnrl/.
C.1. Learning Parameters of i-cNRL
C.1.1. DeepGL Settings
Because DeepGL is introduced as a comprehensive inductive NRL framework, there are multiple settings we can adjust. The terminology used here is the same as in (rossi2018deep); refer to (rossi2018deep) for terms not explained in this paper (indicated with italic fonts below). For all the cNRL we performed, we have used DeepGL with logarithmic binning to transform the feature values, but without feature diffusion. For the other settings, generally, we have used as many different relational feature operators and base features as possible for each network dataset. As for the relational feature operators, for directed networks, we have used all the combinations of the operators and edge directions (i.e., 12 operators in total). For undirected networks, we have used the subset of these operators applicable to undirected edges. As for the base features, we have used all centralities and measures available in graph-tool. However, for each network, some of these features have produced ‘NaN’ values (e.g., closeness); in that case, we have excluded such features from the base features. Table 6 shows the base features we used for each analysis. Additionally, for scoring and pruning the learned features, we have applied the same method used in (rossi2018deep) with the tolerance/feature-similarity threshold λ. As λ becomes larger, the number of features learned by NRL increases. We have set a different λ value for each analysis, as listed in Table 6. In general, for the undirected networks, we have used relatively higher λ values because the number of base features used is smaller compared with the directed networks.
Target – Background | base features | threshold λ
Dolphin – Karate | {total-degree, betweenness, closeness, eigenvector, PageRank, Katz} | 0.7
Price – Random | {in-degree, out-degree, total-degree, PageRank, betweenness, Katz, core} | 0.3
Random – Price | {in-degree, out-degree, total-degree, PageRank, betweenness, Katz, core} | 0.3
p2p-Gnutella08 – Price 2 | {in-degree, out-degree, total-degree, PageRank, betweenness, Katz, core} | 0.5
p2p-Gnutella08 – Enhanced Price | {in-degree, out-degree, total-degree, PageRank, betweenness, Katz, core} | 0.5
LC-multiple – Combined-AP/MS | {total-degree, betweenness, eigenvector, PageRank, Katz} | 0.7
School-Day2 – School-Day1 | {gender, total-degree, closeness, betweenness, eigenvector, PageRank, Katz} | 0.7
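The logarithmic binning transformation mentioned above can be sketched as follows. This is our reading of DeepGL's vertical log binning; the bin-fraction parameter value and the tie handling are simplifying assumptions:

```python
import math

def log_binning(values, alpha=0.5):
    """Vertical logarithmic binning sketch: repeatedly assign the
    alpha-fraction of the still-unbinned nodes with the smallest
    feature values to the next integer bin label."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    binned = [0] * len(values)
    start, label = 0, 0
    while start < len(order):
        take = max(1, math.ceil(alpha * (len(order) - start)))
        for i in order[start:start + take]:
            binned[i] = label
        start += take
        label += 1
    return binned
```

The effect is that skewed feature distributions (e.g., degrees in a scale-free network) are compressed into a small number of roughly balanced, ordered labels before the relational feature operators are applied.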
C.1.2. cPCA Settings
For all results, we have used cPCA with the automatic contrastive parameter selection and default settings. That is, we have applied standardization to each of the target and background feature matrices for both learning and projection, and used the automatic contrastive parameter selection with its default settings.
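At its core, cPCA finds directions of high target variance and low background variance via the eigenvectors of the contrast matrix cov(target) − α·cov(background). A minimal 2D sketch in pure Python (closed-form 2×2 eigendecomposition; an illustration, not the ccPCA implementation we modified):

```python
import math

def cov2(rows):
    """Biased 2x2 covariance of rows [(x, y), ...]."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    cxx = sum((x - mx) ** 2 for x, _ in rows) / n
    cyy = sum((y - my) ** 2 for _, y in rows) / n
    cxy = sum((x - mx) * (y - my) for x, y in rows) / n
    return cxx, cxy, cyy

def top_cpc(target, background, alpha):
    """Leading eigenvector of cov(target) - alpha * cov(background)."""
    txx, txy, tyy = cov2(target)
    bxx, bxy, byy = cov2(background)
    a, b, d = txx - alpha * bxx, txy - alpha * bxy, tyy - alpha * byy
    if abs(b) < 1e-12:  # contrast matrix already diagonal
        return (1.0, 0.0) if a >= d else (0.0, 1.0)
    lam = (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b * b)  # top eigenvalue
    vx, vy = lam - d, b
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm

# target varies in both x and y; background varies only in y
target = [(-1, -2), (1, 2), (-1, 2), (1, -2)]
background = [(0, -2), (0, 2), (0, -2), (0, 2)]
```

With α = 0 this reduces to plain PCA and picks the high-variance y-axis; with α = 1 the shared y variance is subtracted away and the first cPC rotates to the x-axis, where only the target varies.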
C.2. Full Sets of cPC Loadings
The full sets of cPC loadings obtained with i-cNRL for each analysis in Sec. 7.1 and Sec. 7.2 are listed in Tables 7–10.
relational function  base feature  cPC 1  cPC 2 

Target: Price, Background: Random
indegree  0.19  0.06  
outdegree  0.40  0.00  
totaldegree  0.55  0.00  
PageRank  0.00  0.00  
betweenness  0.00  0.00  
Katz  0.19  0.06  
core  0.00  0.00  
indegree  0.01  0.00  
indegree  0.00  0.00  
Target: Random, Background: Price
indegree  0.10  0.25  
outdegree  0.01  0.02  
totaldegree  0.18  0.47  
PageRank  0.02  0.01  
betweenness  0.01  0.00  
Katz  0.09  0.24  
core  1.00  0.13  
indegree  0.00  0.00  
indegree  0.00  0.00  
relational function  base feature  cPC 1  cPC 2 

Target: p2p-Gnutella08, Background: Price 2
indegree  0.12  0.17  
outdegree  0.04  0.00  
totaldegree  0.22  0.30  
PageRank  0.04  0.00  
betweenness  0.00  0.00  
Katz  0.11  0.13  
core  1.01  0.10  
indegree  0.00  0.00  
outdegree  0.00  0.00  
betweenness  0.00  0.00  
outdegree  0.00  0.00  
indegree  0.00  0.00  
outdegree  0.00  0.00  
outdegree  0.00  0.00  
Target: p2p-Gnutella08, Background: Enhanced Price
indegree  0.12  0.05  
outdegree  0.05  0.00  
totaldegree  0.23  0.00  
PageRank  0.00  0.00  
betweenness  0.00  0.00  
Katz  0.10  0.05  
core  0.00  0.00  
indegree  0.00  0.00  
outdegree  0.00  0.00  
betweenness  0.00  0.00  
outdegree  0.00  0.00  
indegree  0.00  0.00  
outdegree  0.00  0.00  
outdegree  0.00  0.00  
relational function  base feature  cPC 1  cPC 2 

totaldegree  0.02  0.13  
betweenness  0.00  0.00  
eigenvector  0.01  0.12  
PageRank  0.01  0.00  
Katz  0.00  0.24  
totaldegree  0.14  0.02  
betweenness  0.00  0.00  
eigenvector  0.19  0.01  
PageRank  0.00  0.00  
Katz  0.36  0.00  
betweenness  0.00  0.00  
PageRank  0.01  0.00  
totaldegree  0.03  0.01  
betweenness  0.00  0.00  
PageRank  0.01  0.01  
betweenness  0.00  0.00  
PageRank  0.01  0.01  
betweenness  0.00  0.00  
PageRank  0.00  0.00  
relational function  base feature  cPC 1  cPC 2 

totaldegree  0.05  0.03  
closeness  0.01  0.00  
betweenness  0.01  0.00  
eigenvector  0.06  0.02  
PageRank  0.00  0.04  
Katz  0.02  0.02  
gender  0.01  0.00  
totaldegree  0.07  0.01  
betweenness  0.02  0.01  
gender  0.01  0.00  
gender  0.01  0.00  
totaldegree  0.01  0.03  
closeness  0.03  0.00  
betweenness  0.02  0.00  
eigenvector  0.03  0.01  
PageRank  0.03  0.01  
Katz  0.01  0.02  
gender  0.00  0.00  
gender  0.01  0.00  
totaldegree  0.06  0.01  
betweenness  0.04  0.00  
gender  0.02  0.01  
totaldegree  0.05  0.04  
closeness  0.11  0.04  
betweenness  0.09  0.01  
eigenvector  0.08  0.03  
PageRank  0.15  0.02  
Katz  0.06  0.05  
gender  0.00  0.00  
betweenness  0.00  0.00  
gender  0.00  0.00  
gender  0.00  0.00  
gender  0.00  0.00  
C.3. Network Generation Models and Parameters
We have used Gilbert's and Price's network models to generate Random (N3), Price (N4), and Price 2 (N6) in Table 1. Also, in Sec. 7.2.1, we have introduced the enhanced Price model as a solution for generating a network whose nodes have varying core numbers—Enhanced Price (N7) in Table 1. In the following, we explain the details of the parameters we used for network generation and of the enhanced Price model.
C.3.1. Parameters for Gilbert's and Price's Models
The Gilbert’s model generating a random network requires the fixed probability of a connection of each pair of nodes. We have set the probability to for generating Random (ID 4). The Price’s model requires the fixed number of outdegree of newly added nodes as its parameter. We have set this parameter to 3 for both Price (N4) and Price 2(N6).
C.3.2. Enhanced Price Model
For the enhanced Price’s model, we modify the Price’s model to be able to generate nodes with various the core numbers. To achieve this, in the enhanced Price’s model, we allow the user to set multiple positive integer numbers of outdegree of newly added nodes. We denote this input as where is the length of the input. To select one number from when a new node is added, we need to set the probability of selecting each number. We denote the probabilities as where .
To generate Enhanced Price (N7), we have set these parameters to {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} and {0.3, 0.25, 0.15, 0.1, 0.075, 0.05, 0.025, 0.025, 0.0125, 0.0125}, respectively.
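A sketch of this enhanced Price model in pure Python; the preferential-attachment weight (in-degree + 1) and the edge-free initial seed nodes are our simplifying assumptions:

```python
import random

def enhanced_price(n, degrees, probs, seed=None):
    """Each new node draws its out-degree from `degrees` with the given
    probabilities, then cites existing nodes with probability
    proportional to (in-degree + 1), as in Price's model."""
    rng = random.Random(seed)
    m_max = max(degrees)
    edges, indeg = [], [0] * n
    # the first max(degrees) nodes start without edges (seed nodes)
    for new in range(m_max, n):
        m = min(rng.choices(degrees, weights=probs)[0], new)
        weights = [indeg[v] + 1 for v in range(new)]
        cited = set()
        while len(cited) < m:  # sample m distinct targets preferentially
            cited.add(rng.choices(range(new), weights=weights)[0])
        for v in cited:
            edges.append((new, v))
            indeg[v] += 1
    return edges
```

Because out-degrees now vary per node, the generated network contains nodes at several different core levels instead of a single one.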
C.4. Settings of GraphSAGE and cVAE
We describe the detailed settings and parameters of GraphSAGE and cVAE used in Sec. 7.3. We have used the source code provided by the authors of GraphSAGE (https://github.com/williamleif/GraphSAGE, accessed: 2020-2-10) and cVAE (https://github.com/abidlabs/contrastive_vae, accessed: 2020-2-10). For GraphSAGE, we have used the unsupervised model graphsage_maxpool with 24 as the number of features learned (i.e., dim_1=12 and dim_2=12), while we have followed the default values for the other parameters (e.g., learning_rate=0.00001 and model_size=‘small’). We have used cVAE with the default parameters (i.e., intermediate_dim=12, latent_dim=2, batch_size=64, and epochs=500).
C.5. Automatic Contrastive Parameter Selection
Fig. 9 shows the transitions of the contrastive parameter value during the automatic selection in i-cNRL. For all the experiments, we can see that the value converges within 10 iterations.