Cross-domain Network Representations

08/01/2019 ∙ by Shan Xue, et al. ∙ University of Technology Sydney 0

The purpose of network representation is to learn a set of latent features by obtaining community information from network structures to provide knowledge for machine learning tasks. Recent research has driven significant progress in network representation by employing random walks as the network sampling strategy. Nevertheless, existing approaches rely on community structures that are rich within a single domain, and they fail on networks that lack topological information in their own domain. In this paper, we propose a novel algorithm for cross-domain network representation, named CDNR. By generating random walks from a structurally rich domain and transferring the knowledge in those random walks across domains, it enables a network representation for the structurally scarce domain as well. Specifically, CDNR is realized by a cross-domain two-layer node-scale balance algorithm and a cross-domain two-layer knowledge transfer algorithm in the framework of cross-domain two-layer random walk learning. Experiments on various real-world datasets demonstrate the effectiveness of CDNR for universal networks in an unsupervised way.




1 Introduction

Networks generated from mature systems usually have larger numbers of entities, such as nodes and edges, than emerging ones. For example, a newly launched online social media platform attracts a limited number of users and has not yet formed massive interactions among them, so its network is far smaller in scale than that of a mature platform like Facebook. Furthermore, in some domains, such as the biological domain, it is difficult to collect sufficient data because of costs, technical barriers, ethical reasons and so on. Traditional industries normally lack historical data when data-driven techniques have not yet brought them benefits. These scenarios lead to data deficiency, which affects network analysis and learning. Previous approaches to network representation, developed on large-scale datasets, cannot be applied in such cases.

Figure 1: Illustrations of undirected network structures formed by entities of nodes and edges. (a) A social network is formed by users {A1,B1,C1,D1} and user friendships; (b) a biological network is formed by proteins {A2,B2,C2,D2} and protein-protein interactions; and (c) a citation network is formed by papers {A3,B3,C3,D3} and citations. (d) An information network extracts the information flows from (a), (b) and (c) as edges and inherits the nodes {A,B,C,D}.

From the domain-specific view, rich data are collected from real-world complex systems as large-scale network datasets. The components in a system are defined as the nodes in a network, direct interactions between nodes are defined as edges, and connection strengths are described by weights on edges. Techniques that not only analyze networks but also learn knowledge from network structures have become a mainstream of network research for artificial intelligence purposes xuan2016uncertainty ; wang2018learning . To this end, networks are preliminarily categorized based on real-world systems and their physical properties, such as social networks chaker2017social ; choi2012incremental , biological networks jancura2010dividing and citation networks lu2018structural . As shown in Figure 1, social networks (a) denote users as nodes and friendships as edges; biological networks such as the Protein-Protein Interaction (PPI) network (b) model proteins as nodes and PPIs as edges; and citation networks (c) represent papers as nodes and citations as edges.

Across all kinds of networks, the information network tang2015line abstracts the information flows from the original network structure: the original nodes, such as users, proteins and papers, are treated as information users and suppliers, and the information interchanges over friendships, PPIs and citations are treated as edges. The information network encodes the network behaviors and saves them into the network structure, as shown in Figure 1(d). Information networks help us illustrate the entities in a physical system, but they raise the question of how to understand the various properties behind the different network categories, especially when the topologies appear identical, as in Figure 1.

Network representation aims to learn a latent feature/vector space by learning from the information formed by network entities tang2009relational . It takes high-dimensional network structures as input and outputs relatively low-dimensional representations that encode as many community properties as possible. For use in machine learning, network representation should output complex but highly structured latent features, to meet the smoothness requirement of the learning function and to overcome the sparsity of the input data bengio2013representation . To this end, a series of network representation approaches based on the sampling strategy of random walks and on deep learning techniques have been proposed in the last decade. The random walk is a type of similarity measurement used for a variety of problems in community detection noh2004random ; lai2010enhanced , which computes local community structure information in time sub-linear in the size of the input network yang2015defining ; leskovec2012learning . A stream of short random walks is used as a basic tool for extracting information from real-world large-scale information networks liu2010link ; perozzi2014deepwalk .

Typical random walk-based network representation algorithms, such as DeepWalk perozzi2014deepwalk , learn sequences of nodes as a stream of short random walks to model the deep features of network structures. They are therefore highly dependent on the sliding window that controls node sampling during random walk learning. When the distance between two nodes is larger than the window size, the random walk jumps to the next round. Although this could be covered by introducing a vast amount of sampling, the repetitions increase computational complexity. This is the main reason why these algorithms are barely applicable to networks with small structure scales. Previous works on random walk-based network representation are thus limited to a domain-specific setting in which performance relies mainly on the topological quality of the network. Our previous work proposed a framework for transferring structures across large-scale information networks (FTLSIN) IJCNN18 ; however, it only enabled structural knowledge transfer across relational information networks, and both networks had to be large in scale. For the cases listed at the beginning of this paper, the limited network structures within one domain cannot guarantee latent feature spaces that are satisfactory for further machine learning tasks.

To address the above problems, we propose a novel algorithm for universal cross-domain network representations (CDNR) with the following contributions.

  1. CDNR offers an effective learning solution for network representation when the network does not have enough entities, which causes random walks to fail in structural sampling.

  2. CDNR determines the relationships between two independent networks, which may belong to irrelevant domains. Similar network patterns are detected so that links generated between the corresponding communities transfer knowledge in CDNR.

  3. CDNR predicts the potential entities for the scarce network structures by employing the cross-domain two-layer random walk (CD2L-RandomWalk) framework from IJCNN18 and integrating two novel algorithms, cross-domain two-layer node-scale balance (CD2L-NodeBalance) and cross-domain two-layer knowledge transfer (CD2L-KnowlTransfer).

The rest of the paper is arranged as follows. In Section 2, related works are summarized. In Section 3, we state the CDNR problem. In Section 4, the proposed CDNR algorithm is explained in detail. In Section 5, two experiments are designed to evaluate the representations on real-world datasets. Our conclusions are presented in Section 6.

2 Related Works

The previously used per-node partition function bandyopadhyay2008counting is expensive to compute, especially for large information networks. To overcome this disadvantage, a series of sampling strategies have been proposed kurant2011towards ; lelis2013predicting to analyze the statistics within local structures, e.g., communities and sub-networks. These approaches are different from traditional representation learning paalanen2006feature ; zhu2015unsupervised ; ding2015deep . The latent feature learning of the network representation captures neighborhood similarity and community membership in topologies yang2015network ; pan2016tri ; tu2016max .

DeepWalk perozzi2014deepwalk trains a neural language model on the random walks generated from the network structure. After denoting a random walk that starts from a root node, DeepWalk slides a window and maps the central node to its representation. Hierarchical Softmax factors out the probability distributions corresponding to the random walk, and the representation function is updated to maximize the probability. DeepWalk has produced promising results in dealing with sparsity in scalable networks, but has relatively high computational complexity for large-scale information networks. LINE, Node2Vec and Struc2Vec are other structure-based network representation algorithms that improve on the performance of DeepWalk. LINE tang2015line preserves both the local and the global network structure through first-order and second-order proximity respectively, and can be applied to large-scale deep network structures that are directed or undirected, weighted or unweighted. Node2Vec grover2016node2vec explores the diverse neighborhoods of nodes in a biased random walk procedure by employing classic search strategies. Struc2Vec ribeiro2017struc2vec encodes structural similarities and generates the structural context for nodes using random walks. The above-mentioned works have contributed to network analysis by modeling a stream of short random walks.

All the previous works that use random walks to sample networks into a stream of nodes share a common assumption of a power-law distribution. The power-law distribution exists widely in real-world networks. It is a special degree distribution of the form $p(k) \propto k^{-a}$, where $k$ is a node degree and $a$ is a positive constant newman2005power . A network that follows the power-law distribution is also regarded as a scale-free network with the scale invariance property barabasi2009scale . The social networks, biological networks and citation networks discussed in this paper are observed to be scale-free in nature barabasi2016network . On log-log axes, the power-law distribution shows a linear trend with a slope of $-a$ (Figure 4 and Figure 6), which reflects that numerous edges connect small-degree nodes and that this does not change with network scale adamic2000power . It has been observed in perozzi2014deepwalk that if a network follows the power-law distribution, the frequency with which nodes appear in short random walks also follows the same distribution. Meanwhile, random walks in power-law distribution networks naturally gravitate towards high-degree nodes adamic2001search .
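The log-log linearity described above can be checked numerically. The following is a minimal sketch, not the authors' procedure: it assumes an undirected edge list and estimates the slope with an ordinary least-squares fit on the log-log degree histogram, which is a crude but common first check. The function names `degree_counts` and `loglog_slope` are hypothetical.

```python
import math
from collections import Counter


def degree_counts(edges):
    """Return {node: degree} for an undirected edge list of (u, v) pairs."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def loglog_slope(degrees):
    """Least-squares slope of log(frequency) against log(degree).

    Assumption: a plain linear fit on the degree histogram stands in for
    whatever estimator the paper used; zero-degree nodes are ignored.
    """
    freq = Counter(d for d in degrees if d > 0)
    xs = [math.log(k) for k in freq]
    ys = [math.log(c) for c in freq.values()]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two distinct degree values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

For a network loaded as an edge list, `loglog_slope(degree_counts(edges).values())` returns a negative number when low-degree nodes dominate; comparing the value for a network with the value for its random-walk visit counts mirrors the pairing shown in Figure 4.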

In this paper, we propose CDNR, which builds on previous works by employing biased random walk sampling strategies to learn network structures. However, CDNR differs from the deep transfer learning approaches for cross-domain graph-structured data, i.e., context enhanced inductive representation hamilton2017inductive , intrinsic geometric information transfer lee2017transfer and deep inductive graph representation rossi2018deep . Deep neural network-based network representations usually need to generalize a small set of base features for deep learning, such as network statistical properties like node degree, which loses valuable information from the networks. The link predictions in CD2L-RandomWalk are therefore based on the power-law distribution as well as the distance calculation between the two independent networks across domains. The network that has a small distance to the target network is regarded as the source domain. The scale invariance property should theoretically ensure that power law-based CDNR is robust.

3 Problem Statement

, A source domain and a target domain.
, , A -dimensional target domain latent feature spaces, a -dimensional feature
vectors, and the -th element of .
, A (un)directed unattributed unweighted network from , and a (un)directed
unattributed weighted network from .
, , , The node set of , the node set of , a node in , and a node in .
, , , The edge set of , the edge set of , an edge between and , and an edge
between and .
, The weight set on , and a weight in on .
, , A form of super graph for , a super-node set, and a super-edge set.
, A super node in , and a super edge in connecting and .
, A weight set, and a weight in on .
, The random walk sets on and .
A shortest path between and over .
, , A set of node degree values on , a set of node degree values on and a set of
node degree values on .
, A link set between and across domains, and a weight set on .
, A link in that connects and , and a weight in on .
The slope ratio of power-law distribution.
The function calculates the node degree.
The average function on a value set.
The function counts the number in a set.
Table 1: Summary of notations.

Definition 1 (Domain lu2015transfer ) A domain is denoted as , where X is the feature space and is the marginal probability distribution that .

Definition 2 (Network barabasi2016network ) Let be a given network, where represents one kind of entities known as nodes, represents another kind of entities known as edges reflecting connections between nodes, , and represents the possible weights on .

Definition 3 (Cross-domain Network Representation) Suppose a source domain represented by and a target domain represented by , domains are irrelevant if and ; and relevant if or . CDNR employs structural information on , from to , to improve the target domain representations in a -dimensional latent feature space.

To prepare the structural knowledge from the source domain, CDNR first implements a maximum likelihood optimization to generate a set of random walks on in the bottom layer of CD2L-RandomWalk. A neighborhood is clustered rooted at node by the neighborhood sampling strategy on the biased random walks grover2016node2vec . Then, CD2L-RandomWalk constructs links between and . To this end, CD2L-NodeBalance balances the scales of and by clustering into super nodes in which share close node degrees with ; and generates links between and . CD2L-KnowlTransfer trains the maximized similarities across two domains and determines how much value should be transferred across the shortest paths and , where are formed by the super edges and the values are saved in . In CDNR, the representations are learned in the top layer of CD2L-RandomWalk and will be evaluated by a standard classification task.

3.1 Bottom-layer Random Walk: Knowledge Preparation

The bottom-layer random walk is designed for knowledge preparation in the source domain. The sampled random walks contain structural knowledge from which will be transferred to the target domain. The bottom-layer random walk introduces a biased random walk strategy to efficiently explore diverse neighborhoods and sample the nodes along the shortest path (the path between two nodes for which the sum of its edge weights is minimized). Suppose a set of random walks , each root node repeats times for sampling and each random walk is set in a length of . In generating a random walk, suppose we are standing at node which is the -th node in the random walk, , the node denotes the -th node and the -th node is chosen from based on a probability where is the partition function that ensures a normalized distribution bengio2013representation and is guided by the search bias . To be specific, follows the searching rules: if the length of the shortest path between nodes and is , then ; , if ; and , if . The sampling strategy on the biased random walks is computationally efficient especially for real-world large-scale networks.
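The searching rules above can be illustrated with a short sketch. Because the exact values were lost in this copy of the text, the code assumes the rule from Node2Vec, which the paper cites for its biased walks: the bias is 1/p when the candidate is the previous node (distance 0), 1 when the candidate is adjacent to the previous node (distance 1), and 1/q otherwise (distance 2). The graph is taken to be unweighted and stored as a dict of neighbor sets; `search_bias` and `biased_walk` are hypothetical names.

```python
import random


def search_bias(prev, candidate, adj, p, q):
    """Node2Vec-style bias based on the distance between prev and candidate."""
    if candidate == prev:          # distance 0: step back to the previous node
        return 1.0 / p
    if candidate in adj[prev]:     # distance 1: stay close to the previous node
        return 1.0
    return 1.0 / q                 # distance 2: move outward


def biased_walk(adj, root, length, p=1.0, q=1.0, rng=None):
    """Sample one walk of at most `length` nodes starting at `root`.

    adj maps each node to the set of its neighbors (unweighted, undirected).
    """
    rng = rng or random.Random()
    walk = [root]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])
        if not nbrs:               # dead end: stop early
            break
        if len(walk) == 1:         # first step has no previous node
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = [search_bias(prev, x, adj, p, q) for x in nbrs]
        # random.choices normalizes the weights, playing the role of the
        # partition function in the transition probability.
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk
```

Repeating `biased_walk` several times from every root node yields the set of random walks used for knowledge preparation; a small `q` pushes walks outward, while a small `p` keeps them near the root.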

4 Knowledge Transfer in Cross-domain Network Representations

CDNR enables the cross-domain random walk-based network representations and assumes both networks across domains follow the power-law distribution. Representations in CDNR work under the Skip-gram framework and are optimized by maximum likelihood over biased random walks. The contributions of CDNR are realized in this section by CD2L-RandomWalk with its two components: CD2L-NodeBalance and CD2L-KnowlTransfer.

4.1 Cross-domain Two-layer Node-scale Balance and Link Prediction

By transferring knowledge from an external source domain, CDNR deals with scenarios in which the training samples in the target domain are insufficient to produce a good network representation. Such knowledge transfer is a transfer learning task pan2010survey and raises two questions: 1) Link prediction: how to construct paths between two networks across domains for CD2L-RandomWalk, and 2) CD2L-NodeBalance: how to solve the problem of unbalanced node scales.

The imbalance between and is reflected on the nodes and also on the connections , where and refer to the node scales, and and refer to the average node degrees (the average degree is the mean of the degrees of all nodes in the network). In this case, CD2L-NodeBalance tries to reform into a smaller size based on the network structures of . For the purpose of discovering sub-graph patterns wang2017incremental , a concept of super node guo2014super is employed and we define the formation for CDNR.

Definition 4 (Super Node in Source Domain) A super node is a sub-graph of the original source network. Denoting the super-node set , a super node consists of a group of nodes and the edges connecting to or from . The nodes that are clustered into a have close node degrees.
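To make the idea of grouping source nodes with close degrees concrete, here is a deliberately simplified sketch. It is not the optimization of Eq. (4): it assumes that each source node is assigned to the nearest degree value observed in the target network, so that every resulting group can be paired with target nodes of that degree. The function name `cluster_by_degree` and the nearest-value rule are illustrative assumptions.

```python
from collections import defaultdict


def cluster_by_degree(source_deg, target_deg):
    """Group source nodes into super nodes keyed by the closest target degree.

    source_deg, target_deg: {node: degree} for the source and target networks.
    Returns {target_degree_value: set_of_source_nodes}.
    Assumption: nearest-degree assignment (ties go to the smaller degree)
    replaces the likelihood-based balancing described in the paper.
    """
    target_values = sorted(set(target_deg.values()))
    super_nodes = defaultdict(set)
    for node, d in source_deg.items():
        nearest = min(target_values, key=lambda t: (abs(t - d), t))
        super_nodes[nearest].add(node)
    return dict(super_nodes)
```

With this grouping, a large source network collapses to at most as many super nodes as there are distinct degree values in the target network, which is the scale reduction that CD2L-NodeBalance aims for.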

To cluster a set of nodes in the large-scale network, a super-node learning based on the nodes in the target domain is as follows:


where is a predicted link between and across domains, and is the weight on which indicates the similarity between and and how much knowledge should be transferred from the source domain to the target domain in CD2L-RandomWalk.

CD2L-NodeBalance attempts to pair each node with at least one super node in a minimum super-node scale and a maximum likelihood between and according to Eq. (4). For each pair of , we firstly initialize a link and a weight following,


where denotes the degree of , denotes the degree of , and is initialized on nodes in the same degree.

To optimize in Eq. (4), we analyze the degree ranges over and in and respectively and reorganize including merging and dividing super nodes based on the following three cases.

Denoting the range scales and , there are three possible cases of CD2L-NodeBalance as follows and as shown in Figure 2. Degree sets and are always ranked in a decreasing order. A finds the corresponding that are in the same position in and , denoted as .

Figure 2: An illustration of the three cases in CD2L-NodeBalance and their interconversions: 1) Case 2 to Case 1, 2) Case 2 to Case 2, 3) Case 2 to Case 3, 4) Case 3 to Case 2 and 5) Case 3 to Case 3.

Case 1: If , only one links to . In this case, CD2L-NodeBalance is completed in the initialization stage with and .

Case 2: If , more than one links to . at the current stage is going to be optimized in Eq. (4). If turns to 0, the edge is deleted and the is merged into another super node that linked with and gets the smallest weight.

Case 3: If , there is at least one not linked to any . We add a group of empty super nodes in a number of and evenly insert them into . To fill up the , a few nodes in next to are removed and added to . then is initialized by Eq. (3).

In Case 2 and Case 3, is optimized by maximizing the likelihood between and . Starting from each , a vector weights for each pair of , where , and . If there is a link between , ; otherwise 0.


where is a vector in size of with the value of 0 or 1, which is based on in Cases 1-3. Let in which and are the power-law slope ratios of and respectively. controls the range of the likelihood over CD2L-NodeBalance, where is a parameter for and is a parameter for . The optimized CD2L-NodeBalance results suggest the predicted links where in Case 2 and Case 3.


Algorithm 1 The CD2L-NodeBalance algorithm.
0:     in the target domain and in the source domain.
0:     Cluster by node degrees. , Node degree scales in and . Apply Eq. (3).
1:  while  do
2:      Apply Eq. (4)
3:      Apply Eq. (2)
4:  end while 
5:  return   and

4.2 Cross-domain Two-layer Knowledge Transfer and Target Domain Edge Evolvement


Algorithm 2 The CD2L-KnowlTransfer algorithm.
0:     Random walks of generated in the bottom-layer of CD2L-RandomWalk; in the target domain; and and from Algorithm 1.
1:  for  in  do
2:      Apply Eq. (6).
3:      Construct shortest paths between and .
4:      Update weight on by Eq. (7).
5:      Evolve new edge if and .
6:  end for
7:  return  

CD2L-KnowlTransfer transfers the knowledge saved in weights through the predicted links . The knowledge includes three parts of weights as shown in Figure 3: a weight on the super edge that reflects the knowledge learning from the random walks in the source domain, two weights on the predicted links, and the original weight on (in this paper is 1 or 0). The CD2L-KnowlTransfer follows:


where denotes the super graph.

Definition 5 (Super Graph in Source Domain) A super graph reformed from is formed by super nodes , super edges and the super weights on , where . If a random walk belonging to goes through and , there will be an .


where is the distance between nodes and in a random walk, and calculates every random walk going over .
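Definition 5 can be sketched as follows. The code creates a super edge whenever a single random walk visits nodes from two different super nodes. The weight formula did not survive in this copy, so the sketch stores a plain count of such walks as a placeholder; the paper's weight additionally depends on the distance between the nodes inside the walk. `build_super_edges` and `node_to_super` are hypothetical names.

```python
from collections import Counter
from itertools import combinations


def build_super_edges(walks, node_to_super):
    """Return {(super_a, super_b): walk_count} with super_a < super_b.

    walks: iterable of node sequences sampled in the source network.
    node_to_super: maps a source node to the id of its super node.
    Assumption: the walk count is a stand-in for the distance-based
    super-edge weight defined in the paper.
    """
    counts = Counter()
    for walk in walks:
        visited = {node_to_super[n] for n in walk if n in node_to_super}
        for a, b in combinations(sorted(visited), 2):
            counts[(a, b)] += 1
    return dict(counts)
```

A walk that stays inside one super node contributes nothing, so only walks that bridge groups of different degree ranges shape the super graph.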

Figure 3: An illustration of weight contributions on a target domain network edge in CD2L-KnowlTransfer.

Given the three parts of weights that contribute to in CD2L-KnowlTransfer, the weight on in the top layer of CD2L-RandomWalk is denoted as:


where is the original weight that reflects an edge between and or no edge, is for normalization, is the shortest path between and over , is the length of , denotes an edge that consists in , and is the weight on .

In CD2L-KnowlTransfer above, is enriched in network structures by putting extra weights on the original edges and also evolves possible edges.
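The knowledge transfer step relies on shortest paths over the weighted super graph. A minimal sketch of that building block is given below using Dijkstra's algorithm, under the assumption that super-edge weights are non-negative; the paper does not state which shortest-path routine it uses. The adjacency format and the name `shortest_path` are illustrative.

```python
import heapq


def shortest_path(weighted_adj, src, dst):
    """Dijkstra's algorithm on a graph given as {u: {v: weight}}.

    Returns (path, cost); path is None and cost is inf if dst is unreachable.
    Assumes non-negative weights and mutually comparable node ids.
    """
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v, w in weighted_adj.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None, float("inf")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]
```

The returned path gives the sequence of super nodes, and hence the super edges, whose weights would be combined when updating the weight of a target-domain edge.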

4.3 Top-layer Random Walk and Network Representations

1:   Random walks generated from in the Bottom-layer Random Walk.
2:  , Apply Algorithm 1.
3:   Apply Algorithm 2.
1:  for  in  do
2:      Search neighborhood of with .
3:      Apply Skip-gram to optimize.
4:  end for
5:  return   A latent feature space of by .
Algorithm 3 The CDNR algorithm.

CDNR represents in the top layer of CD2L-RandomWalk after CD2L-NodeBalance and CD2L-KnowlTransfer. CDNR learns the latent feature space by in the Skip-gram framework.

Given a node in the target domain with the window size , we obtain a cross-domain Skip-gram for by maximizing the following log-likelihood function of in observing a neighborhood of ,


where is learned on that , while is learned on .
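The neighborhood that Skip-gram observes for a node is determined by the window size. As a small illustration (a sketch of the standard Skip-gram pairing, not code from the paper), the function below turns one random walk into (center, context) training pairs; `skipgram_pairs` is a hypothetical name.

```python
def skipgram_pairs(walk, window):
    """Return (center, context) pairs for every node within `window` steps."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs
```

Maximizing the log-likelihood then amounts to raising the predicted probability of each context node given its center node over all such pairs.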

In summary, Algorithm 3 of CDNR is formed by CD2L-RandomWalk and Top-layer Feature Learning. The main advantage of CDNR is that when the network representation is poor because it lacks structures, the CD2L-RandomWalk enables knowledge transfer from external domains and CDNR doesn’t need to rebuild a network representation model. CDNR offers an efficient cross-domain learning with a relatively low computational cost of on CD2L-NodeBalance, on CD2L-KnowlTransfer and on network representation in line with Node2Vec grover2016node2vec .

5 Experiments

This section evaluates the effectiveness of the CDNR compared to the baseline algorithms of network representations in both single-label classifications (Section 5.2) and multi-label classifications (Section 5.3).

5.1 Baseline Algorithms

This experiment evaluates the performance of the unsupervised CDNR on the target networks. The representation outputs are applied to a standard supervised learning task, i.e., linear SVM classification. We choose a simple classifier because we want to put less emphasis on the classifier when evaluating network representation performance. The baseline algorithms are chosen from previous domain-specific network representations and a deep inductive graph representation, as follows.

  1. DeepWalk (Perozzi et al. 2014) perozzi2014deepwalk is the first random walk-based network representation algorithm. By choosing DeepWalk, we exclude the matrix factorization approaches, which have already been demonstrated to be inferior to DeepWalk.

  2. LINE (Tang et al. 2015) tang2015line learns latent feature representations from large-scale information networks by an edge-sampling strategy in two separate phases of first- and second-order proximities. We excluded a recent Graph Factorization algorithm ahmed2013distributed because LINE demonstrated better performance in the previous experiment.

  3. Node2Vec (Grover et al. 2016) grover2016node2vec learns continuous feature representations of nodes using a biased random walk procedure to capture the diversity of connectivity patterns observed in networks with the biased parameter which is controlled by parameters of and .

  4. Struc2Vec (Ribeiro et al. 2017) ribeiro2017struc2vec learns node representations from structural identity by constructing a hierarchical graph to encode structural similarities and generating a structural context for nodes.

  5. DeepGL (Rossi et al. 2018) rossi2018deep learns interpretable inductive graph representations by relational functions for each representing feature and achieves inductive transfer learning across networks. It inputs 3-dimensional base features to a CNN and outputs the representation in dimensions where depends on learning.

5.2 Experiment on Single-label Dataset

5.2.1 Single-label Datasets

Two academic citation networks are selected as the datasets. Both of them are used for the multi-class classification problem galar2011overview . Nodes are denoted as papers in these networks.

Domain   Dataset   Num. of Nodes   Num. of Edges   Num. of Categories
Source   DBLP      60,744          52,890          4
Target   M10       10,310          77,218          10
Table 2: Single-label classification dataset statistics.
Figure 4: Power-law distribution of the single-label classification datasets and their random walks on the networks. Panels: (a) dblp, (b) dblp RW, (c) M10, (d) M10 RW. The X-axis is denoted as of the network and the Y-axis is denoted as . Each power-law distribution pair of the network and its random walks should follow the same pattern so that random walks over the network can conduct skip-gram based network representations. For example, (a) and (b) are formed as a power-law distribution pair following the same pattern by which random walks on the dblp network are guaranteed a network representation on dblp.
  1. DBLP dataset (V4 version is used) (source network) consists of bibliographic data in computer science. Each paper may cite or be cited by other papers, naturally forming a citation network. The network in this dataset abstracts a list of conferences from four research areas, i.e., database, data mining, artificial intelligence and computer vision.

  2. CiteSeer-M10 dataset (target network) is a subset of CiteSeerX data which consists of scientific publications from 10 distinct research areas, i.e., agriculture, archaeology, biology, computer science, financial economics, industrial engineering, material science, petroleum chemistry, physics and social science.

5.2.2 Experiment Setup

For the evaluations, we randomly partition the dataset in the target domain into two non-overlapping sets for training and testing by nine groups of training percentages, from 10% to 90% in steps of 10% (the columns of Table 3). We repeat the above steps 10 times and thus obtain 10 copies of the training data and testing data. The reported experimental results are the average of the 10 runs and their variance.
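The evaluation protocol above can be sketched in a few lines. This is an illustrative reconstruction, assuming the nine training percentages are the 10% to 90% columns of Table 3 and that each repetition draws a fresh random partition; the helper names and the fixed seed are hypothetical.

```python
import random


def random_split(nodes, train_fraction, rng):
    """Shuffle the nodes and cut them into non-overlapping train/test lists."""
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def repeated_splits(nodes, fractions, repeats=10, seed=0):
    """Return {fraction: [(train, test), ...]} with `repeats` splits each."""
    rng = random.Random(seed)
    return {
        f: [random_split(nodes, f, rng) for _ in range(repeats)]
        for f in fractions
    }
```

Calling `repeated_splits(node_ids, [i / 10 for i in range(1, 10)])` produces the 9 x 10 partitions; a classifier would be trained and scored on each pair, and the mean and variance of the scores reported per fraction.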

The parameters of CDNR are set in line with typical values used for DeepWalk perozzi2014deepwalk , LINE tang2015line , Node2Vec grover2016node2vec and Struc2Vec ribeiro2017struc2vec . For networks in both the source domain and the target domain, let the dimensions of feature representation be , the walk length be , the number of walks of every source node be , the window size be , , and the search bias be with and . Let the learning rate start from 0.025 as in tang2015line and the convergence track on 0.1 in our experiment. For Struc2Vec, let OPT1 (reducing the length of degree sequences), OPT2 (reducing the number of pairwise similarity calculations) and OPT3 (reducing the number of layers) all be set to True, and the maximum number of layers be 6. The parameters in CD2L-NodeBalance are set as and . In these settings, the total number of random walks over an input network is and the size of the random walks is . For DeepGL rossi2018deep , the operator is chosen from {mean, sum, maximum, Hadamard, Weight , RBF} as the one that gets the best results in base feature learning; is set to 1; the feature similarity threshold is set to 0.01; the maximum layer depth is set to 10; and the convergence threshold for feature diffusion is set to 0.01.

5.2.3 Single-label Classification

Figure 5: Network representation on dblp and M10 in a 2-dimensional latent feature space. Panels: (a) Node2Vec on dblp, (b) CDNR on M10, (c) PCA on M10, (d) LLE on M10, (e) Laplacian on M10.

We use Macro-F1 and Micro-F1 yang1999re to evaluate classification performances and the results are shown in Table 3. The F1 scores are designed to evaluate the effectiveness of category assignments hsu2002comparison .


We use the indicators of true positive (tp), false positive (fp) and false negative (fn) to measure the standard recall () and precision (). In , let and . The Micro_F1 score computes the global binary decisions, where is the number of total test nodes, and is the number of categories of binary labels. In , let and . The Macro_F1 score computes the binary decisions on individual categories and then averages the categories.
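The two scores can be computed directly from the tp, fp and fn counts. The sketch below uses the standard definitions for the single-label case, where each test node has exactly one true and one predicted category: Micro-F1 pools the counts over all categories before computing F1, while Macro-F1 computes F1 per category and averages. The function name `f1_scores` is hypothetical.

```python
def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted category gains a false positive
            fn[t] += 1   # true category gains a false negative

    def f1(tp_, fp_, fn_):
        # F1 = 2PR / (P + R), rewritten in terms of raw counts
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro
```

Because Macro-F1 weights every category equally, it is more sensitive than Micro-F1 to poor performance on small categories, which helps explain why the two halves of Table 3 can rank algorithms differently.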

Representation Analysis. Figure 5 (a) illustrates the feature space of dblp learned by the CDNR bottom-layer random walk (Node2Vec) and Figure 5 (b) illustrates the feature space of M10 learned by CDNR. These two illustrations show almost the same distribution and obtain good mappings in a low dimension compared to the PCA (Figure 5 (c)), LLE (Figure 5 (d)) and Laplacian (Figure 5 (e)) based network representations.

Algorithm 10% 20% 30% 40% 50% 60% 70% 80% 90%
Micro-F1 DeepWalk 0.1758 0.1833 0.1897 0.2049 0.2051 0.2216 0.2236 0.2420 0.2431
0.0086 0.0100 0.0122 0.0126 0.0128 0.0111 0.0170 0.0133 0.0220
LINE 0.2338 0.2362 0.2623 0.2821 0.3269 0.3244 0.3561 0.3508 0.4128
0.0102 0.0170 0.0110 0.0141 0.0150 0.0087 0.0193 0.0184 0.0486
Node2Vec 0.3342 0.4166 0.4714 0.5213 0.5550 0.5843 0.6216 0.6353 0.6535
0.0099 0.0110 0.0153 0.0127 0.0176 0.0092 0.0215 0.0115 0.0324
Struc2Vec 0.1742 0.1673 0.1701 0.1622 0.1695 0.1669 0.1685 0.1626 0.1784
0.0186 0.0194 0.0171 0.0215 0.0145 0.0215 0.0120 0.0188 0.0281
DeepGL 0.6557 0.6465 0.6739 0.6600 0.6787 0.6724 0.6871 0.6786 0.6911
0.0187 0.0176 0.0161 0.0197 0.0164 0.0139 0.0184 0.0160 0.0403
CDNR 0.7507 0.7728 0.8052 0.8245 0.8363 0.8526 0.8587 0.8772 0.8720
dblp2M10 0.0143 0.0114 0.0154 0.0074 0.0051 0.0116 0.0128 0.0173 0.0179
Macro-F1 DeepWalk 0.2523 0.2667 0.2768 0.2945 0.2935 0.3077 0.3101 0.3294 0.3359
0.0117 0.0051 0.0072 0.0120 0.0081 0.0086 0.0158 0.0123 0.0220
LINE 0.3160 0.2984 0.3421 0.3596 0.4070 0.4275 0.4498 0.4277 0.4773
0.0113 0.0127 0.0144 0.0249 0.0382 0.0548 0.0383 0.0302 0.0486
Node2Vec 0.4326 0.4748 0.5338 0.5900 0.6092 0.6388 0.6866 0.6981 0.6568
0.0147 0.0156 0.0153 0.0153 0.0290 0.0314 0.0202 0.0572 0.0261
Struc2Vec 0.2718 0.2129 0.2220 0.1889 0.2067 0.1871 0.1803 0.1723 0.1749
0.0126 0.0092 0.0143 0.0147 0.0058 0.0150 0.0098 0.0271 0.0165
DeepGL 0.6704 0.6003 0.6154 0.5732 0.5848 0.5607 0.5918 0.5910 0.5914
0.0151 0.0222 0.0247 0.0305 0.0309 0.0264 0.0324 0.0388 0.0407
CDNR (dblp2M10) 0.7558 0.6939 0.7269 0.7174 0.7301 0.7540 0.7679 0.7722 0.7745
0.0138 0.0168 0.0174 0.0149 0.0256 0.0238 0.0325 0.0609 0.0698
Table 3: CDNR single-label classification results on the target domain of M10.

Effectiveness of search priority in random walks. In Table 3, DeepWalk and Struc2Vec perform worse than LINE, Node2Vec and our CDNR, which can be explained by their inability to reuse samples, something that is easily achieved with the random walk. The outstanding performance of Node2Vec among the baseline algorithms indicates that its exploration strategy is much better than the uniform random walks learned by DeepWalk and LINE. The search-bias parameter adds flexibility in exploring local neighborhoods before the global network. The poor performance of DeepWalk and LINE mainly occurs because the network structure is rather sparse, features noise, and contains limited information. CDNR performs best on the M10 network, as dblp is also a citation network that naturally shares similar network patterns with M10. Such patterns are captured by CDNR and transferred to M10. On average, CDNR also shows smaller variances in performance on the dblp2M10 learning task.
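To make the role of the search bias concrete, the sketch below computes the second-order transition probabilities used by Node2Vec on an unweighted graph, with return parameter p and in-out parameter q as defined in the Node2Vec paper. It illustrates the baseline's walk only and is not the CDNR top layer; the toy graph and the function names are assumptions made for this example.

```python
def search_bias(prev, nxt, neighbors, p=1.0, q=1.0):
    """Unnormalized weight for stepping to `nxt` after arriving from `prev`."""
    if nxt == prev:
        return 1.0 / p   # step back to the previous node
    if nxt in neighbors[prev]:
        return 1.0       # stay in the local neighborhood of prev
    return 1.0 / q       # move outward, away from prev


def transition_probs(prev, cur, neighbors, p=1.0, q=1.0):
    """Normalized probabilities over the neighbors of the current node."""
    weights = {x: search_bias(prev, x, neighbors, p, q) for x in neighbors[cur]}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


# Toy undirected graph (hypothetical): the walk arrived at "v" from "t".
neighbors = {
    "t": {"v", "x1"},
    "v": {"t", "x1", "x2"},
    "x1": {"t", "v"},
    "x2": {"v"},
}
probs = transition_probs("t", "v", neighbors, p=2.0, q=0.5)
uniform = transition_probs("t", "v", neighbors, p=1.0, q=1.0)
```

With p = 2 and q = 0.5 the walk favors the outward node x2 over returning to t, whereas p = q = 1 reduces to the uniform random walk over the neighbors of v.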

Importance of information from source domain. Table 3 shows that CDNR, which uses topological information from the source domain to learn the network representation in the target domain, outperforms the domain-specific baseline algorithms. When the top layer works on the basis of CD2L-RandomWalk, the information in the source network is transferred to the target network by adjusting the weights on the edges of the target network. This procedure achieves better performance and shows the significance of transferring topological information from external domains.

5.3 Experiment on Multi-label Datasets

5.3.1 Datasets

Dataset          Network      Num. of Nodes   Num. of Edges   Ave. Degree   Num. of Categories   Labels
Blog3            Social       10,312          333,983         64.776        39                   Interests
Facebook         Social       4,039           88,234          43.691        10                   Groups
PPI              Biological   3,890           37,845          19.609        50                   States
arXivCit-HepPh   Citation     34,546          421,578         24.407        11                   Years
arXivCit-HepTh   Citation     27,777          352,807         25.409        11                   Years
Table 4: Multi-label classification dataset statistics.
Figure 6: Power-law distributions of multi-label classification datasets and their random walks (RW). Panels: (a) Blog3; (b) Blog3 RW; (c) Facebook; (d) Facebook RW; (e) PPI; (f) arXivHepPh; (g) arXivHepPh RW; (h) arXivHepTh; (i) arXivHepTh RW; (j) PPI RW.

We select five real-world large-scale networks of different kinds as the experimental datasets, consisting of two online social networks (Blog3, Facebook), two citation networks (arXivCit-HepPh, arXivCit-HepTh) and one biological network (PPI). All of them are used for the multi-class multi-label classification problem. In the online social networks, nodes represent users and edges denote the relationships between users. In the citation networks, nodes denote papers and edges describe the citations. In the biological network, nodes denote genes and edges represent the relationships between the genes.

  1. Blog3 (BlogCatalog3) dataset⁵ is a social blog directory which manages bloggers and their blogs. Both the contact network and selected group membership information are included. The network has 10,312 nodes, 333,983 undirected edges and 39 different labels. Nodes are classified according to the interests of bloggers.

  2. Facebook dataset⁶ consists of circles (i.e., friends lists) from Facebook. This dataset contains user profiles as node features, and circles as edge features and ego networks. The network has 4,039 nodes, 88,234 undirected edges and 10 different labels representing groups of users.

  3. PPI (Protein-Protein Interactions) dataset⁷ is a subgraph of the PPI network for Homo Sapiens, whose labels are obtained from hallmark gene sets and represent biological states. The network has 3,890 nodes, 76,584 undirected edges and 50 different labels.

  4. arXivCit-HepPh (arXiv High-energy Physics Citation Network) dataset⁸ and arXivCit-HepTh (arXiv High-energy Physics Theory Citation Network) dataset⁹ are extracted from the e-print arXiv. arXivCit-HepPh covers all the citations within a dataset of 34,546 papers (regarded as nodes) with 421,578 directed edges. arXivCit-HepTh covers all the citations within a dataset of 27,777 papers (regarded as nodes) with 352,807 directed edges. If a paper $i$ cites paper $j$, the graph contains a directed edge from $i$ to $j$. The data consist of papers from the period January 1993 to April 2003, categorized by year.

The networks chosen in the experiment follow the power-law distribution adamic2000power , as do the random walks on the networks perozzi2014deepwalk , as shown in Figure 6.
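The two quantities compared in Figure 6 can be collected as in the sketch below: a degree histogram of the network, and the number of times each node is visited by short random walks. Plotting either count against its frequency on log-log axes would show whether it is heavy-tailed. This is a minimal sketch that assumes uniform random walks on an undirected edge list; the toy graph, the walk settings and the function names are illustrative and are not the settings used in the experiment.

```python
import random
from collections import Counter, defaultdict


def adjacency(edges):
    """Build an undirected adjacency list from (u, v) pairs."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def degree_histogram(adj):
    """Map each degree k to the number of nodes that have degree k."""
    return Counter(len(nbrs) for nbrs in adj.values())


def walk_visit_counts(adj, walks_per_node, walk_length, seed=0):
    """Count how often each node appears in uniform random walks."""
    rng = random.Random(seed)
    visits = Counter()
    for start in sorted(adj):
        for _ in range(walks_per_node):
            node = start
            for _ in range(walk_length):
                visits[node] += 1
                node = rng.choice(adj[node])
    return visits


# Toy graph (hypothetical): one hub connected to four nodes, plus one extra edge.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]
adj = adjacency(edges)
hist = degree_histogram(adj)
visits = walk_visit_counts(adj, walks_per_node=10, walk_length=20)
```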

5.3.2 Experiment Setup

Source Domain Target Domain
Blog3 PPI
arXivCit-HepTh PPI
arXivCit-HepPh PPI
Facebook PPI
Blog3 Facebook
Table 5: Networks selected as the source domain and target domain for CDNR by distance.

The network statistics for this experiment are summarized in Table 4. Node degree reflects the connection capability of the node. Each network is assigned the role of source domain or target domain by distance, and the resulting selections are shown in Table 5. The experiment setup for the multi-label classification evaluation is the same as the setup in the single-label dataset experiment.
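As a quick check on the degree statistics, the sketch below recomputes the average degree from the node and edge counts, assuming the usual definition 2|E|/|V| in which every edge contributes to the degree of both endpoints. Under that assumption the Table 4 values for Blog3, Facebook and arXivCit-HepPh are reproduced to three decimals; the remaining rows are not checked here.

```python
def average_degree(num_nodes, num_edges):
    """Average degree when each edge adds one to the degree of both endpoints."""
    return 2 * num_edges / num_nodes


# (number of nodes, number of edges) as listed in Table 4.
table4 = {
    "Blog3": (10312, 333983),
    "Facebook": (4039, 88234),
    "arXivCit-HepPh": (34546, 421578),
}
degrees = {name: round(average_degree(n, m), 3) for name, (n, m) in table4.items()}
for name, value in degrees.items():
    print(name, value)
```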

5.3.3 Multi-label Classification

In the multi-label classification setting, every node is assigned one or more labels from a finite set $\mathcal{L}$. In the training phase of the CDNR node feature representations, we observe a fraction of the nodes and all their labels, and predict the labels for the remaining nodes. The multi-label classification in our experiment inputs the network representations to a one-against-all linear SVM classifier hsu2002comparison . We use the Macro-F1 and Micro-F1 scores to compare performance yang1999re in Tables 6-9.
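The evaluation protocol above can be sketched in two steps: split the nodes into an observed fraction and a held-out remainder, then decompose the multi-label problem into one binary problem per category, each of which would be handed to a linear SVM. The snippet below is a minimal sketch of the split and the one-against-all targets only; the classifier itself is omitted, and the toy labels, the seed and the function names are assumptions made for illustration.

```python
import random


def split_nodes(nodes, train_fraction, seed=0):
    """Shuffle the nodes and keep the first `train_fraction` for training."""
    rng = random.Random(seed)
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def one_against_all_targets(node_labels, categories):
    """For each category, map every node to +1 if it has the label, else -1."""
    return {
        c: {node: (1 if c in labels else -1) for node, labels in node_labels.items()}
        for c in categories
    }


# Toy multi-label assignment (hypothetical): ten nodes, three categories.
node_labels = {
    0: {"x"}, 1: {"x", "y"}, 2: {"y"}, 3: {"z"}, 4: {"x", "z"},
    5: {"y"}, 6: {"z"}, 7: {"x"}, 8: {"y", "z"}, 9: {"x"},
}
train, test = split_nodes(node_labels, train_fraction=0.3)
targets = one_against_all_targets(
    {n: node_labels[n] for n in train}, ["x", "y", "z"]
)
```

Repeating the split for training fractions from 10% to 90% gives the columns reported in the result tables.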

Algorithm 10% 20% 30% 40% 50% 60% 70% 80% 90%
DeepWalk 0.2849 0.2854 0.2845 0.2803 0.2725 0.2736 0.2629 0.2778 0.2621
0.0181 0.0116 0.0193 0.0170 0.0168 0.0200 0.0241 0.0215 0.0344
LINE 0.2900 0.2772 0.2807 0.2715 0.2702 0.2649 0.2710 0.2494 0.2398
0.0062 0.0077 0.0083 0.0104 0.0113 0.0166 0.0163 0.0251 0.0195
Node2Vec 0.3073 0.2955 0.3024 0.3028 0.3028 0.2995 0.3021 0.2967 0.3005
0.0171 0.0104 0.0139 0.0120 0.0102 0.0186 0.0288 0.0197 0.0283
Struc2Vec 0.2693 0.2713 0.2696 0.2515 0.2603 0.2499 0.2493 0.2419 0.2338
0.0228 0.0187 0.0188 0.0187 0.0133 0.0212 0.0148 0.0156 0.0287
DeepGL 0.3055 0.3063 0.3028 0.2947 0.2987 0.2975 0.2911 0.2890 0.2764
0.0062 0.0083 0.0083 0.0054 0.0063 0.0128 0.0128 0.0180 0.0178
CDNR (Blog3) 0.3386 0.3390 0.3423 0.3420 0.3404 0.3414 0.3350 0.3371 0.3312
0.0062 0.0086 0.0061 0.0072 0.0079 0.0073 0.0125 0.0199 0.0238
CDNR (arXivCit-HepPh) 0.3412 0.3425 0.3410 0.3431 0.3460 0.3474 0.3431 0.3429 0.3330
0.0041 0.0075 0.0052 0.0057 0.0069 0.0112 0.0103 0.0107 0.0192
CDNR (arXivCit-HepTh) 0.3420 0.3426 0.3434 0.3462 0.3441 0.3553 0.3450 0.3457 0.3521
0.0036 0.0057 0.0044 0.0049 0.0042 0.0101 0.0106 0.0150 0.0260
CDNR (Facebook) 0.3415 0.3442 0.3454 0.3448 0.3468 0.3410 0.3447 0.3443 0.3444
0.0053 0.0035 0.0035 0.0066 0.0065 0.0102 0.0119 0.0151 0.0189
Table 6: CDNR multi-label classification results of Micro-F1 on the target domain network of PPI.
Algorithm 10% 20% 30% 40% 50% 60% 70% 80% 90%
DeepWalk 0.3416 0.3378 0.3364 0.3406 0.3306 0.3336 0.2949 0.2825 0.2041
0.0140 0.0138 0.0208 0.0171 0.0159 0.0241 0.0288 0.0185 0.0386
LINE 0.3058 0.3003 0.3008 0.2940 0.2868 0.2826 0.2733 0.2462 0.1822
0.0094 0.0113 0.0069 0.0120 0.0138 0.0176 0.0173 0.0262 0.0198
Node2Vec 0.3490 0.3442 0.3510 0.3500 0.3432 0.3414 0.3274 0.3006 0.2310
0.0193 0.0141 0.0205 0.0126 0.0140 0.0201 0.0240 0.0248 0.0385
Struc2Vec 0.2892 0.2926 0.3019 0.2784 0.2851 0.2626 0.2589 0.2399 0.1712
0.0197 0.0232 0.0227 0.0267 0.0152 0.0202 0.0177 0.0287 0.0262
DeepGL 0.3213 0.3290 0.3235 0.3155 0.3136 0.3086 0.2970 0.2783 0.2115
0.0065 0.0072 0.0107 0.0067 0.0095 0.0117 0.0147 0.0220 0.0132
CDNR (Blog3) 0.3519 0.3551 0.3539 0.3514 0.3469 0.3431 0.3282 0.3063 0.2389
0.0108 0.0105 0.0073 0.0060 0.0097 0.0060 0.0154 0.0223 0.0276
CDNR (arXivCit-HepPh) 0.3532 0.3582 0.3536 0.3509 0.3531 0.3456 0.3360 0.3130 0.2512
0.0106 0.0099 0.0085 0.0057 0.0098 0.0085 0.0128 0.0144 0.0270
CDNR (arXivCit-HepTh) 0.3570 0.3575 0.3568 0.3565 0.3523 0.3556 0.3368 0.3234 0.2682
0.0079 0.0091 0.0044 0.0091 0.0090 0.0137 0.0150 0.0249 0.0330
CDNR (Facebook) 0.3576 0.3595 0.3574 0.3553 0.3573 0.3432 0.3423 0.3164 0.2578
0.0086 0.0065 0.0052 0.0068 0.0080 0.0079 0.0145 0.0164 0.0217
Table 7: CDNR multi-label classification results of Macro-F1 on the target domain network of PPI.
Algorithm 10% 20% 30% 40% 50% 60% 70% 80% 90%
DeepWalk 0.8078 0.8727 0.8933 0.9050 0.9153 0.9198 0.9307 0.9301 0.9334
0.0449 0.0177 0.0062 0.0059 0.0060 0.0061 0.0039 0.0103 0.0175
LINE 0.4627 0.4654 0.4719 0.4739 0.4765 0.4761 0.4760 0.4787 0.4755
0.0026 0.0104 0.0026 0.0035 0.0035 0.0033 0.0067 0.0066 0.0075
Node2Vec 0.9352 0.9401 0.9398 0.9419 0.9442 0.9454 0.9468 0.9466 0.9502
0.0072 0.0032 0.0051 0.0047 0.0057 0.0063 0.0092 0.0079 0.0098
Struc2Vec 0.4152 0.4521 0.4716 0.4994 0.5161 0.5381 0.5461 0.5639 0.5530
0.0237 0.0144 0.0061 0.0059 0.0078 0.0096 0.0115 0.0241 0.0175
DeepGL 0.9535 0.9483 0.9531 0.9515 0.9552 0.9546 0.9555 0.9612 0.9640
0.0161 0.0114 0.0059 0.0099 0.0087 0.0087 0.0039 0.0089 0.0072
CDNR (Blog3) 0.9584 0.9550 0.9561 0.9565 0.9568 0.9617 0.9606 0.9627 0.9623
0.0038 0.0036 0.0049 0.0044 0.0032 0.0050 0.0042 0.0067 0.0085
Table 9: CDNR multi-label classification results of Macro-F1 on the target domain network of Facebook.
Algorithm 10% 20% 30% 40% 50% 60% 70% 80% 90%
DeepWalk 0.7655 0.7915 0.7858 0.8052 0.7902 0.8138 0.8213 0.7678 0.7822
0.0185 0.0242 0.0331 0.0308 0.0306 0.0327 0.0504 0.0317 0.0378
LINE 0.5063 0.5040 0.5083 0.5129 0.5091 0.5040 0.5020 0.4981 0.4961
0.0053 0.0189 0.0093 0.0061 0.0092 0.0077 0.0137 0.0117 0.0109
Node2Vec 0.8310 0.8331 0.8206 0.8373 0.8343 0.8214 0.8192 0.8018 0.8104
0.0256 0.0226 0.0262 0.0359 0.0354 0.0479 0.0487 0.0277 0.0498
Struc2Vec 0.3701 0.3937 0.3926 0.4160 0.4377 0.4525 0.4532 0.4755 0.4583
0.0156 0.0157 0.0174 0.0155 0.0235 0.0131 0.0144 0.0260 0.0347
DeepGL 0.8810 0.8660 0.8724 0.8748 0.8794 0.8856 0.8578 0.8732 0.8757
0.0330 0.0328 0.0381 0.0343 0.0395 0.0313 0.0350 0.0514 0.0537
CDNR (Blog3) 0.8749 0.8831 0.8866 0.8766 0.8890 0.8876 0.8910 0.8443 0.8415
0.0301 0.0135 0.0304 0.0360 0.0291 0.0294 0.0334 0.0468 0.0482
Table 8: CDNR multi-label classification results of Micro-F1 on the target domain network of Facebook.

Experimental results from the algorithmic perspective. A general observation drawn from the results is that the feature representations learned from other networks improve or maintain performance compared to the domain-specific network representation baseline algorithms. CDNR outperforms DeepWalk, LINE, Node2Vec, Struc2Vec and DeepGL in all datasets with a gain of 19.29%, 49.57%, 15.66%, 58.83% and 10.06%, respectively. CDNR outperforms DeepWalk, LINE, Node2Vec and Struc2Vec on the PPI dataset and the Facebook dataset in 100% of the experiments, and outperforms DeepGL on the PPI dataset in 100% and on the Facebook dataset in 88.89% of the experiments. The losses of CDNR to DeepGL at the training percentages of {80%, 90%} might be caused by the classifier and the training sample selection, and the NN-based DeepGL shows more robustness than the other algorithms.

Experimental results from the dataset perspective. The general results on the PPI dataset (Tables 6 and 7) reflect the difficulty of cross-domain learning. Considering the domain similarities, a cross-domain adaptation from either the social networks or the citation networks to the biological network, as in our experiment, would not be recommended in transfer learning. However, CDNR is capable of capturing useful structural information from network topologies and removing noise from the source domain networks in an unsupervised feature-learning environment, so CDNR on PPI still shows a slight improvement and almost retains its representation performance. Therefore, cross-domain network knowledge transfer learning works in unsupervised network representations. CDNR is less influenced by domain selection when the transferable knowledge is mainly contributed by network topologies.

Examining the results in detail shows that the source domain networks of arXivCit-HepTh and Facebook provide a larger volume of information to the PPI target domain network than the other pairs in the CDNR experiments, which promotes knowledge transfer across domains. The citation networks of arXivCit-HepPh and arXivCit-HepTh transfer 11 categories of Years to PPI (biological network, 50 categories of States, network average degree of 19.609) with network average degrees of 24.407 and 25.409 respectively. The social networks of Blog3 and Facebook transfer 39 categories of Interests and 10 categories of Groups with network average degrees of 64.776 and 43.691 respectively. The results show that unsupervised CDNR works especially well in dense networks; however, density still cannot guarantee a good knowledge transfer when the domains share few natural similarities (Blog32PPI: Interests to States).

In addition, the general results on the Facebook dataset (Tables 8 and 9) show promising improvements by CDNR compared to the baseline algorithms. Unsupervised representations of CDNR allow learning from small categories to large categories, and in a heterogeneous label space. CDNR uses its CD2L-RandomWalk learning algorithm to capture the useful topologies in a large-scale information network.

5.4 Statistical Significance

CDNRdblp2M10 CDNRBlog32PPI
DeepWalk LINE Node2Vec Struc2Vec DeepGL DeepWalk LINE Node2Vec Struc2Vec DeepGL
Micro-F1 3.78E-13 7.29E-12 7.39E-07 8.13E-11 5.47E-07 3.66E-09 2.73E-07 1.03E-08 1.88E-08 8.98E-08
Macro-F1 1.04E-11 1.15E-08 4.31E-04 6.22E-10 5.07E-06 2.92E-04 3.17E-10