1 Introduction
Networks generated from mature systems usually have larger numbers of entities, such as nodes and edges, than emerging ones. For example, a newly born online social medium attracts a limited number of users and has not yet formed massive interactions among them, so its scale is far smaller than that of mature media such as Facebook. Furthermore, in some domains, such as biology, it is difficult to collect sufficient data due to cost, technical barriers, ethical restrictions and so on. Traditional industries normally lack historical data because data-driven techniques have not yet brought them benefits. The above scenarios lead to data deficiency, which affects network analysis and learning: previous network representation approaches developed for large-scale datasets cannot be applied.
From the domain-specific view, rich data collected from real-world complex systems form large-scale network datasets. The components of a system are defined as the nodes in a network, direct interactions between nodes are defined as edges, and connection strengths are described by weights on the edges. Techniques that not only analyze networks but also learn knowledge from network structures have become a mainstream of network research for artificial intelligence purposes xuan2016uncertainty ; wang2018learning . To this end, networks are preliminarily categorized based on real-world systems and their physical properties, such as social networks chaker2017social ; choi2012incremental , biological networks jancura2010dividing and citation networks lu2018structural . As shown in Figure 1, social networks (a) denote users as nodes and friendships as edges; biological networks such as the Protein-Protein Interaction (PPI) network (b) model proteins as nodes and PPIs as edges; and citation networks (c) represent papers as nodes and citations as edges.
Among all kinds of networks, the information network tang2015line abstracts the information flows from the original network structure: the original nodes, such as users, proteins and authors, are treated as information users and suppliers, and the information interchanges over friendships, PPIs and citations are treated as edges. The information network encodes the network behaviors and saves them in the network structure, as shown in Figure 1(d). Information networks help us illustrate the entities in a physical system, but they raise the question of how to understand the various properties behind different network categories, especially when the topologies look alike, as in Figure 1.
Network representation aims to learn a latent feature/vector space from the information formed by network entities
tang2009relational . It takes high-dimensional network structures as input and outputs relatively low-dimensional representations that encode as many community properties as possible. For use in machine learning, network representation should output complex but highly structured latent features, to meet the smoothness requirement of the learning function and to overcome the sparsity of the input data bengio2013representation . To this end, a series of network representation approaches have been proposed in the last decade based on the sampling strategy of random walks and on deep learning techniques. The random walk is a type of similarity measurement for a variety of problems in community detection
noh2004random ; lai2010enhanced , which computes local community structure information in time sublinear in the size of the input network yang2015defining ; leskovec2012learning . A stream of short random walks is used as a basic tool for extracting information from real-world large-scale information networks liu2010link ; perozzi2014deepwalk . Typical random walk-based network representation algorithms, such as DeepWalk perozzi2014deepwalk
, learn sequences of nodes as a stream of short random walks to model deep features of the network structure, which are highly dependent on the sliding window that controls the random walk for node sampling. However, when the distance between two nodes is larger than the sliding window size, the random walk jumps to the next round. Although this could be compensated for by a vast amount of sampling, the repetitions increase the computational complexity. This explains the main reason why these algorithms are barely applicable to networks with small structural scales. Therefore, previous works on random walk-based network representation are limited in a domain-specific way, and their performance mainly relies on the quality of the network topology. Our previous work proposed a framework for transferring structures across large-scale information networks (FTLSIN)
IJCNN18 ; however, it only enables structural knowledge transfer across relational information networks, and both networks must be large-scale. For the cases listed at the beginning of this paper, the limited network structures cannot guarantee satisfying latent feature spaces for further machine learning tasks within one domain. To address the above problems, we propose a novel algorithm for universal cross-domain network representation (CDNR) with the following contributions.

CDNR offers an effective learning solution for network representation when a network does not have enough entities, a deficiency that causes random walks to fail in structural sampling.

CDNR determines the relationships between two independent networks, which may belong to irrelevant domains. Similar network patterns are detected so that the links generated between corresponding communities transfer knowledge in CDNR.

CDNR predicts potential entities for scarce network structures by employing the cross-domain two-layer random walk (CD2LRandomWalk) framework from IJCNN18 and integrating two novel algorithms: cross-domain two-layer node-scale balance (CD2LNodeBalance) and cross-domain two-layer knowledge transfer (CD2LKnowlTransfer).
The rest of the paper is organized as follows. Section 2 summarizes related work. Section 3 states the CDNR problem. Section 4 explains the proposed CDNR algorithm in detail. Section 5 designs two experiments to evaluate the representations on real-world datasets. Section 6 presents our conclusions.
2 Related Works
The previously used per-node partition function bandyopadhyay2008counting is expensive to compute, especially for large information networks. To overcome this disadvantage, a series of sampling strategies have been proposed kurant2011towards ; lelis2013predicting to analyze the statistics of local structures, e.g., communities and subnetworks. These approaches differ from traditional representation learning paalanen2006feature ; zhu2015unsupervised ; ding2015deep . The latent feature learning of network representation captures neighborhood similarity and community membership in topologies yang2015network ; pan2016tri ; tu2016max .
DeepWalk perozzi2014deepwalk
trains a neural language model on the random walks generated from the network structure. After sampling a random walk that starts from a root node, DeepWalk slides a window and maps the central node to its representation. Hierarchical Softmax factorizes the probability distributions corresponding to the random walk, and the representation function is updated to maximize the probability. DeepWalk has produced promising results in dealing with sparsity in scalable networks, but has relatively high computational complexity for large-scale information networks. LINE, Node2Vec and Struc2Vec are other structure-based network representation algorithms that improve on DeepWalk. LINE
tang2015line preserves both the local network structure and the global network structure by first-order and second-order proximity respectively, and can be applied to large-scale networks that are directed, undirected, weighted or unweighted. Node2Vec grover2016node2vec explores the diverse neighborhoods of nodes in a biased random walk procedure by employing classic search strategies. Struc2Vec ribeiro2017struc2vec encodes structural similarities and generates the structural context for nodes using random walks. The above-mentioned works have contributed to network analysis by modeling a stream of short random walks. All the previous works that use random walks to sample networks into streams of nodes share a common assumption of a power-law distribution. The power-law distribution exists widely in real-world networks. It is a special degree distribution that follows $p(k) \propto k^{-\gamma}$, where $k$ is a node degree and $\gamma$ is a positive constant newman2005power . A network that follows the power-law distribution is also regarded as a scale-free network with the scale invariance property barabasi2009scale . The social networks, biological networks and citation networks discussed in this paper are observed to be scale-free in nature barabasi2016network . In log-log axes, the power-law distribution shows a linear trend with slope ratio $-\gamma$ (Figure 4 and Figure 6), which reflects that numerous edges connect small-degree nodes, and this does not change regardless of the network scale adamic2000power . It has been observed in perozzi2014deepwalk that if a network follows the power-law distribution, the frequency with which a node appears in short random walks will also follow the same distribution. Meanwhile, random walks in power-law distributed networks naturally gravitate towards high-degree nodes adamic2001search .
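The log-log linearity just described can be checked numerically. The sketch below (plain Python; the function name is ours) estimates the slope of a degree distribution by least squares over the log-log degree histogram; a slope near $-\gamma$ suggests a power law:

```python
import math
from collections import Counter

def powerlaw_slope(degrees):
    """Estimate the slope of the degree distribution in log-log axes
    via least squares on the points (log k, log p(k))."""
    counts = Counter(degrees)
    n = len(degrees)
    ks = [k for k in sorted(counts) if k > 0]
    xs = [math.log(k) for k in ks]                 # log degree
    ys = [math.log(counts[k] / n) for k in ks]     # log frequency
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

On a synthetic degree sequence drawn roughly from $p(k) \propto k^{-2}$, the estimate comes out close to $-2$.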
In this paper, we propose CDNR, which employs biased random walk sampling strategies to learn network structures based on previous works. However, CDNR is different from the deep transfer learning approaches for cross-domain graph-structured data, i.e., context-enhanced inductive representation hamilton2017inductive , intrinsic geometric information transfer lee2017transfer and deep inductive graph representation rossi2018deep
. Deep neural network-based representations usually need to generalize a small set of base features for deep learning, such as network statistical properties like node degree, which loses valuable information from the networks. The link predictions in
CD2LRandomWalk are therefore leveraged on the power-law distribution as well as on a distance calculation between the two independent networks across domains. The network with the smallest distance to the target network is regarded as the source domain. The scale invariance property should theoretically ensure that power law-based CDNR is robust.

3 Problem Statement
The main notations used in this paper:

A source domain and a target domain.
A target-domain latent feature space, a feature vector, and an element of the feature vector.
An (un)directed unattributed unweighted network from the source domain, and an (un)directed unattributed weighted network from the target domain.
The node set of the source network, the node set of the target network, a node in the source network, and a node in the target network.
The edge set of the source network, the edge set of the target network, an edge between two source nodes, and an edge between two target nodes.
The weight set on the target edges, and a weight on a target edge.
A form of super graph for the source network, a super-node set, and a super-edge set.
A super node in the super-node set, and a super edge connecting two super nodes.
A super-weight set, and a weight on a super edge.
The random walk sets on the source and target networks.
A shortest path between two super nodes over the super graph.
A set of node degree values on the source network, a set on the target network, and a set on the super graph.
A link set between the target nodes and the super nodes across domains, and a weight set on those links.
A link that connects a target node and a super node, and a weight on such a link.
The slope ratio of the power-law distribution.
The function that calculates node degree.
The average function on a value set.
The function that counts the number of elements in a set.
Definition 1 (Domain lu2015transfer ) A domain is denoted as $\mathcal{D} = \{X, P(X)\}$, where $X$ is the feature space and $P(X)$ is the marginal probability distribution over $X$.
Definition 2 (Network barabasi2016network ) Let $G = (V, E, W)$ be a given network, where $V$ represents one kind of entities known as nodes, $E$ represents another kind of entities known as edges reflecting connections between nodes, $E \subseteq V \times V$, and $W$ represents the possible weights on $E$.
Definition 3 (Cross-domain Network Representation) Suppose a source domain $\mathcal{D}_s$ represented by $G_s$ and a target domain $\mathcal{D}_t$ represented by $G_t$; the domains are irrelevant if $X_s \neq X_t$ and $P_s(X_s) \neq P_t(X_t)$, and relevant if $X_s = X_t$ or $P_s(X_s) = P_t(X_t)$. CDNR employs structural information on $G_s$, transferred from $\mathcal{D}_s$ to $\mathcal{D}_t$, to improve the target domain representations in a low-dimensional latent feature space.
To prepare the structural knowledge from the source domain, CDNR first implements a maximum likelihood optimization to generate a set of random walks on the source network in the bottom layer of CD2LRandomWalk. A neighborhood rooted at each node is clustered by the neighborhood sampling strategy on the biased random walks grover2016node2vec . Then, CD2LRandomWalk constructs links between the two networks. To this end, CD2LNodeBalance balances the two node scales by clustering source nodes into super nodes that share close node degrees with the target nodes, and generates links between the target nodes and the super nodes. CD2LKnowlTransfer trains the maximized similarities across the two domains and determines how much value should be transferred along the shortest paths formed by the super edges, with the values saved in the super weights. In CDNR, the representations are learned in the top layer of CD2LRandomWalk and are evaluated by a standard classification task.
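The neighborhood sampling strategy on biased random walks grover2016node2vec can be made concrete with a short sketch. The graph is a plain adjacency dict and `p`, `q` are node2vec's return and in-out parameters; this is an illustrative sampler, not the paper's exact implementation:

```python
import random

def biased_walk(adj, start, length, p=1.0, q=1.0, rng=random):
    """Node2Vec-style biased walk: the bias on each candidate x depends on
    the distance between the previous node t and x (0, 1, or 2)."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = list(adj[cur])
        if not nbrs:
            break                       # dead end: truncate the walk
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for x in nbrs:
            if x == prev:               # d(t, x) = 0: return to previous node
                weights.append(1.0 / p)
            elif x in adj[prev]:        # d(t, x) = 1: stay close (BFS-like)
                weights.append(1.0)
            else:                       # d(t, x) = 2: move outward (DFS-like)
                weights.append(1.0 / q)
        walk.append(rng.choices(nbrs, weights=weights)[0])
    return walk
```

With p < 1 the walk tends to backtrack and stay local; with q > 1 it behaves more like a breadth-first search around the root.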
3.1 Bottom-layer Random Walk: Knowledge Preparation
The bottom-layer random walk is designed for knowledge preparation in the source domain. The sampled random walks contain structural knowledge from the source network, which will be transferred to the target domain. The bottom-layer random walk introduces a biased random walk strategy to efficiently explore diverse neighborhoods and sample the nodes along the shortest path (a shortest path is a path between two nodes for which the sum of its edge weights is minimized). Suppose a set of random walks in which each root node repeats its sampling a fixed number of times and each random walk has a fixed length. In generating a random walk, suppose we are standing at the th node of the walk; the next node is chosen from the neighbors of the current node based on a transition probability where
is the partition function that ensures a normalized distribution
bengio2013representation , and the transition is guided by the search bias. To be specific, the search bias follows the node2vec searching rules grover2016node2vec : if the length of the shortest path between the previous node and the candidate node is 0, the bias is 1/p; if it is 1, the bias is 1; and if it is 2, the bias is 1/q. The sampling strategy on the biased random walks is computationally efficient, especially for real-world large-scale networks.

4 Knowledge Transfer in Cross-domain Network Representations
CDNR enables cross-domain random walk-based network representations and assumes that both networks across domains follow the power-law distribution. Representations in CDNR work under the Skip-gram framework and are optimized by maximum likelihood over biased random walks. The contributions of CDNR are realized in this section by CD2LRandomWalk with its two components: CD2LNodeBalance and CD2LKnowlTransfer.

4.1 Cross-domain Two-layer Node-scale Balance and Link Prediction
By transferring knowledge from an external source domain, CDNR deals with scenarios in which the training sample in the target domain is insufficient to build a good network representation. Such knowledge transfer belongs to transfer learning pan2010survey and raises two questions: 1) link prediction: how to construct paths between the two networks across domains for CD2LRandomWalk; and 2) CD2LNodeBalance: how to solve the problem of unbalanced node scales.
The unbalancedness between the two networks is reflected in the node scales and also in the connections, i.e., the average node degrees (the average degree is the mean of the degrees of all nodes in the network). In this case, CD2LNodeBalance tries to reform the source network into a smaller size based on the network structures of the target network. For the purpose of discovering subgraph patterns wang2017incremental , the concept of the super node guo2014super is employed, and we define its formation for CDNR.
Definition 4 (Super Node in Source Domain) A super node is a subgraph of the original source network. Denoting the super-node set, a super node consists of a group of source nodes and the edges connecting to or from them. The nodes clustered into one super node have close node degrees.
To cluster sets of nodes in the large-scale network, super-node learning based on the nodes in the target domain proceeds as follows:
(1) 
where a predicted link connects a target node and a super node across domains, and its weight indicates the similarity between the two and how much knowledge should be transferred from the source domain to the target domain in CD2LRandomWalk.
CD2LNodeBalance attempts to pair each target node with at least one super node, minimizing the super-node scale and maximizing the likelihood between the paired degrees according to Eq. (4). For each pair, we first initialize a link and a weight as follows:
(2) 
(3) 
where the two degree terms denote the degrees of the target node and the super node respectively, and the weight is initialized over nodes of the same degree.
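Eqs. (2)-(3) initialize links and weights from node degrees. As a purely hypothetical sketch of that idea (the function and the weight formula below are our assumptions, not the paper's formulas): pair each target node with the super node of closest degree and weight the link by degree similarity, so that equal degrees give weight 1.0:

```python
def init_links(target_degrees, super_degrees):
    """Hypothetical degree-based initializer: map each target node to the
    super node whose degree is closest, weighted by degree similarity."""
    links = {}
    for v, dv in target_degrees.items():
        # super node with the closest degree to the target node
        s = min(super_degrees, key=lambda s: abs(super_degrees[s] - dv))
        ds = super_degrees[s]
        # similarity in [0, 1]; 1.0 when the degrees match exactly
        links[(v, s)] = min(dv, ds) / max(dv, ds)
    return links
```

The real initialization also depends on the optimization in Eq. (4); this sketch only illustrates the degree-matching intuition.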
To optimize Eq. (4), we analyze the degree ranges over the target nodes and the super nodes respectively, and reorganize the super nodes, including merging and dividing them, based on the following three cases.
Denoting the range scales of the two degree sets, there are three possible cases of CD2LNodeBalance, as follows and as shown in Figure 2. The degree sets are always ranked in decreasing order. Each target node finds the corresponding super node that sits in the same position in the two rankings.
Case 1: If the two range scales are equal, each target node links to exactly one super node. In this case, CD2LNodeBalance is completed in the initialization stage.
Case 2: If the target scale is smaller, more than one super node links to a target node. The weights at this stage are then optimized by Eq. (4). If a weight turns to 0, the corresponding link is deleted, and that super node is merged into the linked super node holding the smallest weight.
Case 3: If the target scale is larger, at least one target node is not linked to any super node. We add a group of empty super nodes and evenly insert them into the ranking. To fill up an empty super node, a few nodes in the neighboring super nodes are removed and added to it; the weight is then initialized by Eq. (3).
In Case 2 and Case 3, the weight set is optimized by maximizing the likelihood between the paired degree sets. Starting from each target node, a vector weights each pair of target node and super node; an entry of the indicator is 1 if there is a link between the pair, and 0 otherwise.
(4) 
where the indicator vector takes values of 0 or 1 based on the links formed in Cases 1-3. A term combining the power-law slope ratios of the source and the target networks controls the range of the likelihood over CD2LNodeBalance, with one parameter for each network. The optimized CD2LNodeBalance results suggest the predicted links in Case 2 and Case 3.
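The case analysis can be illustrated with a hedged sketch (the helper below is hypothetical, not the paper's optimizer): rank both sides by decreasing degree and pair positions, so that when super nodes outnumber target nodes, several super nodes link to one target node (as in Case 2); unlinked target nodes would instead signal Case 3:

```python
def pair_by_rank(t_nodes, s_nodes, t_deg, s_deg):
    """Pair target nodes and super nodes by their rank positions in the
    decreasing degree order (a sketch of the case analysis, not Eq. (4))."""
    t_sorted = sorted(t_nodes, key=lambda v: -t_deg[v])
    s_sorted = sorted(s_nodes, key=lambda s: -s_deg[s])
    links = []
    for j, s in enumerate(s_sorted):
        # scale the super node's rank into the target ranking
        i = min(j * len(t_sorted) // len(s_sorted), len(t_sorted) - 1)
        links.append((t_sorted[i], s))
    return links
```

In the example below, three super nodes serve two target nodes, so the highest-degree target node receives two links, mirroring Case 2.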
4.2 Cross-domain Two-layer Knowledge Transfer and Target Domain Edge Evolvement
CD2LKnowlTransfer transfers the knowledge saved in weights through the predicted links. The knowledge includes three parts of weights, as shown in Figure 3: a weight on the super edge that reflects the knowledge learned from the random walks in the source domain, two weights on the predicted links, and the original weight on the target edge (which in this paper is 1 or 0). CD2LKnowlTransfer follows:
(5) 
where denotes the super graph.
Definition 5 (Super Graph in Source Domain) A super graph reformed from the source network is formed by the super nodes, the super edges and the super weights on those edges. If a random walk in the source walk set goes through two super nodes, there will be a super edge between them.
(6) 
where the distance term is the distance between the two nodes in a random walk, and the summation covers every random walk going over the super edge.
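A hedged sketch of this accumulation (we assume a 1/d decay with the in-walk distance d; the exact decay is whatever Eq. (6) defines): every walk that passes through two different super nodes contributes to the weight of the super edge between them:

```python
def super_edge_weights(walks, node2super):
    """For every walk visiting two super nodes, accumulate a contribution
    that decays with the distance between the visits (assumed 1/d)."""
    w = {}
    for walk in walks:
        visits = [(i, node2super[v]) for i, v in enumerate(walk)]
        for a in range(len(visits)):
            for b in range(a + 1, len(visits)):
                (i, s1), (j, s2) = visits[a], visits[b]
                if s1 != s2:
                    key = tuple(sorted((s1, s2)))   # undirected super edge
                    w[key] = w.get(key, 0.0) + 1.0 / (j - i)
    return w
```

For example, the single walk [0, 1, 2] with nodes 0 and 1 in super node A and node 2 in super node B contributes 1/1 + 1/2 = 1.5 to the super edge (A, B).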
Having the three parts of weights that contribute to CD2LKnowlTransfer, the weight on a target edge in the top layer of CD2LRandomWalk is denoted as:
(7) 
where the first term is the original weight that reflects whether there is an edge between the two target nodes, a partition term normalizes the transferred value, the shortest path is taken between the two linked super nodes over the super graph, and the summed terms are the weights on the super edges that the path consists of.
Through CD2LKnowlTransfer, the target network is enriched in structure: extra weights are put on the original edges, and possible new edges are also evolved.
4.3 Top-layer Random Walk and Network Representations
CDNR represents the target network in the top layer of CD2LRandomWalk after CD2LNodeBalance and CD2LKnowlTransfer, and learns the latent feature space in the Skip-gram framework.
Given a node in the target domain and a window size, we obtain a cross-domain Skip-gram by maximizing the following log-likelihood of observing the node's neighborhood given its feature representation,
$\max_{f} \sum_{u} \log \Pr\big(N(u) \mid f(u)\big)$    (8)
where the neighborhood of each node is generated by the top-layer random walks on the enriched target network, while the representation function is learned over the target nodes.
In summary, Algorithm 3 of CDNR is formed by CD2LRandomWalk and top-layer feature learning. The main advantage of CDNR is that, when a network representation is poor because the network lacks structure, CD2LRandomWalk enables knowledge transfer from external domains and CDNR does not need to rebuild the network representation model. CDNR offers efficient cross-domain learning with relatively low computational costs for CD2LNodeBalance, CD2LKnowlTransfer and the network representation, the last being in line with Node2Vec grover2016node2vec .
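Concretely, the neighborhood that the Skip-gram objective in Eq. (8) observes is built by sliding a window over the top-layer walks. A minimal sketch of the (center, context) pair generation that any Skip-gram implementation is then trained on:

```python
def skipgram_pairs(walks, window):
    """Generate (center, context) training pairs: every node within
    `window` positions of the center counts as its neighborhood."""
    pairs = []
    for walk in walks:
        for i, u in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((u, walk[j]))
    return pairs
```

For instance, the walk [1, 2, 3] with window 1 yields the pairs (1, 2), (2, 1), (2, 3) and (3, 2).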
5 Experiments
This section evaluates the effectiveness of CDNR compared to baseline network representation algorithms on both single-label classification (Section 5.2) and multi-label classification (Section 5.3).
5.1 Baseline Algorithms
This experiment evaluates the performance of the unsupervised CDNR on the target networks. The representation outputs are applied to a standard supervised learning task, i.e., linear SVM classification
suykens1999least . We choose a simple classifier because we want to put less emphasis on the classifier when evaluating network representation performance. The baseline algorithms are chosen from previous domain-specific network representations and a deep inductive graph representation, as follows.

DeepWalk (Perozzi et al. 2014) perozzi2014deepwalk is the first random walk-based network representation algorithm. By choosing DeepWalk, we exclude the matrix factorization approaches, which have already been demonstrated to be inferior to DeepWalk.

LINE (Tang et al. 2015) tang2015line learns latent feature representations from large-scale information networks by an edge-sampling strategy in two separate phases of first- and second-order proximities. We exclude a recent Graph Factorization algorithm ahmed2013distributed because LINE demonstrated better performance in previous experiments.

Node2Vec (Grover et al. 2016) grover2016node2vec learns continuous feature representations of nodes using a biased random walk procedure to capture the diversity of connectivity patterns observed in networks, with the search bias controlled by the parameters p and q.

Struc2Vec (Ribeiro et al. 2017) ribeiro2017struc2vec learns node representations from structural identity by constructing a hierarchical graph to encode structural similarities and generating a structural context for nodes.

DeepGL (Rossi et al. 2018) rossi2018deep learns interpretable inductive graph representations by relational functions over each base feature and achieves inductive transfer learning across networks. It inputs 3-dimensional base features to a CNN and outputs representations whose dimensionality depends on the learning.
5.2 Experiment on Single-label Dataset
5.2.1 Single-label Datasets
Two academic citation networks are selected as the datasets. Both are used for the multi-class classification problem galar2011overview . Nodes denote papers in these networks.
Domain   Datasets  Num. of Nodes  Num. of Edges  Num. of Categories
Source   DBLP      60,744         52,890         4
Target   M10       10,310         77,218         10
DBLP dataset (http://arnetminer.org/citation, V4 version is used) (source network) consists of bibliographic data in computer science. Each paper may cite or be cited by other papers, naturally forming a citation network. The network in this dataset abstracts a list of conferences from four research areas, i.e., database, data mining, artificial intelligence and computer vision.

CiteSeer-M10 dataset (http://citeseerx.ist.psu.edu/) (target network) is a subset of CiteSeerX data which consists of scientific publications from 10 distinct research areas, i.e., agriculture, archaeology, biology, computer science, financial economics, industrial engineering, material science, petroleum chemistry, physics and social science.
5.2.2 Experiment Setup
For the evaluations, we randomly partition the dataset in the target domain into two non-overlapping sets for training and testing, with nine groups of training percentages from 10% to 90%. We repeat the above steps 10 times and thus obtain 10 copies of the training and testing data. The reported experimental results are the averages of the 10 runs and their variances.
The parameters of CDNR are set in line with the typical values used for DeepWalk perozzi2014deepwalk , LINE tang2015line , Node2Vec grover2016node2vec and Struc2Vec ribeiro2017struc2vec : for the networks in both the source domain and the target domain, the dimension of the feature representation, the walk length, the number of walks from every source node, the window size, and the search bias parameters p and q are fixed at those typical values. The learning rate starts from 0.025, as in tang2015line , and convergence is tracked at 0.1 in our experiment. For Struc2Vec, OPT1 (reducing the length of degree sequences), OPT2 (reducing the number of pairwise similarity calculations) and OPT3 (reducing the number of layers) are all set to True, and the maximum number of layers is 6. The parameters in CD2LNodeBalance are set accordingly. With these settings, the total number of random walks over an input network and the size of the random walks are determined. For DeepGL rossi2018deep , the operator is chosen from {mean, sum, maximum, Hadamard, Weight, RBF}, which obtains the best results in base feature learning; the feature similarity threshold is set to 0.01; the maximum depth of layers is set to 10; and the convergence threshold for feature diffusion is set to 0.01.
5.2.3 Single-label Classification
We use Macro-F1 and Micro-F1 yang1999re to evaluate classification performance; the results are shown in Table 3. The F1 scores are designed to evaluate the effectiveness of category assignments hsu2002comparison .
$F_1 = \frac{2 \times P \times R}{P + R}$    (9)
We use the indicators of true positive (tp), false positive (fp) and false negative (fn) to measure the standard precision P = tp/(tp+fp) and recall R = tp/(tp+fn). For the Micro-F1 score, tp, fp and fn are pooled globally over all test nodes and categories of binary labels before computing Eq. (9), so the score reflects the global binary decisions. For the Macro-F1 score, Eq. (9) is computed on the binary decisions of each individual category, and the scores are then averaged over the categories.
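The two averaging schemes can be sketched directly from the tp/fp/fn counts (plain Python; the function names are ours):

```python
def f1(tp, fp, fn):
    """Eq. (9) from the per-count precision and recall."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def micro_macro_f1(per_class):
    """per_class: list of (tp, fp, fn) tuples, one per category.
    Micro-F1 pools the counts globally; Macro-F1 averages
    the per-category F1 scores."""
    tp = sum(c[0] for c in per_class)
    fp = sum(c[1] for c in per_class)
    fn = sum(c[2] for c in per_class)
    micro = f1(tp, fp, fn)
    macro = sum(f1(*c) for c in per_class) / len(per_class)
    return micro, macro
```

Micro-F1 is dominated by the frequent categories, while Macro-F1 weighs every category equally, which is why the two columns in Table 3 can rank algorithms differently.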
Representation Analysis. Figure 5 (a) illustrates the feature space of DBLP learned by the CDNR bottom-layer random walk (Node2Vec) and Figure 5 (b) illustrates the feature space of DBLP learned by CDNR. These two illustrations show almost the same distribution and obtain good low-dimensional mappings compared to the PCA (Figure 5 (c)), LLE (Figure 5 (d)) and Laplacian (Figure 5 (e)) based network representations.
Algorithm  10%  20%  30%  40%  50%  60%  70%  80%  90%

Micro-F1
DeepWalk   0.1758  0.1833  0.1897  0.2049  0.2051  0.2216  0.2236  0.2420  0.2431
(±)        0.0086  0.0100  0.0122  0.0126  0.0128  0.0111  0.0170  0.0133  0.0220
LINE       0.2338  0.2362  0.2623  0.2821  0.3269  0.3244  0.3561  0.3508  0.4128
(±)        0.0102  0.0170  0.0110  0.0141  0.0150  0.0087  0.0193  0.0184  0.0486
Node2Vec   0.3342  0.4166  0.4714  0.5213  0.5550  0.5843  0.6216  0.6353  0.6535
(±)        0.0099  0.0110  0.0153  0.0127  0.0176  0.0092  0.0215  0.0115  0.0324
Struc2Vec  0.1742  0.1673  0.1701  0.1622  0.1695  0.1669  0.1685  0.1626  0.1784
(±)        0.0186  0.0194  0.0171  0.0215  0.0145  0.0215  0.0120  0.0188  0.0281
DeepGL     0.6557  0.6465  0.6739  0.6600  0.6787  0.6724  0.6871  0.6786  0.6911
(±)        0.0187  0.0176  0.0161  0.0197  0.0164  0.0139  0.0184  0.0160  0.0403
CDNR (dblp2M10)  0.7507  0.7728  0.8052  0.8245  0.8363  0.8526  0.8587  0.8772  0.8720
(±)        0.0143  0.0114  0.0154  0.0074  0.0051  0.0116  0.0128  0.0173  0.0179

Macro-F1
DeepWalk   0.2523  0.2667  0.2768  0.2945  0.2935  0.3077  0.3101  0.3294  0.3359
(±)        0.0117  0.0051  0.0072  0.0120  0.0081  0.0086  0.0158  0.0123  0.0220
LINE       0.3160  0.2984  0.3421  0.3596  0.4070  0.4275  0.4498  0.4277  0.4773
(±)        0.0113  0.0127  0.0144  0.0249  0.0382  0.0548  0.0383  0.0302  0.0486
Node2Vec   0.4326  0.4748  0.5338  0.5900  0.6092  0.6388  0.6866  0.6981  0.6568
(±)        0.0147  0.0156  0.0153  0.0153  0.0290  0.0314  0.0202  0.0572  0.0261
Struc2Vec  0.2718  0.2129  0.2220  0.1889  0.2067  0.1871  0.1803  0.1723  0.1749
(±)        0.0126  0.0092  0.0143  0.0147  0.0058  0.0150  0.0098  0.0271  0.0165
DeepGL     0.6704  0.6003  0.6154  0.5732  0.5848  0.5607  0.5918  0.5910  0.5914
(±)        0.0151  0.0222  0.0247  0.0305  0.0309  0.0264  0.0324  0.0388  0.0407
CDNR (dblp2M10)  0.7558  0.6939  0.7269  0.7174  0.7301  0.7540  0.7679  0.7722  0.7745
(±)        0.0138  0.0168  0.0174  0.0149  0.0256  0.0238  0.0325  0.0609  0.0698

(The (±) rows give the variance over the 10 runs.)
Effectiveness of search priority in random walks. In Table 3, DeepWalk and Struc2Vec demonstrate worse performance than LINE, Node2Vec and our CDNR, which can be explained by their inability to reuse samples, a feat easily achieved using the random walk. The outstanding performance of Node2Vec among the baseline algorithms indicates that its exploration strategy is much better than the uniform random walks learned by DeepWalk and LINE: the search bias parameter adds the flexibility of exploring local neighborhoods prior to the global network. The poor performance of DeepWalk and LINE occurs mainly because the network structure is rather sparse and noisy, and contains limited information. CDNR performs best on the M10 network, as DBLP is also a citation network and naturally shares similar network patterns with M10. Such patterns are captured by CDNR and transferred to M10. On average, CDNR shows smaller variances in its performance on the dblp2M10 learning task.
Importance of information from the source domain. Table 3 shows that CDNR outperforms the domain-specific baseline algorithms by using topological information from the source domain to learn the network representation in the target domain. When the top layer works based on CD2LRandomWalk, the information in the source network is transferred to the target network by adjusting the weights on the edges of the target network. This procedure achieves better performance and shows the significance of transferring topological information from external domains.
5.3 Experiment on Multi-label Datasets
5.3.1 Datasets
Datasets        Network     Num. of Nodes  Num. of Edges  Ave. Degree  Num. of Categories  Labels
Blog3           Social      10,312         333,983        64.776       39                  Interests
Facebook        Social      4,039          88,234         43.691       10                  Groups
PPI             Biological  3,890          37,845         19.609       50                  States
arXivCitHepPh   Citation    34,546         421,578        24.407       11                  Years
arXivCitHepTh   Citation    27,777         352,807        25.409       11                  Years
We select five real-world large-scale networks of different kinds as the experimental datasets, consisting of two online social networks (Blog3, Facebook), two citation networks (arXivCitHepPh, arXivCitHepTh) and one biological network (PPI). All of them are used for the multi-class multi-label classification problem. In the online social networks, nodes represent users and the users' relationships are denoted as edges. In the citation networks, papers are denoted as nodes and citations as edges. In the biological network, genes are denoted as nodes and edges represent the relationships between the genes.

Blog3 (BlogCatalog3) dataset (http://socialcomputing.asu.edu/datasets/BlogCatalog3) is a social blog directory that manages bloggers and their blogs. Both the contact network and selected group membership information are included. The network has 10,312 nodes, 333,983 undirected edges and 39 different labels. Nodes are classified according to the interests of the bloggers.

Facebook dataset (https://snap.stanford.edu/data/egonets-Facebook.html) consists of circles (i.e., friends lists) from Facebook. This dataset contains user profiles as node features, and circles as edge features and ego networks. The network has 4,039 nodes, 88,234 undirected edges and 10 different labels representing groups of users.

PPI (Protein-Protein Interactions) dataset (https://downloads.thebiogrid.org/BioGRID) is a subgraph of the PPI network for Homo Sapiens, with labels obtained from hallmark gene sets that represent biological states. The network has 3,890 nodes, 76,584 undirected edges and 50 different labels.

arXivCitHepPh (arXiv High-energy Physics Citation Network) dataset (http://snap.stanford.edu/data/cit-HepPh.html) and arXivCitHepTh (arXiv High-energy Physics Theory Citation Network) dataset (http://snap.stanford.edu/data/cit-HepTh.html) are abstracted from the e-print arXiv. arXivCitHepPh covers all the citations within a dataset of 34,546 papers (regarded as nodes) with 421,578 directed edges. arXivCitHepTh covers all the citations within a dataset of 27,777 papers (regarded as nodes) with 352,807 directed edges. If a paper i cites a paper j, the graph contains a directed edge from i to j. The data consist of papers from the period January 1993 to April 2003, categorized by year.
The networks chosen in the experiment follow the powerlaw distribution adamic2000power , as do the random walks on the networks perozzi2014deepwalk , as shown in Figure 6.
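The power-law claim can be spot-checked numerically. The sketch below fits the slope of a degree histogram on log-log axes; this is a crude least-squares check rather than a proper maximum-likelihood fit, and the synthetic degree sequence is our own, not one of the paper's datasets:

```python
import math
from collections import Counter

def powerlaw_slope(degrees):
    """Rough log-log least-squares fit of the degree distribution.

    Counts how many nodes have each degree, then fits
    log(count) ~ slope * log(degree) + c. A power-law network should
    give a clearly negative slope (roughly the negative exponent).
    """
    counts = Counter(degrees)
    xs = [math.log(d) for d in counts if d > 0]
    ys = [math.log(counts[d]) for d in counts if d > 0]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

# Synthetic degrees: count(d) is roughly proportional to d**-2.
degrees = [d for d in range(1, 50)
           for _ in range(max(1, int(1000 * d ** -2.0)))]
slope = powerlaw_slope(degrees)
```

On a real dataset, `degrees` would be the degree sequence of the network (or the visit counts of random walks, which the cited DeepWalk work shows also follow a power law).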
5.3.2 Experiment Setup
Table 5: Source and target domain selections.

Source Domain  | Target Domain
Blog3          | PPI
arXivCitHepTh  | PPI
arXivCitHepPh  | PPI
Facebook       | PPI
Blog3          | Facebook
Table 4 summarizes the network statistics for this experiment. Node degree reflects the connection capability of a node. Networks are selected as source and target domains according to these statistics; the selected pairs are shown in Table 5. The experiment setup for the multi-label classification evaluation is the same as that in the single-label dataset experiment.
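The average degrees in Table 4 follow directly from the node and edge counts as 2|E|/|V| (each edge contributes two endpoint incidences, which also matches the values reported for the directed citation networks). A quick check:

```python
def average_degree(num_nodes, num_edges):
    """Average degree 2|E|/|V|, matching the Ave. Degree column of Table 4."""
    return 2 * num_edges / num_nodes

# Node and edge counts taken from Table 4.
stats = {
    "Blog3": (10_312, 333_983),
    "Facebook": (4_039, 88_234),
    "arXivCitHepPh": (34_546, 421_578),
}
deg = {name: average_degree(n, e) for name, (n, e) in stats.items()}
# e.g. deg["Blog3"] is approximately 64.776, deg["Facebook"] 43.691
```

Higher-degree (denser) networks such as Blog3 and Facebook serve as source domains for the sparser PPI target network.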
5.3.3 Multi-label Classification
In the multi-label classification setting, every node is assigned one or more labels from a finite set L. In the training phase of the CDNR node feature representations, we observe a fraction of the nodes and all their labels, and predict the labels for the remaining nodes. The multi-label classification in our experiment feeds the network representations to a one-against-all linear SVM classifier hsu2002comparison . We use the Macro-F1 and Micro-F1 scores to compare performance yang1999re in Tables 6-9.
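With embeddings in hand, this evaluation pipeline is a standard one-vs-rest linear SVM. A minimal sketch with random stand-in embeddings and synthetic multi-hot labels (the real inputs would be the learned node representations and the dataset labels):

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Stand-ins for learned 128-d node embeddings and a multi-hot label matrix.
X = rng.normal(size=(300, 128))
Y = (X[:, :5] > 0.5).astype(int)  # 5 synthetic labels tied to the features

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, train_size=0.5,
                                          random_state=0)
# One-against-all linear SVM, as in the evaluation described above.
clf = OneVsRestClassifier(LinearSVC()).fit(X_tr, Y_tr)
pred = clf.predict(X_te)
macro = f1_score(Y_te, pred, average="macro")
micro = f1_score(Y_te, pred, average="micro")
```

Varying `train_size` from 0.1 to 0.9 reproduces the training-percentage sweep reported in the tables below.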
Table 6: F1 scores (mean ± standard deviation) on the PPI dataset at training percentages from 10% to 90%.

Algorithm                 | 10%             | 20%             | 30%             | 40%             | 50%             | 60%             | 70%             | 80%             | 90%
DeepWalk                  | 0.2849 ± 0.0181 | 0.2854 ± 0.0116 | 0.2845 ± 0.0193 | 0.2803 ± 0.0170 | 0.2725 ± 0.0168 | 0.2736 ± 0.0200 | 0.2629 ± 0.0241 | 0.2778 ± 0.0215 | 0.2621 ± 0.0344
LINE                      | 0.2900 ± 0.0062 | 0.2772 ± 0.0077 | 0.2807 ± 0.0083 | 0.2715 ± 0.0104 | 0.2702 ± 0.0113 | 0.2649 ± 0.0166 | 0.2710 ± 0.0163 | 0.2494 ± 0.0251 | 0.2398 ± 0.0195
Node2Vec                  | 0.3073 ± 0.0171 | 0.2955 ± 0.0104 | 0.3024 ± 0.0139 | 0.3028 ± 0.0120 | 0.3028 ± 0.0102 | 0.2995 ± 0.0186 | 0.3021 ± 0.0288 | 0.2967 ± 0.0197 | 0.3005 ± 0.0283
Struc2Vec                 | 0.2693 ± 0.0228 | 0.2713 ± 0.0187 | 0.2696 ± 0.0188 | 0.2515 ± 0.0187 | 0.2603 ± 0.0133 | 0.2499 ± 0.0212 | 0.2493 ± 0.0148 | 0.2419 ± 0.0156 | 0.2338 ± 0.0287
DeepGL                    | 0.3055 ± 0.0062 | 0.3063 ± 0.0083 | 0.3028 ± 0.0083 | 0.2947 ± 0.0054 | 0.2987 ± 0.0063 | 0.2975 ± 0.0128 | 0.2911 ± 0.0128 | 0.2890 ± 0.0180 | 0.2764 ± 0.0178
CDNR (Blog3→PPI)          | 0.3386 ± 0.0062 | 0.3390 ± 0.0086 | 0.3423 ± 0.0061 | 0.3420 ± 0.0072 | 0.3404 ± 0.0079 | 0.3414 ± 0.0073 | 0.3350 ± 0.0125 | 0.3371 ± 0.0199 | 0.3312 ± 0.0238
CDNR (arXivCitHepPh→PPI)  | 0.3412 ± 0.0041 | 0.3425 ± 0.0075 | 0.3410 ± 0.0052 | 0.3431 ± 0.0057 | 0.3460 ± 0.0069 | 0.3474 ± 0.0112 | 0.3431 ± 0.0103 | 0.3429 ± 0.0107 | 0.3330 ± 0.0192
CDNR (arXivCitHepTh→PPI)  | 0.3420 ± 0.0036 | 0.3426 ± 0.0057 | 0.3434 ± 0.0044 | 0.3462 ± 0.0049 | 0.3441 ± 0.0042 | 0.3553 ± 0.0101 | 0.3450 ± 0.0106 | 0.3457 ± 0.0150 | 0.3521 ± 0.0260
CDNR (Facebook→PPI)       | 0.3415 ± 0.0053 | 0.3442 ± 0.0035 | 0.3454 ± 0.0035 | 0.3448 ± 0.0066 | 0.3468 ± 0.0065 | 0.3410 ± 0.0102 | 0.3447 ± 0.0119 | 0.3443 ± 0.0151 | 0.3444 ± 0.0189
Table 7: F1 scores (mean ± standard deviation) on the PPI dataset at training percentages from 10% to 90%.

Algorithm                 | 10%             | 20%             | 30%             | 40%             | 50%             | 60%             | 70%             | 80%             | 90%
DeepWalk                  | 0.3416 ± 0.0140 | 0.3378 ± 0.0138 | 0.3364 ± 0.0208 | 0.3406 ± 0.0171 | 0.3306 ± 0.0159 | 0.3336 ± 0.0241 | 0.2949 ± 0.0288 | 0.2825 ± 0.0185 | 0.2041 ± 0.0386
LINE                      | 0.3058 ± 0.0094 | 0.3003 ± 0.0113 | 0.3008 ± 0.0069 | 0.2940 ± 0.0120 | 0.2868 ± 0.0138 | 0.2826 ± 0.0176 | 0.2733 ± 0.0173 | 0.2462 ± 0.0262 | 0.1822 ± 0.0198
Node2Vec                  | 0.3490 ± 0.0193 | 0.3442 ± 0.0141 | 0.3510 ± 0.0205 | 0.3500 ± 0.0126 | 0.3432 ± 0.0140 | 0.3414 ± 0.0201 | 0.3274 ± 0.0240 | 0.3006 ± 0.0248 | 0.2310 ± 0.0385
Struc2Vec                 | 0.2892 ± 0.0197 | 0.2926 ± 0.0232 | 0.3019 ± 0.0227 | 0.2784 ± 0.0267 | 0.2851 ± 0.0152 | 0.2626 ± 0.0202 | 0.2589 ± 0.0177 | 0.2399 ± 0.0287 | 0.1712 ± 0.0262
DeepGL                    | 0.3213 ± 0.0065 | 0.3290 ± 0.0072 | 0.3235 ± 0.0107 | 0.3155 ± 0.0067 | 0.3136 ± 0.0095 | 0.3086 ± 0.0117 | 0.2970 ± 0.0147 | 0.2783 ± 0.0220 | 0.2115 ± 0.0132
CDNR (Blog3→PPI)          | 0.3519 ± 0.0108 | 0.3551 ± 0.0105 | 0.3539 ± 0.0073 | 0.3514 ± 0.0060 | 0.3469 ± 0.0097 | 0.3431 ± 0.0060 | 0.3282 ± 0.0154 | 0.3063 ± 0.0223 | 0.2389 ± 0.0276
CDNR (arXivCitHepPh→PPI)  | 0.3532 ± 0.0106 | 0.3582 ± 0.0099 | 0.3536 ± 0.0085 | 0.3509 ± 0.0057 | 0.3531 ± 0.0098 | 0.3456 ± 0.0085 | 0.3360 ± 0.0128 | 0.3130 ± 0.0144 | 0.2512 ± 0.0270
CDNR (arXivCitHepTh→PPI)  | 0.3570 ± 0.0079 | 0.3575 ± 0.0091 | 0.3568 ± 0.0044 | 0.3565 ± 0.0091 | 0.3523 ± 0.0090 | 0.3556 ± 0.0137 | 0.3368 ± 0.0150 | 0.3234 ± 0.0249 | 0.2682 ± 0.0330
CDNR (Facebook→PPI)       | 0.3576 ± 0.0086 | 0.3595 ± 0.0065 | 0.3574 ± 0.0052 | 0.3553 ± 0.0068 | 0.3573 ± 0.0080 | 0.3432 ± 0.0079 | 0.3423 ± 0.0145 | 0.3164 ± 0.0164 | 0.2578 ± 0.0217
Table 8: F1 scores (mean ± standard deviation) on the Facebook dataset at training percentages from 10% to 90%.

Algorithm               | 10%             | 20%             | 30%             | 40%             | 50%             | 60%             | 70%             | 80%             | 90%
DeepWalk                | 0.8078 ± 0.0449 | 0.8727 ± 0.0177 | 0.8933 ± 0.0062 | 0.9050 ± 0.0059 | 0.9153 ± 0.0060 | 0.9198 ± 0.0061 | 0.9307 ± 0.0039 | 0.9301 ± 0.0103 | 0.9334 ± 0.0175
LINE                    | 0.4627 ± 0.0026 | 0.4654 ± 0.0104 | 0.4719 ± 0.0026 | 0.4739 ± 0.0035 | 0.4765 ± 0.0035 | 0.4761 ± 0.0033 | 0.4760 ± 0.0067 | 0.4787 ± 0.0066 | 0.4755 ± 0.0075
Node2Vec                | 0.9352 ± 0.0072 | 0.9401 ± 0.0032 | 0.9398 ± 0.0051 | 0.9419 ± 0.0047 | 0.9442 ± 0.0057 | 0.9454 ± 0.0063 | 0.9468 ± 0.0092 | 0.9466 ± 0.0079 | 0.9502 ± 0.0098
Struc2Vec               | 0.4152 ± 0.0237 | 0.4521 ± 0.0144 | 0.4716 ± 0.0061 | 0.4994 ± 0.0059 | 0.5161 ± 0.0078 | 0.5381 ± 0.0096 | 0.5461 ± 0.0115 | 0.5639 ± 0.0241 | 0.5530 ± 0.0175
DeepGL                  | 0.9535 ± 0.0161 | 0.9483 ± 0.0114 | 0.9531 ± 0.0059 | 0.9515 ± 0.0099 | 0.9552 ± 0.0087 | 0.9546 ± 0.0087 | 0.9555 ± 0.0039 | 0.9612 ± 0.0089 | 0.9640 ± 0.0072
CDNR (Blog3→Facebook)   | 0.9584 ± 0.0038 | 0.9550 ± 0.0036 | 0.9561 ± 0.0049 | 0.9565 ± 0.0044 | 0.9568 ± 0.0032 | 0.9617 ± 0.0050 | 0.9606 ± 0.0042 | 0.9627 ± 0.0067 | 0.9623 ± 0.0085
Table 9: F1 scores (mean ± standard deviation) on the Facebook dataset at training percentages from 10% to 90%.

Algorithm               | 10%             | 20%             | 30%             | 40%             | 50%             | 60%             | 70%             | 80%             | 90%
DeepWalk                | 0.7655 ± 0.0185 | 0.7915 ± 0.0242 | 0.7858 ± 0.0331 | 0.8052 ± 0.0308 | 0.7902 ± 0.0306 | 0.8138 ± 0.0327 | 0.8213 ± 0.0504 | 0.7678 ± 0.0317 | 0.7822 ± 0.0378
LINE                    | 0.5063 ± 0.0053 | 0.5040 ± 0.0189 | 0.5083 ± 0.0093 | 0.5129 ± 0.0061 | 0.5091 ± 0.0092 | 0.5040 ± 0.0077 | 0.5020 ± 0.0137 | 0.4981 ± 0.0117 | 0.4961 ± 0.0109
Node2Vec                | 0.8310 ± 0.0256 | 0.8331 ± 0.0226 | 0.8206 ± 0.0262 | 0.8373 ± 0.0359 | 0.8343 ± 0.0354 | 0.8214 ± 0.0479 | 0.8192 ± 0.0487 | 0.8018 ± 0.0277 | 0.8104 ± 0.0498
Struc2Vec               | 0.3701 ± 0.0156 | 0.3937 ± 0.0157 | 0.3926 ± 0.0174 | 0.4160 ± 0.0155 | 0.4377 ± 0.0235 | 0.4525 ± 0.0131 | 0.4532 ± 0.0144 | 0.4755 ± 0.0260 | 0.4583 ± 0.0347
DeepGL                  | 0.8810 ± 0.0330 | 0.8660 ± 0.0328 | 0.8724 ± 0.0381 | 0.8748 ± 0.0343 | 0.8794 ± 0.0395 | 0.8856 ± 0.0313 | 0.8578 ± 0.0350 | 0.8732 ± 0.0514 | 0.8757 ± 0.0537
CDNR (Blog3→Facebook)   | 0.8749 ± 0.0301 | 0.8831 ± 0.0135 | 0.8866 ± 0.0304 | 0.8766 ± 0.0360 | 0.8890 ± 0.0291 | 0.8876 ± 0.0294 | 0.8910 ± 0.0334 | 0.8443 ± 0.0468 | 0.8415 ± 0.0482
Experimental results from the algorithmic perspective. A general observation drawn from the results is that feature representations learned with the help of other networks improve or maintain performance compared to the domain-specific network representation baseline algorithms. CDNR outperforms DeepWalk, LINE, Node2Vec, Struc2Vec and DeepGL across all datasets with gains of 19.29%, 49.57%, 15.66%, 58.83% and 10.06%, respectively. CDNR outperforms DeepWalk, LINE, Node2Vec and Struc2Vec on the PPI dataset and the Facebook dataset in 100% of the experiments, and outperforms DeepGL on the PPI dataset in 100% and on the Facebook dataset in 88.89% of the experiments. The losses of CDNR to DeepGL at the training percentages of {80%, 90%} might be caused by the classifier and training sample selection, and the NN-based DeepGL shows more robustness than the other baseline algorithms.
Experimental results from the dataset perspective. The general results on the PPI dataset (Tables 6 and 7) reflect the difficulty of cross-domain learning. Considering the domain similarities, a cross-domain adaptation from either the social networks or the citation networks to the biological network, as in our experiment, would not normally be recommended in transfer learning. However, CDNR is capable of capturing useful structural information from network topologies and removing noise from the source domain networks in an unsupervised feature-learning environment, so CDNR on PPI still shows a slight improvement and largely retains its representation performance. Therefore, cross-domain network knowledge transfer works in unsupervised network representations, and CDNR is less influenced by domain selection when the transferable knowledge is mainly contributed by network topologies.
Examining the results in detail shows that the source domain networks arXivCitHepTh and Facebook provide a larger volume of information to the PPI target domain network than the other pairs in the CDNR experiments, which promotes knowledge transfer across domains. The citation networks arXivCitHepPh and arXivCitHepTh transfer 11 categories of Years to PPI (a biological network with 50 categories of States and an average degree of 19.609), with average degrees of 24.407 and 25.409, respectively. The social networks Blog3 and Facebook transfer 39 categories of Interests and 10 categories of Groups, with average degrees of 64.776 and 43.691, respectively. These results show that unsupervised CDNR works especially well with dense networks; however, domains that share few natural similarities still cannot guarantee a good knowledge transfer (Blog3→PPI: Interests to States).
In addition, the general results on the Facebook dataset (Tables 8 and 9) show promising improvements by CDNR compared to the other baseline algorithms. The unsupervised representations of CDNR allow learning from small categories to large categories, and in a heterogeneous label space. CDNR uses its CD2LRandomWalk learning algorithm to capture the useful topologies in a large-scale information network.
5.4 Statistical Significance
p-values of the statistical significance tests, comparing CDNR (dblp→M10) and CDNR (Blog3→PPI) with the baseline algorithms:

         | CDNR (dblp→M10)                                        | CDNR (Blog3→PPI)
         | DeepWalk | LINE     | Node2Vec | Struc2Vec | DeepGL    | DeepWalk | LINE     | Node2Vec | Struc2Vec | DeepGL
Micro-F1 | 3.78E-13 | 7.29E-12 | 7.39E-07 | 8.13E-11  | 5.47E-07  | 3.66E-09 | 2.73E-07 | 1.03E-08 | 1.88E-08  | 8.98E-08
Macro-F1 | 1.04E-11 | 1.15E-08 | 4.31E-04 | 6.22E-10  | 5.07E-06  | 2.92E-04 | 3.17E-10 |          |           |
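The text does not state which significance test produced these p-values; one plausible reconstruction is a paired test over the nine training ratios. The sketch below applies a paired t-test (an assumption, possibly different from the authors' choice) to the CDNR (Blog3→PPI) and DeepWalk rows of Table 6:

```python
from scipy import stats

# F1 scores at training ratios 10% .. 90%, copied from Table 6.
cdnr = [0.3386, 0.3390, 0.3423, 0.3420, 0.3404,
        0.3414, 0.3350, 0.3371, 0.3312]
deepwalk = [0.2849, 0.2854, 0.2845, 0.2803, 0.2725,
            0.2736, 0.2629, 0.2778, 0.2621]

# Paired t-test: both methods are evaluated at the same training ratios,
# so the observations pair up one-to-one.
t_stat, p_value = stats.ttest_rel(cdnr, deepwalk)
```

A positive t-statistic with a very small p-value indicates that CDNR's improvement over the baseline is statistically significant rather than noise, which is the reading the table above supports.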