Deep Feature Learning of Multi-Network Topology for Node Classification

09/07/2018 · by Hansheng Xue, et al.

Networks are ubiquitous structures that describe complex relationships between entities in the real world. As a critical component of prediction tasks over nodes, learning feature representations of nodes has become one of the most active research areas in recent years. Network embedding, which aims to learn non-linear, low-dimensional feature representations from network topology, has proved helpful for many network analysis tasks, especially node classification. For many real-world systems, multiple types of relations are naturally represented by multiple networks; however, existing network embedding methods mainly focus on single-network embedding and neglect the information shared among networks. In this paper, we propose a novel multiple-network embedding method based on a semi-supervised autoencoder, named DeepMNE, which captures the complex topological structures of multiple networks and takes the correlation among them into account. We evaluate DeepMNE on the task of node classification with two real-world datasets. The experimental results demonstrate the superior performance of our method over four state-of-the-art algorithms.


Introduction

Networks are powerful tools for describing and modeling complex systems, and mining knowledge from them has become a popular yet challenging area. One of the most important tasks in network analysis is node classification: predicting the most probable labels for nodes in a given network (Tsoumakas and Katakis, 2007). For example, in a protein-protein interaction network, the aim is to predict the functional labels of proteins (Radivojac et al., 2013).

Node classification is typically solved as a supervised machine learning problem, and a set of informative features is crucial for any supervised method (Grover and Leskovec, 2016). Therefore, a feature vector representation for the nodes in the network must be constructed appropriately. Recently, learning the feature representation of a node from its neighbors and the structural information of the network has become one of the most active research areas (Grover and Leskovec, 2016; Perozzi, Al-Rfou, and Skiena, 2014; Zhang et al., 2015). Learning low-dimensional feature representations of nodes is also termed network embedding, and it has recently attracted a lot of attention (Goodfellow, Bengio, and Courville, 2016). Besides node classification (Jian, Li, and Liu, 2018), the outputs of network embedding have been widely used in many other important tasks, such as link prediction (Li et al., 2018) and community detection (Yang, Mcauley, and Leskovec, 2013).

For many real-world systems, multiple types of relations are naturally represented by multiple networks. For example, in a social network, the relations between people include friendship, money transfer, collegiality and so on. Since multiple networks often describe real-world systems better, multi-network analysis has recently attracted a lot of attention in the network science community. Unfortunately, most existing network embedding methods focus on a single network, and few approaches concentrate on learning node representations from multiple networks. There is therefore a pressing need for an algorithm that learns node feature representations by integrating multiple networks appropriately.

One simple solution for multi-network embedding is to summarize the multiple networks into a single network and apply a single-network embedding method to the integrated network. Several multi-network integration methods have been proposed, including probabilistic methods (Franceschini et al., 2013), kernel-based methods (Yu et al., 2015) and weighted averaging or summing (Mostafavi et al., 2008). However, this type of integration often loses information when collapsing multiple networks into one (Tsuda, Shin, and Schölkopf, 2005; Lanckriet et al., 2004). Other approaches train individual classifiers on the different networks and combine their predictions into a final result using ensemble learning (Yan et al., 2010). These methods, however, treat the networks as independent, ignoring the correlation between them; in addition, they often suffer from training time and memory constraints (Gligorijevic, Barot, and Bonneau, 2017).

In multi-network embedding, the multiple networks represent different types of relations among the same set of nodes (persons, commodities, genes and so on), and there may be potential correlations between the different networks. One of the most challenging problems in multi-network embedding is how to account for this correlation. To address it, we model the correlation between different networks during the feature learning process.

Autoencoder (Rumelhart, Hinton, and Williams, 1986; Baldi, 2011) is a typical unsupervised deep learning model that aims to learn a new encoding of the input data, and it has proved effective for non-linear feature learning. However, existing autoencoder-based methods are not designed for learning multi-network topological features. To benefit from the feature learning power of the autoencoder while accounting for the correlation between multiple networks, we propose a novel multi-network feature learning algorithm, named DeepMNE. DeepMNE applies stacked semi-supervised autoencoders to map the input networks into a low-dimensional, non-linear space while taking the correlation among them into account. Our major contributions are:

  • We propose a novel semi-supervised autoencoder model for learning feature representations of nodes based on multiple networks.

  • To consider the correlation between different networks, we design a communication mechanism among multiple autoencoders corresponding to multiple networks.

  • We empirically evaluate DeepMNE for multi-label classification on two tasks of gene function prediction. The experimental results show that DeepMNE outperforms the existing state-of-the-art methods.

Related Work

Multi-network Embedding

As an extension of single-network embedding, multi-network embedding aims to represent nodes using low-dimensional topological information from multiple networks. Current network embedding approaches mainly focus on single-network embedding and utilize topological structure information to represent nodes. For instance, DeepWalk (Perozzi, Al-Rfou, and Skiena, 2014) treats nodes as words and generates short random walks as sentences; it then applies Skip-gram, a word representation learning model, to these random walks to represent the nodes of the network. Similarly, node2vec (Grover and Leskovec, 2016) uses a biased random walk procedure to capture topological information and optimizes the Skip-gram model with negative sampling. DNGR (Cao, Lu, and Xu, 2016) uses random surfing to capture topological information and applies a stacked denoising autoencoder to generate low-dimensional node representations. In general, all of these methods focus on single-network representation learning, and few apply to multiple networks directly.

Besides, some multi-network integration methods have been proposed in the biological network domain. Mashup (Cho, Berger, and Jian, 2016) is an integrative framework for learning low-dimensional feature representations of genes from multiple networks constructed from heterogeneous data sources. Similarity Network Fusion (Wang et al., 2014) is a widely used network integration method, which constructs a network for each available data type and then efficiently fuses these networks into one. In addition, there are other multi-network integration methods, such as Diffusion State Distance (Cao et al., 2014) and Collective Matrix Factorization (Žitnik et al., 2015). However, these are linear, shallow approaches that cannot capture the complex and highly non-linear structure shared across networks.

Gene function prediction

Accurate annotation of gene function is one of the most important and challenging problems in biology. Gene function prediction aims to assign an unknown gene to the correct functional categories in an annotation database, such as the Gene Ontology. Many methods based on different types of biological information have been proposed for this problem, including amino acid sequence-based (Clark and Radivojac, 2011), protein structure-based (Pal and Eisenberg, 2005) and gene expression-based (Huttenhower et al., 2006) methods. With the improvement of experimental techniques, functional associations between genes or proteins are often represented as networks, such as gene co-expression networks and protein-protein interaction networks, and several network-based function prediction methods have been proposed (Lehtinen et al., 2015; Roded, Igor, and Ron, 2007). Because of the complementary nature of different data sources, multi-network-based function prediction has proved more accurate than methods based on a single data source (Re and Valentini, 2010; Cozzetto et al., 2013). Thus, many algorithms predict gene function by integrating multiple biological networks (Cho, Berger, and Jian, 2016; Sara and Quaid, 2010; Cao et al., 2014).

Our Proposed Approach

Multi-network embedding can be formulated as a semi-supervised feature learning problem. In this section, we propose a novel semi-supervised autoencoder, termed DeepMNE, to learn node representations from multiple networks.

Let $V = \{v_1, \ldots, v_n\}$ be a set of nodes and $E$ a set of edges between pairs of nodes in $V$. Given $K$ networks that share the same set of nodes but have different connectivity, each network is represented as $G^{(k)} = (V, E^{(k)})$, where $k \in \{1, \ldots, K\}$. Our aim is to learn a low-dimensional feature representation for each $v_i \in V$ based on the topological information contained in all $K$ networks. DeepMNE contains two main components: obtaining the global structure information of each network, and learning node feature representations that account for both the topology of the multiple networks and the correlation among them.

Step 1. Obtaining global structure information using RWR

It has been shown that random walk with restart (RWR) can capture global associations between nodes in a network (Cho, Berger, and Jian, 2016). Instead of feeding adjacency matrices into DeepMNE directly, we run RWR on each network to capture its topological information and convert it into node feature representations. The adjacency matrix only describes the relationships between directly connected nodes, ignoring the global structure of the network; RWR overcomes this drawback and represents nodes using high-dimensional network structural information. We choose RWR over recently proposed network embedding methods such as node2vec (Grover and Leskovec, 2016) and DeepWalk (Perozzi, Al-Rfou, and Skiena, 2014) because those methods are computationally intensive and require additional hyper-parameter fitting (Gligorijevic, Barot, and Bonneau, 2017).

Let $A$ denote the adjacency matrix of a network. The RWR from node $v_i$ can be described by the following recurrence relation:

$s_i^{t+1} = (1 - p_r)\, P^{\top} s_i^{t} + p_r\, e_i$    (1)

where $p_r$ is the restart probability, which balances the effect of local and global topological information in the network; $e_i$ is an $n$-dimensional indicator vector with $e_i(i) = 1$ and $e_i(j) = 0$ for $j \neq i$; $s_i^{t}$ is an $n$-dimensional distribution (column) vector in which each entry holds the probability of a node being visited after $t$ steps of the random walk starting from node $v_i$; and $P$ is the transition probability matrix, whose entry $P_{jk}$, the probability of moving from node $v_j$ to node $v_k$, is calculated as $P_{jk} = A_{jk} / \sum_{l} A_{jl}$. Based on RWR, we obtain a matrix $S$ in which $S_{ij}$ is the relevance score between nodes $v_i$ and $v_j$ defined by the RWR steady-state probabilities.
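As a concrete illustration of Eq. (1), here is a minimal NumPy sketch of the RWR computation; the function name, tolerance and iteration cap are our own choices rather than details from the paper.

```python
import numpy as np

def rwr_matrix(A, restart_prob=0.5, tol=1e-6, max_iter=100):
    """Random walk with restart from every node of one network.

    A            : (n, n) weighted adjacency matrix.
    restart_prob : p_r in Eq. (1), the probability of jumping back home.
    Returns S, whose i-th column is the steady-state visiting distribution
    of a walk restarting at node i (node i's global topological profile).
    """
    n = A.shape[0]
    deg = A.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0               # guard against isolated nodes
    P = A / deg                       # row-stochastic transition matrix
    S = np.eye(n)                     # s_i^0 = e_i for every start node
    E = np.eye(n)                     # restart distributions, one per node
    for _ in range(max_iter):
        S_next = (1 - restart_prob) * P.T @ S + restart_prob * E
        if np.abs(S_next - S).max() < tol:
            return S_next
        S = S_next
    return S
```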

Figure 1: The structure of the DeepMNE algorithm. The whole process contains two main parts: obtaining topological information and learning multi-network-based features; the output of DeepMNE can then be fed to a downstream machine learning model. We first run random walk with restart (RWR) to learn the global structure of each network. Then, constraint extraction and constraint application with the semi-supervised autoencoder are performed iteratively to integrate the networks. After obtaining the integrated representations, we train a machine learning model on the outputs of DeepMNE to classify nodes.

Step 2. Multi-network embedding with semiAE

In this section, we propose a novel multi-network embedding algorithm, termed DeepMNE. The main framework is a deep neural network with the autoencoder (AE) and the semi-supervised autoencoder (semiAE) as its building blocks. The whole process includes two parts: constraint extraction and constraint application. We use constraints to capture the correlation between different networks. Given several networks over the same nodes, the input of this step is calculated by RWR. The main framework is shown in Figure 1.

The first layer of the DeepMNE framework is an original autoencoder, used for feature extraction and dimension reduction. From the second layer on, a revised autoencoder (semiAE) is used for constraint integration and further dimension reduction. The dimension of the representations decreases steadily as the model stacks more layers.

Prior Constraints Extraction

The idea of constraints comes from semi-supervised clustering, where pairwise constraints are typically formulated as must-link and cannot-link constraints (Basu, Bilenko, and Mooney, 2004). A must-link constraint indicates that two nodes are highly similar or belong to the same cluster, while a cannot-link constraint indicates that the two nodes in the pair are highly dissimilar or belong to different clusters.

Given the pairs of nodes, we use two strategies to extract constraints. The first is to compute and sort the pairwise Pearson correlation coefficients (PCC) of all node pairs based on their feature vectors; the top-$k$ pairs are taken as must-link constraints and the bottom-$k$ pairs as cannot-link constraints. The second is to set two thresholds, $t_m$ for must-link and $t_c$ for cannot-link: a pair is adopted as a must-link constraint if its PCC value is larger than $t_m$, and as a cannot-link constraint if its PCC value is smaller than $t_c$.

After extracting the constraints from the previous layer (the $l$-th layer), we apply the must-link and cannot-link constraints to the next layer (the $(l+1)$-th layer) as prior information.
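As an illustration, a minimal NumPy sketch of the threshold strategy follows; the helper name and the default thresholds are assumptions, and the rows of H are taken to be node feature vectors.

```python
import numpy as np

def extract_constraints(H, t_m=0.9, t_c=0.1):
    """Extract must-link / cannot-link pairs from node features H (n x d)
    by thresholding the pairwise Pearson correlation coefficients."""
    pcc = np.corrcoef(H)                     # (n, n) PCC matrix over rows
    iu = np.triu_indices_from(pcc, k=1)      # each unordered pair once
    pairs = list(zip(*iu))
    vals = pcc[iu]
    must_link = {p for p, v in zip(pairs, vals) if v > t_m}
    cannot_link = {p for p, v in zip(pairs, vals) if v < t_c}
    return must_link, cannot_link
```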

Novel autoencoder with constraints

The key question in DeepMNE is how to integrate the prior constraints into the network representation through the autoencoder. We revise the original autoencoder and propose a novel variant, termed the Semi-Supervised AutoEncoder (semiAE). From the second layer on, the input includes both the low-dimensional representations and the constraints from the previous layer. Note that the constraints produced by the previous layer's building blocks come from different networks and may therefore conflict. To resolve this, we merge the constraints by taking the intersection of the constraint sets as the input of semiAE.
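With constraints stored as sets of index pairs as above, the merge step can be sketched as plain set intersection; the helper below is hypothetical and simply drops any pair on which the networks disagree.

```python
def merge_constraints(must_sets, cannot_sets):
    """Merge per-network constraint sets by intersection, so only pairs that
    every network agrees on survive."""
    must = set.intersection(*must_sets)
    cannot = set.intersection(*cannot_sets)
    # A pair cannot be must-link and cannot-link at once; drop any overlap.
    overlap = must & cannot
    return must - overlap, cannot - overlap
```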

Autoencoder is an unsupervised model composed of two parts, an encoder and a decoder. The encoder transforms the high-dimensional data into a low-dimensional code, and the decoder recovers the data from that code; the low-dimensional code is then used as a compressed representation of the original data. Let $x_i$ be the $i$-th input vector (node representation) of a network, and let $h_i$ and $\hat{x}_i$ be the activations of the hidden layer and the output layer respectively. We have $h_i = \sigma(W x_i + b)$ and $\hat{x}_i = \sigma(\hat{W} h_i + \hat{b})$, where $\theta = \{W, b, \hat{W}, \hat{b}\}$ are the parameters to be learned and $\sigma(\cdot)$ is a non-linear operator such as the sigmoid function ($\sigma(x) = 1/(1+e^{-x})$) or the tanh function ($\tanh(x) = (e^{x}-e^{-x})/(e^{x}+e^{-x})$). The optimization goal is to minimize the reconstruction error between the original data $x_i$ and the data $\hat{x}_i$ reconstructed from the new representation:

$\mathcal{L}_{re} = \sum_{i=1}^{n} \lVert \hat{x}_i - x_i \rVert_2^2$    (2)

The original autoencoder cannot model the constraints obtained from the previous layers; we propose semiAE to take them into account. Let $\mathcal{M}$ be a set of must-link pairwise constraints, where $(x_i, x_j) \in \mathcal{M}$ implies a strong association between $x_i$ and $x_j$; let $\mathcal{C}$ be a set of cannot-link pairwise constraints, where $(x_i, x_j) \in \mathcal{C}$ implies that $x_i$ and $x_j$ are unrelated. The number of constraints is much smaller than the size of the network.

The hypothesis is that $x_i$ and $x_j$ should also be close in the low-dimensional space if there is a must-link constraint between them in the previous layer. Ideally, after encoding, two must-link nodes should be closer, and two cannot-link nodes should be more distant. Mathematically, let $d(h_i, h_j) = \lVert h_i - h_j \rVert_2^2$ be the error score (difference) between $h_i$ and $h_j$ in the encoded space. For a must-link pair, $d(x_i, x_j)$ should be larger than $d(h_i, h_j)$; for a cannot-link pair, $d(x_i, x_j)$ should be smaller than $d(h_i, h_j)$.

If the pair $(x_i, x_j)$ is a must-link constraint, we add a penalty to the loss function. Similarly, if the pair $(x_i, x_j)$ is a cannot-link constraint, we add a reward to the loss function. The loss function for modeling constraints is defined as follows:

$\mathcal{L}_{con} = \alpha \sum_{(x_i, x_j) \in \mathcal{M}} \lVert h_i - h_j \rVert_2^2 \; - \; \beta \sum_{(x_i, x_j) \in \mathcal{C}} \lVert h_i - h_j \rVert_2^2$    (3)

where $\mathcal{M}$ and $\mathcal{C}$ are the sets of must-link and cannot-link constraints respectively; $h_i$ and $h_j$ are the hidden-layer representations of the input feature vectors $x_i$ and $x_j$ from the previous layer; and $\alpha$ and $\beta$ are weight coefficients controlling the influence of the penalty and the reward respectively.

To combine the constraints with the autoencoder, we propose a novel semi-supervised autoencoder, which integrates Eq. (2) and Eq. (3) and jointly minimizes the following objective function:

$\mathcal{L} = \mathcal{L}_{re} + \mathcal{L}_{con}$    (4)

The first part of Eq. (4) measures the squared error between input and output node features, and the second part measures the error score of the constraints in the hidden layer.
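To make the objective concrete, here is a sketch of a semiAE building block in PyTorch; the sigmoid activations follow the formulation above, while the layer sizes and the default values of alpha and beta are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SemiAE(nn.Module):
    """One semiAE block: an autoencoder whose loss adds the must-link
    penalty and cannot-link reward of Eq. (3) to the reconstruction
    error of Eq. (2)."""

    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(hid_dim, in_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)
        return h, self.decoder(h)

    def loss(self, x, must, cannot, alpha=1.0, beta=1.0):
        h, x_hat = self(x)
        l_re = ((x_hat - x) ** 2).sum()                        # Eq. (2)
        pen = sum(((h[i] - h[j]) ** 2).sum() for i, j in must)
        rew = sum(((h[i] - h[j]) ** 2).sum() for i, j in cannot)
        # The sigmoid bounds h, so the subtracted reward term stays bounded.
        return l_re + alpha * pen - beta * rew                 # Eq. (4)
```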

The DeepMNE multi-network integration algorithm

To optimize the aforementioned model, the goal is to minimize the loss function in Eq. (4). The key step is to calculate the partial derivative of $\mathcal{L}_{con}$, which can be rephrased in matrix form as follows:

$\mathcal{L}_{con} = \alpha\, \mathrm{tr}(H^{\top} L_M H) - \beta\, \mathrm{tr}(H^{\top} L_C H)$    (5)

where $M$ is the symmetric must-link indicator matrix, $L_M = D_M - M$, $D_M$ is a diagonal matrix with $D_{M,ii} = \sum_j M_{ij}$, and $L_C$ is defined analogously from the cannot-link matrix $C$; $H$ is the matrix of hidden-layer representations. The partial derivative with respect to the encoder weights then follows from the chain rule:

$\frac{\partial \mathcal{L}_{con}}{\partial W} = \frac{\partial \mathcal{L}_{con}}{\partial H} \cdot \frac{\partial H}{\partial W}, \qquad \frac{\partial \mathcal{L}_{con}}{\partial H} = 2\,(\alpha L_M - \beta L_C)\, H$    (6)

where $H = \sigma(XW + b)$ and $\sigma$ is the activation function (e.g., the sigmoid); the partial derivatives of $\mathcal{L}_{re}$ follow from standard backpropagation. Thus, with an initialization of the parameters, semiAE can be optimized using stochastic gradient descent (SGD).
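Given such a module, the SGD optimization above reduces to a standard training loop; autograd supplies the gradients corresponding to Eqs. (5)-(6), so the sketch below (full-batch updates, assumed hyper-parameters) omits the manual derivation.

```python
import torch

def train_semiae(model, X, must, cannot, lr=0.1, epochs=200,
                 alpha=1.0, beta=1.0):
    """Optimize Eq. (4) with plain SGD; returns the hidden representations."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = model.loss(X, must, cannot, alpha=alpha, beta=beta)
        loss.backward()          # gradients of Eq. (4), cf. Eqs. (5)-(6)
        opt.step()
    with torch.no_grad():
        h, _ = model(X)
    return h
```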

The pseudocode for DeepMNE is given in Algorithm 1.

1: Input: multi-networks $G = \{G^{(1)}, \ldots, G^{(K)}\}$; the number of iterations $T$; the percentage of constraints $p$; initialization parameters
2: Output: feature representations of the nodes in $G$
3: Run random walk with restart on each network in $G$
4: Train an autoencoder to obtain novel feature representations of the nodes in each $G^{(k)}$ and extract the initial must-link and cannot-link constraints $M^{(k)}$, $C^{(k)}$
5: for all $t \in \{1, \ldots, T\}$ do
6:     for all $k \in \{1, \ldots, K\}$ do
7:         $M'$, $C'$ = merge the constraints $\{M^{(j)}, C^{(j)}\}_{j \neq k}$ from the other networks
8:         $X^{(k)}$ = train the semi-autoencoder with initial parameters on network $k$ to optimize Eq. (4) under $M'$, $C'$
9:         $M^{(k)}$, $C^{(k)}$ = extract must-link and cannot-link constraints based on $X^{(k)}$
10:     end for
11: end for
12: return the feature representations of the nodes in $G$
Algorithm 1 The DeepMNE algorithm

In the first phase of the DeepMNE algorithm, we run random walk with restart to learn the global structure of each single network. Then we train an autoencoder to learn low-dimensional features and extract prior constraints from the hidden-layer representation of each network. In the main phase, DeepMNE uses an iterative scheme to train semi-supervised autoencoders with prior constraints. Each iteration contains three steps: merging constraints, training the semi-autoencoder, and extracting new constraints, as sketched below. As the iterations proceed, the model tends to converge and the constraints tend to stabilize. Finally, DeepMNE outputs several low-dimensional feature representations of the nodes.
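Putting the pieces together, the control flow of Algorithm 1 might look like the following sketch, reusing the hypothetical helpers from earlier sections (rwr_matrix, extract_constraints, merge_constraints, SemiAE, train_semiae); it is a schematic under those assumptions, not the authors' implementation.

```python
import torch

def deep_mne(adjacency_list, layer_dims, n_iter=5):
    """Iteratively embed K networks while exchanging pairwise constraints."""
    # Step 1: RWR turns each adjacency matrix into a global-structure profile.
    feats = [torch.tensor(rwr_matrix(A), dtype=torch.float32)
             for A in adjacency_list]
    # Initial plain-autoencoder pass: no constraints yet.
    constraints = []
    for k, X in enumerate(feats):
        model = SemiAE(X.shape[1], layer_dims[0])
        feats[k] = train_semiae(model, X, set(), set())
        constraints.append(extract_constraints(feats[k].numpy()))
    # Main loop: each network trains a semiAE under the others' constraints.
    for t in range(1, n_iter):
        for k, X in enumerate(feats):
            others = [c for j, c in enumerate(constraints) if j != k]
            must, cannot = merge_constraints([m for m, _ in others],
                                             [c for _, c in others])
            model = SemiAE(X.shape[1], layer_dims[min(t, len(layer_dims) - 1)])
            feats[k] = train_semiae(model, X, must, cannot)
            constraints[k] = extract_constraints(feats[k].numpy())
    return feats
```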

The DeepMNE algorithm is a scalable framework: the autoencoder training complexity is linear in the number of vertices $n$, while extracting constraints requires computing pairwise PCC values, which takes $O(n^2)$. Therefore, the overall training complexity of DeepMNE is $O(TKn^2)$, where $T$ is the number of iterations and $K$ is the number of networks.

Experiments

In order to evaluate the performance of DeepMNE, we test our method on a task of gene function prediction. Gene function prediction is a multi-label classification problem, which aims to assign unknown genes to the correct functional categories in the annotation database (Cho, Berger, and Jian, 2016).

We compare our model with four state-of-the-art methods (Mashup (Cho, Berger, and Jian, 2016), SNF (Wang et al., 2014), node2vec (Grover and Leskovec, 2016) and DeepWalk (Perozzi, Al-Rfou, and Skiena, 2014)) and apply the learned representations to gene function prediction using a support vector machine. We adopt accuracy, micro-averaged F1, micro-averaged area under the precision-recall curve (micro-AUPRC) and micro-averaged area under the receiver operating characteristic curve (micro-AUROC) as evaluation metrics, and use 5-fold cross-validation to evaluate performance.
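For reference, this evaluation protocol can be sketched with scikit-learn; a one-vs-rest linear SVM stands in for the paper's SVM classifier, whose exact settings are not specified.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score, average_precision_score, roc_auc_score

def evaluate(X, Y, n_splits=5, seed=0):
    """5-fold CV for multi-label gene function prediction.
    X: (n, d) node embeddings; Y: (n, L) binary label matrix."""
    f1s, auprcs, aurocs = [], [], []
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train, test in kf.split(X):
        clf = OneVsRestClassifier(LinearSVC())
        clf.fit(X[train], Y[train])
        scores = clf.decision_function(X[test])      # real-valued margins
        preds = (scores > 0).astype(int)
        f1s.append(f1_score(Y[test], preds, average='micro'))
        auprcs.append(average_precision_score(Y[test], scores, average='micro'))
        aurocs.append(roc_auc_score(Y[test], scores, average='micro'))
    return np.mean(f1s), np.mean(auprcs), np.mean(aurocs)
```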

Datasets

We evaluate the performance of our method on a yeast dataset and a human dataset, both collected from the STRING database v9.1 (Franceschini et al., 2013).

  • Yeast - consists of six networks over 6,400 genes. The numbers of edges are listed in Table 1; each edge weight represents the probability of edge presence and lies between 0 and 1. The functional labels are obtained from the Munich Information Center for Protein Sequences (MIPS) (Ruepp et al., 2004). The functional categories in MIPS are organized in a three-layered hierarchy, and the number of functional categories in each layer is listed in Table 2.

  • Human - consists of six networks over 18,362 genes, with the number of edges varying from 3,760 to 1,576,332 and every edge weight between 0 and 1 (see Table 1). The functional labels are downloaded from the Gene Ontology database (Ashburner et al., 2000). We group the GO terms for human into three distinct levels of functional categories of different specificity. The details are listed in Table 2.

Network type     Yeast # edges   Yeast avg. degree   Human # edges   Human avg. degree
co-expression    314,013         98.129              1,576,332       171.695
cooccurence      2,664           0.833               36,128          3.935
database         33,486          10.464              319,004         34.746
experimental     219,995         68.748              618,574         67.375
fusion           1,361           0.425               3,760           0.410
neighborhood     45,610          14.253              104,958         11.432
Table 1: Statistics of the datasets: the number of edges in each network and the average node degree.
Dataset   Level / group        # labels
Yeast     level-1              17
          level-2              74
          level-3              154
Human     (# annotated genes)  BP    CC    MF
          11-30                262   82    153
          31-100               100   46    72
          101-300              28    20    18
Table 2: Function statistics. The whole dataset contains twelve mini-datasets with different numbers of functions.

Parameter Settings

The parameters vary with the dataset. The dimension of each layer of the multi-network integration framework (DeepMNE) is listed in Table 3.

In our model, we use a restart probability of 0.5 for RWR, the same as Mashup. The final dimensions of the network representations are 500 (yeast) and 800 (human). The whole DeepMNE multi-network integration algorithm is optimized with stochastic gradient descent (Bottou, 1991). The batch size is 128, the initial learning rate is 0.1 for yeast and 0.2 for human, and the numbers of epochs are 200 and 400 respectively. For SNF, we generate an ensemble network and run singular value decomposition to learn a low-dimensional feature representation of the same dimension as our model. For node2vec and DeepWalk, we use the default parameters. For all compared algorithms, we use an SVM as the classifier to predict the functional labels of genes.

Dataset   # nodes in each layer
Yeast     [6400-5220-4040-2860-1680-500]
Human     [18362-9181-4580-2295-1148-800]
Table 3: Neural network structures.
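For concreteness, the settings above and the layer sizes from Table 3 can be gathered into one configuration sketch; the dictionary keys are our own naming, not from the paper.

```python
CONFIG = {
    # restart_prob applies to the RWR preprocessing; the human experiments
    # later substitute the raw adjacency matrices as input.
    "yeast": {"restart_prob": 0.5, "final_dim": 500,
              "batch_size": 128, "lr": 0.1, "epochs": 200,
              "layers": [6400, 5220, 4040, 2860, 1680, 500]},
    "human": {"restart_prob": 0.5, "final_dim": 800,
              "batch_size": 128, "lr": 0.2, "epochs": 400,
              "layers": [18362, 9181, 4580, 2295, 1148, 800]},
}
```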

Experimental Results on Yeast

Gene function prediction contains three main parts: RWR-based capture of the global structure of each network, low-dimensional feature learning, and SVM-based function prediction. In this section, we apply all compared approaches to predict the functions of yeast genes based on the six networks. All approaches are tested on three tasks corresponding to function labels at different levels (level 1 with 17 categories, level 2 with 74 categories and level 3 with 154 categories). The functional classification at level 1 is more general than at levels 2 and 3, and level 2 is in turn more general than level 3.

Figure 2: Performance comparison of different metrics on the task of predicting functional labels for yeast genes.

Overall, DeepMNE achieves better performance than the four other approaches on the yeast dataset at all three functional classification levels. At level 1, DeepMNE achieves the highest average accuracy, 0.8378, in contrast to 0.8063 for Mashup and 0.6734 for SNF. The micro-F1 score of DeepMNE is 0.7096, also higher than the other four methods. The micro-averaged AUPRC and AUROC achieved by DeepMNE at level 1 are 0.7405 and 0.9100 respectively, significantly higher than the scores of Mashup and SNF (see Figure 2). The ranking of the four compared approaches differs across levels: for example, Mashup is the second-best approach at level 1, but node2vec takes second place at level 3. DeepMNE, however, consistently achieves the best performance at all levels.

Experimental Results on Human

For further evaluation, we also apply DeepMNE to the human dataset. In this experiment, instead of the RWR-based global structure information, we use the original adjacency matrices as the input of the semi-autoencoder. We tested both types of input; the adjacency matrix achieves better performance on this dataset, possibly because the human gene networks contain 18,362 nodes, which may be too large for RWR to capture the global structure well. As described in the previous section, nodes in the human gene networks have three types of labels, biological process (BP), cellular component (CC) and molecular function (MF), corresponding to three aspects of gene description. Each type of label is grouped into three levels, annotating 11-30, 31-100 and 101-300 genes respectively. We thus obtain nine distinct mini-datasets and test all approaches on them.

Figure 3: Performance comparison of different metrics on the task of predicting cellular component labels for human genes.

Overall, DeepMNE also achieves the highest performance on the human dataset (see Figure 3). The accuracy of DeepMNE on human CC-11-30 is 0.5779, higher than Mashup, SNF, node2vec and DeepWalk (0.5644, 0.3167, 0.5480 and 0.5181 respectively). DeepMNE also achieves the highest micro-F1 (0.3392), slightly higher than the other four methods (0.3388 for Mashup, 0.2176 for SNF, 0.3383 for node2vec and 0.3309 for DeepWalk). The AUPRC values of DeepMNE on the three categories of human CC are 0.5430, 0.5418 and 0.5246, all higher than the other four methods (0.5284, 0.5291 and 0.5202 for Mashup; 0.1852, 0.2588 and 0.3141 for SNF; 0.5272, 0.5242 and 0.5128 for node2vec; 0.4971, 0.4891 and 0.5043 for DeepWalk). Likewise, the AUROC values of DeepMNE (0.8796, 0.8533 and 0.8350 respectively) are all significantly higher than those of Mashup (0.8489, 0.8304, 0.8148), SNF (0.7602, 0.7623, 0.7633), node2vec (0.8675, 0.8376, 0.8227) and DeepWalk (0.8644, 0.8493, 0.8254).

The experimental results for the Biological Process and Molecular Function categories are listed in Table 4, which shows that DeepMNE outperforms the other methods on the MF category of the human dataset and also achieves good performance on the BP category.

                     Biological Process       Molecular Function
          Method     Acc      AUPRC           Acc      AUPRC
11-30     Mashup     0.3825   0.2318          0.4486   0.3836
          SNF        0.1972   0.0602          0.3006   0.1662
          node2vec   0.3647   0.2248          0.4482   0.3742
          DeepWalk   0.3613   0.2224          0.4466   0.3762
          DeepMNE    0.3672   0.1910          0.4751   0.3897
31-100    Mashup     0.4113   0.2587          0.4717   0.3666
          SNF        0.2376   0.0892          0.2689   0.1546
          node2vec   0.3812   0.2454          0.4355   0.3456
          DeepWalk   0.3852   0.2477          0.4488   0.3654
          DeepMNE    0.4129   0.2459          0.4936   0.4002
101-300   Mashup     0.4809   0.3795          0.5761   0.5236
          SNF        0.3374   0.2023          0.4248   0.3473
          node2vec   0.4721   0.3740          0.3782   0.4959
          DeepWalk   0.4802   0.3836          0.5365   0.5011
          DeepMNE    0.4824   0.3692          0.5882   0.5406
Table 4: Accuracy and AUPRC on gene function prediction for the human dataset (BP and MF categories).

Parameters Analysis

To evaluate the effect of the representation dimension on DeepMNE, we re-run DeepMNE with different numbers of dimensions for function prediction on the yeast dataset while fixing the other parameters. Figure 4(a) shows that the performance of DeepMNE is stable over a wide range of dimensions. In addition, we test the robustness of DeepMNE to the restart probability by varying it while fixing the other parameters; the results show that performance is likewise stable across different restart probabilities.

Figure 4: The AUPRC and AUROC score of DeepMNE with different restart probabilities and numbers of dimensions for function prediction on yeast dataset.

The number of integrated layers may also impact the performance of DeepMNE. We tested our method on the yeast dataset with different numbers of layers; DeepMNE achieves the highest performance with 5 layers (see Table 5).

Layer-3 Layer-5 Layer-7 Layer-10
Accuracy 0.8203 0.8248 0.8086 0.8095
micro-F1 0.5977 0.5994 0.5888 0.5906
AUPRC 0.7366 0.7452 0.7358 0.7346
AUROC 0.9036 0.9097 0.9056 0.9066
Table 5: Results with different numbers of layers

Conclusions

Network embedding, which aims to learn non-linear, low-dimensional feature representations of nodes in networks, has achieved great success on many tasks, such as node classification and link prediction. However, current network embedding methods mainly focus on single-network embedding, and few approaches try to learn topological information from multiple networks. In this paper, we propose a novel multi-network embedding algorithm based on a semi-supervised autoencoder, termed DeepMNE. Our approach captures multi-network topological information and takes the correlation among the networks into account. We apply DeepMNE to the task of gene function prediction, and the experimental results show that it outperforms other state-of-the-art methods and is robust to the number of dimensions and the restart probability.

References

  • Ashburner et al. (2000) Ashburner, M.; Ball, C. A.; Blake, J. A.; Botstein, D.; Butler, H.; Cherry, J. M.; Davis, A. P.; Dolinski, K.; Dwight, S. S.; and Eppig, J. T. 2000. Gene ontology: tool for the unification of biology. the gene ontology consortium. Nature Genetics 25(1):25–9.
  • Baldi (2011) Baldi, P. 2011. Autoencoders, unsupervised learning and deep architectures. In International Conference on Unsupervised and Transfer Learning Workshop, 37–50.
  • Basu, Bilenko, and Mooney (2004) Basu, S.; Bilenko, M.; and Mooney, R. J. 2004. A probabilistic framework for semi-supervised clustering. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 59–68.
  • Bottou (1991) Bottou, L. 1991. Stochastic gradient learning in neural networks. Proceedings of Neuro Nimes.
  • Cao et al. (2014) Cao, M.; Pietras, C. M.; Feng, X.; Doroschak, K. J.; Schaffner, T.; Park, J.; Zhang, H.; Cowen, L. J.; and Hescott, B. J. 2014. New directions for diffusion-based network prediction of protein function: incorporating pathways with confidence. Bioinformatics 30(12):i219.
  • Cao, Lu, and Xu (2016) Cao, S.; Lu, W.; and Xu, Q. 2016. Deep neural networks for learning graph representations. In Thirtieth AAAI Conference on Artificial Intelligence, 1145–1152.
  • Cho, Berger, and Jian (2016) Cho, H.; Berger, B.; and Jian, P. 2016. Compact integration of multi-network topology for functional analysis of genes. Cell Systems 3(6):540.
  • Clark and Radivojac (2011) Clark, W. T., and Radivojac, P. 2011. Analysis of protein function and its prediction from amino acid sequence. Proteins-structure Function & Bioinformatics 79(7):2086.
  • Cozzetto et al. (2013) Cozzetto, D.; Buchan, D. W.; Bryson, K.; and Jones, D. T. 2013. Protein function prediction by massive integration of evolutionary analyses and multiple data sources. Bmc Bioinformatics 14(3):1–11.
  • Franceschini et al. (2013) Franceschini, A.; Szklarczyk, D.; Frankild, S.; Kuhn, M.; Simonovic, M.; Roth, A.; Lin, J.; Minguez, P.; Bork, P.; and Von, M. C. 2013. String v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Research 41(Database issue):D808–D815.
  • Gligorijevic, Barot, and Bonneau (2017) Gligorijevic, V.; Barot, M.; and Bonneau, R. 2017. deepNF: Deep network fusion for protein function prediction. Bioinformatics.
  • Goodfellow, Bengio, and Courville (2016) Goodfellow, I.; Bengio, Y.; and Courville, A. 2016. Deep Learning. MIT Press. http://www.deeplearningbook.org.
  • Grover and Leskovec (2016) Grover, A., and Leskovec, J. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, 855–864. ACM.
  • Huttenhower et al. (2006) Huttenhower, C.; Hibbs, M.; Myers, C.; and Troyanskaya, O. G. 2006. A scalable method for integration and functional analysis of multiple microarray datasets. Bioinformatics 22(23):2890.
  • Žitnik et al. (2015) Žitnik, M.; Nam, E. A.; Dinh, C.; Kuspa, A.; Shaulsky, G.; and Zupan, B. 2015. Gene prioritization by compressive data fusion and chaining. Plos Computational Biology 11(10):e1004552.
  • Jian, Li, and Liu (2018) Jian, L.; Li, J.; and Liu, H. 2018. Toward online node classification on streaming networks. Data Mining & Knowledge Discovery 32(6):1–27.
  • Lanckriet et al. (2004) Lanckriet, G. R.; De, B. T.; Cristianini, N.; Jordan, M. I.; and Noble, W. S. 2004. A statistical framework for genomic data fusion. Bioinformatics 20(16):2626–35.
  • Lehtinen et al. (2015) Lehtinen, S.; Lees, J.; Bähler, J.; Shawetaylor, J.; and Orengo, C. 2015. Gene function prediction from functional association networks using kernel partial least squares regression. Plos One 10(8):e0134668.
  • Li et al. (2018) Li, J.; Cheng, K.; Wu, L.; and Liu, H. 2018. Streaming link prediction on dynamic attributed networks. In the Eleventh ACM International Conference, 369–377.
  • Mostafavi et al. (2008) Mostafavi, S.; Ray, D.; Warde-Farley, D.; Grouios, C.; and Morris, Q. 2008. Genemania: a real-time multiple association network integration algorithm for predicting gene function. Genome Biology 9(Suppl 1):1–15.
  • Pal and Eisenberg (2005) Pal, D., and Eisenberg, D. 2005. Inference of protein function from protein structure. Structure 13(1):121–130.
  • Perozzi, Al-Rfou, and Skiena (2014) Perozzi, B.; Al-Rfou, R.; and Skiena, S. 2014. Deepwalk: online learning of social representations. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 701–710.
  • Radivojac et al. (2013) Radivojac, P.; Clark, W. T.; Oron, T. R.; Schnoes, A. M.; Wittkop, T.; Sokolov, A.; Graim, K.; Funk, C.; Verspoor, K.; Ben-Hur, A.; et al. 2013. A large-scale evaluation of computational protein function prediction. Nature methods 10(3):221.
  • Re and Valentini (2010) Re, M., and Valentini, G. 2010. Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines. Neurocomputing 73(7–9):1533–1537.
  • Roded, Igor, and Ron (2007) Roded, S.; Igor, U.; and Ron, S. 2007. Network-based prediction of protein function. Molecular Systems Biology 3(1):88.
  • Ruepp et al. (2004) Ruepp, A.; Zollner, A.; Maier, D.; Albermann, K.; Hani, J.; Mokrejs, M.; Tetko, I.; Güldener, U.; Mannhaupt, G.; and Münsterkötter, M. 2004. The funcat, a functional annotation scheme for systematic classification of proteins from whole genomes. Nucleic Acids Research 32(18):5539–45.
  • Rumelhart, Hinton, and Williams (1986) Rumelhart, D. E.; Hinton, G. E.; and Williams, R. J. 1986. Learning representations by back-propagating errors. Readings in Cognitive Science 323(6088):399–421.
  • Sara and Quaid (2010) Sara, M., and Quaid, M. 2010. Fast integration of heterogeneous data sources for predicting gene function with limited annotation. Bioinformatics 26(14):1759–1765.
  • Tsoumakas and Katakis (2007) Tsoumakas, G., and Katakis, I. 2007. Multi-label classification: An overview. International Journal of Data Warehousing and Mining (IJDWM) 3(3):1–13.
  • Tsuda, Shin, and Schölkopf (2005) Tsuda, K.; Shin, H.; and Schölkopf, B. 2005. Fast protein classification with multiple networks. Bioinformatics 21(suppl 2):ii59.
  • Wang et al. (2014) Wang, B.; Mezlini, A. M.; Demir, F.; Fiume, M.; Tu, Z.; Brudno, M.; Haibekains, B.; and Goldenberg, A. 2014. Similarity network fusion for aggregating data types on a genomic scale. Nature Methods 11(3):333–337.
  • Yan et al. (2010) Yan, H.; Venkatesan, K.; Beaver, J. E.; Klitgord, N.; Yildirim, M. A.; Hao, T.; Hill, D. E.; Cusick, M. E.; Perrimon, N.; and Roth, F. P. 2010. A genome-wide gene function prediction resource for drosophila melanogaster. Plos One 5(8):e12139.
  • Yang, Mcauley, and Leskovec (2013) Yang, J.; Mcauley, J.; and Leskovec, J. 2013. Community detection in networks with node attributes. In IEEE International Conference on Data Mining, 1151–1156.
  • Yu et al. (2015) Yu, G.; Rangwala, H.; Domeniconi, C.; Zhang, G.; and Zhang, Z. 2015. Predicting protein function using multiple kernels. Computational Biology & Bioinformatics IEEE/ACM Transactions on 12(1):219–233.
  • Zhang et al. (2015) Zhang, M.; Tang, J.; Qu, M.; Yan, J.; and Wang, M. 2015. LINE: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, 1067–1077.