Subgraph Networks with Application to Structural Feature Space Expansion

03/21/2019 · Qi Xuan, et al. · City University of Hong Kong · NetEase, Inc.

In this paper, the concept of subgraph network (SGN) is introduced and then applied to network models, with algorithms designed for constructing the 1st-order and 2nd-order SGNs, which can be easily extended to build higher-order ones. Furthermore, these SGNs are used to expand the structural feature space of the underlying network, which is beneficial for network classification. Numerical experiments demonstrate that the network classification model based on the structural features of the original network together with the 1st-order and 2nd-order SGNs always performs the best, compared to the models based on only one or two of these networks. In other words, the structural features of SGNs can complement those of the original network for better network classification, regardless of the feature extraction method used, be it handcrafted, network-embedding, or kernel-based. More interestingly, it is found that the model based on handcrafted features performs even better than those based on automatically generated features, at least for most datasets tested in the present investigation. This indicates that, in general, properly chosen structural features are not only more interpretable, owing to their clear physical meanings, but also effective in designing structure-based algorithms for network classification.


1 Introduction

Many real-world systems can be naturally represented by networks, such as biological networks [1, 2], collaboration networks [3, 4], software networks [5, 6], and social networks [7, 8]. Studying the substructure of a network, e.g., its subgraphs, is an efficient way to understand and analyze the network [9]. In fact, subgraphs are basic structural elements of a network, and distinct sets of subgraphs are usually associated with different types of networks, so they play an important role in network studies. As shown in [10], the frequent appearance of certain subgraphs can reveal topological interaction patterns, each of which performs precisely some specialized function, and therefore they can be used to classify networks.

Recently, a lot of work on network subgraphs for different problems has been reported. For example, in [11, 12, 13], different algorithms were designed for detecting network subgraphs. In [14], a method for detecting strong ties was proposed using frequent subgraphs in a social network, where it was observed that frequent subgraphs, used as network structural features, could alleviate the sparsity problem in detecting strong ties on the network. By adding time stamps to the topology, temporal frequent subgraphs [15, 16, 17] were studied for time-dependent networks, such as social and communication networks, as well as biological and neural networks. Furthermore, subgraphs have also been applied to graph clustering and classification. In [18], a graph clustering method was developed based on frequent subgraphs, which can effectively detect communities in a network. Last but not least, in [19], an incremental subgraph join feature selection algorithm was designed, which forces graph classifiers to join short-pattern subgraphs so as to generate long-pattern subgraph features. Without recognizing their functional importance, most existing studies simply treat distinct subgraphs as motifs and use only their counts to characterize the underlying network. Although these statistics can be used to describe a network model, or even to design some computational algorithms, the role of subgraphs in such applications can be further explored so as to improve network modeling and algorithm design.

On the other hand, with the rapid development of deep learning, many graph mining and representation methods have been proposed and tested, with practical applications to, e.g., drug design (through studying chemical compound and protein data) [20, 21] and market analysis (through purchase histories) [22]. Methods like word2vec [23] and doc2vec [24] have shown good performance in Natural Language Processing (NLP), bringing new insights to the field of graph representation. Inspired by these algorithms, graph2vec [25] was proposed and shown to be outstanding for graph representation. Among the existing graph mining methods, graph kernels [26, 27, 28] have received wide acclaim in recent years, although their bottleneck is a high computational cost. Having won competitions on a wide range of machine learning problems, the Convolutional Neural Network (CNN) has attracted much attention, especially in the area of computer vision [29], and it has been reformulated through new convolution operators for graph-structured data [30]. The first attempt at an analogy of CNN on graphs, referred to as graphconv, was put forward in [31]. Then, GCN, designed in [32] as an extension of the k-localized kernel, resolved the over-localization problem of graphconv. Based on GCN, a generalized graph CNN, namely AGCN, was proposed in [33], which accepts data from arbitrary graphs (with respect to both structure and size) and learns from a task-driven adaptive graph. In contrast to the above methods, the approach of [34] directly uses topological and label attributes as graph features. Moreover, the GE-FSG method proposed in [3] treats frequent subgraphs in a network as word phrases in a language model (e.g., PV-DBOW [24]), thereby obtaining a distributed graph representation.

The emergence of many embedding methods makes it possible to conduct a great number of data mining tasks, such as clustering and classification on networks. Recently, unsupervised embedding methods have been extensively investigated [35, 36, 37]. However, although the obtained embedding vector is suitable for describing the local structure around a node, it usually ignores the global structure of the entire network, which is important for the task of network classification. A simple remedy is to average the embedding vectors of all nodes to obtain an overall feature vector for the entire network. Moreover, graph2vec was proposed based on the WL kernel [27], which preserves the relationships between nodes but discards the link labels, consequently missing quite a lot of global information.

More recently, in [8], a given network was transformed into a line graph, based on which an expansion of network attributes was obtained. It was found that this approach can significantly improve the performance of link-weight prediction. Inspired by this work, the present paper takes one step further to propose a new concept of Sub-Graph Network (SGN). Briefly, there are three steps to establish an SGN from an original network: first, detect subgraphs in the original network; second, choose appropriate subgraphs for a task; third, utilize the chosen subgraphs to build an SGN. A line graph can thus be considered as a special SGN, where a link connecting two nodes in the original network is treated as a subgraph, and two subgraphs are connected in the SGN if the corresponding two links share a terminal node. Clearly, more complicated subgraphs can be considered, e.g., three nodes with two links, so as to get a higher-order SGN, as will be further discussed in the next section.

The key point here is that the SGN extracts representative parts of the original network and then assembles them to reconstruct a new network that preserves the relationships among subgraphs. The structure of an SGN thus complements that of the original network and, as a result, the integration of their features will benefit subsequent structure-based algorithm design and applications.

The main contributions of this work are summarized as follows.

  • A new concept of SGN is introduced, along with algorithms designed for constructing the 1st-order and 2nd-order SGNs from a given network. These algorithms can be easily extended to construct higher-order SGNs.

  • SGN is used to obtain a series of handcrafted structural features which, together with the features automatically extracted by using some advanced network-embedding and kernel-based methods, provide complementary features to those extracted from the original network.

  • SGN is applied to network classification. Experiments on six groups of networks are carried out, showing that integrating the features obtained from SGN can indeed significantly improve the classification accuracy in most cases, as compared to the same feature extraction and classification methods based only on the original networks.

The rest of the paper is organized as follows. In Sec. 2, the definition of SGN is introduced and algorithms for constructing the 1st-order and 2nd-order SGNs are designed. In Sec. 3, handcrafted structural features are characterized, for both the original network and SGNs. In Sec. 4, several automatic feature extraction methods are discussed, whereas SGNs are applied to graph classification for some real-world networks. Finally, Sec. 5 concludes the investigation, with a future research outlook.

2 Subgraph networks

In this section, SGN is first introduced, followed by algorithms for constructing the 1st-order and 2nd-order SGNs.

Definition 1 (Network).

An undirected network is represented by $G = (V, E)$, where $V$ and $E \subseteq V \times V$ denote the sets of nodes and links, respectively. An element $e_{ij}$ in $E$ is an unordered pair of nodes $v_i$ and $v_j$, i.e., $e_{ij} = (v_i, v_j)$, for all $i, j = 1, 2, \ldots, N$, where $N = |V|$ is the number of nodes, namely the size of the network.

Definition 2 (Subgraph).

Given a network $G = (V, E)$, $g_i = (V_i, E_i)$ is a subgraph of $G$, denoted by $g_i \subseteq G$, if and only if $V_i \subseteq V$ and $E_i \subseteq E$. A sequence of subgraphs is denoted as $g_i \subseteq G$, $i = 1, 2, \ldots, n$.

Definition 3 (SGN: Subgraph Network).

Given a network $G = (V, E)$, the SGN, denoted by $G^* = L(G)$, is a mapping from $G$ to $G^* = (V^*, E^*)$, with the sets of nodes and links denoted by $V^* \subseteq \{g_i \mid i = 1, 2, \ldots, n\}$ and $E^* \subseteq V^* \times V^*$, respectively. Two subgraphs $g_i$ and $g_j$ are connected if they share some common nodes or links in the original network, i.e., $g_i \cap g_j \neq \emptyset$. Similarly, an element in $E^*$ is an unordered pair of subgraphs $g_i$ and $g_j$, i.e., $(g_i, g_j)$, with $i, j = 1, 2, \ldots, n$.

According to the definition of SGN, one can see that: (i) a subgraph is part of the original network; (ii) an SGN is derived from a higher-order mapping of the original network $G$; (iii) the connecting rule between two subgraphs needs to be clarified. Following the approach of [38], where the problem of graph representation in a domain with higher-order relations is discussed by constructing sets of nodes as $k$-chains, corresponding to points (0-chains), lines (1-chains), triangles (2-chains), etc., the new framework here constructs subgraphs of the 1st order, 2nd order, and so on. For clarity, the three steps in building the new framework are outlined as follows.

  • Detecting subgraphs in the original network. Networks are rich in subgraph structures, with some subgraphs occurring frequently, e.g., motifs [12]. Different kinds of networks may have different local structures, captured by different distributions of various subgraphs.

  • Choosing appropriate subgraphs. Generally, subgraphs should not be too large, since otherwise the SGN may contain only a very small number of nodes, making the subsequent analysis less meaningful. Moreover, the chosen subgraphs should be connected to each other, i.e., they should share some common part (nodes or links) of the original network, so that higher-order structural information can emerge.

  • Utilizing the subgraphs to build the SGN. After extracting enough subgraphs from the original network, connections among them are established following certain rules so as to build the SGN. Here, for simplicity, two subgraphs are connected if and only if they share common nodes or links of the original network. There can certainly be other connecting rules, leading to totally different SGNs, which will be discussed elsewhere in the future.

In this paper, the most fundamental subgraphs, i.e., lines and triangles, are chosen, since they are simple and appear relatively frequently in most networks. Thus, two kinds of SGNs of different orders are constructed as follows.

2.1 First-Order SGN

In the first-order case, a line, i.e., a link, is chosen as the subgraph, based on which the SGN is built, denoted by SGN(1). The 1st-order SGN is also known as a line graph, where the nodes are the links in the original network, and two nodes are connected if the corresponding links share an end node.

Fig. 1: The process of building SGN(1) from a given network: (a) the original network, (b) extracting lines as subgraphs, (c) establishing connections among these lines, and (d) forming SGN(1).
Input: A network G = (V, E) with node set V and link set E.
Output: SGN(1), denoted by G1 = (V1, E1).
1  initialize a node set V1 = ∅ and a link set E1 = ∅;
2  for each node v ∈ V do
3      get the neighbor set N(v) of v;
4      initialize an empty list L of new nodes;
5      for each u ∈ N(v) do
6          l = sorted([v, u]);
7          merge the nodes in list l into a string s;
8          add the new node s into L;
9      end for
10     for each pair s_i, s_j ∈ L with s_i ≠ s_j do
11         add the link (s_i, s_j) into E1;
12     end for
13     add the nodes in L into V1;
14 end for
15 return G1 = (V1, E1);
Algorithm 1: Constructing SGN(1).

The process of building SGN(1) from a given network is shown in Fig. 1. In this example, the original network has 6 nodes connected by 6 links. First, extract the lines as subgraphs, labeling them by their corresponding end nodes, as shown in Fig. 1 (b). These lines are treated as nodes in SGN(1). Then, connect these lines based on their labels, i.e., two lines are connected if they share an end node, as shown in Fig. 1 (c). Finally, one obtains an SGN(1) with 6 nodes and 8 links, as shown in Fig. 1 (d). Pseudocode for constructing SGN(1) is given in Algorithm 1. The input of this algorithm is the original network G = (V, E) and the output is the constructed SGN(1), denoted by G1 = (V1, E1), where V1 and E1 represent the sets of nodes and links in SGN(1), respectively. A minimal Python sketch of this construction is given below.
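To make the construction concrete, here is a minimal Python sketch of Algorithm 1 using networkx; the function name build_sgn1 is illustrative and not from the paper, and the final assertion checks the construction against networkx's built-in line graph on a toy network.

```python
import networkx as nx

def build_sgn1(G):
    """Build the 1st-order SGN: each link of G becomes a node, labeled by its
    sorted end nodes; two nodes are connected if the links share an end node."""
    sgn = nx.Graph()
    for v in G.nodes():
        # links incident to v, each labeled by its sorted pair of end nodes
        incident = [tuple(sorted((v, u))) for u in G.neighbors(v)]
        sgn.add_nodes_from(incident)
        # any two links sharing the common end node v become connected
        for i in range(len(incident)):
            for j in range(i + 1, len(incident)):
                sgn.add_edge(incident[i], incident[j])
    return sgn

G = nx.cycle_graph(5)  # toy network: 5 nodes, 5 links
assert nx.is_isomorphic(build_sgn1(G), nx.line_graph(G))
```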

2.2 Second-Order SGN

Now, construct higher-order subgraphs by considering the connection patterns among three nodes, which are more diverse than those between two nodes. In theory, there are 13 possible non-isomorphic connection patterns among three nodes in a directed network [12], as shown in Fig. 2 (a). This number decreases to 2 in an undirected network, namely the open and closed triangles, as shown in Fig. 2 (b). Here, only connected subgraphs are considered, while those with fewer than two links are ignored. Compared with lines, triangles can provide more insight into the local structure of a network [39]. For instance, in [40], the evolution of triangles in the Google+ online social network was studied, yielding valuable information about the emergence and pruning of various triangles.

Fig. 2: The connection patterns among three nodes for (a) directed and (b) undirected networks.

Open triangles are adopted as the subgraphs to establish the 2nd-order SGN, denoted by SGN(2). Here, second-order means that there are two links in each open triangle, and two open triangles are connected in SGN(2) if they share a link. Note that sharing a link, rather than sharing a node, is required here, to avoid obtaining a very dense SGN(2). This is because a dense network, in which each pair of nodes is connected with a higher probability, tends to provide less structural information in general.

The iterative process of building SGN(2) from an original network is shown in Fig. 3. First, extract the lines, labeled by their corresponding end nodes, as shown in Fig. 3 (b), to establish SGN(1). Then, in the line graph SGN(1), further extract lines to obtain open triangles as subgraphs, labeled by their corresponding three nodes, as shown in Fig. 3 (c). Finally, one obtains an SGN(2) with 8 nodes and 14 links, as shown in Fig. 3 (d). Pseudocode for constructing SGN(2) is given in Algorithm 2, followed by a Python sketch. The input of this algorithm is the original network G = (V, E) and the output is the constructed SGN(2), denoted by G2 = (V2, E2), where V2 and E2 represent the sets of nodes and links in SGN(2), respectively.

Fig. 3: The process of building SGN(2) from a given network: (a) the original network, (b) extracting lines, (c) building SGN(1) and extracting open triangles as subgraphs, and (d) establishing connections among these open triangles to obtain SGN(2).
Input: A network G = (V, E) with node set V and link set E.
Output: SGN(2), denoted by G2 = (V2, E2).
1  initialize a node set V2 = ∅ and a link set E2 = ∅;
2  for each node v ∈ V do
3      get the neighbor set N(v) of v;
4      get all pairs (u, w) from the full combination of the nodes in N(v);
5      for each pair (u, w) do
6          l = [u, v, w];  // an open triangle centered at v
7          merge the nodes in list l into a string s;
8          add the new node s into V2, recording its two links (v, u) and (v, w);
9      end for
10 end for
11 for each link e ∈ E do
12     for each pair s_i, s_j ∈ V2 that both contain the link e do
13         add the link (s_i, s_j) into E2;
14     end for
15 end for
16 return G2 = (V2, E2);
Algorithm 2: Constructing SGN(2).
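Analogously, a minimal Python sketch of Algorithm 2 (the names are illustrative): open triangles are collected per center node and indexed by the links they contain, and any two triangles sharing a link become connected.

```python
from itertools import combinations
import networkx as nx

def build_sgn2(G):
    """Build the 2nd-order SGN: nodes are open triangles (two-link paths
    u-v-w); two open triangles are connected if they share a link of G."""
    sgn2 = nx.Graph()
    triangles_by_link = {}
    for v in G.nodes():
        for u, w in combinations(G.neighbors(v), 2):
            t = (v, frozenset((u, w)))  # open triangle centered at v
            sgn2.add_node(t)
            # register the two links (v,u) and (v,w) contained in t
            for link in (frozenset((v, u)), frozenset((v, w))):
                triangles_by_link.setdefault(link, []).append(t)
    # open triangles containing a common link become connected in SGN^(2)
    for ts in triangles_by_link.values():
        sgn2.add_edges_from(combinations(ts, 2))
    return sgn2
```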

Clearly, the new method can be easily extended to construct higher-order SGNs by choosing proper subgraphs and connecting rules. For instance, based on Algorithms 1 and 2, for the network shown in Fig. 3 (d), one can further label each link by the 4 node identifiers carried by its two end nodes, i.e., the identifiers of the corresponding 4 nodes in the original network. Then, one can treat each distinctly labeled link as a node, and connect two such nodes if their labels share 3 identifiers, so as to establish the 3rd-order SGN.
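Note that, under these particular choices (links as nodes connected by shared end nodes, open triangles as nodes connected by shared links), SGN(2) coincides with the line graph of SGN(1): an open triangle is exactly a pair of adjacent links, i.e., a link of the line graph. Under this observation, SGNs of arbitrary order can be sketched as iterated line graphs (a convenient equivalence for these specific rules, not a general definition of higher-order SGNs):

```python
import networkx as nx

def build_sgn(G, order):
    """SGN^(0) is the original network; each further order takes a line graph,
    reproducing SGN^(1) and SGN^(2) under the connecting rules above."""
    H = G
    for _ in range(order):
        H = nx.line_graph(H)
    return H
```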

It would be interesting to investigate such higher-order SGNs. However, when the subgraphs become too large, the SGN may contain only a few nodes, making the network structure less informative. One may argue that certain networks contain functional subgraphs, which could serve as better building blocks for SGNs. However, this may not be true. Take compound networks in chemistry as examples: the benzene ring, and other functional groups such as the hydroxyl, carboxyl, and aldehyde groups, play an important role in the properties of organic substances. In such networks, however, one usually cannot choose the benzene ring as a building block, since most of these networks are small and contain only a few benzene rings, as shown in Fig. 4. In this case, if one used benzene rings as subgraphs, the resulting SGN would contain only three nodes, with one isolated from the other two. Such an SGN can hardly provide sufficient information to distinguish the compound from other substances, and hence is not useful.

Fig. 4: A compound network, where each node denotes an atom, and its corresponding SGN obtained by taking benzene rings as subgraphs.

3 Network Attributes

Now, besides the original network, denoted by SGN(0) for simplicity, there are two SGNs, i.e., SGN(1) and SGN(2). Together, these networks may provide more comprehensive structural information for subsequent applications. In this paper, the focus is on the application to network classification. A typical procedure for accomplishing this task consists of two steps: first, extract the network structural features; second, design a machine learning method based on these features to realize the classification. Here, the following handcrafted network features are used to design the classifier.

  • Number of nodes ($N$): the total number of nodes in the network.

  • Number of links ($M$): the total number of links in the network.

  • Average degree ($\langle k \rangle$): the mean number of links connected to a node in the network.

  • Percentage of leaf nodes ($P_{leaf}$): a node of degree 1 is defined as a leaf node. Suppose there are $N_{leaf}$ leaf nodes in total in the network. Then

    $P_{leaf} = N_{leaf}/N$.  (1)

  • Average clustering coefficient ($C$): for node $v_i$, the clustering coefficient $C_i$ represents the probability of a connection between any two neighbors of $v_i$. Suppose that node $v_i$ has $k_i$ neighbors and that these neighbors are connected by $L_i$ links. Then the average clustering coefficient is defined as

    $C = \frac{1}{N}\sum_{i=1}^{N} \frac{2L_i}{k_i(k_i-1)}$.  (2)

  • Largest eigenvalue of the adjacency matrix ($\lambda_{max}$): the adjacency matrix $A$ of the network is an $N \times N$ matrix, with element $a_{ij} = 1$ if nodes $v_i$ and $v_j$ are connected, and $a_{ij} = 0$ otherwise. In this step, all the eigenvalues of $A$ are calculated and the largest one is chosen.

  • Network density ($D$): given the number of nodes $N$ and the number of links $M$, the network density is defined as

    $D = \frac{2M}{N(N-1)}$.  (3)

  • Average betweenness centrality ($\bar{B}$): betweenness centrality is a centrality metric based on shortest paths. The average betweenness centrality of the network is defined as

    $\bar{B} = \frac{1}{N}\sum_{v \in V} \sum_{s \neq v \neq t} \frac{g_{st}^{v}}{g_{st}}$,  (4)

    where $g_{st}$ is the number of shortest paths between $s$ and $t$, and $g_{st}^{v}$ is the number of shortest paths between $s$ and $t$ that pass through $v$.

  • Average closeness centrality ($\bar{C}_c$): the closeness centrality of a node in a connected network is defined as the reciprocal of the average shortest path length between this node and the others. The average closeness centrality is defined as

    $\bar{C}_c = \frac{1}{N}\sum_{i=1}^{N} \frac{N-1}{\sum_{j \neq i} d_{ij}}$,  (5)

    where $d_{ij}$ is the shortest path length between nodes $v_i$ and $v_j$.

  • Average eigenvector centrality ($\bar{E}$): usually, the importance of a node depends not only on its degree but also on the importance of its neighbors. Eigenvector centrality is another measure of the importance of a node based on its neighbors, whose average is defined as

    $\bar{E} = \frac{1}{N}\sum_{i=1}^{N} e_i$,  (6)

    where $e_i$ represents the importance of node $v_i$ and is calculated based on the following equation:

    $e_i = c \sum_{j=1}^{N} a_{ij} e_j$,  (7)

    where $c$ is a preset parameter, which should be less than the reciprocal of the maximum eigenvalue of the adjacency matrix $A$.

  • Average neighbor degree ($\bar{k}_{nn}$): the neighbor degree of a node is the average degree of all the neighbors of this node, defined as

    $k_{nn,i} = \frac{1}{|\Gamma(i)|}\sum_{j \in \Gamma(i)} k_j$,  (8)

    where $\Gamma(i)$ is the set of neighbors of node $v_i$, and $k_j$ is the degree of node $v_j$.
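For reference, the features above can be computed with networkx as in the following sketch (function and key names are illustrative; networkx's eigenvector centrality is used as a stand-in for Eqs. (6)-(7)):

```python
import numpy as np
import networkx as nx

def handcrafted_features(G):
    """Compute the handcrafted structural features listed above for a graph G."""
    N, M = G.number_of_nodes(), G.number_of_edges()
    degrees = np.array([d for _, d in G.degree()])
    return {
        "num_nodes": N,
        "num_links": M,
        "avg_degree": degrees.mean(),
        "pct_leaf": float(np.mean(degrees == 1)),                    # Eq. (1)
        "avg_clustering": nx.average_clustering(G),                  # Eq. (2)
        "largest_eigenvalue": float(np.max(np.linalg.eigvalsh(
            nx.to_numpy_array(G)))),
        "density": nx.density(G),                                    # Eq. (3)
        "avg_betweenness": float(np.mean(list(
            nx.betweenness_centrality(G).values()))),                # Eq. (4)
        "avg_closeness": float(np.mean(list(
            nx.closeness_centrality(G).values()))),                  # Eq. (5)
        "avg_eigenvector": float(np.mean(list(
            nx.eigenvector_centrality_numpy(G).values()))),          # Eqs. (6)-(7)
        "avg_neighbor_degree": float(np.mean(list(
            nx.average_neighbor_degree(G).values()))),               # Eq. (8)
    }
```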

3.1 Datasets

Experiments were conducted on 6 real-world network datasets, introduced in the following, each containing two classes of networks. The first 5 datasets are from bio- and chemo-informatics, while the last one consists of social networks. The basic statistics of these datasets are presented in TABLE I.

Dataset       #Graphs  #Classes  #Positive  #Negative
MUTAG         188      2         125        63
PTC           344      2         152        192
PROTEINS      1113     2         663        450
NCI1          4110     2         2057       2053
NCI109        4127     2         2079       2048
IMDB-BINARY   1000     2         500        500
TABLE I: Basic statistics of the 6 datasets. #Graphs is the number of graphs; #Classes is the number of classes; #Positive and #Negative are the numbers of graphs in the two classes.
  • MUTAG: This dataset consists of mutagenic aromatic and heteroaromatic nitro compounds, with nodes and links representing atoms and the chemical bonds between them, respectively. The graphs are labeled according to whether the compound has a mutagenic effect on a specific bacterium [41].

  • PTC: This dataset includes 344 chemical compound graphs, with nodes and links representing atoms and the chemical bonds between them, respectively. Their labels are determined by their carcinogenicity for rats [42].

  • PROTEINS: This dataset comprises 1113 graphs. The nodes are Secondary Structure Elements (SSEs) and the links represent neighborhood in the amino-acid sequence or in 3D space. Each graph represents either an enzyme or a non-enzyme protein [43].

  • NCI1 & NCI109: These two datasets comprise 4110 and 4127 graphs, respectively, with nodes and links representing atoms and the chemical bonds between them. They are two balanced subsets of datasets of chemical compounds screened for activity against non-small cell lung cancer and ovarian cancer cell lines, respectively. The positive and negative samples are distinguished according to whether they are effective against the cancer cells [2].

  • IMDB-BINARY: This is a movie collaboration dataset collected from IMDB, which contains information about many different movies. Each graph is an ego-network, where nodes represent actors or actresses and links indicate that the two appear in the same movie. Each graph is categorized into one of two genres (Action or Romance) [3].

Fig. 5: SGN(0), SGN(1) and SGN(2), together with their numbers of nodes and links, for (a) a positive and (b) a negative sample from the MUTAG dataset.
Fig. 6: The t-SNE visualization of the handcrafted network features. Points of the same color represent networks of the same class in MUTAG.

3.2 Benefits of SGN

Here, the MUTAG dataset is taken as an example to show that SGNs of different orders may capture different aspects of the network structure.

First, a positive sample and a negative one are chosen from the MUTAG dataset, with their SGN(0), SGN(1) and SGN(2) visualized in Fig. 5. To facilitate comparison, the numbers of nodes and links of these networks are also presented in the figure. Here, a positive sample means that the compound has a mutagenic effect on the bacterium; otherwise, it is negative. As can be seen, although the original networks of the two samples have quite similar sizes, their difference is enlarged in the higher-order SGNs; more precisely, as the order increases, the numbers of nodes and links in the SGNs grow faster for the positive sample than for the negative one.

Then, the handcrafted network features are visualized using t-SNE in Fig. 6, where the networks in MUTAG can indeed be distinguished to a certain extent by the features of the original network, the 1st-order SGN, and the 2nd-order SGN, respectively. Moreover, when all the features are put together, the networks appear to be even better distinguished, indicating that SGNs of different orders and the original network may complement each other. Therefore, integrating the structural information of all these networks may significantly improve the performance of subsequent algorithms designed based on network structure, as sketched below.
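Building on the sketches above, such an expanded structural feature vector can be assembled by concatenating the handcrafted features of SGN(0), SGN(1) and SGN(2) (an illustrative composition of the earlier helper functions, not code from the paper):

```python
import numpy as np

def expanded_feature_vector(G, max_order=2):
    """Concatenate the handcrafted features of SGN^(0), ..., SGN^(max_order)."""
    vec = []
    for k in range(max_order + 1):
        vec.extend(handcrafted_features(build_sgn(G, k)).values())
    return np.array(vec)
```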

4 Experiments

With the rapid growth of real-world graph data, network classification is becoming more and more important, and a number of effective network classification methods have been proposed in recent years [44, 45, 46]. Along this line of research, as an application of the proposed SGN, classifiers are designed based on the structural features obtained from the SGNs as well as from the original networks.

4.1 Automatic Feature Extraction Methods

Besides the handcrafted features, one can also use some advanced methods, such as network embedding methods, to automatically generate a feature vector of a certain dimension from a given network. Under the present framework, such automatically generated feature vectors can also be expanded based on SGNs.

Two network embedding methods, node2vec and graph2vec, and two graph kernel-based methods, the WL subtree kernel and the deep WL method, are chosen as automatic feature extraction methods.

  • Node2vec [36]: This method learns continuous feature representations in a lower-dimensional space for the nodes in a network, by optimizing the likelihood of preserving their neighborhoods. As a lower-order substructure embedding technique, it cannot learn the global similarities needed to classify networks directly. In the present framework, embedding vectors are first generated for all the nodes in a given network, and then their mean is used to represent the whole network [25] (see the mean-pooling sketch after this list).

  • Graph2vec [25]: This is the first unsupervised embedding approach for entire networks, which extends the word-and-document embedding techniques that have shown great advantages in NLP. Analogously to doc2vec [24], graph2vec establishes a relationship between a network and its rooted subgraphs: it first extracts rooted subgraphs and adds their corresponding labels to the vocabulary, and then trains a skip-gram model to obtain the representation of the entire network.

  • WL [27]: This is a rapid feature extraction scheme based on the Weisfeiler-Lehman (WL) test of isomorphism on graphs. It maps the original network to a sequence of graphs, with node attributes capturing both topological and label information. The key idea of the algorithm is to augment each node label by the sorted set of labels of the neighboring nodes, and to compress these augmented labels into new, short labels. These steps are repeated until the node label sets of the two compared networks differ, or the number of iterations reaches a preset value. It should be noted that, to facilitate the expansion of the new model, the sub-structure frequency vectors, instead of the kernel matrix $K$, are used as the inputs to the classifier.

  • Deep WL [28]: This provides a unified framework that leverages the dependency information among sub-structures by learning latent representations. It differs from the WL kernel in generating a corpus of sub-structures by integrating language-modeling and deep-learning techniques [47], where the co-occurrence relationships of sub-structures are preserved and sub-structure vector representations are obtained before the kernel is computed. Then, a sub-structure similarity matrix $\mathcal{M} = H^{T}H$ is calculated, where $H$ is the matrix with each column representing a sub-structure vector. Denote by $P$ the matrix with each column representing a sub-structure frequency vector. Then, according to the definition of the kernel,

    $K = P^{T}\mathcal{M}P = P^{T}H^{T}HP = (HP)^{T}(HP)$,  (9)

    one can use the columns of the matrix $HP$ as the inputs to the classifier (see the sketch after this list).
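For the node2vec pipeline above, the mean-pooling step can be sketched as follows, assuming node_vectors is a dict mapping each node to its embedding vector (a hypothetical input, however obtained):

```python
import numpy as np

def mean_pool_embedding(G, node_vectors):
    """Represent the whole network by the mean of its node embedding vectors."""
    return np.array([node_vectors[v] for v in G.nodes()]).mean(axis=0)
```

Likewise, the deep WL feature construction in Eq. (9) amounts to a single matrix product; assuming H has one column per sub-structure embedding and P has one column per graph (its sub-structure frequency vector), the rows returned below can serve as classifier inputs:

```python
def deep_kernel_features(H, P):
    """Eq. (9): K = P^T M P = (HP)^T (HP) with M = H^T H, so the columns of
    HP (returned here as rows) are per-graph feature vectors."""
    HP = H @ P
    return HP.T
```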

In this study, the same embedding dimension is adopted for graph2vec and node2vec, following [25]. For node2vec, the following default parameters are used: the number of random walks is set to 10, the walk length to 80, the window size of the skip-gram model to 10, and the embedding dimension to 1024. In this method, the biased random walk is determined by two hyper-parameters, p and q, both of which are set to 1 for simplicity. Graph2vec is based on rooted subgraphs, as also adopted in the WL kernel, and the height parameter of the WL kernel is set to 3. Since the embedding dimension is predominant for learning performance, the commonly used value of 1024 is adopted. The other parameters are set to their defaults: the learning rate is set to 0.5, the batch size to 512, and the number of epochs to 1000. For WL and deep WL, following [28], the Weisfeiler-Lehman subtree kernel is used to build the corpus, with its height set to 2. Then, Maximum Likelihood Estimation (MLE) is used to compute the kernel in the WL method. For the deep WL method, the same settings as for WL are used, with the embedding dimension equal to 10, the window size equal to 5, and skip-gram used for the word2vec model.

Without loss of generality, the well-known logistic regression is chosen as the classification model. Meanwhile, for each feature extraction method, the feature space is first expanded by using SGNs, and then PCA is used to reduce the feature vectors to the same dimension as that of the feature vector obtained from the original network, for a fair comparison. Each dataset is randomly split into 9 folds for training and 1 fold for testing. The F1-score is adopted as the metric to evaluate the classification performance:

    $F_1 = \frac{2PR}{P + R}$,  (10)

where $P$ and $R$ are the precision and recall, respectively. To exclude the random effect of the fold assignment, the experiment is repeated 500 times, and the average F1-score and its standard deviation are recorded. A sketch of this protocol is given below.
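A minimal scikit-learn sketch of this evaluation protocol (assuming X is the matrix of expanded feature vectors and y the class labels; repeated stratified 90/10 splits approximate the random fold assignment described above):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def evaluate(X, y, dim, repeats=500):
    """Reduce features to `dim` with PCA, then average the F1-score of
    logistic regression over repeated random 90/10 train/test splits."""
    X_red = PCA(n_components=dim).fit_transform(X)
    scores = []
    for seed in range(repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X_red, y, test_size=0.1, stratify=y, random_state=seed)
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        scores.append(f1_score(y_te, clf.predict(X_te)))
    return np.mean(scores), np.std(scores)
```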

Gain of SGN(0+1+2) over SGN(0) (in F1-score, %):

Algorithm   MUTAG   PTC     PROTEINS  NCI1    NCI109  IMDB-BINARY
Handcraft   5.78    6.96    1.71      3.50    3.55    6.37
Node2vec    20.32   3.69    18.18     13.52   18.45   5.20
Graph2vec   4.44    5.10    1.56      5.90    1.52    13.73
WL          10.31   11.63   7.08      11.63   5.28    5.78
Deep WL     12.93   11.58   4.75      4.77    6.00    11.85

TABLE II: Classification results on the 6 datasets, in terms of F1-score, based on the different feature extraction methods and the combinations of SGNs (SGN(0), SGN(1), SGN(2), SGN(0+1), SGN(0+2), SGN(1+2) and SGN(0+1+2)), summarized here by the Gain of SGN(0+1+2) over SGN(0) for each method and dataset.

4.2 Computational Complexity

Now, the computational complexity of building SGNs is analyzed. Denote by $N$ and $M$ the numbers of nodes and links, respectively, in the original network. The average degree of the network is calculated by

    $\langle k \rangle = \frac{1}{N}\sum_{i=1}^{N} k_i$,  (11)

where $k_i$ is the degree of node $v_i$. Based on Algorithm 1, each node $v_i$ produces $k_i$ subgraph nodes and at most $k_i(k_i-1)/2$ links among them, so the time complexity of transforming the original network to SGN(1) is

    $T_1 = O\left(\sum_{i=1}^{N} k_i^2\right)$.  (12)

Then, the number of nodes in SGN(1) is equal to $M$ and the number of links is $\frac{1}{2}\sum_{i=1}^{N} k_i^2 - M$ [48]. Similarly, one can get the time complexity of transforming SGN(1) to SGN(2) as

    $T_2 = O\left(\sum_{j=1}^{M} q_j^2\right)$,  (13)

where $q_j$ is the degree of node $j$ in SGN(1). The size formulas can be checked numerically, as in the snippet below.
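As a quick numerical check of the SGN(1) node and link counts above, one can compare against networkx's line graph on a random network:

```python
import networkx as nx

G = nx.erdos_renyi_graph(100, 0.05, seed=42)
L1 = nx.line_graph(G)
M = G.number_of_edges()
assert L1.number_of_nodes() == M
# |E(SGN^(1))| = (1/2) * sum_i k_i^2 - M, cf. [48]
assert L1.number_of_edges() == sum(d * d for _, d in G.degree()) // 2 - M
```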

4.3 Experiment Results

As described in Sec. 2, the proposed SGNs can be used to expand structural feature spaces. To investigate the effectiveness of the 1st-order and 2nd-order SGNs, i.e., SGN(1) and SGN(2), the classification results for each feature extraction method are compared on the basis of a single network, i.e., SGN(0), SGN(1) or SGN(2); on the basis of two networks, i.e., SGN(0) together with SGN(1), and SGN(0) together with SGN(2), denoted by SGN(0+1) and SGN(0+2), respectively; and on the basis of all three networks, i.e., SGN(0) together with SGN(1) and SGN(2), denoted by SGN(0+1+2). For a fair comparison, PCA is used to compress the feature vectors to the same dimension for each feature extraction method before they are input into the logistic regression model.

The results are shown in TABLE II, where one can see that, in the single-network case, the original network seems to provide more structural information, i.e., the classification model based on SGN(0) performs better, in terms of a higher F1-score, than those based on SGN(1) or SGN(2) in most cases. This is reasonable, because there is inevitably information loss in the process of building SGNs. However, the outcome still depends on the feature extraction method used. For instance, when the handcrafted features are used, better classification results are obtained based on SGN(1) or SGN(2) than on SGN(0) for 3 datasets, while when node2vec is adopted, even better classification performance is achieved based on the 1st-order or 2nd-order SGNs than on the original network for 5 datasets. More interestingly, the classification models based on two networks, i.e., SGN(0+1) and SGN(0+2), perform better than those based on a single network, while the model based on all three networks, i.e., SGN(0+1+2), performs the best in most cases.

The gain in F1-score when all three networks are used together, i.e., SGN(0+1+2), compared with using only the original network, i.e., SGN(0), is defined as the relative difference between the corresponding F1-scores:

    $\mathrm{Gain} = \frac{F_1^{(0+1+2)} - F_1^{(0)}}{F_1^{(0)}}$.  (14)

The gains are also presented in TABLE II, where one can see that the classification performance is indeed significantly improved in all 30 cases. In particular, in 21 cases the gains are larger than 5%, while in 11 cases they are even larger than 10%. These results indicate that the 1st-order and 2nd-order SGNs can indeed complement the original network with additional structural information, thereby benefiting network classification. Surprisingly, it is found that the handcrafted features based on SGN(0+1+2) outperform the automatically generated features obtained by the more advanced network-embedding and graph-kernel methods on 3 of the 6 datasets, i.e., PTC, PROTEINS and IMDB-BINARY. This phenomenon indicates that, compared with automatically generated features, properly chosen traditional structural features have a particular advantage in the proposed framework: they are not only more interpretable, due to their clear physical meanings, but also equally effective in the design of subsequent structure-based algorithms, e.g., for network classification.

Fig. 7: The importance of the handcrafted features in the logistic regression model for network classification using SGN(0), SGN(1) and SGN(2) together on the MUTAG dataset.
Fig. 8: Average F1-score as a function of the training set size (the fraction of samples used for training), for the various feature extraction methods on the different datasets, based on SGN(0), SGN(0+1), SGN(0+2) and SGN(0+1+2), respectively.

In addition, the feature importance for the task of network classification is investigated by using logistic regression. Denote by $w_i$ the coefficient of feature $i$ in the model, and suppose that there are $F$ features in total. Then, the importance of feature $i$ is defined as

    $I_i = \frac{|w_i|}{\sum_{j=1}^{F} |w_j|}$.  (15)

A minimal sketch of this computation follows.
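This importance measure can be read directly off a fitted scikit-learn model (here clf is, e.g., the trained logistic regression from the evaluation sketch in Sec. 4.1):

```python
import numpy as np

def feature_importance(clf):
    """Eq. (15): normalized absolute logistic-regression coefficients."""
    w = np.abs(clf.coef_).ravel()
    return w / w.sum()
```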

Taking MUTAG as an example, the results are visualized in Fig. 7. Overall, the features of SGN(0) are the most important, since they determine 37.68% of the model, while the features of SGN(1) are more important than those of SGN(2), determining 35.01% and 27.31% of the model, respectively. Focusing on single features, it is found that the clustering coefficient, the percentage of leaf nodes, and the average neighbor degree are the top three most important features, together determining more than 50% of the model. Interestingly, it appears that different SGNs address different aspects of the network structure in the classification task. For instance, the most important feature of SGN(1) is the clustering coefficient, while the coefficient of this feature for SGN(0) is zero, since there are no triangles in the networks of the MUTAG dataset. Moreover, the largest eigenvalue of the adjacency matrix and the average degree of SGN(2) are relatively important, while those of SGN(0) and SGN(1) have less effect on the model. These results confirm once again that SGNs indeed complement the original network to achieve better network classification performance.

To assess the robustness of the classification model against variations in the size of the training set, the F1-score is calculated for the network classification task using training sets of various sizes (from 10 to 90 percent, in 10 percent increments). For each size, the training and test sets are randomly divided; this is repeated 500 times and the average result is recorded. The results are shown in Fig. 8 for the various feature extraction methods on the different datasets. It can be seen that the classification results based on SGN(0), SGN(1) and SGN(2) together are always the best, and the results based on SGN(0) and SGN(1) together, or SGN(0) and SGN(2) together, are always better than those based only on the original network SGN(0). This confirms that the results are quite robust to variations in the training set size.

5 Conclusions

In this paper, the concept of subgraph network (SGN) is introduced, along with algorithms developed for constructing the 1st-order and 2nd-order SGNs, which can expand the structural feature space. As a multi-order graph representation method, SGNs of various orders can significantly enrich the structural information and thus help the network feature extraction methods capture various aspects of the network structure. The effectiveness of the 1st-order and 2nd-order SGNs is verified through network classification experiments on six real-world datasets, using handcrafted features as well as features automatically generated by network embedding methods, including node2vec and graph2vec, and by kernel-based methods, including the Weisfeiler-Lehman (WL) and deep WL methods.

The experimental results show that the classification model based on the features of the original network together with the 1st-order and 2nd-order SGNs always performs the best, compared with those based on a single network, whether the original one, the 1st-order or the 2nd-order SGN, or on a pair of them. This demonstrates that SGNs can indeed complement the structural information of the original network and thus benefit subsequent network classification algorithms, no matter which feature extraction method is adopted. More interestingly, it is found that the model based on handcrafted features performs even better than those based on features automatically generated by more advanced methods, such as graph2vec, for most datasets. This finding suggests that, in general, properly chosen structural features with clear physical meanings can be effective in designing structure-based algorithms.

Future research may focus on extracting more types of subgraphs to establish SGNs of higher diversity for both static and temporal networks, so as to capture the network structural information more comprehensively and to support the design of algorithms for network classification and perhaps other tasks as well.

Acknowledgments

This work is partially supported by the National Natural Science Foundation of China (61572439, 11605154, 11505153) and by the Hong Kong Research Grants Council under the GRF Grant CityU11200317.

References

  • [1] M. Walter, C. Chaban, K. Schütze, O. Batistic, K. Weckermann, C. Näke, D. Blazevic, C. Grefen, K. Schumacher, C. Oecking, K. Harter, and J. Kudla, “Visualization of protein interactions in living plant cells using bimolecular fluorescence complementation,” The Plant Journal, vol. 40, no. 3, pp. 428–438, 2004.
  • [2] N. Wale, I. A. Watson, and G. Karypis, “Comparison of descriptor spaces for chemical compound retrieval and classification,” Knowledge and Information Systems, vol. 14, no. 3, pp. 347–375, 2008.
  • [3] D. Nguyen, W. Luo, T. D. Nguyen, S. Venkatesh, and D. Phung, “Learning graph representation via frequent subgraphs,” in Proceedings of the 2018 SIAM International Conference on Data Mining.   SIAM, 2018, pp. 306–314.
  • [4] Q. Xuan, Z.-Y. Zhang, C. Fu, H.-X. Hu, and V. Filkov, “Social synchrony on complex networks,” IEEE transactions on cybernetics, vol. 48, no. 5, pp. 1420–1431, 2018.
  • [5] Q. Xuan, A. Okano, P. Devanbu, and V. Filkov, “Focus-shifting patterns of oss developers and their congruence with call graphs,” in Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering.   ACM, 2014, pp. 401–412.
  • [6] A. Mockus, R. T. Fielding, and J. D. Herbsleb, “Two case studies of open source software development: Apache and mozilla,” ACM Transactions on Software Engineering and Methodology, vol. 11, no. 3, pp. 309–346, 2002.
  • [7] J. Kim and M. Hastak, “Social network analysis: Characteristics of online social networks after a disaster,” International Journal of Information Management, vol. 38, no. 1, pp. 86–96, 2018.
  • [8] C. Fu, M. Zhao, L. Fan, X. Chen, J. Chen, Z. Wu, Y. Xia, and Q. Xuan, “Link weight prediction using supervised learning methods and its application to yelp layered network,” IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 8, pp. 1507–1518, 2018.
  • [9] J. R. Ullmann, “An algorithm for subgraph isomorphism,” Journal of the ACM (JACM), vol. 23, no. 1, pp. 31–42, 1976.
  • [10] G. Balazsi, A.-L. Barabási, and Z. Oltvai, “Topological units of environmental signal processing in the transcriptional regulatory network of escherichia coli,” Proceedings of the National Academy of Sciences, vol. 102, no. 22, pp. 7841–7846, 2005.
  • [11] M. Thoma, H. Cheng, A. Gretton, J. Han, H.-P. Kriegel, A. Smola, L. Song, P. S. Yu, X. Yan, and K. M. Borgwardt, “Discriminative frequent subgraph mining with optimality guarantees,” Statistical Analysis and Data Mining: The ASA Data Science Journal, vol. 3, no. 5, pp. 302–318, 2010.
  • [12] S. Wernicke, “Efficient detection of network motifs,” IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), vol. 3, no. 4, pp. 347–359, 2006.
  • [13] ——, “A faster algorithm for detecting network motifs,” in International Workshop on Algorithms in Bioinformatics.   Springer, 2005, pp. 165–177.
  • [14] R. Rotabi, K. Kamath, J. Kleinberg, and A. Sharma, “Detecting strong ties using network motifs,” in Proceedings of the 26th International Conference on World Wide Web Companion.   International World Wide Web Conferences Steering Committee, 2017, pp. 983–992.
  • [15] L. Kovanen, M. Karsai, K. Kaski, J. Kertész, and J. Saramäki, “Temporal motifs in time-dependent networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2011, no. 11, p. P11005, 2011.
  • [16] Q. Xuan, H. Fang, C. Fu, and V. Filkov, “Temporal motifs reveal collaboration patterns in online task-oriented networks,” Physical Review E, vol. 91, no. 5, p. 052813, 2015.
  • [17] A. Paranjape, A. R. Benson, and J. Leskovec, “Motifs in temporal networks,” in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining.   ACM, 2017, pp. 601–610.
  • [18] C. E. Tsourakakis, J. Pachocki, and M. Mitzenmacher, “Scalable motif-aware graph clustering,” in Proceedings of the 26th International Conference on World Wide Web.   International World Wide Web Conferences Steering Committee, 2017, pp. 1451–1460.
  • [19] H. Wang, P. Zhang, X. Zhu, I. W.-H. Tsang, L. Chen, C. Zhang, and X. Wu, “Incremental subgraph feature selection for graph classification,” IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 1, pp. 128–142, 2017.
  • [20] Y. Jing, Y. Bian, Z. Hu, L. Wang, and X.-Q. S. Xie, “Deep learning for drug design: An artificial intelligence paradigm for drug discovery in the big data era,” The AAPS Journal, vol. 20, no. 3, p. 58, 2018.
  • [21] T. Lane, D. P. Russo, K. M. Zorn, A. M. Clark, A. Korotcov, V. Tkachenko, R. C. Reynolds, A. L. Perryman, J. S. Freundlich, and S. Ekins, “Comparing and validating machine learning models for mycobacterium tuberculosis drug discovery,” Molecular pharmaceutics, 2018.
  • [22] H.-T. Cheng, L. Koc, J. Harmsen, T. Shaked, T. Chandra, H. Aradhye, G. Anderson, G. Corrado, W. Chai, M. Ispir et al., “Wide & deep learning for recommender systems,” in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems.   ACM, 2016, pp. 7–10.
  • [23] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in Advances in Neural Information Processing Systems, 2013, pp. 3111–3119.
  • [24] Q. Le and T. Mikolov, “Distributed representations of sentences and documents,” in International Conference on Machine Learning, 2014, pp. 1188–1196.
  • [25] A. Narayanan, M. Chandramohan, R. Venkatesan, L. Chen, Y. Liu, and S. Jaiswal, “graph2vec: Learning distributed representations of graphs,” arXiv preprint arXiv:1707.05005, 2017.
  • [26] S. V. N. Vishwanathan, N. N. Schraudolph, R. Kondor, and K. M. Borgwardt, “Graph kernels,” Journal of Machine Learning Research, vol. 11, pp. 1201–1242, 2010.
  • [27] N. Shervashidze, P. Schweitzer, E. J. v. Leeuwen, K. Mehlhorn, and K. M. Borgwardt, “Weisfeiler-lehman graph kernels,” Journal of Machine Learning Research, vol. 12, pp. 2539–2561, 2011.
  • [28] P. Yanardag and S. Vishwanathan, “Deep graph kernels,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.   ACM, 2015, pp. 1365–1374.
  • [29] Q. Xuan, B. Fang, Y. Liu, J. Wang, J. Zhang, Y. Zheng, and G. Bao, “Automatic pearl classification machine based on a multistream convolutional neural network,” IEEE Transactions on Industrial Electronics, vol. 65, no. 8, pp. 6538–6547, 2018.
  • [30] D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams, “Convolutional networks on graphs for learning molecular fingerprints,” in Advances in neural information processing systems, 2015, pp. 2224–2232.
  • [31] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, “Spectral networks and locally connected networks on graphs,” arXiv preprint arXiv:1312.6203, 2013.
  • [32] M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural networks on graphs with fast localized spectral filtering,” in Advances in Neural Information Processing Systems, 2016, pp. 3844–3852.
  • [33] R. Li, S. Wang, F. Zhu, and J. Huang, “Adaptive graph convolutional neural networks,” arXiv preprint arXiv:1801.03226, 2018.
  • [34] G. Li, M. Semerci, B. Yener, and M. J. Zaki, “Graph classification via topological and label attributes,” in Proceedings of the 9th international workshop on mining and learning with graphs (MLG), San Diego, USA, vol. 2, 2011.
  • [35] B. Adhikari, Y. Zhang, N. Ramakrishnan, and B. A. Prakash, “Distributed representations of subgraphs,” in Data Mining Workshops (ICDMW), 2017 IEEE International Conference on.   IEEE, 2017, pp. 111–117.
  • [36] A. Grover and J. Leskovec, “node2vec: Scalable feature learning for networks,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.   ACM, 2016, pp. 855–864.
  • [37] T. Lyu, Y. Zhang, and Y. Zhang, “Enhancing the network embedding quality with structural similarity,” in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management.   ACM, 2017, pp. 147–156.
  • [38] S. Agarwal, K. Branson, and S. Belongie, “Higher order learning with graphs,” in Proceedings of the 23rd international conference on Machine learning.   ACM, 2006, pp. 17–24.
  • [39] J.-P. Eckmann and E. Moses, “Curvature of co-links uncovers hidden thematic layers in the world wide web,” Proceedings of the national academy of sciences, vol. 99, no. 9, pp. 5825–5829, 2002.
  • [40] D. Schiöberg, F. Schneider, S. Schmid, S. Uhlig, and A. Feldmann, “Evolution of directed triangle motifs in the google+ osn,” arXiv preprint arXiv:1502.04321, 2015.
  • [41] A. K. Debnath, R. L. Lopez de Compadre, G. Debnath, A. J. Shusterman, and C. Hansch, “Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. correlation with molecular orbital energies and hydrophobicity,” Journal of medicinal chemistry, vol. 34, no. 2, pp. 786–797, 1991.
  • [42] H. Toivonen, A. Srinivasan, R. D. King, S. Kramer, and C. Helma, “Statistical evaluation of the predictive toxicology challenge 2000–2001,” Bioinformatics, vol. 19, no. 10, pp. 1183–1193, 2003.
  • [43] K. M. Borgwardt, C. S. Ong, S. Schönauer, S. Vishwanathan, A. J. Smola, and H.-P. Kriegel, “Protein function prediction via graph kernels,” Bioinformatics, vol. 21, pp. i47–i56, 2005.
  • [44] T. Joachims, T. Hofmann, Y. Yue, and C.-N. Yu, “Predicting structured objects with support vector machines,” Communications of the ACM, vol. 52, no. 11, pp. 97–104, 2009.
  • [45] T. Kudo, E. Maeda, and Y. Matsumoto, “An application of boosting to graph classification,” in Advances in neural information processing systems, 2005, pp. 729–736.
  • [46] X. Zhao, B. Zong, Z. Guan, K. Zhang, and W. Zhao, “Substructure assembling network for graph classification,” 2018.
  • [47] Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, “A neural probabilistic language model,” Journal of machine learning research, vol. 3, pp. 1137–1155, 2003.
  • [48] F. Harary and R. Z. Norman, “Some properties of line digraphs,” Rendiconti del Circolo Matematico di Palermo, vol. 9, no. 2, pp. 161–168, 1960.