A simple baseline algorithm for graph classification
Graph classification has recently received a lot of attention from various fields of machine learning, e.g., kernel methods, sequential modeling, and graph embedding. All these approaches offer promising results with different respective strengths and weaknesses. However, most of them rely on complex mathematics and require heavy computational power to achieve their best performance. We propose a simple and fast algorithm based on the spectral decomposition of the graph Laplacian to perform graph classification and get a first reference score for a dataset. We show that this method obtains competitive results compared to state-of-the-art algorithms.
Graph classification methods can schematically be divided into three categories: graph kernels, sequential methods and embedding methods. In this section, we briefly present these different approaches, focusing on methods that only use the structure of the graph and no exogenous information, such as node features, to perform classification as we only want to compare the capacity of the algorithms to capture structural information.
Kernel methods [16, 17, 15, 14] perform pairwise comparisons between the graphs of the dataset and apply a classifier, usually a support vector machine (SVM), on the similarity matrix. To keep the number of comparisons tractable when the number of graphs is large, they often use the Nyström algorithm [22] to compute a low-rank approximation of the similarity matrix. The key is to construct an efficient kernel that can be applied to graphs of varying sizes and captures useful features for the downstream classification.

Some methods tackle the varying sizes of graphs by processing them as a sequence of nodes. The earliest models used random-walk-based representations [4, 23]. More recently, [8] and [24] transform a graph into a sequence of fixed-size vectors, corresponding to its nodes, which is fed to a recurrent neural network. The two main challenges in this approach are the design of the embedding function for the nodes and the order in which the embeddings are given to the recurrent neural network.
Embedding methods [7, 1, 6, 13] derive a fixed number of features for each graph, which are used as a vector representation for classification. Even though deriving a good set of features is often a difficult task, this approach has the benefit of being compatible with any standard classifier in a plug-and-play fashion (SVM, random forest, multilayer perceptron…). Our model belongs to this class of methods as we rely on spectral features of the graph.
Let $G = (V, E)$ be an undirected and unweighted graph and $A$ its boolean adjacency matrix with respect to an arbitrary indexing of the nodes. $G$ is assumed to be connected; otherwise, we extract its largest connected component. Let $D$ be the diagonal matrix of node degrees. The normalized Laplacian $L$ of $G$ is defined as

$$L = I - D^{-1/2} A D^{-1/2} \qquad (1)$$

We represent $G$ by the $k$ smallest positive eigenvalues of $L$, sorted in ascending order, where $k$ is the embedding dimension. If the graph has fewer than $k$ nodes, we use right zero padding to get a vector of appropriate dimensions: $(\lambda_1, \dots, \lambda_m, 0, \dots, 0) \in \mathbb{R}^k$, where $\lambda_1 \le \dots \le \lambda_m$ are the positive eigenvalues of $L$. We denote this embedding as spectral features (SF).

The normalized Laplacian matrix of a graph is a well-known object in spectral learning [2, 9]. However, for node clustering or classification, most of the attention is usually directed to its eigenvectors and not its spectrum. A major benefit of the ordered spectrum representation for graph classification is that it does not depend on the indexing of the nodes.
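The embedding described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function name `spectral_features`, the tolerance `tol` used to discard the zero eigenvalue, and the toy graph are assumptions, and the input is assumed to be the adjacency matrix of a connected graph with at least two nodes.

```python
import numpy as np

def spectral_features(adjacency, k, tol=1e-9):
    """Return the k smallest positive eigenvalues of the normalized
    Laplacian in ascending order, right-padded with zeros.

    Assumes a connected graph, so every node degree is positive.
    `tol` is an illustrative threshold for dropping the zero eigenvalue.
    """
    A = np.asarray(adjacency, dtype=float)
    inv_sqrt_deg = 1.0 / np.sqrt(A.sum(axis=1))
    # L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(len(A)) - inv_sqrt_deg[:, None] * A * inv_sqrt_deg[None, :]
    eigenvalues = np.linalg.eigvalsh(L)  # sorted in ascending order
    positive = eigenvalues[eigenvalues > tol][:k]
    features = np.zeros(k)
    features[:len(positive)] = positive
    return features

# Toy example: a path on three nodes, whose spectrum is {0, 1, 2}.
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])
features = spectral_features(path, k=4)

# Relabelling the nodes permutes rows and columns of the adjacency
# matrix but leaves the ordered spectrum unchanged.
perm = [2, 0, 1]
relabelled = path[np.ix_(perm, perm)]
same = np.allclose(features, spectral_features(relabelled, k=4))
```

On this toy graph the positive eigenvalues are 1 and 2, so the vector is padded with two zeros, and the relabelled graph yields the same features.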
The eigenvalues of the normalized Laplacian matrix lie between $0$ and $2$. Such a property is very convenient for the downstream use of a standard classifier without heavy rescaling or preprocessing. The multiplicity of the eigenvalue $0$ corresponds to the number of connected components in the graph, hence the omission of $0$ in our representation, as we only consider the largest connected component. Other values are also known to denote the presence of specific structures in the graph [5]. For example, an eigenvalue equal to $2$ denotes a bipartite structure.
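These spectral properties are easy to check numerically on small graphs. The following sketch is illustrative only: the helper names and the three toy graphs are assumptions chosen for the example, and the graphs contain no isolated node.

```python
import numpy as np

def normalized_laplacian_spectrum(adjacency):
    """Eigenvalues of I - D^{-1/2} A D^{-1/2}, in ascending order.
    Assumes no isolated node (all degrees positive)."""
    A = np.asarray(adjacency, dtype=float)
    inv_sqrt_deg = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(len(A)) - inv_sqrt_deg[:, None] * A * inv_sqrt_deg[None, :]
    return np.linalg.eigvalsh(L)

def summarize(adjacency, tol=1e-9):
    spectrum = normalized_laplacian_spectrum(adjacency)
    return {
        "in_range": bool(spectrum.min() > -tol and spectrum.max() < 2 + tol),
        "zero_multiplicity": int(np.sum(np.abs(spectrum) < tol)),
        "has_eigenvalue_two": bool(np.any(np.abs(spectrum - 2.0) < tol)),
    }

triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])   # connected, not bipartite
square = np.array([[0, 1, 0, 1], [1, 0, 1, 0],
                   [0, 1, 0, 1], [1, 0, 1, 0]])           # connected, bipartite
two_edges = np.array([[0, 1, 0, 0], [1, 0, 0, 0],
                      [0, 0, 0, 1], [0, 0, 1, 0]])        # two components

results = {name: summarize(graph) for name, graph in
           [("triangle", triangle), ("square", square), ("two_edges", two_edges)]}
```

The triangle has a single zero eigenvalue and no eigenvalue equal to 2, the 4-cycle has an eigenvalue equal to 2 because it is bipartite, and the two disjoint edges give a zero eigenvalue of multiplicity two.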
In [3], each eigenvalue of the Laplacian corresponds to the energy level of a stable configuration of the nodes in the embedding space. The lower the energy, the more stable the configuration. In [20], these eigenvalues correspond to frequencies associated with a Fourier decomposition of any signal living on the vertices of the graph. Thus, truncating the Fourier decomposition acts as a low-pass filter on the signal. Characterizing a graph by the smallest eigenvalues of its normalized Laplacian is thus comparable to characterizing a melody by its lowest fundamental frequencies.
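To make the Fourier interpretation concrete, here is a minimal sketch of a low-pass filter obtained by truncating the decomposition of a signal on the eigenvectors of the normalized Laplacian. It is not part of the proposed method, which only uses eigenvalues. The function name `graph_low_pass`, the parameter `n_kept` and the toy signal are assumptions made for illustration.

```python
import numpy as np

def graph_low_pass(adjacency, signal, n_kept):
    """Project `signal` onto the n_kept lowest-frequency eigenvectors
    of the normalized Laplacian (assumes no isolated node)."""
    A = np.asarray(adjacency, dtype=float)
    inv_sqrt_deg = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(len(A)) - inv_sqrt_deg[:, None] * A * inv_sqrt_deg[None, :]
    # Columns of `basis` are eigenvectors sorted by ascending eigenvalue,
    # i.e. by increasing frequency.
    frequencies, basis = np.linalg.eigh(L)
    coefficients = basis.T @ np.asarray(signal, dtype=float)
    coefficients[n_kept:] = 0.0  # drop the high-frequency components
    return basis @ coefficients

# Cycle on four nodes with one value per node.
cycle = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]])
signal = np.array([1.0, 2.0, 3.0, 4.0])

smooth = graph_low_pass(cycle, signal, n_kept=1)
unchanged = graph_low_pass(cycle, signal, n_kept=4)
```

On this regular graph the lowest-frequency eigenvector is constant, so keeping a single frequency replaces every value by the mean of the signal, while keeping all four frequencies returns the signal unchanged.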
Finally, there have been some attempts to connect spectral decomposition to graph isomorphism [21, 11]. However, to the best of our knowledge, this is still an open problem.
The choice of the classifier is left to the discretion of the user. In our experiments, we chose a random forest classifier (RFC) which offers a good computational speed versus accuracy tradeoff. Results with several other common classifiers are displayed in appendix A.
An illustration of the model is proposed in figure 1.
We evaluated our model on some standard datasets from biology: Mutag (MT), Predictive Toxicology Challenge (PTC), Enzymes (EZ), Proteins Full (PF), Dobson and Doig (DD) and National Cancer Institute (NCI1) [10]. All graphs represent chemical compounds. Nodes are molecular substructures (typically atoms) and edges represent connections between these substructures (chemical bond or spatial proximity). In MT, the compounds are either mutagenic or not mutagenic, while in PTC, they are either carcinogens or not. EZ contains tertiary structures of proteins from the 6 Enzyme Commission top-level classes. In DD, graphs represent secondary structures of proteins being either enzyme or not enzyme. PF is a subset of DD where the largest graphs have been removed. In NCI1, compounds have either an anti-cancer activity or not. Statistics about the graphs are presented in table 1.
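
|  | MT | PTC | EZ | PF | DD | NCI1 |
|---|---|---|---|---|---|---|
| graphs | 188 | 344 | 600 | 1113 | 1178 | 4110 |
| classes | 2 | 2 | 6 | 2 | 2 | 2 |
| bias (%) | 66.5 | 55.8 | 16.7 | 59.6 | 58.7 | 50.0 |
| avg. nodes (V) | 18 | 14 | 33 | 39 | 284 | 30 |
| avg. edges (E) | 39 | 15 | 124 | 146 | 1431 | 65 |
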
Each dataset is divided into 10 folds such that the class proportions are preserved in each fold. These folds are then used for cross-validation, i.e., one fold serves as the testing set while the others compose the training set. Results are averaged over all testing sets. We built the folds using the scikit-learn [18] StratifiedKFold function with a fixed random seed in order to get reproducible results.
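The fold construction can be sketched as follows. The labels, the placeholder feature matrix, `shuffle=True` and the seed value `0` are assumptions made for illustration. Note that scikit-learn only accepts a `random_state` for `StratifiedKFold` when shuffling is enabled.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical dataset: 100 graphs, 60 of class 0 and 40 of class 1.
labels = np.array([0] * 60 + [1] * 40)
features = np.zeros((len(labels), 5))  # placeholder embeddings

# The seed value is illustrative; any fixed value gives reproducible folds.
splitter = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
folds = list(splitter.split(features, labels))

# Each testing fold keeps the 60/40 class proportions of the full dataset.
test_class_counts = [np.bincount(labels[test_idx], minlength=2)
                     for _, test_idx in folds]
```

With these class sizes, every testing fold contains six graphs of class 0 and four graphs of class 1, and each graph appears in exactly one testing fold.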
The embedding dimension is set to the average number of nodes for each dataset (see appendix B for additional experiments) and a unique set of hyperparameters for the classifier is used for all datasets. We used the random forest classifier from scikit-learn with class_weight set to balanced. The other non-default hyperparameters were selected by randomized cross-validation over the different datasets (see table 5 for more details). We also conducted experiments to ensure the robustness of our model with respect to some of its hyperparameters; see appendix C for more details. All experiments were run on a laptop equipped with an Intel Core i7 vPro processor and 16GB of RAM.
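A minimal sketch of this evaluation protocol is given below, using synthetic features in place of real spectral embeddings. Only `class_weight="balanced"` comes from the text. The number of trees, the seeds and the synthetic data are assumptions, and the tuned hyperparameters of the paper are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for spectral features: 100 graphs, 5 features each,
# with class 1 shifted so that the two classes are separable.
labels = np.array([0] * 60 + [1] * 40)
features = rng.uniform(0.0, 1.0, size=(100, 5))
features[labels == 1] += 0.5

classifier = RandomForestClassifier(
    n_estimators=50,          # illustrative value, not the tuned one
    class_weight="balanced",  # reweights classes by inverse frequency
    random_state=0,
)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# One accuracy per testing fold; the reported score is their mean.
scores = cross_val_score(classifier, features, labels, cv=cv)
mean_accuracy = scores.mean()
```

The same classifier object can be reused across datasets, which mirrors the choice of a single set of hyperparameters for all experiments.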
We compare our results (RFC) to those obtained by Earth Mover’s Distance [17] (EMD), Pyramid Match [17] (PM), Feature-Based [1] (FB), Dynamic-Based Features [7] (DyF) and Stochastic Graphlet Embedding [6] (SGE). All values are directly taken from the aforementioned papers as they used a setup similar to ours. For algorithms presenting results with and without node features, we reported the results without node features. For algorithms presenting results with several sets of hyperparameters, we reported the results for the set of parameters that gave the best performance on the largest number of datasets. Results are reported in table 2.
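
|  | MT | PTC | EZ | PF | DD | NCI1 |
|---|---|---|---|---|---|---|
| EMD | 86.1 | 57.7 | 36.8 | - | - | 72.7 |
| PM | 85.6 | 59.4 | 28.2 | - | 75.6 | 69.7 |
| FB | 84.7 | 55.6 | 29.0 | 70.0 | - | 62.9 |
| DyF | 86.3 | 56.2 | 26.6 | 73.1 | - | 66.6 |
| SGE | 87.2 | 60.0 | 40.7 | - | 76.6 | - |
| SF + RFC | 88.4 | 62.8 | 43.7 | 73.6 | 75.4 | 75.2 |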

We see that our model achieves good performance compared to the state of the art. It gives the best result on five of the six datasets (MT, PTC, EZ, PF, NCI1). Moreover, it did not require any intensive per-dataset hyperparameter tuning, as we used the same random forest for all datasets.
The results were obtained extremely quickly (some kernel methods cannot run within one day on DD for example [6]). Embedding all graphs took approximately minutes (most of it dedicated to DD which has the largest graphs and largest embedding dimension), while training and testing the random forest on all folds took less than a minute. Hence, the total time to run all described experiments was less than minutes.
We experimentally showed the value of normalized Laplacian eigenvalues for graph classification. This feature is easy to extract and can be combined with any other graph representation in order to improve model performance. We hope it will inspire new approaches to graph classification. Experimenting with permutation-invariant classifiers [12, 19] could be a natural continuation of this work in order to properly include information from the eigenvectors of the normalized Laplacian, which depend on the node indexing.
We would like to thank Thomas Bonald, Sebastien Razakarivony and all the anonymous reviewers for their comments and help. This work is supported by the company Safran through the CIFRE convention 2017/1317.
International Joint Conference of Artificial Intelligence.

Stanford InfoLab, 2003.

Pointnet: Deep learning on point sets for 3d classification and segmentation. Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 1(2):4, 2017.

Besides RFC, we experimented with different standard classifiers combined with our spectral embedding, namely: nearest neighbors classifier (NNC, with 1 and 15 neighbors), two-layer perceptron with ReLU non-linearity (MLP), support vector machine with one-versus-one classification (SVM) and ridge regression classifier (RRC). Results are reported in table 3.

|  | MT | PTC | EZ | PF | DD | NCI1 |
|---|---|---|---|---|---|---|
| SF + RFC | 88.4 | 62.8 | 43.7 | 73.6 | 75.4 | 75.2 |
| SF + 1NNC | 86.8 | 59.3 | 37.3 | 65.6 | 69.6 | 68.3 |
| SF + 15NNC | 85.7 | 61.9 | 33.7 | 70.4 | 75.0 | 69.6 |
| SF + MLP | 86.3 | 60.5 | 31.8 | 71.6 | 75.6 | 62.3 |
| SF + SVM | 85.3 | 60.8 | 31.3 | 73.0 | 75.0 | 63.9 |
| SF + RRC | 84.2 | 59.6 | 26.7 | 71.5 | 75.0 | 62.2 |

As we can see, RFC provides the best results for all datasets except DD, where MLP has an accuracy of 75.6 against 75.4. Our intuition to explain these good results is that the decision tree classifier, which is at the core of RFC, is an algorithm based on level thresholding. As explained in section 2, our embedding represents a sequence of energy levels; being above or below a certain level is thus likely to be meaningful for classification.

We experimented with different embedding dimensions for RFC: 1, 5, 10, 25 and 50. The hyperparameters are the same as in section 3. Results are reported in table 4.

| Embedding dimension | MT | PTC | EZ | PF | DD | NCI1 |
|---|---|---|---|---|---|---|
| 1 | 76.2 | 56.1 | 23.8 | 64.0 | 57.2 | 58.2 |
| 5 | 86.8 | 62.5 | 39.0 | 69.6 | 73.9 | 72.5 |
| 10 | 86.8 | 61.4 | 42.8 | 71.7 | 75.5 | 75.5 |
| 25 | 88.4 | 62.8 | 42.7 | 72.8 | 75.7 | 75.2 |
| 50 | 88.4 | 62.8 | 43.7 | 73.6 | 75.1 | 75.2 |

We see that even the first energy level is sufficient to obtain a nontrivial classification. provides results competitive with the state of the art while provides results relatively similar to . We did not experiment with larger values of as it would mostly result into additional zero padding for most graphs. Note that embedding all graphs for took less than a minute in our experimental setting.
In order to confirm the intrinsic quality of our spectral graph representation, we performed a robustness analysis of our model with respect to the classifier. To do so, we measured the marginal variation of accuracy with respect to some hyperparameters, the others being fixed. To ensure that we only capture parameter sensitivity, we fixed the seed of the random forest for all experiments. See table 5 for the parameter grid and figure 2 for the results.
We see that our method is very robust to variations in the RFC hyperparameters. Outliers in the boxplots are all due to highly improper parameter values.

| RFC hyperparameters | Hyperparameters grid |
|---|---|
|  | 1, 10, 50, 100, 250, 500, 750, 1000 |
|  | 1, 2, 3, 4, 5, 6 |
|  | 1, 5, 10, 50, 100, 250, 500, 750, 1000 |
|  | True, False |