This is the implementation of the CULP classification algorithm, introduced in the paper `Classification Using Link Prediction`.
Link prediction in a graph is the problem of detecting the missing links that would be formed in the near future. Using a graph representation of the data, we can convert the problem of classification to the problem of link prediction, which aims at finding the missing links between the unlabeled data (unlabeled nodes) and their classes. To our knowledge, although numerous algorithms use the graph representation of the data for classification, none uses link prediction as the heart of its classifying procedure. In this work, we propose a novel algorithm called CULP (Classification Using Link Prediction) which uses a new structure, namely the Label Embedded Graph or LEG, and a link predictor to find the class of the unlabeled data. Different link predictors, along with Compatibility Score (a new link predictor we propose that is designed specifically for our setting), have been used and show promising results for classifying different datasets. This paper further improves CULP by designing an extension called CULM, which uses a majority vote procedure (hence the M in the acronym) with weights proportional to the predictions' confidences to use the predictive power of multiple link predictors, and which also exploits the low level features of the data. Extensive experimental evaluations show that both CULP and CULM are highly accurate and competitive with cutting edge graph classifiers and general classifiers.
Classification is an old problem in machine learning and pattern recognition that aims at finding a correct mapping between data and their corresponding labels. This mapping would then be used to derive the class of the unlabeled data.
This field is still highly active in the literature and a lot of algorithms have been proposed to correctly classify the data. Most of the classification algorithms aim at finding a decision boundary in the feature space for distinguishing the data belonging to different classes; however, as more complex data require more complex algorithms, these approaches could fail or not capture the true relations in the data.
One of the new approaches that has recently gained popularity in the literature is the classification of unlabeled instances using a graph representation of the data. Data can be represented in different forms, one of which is a graph. In this setting, the data is first converted to a graph via a similarity function in the feature space, then the unlabeled data is classified by incorporating a graph property. These graph properties are called high level features, and they give more insight into the data than the low level features.
Classification using graph representation has been studied extensively in numerous works. These works use graph properties such as clustering coefficient, modularity, importance, PageRank and others to classify the unlabeled data, and they tend to achieve more accurate results than classifiers that rely only on the low level features of the data. This approach has been used in text classification, hyperspectral image classification, image classification, handwritten digit recognition and other areas.
Link prediction is the problem of predicting the missing links that will be formed in a graph in the near future. Using the graph representation of the data, we can treat classification as a link prediction problem in an intuitive way, where we try to find the link between an unlabeled node and its corresponding class. To our knowledge, there is no work in the literature that uses link prediction to solve the problem of classification; however, the use of classification to solve link prediction has been studied extensively.
In this work, we propose an algorithm called CULP (an acronym for Classification Using Link Prediction) that takes a different look at the classification problem through a link prediction approach. As we will elaborate in the paper, CULP uses a graph called LEG that models the data in an intuitive way that is suitable for link prediction.
Any link predictor can be used to derive the class of an unlabeled node in CULP, and we propose a new local measure called Compatibility Score that is designed to improve the accuracy of link prediction and, consequently, of classification.
As much insight as high level features provide for capturing the patterns present in the data, exploiting the low level features alongside them would further improve the predictive power of a graph classifier, and different researchers have incorporated this idea in their work. This is why we further improve CULP and propose the CULM extension: a majority vote system (hence the M in the acronym) with weights proportional to the probabilities of the predictions. This extension uses multiple link predictors along with a low level classifier. As we will see, both the CULP and CULM algorithms produce highly accurate results which are competitive with low level classifiers and other graph based classification methods.
The rest of the paper is organized as follows. The next section reviews the general domains used in this paper; it is a preliminary section covering the problem of link prediction, similarity measures in vector space, methods of converting data to a graph and the problem of classification. After that, a section on related works summarizes recent works that use the graph representation of the data for classification. Next, the CULP algorithm is presented in full detail, covering the LEG (Label Embedded Graph) structure, the classification procedure which uses link prediction, our novel link predictor (Compatibility Score), the time complexity and a toy example to demonstrate CULP. Finally, the CULM extension is presented, followed by our extensive experimental results to put our proposed algorithms into perspective. At the end, the conclusion of the paper and the aims for future work are presented.
To fully understand CULP, the concepts on which this algorithm is built should first be laid out. In this section, a general review of graph theory concepts and notations is given, along with the definition of the link prediction problem in complex networks. After that, an overview of some of the most important similarity measures is presented, followed by a discussion of the different ways of converting data to a graph. Finally, at the end of this section, the problem of classification is defined.
Given a set of vertices $V$ and a set of edges $E$ containing pairs $(u, v)$ where $u, v \in V$, the data structure $G = (V, E)$ can be defined as a graph. If the elements in $E$ are ordered pairs, $G$ is considered to be a directed graph. In an undirected graph, if $(u, v) \in E$ it is implied that $(v, u) \in E$. Regardless of the directionality of the graph, node $v$ is a neighbor of node $u$ if $(u, v) \in E$. For a node $u$, $\Gamma(u)$ is the set of the neighbor nodes of $u$.
For the graph $G$, the adjacency matrix $A_G$, or simply $A$, is defined as an $n \times n$ matrix with zero-one elements and $n = |V|$. For any entry $A_{uv}$ in $A$, $A_{uv} = 1$ if and only if $(u, v) \in E$. In an undirected graph, by definition $A = A^T$. As our focus in this paper is on undirected graphs, for the sake of simplicity we use the word graph to mean an undirected graph.
The degree of a node $u$ in a graph can be derived using $k_u = |\Gamma(u)|$. For any graph, the cardinality of the edge set, $|E|$, can be obtained by summing over the degrees of all nodes using Equation 1.

$$\sum_{u \in V} k_u = 2|E| \qquad (1)$$
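The degree-sum relation described above can be checked on a small example. The following is a minimal Python sketch, assuming the graph is stored as a dictionary mapping each node to the set of its neighbors; the names `adj`, `degree` and `edge_count` are illustrative and do not come from the paper or its repository.

```python
# Undirected graph stored as an adjacency dictionary: node -> set of neighbors.
# Edges of this toy graph: a-b, a-c, b-c, c-d.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}


def degree(adj, u):
    """Degree of node u: the size of its neighbor set."""
    return len(adj[u])


def edge_count(adj):
    """Number of undirected edges: each edge is counted twice in the degree sum."""
    return sum(degree(adj, u) for u in adj) // 2


print(edge_count(adj))
```

The same adjacency-dictionary layout is convenient for the local link prediction scores, since those only need neighbor sets.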
The problem of link prediction in a graph arises when the goal is to predict, for the currently absent links (zero entries in the adjacency matrix), the probability of link formation in the future. There are many functions for computing link prediction scores. These functions usually compute a local similarity between two nodes to derive the score. One of the simplest techniques is known as common neighbors (CN). Using this approach, the prediction score for two nodes $u$ and $v$, where $\Gamma(u)$ denotes the set of neighbors of $u$, can be derived using the following:

$$CN(u, v) = |\Gamma(u) \cap \Gamma(v)| \qquad (2)$$

Equation 2 simply counts the number of common neighbors of nodes $u$ and $v$ to derive a score for their link formation.
Another approach to finding the link formation score was introduced by Adamic and Adar. It uses the degrees of the common neighbors as features for prediction and, with $\Gamma(z)$ denoting the set of neighbors of node $z$, it can be written as

$$AA(u, v) = \sum_{z \in \Gamma(u) \cap \Gamma(v)} \frac{1}{\log |\Gamma(z)|} \qquad (3)$$

Equation 3 is known as the Adamic-Adar score (AA). This score penalizes each common neighbor by the logarithm of its degree. Another well-known approach to the problem of link prediction is the Resource Allocation Index (RA), which simulates the transition of resources between nodes $u$ and $v$. This index is defined as Equation 4.

$$RA(u, v) = \sum_{z \in \Gamma(u) \cap \Gamma(v)} \frac{1}{|\Gamma(z)|} \qquad (4)$$

This index is quite similar to AA; however, it omits the logarithm, so high degree common neighbors are penalized more heavily than in AA. In a lot of networks, these nodes provide little insight for link prediction as they are connected to a lot of other nodes in the graph.
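The three local scores discussed in this segment can be sketched in a few lines of Python. This is a minimal illustration and not the code from the CULP repository; it assumes an undirected graph stored as a dictionary from node to neighbor set, and the function names and the toy graph are hypothetical.

```python
import math

# Toy undirected graph (illustrative): a and b share the neighbors c and d.
adj = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"d"},
}


def common_neighbors(adj, u, v):
    """CN: number of nodes adjacent to both u and v."""
    return len(adj[u] & adj[v])


def adamic_adar(adj, u, v):
    """AA: each common neighbor z contributes 1 / log(degree of z).

    A common neighbor of two distinct nodes has degree >= 2, so the log is positive.
    """
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])


def resource_allocation(adj, u, v):
    """RA: each common neighbor z contributes 1 / (degree of z)."""
    return sum(1.0 / len(adj[z]) for z in adj[u] & adj[v])


print(common_neighbors(adj, "a", "b"))
print(adamic_adar(adj, "a", "b"))
print(resource_allocation(adj, "a", "b"))
```

In this toy graph the common neighbor `d` has degree 3 while `c` has degree 2, so `d` contributes less than `c` under both AA and RA, and the gap is larger under RA.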
In this work, we propose a new similarity function for the purpose of link prediction, called Compatibility Score, which is discussed further in the paper.
Any data point $x$ with $d$ numeric features, where $x \in \mathbb{R}^d$, can be regarded as a vector in a $d$-dimensional space. This view enables the measurement of the similarities between data points using conventional similarity measures. As we are going to utilize a similarity measure in converting our data to a graph (discussed in the next segment), we provide an overview of some of these measures.
Having our data matrix $X$, with $n$ rows and $d$ columns and with each row being a data vector, the Cosine similarity between two rows $x_i$ and $x_j$ can be defined as the following:

$$\cos(x_i, x_j) = \frac{x_i \cdot x_j}{\|x_i\| \, \|x_j\|}$$

where $\|x\|$ denotes the Euclidean norm of the vector $x$, which is derived by the following:

$$\|x\| = \sqrt{\sum_{l=1}^{d} x_l^2}$$

Following the above equation, the Euclidean distance between any two $d$-dimensional vectors can be written as:

$$d_E(x_i, x_j) = \|x_i - x_j\| = \sqrt{\sum_{l=1}^{d} (x_{il} - x_{jl})^2}$$

Utilizing the Euclidean distance, another similarity measure, namely Inverse Euclidean, can be defined using:

$$sim_{IE}(x_i, x_j) = \frac{1}{\epsilon + d_E(x_i, x_j)} \qquad (7)$$

In Equation 7 the term $\epsilon$ is a small number used to avoid division by zero in the case of identical vectors. Another prominent distance in linear algebra is what is known as the absolute or Manhattan distance (Equation 8), and by substituting Equation 8 for the Euclidean distance in Equation 7, the Inverse Manhattan similarity function is defined.

$$d_M(x_i, x_j) = \sum_{l=1}^{d} |x_{il} - x_{jl}| \qquad (8)$$
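The similarity measures of this segment can be written directly from their definitions. The sketch below uses only the Python standard library; the function names and the default value of `eps` are assumptions made for illustration, not values taken from the paper.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between x and y (undefined for an all-zero vector)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def euclidean_distance(x, y):
    """Euclidean norm of the difference x - y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan_distance(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def inverse_similarity(distance, x, y, eps=1e-9):
    """Turn a distance into a similarity; eps avoids division by zero (assumed value)."""
    return 1.0 / (eps + distance(x, y))


x, y = [1.0, 0.0], [4.0, 4.0]
print(cosine_similarity(x, y))
print(inverse_similarity(euclidean_distance, x, y))
print(inverse_similarity(manhattan_distance, x, y))
```

Passing the distance function as an argument mirrors the substitution described above: the Inverse Manhattan similarity is obtained by swapping the distance while keeping the same inverse form.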
Any vector based data can be represented as a graph. Doing this changes the structure of the data, which enables us to compute high level features.
Two of the most used procedures for converting data to a graph are the $\epsilon$-radius and $k$NN methods.
Using a similarity measure (e.g. the cosine similarity discussed in the previous segment) and the data matrix $X$, we can use either of these two algorithms to convert the data into a graph. In $\epsilon$-radius, an edge is created between every pair of data points that have a similarity higher than a predefined threshold $\epsilon$. Another approach is to use $k$-nearest neighbors to form the graph: if $x_j$ (based on a similarity measure) is among the $k$ nearest neighbors of $x_i$, the edge $(x_i, x_j)$ is created.
Because the $k$NN relation is not symmetric, this approach generally results in a directed graph. However, the same principle can be used to create an undirected graph, as in Algorithm 1. Using this approach, if $X$ has $n$ instances, the number of undirected edges $|E|$ in the created graph is bounded by $\frac{nk}{2} \le |E| \le nk$. CULP uses an undirected $k$NN modeling of the data for the task of classification.
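An undirected nearest-neighbor graph of this kind can be sketched as follows. This is not a transcription of Algorithm 1, which is not reproduced here; it is a minimal brute-force version that assumes every pairwise similarity is computed and that ties are broken by index order. The names `undirected_knn_graph` and `inverse_euclidean` are illustrative.

```python
import math


def inverse_euclidean(x, y, eps=1e-9):
    """Inverse Euclidean similarity; eps is an assumed small constant."""
    return 1.0 / (eps + math.dist(x, y))


def undirected_knn_graph(points, k, similarity):
    """Connect each point to its k most similar points, ignoring edge direction."""
    n = len(points)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        # Rank all other points by similarity to point i, most similar first.
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: similarity(points[i], points[j]),
            reverse=True,
        )
        for j in ranked[:k]:
            # Adding both directions makes the resulting graph undirected.
            adj[i].add(j)
            adj[j].add(i)
    return adj


points = [[0.0], [1.0], [3.0], [10.0]]
graph = undirected_knn_graph(points, k=1, similarity=inverse_euclidean)
num_edges = sum(len(neighbors) for neighbors in graph.values()) // 2
print(graph, num_edges)
```

With four points and one neighbor each, the edge count must lie between 2 (all choices mutual) and 4 (no choice mutual); this example lands in between because only the pair of points 0 and 1 choose each other.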
Suppose there are two sets of data. The first is $X_L$, with $n$ instances and $d$ features for each instance, which is the set of our labeled data. The labels of $X_L$ are denoted by $Y$, where $y_i \in \{1, \dots, c\}$ with $c$ being the number of classes. Each pair $(x_i, y_i)$ makes up our training data. The other set of data is $X_U$, with $m$ instances and again $d$ features for each instance, which are the unlabeled or test data.
The classification problem aims at finding a mapping $x \mapsto y$ for every $x \in X_U$. In other words, we are trying to find a proper label for each of the unlabeled instances in $X_U$. If $c = 2$, this is called binary classification and if $c > 2$, the problem is called multi-class classification.
Classifiers like $k$NN or Decision Tree can naturally handle multi-class classification problems; however, some classifiers like SVM are inherently designed for the binary classification task, and upgrading them to handle multi-class classification requires using the One vs. All or One vs. One approaches.
In one vs. all, $c$ classifiers are trained and each classifier has the task of deciding whether an instance belongs to a particular class or not. The one vs. one approach is done by training $\frac{c(c-1)}{2}$ classifiers, each of which classifies an instance into one of two classes among all of the classes.
Graph based classification has recently gained popularity and numerous works ([2, 3, 4, 5, 6, 7, 8]) focus on using this approach instead of the classical methods of classification. These methods can capture complex patterns in the data and they can generate high level features to guide the classification procedure; furthermore, they can usually be modified to utilize the low level features of the data as well.
In  a random walker is used to classify unlabeled instances on the graph embedding of the data. This graph is represented by a weight matrix of similarities. The random walk process is continued until convergence, and the new data receives its label through a weighted majority vote between the labels of the top nodes with the highest probabilities. This method takes the similarity among the data points into account with a single network for the dataset, along with the structural changes caused by an unlabeled instance on the networks created for each class. The complexity of the method is of , however, as the authors claim, using sparse representations such as a $k$NN network and a graph construction method based on Lanczos bisection , this complexity can be reduced to a complexity between and .
Another system is proposed in  in which a graph is created for the training instances of each class; then, using the spatio-structural differential efficiency measure proposed in that paper, a test instance is connected to some of the nodes in each graph. The label of the data is the class of the graph in which the test data has the highest importance. The importance is characterized by Google's PageRank measure of the network. The spatio-structural differential efficiency measure in  considers both physical and topological properties of the data, and the complexity of the proposed method is again of , which is once more reduced to a complexity between and by using a graph construction method based on Lanczos bisection.
A hybrid method is proposed in  that aids a typical classifier (such as $k$NN, SVM or Naive Bayes) by using high level features. These high level features are the differences of some graph properties before and after inserting a new instance into the graph representation of the data of each class. The graph of each class is constructed using a combination of the $\epsilon$-radius and $k$NN graph conversion methods. The graph properties used in their work are assortativity, network clustering coefficient and average degree. The label for the test instance is generated by a weighted combination of low level and high level features. The authors extended their work in  by using two more high level features, namely Normalized Average Distance among vertices and coreness variability, and by using a stacking procedure to learn the weight for each feature. Also,  extends the same work by discarding the use of any classical classifier and using a scheme that takes low level feature techniques into account to filter out irrelevant graphs of some of the classes.
and computing a posterior probability for each class to classify new instances. Similar to the $k$NN graph conversion method, the k-Associated Optimal Graph computes the similarity of a data point with all of the training data; however, it forms an edge only if the points belong to the same class. This results in multiple components (and possibly more than one component for a class). The method furthermore tries to find a local $k$ for each class so that the resulting components get the maximal (a measure based on the average degree of a component). This way the process of finding the parameter is conducted automatically, which also makes the complexity of the framework of . Another paper  also uses the k-Associated graph along with the high level classification method of  to classify new instances.
Other methods using different graph measures have been proposed as well.  uses dynamic entropy for each weighted graph produced by $\epsilon$-radius, where the weights denote the distance between data points.  utilizes the modularity measure for classifying a new instance that belongs to a pattern set of the same object in the training data. The label is derived by creating a $k$NN graph for each pattern set and choosing the label of the graph with the lowest modularity change after insertion of the new data. Both of the methods in  and  have the complexity of .
The graph based classification methods in the literature mostly have three characteristics in common. Firstly, they create a different graph for each class of the data; this approach prevents them from finding meaningful patterns that may be formed by the similarities between points in different classes.
The second aspect these algorithms have in common is that they treat test instances individually, adding each one to the graph of each class and measuring a graph property before and after the insertion. This makes the prediction of a new instance inefficient in the presence of a large amount of test data.
Lastly, the properties that these algorithms use for finding the differences before and after the insertion of the unlabeled data (e.g. clustering coefficient, average path etc.) are time consuming to compute, and their computation times usually depend on the graph size, which can make them infeasible for large datasets.
Our proposed algorithm CULP and its extension CULM solve the first and second issues by employing a novel graph representation called LEG, which treats classes as nodes along with training and test instances in a unified object and is discussed further in the paper. As for the third problem, since the label of a test instance is derived using link prediction measures (as discussed in the previous section), the classification of the unlabeled data is faster than in similar methods.
CULP (Classification Using Link Prediction) is a classification method that aims to gain higher accuracy in the multi-class classification task by exploiting the similarity among the data points. This algorithm employs the power of graph representation and link prediction methods in complex networks to deal with this problem. (The complete code of CULP in Python can be found at github.com/aminfadaee/culp.) The overall structure of CULP consists of two stages:
Creating the LEG structure from the data
Classifying the test data using the LEG
In the first step we model our data as an augmented graph data structure called LEG (Label Embedded Graph). LEG is a heterogeneous graph which incorporates the data, the classes and the similarity between them in a unified object.
LEG essentially contains three sets of nodes and two sets of links. The different types of nodes in LEG are training nodes, testing nodes and class nodes. A link between two data nodes denotes similarity between them, and a link between a training node and a class node denotes the class membership of that node.
After creating the LEG, we can convert the classification problem to the problem of predicting the class membership link of a testing node. By utilizing a link prediction algorithm in the next step, a membership score for every testing-class pair of nodes is computed.
Each of the membership scores acts as a posterior probability. A label is chosen for a testing node based on these scores.
CULP procedure is depicted in Algorithm 3. In the next segments each of the steps of the proposed algorithm is covered in more detail.
The first step toward classification using CULP is creating the LEG representation. LEG is a heterogeneous graph with three sets of nodes:
Training nodes
Testing nodes
Class nodes
and two sets of edges:
Similarity edges
Class membership edges
Each set of nodes corresponds to its analogous set of data, i.e. there is one training node per labeled instance, one testing node per unlabeled instance and one class node per class.
The class membership edges are created based on the labeled data: each training node is connected (without direction) to the class node of its label, which gives one membership edge per labeled instance. It should be noted that since the labels for the test data are not available, the membership edge set contains only pairs made of a training node and a class node.
Unlike the class membership edges, the similarity edges are not obtained so trivially. The similarity edge set is responsible for incorporating the similarities between instances of our data, and the edges in this set are obtained by using a graph conversion algorithm. In this work the undirected version of $k$NN graph conversion (Algorithm 1) is used.
Similarity edges primarily connect two training nodes, or a testing node to a training node. However, there is no constraint against having an edge between two testing nodes, meaning that we can find the similarity between unlabeled data and connect them as well (as we have done in this work).
If the unlabeled data is not available at first, or in the case of a new unlabeled node, this node is first added to the set of testing nodes; after that, the similarity edges between this node and the other nodes of the graph are created through a linear similarity computation.
After creating all of the sets of nodes and edges, we can define the LEG as the graph whose node set is the union of the training, testing and class nodes and whose edge set is the union of the similarity and class membership edges. Although the LEG is inherently heterogeneous, we can treat it as a simple undirected graph. The procedure for creating the LEG is summarized in Algorithm 2. This algorithm takes the labeled and unlabeled data along with the parameter $k$ and the similarity measure, and produces the LEG as the output.
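The construction described in this segment can be sketched in Python as follows. This is a hedged illustration rather than Algorithm 2 itself or the repository code: node identifiers such as `("train", 0)`, the function name `build_leg` and the brute-force neighbor search are all assumptions made for the example.

```python
import math


def inverse_euclidean(x, y, eps=1e-9):
    """Inverse Euclidean similarity; eps is an assumed small constant."""
    return 1.0 / (eps + math.dist(x, y))


def build_leg(train_x, train_y, test_x, k, similarity):
    """Build a Label Embedded Graph as a dict: node -> set of neighbor nodes."""
    points = list(train_x) + list(test_x)
    # Data nodes, in the same order as `points` (hypothetical naming scheme).
    nodes = [("train", i) for i in range(len(train_x))]
    nodes += [("test", j) for j in range(len(test_x))]
    adj = {node: set() for node in nodes}
    for label in set(train_y):
        adj[("class", label)] = set()

    # Class membership edges: each training node is linked to its class node.
    for i, label in enumerate(train_y):
        adj[("train", i)].add(("class", label))
        adj[("class", label)].add(("train", i))

    # Similarity edges: undirected k-nearest-neighbor links over all data nodes.
    for a in range(len(points)):
        ranked = sorted(
            (b for b in range(len(points)) if b != a),
            key=lambda b: similarity(points[a], points[b]),
            reverse=True,
        )
        for b in ranked[:k]:
            adj[nodes[a]].add(nodes[b])
            adj[nodes[b]].add(nodes[a])
    return adj


train_x = [[0.0], [1.0], [10.0], [11.0]]
train_y = ["a", "a", "b", "b"]
test_x = [[0.4]]
leg = build_leg(train_x, train_y, test_x, k=2, similarity=inverse_euclidean)
print(leg[("test", 0)])
```

Note that the testing node ends up with similarity edges only; its missing membership edge is exactly what the link prediction step is meant to recover.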
There are always exactly $n$ class membership edges, one for each of the $n$ labeled instances. The number of similarity edges, however, has an upper and a lower bound. With $m$ unlabeled instances, the minimum number of possible similarity edges is obtained when the $k$NN relation is symmetric for every pair of the $n + m$ data points, meaning that whenever one point is among the $k$ nearest neighbors of another, the converse also holds. The maximum number of similarity edges, on the other hand, is obtained when the $k$NN relation is not symmetric for any pair of data nodes. Using these, the bounds on the number of edges $|E|$ in a LEG can be derived as Equation 9.

$$n + \frac{(n + m)k}{2} \le |E| \le n + (n + m)k \qquad (9)$$

By the bounds in Equation 9, it can be stated that the LEG gives us a new low memory cost representation of the data. The memory for the original data is of $O((n + m)d)$ for $d$ features per instance, but since it is usually the case that $k \ll d$ for high dimensional data, LEG saves a lot of memory compared to using the original data for the task of classification.
Another aspect of LEG is the fact that we are incorporating all of our labeled and unlabeled data and class labels in a unified structure that enables us to find the labels of the test data via simple and efficient graph properties, specifically link prediction methods, which are covered in the next segment.
As stated before, in classification the goal is to find a label for every unlabeled instance. Using the LEG representation, this problem can be reformulated as finding, for each testing node, the class node such that the probability of a membership edge between the two is maximized.
The new formulation means that one edge per testing node will be added to the class membership edge set by predicting the most probable membership link for every test node. This can be easily done via the link prediction methods discussed before.
Using a local similarity measure for link prediction (e.g. the Adamic-Adar index), this problem can be solved using the following, where $u$ is a testing node, $c$ ranges over the class nodes and $s(u, c)$ is the link prediction score:

$$\hat{y}_u = \arg\max_{c} \; s(u, c) \qquad (10)$$

Although more complex link prediction methods (random walk, average path length etc.) can be used to solve the problem, the local similarity measures are not only extremely fast and efficient to compute, but they also yield competitively accurate results, as will be discussed in the experiments. The pseudocode of CULP is depicted in Algorithm 3.
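The classification step can be illustrated with a small hand-built LEG. The sketch below is not Algorithm 3; it assumes an adjacency dictionary, uses the Adamic-Adar score as the link predictor, and breaks ties by the order in which class nodes are listed. The node names and the function `culp_predict` are hypothetical.

```python
import math


def adamic_adar(adj, u, v):
    """Adamic-Adar score between nodes u and v."""
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])


def culp_predict(adj, test_node, class_nodes, score=adamic_adar):
    """Score the candidate membership link to every class node and keep the best."""
    scores = {c: score(adj, test_node, c) for c in class_nodes}
    best = max(scores, key=scores.get)
    return best, scores


# Hand-built toy LEG: u is a testing node, t1..t3 are training nodes,
# A and B are class nodes. t1 and t2 belong to A, t3 belongs to B.
leg = {
    "u": {"t1", "t2", "t3"},
    "t1": {"u", "t2", "A"},
    "t2": {"u", "t1", "A"},
    "t3": {"u", "B"},
    "A": {"t1", "t2"},
    "B": {"t3"},
}

label, scores = culp_predict(leg, "u", ["A", "B"])
print(label, scores)
```

Because the score is passed in as a function, any other local predictor with the same `(adj, u, v)` signature could be swapped in without changing the prediction routine.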
In this work a novel local score for link prediction is formed which is designed specifically for the task of classification. This new similarity function is called Compatibility Score and like Adamic-Adar and Resource Allocation scores penalizes the common neighbors, however, this penalization is done differently.
Both AA and RA scores can be unfair in some instances, meaning that they can over-penalize a valuable common neighbor or give the same score to two inherently different nodes. Take the two LEGs in Figure 1 for example (, and ). In both cases the goal is to find the score for the link. AA and RA would both penalize node in the same way (penalty of for RA and for AA); however, in the first LEG the node is more valuable than that of the second LEG and this is due to the fact that three neighbors of this node (, , ) are also connected to node .
When trying to predict the score for the formation of a link between nodes $u$ and $v$ that have a common neighbor $z$, two sets of edges can be defined starting from $z$: compatible edges and incompatible edges.
Compatible edges for node $z$ are the ones connecting $z$ to nodes which are themselves connected to the destination of the candidate link ($v$ in this case). Incompatible edges are all the other edges of $z$, that is, those which are not compatible.
Now the cardinality of incompatible edges or the incompatibility penalty for node which is a common neighbor of nodes and can be defined as the following:
Using the Compatibility Score for the cases of Figure 1 the score for link in LEG 1 can be computed as and in LEG 2 as . This is the desired outcome as the score in LEG 1 is now higher. In the experiments, a more detailed comparison of CS with other link prediction methods is done.
In this subsection, the time complexity of finding the class membership edge of a test node is analyzed. The main component in finding the correct link is the local similarity measure which is used for link prediction. These local measures find the score in time proportional to the degrees of their source and destination nodes. In CULP, the source node is a testing node and the destination node is a class node. So the first step in analyzing the time of finding a class membership edge is finding the average degree of the testing nodes and of the class nodes.
The degree of a class node is the number of labeled nodes connected to it, that is, the number of data points whose label is that class; however, for the degree of a data node a more detailed analysis is needed. As stated before, in any undirected graph Equation 1 holds. This equation can be rewritten as the following:
Since the degrees of the class nodes sum up to the number of labeled data, this number can be substituted in the above equation; on the other hand, if we treat each testing node as having a certain average degree, we can state that the training nodes have that average degree plus one (since each of them also has a membership edge). Using all these, the above formula can be rewritten in the following manner:
and its lower bound as:
Consequently, the average degree for labeled and unlabeled nodes is of and for class nodes is of . The Common Neighbor, Adamic-Adar and Resource Allocation scores all have the complexity of finding the common neighbors between source and destination, which is the intersection of the neighborhoods of the two nodes. The Compatibility Score, however, first finds the common neighbors and then does two intersections for each of the nodes in the common neighbor set.
If done efficiently, the intersection of two sets with sizes $a$ and $b$ can be obtained in $O(\min(a, b))$ on average. Using this, the complexity of finding the score in LEG for the formation of links between and is of when Common Neighbor, Adamic-Adar or Resource Allocation is used and is when Compatibility Score is used. Since is usually small (in our experiments ), it is safe to state that the link prediction is done in constant time; also as there are nodes in , predicting the label of instances would take time of after creating the LEG.
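The remark about efficient set intersection can be made concrete: with hash-based sets, it is enough to iterate over the smaller set and test membership in the larger one. The sketch below is a generic illustration of that idea under the assumption of average constant-time membership tests; the function name `intersect` is hypothetical.

```python
def intersect(a, b):
    """Intersect two sets by scanning the smaller one.

    Membership tests on a hash set take constant time on average, so the
    work is proportional to the size of the smaller set.
    """
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    return {x for x in small if x in large}


# A low-degree data node against a high-degree class node (toy sizes).
data_neighbors = {1, 2, 3}
class_neighbors = set(range(2, 1000))
common = intersect(data_neighbors, class_neighbors)
print(common)
```

This is why the cost of a local score in a LEG is driven by the small neighborhood of the data node rather than by the potentially large neighborhood of the class node.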
In this subsection a simple classification problem is solved using CULP to demonstrate the steps involved in this algorithm. The data is presented in Figure 2-A as two classes. The white points represent the data of class 1 and the dark points belong to class 2. The problem is finding the correct label of the red point (point ).
The first step is choosing a similarity function and a value for the parameter for forming the graph. Here we chose and the Euclidean similarity (discussed in the preliminaries section).
Now the node sets can be defined as , and all the other points as the set . By creating the edges in and as shown in Algorithm 2 the LEG in Figure 2-B can be derived. As can be seen, in this graph every node except for is connected to one of the class nodes and (white nodes) by dotted links and the black links represents the edges of .
Looking at the graph, it can be seen that the node is connected to nodes , and . This means these nodes would assist in finding the label for node . Using these nodes, the scores for edges and can be obtained with each of the scores discussed before as . The results of computing these scores are depicted in Table I.
As we stated in the time complexity analysis subsection and demonstrated in the toy example of the previous section, once the LEG structure is formed, the prediction of links can be done instantly; knowing this and the fact that there are different options in choosing the link predictor , the question arises as to why not use all of our predictors and somehow combine their predictive capabilities to assist us in finding the best membership link for a test node?
The next question arises after we analyze the related works done in the field of classification using complex network representations. A good portion of these methods are capable of incorporating or exploiting the low level features of the data to enhance the classification performance. How can we modify our framework CULP to exploit the low level features of the data as well as the high level features?
The answer to both of these questions lies in our extension to CULP algorithm which we call the CULM extension. CULM increases the predictive capabilities of CULP by using a weighted majority vote procedure (hence the M as in Majority in the end instead of P).
Instead of using only one link predictor, we will use an array of link predictors. Each link predictor, when used, gives a score to the candidate link between a testing node and every class node. We can use all of these scores to estimate the probability of our prediction's correctness as Equation 16.
In this equation, the predicted label is the one selected by the link predictor, and it is computed using Equation 10 of the previous section. Using Equation 16 we can assign a confidence to the prediction of each link predictor. When using multiple predictors, a predictor with higher confidence is more reliable. We are going to use these probabilities to assign weights to each of the link predictors in the array. This way, instead of using a simple majority vote, a weighted voting procedure can be used. In a weighted majority vote procedure, a few predictions are aggregated. Each of these predictions has an individual weight which states the value of its vote; the voting in this setting is done as in Algorithm 4.
In Algorithm 4, the inputs are the set containing the predicted labels of each of the predictors and the set of their respective weights, and a set with one element per class keeps track of the accumulated weight for each of the classes. Using this algorithm enables us not only to use the predicted labels of multiple link predictors, but also to incorporate any classical classifier with suitable weights. This way the low level features of the data are exploited as well.
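The weighted voting procedure described above can be sketched as follows. This is a minimal reading of the description, not a transcription of Algorithm 4: it takes the labels and weights as already computed, accumulates one total per class, and returns the class with the largest total. Tie handling (first class encountered wins) and the name `weighted_majority_vote` are assumptions.

```python
from collections import defaultdict


def weighted_majority_vote(labels, weights):
    """Return the label whose votes have the largest total weight."""
    totals = defaultdict(float)  # accumulated weight per class
    for label, weight in zip(labels, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Three predictors: one confident vote for "a", two weaker votes for "b".
labels = ["a", "b", "b"]
weights = [0.9, 0.3, 0.4]
winner = weighted_majority_vote(labels, weights)
print(winner)
```

In this example a simple majority vote would pick "b", while the weighted vote picks "a" because the single confident prediction outweighs the two weak ones; a low level classifier can take part simply by appending its label and weight to the two lists.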
The next step is to define the weights for each of our predictors and . If is the predicted label of the predictor for the unlabeled data and is the probability of this prediction, the weight of predictor for can be defined as Equation 17. Also for the prediction of on which can be denoted as , we can define the weight as Equation 18.
The parameter which is used in both equations is provided by the user. This parameter controls the trade-off that CULM will make between the link predictors’ labels and the prediction of the low level classifier.
The parameter is chosen in the range to ; however any value below would result in neutralizing the vote of CULM predictors. Also if , the prediction is completely done by CULM predictors and the low level classifier is ignored; so in general it can be stated that .
Now the CULM extension can be formally defined as the procedure captured in Algorithm 5. In this algorithm, after creating the LEG, each of the predictors produces a label and a probability. These labels and probabilities are then merged with those of the low level classifier to form the label set and the weight set, which are passed to Algorithm 4 to produce the final label for the test instance.
In this section, we present the results of our proposed algorithms, CULP and CULM, on 20 different real datasets and compare them to classical classification methods as well as the best classifiers from related work on classification using complex networks.
The datasets used for our experiments are all obtained from the UCI Machine Learning Repository. These datasets include Zoo, Hayes-Roth (Hayes), Iris, Teaching Assistant Evaluation (Teaching), Wine, Sonar Mines vs. Rocks (Sonar), Image Segmentation training set (Image) and testing set (Segmentation), Glass Identification (Glass), Thyroid Disease (Thyroid), Ecoli, Libras Movement (Libras), Balance Scale (Balance), Pima Indians Diabetes (Pima), Statlog Vehicle Silhouettes (Vehicle), Vowel Recognition (Vowel), Yeast, Wine Quality Red (RedWine), Optical Recognition of Handwritten Digits (Optical) and Poker Hand (Poker). Each of these datasets, along with its number of instances, attributes and classes, is listed in Table II.
|Dataset||CN||AA||RA||CS|
|Zoo||95.567 ± 5.8 (2)||96.567 ± 5.3 (2)||96.833 ± 5.4 (2)||96.767 ± 5.4 (2)|
|Hayes||73.949 ± 12.0 (1)||73.718 ± 12.1 (1)||73.718 ± 12.1 (1)||73.667 ± 12.1 (1)|
|Iris||98.467 ± 3.0 (11)||98.467 ± 3.0 (11)||98.489 ± 3.0 (11)||98.378 ± 3.2 (11)|
|Teaching||63.756 ± 11.3 (1)||63.356 ± 11.1 (1)||63.356 ± 11.1 (1)||63.622 ± 11.2 (1)|
|Wine||98.549 ± 2.8 (12)||98.745 ± 2.6 (12)||98.725 ± 2.7 (12)||98.137 ± 3.2 (12)|
|Sonar||87.467 ± 7.4 (2)||87.250 ± 7.3 (3)||86.900 ± 7.3 (3)||87.100 ± 7.5 (3)|
|Image||88.333 ± 6.7 (3)||89.317 ± 6.3 (3)||89.175 ± 6.3 (3)||89.063 ± 6.4 (3)|
|Glass||71.857 ± 9.1 (3)||73.540 ± 9.2 (3)||73.397 ± 9.3 (2)||74.048 ± 9.1 (2)|
|Thyroid||97.540 ± 3.1 (4)||97.413 ± 3.2 (4)||97.413 ± 3.2 (4)||97.333 ± 3.3 (4)|
|Ecoli||86.798 ± 6.1 (9)||87.010 ± 6.0 (8)||87.141 ± 6.1 (9)||87.030 ± 6.0 (8)|
|Libras||79.935 ± 6.5 (2)||82.472 ± 6.3 (2)||81.713 ± 6.4 (2)||82.750 ± 6.2 (2)|
|Balance||93.753 ± 2.9 (6)||96.446 ± 2.2 (2)||96.446 ± 2.2 (2)||96.780 ± 2.2 (2)|
|Pima||76.061 ± 4.5 (34)||76.154 ± 4.4 (28)||76.211 ± 4.4 (28)||76.355 ± 4.3 (7)|
|Vehicle||73.611 ± 4.4 (5)||73.091 ± 4.7 (5)||73.198 ± 4.7 (5)||72.512 ± 4.7 (5)|
|Vowel||97.603 ± 1.6 (3)||98.242 ± 1.5 (2)||98.242 ± 1.5 (2)||97.886 ± 1.5 (2)|
|Yeast||59.682 ± 3.9 (22)||59.971 ± 3.7 (20)||60.032 ± 3.6 (20)||60.365 ± 3.8 (22)|
|RedWine||60.501 ± 3.9 (1)||60.166 ± 3.9 (2)||60.036 ± 3.9 (2)||60.574 ± 3.8 (2)|
|Segment||96.333 ± 1.3 (3)||96.535 ± 1.2 (4)||96.525 ± 1.2 (4)||96.281 ± 1.3 (4)|
|Optical||98.805 ± 0.4 (5)||98.905 ± 0.4 (5)||98.918 ± 0.4 (5)||98.851 ± 0.4 (4)|
|Poker||58.518 ± 0.9 (32)||58.604 ± 0.9 (32)||58.625 ± 0.9 (32)||58.520 ± 0.9 (32)|
The reason behind choosing these datasets is the variety of both structure and domain among them. The dataset sizes range from 101 to 25,010 instances, which tests the practicality of our algorithms on both small and large datasets; the number of attributes varies from 4 to 90, which tests the proposed algorithms against both low and high dimensional datasets; and finally the number of classes varies widely, ranging from 2 up to 10.
This section is organized as follows. First, the experiment on CULP with different link predictors is presented. After that, the CULM algorithm is analyzed with 3 different low level classifiers. The following subsection discusses the effects of the trade-off parameter. Then CULP and CULM are compared with classical classifiers, and finally CULP and CULM are compared with both the classical approaches and similar works on classification using complex networks.
As the first experiment, different link predictors are used in CULP to compare the performance of each one on the datasets. For these experiments the predictor is one of CN, AA, RA and CS, which are defined in Equations 2, 3, 4 and 12 respectively.
For each predictor and each dataset, the number of nearest neighbors used to build the LEG, the vector similarity function and a preprocessing procedure on the data (none, normalization or principal component analysis) are tuned. This tuning is done via 10-fold cross validation. After finding the best parameters, 30 repetitions of 10-fold cross validation are performed, which amounts to a total of 300 runs. Table III captures the results obtained with these settings.
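The evaluation protocol above (30 repetitions of 10-fold cross validation) can be sketched as follows. This is a minimal illustration using only the standard library; the function name, the fixed seed and the plain (non-stratified) shuffling are assumptions of this sketch, since the text does not say how the folds were drawn.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_repeats=30, seed=0):
    """Yield (train_indices, test_indices) for every fold of every repetition."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        # Fold f takes every n_folds-th element, so fold sizes differ by at most one.
        folds = [order[f::n_folds] for f in range(n_folds)]
        for f in range(n_folds):
            test = folds[f]
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield train, test


# 101 is the smallest dataset size mentioned in the text.
splits = list(repeated_kfold(n_samples=101))
print(len(splits))  # 30 repetitions x 10 folds = 300 runs
```

Each reported cell would then be the mean and standard deviation of the accuracy over these 300 train/test splits.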
In each cell of Table III, the first number is the mean accuracy of the runs and the second number is their standard deviation. The number in parentheses is the best number of nearest neighbors obtained for that cell, and the bold cells are the best results obtained on a dataset.
As can be seen in Table III, the Compatibility Score achieved the best results among the predictors: CS exclusively obtained the highest accuracy on 6 datasets (Glass, Libras, Balance, Pima, Yeast and RedWine). In second place is the Resource Allocation Index, which obtained the top accuracy exclusively on Zoo, Iris, Ecoli, Optical and Poker and tied with the Adamic-Adar score for the best accuracy on Vowel. The third best predictor is Common Neighbors, which leads on 5 datasets (Hayes, Teaching, Sonar, Thyroid and Vehicle), and finally Adamic-Adar leads on Wine, Image and Segment and shares the best result with RA on Vowel.
Analyzing the values in parentheses in this experiment, we can see that for 10 datasets (Zoo, Hayes, Iris, Teaching, Wine, Image, Thyroid, Libras, Vehicle and Poker) the best value is identical for every predictor on a dataset. In Balance and Pima, however, the values are noticeably different, with Common Neighbors having the highest value in both of them. In the rest of the datasets, the choices among different predictors differ by at most 1 (for Yeast, by 2).
|Dataset||CULP (best)||CULM-LDA||CULM-CART||CULM-SVM||Gain|
|Zoo||96.833 ± 5.4 (RA, 2)||97.467 ± 5.3 (1, 0.6)||97.500 ± 5.0 (1, 0.6)||97.000 ± 5.9 (1, 0.6)||+0.667|
|Hayes||73.949 ± 12.0 (CN, 1)||74.513 ± 11.6 (1, 0.7)||76.949 ± 11.1 (1, 0.6)||76.487 ± 11.1 (1, 0.6)||+3.000|
|Iris||98.489 ± 3.0 (RA, 11)||98.467 ± 3.0 (11, 0.7)||98.467 ± 3.0 (11, 0.7)||98.467 ± 3.0 (11, 0.7)||-0.022|
|Teaching||63.756 ± 11.3 (CN, 1)||65.667 ± 11.6 (1, 0.6)||64.200 ± 12.0 (1, 0.6)||65.622 ± 11.7 (1, 0.6)||+1.911|
|Wine||98.745 ± 2.6 (AA, 12)||98.843 ± 2.9 (12, 0.7)||98.706 ± 2.7 (12, 0.7)||98.745 ± 2.6 (12, 0.7)||+0.098|
|Sonar||87.467 ± 7.4 (CN, 2)||87.233 ± 7.2 (2, 0.6)||87.050 ± 7.4 (3, 0.7)||87.817 ± 7.3 (2, 0.6)||+0.350|
|Image||89.317 ± 6.3 (AA, 3)||90.349 ± 6.2 (3, 0.6)||90.333 ± 6.0 (3, 0.6)||89.571 ± 6.3 (3, 0.7)||+1.032|
|Glass||74.048 ± 9.1 (CS, 2)||74.095 ± 9.1 (2, 0.6)||74.952 ± 8.8 (2, 0.6)||74.365 ± 9.4 (2, 0.6)||+0.904|
|Thyroid||97.540 ± 3.1 (CN, 4)||97.540 ± 3.1 (4, 0.6)||97.492 ± 3.1 (4, 0.6)||97.540 ± 3.1 (4, 0.6)||0|
|Ecoli||87.141 ± 6.1 (RA, 9)||87.475 ± 5.8 (8, 0.6)||87.495 ± 5.9 (8, 0.6)||87.293 ± 5.8 (9, 0.6)||+0.354|
|Libras||82.750 ± 6.2 (CS, 2)||82.843 ± 6.0 (2, 0.6)||82.370 ± 5.8 (2, 0.6)||82.944 ± 5.9 (1, 0.6)||+0.194|
|Balance||96.780 ± 2.2 (CS, 2)||97.016 ± 2.0 (2, 0.6)||96.694 ± 2.1 (2, 0.7)||97.946 ± 1.7 (2, 0.6)||+1.166|
|Pima||76.355 ± 4.3 (CS, 7)||76.535 ± 4.5 (7, 0.6)||76.461 ± 4.5 (7, 0.6)||76.373 ± 4.6 (7, 0.6)||+0.180|
|Vehicle||73.611 ± 4.4 (CN, 5)||74.829 ± 4.6 (5, 0.6)||73.897 ± 4.5 (5, 0.6)||74.167 ± 4.6 (5, 0.6)||+1.218|
|Vowel||98.242 ± 1.5 (AA, 2)||98.461 ± 1.3 (2, 0.9)||98.508 ± 1.4 (2, 0.9)||98.620 ± 1.3 (2, 0.8)||+0.378|
|Yeast||60.365 ± 3.8 (CS, 22)||60.360 ± 3.6 (20, 0.6)||60.288 ± 3.7 (20, 0.6)||60.113 ± 3.7 (20, 1)||-0.005|
|RedWine||60.574 ± 3.8 (CS, 2)||64.170 ± 3.7 (1, 0.6)||63.453 ± 3.7 (1, 0.6)||64.447 ± 3.6 (1, 0.6)||+3.873|
|Segment||96.535 ± 1.2 (AA, 4)||96.673 ± 1.3 (2, 0.6)||96.922 ± 1.2 (2, 0.6)||96.651 ± 1.3 (2, 0.6)||+0.387|
|Optical||98.918 ± 0.4 (RA, 5)||98.905 ± 0.4 (4, 0.9)||98.890 ± 0.4 (4, 0.9)||98.890 ± 0.4 (4, 0.9)||-0.013|
|Poker||58.625 ± 0.9 (RA, 32)||58.581 ± 0.9 (32, 1)||58.695 ± 0.9 (32, 0.6)||58.760 ± 0.9 (32, 0.6)||+0.135|
As the next experiment, the CULM algorithm is run on each of the datasets. The parameter is tuned over the set . All the values below for are left out to keep the results and comparisons fair (as stated before, any value below for zeros the effect of the CULP predictors, and experimentally the same holds for ); this way we are sure that the link predictors are not completely overshadowed by the low level classifier. The other parameters of the algorithm are tuned as before, and again each cell is the result of 300 runs.
For a low level classifier to accompany the link predictors in CULM, three different algorithms have been chosen and used. These low level classifiers are LDA (Linear Discriminant Analysis), CART (Classification And Regression Trees) and multi-class SVM (Support Vector Machine) with RBF kernel.
Table IV captures the results of this experiment. The first column holds the best result for each dataset using CULP (Table III). The next three columns are the results of CULM with LDA, CART and SVM respectively as the low level classifier; in each cell of these columns, the numbers in parentheses are the number of nearest neighbors and the trade-off parameter used in the runs. The last column represents the accuracy gain achieved by using CULM instead of CULP. Each number in this column is obtained by comparing the best result obtained by CULM with the best result obtained by CULP for that dataset.
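As a worked example of how the last column is obtained, the snippet below recomputes the gain for three rows of Table IV. The mean accuracies are copied from the table; the function and variable names are only illustrative.

```python
# Mean accuracies copied from Table IV:
# (best CULP, CULM-LDA, CULM-CART, CULM-SVM)
rows = {
    "Zoo": (96.833, 97.467, 97.500, 97.000),
    "Hayes": (73.949, 74.513, 76.949, 76.487),
    "Iris": (98.489, 98.467, 98.467, 98.467),
}


def gain(culp_best, *culm_results):
    """Best CULM accuracy minus best CULP accuracy, rounded like the table."""
    return round(max(culm_results) - culp_best, 3)


gains = {name: gain(*values) for name, values in rows.items()}
print(gains)
```

The three values agree with the gains listed for Zoo, Hayes and Iris.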
Looking at Table IV, CULM leaves the accuracy unchanged on the Thyroid dataset and lowers it slightly on Iris, Yeast and Optical; on the other 16 datasets, however, CULM achieves a higher accuracy than CULP.
CULM with SVM as its low level classifier achieved the best results exclusively on 6 datasets (Sonar, Libras, Balance, Vowel, RedWine and Poker) and shares the best result on Thyroid with CULM-LDA and CULP. The next best classifiers are CULM-CART and CULM-LDA, each with 5 exclusive best accuracies (Zoo, Hayes, Glass, Ecoli and Segment for CULM-CART; Teaching, Wine, Image, Pima and Vehicle for CULM-LDA).
Hayes and RedWine achieved the highest accuracy gains using CULM (+3.000 and +3.873 respectively), which is a noticeable boost. On the next level are Teaching, Image, Balance and Vehicle, each with a gain of more than 1. Averaged over all datasets, the gain in the last column of Table IV is about 0.79, which is further evidence that CULM achieves better results than CULP.
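The summary figures for the gain column can be checked directly from Table IV. The snippet below is a small sketch that averages the 20 values of that column and counts how many are positive; the list is copied from the table in row order and the variable names are illustrative.

```python
from statistics import mean

# Gain column of Table IV, in row order (Zoo ... Poker).
gains = [
    0.667, 3.000, -0.022, 1.911, 0.098, 0.350, 1.032, 0.904, 0.0, 0.354,
    0.194, 1.166, 0.180, 1.218, 0.378, -0.005, 3.873, 0.387, -0.013, 0.135,
]

average_gain = mean(gains)
improved = sum(g > 0 for g in gains)    # datasets where CULM beats CULP
unchanged = sum(g == 0 for g in gains)  # datasets with identical accuracy
worse = sum(g < 0 for g in gains)       # datasets where CULP stays ahead

print(round(average_gain, 3), improved, unchanged, worse)
```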
As for the number of nearest neighbors, more robustness can be observed among the different CULM classifiers than among the variations of CULP. Except for the datasets Sonar, Ecoli and Libras, the choice of this parameter is identical in all variations of CULM, and even in these three datasets it differs by at most 1 between classifiers.
The other parameter in this experiment reveals interesting facts as well. Except for CULM-SVM on the Yeast dataset and CULM-LDA on the Poker dataset, its tuned value is below 1 in all the experiments; this shows that using the low level features through the low level classifier did indeed help the classification accuracy. That said, we still need a more detailed analysis of the effect of this parameter on the accuracy, which is the main discussion of the next segment.
To analyze the parameter further, six datasets were chosen, each with a single configuration to run with different values. The datasets are Zoo with , Hayes with , Iris with , Teaching with , Wine with and Sonar with and in each experiment. The choices for is to demonstrate the effect of zeroing the effect of predictors (), zeroing the effect of low level classifier () or picking something in between.
The results of this experiment are depicted in the charts of Figure 3. Each chart represents the experiments done on one dataset and captures the accuracy of each of the 3 CULM classifiers for each value of the parameter. Red lines show the accuracy of CULM-LDA, black lines show CULM-CART and gray lines show CULM-SVM.
As stated before, any value below for zeros the effect of CULP predictors, we also noted that experimentally the same holds for . This is evident by looking at the plots of Figure 3 because in all datasets and classifiers the accuracies obtained for and are identical.
As can be seen from the figure, for all classifiers on the datasets Zoo, Iris, Teaching and Sonar, using the predictors improved the accuracy of the low-level classifier; on the other hand, in all datasets zeroing the effect of the low-level classifier () did not help, if it did not worsen, the accuracy of the prediction. The other notable detail in these plots is the plateau of accuracy for values of roughly between and . This means that a less fine-grained set of values can also be used for tuning this parameter.
|Dataset||CULP||CULM||kNN||LDA||CART||SVM|
|Zoo||96.833 ± 5.4 (2)||97.500 ± 5.0 (1)||97.500 ± 12.7 (1)||92.033 ± 8.2||94.867 ± 6.8||90.233 ± 10.0|
|Hayes||73.949 ± 12.0 (1)||76.949 ± 11.1 (1)||72.590 ± 14.6 (1)||53.282 ± 13.0||81.667 ± 9.5||85.103 ± 8.4|
|Iris||98.489 ± 3.0 (11)||98.467 ± 3.0 (11)||97.222 ± 5.4 (17)||98.000 ± 3.5||94.556 ± 5.6||97.533 ± 3.8|
|Teaching||63.756 ± 11.3 (1)||65.667 ± 11.6 (1)||62.511 ± 12.7 (1)||53.044 ± 13.2||64.333 ± 11.4||54.378 ± 12.7|
|Wine||98.745 ± 2.6 (12)||98.843 ± 2.9 (12)||97.000 ± 5.6 (24)||98.941 ± 2.4||90.294 ± 6.9||98.392 ± 3.0|
|Sonar||87.467 ± 7.4 (2)||87.817 ± 7.3 (2)||86.383 ± 8.0 (1)||74.117 ± 9.1||70.850 ± 9.7||83.767 ± 7.8|
|Image||89.317 ± 6.3 (3)||90.349 ± 6.2 (3)||85.667 ± 8.3 (4)||89.635 ± 6.1||88.222 ± 6.8||87.317 ± 6.4|
|Glass||74.048 ± 9.1 (2)||74.952 ± 8.8 (2)||72.683 ± 10.4 (1)||62.381 ± 10.0||66.698 ± 9.1||70.190 ± 9.8|
|Thyroid||97.540 ± 3.1 (4)||97.540 ± 3.1 (4)||96.206 ± 5.8 (1)||91.397 ± 5.7||93.857 ± 5.4||95.921 ± 4.0|
|Ecoli||87.141 ± 6.1 (9)||87.495 ± 5.9 (8)||86.909 ± 6.3 (7)||86.869 ± 5.6||79.515 ± 6.8||86.828 ± 6.0|
|Libras||82.750 ± 6.2 (2)||82.944 ± 5.9 (1)||85.880 ± 8.0 (1)||64.620 ± 8.4||68.713 ± 8.2||80.306 ± 6.6|
|Balance||96.780 ± 2.2 (2)||97.946 ± 1.7 (2)||90.140 ± 5.5 (15)||86.747 ± 3.9||81.306 ± 5.9||90.489 ± 3.6|
|Pima||76.355 ± 4.3 (7)||76.535 ± 4.5 (7)||74.171 ± 4.8 (9)||77.320 ± 4.3||70.289 ± 5.1||76.013 ± 4.4|
|Vehicle||73.611 ± 4.4 (5)||74.829 ± 4.6 (5)||72.206 ± 5.2 (6)||78.052 ± 4.3||71.282 ± 4.9||76.675 ± 4.8|
|Vowel||98.242 ± 1.5 (2)||98.620 ± 1.3 (2)||98.983 ± 2.3 (1)||59.556 ± 4.7||81.192 ± 4.2||94.852 ± 2.3|
|Yeast||60.365 ± 3.8 (20)||60.360 ± 3.6 (20)||59.586 ± 3.8 (19)||58.923 ± 3.8||51.205 ± 4.0||60.124 ± 3.7|
|RedWine||60.574 ± 3.8 (2)||64.447 ± 3.6 (1)||64.662 ± 3.8 (1)||59.172 ± 3.9||63.390 ± 3.8||62.637 ± 3.7|
|Segment||96.535 ± 1.2 (4)||96.922 ± 1.2 (2)||95.829 ± 1.8 (1)||91.446 ± 1.9||95.459 ± 1.4||93.825 ± 1.6|
|Optical||98.918 ± 0.4 (5)||98.905 ± 0.4 (4)||98.823 ± 0.5 (3)||95.278 ± 0.8||90.532 ± 1.3||98.681 ± 0.5|
|Poker||58.625 ± 0.9 (32)||58.760 ± 0.9 (32)||58.517 ± 1.0 (34)||49.952 ± 0.9||48.948 ± 1.7||58.617 ± 0.5|
For the next experiment, the results of CULP and CULM are compared with 4 classical classifiers: the kNN classifier, LDA, CART and multi-class SVM with RBF kernel.
The results of this experiment are captured in Table V. In this table the first column represents the best result of CULP for each dataset, the second column is the best result of CULM for each dataset and the other 4 columns are the results obtained by the classical classifiers. The number in parentheses in the cells of the first three columns is the number of nearest neighbors, and the bold cells are the best results obtained on the dataset.
Comparing the values in parentheses in the first 3 columns of Table V, we can see that on a number of datasets (Iris, Wine, Image, Balance, Pima, Vehicle and Poker) this parameter is smaller for CULP and CULM than for the kNN algorithm, and in some cases like Wine and Balance the difference is quite large. This is because the undirected version of nearest neighbor is used to form the LEG graph, which enables us to capture the similarity features with fewer neighbors.
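The effect of dropping edge direction can be illustrated with a toy nearest-neighbor graph. The sketch below is not the LEG construction itself, which is defined in an earlier section; the one-dimensional points, the absolute-difference distance and the function names are assumptions made for illustration. It shows that, once direction is dropped, a node can end up with more than the requested number of neighbors.

```python
def undirected_knn_edges(points, k):
    """Connect each point to its k nearest points, then drop edge direction."""
    edges = set()
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Sort by distance; the index breaks ties deterministically.
        others.sort(key=lambda j: (abs(points[j] - p), j))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def degrees(edges, n):
    """Count how many undirected edges touch each of the n nodes."""
    deg = [0] * n
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


points = [0.0, 1.0, 2.0, 10.0]  # hypothetical 1-D data
edges = undirected_knn_edges(points, k=1)
print(sorted(edges))
print(degrees(edges, len(points)))
```

Even with a single requested neighbor, the two middle points end up with two neighbors each, because they are also chosen by other points.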
It is evident from the results that CULP and CULM achieved superior results compared to the classical algorithms. CULP and CULM collectively obtained the best results on 13 datasets. The kNN and LDA algorithms achieved the highest results on 3 datasets each, SVM got the best result only on the Hayes dataset and CART is outperformed by the other algorithms on all datasets.
One thing to note is that CULM could obtain the best results on the datasets Hayes, Wine, Pima and Vehicle with values of the trade-off parameter that we excluded from tuning; as stated before, we decided to forgo these values to give a fair comparison. In general, however, we can state that CULM can outperform or match any of the classical classification algorithms given the right configuration of this parameter.
|Dataset||CULP||CULM||Classical||HLCRW||PgRkNN|
|Zoo||96.833 ± 5.4||97.500 ± 5.0||97.500 ± 12.7||97.00 ± 0.1||99.03 ± 2.9|
|Hayes||73.949 ± 12.0||76.949 ± 11.1||85.103 ± 8.4||61.70 ± 2.3||73.09 ± 11.7|
|Iris||98.489 ± 3.0||98.467 ± 3.0||98.000 ± 3.5||98.00 ± 0.6||97.20 ± 3.7|
|Teaching||63.756 ± 11.3||65.667 ± 11.6||64.333 ± 11.4||65.30 ± 2.0||62.08 ± 13.4|
|Wine||98.745 ± 2.6||98.843 ± 2.9||98.941 ± 2.4||87.10 ± 1.6||93.95 ± 5.3|
|Sonar||87.467 ± 7.4||87.817 ± 7.3||86.383 ± 8.0||81.79 ± 7.8||82.00 ± 7.5|
|Image||89.317 ± 6.3||90.349 ± 6.2||89.635 ± 6.1||75.60 ± 0.8||86.13 ± 7.2|
|Glass||74.048 ± 9.1||74.952 ± 8.8||72.683 ± 10.4||72.80 ± 1.1||71.75 ± 7.9|
|Thyroid||97.540 ± 3.1||97.540 ± 3.1||96.206 ± 5.8||97.57 ± 3.0||97.55 ± 3.0|
|Ecoli||87.141 ± 6.1||87.495 ± 5.9||86.909 ± 6.3||85.50 ± 0.6||85.11 ± 5.4|
|Libras||82.750 ± 6.2||82.944 ± 5.9||85.880 ± 8.0||85.00 ± 0.8||87.16 ± 9.8|
|Balance||96.780 ± 2.2||97.946 ± 1.7||90.489 ± 3.6||97.20 ± 0.6||90.86 ± 3.4|
|Pima||76.355 ± 4.3||76.535 ± 4.5||77.320 ± 4.3||75.54 ± 4.6||74.85 ± 4.9|
|Vehicle||73.611 ± 4.4||74.829 ± 4.6||78.052 ± 4.3||67.70 ± 0.6||70.26 ± 4.1|
|Vowel||98.242 ± 1.5||98.620 ± 1.3||98.983 ± 2.3||97.50 ± 0.3||98.49 ± 1.2|
|Yeast||60.365 ± 3.8||60.360 ± 3.6||60.124 ± 3.7||57.20 ± 0.5||56.50 ± 3.6|
|RedWine||60.574 ± 3.8||64.447 ± 3.6||64.662 ± 3.8||61.60 ± 0.5||66.68 ± 3.5|
|Segment||96.535 ± 1.2||96.922 ± 1.2||95.829 ± 1.8||93.20 ± 0.2||95.63 ± 1.5|
|Optical||98.918 ± 0.4||98.905 ± 0.4||98.823 ± 0.5||95.09 ± 2.1||98.94 ± 0.4|
|Poker||58.625 ± 0.9||58.760 ± 0.9||58.617 ± 0.5||55.42 ± 0.9||53.78 ± 0.8|
As the final experiment of this paper, a complete comparison is done to analyze the results of CULM, CULP, the classical algorithms and two similar works that use a complex network representation of the data to classify the unlabeled instances. These two classifiers, which were discussed in the related work section, are PgRkNN and HLCRW (short for High Level data Classification using Random Walk).
The results of the PgRkNN and HLCRW algorithms on datasets already covered in their papers are used here without change; for the other cases we implemented and ran both of them following the details provided in those papers.
Table VI captures these results along with the summaries of Tables III, IV and V. In each row of this table, the bold cell is the best result for classifying the instances of the dataset among all of the algorithms. In each case where CULP or CULM obtained the highest average accuracy, the best result is tested for significance against the second best accuracy using Welch's t-test with a confidence level of 0.95. In this test, the null hypothesis is that the averages are the same and the alternative hypothesis is that they are different. Except for the Teaching dataset, where the bold and underlined values are not significantly different, all the other bold values in the CULP and CULM columns are significantly superior.
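For reference, Welch's t statistic and its degrees of freedom can be computed from summary statistics alone. The sketch below applies the standard formulas to the Teaching row (CULM against HLCRW, the two highest values in that row). The sample size of 300 for HLCRW is an assumption made only for this illustration, since the number of runs behind that figure is not stated here, and the critical value of 1.96 is the normal approximation to the two-sided 5% threshold.

```python
import math


def welch_t(mean1, std1, n1, mean2, std2, n2):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    v1 = std1 ** 2 / n1
    v2 = std2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Teaching row of Table VI: CULM 65.667 +/- 11.6 (300 runs) vs HLCRW 65.30 +/- 2.0.
# n2 = 300 is an assumed sample size for HLCRW.
t, df = welch_t(65.667, 11.6, 300, 65.30, 2.0, 300)

CRITICAL = 1.96  # normal approximation, two-sided test at the 0.05 level
significant = abs(t) > CRITICAL
print(round(t, 2), round(df), significant)
```

Under these assumptions the statistic stays well below the critical value, which is consistent with the Teaching result being reported as not significantly different.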
At first glance, Table VI shows that CULM is the leader among the algorithms with 8 best results: Teaching, Sonar, Image, Glass, Ecoli, Balance, Segment and Poker. The next best in terms of best results is the Classical group, which leads on 5 datasets (Hayes, Wine, Pima, Vehicle and Vowel). Third is PgRkNN with Zoo, Libras, RedWine and Optical. Next to last is CULP, with Iris and Yeast on top, and finally HLCRW has the best result only on Thyroid.
In order to give a more thorough view of the ranking of the algorithms in Table VI, Table VII is formed. In this table the best result on a dataset gets rank 1 and the worst gets rank 5. In case of ties the algorithms get the same value, and when computing the average rankings, tied algorithms contribute the mean of the ranks they jointly occupy (if 2 algorithms are both ranked 3, each counts as 3.5 when computing the average rank).
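The tie handling described above corresponds to what is often called fractional ranking. A minimal sketch is given below, applied to the Zoo row of Table VI (the assignment of values to algorithms follows the discussion of that table). The function name is illustrative, and the output is the value each algorithm contributes to the average rank.

```python
def average_ranks(scores):
    """Rank from best (1) to worst; tied scores share the mean of their positions."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over every algorithm tied with the one at position i.
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared = ((i + 1) + (j + 1)) / 2  # mean of positions i+1 .. j+1
        for name in ordered[i:j + 1]:
            ranks[name] = shared
        i = j + 1
    return ranks


# Mean accuracies from the Zoo row of Table VI.
zoo = {"CULP": 96.833, "CULM": 97.500, "Classical": 97.500,
       "HLCRW": 97.00, "PgRkNN": 99.03}
zoo_ranks = average_ranks(zoo)
print(zoo_ranks)
```

Here CULM and the Classical group tie for second place and therefore each contribute 2.5, so the ranks in a row always sum to 15.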
As can be seen in Table VII, CULM has the best average rank of 1.9, which is well ahead of the second-ranked Classical algorithms (2.675). The third rank belongs to CULP with 2.925, followed by PgRkNN and HLCRW with 3.55 and 3.95 respectively. This is evidence that CULP and CULM are highly accurate classifiers and competitive with classical and similar works.
In this work, we proposed a novel way to look at the problem of classification through the lens of link prediction. Our proposed memory efficient graph data structure, LEG, enables the use of any link predictor to assist the classification procedure and captures not only the unlabeled and labeled data, but also the classes in a unified manner.
Our proposed algorithm, CULP, can be used with any link predictor to derive the class of the unlabeled data. In this work Common Neighbors, the Adamic-Adar Index and the Resource Allocation Index were used along with our own local link predictor, called Compatibility Score, as the predictors for CULP. Our algorithm demonstrated superiority to similar algorithms which use graph representations to classify a data point, and our Compatibility Score was also one of the best predictors in our experiments.
We also extended CULP with a weighted majority vote whose weights are proportional to the probabilities of the predictions. CULM is the name of this extension, which not only uses multiple predictors but also exploits the low level features of the data.
Our experiments on both CULM and CULP showed high accuracy on 20 different datasets and superiority over the classical approaches and similar graph based methods.
There is a lot left to be done with the methods and algorithms elaborated in this paper. We are going to test our Compatibility Score on graph datasets and measure its accuracy on explicit link prediction problems. Another idea on our agenda is testing both CULP and CULM with other link prediction methods, possibly more complex ones such as random walk or matrix factorization, to analyze any further improvement. Finally, a stacking approach to find the weights of CULM is under construction and will hopefully be discussed in another work.