Machine learning can be defined as a set of methods that automatically detect patterns in data and then use the uncovered patterns to predict future data or to perform other kinds of decision making under uncertainty.
Usually, machine learning is divided into three main paradigms: supervised learning, unsupervised learning, and semi-supervised learning. Supervised learning uses tagged data to detect patterns and predict future cases. According to the kind of labels, the prediction is called classification for categorical labels and regression for numerical labels. Unsupervised machine learning algorithms explore the data to search for possible structures that can be used to tag the data. For example, on social media, users do not necessarily provide special information such as political preferences. However, using the collected data, an unsupervised algorithm could detect this. Semi-supervised learning is a combination of supervised and unsupervised learning. Usually, the quantity of labeled data is low because tagging is expensive. So, semi-supervised algorithms use a small amount of labeled data to predict the labels of the untagged data.
Data classification is one of the most important topics in supervised learning. It aims at generating a map from the input data to the corresponding desired output, for a given training set. The constructed map, called a classifier, is used to predict new input instances.
Many algorithms use only physical features (e.g., distance, similarity, or distribution) for classification. These are called low-level classification algorithms. Such algorithms can achieve good classification results if the training and testing data are well-behaved, for example, when the data follow a normal distribution. However, these techniques have difficulty classifying data with complex structures. On the other hand, a high-level algorithm treats the interactions among data items as a system, exploiting the structure of the data to capture patterns. In this way, it can perform classification according to the pattern formation of the data, such as cycles, high link density, assortativity, and network communication (betweenness), instead of just measuring physical features like Euclidean distance.
In order to capture the structure and properties of the data, we propose to work with complex networks, which are defined as large-scale graphs with nontrivial connection patterns. Many network measures have been developed, and each one of them characterizes the network structure from a particular viewpoint. In the category of degree-related measures, we have the density, which represents how strongly the nodes are connected, and the degree assortativity, which represents the attraction between nodes with a similar degree (degree correlation). In the category of centrality measures, we have betweenness centrality, which measures the importance of a node for communication on the network, closeness vitality, which measures the impact on network communication if a node is removed, and so on.
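As a small illustration of the degree-related measures mentioned above, the following sketch computes the density and the degree assortativity of an undirected graph given as an edge list. The function names and the star-graph example are illustrative assumptions, not part of the proposed method; the formulas are the usual textbook definitions.

```python
def density(n_nodes, edges):
    """Fraction of the possible undirected links that are present."""
    return 2 * len(edges) / (n_nodes * (n_nodes - 1))


def degree_assortativity(edges):
    """Pearson correlation between the degrees at the two ends of each link."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Count every undirected link in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    mean = sum(xs) / len(xs)
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    return cov / var  # undefined (division by zero) if all degrees are equal


# Star graph: one hub (node 0) linked to four leaves.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(density(5, star))            # 0.4
print(degree_assortativity(star))  # -1.0
```

The star graph is maximally disassortative because every link joins the high-degree hub to a degree-one leaf.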
Several high-level algorithms have been proposed that use network measures for classification. The impact measure approach tries to reduce the variation of a measure once a new node is inserted into a network, the link prediction approach uses a meta class node to represent each label and performs the classification using link prediction techniques, and the importance measure approach exploits the PageRank algorithm for classification.
The technique proposed in this work captures the structure of the data using just one metric, betweenness centrality. This measure captures the importance of a node for communication in the graph. Nodes with low betweenness tend to be on the periphery; on the contrary, nodes with high betweenness tend to be focal points. Instead of focusing on the impact of node insertion, or on preserving the structure as measured by many network measures, we focus on the structure generated once a new node is inserted and identify whether this inserted node presents features similar to those of the other nodes in the new network. Also, unlike other methodologies that require a classical algorithm such as SVM (Support Vector Machine) to complete the high-level classification, our methodology uses pure network measures to classify. This approach shows good performance, avoids the double calculations of the impact measure method, reduces the number of properties to be used, and does not need to be combined with other classical techniques.
2 Model Description
In this section, we describe the working mechanism of our model. First, we give an overview of the training and classification phases of the model. Then, we provide details about each step of the algorithm. Finally, we describe how we use the betweenness measure in the model.
2.1 Overview of the Model
A complex network consists of a set of nodes (vertices) and a set of links (edges) between pairs of nodes. The input data contain two parts for each element: the attributes and the label.
Each instance in the dataset is therefore a pair formed by an attribute vector and a label, where the label takes one of the possible class values. The goal is to predict the labels from the attributes. This can be considered as function approximation, where the function is our algorithm. To evaluate the model, the data must be split into training and testing datasets. The training dataset is used to build our model and the testing dataset is used for evaluation.
In the training phase, we build complex networks using the training dataset. The instances in the dataset are the nodes, and the links represent the similarity between these nodes. Therefore, the network is defined by its set of nodes and its set of links. The links can be created using kNN and ε-radius, or using personalized relation metrics such as friendship in social data, flight routes, or city connections.
The network is built using the training instances as nodes and kNN and ε-radius as the relation metric for links. Then, we remove the links between nodes with different labels. Following this strategy, we obtain one network component for each label.
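The link-removal step described above can be sketched as follows. This is a minimal illustration under assumed data structures (an edge list and a node-to-label dictionary, both hypothetical); it only shows how dropping cross-label links leaves one group of nodes and links per label.

```python
def split_by_label(edges, labels):
    """Drop links whose endpoints have different labels; group the rest by label."""
    components = {
        label: {"nodes": set(), "links": []} for label in set(labels.values())
    }
    for node, label in labels.items():
        components[label]["nodes"].add(node)
    for u, v in edges:
        if labels[u] == labels[v]:
            components[labels[u]]["links"].append((u, v))
    return components


labels = {0: "red", 1: "red", 2: "red", 3: "blue", 4: "blue"}
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]  # (2, 3) joins two different labels
components = split_by_label(edges, labels)
print(components["red"]["links"])   # [(0, 1), (1, 2)]
print(components["blue"]["links"])  # [(3, 4)]
```

Note that a group built this way is not guaranteed to be connected; that depends on how the links were created in the first place.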
In the testing phase, we insert a node from the testing dataset into each component, following the same kNN and ε-radius rules as in the training phase. Then, we calculate the betweenness centrality of this node in each component. This measure is compared to those of the other nodes in each network component, and the differences are saved in a new list for each component.
Finally, we compute the average of the lowest values in each list and assign the new node to the component with the lowest average. Then, we remove this node from the other components. If the average differences of two or more lists are equal, we use the number of links connected to the new node in each component as a second difference measure.
2.2 Network-Based High Level Classification Algorithm Using Betweenness Centrality (NBHL-BC)
The proposed high-level classification algorithm, which will be referred to as NBHL-BC, has four parameters: k, the number of neighbors used in kNN; the percentile of the kNN distances used to calculate ε; b, the number of nodes with similar betweenness centrality used for classification; and the weight that balances the number of links against betweenness centrality.
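During the training phase, we build the network from the training dataset. Each node of the network corresponds to one training instance, and each link is defined by the following two techniques:

$$N(x_i) = \begin{cases} \epsilon\text{-radius}(x_i, y_i), & \text{if } |\epsilon\text{-radius}(x_i, y_i)| > k \\ k\text{NN}(x_i, y_i), & \text{otherwise} \end{cases} \qquad (1)$$

where $(x_i, y_i)$ represents a data instance and its corresponding label. For each instance $x_i$, $N(x_i)$ is the set of nodes to be connected to it, that is, its neighborhood. $\epsilon\text{-radius}(x_i, y_i)$ returns the set $\{x_j : d(x_i, x_j) < \epsilon \wedge y_i = y_j\}$, i.e., the set of nodes whose distance to $x_i$ is within a predefined value $\epsilon$ and that have the same class label. Here, $d$ is a similarity function such as the Euclidean distance. $k\text{NN}(x_i, y_i)$ returns the set containing the $k$ nearest neighbors of $x_i$. The value $\epsilon$ is a percentile of the kNN distances in the corresponding sub graph. Note that the ε-radius criterion is used for dense regions ($|\epsilon\text{-radius}(x_i, y_i)| > k$), while kNN is employed for sparse regions. With this mechanism, it is expected that each label will have an independent sub graph.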
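The neighborhood rule described above (ε-radius in dense regions, kNN in sparse ones) can be sketched as below. This is a hedged illustration, not the authors' implementation: the function name, the argument names, the use of `<=` for the radius test, and the restriction of kNN to same-label points are all assumptions, and `eps` is passed in directly rather than derived from a percentile of the kNN distances.

```python
from math import dist  # Euclidean distance, Python 3.8+


def neighborhood(i, points, labels, k, eps):
    """Same-label neighbors of points[i]: eps-radius if dense, else kNN."""
    same = [j for j in range(len(points)) if j != i and labels[j] == labels[i]]
    same.sort(key=lambda j: dist(points[i], points[j]))
    within = [j for j in same if dist(points[i], points[j]) <= eps]
    # Dense region: more than k same-label points fall inside the radius.
    return within if len(within) > k else same[:k]


points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (5, 5), (6, 5)]
labels = ["a"] * 6
print(neighborhood(0, points, labels, k=2, eps=0.5))  # [1, 2, 3] (dense, eps-radius)
print(neighborhood(4, points, labels, k=2, eps=0.5))  # [5, 3]    (sparse, kNN)
```

Point 0 sits in a tight cluster, so the radius already captures more than k neighbors; point 4 is isolated, so the rule falls back to its k nearest neighbors.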
In Figure 1(d), we can see the graph whose sub graphs are shown with red, green, and blue nodes. In the testing phase, we insert each test instance (dark node) into each component following the same rule as in equation (1), assuming in turn that the node belongs to each sub graph.
For example, in Figure 1, there are three network components and the node to be tested is inserted into each one. In Figures 1(a), 1(b), and 1(c), the node uses its k nearest neighbors with the same label. In this case, there are 4 such nodes in the first component, 1 in the second, and 0 in the third. The ε-radius neighborhood, with ε set to the median of the kNN distances, contains fewer than 5 nodes because the inserted node lies in a sparse region; for this reason, we use only kNN. Moreover, due to the same-label condition, the algorithm produces one sub graph for each possible label.
Now we calculate the betweenness centrality of the node in each component once the new node is inserted. Following this rule, the inserted node will have a different value for each component.
The betweenness centrality is a mixed measure (global and local) that captures how often a given node lies on the shortest paths between other nodes. This measure captures the influence of a node on the communication of the network. We capture not only the characteristics of a node but also the behavior of its neighborhood. So, we have a metric that provides both local and global network characteristics. This metric is defined in equation 2:

$$B(i) = \sum_{s \neq i \neq t} \frac{\eta_{st}^{i}}{\eta_{st}} \qquad (2)$$

where $\eta_{st}^{i}$ is 1 when the node $i$ is part of the geodesic path from $s$ to $t$ and 0 otherwise, and $\eta_{st}$ is the total number of shortest paths between $s$ and $t$.
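To make the measure concrete, the sketch below computes betweenness for every node of a small undirected, unweighted graph. It uses the standard shortest-path formulation (each pair of nodes contributes the fraction of its shortest paths that pass through the node) and Brandes' accumulation scheme, which is an implementation choice assumed here rather than something stated in the text. The adjacency-dictionary input and the function name are likewise illustrative.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness of every node; adj maps node -> set of neighbors."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma = {v: 0 for v in adj}
        depth = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], depth[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if depth[w] < 0:
                    depth[w] = depth[v] + 1
                    queue.append(w)
                if depth[w] == depth[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back towards s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair (s, t) was counted twice, once from each end.
    return {v: c / 2 for v, c in score.items()}


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
bc = betweenness(path)
print(bc)  # {0: 0.0, 1: 2.0, 2: 2.0, 3: 0.0}
```

On the four-node path, the end nodes are peripheral (betweenness 0) while each inner node lies on two shortest paths, which matches the intuition that low values mark the periphery and high values mark focal points.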
Then, we calculate the difference of this measure between the inserted node and the other nodes in each component. This step is shown in Algorithm 2, line 14. Algorithm 1 shows how an inserted node presents a different betweenness centrality for each sub graph.
These values are inserted into an independent list for each component. We then calculate the average of the lowest values in each list. Algorithm 2, line 19, shows how we take just the b lowest elements of the list previously sorted on line 16. The results are stored in an array in which each entry represents the average difference of the b nearest betweenness values in the corresponding sub graph. This process is represented in Algorithm 2, lines 27 and 28.
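The averaging step above can be sketched in a few lines. The function name and the sample numbers are purely illustrative, and taking the absolute value of each difference is an assumption made here so that "lowest" means "closest".

```python
def mean_lowest_differences(new_value, component_values, b):
    """Average of the b smallest absolute differences to the inserted node."""
    diffs = sorted(abs(new_value - v) for v in component_values)
    lowest = diffs[:b]
    return sum(lowest) / len(lowest)


# Betweenness of the inserted node and of the existing nodes in two
# components (made-up numbers, for illustration only).
avg_diffs = [
    mean_lowest_differences(2.0, [0.0, 1.5, 2.5, 6.0], b=2),  # (0.5 + 0.5) / 2
    mean_lowest_differences(0.0, [3.0, 4.0, 5.0], b=2),       # (3.0 + 4.0) / 2
]
print(avg_diffs)  # [0.5, 3.5]
```

Here the inserted node resembles the nodes of the first component much more closely, so that component obtains the lower average.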
The array of average differences is then normalized, as given in equation 3. In order to avoid conflicts between probabilities with the same value, we also count the number of links between the inserted node and each sub graph and store the counts in a second array. Then, we follow a process similar to that of equation 3 for normalization. This process is represented in Algorithm 2, line 29.
Finally, once these values are normalized, we calculate the weighted sum of the two normalized arrays and apply a final normalization.
The result represents the probability of the node being inserted into each sub graph, and the weight parameter controls the balance between structural information and the number of links. If the weight is 1, we capture information using only betweenness centrality, and if it is 0, we capture information only about the number of links. The full algorithm is described in Algorithm 2.
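The combination step can be illustrated with the sketch below. The exact normalization is not recoverable from the text, so this example makes explicit assumptions: each array is normalized to sum to one, the betweenness differences are inverted with `1 - value` so that a smaller difference yields a larger score, and the weight is called `alpha`. All of these names and choices are hypothetical.

```python
def normalize(values):
    """Scale values to sum to one (uniform if they sum to zero)."""
    total = sum(values)
    if total == 0:
        return [1 / len(values)] * len(values)
    return [v / total for v in values]


def component_probabilities(avg_diffs, link_counts, alpha):
    """Mix betweenness similarity and link counts with weight alpha in [0, 1]."""
    # Assumption: a smaller betweenness difference should give a higher score.
    similarity = normalize([1 - d for d in normalize(avg_diffs)])
    links = normalize(link_counts)
    mixed = [alpha * s + (1 - alpha) * t for s, t in zip(similarity, links)]
    return normalize(mixed)


avg_diffs = [0.5, 3.5]  # average betweenness difference per component
link_counts = [3, 1]    # links from the inserted node to each component
print(component_probabilities(avg_diffs, link_counts, alpha=1.0))  # [0.875, 0.125]
print(component_probabilities(avg_diffs, link_counts, alpha=0.0))  # [0.75, 0.25]
```

With a weight of 1 only the betweenness term matters, and with a weight of 0 only the link counts matter, mirroring the two limiting cases described in the text.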
3 Performance Tests on Toy Datasets
In this section, we present the classification performance of our algorithm on toy datasets and compare the results with other algorithms, using Python as the programming language and the Scikit-learn library for the compared algorithms. Specifically, we test our algorithm against Multi Layer Perceptron (MLP), Decision Tree C4.5 (DT), and Random Forest (RF). The algorithms are tested using 10-fold cross validation, executed 10 times, and we use a grid search to select the hyperparameters that give the best accuracy for each algorithm.
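The evaluation protocol described above (10 folds, repeated 10 times) can be sketched with the standard library alone. The function name, the seed, and the plain (non-stratified) shuffling are assumptions made for illustration; the grid search would wrap this loop, scoring each candidate parameter setting on the same splits.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) for every fold of every repetition."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        # Deal the shuffled indices into n_folds nearly equal folds.
        folds = [idx[f::n_folds] for f in range(n_folds)]
        for f in range(n_folds):
            test = folds[f]
            train = [i for g in range(n_folds) if g != f for i in folds[g]]
            yield train, test


splits = list(repeated_kfold(n_samples=200))
print(len(splits))  # 100 train/test pairs: 10 folds x 10 repetitions
```

Averaging the accuracy over all 100 pairs reduces the dependence of the reported result on any single random partition.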
The toy datasets are Moons and Circle, with Gaussian noise of standard deviation 0.0 and 0.25 added to the data. The NBHL-BC parameter values are shown in Table 1, and the classification accuracy results are shown in Table 2. These datasets were used because they present clear data patterns on which traditional algorithms lose effectiveness. For the Decision Tree, we use the Gini index as the quality measure without any pruning method. For Random Forest, we use the Gini index as the split criterion and 100 trees. For MLP, we use 2 hidden layers with 10 nodes and 100 iterations for the datasets without noise and 500 iterations for the datasets with noise.
In the first group, we set the weight so that only betweenness centrality is used, because we want to evaluate just the structural methodology. In the second group, we combine both strategies and obtain a small improvement on the Circle 0.25 dataset. In the last group, we use just the number of links and obtain similar results, but there is a reduction of the accuracy on Moons 0.25. In some cases, such as Moons with 0.25 noise, we need to turn off one property and increase the number of neighbors. The number of similar nodes was kept fixed in all the tests because other values reduce accuracy.
In these simulations, our algorithm presents the best results on all the datasets. In particular, on the Circle dataset with noise 0.25, which is the most difficult case, our algorithm presents better classification accuracy than the other techniques under comparison.
4 Experimental Results on Real Datasets
In this section, we present the results of the NBHL-BC technique on UCI classification datasets. We also compare our results with other algorithms. We tested our algorithm against Multi Layer Perceptron (MLP), Decision Tree C4.5 (DT), Random Forest (RF), and the Network-Based High Level Technique (NBHL).
The algorithms are tested by splitting each dataset into two subsets, for training and testing, with proportions of 75% and 25% respectively, following stratified sampling, using Python as the programming language and the Scikit-learn library for the compared algorithms.
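A stratified 75%/25% split like the one described above can be sketched as follows. This is a standard-library illustration with hypothetical names and a made-up label list, not the code used for the experiments: each class is shuffled and split separately so that both subsets keep roughly the original class proportions.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) with class proportions roughly preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test += members[:n_test]
        train += members[n_test:]
    return train, test


labels = ["a"] * 80 + ["b"] * 40
train, test = stratified_split(labels)
print(len(train), len(test))  # 90 30
```

Both subsets keep the 2:1 ratio between the two classes, which matters for small or imbalanced datasets where a purely random split could leave a class under-represented in the test set.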
The datasets used are shown in table 3 with the number of instances, attributes and classes. These datasets were selected because the previous high-level algorithm used them. The NBHL-BC parameter values are given in table 4, and classification accuracy results are presented in table 5.
Our algorithm performs well on all the datasets compared to the other algorithms. In four cases, our algorithm presents the best results. Only in the case of the Iris dataset is another high-level classification algorithm, NBHL, better than the proposed one.
Moreover, the parameter that regulates the weight between the betweenness measure and the number of links is 1.0 in 6 of the 7 datasets, which means that the algorithm uses only betweenness centrality. The Yeast dataset required a value that gives the same importance to betweenness centrality and to the number of links. In Table 6, we tested the UCI Wine dataset using 10-fold cross validation with different values of this weight. The accuracy using only the number of links is considerably lower than the accuracy using only betweenness centrality, and the best result is obtained by mixing both techniques. The parameter that sets the number of nodes with the lowest difference with respect to the inserted node was kept constant.
In this paper, we describe a new technique for high-level classification using the betweenness centrality property. We propose that nodes with similar betweenness centrality can determine the class to which a new untagged instance belongs. This measure provides the importance of each node in the communication of the sub graph. We exploit this property to classify a new node into the sub graph that presents a similar communication structure. We test this algorithm on 4 toy datasets and 7 real datasets, and the results are promising.
As future work, we think that procedures are needed to reduce the noisy instances and attributes that could produce disconnected sub graphs. A way to detect the best parameter values is also needed, perhaps following an optimization approach such as particle swarm.
- (2002-01) Statistical mechanics of complex networks. Rev. Mod. Phys. 74, pp. 47–97.
- (2001) Random forests. Machine Learning 45 (1), pp. 5–32.
- Organizational data classification based on the importance concept of complex networks. IEEE Transactions on Neural Networks and Learning Systems 29, pp. 3361–3373.
- (2016) Machine learning in complex networks. Springer International Publishing.
- (2018-07) A network-based high level data classification technique. In 2018 International Joint Conference on Neural Networks (IJCNN), pp. 1–8.
- (2017) UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences.
- (2019) Classification using link prediction. Neurocomputing 359, pp. 395–407.
- (1977) A set of measures of centrality based upon betweenness. Sociometry 40, pp. 35–41.
- Hands-on machine learning with scikit-learn and tensorflow: concepts, tools, and techniques to build intelligent systems. 1st edition, O’Reilly Media, Inc.
- (2013) Machine learning: a probabilistic perspective. MIT Press, Cambridge, Mass.
- (2019) Graph algorithms: practical examples in apache spark and neo4j. O’Reilly Media, Incorporated.
- (2003-02) Mixing patterns in networks. Phys. Rev. E 67 (2), pp. 026126.
- (2019) Hands-on unsupervised learning using python.
- (2011) Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12, pp. 2825–2830.
- A direct adaptive method for faster backpropagation learning: the rprop algorithm. In IEEE International Conference on Neural Networks, pp. 586–591 vol.1.
- (2000-08) SPRINT: a scalable parallel classifier for data mining. VLDB.
- (2012-06) Network-based high level data classification. IEEE Transactions on Neural Networks and Learning Systems 23 (6), pp. 954–970.
- (2015) High-level pattern-based classification via tourist walks in networks. Information Sciences 294, pp. 109–126. Note: Innovative Applications of Artificial Neural Networks in Engineering.