I. Introduction
In recent years, research on Artificial Neural Networks with supervised learning algorithms has made great advances, often appearing in technology news with increasingly impressive practical applications in diverse areas, such as Robotics [1], Genomics [2], and Natural Language Processing [3]. Despite these advances, the fact that these methods require a large amount of properly labeled data for training (sometimes on the order of thousands of patterns per class) makes their use in many applications impractical. In certain areas, such as the medical field, it is extremely difficult and expensive to obtain balanced labeled datasets. In other areas, such as robotics, the dynamics involved make it impossible to obtain labels in real time. In addition, in certain problems, new categories of elements may frequently arise, making it infeasible to create a comprehensive, previously labeled training dataset.
Therefore, at the current stage of research, it is of great importance to put forward methods that can benefit both from the (frequently large amounts of) unlabeled data available and from the smaller amounts of labeled data, which would expand the current range of machine learning applications.
In order to achieve performance improvements, Semi-Supervised Learning (SSL) methods take advantage of both unlabeled and labeled data [4]. Hence, SSL stands halfway between supervised and unsupervised learning, being applied to both classification and clustering tasks [5].
In semi-supervised classification, the training process tries to exploit additional information (often available as class labels) together with the unlabeled data to achieve a more accurate classification function. In semi-supervised clustering, this prior information is used to obtain a better clustering performance [5, 6]. Prototype-based methods such as K-Means [5] and Self-Organizing Maps (SOM) [7, 8] are examples that have been successfully applied in this area.
Kohonen proposed two very influential prototype-based methods: SOM [7], an unsupervised learning method frequently applied for clustering, and Learning Vector Quantization (LVQ) [9], a supervised learning method that shares many similarities with SOM and is frequently applied for classification. Therefore, these methods are good candidates for developing a hybrid approach for SSL.
Various modifications of LVQ and SOM have been proposed to improve their performance on more challenging datasets with thousands of dimensions, commonly found in areas such as data mining [10] and bioinformatics [2]. In this context, traditional distance metrics often applied in prototype-based methods may become meaningless due to the curse of dimensionality [11], in which objects may appear approximately equidistant from each other, a problem aggravated by the presence of irrelevant dimensions in the dataset. SOM- and LVQ-based methods usually deal with such problems by applying weights to the input dimensions, which has been shown to provide significant performance improvements.
Following this path, in this paper, we propose a new method called Semi-Supervised Self-Organizing Map (SS-SOM), an extension of the Local Adaptive Receptive Field Dimension Selective Self-Organizing Map (LARFDSSOM) [8], created by introducing important modifications to incorporate semi-supervised learning.
In order to evaluate SS-SOM, we compared it with other supervised and semi-supervised methods. The performance of SS-SOM was evaluated under different conditions of label availability, ranging from 1% to 100% of labeled samples in the dataset. The proposed method presents promising results when applied to real-world datasets, even with a low percentage of labeled data, reaching accuracy similar to that of traditional supervised learning methods.
The rest of this article is structured as follows: Section II defines the machine learning approaches considered in this article. Section III presents a review of important and prominent classification and clustering methods from different learning approaches. Section IV describes the proposed method in detail. Section V presents the experimental setup, the methodology, and the obtained results and comparisons. Finally, in Section VI we discuss the obtained results and indicate future directions.
II. Machine Learning Approaches
In a broad sense, learning processes are traditionally categorized into two fundamentally different types of tasks: learning with and without a supervisor [12, 13].
In the first, called supervised learning, involving only labeled data, the goal is to learn a mapping from X to Y, given a training set made of pairs (x_i, y_i), where the y_i are the labels of the samples x_i. The latter, involving only unlabeled data, can be divided into two subcategories: 1) unsupervised learning, where the goal is to find interesting structure in the data X by estimating a density which is likely to have generated X; and 2) reinforcement learning, where an input-output mapping is learned through continued interaction with the environment in order to minimize some kind of cost function [12, 13].
In recent years, there has been growing interest in a hybrid setting, called semi-supervised learning (SSL). SSL lies between supervised and unsupervised learning. In many learning tasks, there is a large supply of unlabeled data but insufficient labeled data, since labels can be expensive and hard to generate. The basic idea of SSL is to take advantage of both labeled and unlabeled data during training, combining them to improve the performance of the models [6, 14, 13, 5].
Moreover, SSL can be further divided into semi-supervised classification and semi-supervised clustering [6].
Firstly, in semi-supervised classification, the training set is given in two parts, S and U, the labeled and unlabeled data, respectively. At first, it is possible to consider a traditional supervised scenario using only S to build a classifier. However, the unsupervised estimation of the probability function p(x) of the input set can take advantage of both S and U. Besides, classification tasks can reach a higher performance through the use of SSL as a combination of supervised and unsupervised learning [6]. Many semi-supervised classification algorithms have been developed in the past decades, and, according to Zhu [15], we can organize them into the following categories: 1) self-training; 2) SSL with generative models; 3) Semi-Supervised Support Vector Machines (S3VM), or transductive SVM; 4) SSL with graphs; and 5) SSL with committees.
Secondly, in semi-supervised clustering, the aim is to group the data into an unknown number of groups relying on some kind of similarity or distance measure in combination with objective functions. Clustering is a more difficult and challenging problem than classification, and the nature of the data can make clustering tasks even harder, so any kind of additional prior information about the data can be useful for obtaining a better performance. Therefore, the general idea behind semi-supervised clustering is to integrate some type of prior information into the process, for example, a subset of labeled data or further constraints on pairs of patterns in the form of must-link and cannot-link relations [6, 15]. Prototype-based algorithms (e.g., K-Means and SOMs), Hidden Markov Random Fields (HMRFs), Expectation Maximization (EM), and Label Propagation (LP) are examples that have been successful in this area [6, 15, 5, 14].

III. Related Work
Several techniques have been developed to deal with high-dimensional data in different learning contexts. Thus, in this section, we describe unsupervised (Section III-A), supervised (Section III-B), and semi-supervised (Section III-C) methods and discuss how they are connected with the motivating problem. Some of these methods are further compared in Sections V and VI.

III-A Unsupervised Methods
Unsupervised learning techniques can address the problems imposed by high-dimensional, unlabeled data. In this context, we can cite the Self-Organizing Map (SOM), first introduced by Kohonen [9]. SOM is used in several applications, including clustering data without knowledge of the labels. SOM also provides a topology-preserving mapping from the high-dimensional input space to the map units, maintaining the relations between the points.
The general task of clustering involves not only grouping the data but also identifying the subsets of input dimensions that are relevant to characterize each cluster. One way to achieve this is by applying local relevances to the input dimensions. Identifying which dimensions are relevant is an important feature when working with high-dimensional data [2]. In this context, subspace clustering methods have been proposed, aiming to determine clusters in subspaces of the input dimensions of a given dataset [10]. Moreover, in subspace clustering problems, a sample may belong to more than one cluster as a result of taking into account different subsets of the input dimensions [8]. In projected clustering problems, on the other hand, each sample belongs to a single cluster.
Therefore, some variations of the original SOM were developed to improve clustering performance, and LARFDSSOM is an example. It uses a time-varying structure, a neighborhood defined by connecting nodes that have similar subspaces of the input dimensions, and a local receptive field that is adjusted for each node as a function of its local variance. Hence, LARFDSSOM has shown good results in the motivating problem for both subspace and projected clustering [8].

III-B Supervised Methods
Some supervised methods for classification were proposed to deal with high-dimensional data. According to Hammer [16], some Learning Vector Quantization (LVQ) methods are good options, since they have been shown to be a valuable alternative to Support Vector Machines (SVMs) [17]. Still, SVMs and the Multilayer Perceptron (MLP) [12] are also viable alternatives.
Like SOM, LVQ was also proposed by Kohonen [9]. It is a family of algorithms for statistical pattern classification that uses prototypes to represent class regions [18]. These regions are defined by hyperplanes between prototypes, resulting in Voronoi partitions. Various modifications of LVQ exist to ensure faster convergence, a better adaptation of the receptive fields, and an adaptation to complex data structures [19].
The Generalized Relevance Learning Vector Quantization (GRLVQ) is a member of this family. The algorithm was inspired by GLVQ and proposed to deal with high-dimensional datasets by using a relevance vector able to identify the irrelevant dimensions and/or noise commonly present in real datasets. GRLVQ adapts a weight for each input dimension by incorporating an update rule for them into the learning process [19].
III-C Semi-Supervised Methods
K-Means is one of the most popular and simple clustering algorithms. Although it was proposed over 50 years ago, it is still widely used, and many variations have been proposed. Semi-supervised K-Means-based methods have been very successful, demonstrating advantages over standard approaches. One of them is called Seeded-KMeans [5]. It can be viewed as an instance of the EM algorithm, in which the labeled data provide prior information about the conditional distribution of the hidden category labels, working as a guide for the clustering process.
Given a dataset X, the K-Means clustering of the dataset generates k partitions of X. Let S ⊆ X be the seed set, a subset of data points on which supervision is provided as follows: for each x ∈ S, a group is assigned according to the partition to which it belongs. By the end of the process, the partitions of the seed set S form the seed clustering, which is used to guide the K-Means algorithm [5].
In Seeded-KMeans, the seed clustering is used only to initialize the K-Means algorithm. Hence, instead of initializing from k random means, the mean of the i-th cluster is initialized with the mean of the i-th partition of the seed set.
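A minimal sketch of this seeding step, assuming NumPy arrays (the function and argument names are ours, not from [5]):

```python
import numpy as np

def seeded_init(X, seed_idx, seed_labels):
    """Seeded-KMeans initialization: the i-th initial mean is the mean
    of the i-th partition of the seed set, instead of a random point."""
    seeds = X[seed_idx]
    classes = np.unique(seed_labels)
    # one center per seed partition
    return np.array([seeds[seed_labels == c].mean(axis=0) for c in classes])
```

From this initialization, the standard K-Means iterations (assignment and mean update) proceed unchanged.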
Label Propagation (LP) is another promising approach for SSL [20]. LP methods operate on proximity graphs or connected structures to spread and propagate class information to nearby nodes according to a similarity matrix. It is based on the assumption that nearby entities should belong to the same class, in contrast to far away entities [4, 20].
For LP purposes, each node is assigned a label vector, which contains the probabilistic membership degrees of the input sample in each of the available clusters. The nodes propagate their label vectors to all adjacent nodes according to a weight matrix W. Nodes corresponding to pre-classified input samples have fixed label vectors [20].
A similar alternative to LP is called Label Spreading (LS) [21]. It differs from LP in how the similarity matrix is treated: LP uses the raw similarity matrix constructed from the data with no changes, whereas LS minimizes a loss function with regularization properties, which often makes it more robust to noise.
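The clamping idea behind LP can be illustrated with a toy iteration, assuming a precomputed similarity matrix W (this is a didactic sketch of the propagate-and-clamp loop, not the exact formulation of [20]):

```python
import numpy as np

def propagate_labels(W, Y, labeled_mask, n_iter=100):
    """Iterative label propagation on a similarity graph.

    W: (n, n) symmetric similarity matrix; Y: (n, k) label vectors,
    one-hot for labeled nodes and zero for unlabeled ones. Labeled
    rows are clamped back to their known values after every step.
    """
    P = W / W.sum(axis=1, keepdims=True)   # row-normalize into transition probs
    F = Y.astype(float).copy()
    for _ in range(n_iter):
        F = P @ F                          # spread label mass to neighbors
        F[labeled_mask] = Y[labeled_mask]  # clamp the known labels
    return F.argmax(axis=1)                # hard assignment per node
```

Because labeled rows are re-clamped at each step, label information flows outward from them until the unlabeled rows stabilize.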
IV. Proposed Method
SS-SOM (available at https://github.com/phbraga/SSSOM) is a semi-supervised hybrid SOM based on LARFDSSOM [8], with a time-varying structure [22] and two different ways of learning. As in LARFDSSOM, the nodes of SS-SOM can apply different relevances to the input dimensions and adapt their receptive fields during the self-organization process.
Moreover, our model is a prototype-based method that can learn in either a supervised or an unsupervised way. SS-SOM can switch between these two modes during the self-organization process according to the availability of a class label for each input pattern. To achieve this, we modified LARFDSSOM to include concepts from the standard LVQ [9] for when the class label of an input pattern is given. The operation of the map consists of three phases: 1) organization (Alg. 1); 2) convergence; and 3) clustering or classification.
In the organization phase, after the network initialization, the nodes compete to form clusters of randomly chosen input patterns. There are two different ways to decide which node wins a competition, which nodes need to be updated, and when a new node needs to be inserted. If the class label of the input pattern is provided, the supervised mode is used (Section IV-B); otherwise, the unsupervised mode is employed (Section IV-A). The model can be trivially modified to also incorporate reinforcement learning. The neighborhood of SS-SOM is formed by connecting nodes to others of the same class label, or to unlabeled nodes; in both cases, the connected nodes must take into account a similar subset of the input dimensions. The competition, adaptation, and cooperation steps are repeated for a limited number of epochs. Furthermore, as in LARFDSSOM, the nodes that do not win for a minimum number of patterns are removed from the map every time a certain number of competitions (the age_wins parameter) is reached.
The convergence phase starts after the organization phase. Here, the nodes are also updated and removed when necessary, much as in the first phase; the difference is that no new nodes are inserted. Moreover, this phase finishes the cycle left by the organization phase and runs another one to ensure convergence.
After the convergence phase, the map can cluster and classify input patterns. Depending on the amount and distribution of labeled input patterns presented to the network during training, the map may end up with: 1) all nodes labeled; 2) some nodes labeled; or 3) no nodes labeled. In the first case, clustering and classification are straightforward: each test pattern is associated with the label of the node with the highest activation. In the second case, if the node with the highest activation has no class, we keep looking for a node with a defined class label and an activation above the threshold a_t. In the third and final case, we can identify the clusters of the input patterns, but not their classes.
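The three cases above can be sketched as a small fallback routine (the list-based representation and the threshold name a_t are our assumptions, not the paper's implementation):

```python
def classify(activations, labels, a_t):
    """Predict a class from a possibly partially labeled map.

    activations[j]: activation of node j for the test pattern;
    labels[j]: class of node j, or None if it never received a label.
    Falls back to the best *labeled* node above the threshold a_t.
    """
    ranked = sorted(range(len(activations)),
                    key=lambda j: activations[j], reverse=True)
    best = ranked[0]
    if labels[best] is not None:          # case 1: winner already has a class
        return labels[best]
    for j in ranked[1:]:                  # case 2: search for a labeled node
        if labels[j] is not None and activations[j] > a_t:
            return labels[j]
    return None                           # case 3: a cluster, but no class
```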
It is important to mention that in subspace clustering an input pattern may belong to more than one cluster. However, in this work, we consider only the task of projected clustering, in which each input pattern is assigned to a single cluster.
The next sections describe the operation in the unsupervised and supervised modes.
IV-A Unsupervised Mode
Given an unlabeled input pattern x, we look for a winner node disregarding the class labels. Therefore, as in Eq. 1, the winner of a competition is the node that is most activated according to a radial basis function with the receptive field adjusted as a function of its relevance vector. In other words, the winner s(x) is the node with the highest activation value (Section IV-C2) for the input pattern:

s(x) = arg max_j ac(x, c_j, ω_j)    (1)

where ac is the activation function explained in Section IV-C2, and c_j and ω_j are the center and relevance vectors of node j.
Similarly to LARFDSSOM, SS-SOM has an activation threshold a_t. If the activation of the winner is lower than a_t, a new node is inserted into the map at the input pattern position, because the winner is not close enough. Otherwise, the winner and its neighbors are updated to get closer to the input pattern (Section IV-C3); for that, we consider two fixed learning rates: 1) e_b for the winner node; and 2) e_n for its neighbors, with e_n < e_b. Alg. 2 presents this procedure.
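In outline, one organization step for an unlabeled pattern could look as follows (a simplified sketch: the activations are assumed precomputed, neighbor and relevance updates are omitted, and the names a_t and e_b follow the LARFDSSOM convention):

```python
import numpy as np

def unsupervised_step(x, centers, acts, a_t, e_b):
    """One organization step for an unlabeled pattern x.

    acts[j] holds node j's activation for x. If even the most activated
    node falls below the threshold a_t, a new node is inserted at the
    pattern's position; otherwise the winner moves toward the pattern.
    """
    winner = int(np.argmax(acts))
    if acts[winner] < a_t:
        return np.vstack([centers, x])              # insert a node at x
    centers = centers.copy()
    centers[winner] += e_b * (x - centers[winner])  # pull winner toward x
    return centers
```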
IV-B Supervised Mode
In order to incorporate the supervised learning mode, each node in the map can be associated with a class label. Hence, when a labeled input pattern is given, we treat it differently. Alg. 3 presents this procedure.
In order to obtain performance improvements from the labeled patterns, we take the labels into account when looking for a winner. Unlike the unsupervised mode, which considers only the activation, if the most activated node has the same class as the input pattern, or no defined class (line 1 in Alg. 3), a procedure very similar to the unsupervised mode (Section IV-A) is run (lines 2 to 9). The difference, in this case, is that it is necessary to set the winner's class to the class of the input pattern x, as well as to update its connections. Otherwise, we search for another winner matching the following conditions (line 11): 1) it must have the same class as the input pattern, or an unspecified class; and 2) its activation must be higher than a_t.
If some node fulfills these conditions (line 12 in Alg. 3), a new winner has been found, and it and its neighbors are updated as in the unsupervised mode (Section IV-A). However, the fact that the first winner had the wrong class suggests pushing it away from the input pattern. Therefore, similarly to LVQ, we push the wrong winner away from the input pattern with a fixed learning rate. This procedure is presented in lines 13 and 14 of Alg. 3. Otherwise, if the maximum number of nodes in the map has not been reached, a new node is inserted into the map at the same position and with the same class as the input pattern x (lines 16 and 17 of Alg. 3).
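The class-aware competition can be sketched as below (again simplified: node insertion and neighbor updates are omitted, and e_p is our naming for the wrong-winner learning rate):

```python
import numpy as np

def supervised_step(x, label, centers, classes, acts, a_t, e_b, e_p):
    """Class-aware competition for a labeled pattern x (a sketch).

    classes[j] is node j's label, or None if none was assigned yet.
    Returns the index of the accepted winner, or -1 if none qualifies
    (in the full method a new labeled node would then be inserted).
    """
    first = int(np.argmax(acts))
    if classes[first] in (None, label):          # compatible first winner
        classes[first] = label
        centers[first] += e_b * (x - centers[first])
        return first
    for j in np.argsort(acts)[::-1]:             # next-best compatible node
        if classes[j] in (None, label) and acts[j] > a_t:
            classes[j] = label
            centers[j] += e_b * (x - centers[j])          # pull new winner
            centers[first] -= e_p * (x - centers[first])  # push wrong winner
            return int(j)
    return -1
```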
IV-C Common Operations for Both Modes

IV-C1 Node Structure
In SS-SOM, each node j in the map represents a cluster and is associated with three m-dimensional vectors, where m is the number of input dimensions: c_j is the center vector, which represents the prototype of cluster j in the input space; ω_j is the relevance vector, in which each component ω_ji represents the estimated relevance, a weighting factor within [0, 1], that node j applies to the i-th input dimension; and δ_j is the distance vector, which stores a moving average of the observed distance between the input patterns x and the center vector c_j. The vector δ_j is used solely to compute the relevance vector, as in [8].
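As an illustration, the per-node state could be represented as follows (the field names are ours, not the paper's):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Node:
    """One SS-SOM node: a cluster prototype in an m-dimensional input space."""
    center: np.ndarray      # c_j: prototype of the cluster
    relevance: np.ndarray   # w_j: per-dimension weights in [0, 1]
    distance: np.ndarray    # d_j: moving average of |x - c_j|
    label: object = None    # class label, if one has been assigned
    wins: int = 0           # victories since the last reset
```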
IV-C2 Node Activation
The activation of a node in SS-SOM is calculated as a radial basis function of the weighted distance, with the receptive field adjusted as a function of its relevance vector. The activation grows as the distance decreases and as the relevances increase, as shown in Eq. 2:

ac(x, c_j, ω_j) = 1 / (1 + D_ω(x, c_j) / (||ω_j|| + ε))    (2)

where ε is a small value added to avoid division by zero and D_ω is the weighted distance function used in LARFDSSOM:

D_ω(x, c_j) = sqrt( Σ_i ω_ji (x_i − c_ji)² )    (3)
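The activation and weighted distance can be computed directly, under the assumption (as in LARFDSSOM) that the weighted distance is normalized by the norm of the relevance vector:

```python
import numpy as np

EPS = 1e-9  # small constant to avoid division by zero

def weighted_distance(x, center, relevance):
    """Distance weighted by the node's per-dimension relevances (Eq. 3 form)."""
    return np.sqrt(np.sum(relevance * (x - center) ** 2))

def node_activation(x, center, relevance):
    """Radial basis activation (Eq. 2 form): grows as the weighted
    distance shrinks, with the receptive field set by the relevances."""
    d = weighted_distance(x, center, relevance)
    return 1.0 / (1.0 + d / (np.linalg.norm(relevance) + EPS))
```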
IV-C3 Node Update
In SS-SOM, in order to update the vectors associated with the nodes (the winner, its neighbors, or a winner of a wrong class), a fixed learning rate is used, depending on the procedure under way (Alg. 2 or Alg. 3).
Alg. 4 shows how the update occurs in SS-SOM. Given a learning rate, a node is updated as in LARFDSSOM:

c_j(t+1) = c_j(t) + e (x − c_j(t))    (4)

where e is the learning rate.
To compute the relevance vectors, we estimate the average distance of each node to the input patterns that it clusters. As in LARFDSSOM, the distance vectors are updated through a moving average of the observed distance between the input pattern and the current center vector:

δ_j(t+1) = (1 − e β) δ_j(t) + e β |x − c_j(t)|    (5)

where e is the learning rate, β ∈ ]0, 1[ controls the rate of change of the moving average, and |·| denotes the element-wise absolute value, not the norm [8].
After updating the distance vector, each component of the relevance vector is calculated by an inverse logistic function of the distances, as in Eq. 6:

ω_ji = 1 / (1 + exp( (δ_ji − δ̄_j) / (s (δ_j^max − δ_j^min)) ))    (6)

where δ_j^max, δ_j^min, and δ̄_j are the maximum, the minimum, and the mean of the components of the distance vector δ_j, respectively, and ω_ji = 1 when δ_j^max = δ_j^min. The parameter s > 0 controls the slope of the logistic function [8].
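The three updates above can be combined into a single routine (a sketch; we assume the moving average uses the distance to the center before it moves, and the variable names are ours):

```python
import numpy as np

def update_node(x, center, dist, e, beta, s):
    """Update center, moving-average distance vector, and relevance
    vector (inverse logistic of the distances), LARFDSSOM-style."""
    diff = np.abs(x - center)                         # distance to old center
    center = center + e * (x - center)                # move center toward x
    dist = (1 - e * beta) * dist + e * beta * diff    # moving average
    d_max, d_min, d_mean = dist.max(), dist.min(), dist.mean()
    if d_max == d_min:
        relevance = np.ones_like(dist)                # all dims equally relevant
    else:
        # small average distance in a dimension -> relevance near 1
        relevance = 1.0 / (1.0 + np.exp((dist - d_mean) / (s * (d_max - d_min))))
    return center, dist, relevance
```

Note how a dimension along which the node keeps seeing large distances ends up with a relevance near zero, effectively removing it from that node's subspace.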
IV-C4 Node Removal
In SS-SOM, each node j in the map stores a variable that counts the node's victories since the last reset. Whenever age_wins competitions are reached, a reset occurs (lines 13–19 in Alg. 1): any node that did not win at least a minimum percentage of the competitions (lp × age_wins) is removed. After the reset, the number of victories of the remaining nodes is set to zero.
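A minimal sketch of this reset step (the dictionary-based node representation is ours):

```python
def prune_nodes(nodes, lp, age_wins):
    """Remove nodes with fewer than lp * age_wins victories since the
    last reset, then zero the survivors' win counters."""
    survivors = [n for n in nodes if n['wins'] >= lp * age_wins]
    for n in survivors:
        n['wins'] = 0          # restart counting for the next cycle
    return survivors
```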
IV-C5 Neighborhood Update
When a reset occurs and nodes have been removed, the connections between the remaining nodes must be updated. In SS-SOM, the neighborhood is formed by nodes of the same class, or unlabeled nodes, that apply similar relevances to the input dimensions, so that a connection between two nodes means that they cluster patterns of the same class, or at least in similar subspaces. Eq. 7 considers these similarities between the relevances of every pair of nodes to control this behavior.
(7) 
IV-D SS-SOM Parameters Summary
SS-SOM inherits all parameters from LARFDSSOM and includes a new one, the wrong-winner learning rate described in Section IV-B. This means we have 11 parameters to set up. Despite this being a high number of parameters, the sensitivity analysis shown in [8] revealed that only a_t and lp present a high impact on the results. SS-SOM keeps this characteristic, with the addition of e_b as a new sensitive parameter. Thus, we can keep the other parameter values fixed inside the ranges defined in Table V, given their marginal influence, including the number of epochs.
The parameter a_t, however, is crucial. Since it defines the receptive field of the nodes during training, it affects the number of nodes inserted in the map, as well as the number of patterns regarded as outliers during the clustering and classification phase. The parameter lp defines the minimum percentage of input patterns that a node has to cluster in order not to be removed from the map. This parameter is dataset dependent and has a substantial impact on the results. Finally, the parameter e_b is the learning rate of the winner node; it defines the update step, which also depends on the dataset. After a good adjustment of a_t and lp, it starts to impact the results, but it is not as significant as the other two. A short description of the other parameters can be found in [8].

V. Experiments
In order to evaluate the classification capabilities of SS-SOM, we compared it with traditional supervised methods such as MLP [12], SVM [17], and GRLVQ [19]. We also compared SS-SOM with the following semi-supervised methods: Label Spreading [21] and Label Propagation [4]. Finally, we used seven real-world datasets from the OpenSubspace framework [23], which provides real-world datasets adapted from the UCI machine learning repository [24], as well as an extensive set of synthetic datasets. A detailed description of the datasets can be found in [23].
In Section V-A, we present the methodology and the experimental setup; next, in Section V-B, we present the results and the analysis necessary to support the final conclusions.
V-A Experimental Setup
For all the algorithms, on each dataset, we used 3-times 3-fold cross-validation. Each method was trained and tested 500 times for each fold with different parameter values sampled from the parameter ranges presented in Tables I–V, according to a Latin Hypercube Sampling (LHS) [25], while the best accuracy achieved by each method in each fold was recorded for each dataset. This comprises a total of 752,000 experiments. After that, we calculated the mean and the standard deviation of the best results for each dataset separately. LHS guarantees full coverage of the range of each parameter: in our case, the range of each parameter is divided into 500 intervals of equal probability, and a single value is randomly selected from each interval [8].
To study the effects of different levels of supervision, i.e., the percentage of labeled data, the semi-supervised methods were trained with the following percentages: 1%, 5%, 10%, 25%, 50%, 75%, and 100%. The parameter ranges for the supervised methods are shown in Tables I–III, and the parameter ranges for both semi-supervised methods can be seen in Table IV. Finally, the ranges for SS-SOM are shown in Table V. The maximum number of nodes for SS-SOM was set to the size of the training set. A detailed description of the parameters of the compared methods can be found in [19], [12], [17], [4], and [21].
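The LHS scheme described above can be sketched as follows (a generic implementation, not the exact code used in the experiments):

```python
import numpy as np

def latin_hypercube(ranges, n_samples, rng=None):
    """Latin Hypercube Sampling: each parameter range is split into
    n_samples equal-probability intervals, and exactly one value is
    drawn from each interval, shuffled independently per parameter."""
    rng = np.random.default_rng(rng)
    samples = np.empty((n_samples, len(ranges)))
    for j, (lo, hi) in enumerate(ranges):
        # one random point inside each of the n_samples strata
        points = (rng.permutation(n_samples) + rng.random(n_samples)) / n_samples
        samples[:, j] = lo + points * (hi - lo)
    return samples
```

Each row of the result is one parameter configuration; by construction, every interval of every parameter range is visited exactly once across the 500 samples.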
TABLE I. GRLVQ parameter ranges
Parameters  min  max
Number of nodes  10  30 
Positive learning rate  0.4  0.5 
Negative learning rate  0.01  0.05 
Weights learning rate  0.15  0.2 
Learning Decay  0.000001  0.00002 
Number of epochs  5000  10000 
TABLE II. SVM parameter ranges
Parameters  min  max

C  0.1  10 
Kernel Function  1  4 
Degree of polynomial kernel function  3  5 
Gamma of kernel functions 2, 3 and 4  0.1  1 
Independent term in kernel functions 2 and 3  0.01  1 

1: linear, 2: poly, 3: rbf and 4: sigmoid.
TABLE III. MLP parameter ranges
Parameters  min  max
Number of neurons in each layer  1  100
Number of hidden layers  1  3 
Learning rate  0.001  0.1 
Momentum  0.85  0.95 
Epochs  100  200 
Optimizer  1  3 
Activation function  1  3 
Learning Decay  1  3 

1: lbfgs; 2: sgd; 3: adam;
1: logistic; 2: tanh; 3: relu;
1: constant; 2: invscaling; 3: adaptive.
TABLE IV. Label Propagation and Label Spreading parameter ranges
Parameters  min  max

Kernel Function  1  2 
Gamma (for RBF kernel)  10  30
Number of neighbors (for KNN kernel)  1  100
Alpha*  0  1
Number of epochs  20  100 

1: RBF and 2: KNN. * Alpha is used only for Label Spreading.
We considered a projected clustering problem, in which each sample should be assigned to a single cluster, and SS-SOM was set to operate in this mode. For classification purposes, we use the winner node's class, when available, as the predicted class; otherwise, the pattern is counted as an error. The next section presents the obtained results and their analysis.
TABLE V. SS-SOM parameter ranges
Parameters  min  max
Activation threshold (a_t)  0.80  0.999
Lowest cluster percentage (lp)  0.001  0.01
Relevance rate (β)  0.001  0.5
Max competitions (age_wins)  
Winner learning rate (e_b)  0.001  0.2
Wrong winner learning rate  
Neighbors learning rate (e_n)  
Relevance smoothness (s)  0.01  0.1
Connection threshold  0  0.5
Number of epochs  1  100

* S is the number of input patterns in the dataset.
V-B Experimental Results and Analysis
Fig. 1 shows the results of SS-SOM in comparison with Label Propagation and Label Spreading on the real-world datasets as a function of the percentage of labeled data. In all datasets, the performance of the proposed method is superior to the other semi-supervised methods for supervision rates between 1% and 75%, whereas at higher percentages (100%) the difference is smaller, but SS-SOM continues to outperform them or obtain comparable results. These results show the robustness of the proposed method in situations where only a small amount of labeled data is available.
Table VI shows the results of SS-SOM and the other semi-supervised methods using 100% of the labeled data, allowing a fair comparison with supervised methods such as GRLVQ, MLP, and SVM. Our method shows performance comparable to the other semi-supervised methods, with the biggest difference occurring on Vowel. Also, SS-SOM appears as the best overall among the semi-supervised methods (the first three in the table), as does MLP among the supervised ones (the last three in the table).
Considering all methods at 100% supervision, MLP outperforms all the others on four of the seven datasets. Our method presented the best result for the Shape dataset, outperforming all the other methods. Although Label Spreading and Label Propagation are the best for Vowel, SS-SOM showed better results on it than two of the three supervised methods, MLP and GRLVQ, with the latter showing a notably low accuracy. Also, SVM appears as the best for Pendigits. On all the other datasets, SS-SOM showed results close to the best, even though this was not the primary objective of this work.
TABLE VI. Mean accuracy (standard deviation) for each method with 100% of labeled data
Accuracy  Breast  Diabetes  Glass  Liver  Pendigits  Shape  Vowel

SSSOM  0.832 (0.044)  0.776 (0.016)  0.714 (0.033)  0.748 (0.025)  0.978 (0.004)  0.935 (0.029)  0.876 (0.017) 
Label Propagation  0.805 (0.063)  0.730 (0.031)  0.663 (0.044)  0.623 (0.036)  0.994 (0.003)  0.925 (0.036)  0.948 (0.012) 
Label Spreading  0.805 (0.066)  0.729 (0.031)  0.663 (0.044)  0.640 (0.031)  0.994 (0.003)  0.925 (0.036)  0.948 (0.012) 
MLP  0.854 (0.032)  0.791 (0.017)  0.746 (0.031)  0.766 (0.031)  0.993 (0.001)  0.923 (0.034)  0.874 (0.033) 
SVM  0.850 (0.037)  0.788 (0.020)  0.718 (0.028)  0.746 (0.054)  0.997 (0.001)  0.931 (0.030)  0.909 (0.022) 
GRLVQ  0.830 (0.049)  0.772 (0.020)  0.676 (0.027)  0.699 (0.022)  0.915 (0.004)  0.823 (0.061)  0.515 (0.027) 

In bold, the best results for each dataset in each category (semi-supervised and supervised methods). The underlined results indicate the global best.
VI. Conclusion and Future Work
This article presented an approach for classification and clustering with semi-supervised learning. SS-SOM was shown to provide significant improvements in classification results for small amounts of labeled data, establishing it as a good option for such problems, which is the central point of this article. The proposed method showed its robustness under this condition, performing better than other semi-supervised models and achieving impressive results even with only 1% of labeled data. Furthermore, despite the fact that SS-SOM has 11 parameters, only three of them (a_t, lp, and e_b) present important effects on the results.
Also, on all datasets, using 100% of the labels, SS-SOM showed results better than, or at least close to, the best found in comparison with other supervised and semi-supervised methods, even though this was not the objective of this work.
It is important to mention that, in the current implementation, the self-organizing process runs for a number of epochs sampled from the LHS, which is usually greater than necessary for convergence, even within the defined interval. An adequate stopping criterion is an object of study for future versions, in order to reduce the training time.
Notice that LARFDSSOM presented good results for subspace clustering [8], and when there are no labeled samples available, SS-SOM works exactly as LARFDSSOM, inheriting its characteristics and performance. However, when labeled samples are given, the results can be even better. Moreover, with a small change, SS-SOM could also incorporate reinforcement learning, becoming capable of switching between three different learning approaches to exploit the several forms of information available, which is left for future work.
Acknowledgments
The authors would like to thank the Brazilian National Council for Technological and Scientific Development (CNPq) for supporting this research study.
References

[1] S. Levine, P. Pastor, A. Krizhevsky, and D. Quillen, “Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection,” CoRR, 2016.
[2] F. R. B. Araujo, H. F. Bassani, and A. F. R. Araujo, “Learning vector quantization with local adaptive weighting for relevance determination in Genome-Wide association studies,” in The 2013 International Joint Conference on Neural Networks. IEEE, Aug. 2013, pp. 1–8.
[3] J. Zhou, Y. Cao, X. Wang, P. Li, and W. Xu, “Deep recurrent models with fast-forward connections for neural machine translation,” CoRR, 2016.
[4] X. Zhu and Z. Ghahramani, “Learning from labeled and unlabeled data with label propagation,” 2002.
[5] S. Basu, A. Banerjee, and R. Mooney, “Semi-supervised clustering by seeding,” in Proceedings of the 19th International Conference on Machine Learning, 2002.
[6] F. Schwenker and E. Trentin, “Pattern classification and clustering: A review of partially supervised learning approaches,” Pattern Recognition Letters, vol. 37, pp. 4–14, 2014.
[7] T. Kohonen, “The self-organizing map,” Proceedings of the IEEE, vol. 78, no. 9, pp. 1464–1480, 1990.
[8] H. F. Bassani and A. F. Araujo, “Dimension selective self-organizing maps with time-varying structure for subspace and projected clustering,” IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 3, pp. 458–471, 2015.
[9] T. Kohonen, “Learning vector quantization,” in Self-Organizing Maps. Springer, 1995, pp. 175–189.
[10] H.-P. Kriegel, P. Kröger, and A. Zimek, “Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering,” ACM Transactions on Knowledge Discovery from Data, vol. 3, no. 1, p. 1, 2009.
[11] M. Köppen, “The curse of dimensionality,” in 5th Online World Conference on Soft Computing in Industrial Applications, 2000, pp. 4–8.
[12] S. Haykin, Neural Networks and Learning Machines, 3rd ed. Prentice-Hall, 2008.
[13] O. Chapelle, B. Scholkopf, and A. Zien, “Semi-supervised learning,” IEEE Transactions on Neural Networks, vol. 20, no. 3, pp. 542–542, 2009.
[14] A. K. Jain, “Data clustering: 50 years beyond k-means,” Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 2010.
[15] X. Zhu, “Semi-supervised learning literature survey,” Computer Science, University of Wisconsin-Madison, vol. 2, no. 3, p. 4, 2006.

[16] B. Hammer, M. Strickert, and T. Villmann, “Relevance LVQ versus SVM,” in International Conference on Artificial Intelligence and Soft Computing. Springer, 2004, pp. 592–597.
[17] C. Cortes and V. Vapnik, “Support vector machine,” Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
[18] D. Nova and P. A. Estévez, “A review of learning vector quantization classifiers,” Neural Computing and Applications, vol. 25, no. 3–4, pp. 511–524, 2014.
[19] B. Hammer and T. Villmann, “Generalized relevance learning vector quantization,” Neural Networks, vol. 15, no. 8, pp. 1059–1068, 2002.
[20] L. Herrmann and A. Ultsch, “Label propagation for semi-supervised learning in self-organizing maps,” Proceedings of the 6th WSOM, 2007.
[21] D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf, “Learning with local and global consistency,” in Advances in Neural Information Processing Systems, 2004, pp. 321–328.
[22] A. F. Araujo and R. L. Rego, “Self-organizing maps with a time-varying structure,” ACM Computing Surveys, vol. 46, no. 1, p. 7, 2013.
[23] E. Müller, S. Günnemann, I. Assent, and T. Seidl, “Evaluating clustering in subspace projections of high dimensional data,” Proceedings of the VLDB Endowment, vol. 2, no. 1, pp. 1270–1281, 2009.
[24] A. Asuncion and D. Newman, “UCI machine learning repository,” 2007.
[25] J. C. Helton, F. Davis, and J. D. Johnson, “A comparison of uncertainty and sensitivity analysis results obtained with random and Latin hypercube sampling,” Reliability Engineering & System Safety, vol. 89, no. 3, pp. 305–330, 2005.