Deep Categorization with Semi-Supervised Self-Organizing Maps

06/17/2020 ∙ by Pedro H. M. Braga, et al. ∙ UFPE

Nowadays, with the advance of technology, an increasing amount of unstructured data is generated every day, but labeling and organizing it is a painful job. Labeling is an expensive, time-consuming, and difficult task, usually done manually, which introduces noise and errors into the data. Hence, it is of great importance to develop intelligent models that can benefit from both labeled and unlabeled data. Currently, work on unsupervised and semi-supervised learning is still overshadowed by the successes of purely supervised learning; however, these approaches are expected to become far more important in the longer term. This article presents a semi-supervised model, called Batch Semi-Supervised Self-Organizing Map (Batch SS-SOM), which extends the SOM by incorporating some advances that came with the rise of Deep Learning, such as batch training. The results show that Batch SS-SOM is a good option for semi-supervised classification and clustering. It performs well in terms of accuracy and clustering error, even with a small number of labeled samples, as well as on fully unlabeled data, and shows competitive results in transfer learning scenarios on traditional image classification benchmark datasets.




I Introduction

Nowadays, with the advance of technology, there is a plentiful amount of unstructured data available. However, organizing and labeling it is considerably challenging. Labeling is an expensive, time-consuming, and difficult task that is usually done manually. People can label with different formats and styles, introducing noise and errors into the dataset [1]. For instance, competitions like the Kaggle ImageNet Object Localization Challenge [2] try to encourage this kind of practice in order to continuously obtain bigger and more reliable datasets. There, participants are challenged to identify objects within images, so those images can then be further classified and annotated to be incorporated into datasets.

It is well-known that supervised learning algorithms normally reach good performance when large amounts of reliable and properly labeled data are available [3]. On the other hand, the purpose of Semi-Supervised Learning (SSL) is to categorize (classify or cluster) data even with a lack of properly labeled examples. To this extent, SSL algorithms put forward learning approaches that benefit from both labeled and unlabeled data.

Still, abundant unlabeled data carries a large amount of discriminating information that can be fully exploited by SSL algorithms and then combined with the prior information available from the smaller number of labeled samples. In this context, previous works in SSL have contributed directly to a variety of areas and application scenarios, such as traffic classification [4], health monitoring [5], and person identification [6].

Moreover, such datasets most commonly have a high number of dimensions, providing complex data structures to be fed to the models. Such high-dimensional data spaces impose great challenges for traditional machine learning approaches because some dimensions typically contain noisy or uncorrelated data. Furthermore, due to the curse of dimensionality, traditional distance metrics may become meaningless, making objects appear approximately equidistant from each other. Many approaches have been applied to deal with this problem, for example, Learning Vector Quantization (LVQ) and Self-Organizing Map (SOM) based models such as [7, 8].

Therefore, not only can the datasets themselves be used for clustering or classification tasks, but so can their characteristics (features), or learned representations, which can be found and extracted using deep learning models. These representations can be transferred and fed to independent learning models using transfer learning techniques [9], a common practice employed to work around the computational cost of training on huge datasets while also exploring generalization capabilities. For instance, it is very common to see works using pre-trained ImageNet features [10]. Such strategies are often neglected in SSL, but they remain a good starting point that can be further explored.

In this article, we propose a new model called Batch SS-SOM, a novel extension of the previous SS-SOM model that can be easily scaled to a wide range of deep learning tasks. To achieve this, many modifications were incorporated into the baseline model to allow it to deal with batches of samples and to be easily coupled with Deep Learning architectures. For the evaluation, we compared it with other semi-supervised models and studied its performance and behavior under different amounts of available labels on a variety of deep learning benchmark datasets.

The rest of this article is organized as follows: Section II presents a short background on the areas in which this paper is inserted. Section III discusses SS-SOM, the baseline model for the current work. Section IV describes the proposed method in detail. Section V presents the experiments, methodology, obtained results, and comparisons. Finally, Section VI discusses the obtained results, concludes this paper, and indicates future directions.

II Background

According to [11], unsupervised and semi-supervised learning are expected to become far more important in the longer term. Considering a purely unsupervised scenario, many approaches based on deep learning have been proposed recently. In that sense, [12] divides them into three main strategies, as illustrated in Fig. 2.

The so-called Multi-Step Sequential Deep Clustering consists of two main steps: 1) learn a richer deep representation (also known as latent representation) of the input data; 2) perform clustering on this deep or latent representation. It can be distinguished by the use of transfer learning techniques [9], relying on pre-trained models to create or extract the representations that are then fed to clustering models. The current paper is based on this approach. In Joint Deep Clustering, the step where the representation is learned is tightly coupled with the clustering. Hence, models are trained with a combined or joint loss function that favors learning a good representation while performing the clustering task itself. Closed-loop Multi-step Deep Clustering is similar to Multi-Step Sequential Deep Clustering. However, after pre-training, the steps alternate in an iterative loop, where the output of the clustering method can be used to retrain or fine-tune the deep representation.

Fig. 1: Multi-Step Sequential Deep Clustering. In the first step, the input data is used to train a model that creates a deep representation. After that, this representation can be used in the second step by a clustering method to perform clustering.
Fig. 2: (Deep) Unsupervised Learning Taxonomy.

On the other hand, Projected Clustering, Soft Projected Clustering, Subspace Clustering, and Hybrid algorithms are common approaches in the traditional semi-supervised and unsupervised context. They use diverse kinds of models, ranging from prototype-based algorithms and Hidden Markov Random Fields (HMRF) to Label Propagation (LP) [13].

SSL can be further divided into semi-supervised classification and semi-supervised clustering [13]. In semi-supervised classification, the training set is normally given in two parts: a labeled set S and an unlabeled set U. It is possible to consider a traditional supervised scenario using only S to build a classifier. However, the unsupervised estimation of the probability function p(x) can take advantage of both S and U. Besides, classification tasks can reach higher performance through the use of SSL as a combination of supervised and unsupervised learning [13].

In semi-supervised clustering, on the other hand, the aim is to group the data into an unknown number of groups, relying on some kind of similarity or distance measure in combination with objective functions. Moreover, the nature of the data can make clustering tasks challenging, so any kind of additional prior information can be useful to obtain better performance. Therefore, the general idea behind semi-supervised clustering is to integrate some type of prior information into the process.

Many models of both kinds have been proposed over the years [14]. However, as mentioned, conventional forms of clustering suffer when dealing with high-dimensional spaces. In this sense, SOM-based algorithms have been proposed [8, 15, 16, 17]. However, most of them offer no way to exploit the benefits of more advanced techniques, not even a simple form of mini-batch learning. SS-SOM is explained in more detail in the next section in order to establish the ideas behind the model proposed in this paper.

Finally, notice that SSL is growing within machine learning alongside deep learning, as can be seen in [18, 19, 20, 21]. So, it is not unusual to find the term Deep Semi-Supervised Learning (DSSL) to express deep learning methods applicable to SSL, as well as approaches adapting more traditional models to a deep learning scenario. They range from approaches based on generative models [18] to transfer learning [9], convolutional, and SOM-based approaches [16, 22].

III SS-SOM

SS-SOM [15] is a semi-supervised SOM based on LARFDSSOM [8], with a time-varying structure [23] and two different modes of learning. It is composed of a set of neurons (nodes) connected to form a map in which each node is a prototype representing a subset of the input data. The nodes in SS-SOM can assign different relevances to the input dimensions and adapt their receptive fields during the self-organization process. To do so, SS-SOM computes so-called relevance vectors by estimating the average distance of each node to the input patterns it clusters. The distance vectors are updated through a moving average of the observed distance between the input patterns and the current center vector (prototype).

SS-SOM can switch between a supervised and an unsupervised learning procedure during the self-organization process, according to the availability of the class label of each input pattern. It modifies LARFDSSOM to include concepts from standard LVQ [24] when the class label of an input pattern is given. The general operation of the map consists of three phases: 1) organization; 2) convergence; and 3) clustering and/or classification.

In the organization phase, after initialization, the nodes compete to form clusters of randomly chosen input patterns. There are two different ways to decide which node is the winner of a competition, which nodes need to be updated, and when a new node needs to be inserted. If the class label of the input pattern is provided, the supervised learning mode is used, and each winner node in the map will be associated with a respective class label. Otherwise, the unsupervised mode is employed.

When executing the unsupervised mode, given an unlabeled input pattern, the SS-SOM algorithm looks for a winner node disregarding class labels. The winner of a competition is the node that is most activated according to a radial basis function with the receptive field adjusted as a function of its relevance vector. Also, the neighborhood in SS-SOM is formed by connecting nodes to others of the same class label, or to unlabeled nodes. Moreover, in SS-SOM, any node that does not win at least a minimum percentage of competitions is removed from the map.
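The winner search described above can be sketched as follows. This is a minimal NumPy version assuming a Gaussian radial basis activation weighted by the relevance vector; the exact activation function used in [15] differs in form, and all names here are illustrative:

```python
import numpy as np

def activation(x, centers, relevances, eps=1e-9):
    """Radial basis activation of every node for input x.

    centers:    (n_nodes, dim) prototype vectors
    relevances: (n_nodes, dim) per-dimension relevance weights in [0, 1];
                higher relevance narrows the receptive field along that dimension
    """
    # Relevance-weighted squared distance from x to each prototype
    d2 = ((x - centers) ** 2 * relevances).sum(axis=1)
    # Normalize by the total relevance so nodes with narrow fields stay comparable
    return np.exp(-d2 / (relevances.sum(axis=1) + eps))

def winner(x, centers, relevances):
    """Index of the most activated node, ignoring class labels."""
    return int(np.argmax(activation(x, centers, relevances)))

centers = np.array([[0.0, 0.0], [1.0, 1.0]])
relevances = np.ones_like(centers)
print(winner(np.array([0.9, 1.1]), centers, relevances))  # 1: nearest prototype wins
```

With uniform relevances this reduces to nearest-prototype matching; lowering the relevance of a noisy dimension shrinks its contribution to the distance, which is the subspace-clustering mechanism inherited from LARFDSSOM.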

The convergence phase is similar to the organization phase, except that no new nodes are inserted. After the convergence phase finishes, the map can cluster and classify input patterns. Depending on the amount and distribution of labeled input patterns presented to the network during training, after the convergence phase the map may have: 1) all nodes labeled; 2) some nodes labeled; or 3) no nodes labeled. Each of these situations is handled differently by SS-SOM; a full description of each is given in [15].

IV Batch SS-SOM

Fig. 3: The basic operation performed by Batch SS-SOM when a mini-batch is given, and its resulting cases.

To extend the range of applications of SS-SOM, Batch SS-SOM is introduced. The implementation uses the PyTorch framework to take advantage of GPUs, to allow mini-batch training, and thus to integrate more easily with other Deep Learning approaches that commonly use the same framework and structure. Moreover, three important modifications to the baseline model are proposed in order to improve its performance under this new set of conditions.

First, when a mini-batch is given to the model, it is separated into two different mini-batches: 1) the unsupervised mini-batch; and 2) the supervised mini-batch, as shown in the first two columns of Fig. 3. For the unsupervised case, the key modification is to compute, for each winner node, an average vector of all the unlabeled samples for which it was the most activated during the competition. After that, the process continues straightforwardly with the unsupervised procedure, using each average vector and its representative winner node.
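The per-winner averaging step can be sketched as follows. This is a minimal NumPy version (the paper's implementation runs on the GPU with PyTorch); the winner indices are assumed to have been computed beforehand:

```python
import numpy as np

def average_per_winner(batch, winners):
    """Group unlabeled samples by their winner node and average each group.

    batch:   (n_samples, dim) unlabeled mini-batch
    winners: (n_samples,) index of the winner node of each sample
    Returns a dict mapping winner index -> mean vector of its samples, so each
    node receives a single averaged update instead of one update per sample.
    """
    averages = {}
    for node in np.unique(winners):
        averages[int(node)] = batch[winners == node].mean(axis=0)
    return averages

batch = np.array([[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]])
winners = np.array([0, 0, 1])
print(average_per_winner(batch, winners))  # {0: array([1., 1.]), 1: array([4., 4.])}
```

Averaging before updating is what makes the mini-batch variant deviate slightly from the per-sample baseline, which is consistent with the small degradations reported later on some UCI datasets.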

On the other hand, the supervised scenario results in three distinct situations, as illustrated in the last column of Fig. 3, that must be handled differently after finding the winner node for each sample contained in the supervised mini-batch (as in SS-SOM):

  1. Fig. 3-A: A node with an undefined class is the winner for a labeled sample;

  2. Fig. 3-B: A node with a defined class is the winner for one or more samples of the same class;

  3. Fig. 3-C: A node with a defined class is the winner for one or more samples of different classes, including or not its own.

Fig. 4 shows how each of these situations is handled. In the workflow of Fig. 4-A, the actions are to set the node's class to be the same as that of the input pattern and then update its position towards that input.

Second, in Fig. 4-B, it is necessary to compute the average vector of all the samples in this situation that share the related class label, which is unique in this case. Then, the usual supervised update procedure of SS-SOM is called, where the class is the same for both the node and the average sample vector.

Third, the case illustrated in Fig. 4-C is handled as follows: for each different class contained in this subset of samples, the original winner node is duplicated, preserving its centroid vector, distance vector, and relevance vector, but setting the class of the new duplicated node to the currently treated class and its number of victories to zero. After that, for each class l found in the current subset, an average vector is calculated, and the respective duplicated node is updated using both this vector and l, as in Fig. 4-B, in which the original winner node is updated using its corresponding vector and class.

Still, notice that when this situation occurs for an unlabeled winner node j, the first calculated average vector and its related class are used to update j as in Fig. 4-A. Finally, all the operations executed in Batch SS-SOM are performed in parallel on the GPU, which reduces the computational cost and allows the model to be applied to more complex tasks, datasets, and architectures.
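The dispatch over the three supervised cases can be sketched as follows. This is a simplified illustration of the logic only: victory counters, relevance and distance vectors, and the actual update rule are omitted, and the function names are illustrative:

```python
import numpy as np

def handle_supervised(node_classes, winner_idx, samples, labels):
    """Dispatch the three supervised cases for one winner node (sketch).

    node_classes: mutable list with the class of each node (None = undefined)
    winner_idx:   index of the winner node for this group of labeled samples
    samples:      (k, dim) labeled samples that chose this winner
    labels:       (k,) their class labels
    Returns a list of (node_index, class, mean_vector) updates; duplicated
    nodes are represented by new indices appended to node_classes.
    """
    updates = []
    classes = np.unique(labels)
    if node_classes[winner_idx] is None:
        # Case A: undefined node takes the first class seen
        node_classes[winner_idx] = int(classes[0])
    for c in classes:
        mean_vec = samples[labels == c].mean(axis=0)  # per-class average vector
        if node_classes[winner_idx] == c:
            # Case B: winner already has this class -> usual supervised update
            updates.append((winner_idx, int(c), mean_vec))
        else:
            # Case C: conflicting class -> duplicate the winner for that class
            node_classes.append(int(c))
            updates.append((len(node_classes) - 1, int(c), mean_vec))
    return updates
```

Each returned tuple corresponds to one averaged supervised update, so the number of updates per winner is bounded by the number of distinct classes in its share of the mini-batch rather than by the number of samples.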

Fig. 4: How each distinct situation from the Batch SS-SOM operation is handled when a mini-batch is given.

V Experiments

The experiments were divided into two distinct scenarios. The first focuses on comparing Batch SS-SOM with semi-supervised methods widely used in clustering tasks, to show how competitive the model is. The second demonstrates the capability of the proposed model to cluster and deal with features extracted from CNN architectures.

V-A Parameter Sampling

In order to properly adjust the parameters of the model, Latin Hypercube Sampling (LHS) [25] was used. It is a statistical method for generating a random sample of parameter values from a multidimensional distribution. For the first experimental scenario, we gathered 500 different parameter settings, i.e., the range of each parameter was divided into 500 intervals of equal probability to be sampled [25]. For the second scenario, we sampled 10 different parameter sets using LHS. For both, a batch size of 32 was used. Table I provides the parameter ranges for Batch SS-SOM.

Parameter                        min     max
Activation threshold             0.90    0.999
Lowest cluster percentage (lp)   0.001   0.01
Relevance rate                   0.001   0.5
Max competitions
Winner learning rate             0.001   0.2
Wrong winner learning rate
Neighbors learning rate
Relevance smoothness             0.01    0.1
Connection threshold             0       0.5
Number of epochs                 1       100

  • * S is the number of input patterns in the dataset.

TABLE I: Parameter Ranges of Batch SS-SOM
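The LHS sampling of parameter settings described above can be sketched with SciPy's quasi-Monte Carlo module. Only the ranges that survive in Table I are used; the parameter keys are illustrative short names, not the paper's notation:

```python
from scipy.stats import qmc

# Ranges taken from Table I (rows with missing bounds are omitted here)
ranges = {
    "act_threshold": (0.90, 0.999),   # activation threshold
    "lp":            (0.001, 0.01),   # lowest cluster percentage
    "relevance":     (0.001, 0.5),    # relevance rate
    "win_lr":        (0.001, 0.2),    # winner learning rate
    "smoothness":    (0.01, 0.1),     # relevance smoothness
    "conn":          (0.0, 0.5),      # connection threshold
    "epochs":        (1, 100),        # number of epochs
}
names = list(ranges)
lows = [ranges[k][0] for k in names]
highs = [ranges[k][1] for k in names]

sampler = qmc.LatinHypercube(d=len(names), seed=42)
unit = sampler.random(n=500)            # 500 points in the unit hypercube
params = qmc.scale(unit, lows, highs)   # stretched to the Table I ranges

settings = [dict(zip(names, row)) for row in params]
print(len(settings))  # 500 sampled parameter configurations
```

Each of the 500 rows falls in a distinct equal-probability stratum per dimension, which is exactly the stratification property that motivates LHS over plain uniform sampling.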

V-B Datasets

Before detailing the two sets of experiments, it is important to specify the datasets used.

V-B1 UCI Datasets

We selected datasets from the UCI machine learning repository [26] that were previously used in similar works, in order to compare our approach under the same experimental setup. They are Breast, Diabetes, Glass, Liver, Shape, and Vowel.

V-B2 MNIST

MNIST is a widely used image benchmark dataset of handwritten digits; it has 60,000 examples in the training set and 10,000 examples in the test set. Each sample fits into a 28x28 grayscale bounding box [27]. Fig. 5 shows some of its samples.

Fig. 5: MNIST samples.

V-B3 Fashion-MNIST

Fashion-MNIST is a fashion product dataset of Zalando's article images [28]; it shares the same image size and train/test split structure as MNIST. Fig. 6 shows Fashion-MNIST samples.

Fig. 6: Fashion-MNIST samples.

V-B4 SVHN

The SVHN dataset is a real-world house numbers dataset obtained from Google Street View images. SVHN is much harder than MNIST because its images suffer from low contrast and lack of normalization, and the digits are sometimes overlapped by others or contain noisy features. It consists of 73,257 digits for training, 26,032 digits for testing, and 531,131 additional samples as extra training data [29]. Fig. 7 shows SVHN samples.

Fig. 7: SVHN samples.

V-C Batch SS-SOM on UCI Datasets

Table II shows the best Clustering Error (CE) values of the models over 500 runs on each dataset; the higher the value, the better the model. In addition to the SOM-based models, DOC [30] and PROCLUS [31] were used for comparison. Both are commonly used benchmarks for this type of dataset.

Batch SS-SOM performed well in clustering tasks over a variety of UCI datasets. On the Breast dataset, it achieved the same value as the other clustering methods. On Diabetes and Vowel, it was statistically equal to LARFDSSOM [8] / SS-SOM [15] and ALT-SSSOM [17], respectively, because it behaves similarly to them in the unlabeled scenario. On the Glass, Liver, and Shape datasets, the batch size had a slightly negative influence on the outcome, showing a small degradation in performance, which is an effect of the mean-vector update rule. However, Batch SS-SOM accelerates the training process by using mini-batches and can also be employed as the last layer of deep learning models to perform categorization tasks. In particular, it is important to point out that SS-SOM works exactly like LARFDSSOM when no labels are available.

CE                           Breast  Diabetes  Glass  Liver  Shape  Vowel
DOC [30]                     0.763   0.654     0.439  0.580  0.419  0.142
PROCLUS [31]                 0.702   0.647     0.528  0.565  0.706  0.253
LARFDSSOM [8] / SS-SOM [15]  0.763   0.727     0.575  0.580  0.719  0.317
ALT-SSSOM [17]               0.763   0.697     0.575  0.603  0.738  0.319
Batch SS-SOM                 0.763   0.723     0.537  0.580  0.693  0.301
TABLE II: CE results for real-world datasets. Best results for each dataset are shown in bold.

V-D Batch SS-SOM using Features Extracted from Custom CNN Models

Since Batch SS-SOM showed good results in comparison with its competitors in the first scenario, we assessed its performance on a more challenging task with high-dimensional data, such as images, using high-level features. To do so, we developed the following strategy. First, we trained a CNN model from scratch and then extracted the features immediately before the classifier layer, using them as input to Batch SS-SOM. Second, we defined several supervision rates, i.e., percentages of available labels. It is worth mentioning that the sampling was not class-balanced. This experiment also indicates the effect of the number of labeled samples on the results for MNIST, Fashion-MNIST, and SVHN. For this scenario, we started from MNIST and then expanded to the other datasets to guide a case study about the behavior of the model. The main idea is not to surpass any other model, but to understand its behavior when applied to more complex data structures or representations.
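The supervision-rate masking described above can be sketched as follows. This is a minimal version under the assumption (stated in the text) that labels are hidden uniformly at random, without class balancing; the sentinel value -1 for "unlabeled" is an illustrative convention:

```python
import numpy as np

def apply_supervision_rate(labels, rate, seed=None):
    """Keep only a `rate` fraction of labels; mask the rest with -1.

    Sampling is uniform over all samples, so classes are NOT balanced,
    matching the unbalanced sampling used in the experiments.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    n = len(labels)
    hidden = rng.choice(n, size=int(round(n * (1 - rate))), replace=False)
    labels[hidden] = -1  # -1 marks an unlabeled sample
    return labels

y = np.arange(100) % 10                         # 100 samples, 10 classes
y_semi = apply_supervision_rate(y, rate=0.05, seed=0)
print((y_semi >= 0).sum())                      # 5 labels remain visible
```

The masked array can then be split into the supervised mini-batch (entries >= 0) and the unsupervised mini-batch (entries == -1) at training time.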

Different CNN architectures were evaluated in order to achieve better results for each dataset. For MNIST, Fig. 8(a) describes the CNN layer block (convolution, batch normalization, ReLU, and max pooling); Fig. 8(b) illustrates the full architecture: layer1 (16 filters, kernel size 5x5, stride 1x1, padding 2x2), layer2 (32 filters, kernel size 5x5, stride 1x1, padding 2x2), fully-connected 1 (FC1 with 32 neurons), and fully-connected 2 (FC2 with 10 neurons). In Fig. 8(c), we describe the Batch SS-SOM training pipeline, in which we remove FC2 and extract features to feed Batch SS-SOM with 32 input dimensions.
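The MNIST pipeline just described can be sketched in PyTorch, the framework used by the implementation. This is a minimal sketch: the training loop and the coupling to Batch SS-SOM are omitted, and the module names are illustrative:

```python
import torch
import torch.nn as nn

def block(in_ch, out_ch):
    # Custom layer block from Fig. 8(a): convolution, batch norm, ReLU, max pooling
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=5, stride=1, padding=2),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )

class MnistCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = block(1, 16)              # 16 filters, per Fig. 8(b)
        self.layer2 = block(16, 32)             # 32 filters
        self.fc1 = nn.Linear(32 * 7 * 7, 32)    # 28x28 -> 7x7 after two poolings
        self.fc2 = nn.Linear(32, 10)            # classifier head, dropped later

    def features(self, x):
        # Pipeline of Fig. 8(c): stop at FC1 and hand on the 32-d features
        x = self.layer2(self.layer1(x))
        return self.fc1(x.flatten(1))

    def forward(self, x):
        return self.fc2(self.features(x))

model = MnistCNN().eval()
with torch.no_grad():
    feats = model.features(torch.randn(4, 1, 28, 28))
print(feats.shape)  # torch.Size([4, 32]) -> fed to Batch SS-SOM
```

After supervised training of the full network, only `features` is used at extraction time, which mirrors removing FC2 and feeding the FC1 output to Batch SS-SOM.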

For Fashion-MNIST, Fig. 9(a) describes the CNN layer block (convolution, ReLU, and max pooling). In Fig. 9(b), the full architecture is given: layer1 (64 filters, kernel size 5x5, stride 1x1), layer2 (32 filters, kernel size 5x5, stride 1x1), dropout (0.5), fully-connected 1 (FC1 with 128 neurons), fully-connected 2 (FC2 with 32 neurons), and fully-connected 3 (FC3 with 10 neurons). Fig. 9(c) shows the Batch SS-SOM training pipeline, where the FC3 layer is removed to extract 32-dimensional features for Batch SS-SOM.

(a) Custom layer block for MNIST.
(b) MNIST CNN model: two custom layer blocks (Fig. 8(a)) followed by two fully-connected (dense) layers.
(c) Batch SS-SOM training pipeline: the previous FC2 is removed, the features are extracted from FC1 and then fed to Batch SS-SOM.
Fig. 8: MNIST Training Pipeline.

Lastly, for SVHN, Fig. 10(a) draws the full architecture: Conv2d (20 filters, kernel size 5x5, stride 1x1), MaxPool2d, Conv2d (16 filters, kernel size 5x5, stride 1x1), fully-connected 1 (FC1 with 400 neurons), fully-connected 2 (FC2 with 120 neurons), and fully-connected 3 (FC3 with 84 neurons). Fig. 10(b) outlines the Batch SS-SOM training pipeline, where FC3 is removed and the features are extracted from FC2 and then sent to Batch SS-SOM with 84 input dimensions.

Table III shows the best results over 10 runs on each dataset. As expected, Batch SS-SOM shows increasing gains as the number of labeled samples grows, especially at low percentages. At a certain point, around 5% of labeled data, the performance stabilizes. This behavior is observed across all the datasets, showing that the proposed method is a good approach to the problem at hand. Notice that transfer learning is a difficult task and a challenge for a great variety of methods. The performance obtained by Batch SS-SOM suggests a promising path for the use and application of SOM-based methods.

(a) Custom layer block for Fashion-MNIST.
(b) Fashion-MNIST CNN model: two custom layer blocks (Fig. 9(a)) and a dropout layer followed by three fully-connected (dense) layers.
(c) Batch SS-SOM training pipeline: the previous FC3 is removed, the features are extracted from FC2 and then fed to Batch SS-SOM.
Fig. 9: Fashion-MNIST Training Pipeline.
(a) SVHN CNN model: one 2D convolutional layer followed by 2D max pooling, another 2D convolutional layer, and three fully-connected (dense) layers.
(b) Batch SS-SOM training pipeline: the previous FC3 is removed, the features are extracted from FC2 and then fed to Batch SS-SOM.
Fig. 10: SVHN Training Pipeline.
Labels  MNIST   Fashion-MNIST  SVHN
1%      0.788   0.560          0.624
5%      0.9643  0.716          0.797
10%     0.974   0.713          0.798
25%     0.9793  0.777          0.834
50%     0.983   0.792          0.847
75%     0.9839  0.810          0.840
All     0.9836  0.826          0.846
TABLE III: Accuracy results obtained with Batch SS-SOM on each dataset according to the percentage of labeled data.

VI Conclusion and Future Work

This paper presented Batch SS-SOM, an approach that can be applied to both classification and clustering tasks. The proposed model showed good performance in comparison with other traditional models and also demonstrated its capabilities when dealing directly with more complex datasets and their representations.

Although the proposed approach is not far superior to other models, it traces a promising path to follow. It can be considered a first step towards SOM-based models that work effectively in non-traditional scenarios.

Our main contributions include modifications to the behavior of the previous model to allow it to deal with more complex data structures while still performing well on the traditional tasks for which it was initially intended. For future work, we leave more detailed studies on transfer learning, optimizations of the Batch SS-SOM model, and a better way to estimate the unsupervised error when prior label information is not given.


The authors would like to thank the Brazilian National Council for Technological and Scientific Development (CNPq) and Coordination for the Improvement of Higher Education Personnel (CAPES) for supporting this research study. Moreover, the authors also gratefully acknowledge the support of NVIDIA Corporation with the GPU Grant of a Titan V.


  • [1] I. Jindal, M. Nokleby, and X. Chen, “Learning deep networks from noisy labels with dropout regularization,” in 16th International Conference on Data Mining (ICDM).   IEEE, 2016, pp. 967–972.
  • [2] “Competition name: Imagenet object localization challenge.” [Online]. Available:
  • [3] Q. Zhang, L. T. Yang, Z. Chen, and P. Li, “A survey on deep learning for big data,” Information Fusion, vol. 42, pp. 146–157, 2018.
  • [4] J. Erman, A. Mahanti, M. Arlitt, I. Cohen, and C. Williamson, “Offline/realtime traffic classification using semi-supervised learning,” Performance Evaluation, vol. 64, no. 9-12, pp. 1194–1213, 2007.
  • [5] B. Longstaff, S. Reddy, and D. Estrin, “Improving activity classification for health applications on mobile devices using active and semi-supervised learning,” in Pervasive Computing Technologies for Healthcare (PervasiveHealth), 2010 4th International Conference on.   IEEE, 2010, pp. 1–7.
  • [6] M.-F. Balcan, A. Blum, P. P. Choi, J. Lafferty, B. Pantano, M. R. Rwebangira, and X. Zhu, “Person identification in webcam images: An application of semi-supervised learning,” in ICML 2005 Workshop on Learning with Partially Classified Training Data, vol. 2, 2005, p. 6.
  • [7] B. Hammer and T. Villmann, “Generalized relevance learning vector quantization,” Neural Networks, vol. 15, no. 8, pp. 1059–1068, 2002.
  • [8] H. F. Bassani and A. F. Araujo, “Dimension selective self-organizing maps with time-varying structure for subspace and projected clustering,” IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 3, pp. 458–471, 2015.
  • [9] A. Oliver, A. Odena, C. Raffel, E. D. Cubuk, and I. J. Goodfellow, “Realistic evaluation of deep semi-supervised learning algorithms,” arXiv preprint arXiv:1804.09170, 2018.
  • [10] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in Conference on Computer Vision and Pattern Recognition (CVPR).   IEEE, 2009, pp. 248–255.
  • [11] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, p. 436, 2015.
  • [12] G. C. Nutakki, B. Abdollahi, W. Sun, and O. Nasraoui, “An introduction to deep clustering,” in Clustering Methods for Big Data Analytics.   Springer, 2019, pp. 73–89.
  • [13] F. Schwenker and E. Trentin, “Pattern classification and clustering: A review of partially supervised learning approaches,” Pattern Recognition Letters, vol. 37, pp. 4–14, 2014.
  • [14] X. Zhu, “Semi-supervised learning literature survey,” Computer Science, University of Wisconsin-Madison, vol. 2, no. 3, p. 4, 2006.
  • [15] P. H. M. Braga and H. F. Bassani, “A semi-supervised self-organizing map for clustering and classification,” in 2018 International Joint Conference on Neural Networks (IJCNN).   IEEE, 2018, pp. 1–8.
  • [16] H. Dozono, G. Niina, and S. Araki, “Convolutional self organizing map,” in International Conference on Computational Science and Computational Intelligence (CSCI).   IEEE, 2016, pp. 767–771.
  • [17] P. H. M. Braga and H. F. Bassani, “A semi-supervised self-organizing map with adaptive local thresholds,” in 2019 International Joint Conference on Neural Networks (IJCNN).   IEEE, 2019, pp. 1–8.
  • [18] D. P. Kingma, S. Mohamed, D. J. Rezende, and M. Welling, “Semi-supervised learning with deep generative models,” in Advances in neural information processing systems, 2014, pp. 3581–3589.
  • [19] L. Chen, S. Yu, and M. Yang, “Semi-supervised convolutional neural networks with label propagation for image classification,” in 2018 24th International Conference on Pattern Recognition (ICPR).   IEEE, 2018, pp. 1319–1324.
  • [20] X. Zhu, Z. Ghahramani, and J. D. Lafferty, “Semi-supervised learning using gaussian fields and harmonic functions,” in Proceedings of the 20th International Conference on Machine learning (ICML), 2003, pp. 912–919.
  • [21] H. R. Medeiros, F. D. de Oliveira, H. F. Bassani, and A. F. Araujo, “Dynamic topology and relevance learning som-based algorithm for image clustering tasks,” Computer Vision and Image Understanding, vol. 179, pp. 19–30, 2019.
  • [22] N. Liu, J. Wang, and Y. Gong, “Deep self-organizing map for visual classification,” in International Joint Conference on Neural Networks (IJCNN).   IEEE, 2015, pp. 1–6.
  • [23] A. F. Araujo and R. L. Rego, “Self-organizing maps with a time-varying structure,” ACM Computing Surveys, vol. 46, no. 1, p. 7, 2013.
  • [24] T. Kohonen, “Learning vector quantization,” in Self-Organizing Maps.   Springer, 1995, pp. 175–189.
  • [25] J. C. Helton, F. Davis, and J. D. Johnson, “A comparison of uncertainty and sensitivity analysis results obtained with random and latin hypercube sampling,” Reliability Engineering & System Safety, vol. 89, no. 3, pp. 305–330, 2005.
  • [26] A. Asuncion and D. Newman, “Uci machine learning repository,” 2007.
  • [27] Y. LeCun, “The mnist database of handwritten digits,” http://yann. lecun. com/exdb/mnist/, 1998.
  • [28] H. Xiao, K. Rasul, and R. Vollgraf, “Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms,” arXiv preprint arXiv:1708.07747, 2017.
  • [29] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng, “Reading digits in natural images with unsupervised feature learning,” in NIPS workshop on deep learning and unsupervised feature learning, vol. 2011, no. 2, 2011, p. 5.
  • [30] C. M. Procopiuc, M. Jones, P. K. Agarwal, and T. Murali, “A monte carlo algorithm for fast projective clustering,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, 2002, pp. 418–427.
  • [31] C. C. Aggarwal, J. L. Wolf, P. S. Yu, C. Procopiuc, and J. S. Park, “Fast algorithms for projected clustering,” ACM SIGMoD Record, vol. 28, no. 2, pp. 61–72, 1999.