A Deterministic Self-Organizing Map Approach and its Application on Satellite Data based Cloud Type Classification

08/24/2018 ∙ by Wenbin Zhang, et al. ∙ NASA ∙ University of Maryland, Baltimore County

A self-organizing map (SOM) is a type of competitive artificial neural network, which projects the high-dimensional input space of the training samples into a low-dimensional space with the topology relations preserved. This makes SOMs useful for organizing and visualizing complex data sets, and they have been used pervasively across numerous disciplines in a range of applications. Notwithstanding these wide applications, the self-organizing map suffers from its inherent randomness, which produces dissimilar SOM patterns even when trained on identical training samples with the same parameters every time, and thus causes usability concerns for domain practitioners and precludes more potential users from exploring SOM-based applications in a broader spectrum. Motivated by this practical concern, we propose a deterministic approach as a supplement to the standard self-organizing map. In accordance with the theoretical design, the experimental results with satellite cloud data demonstrate the effective and efficient organization as well as the simplification capabilities of the proposed approach.


I Introduction

The self-organizing map provides an automatic data analysis technique that produces a low-dimensional representation, called a map, of the high-dimensional input space without any external supervision [23]. Unlike most biologically inspired neural network models, the SOM performs competitive learning rather than error-correction learning, with units competing for the current object, and it preserves topology by using a neighborhood function to adjust the weights of the winning unit's neighbors concurrently. Various versions of SOM models have been investigated over the years, with practical applications across numerous disciplines, ranging from meteorology and oceanography to financial analysis, bioinformatics, and image retrieval [9, 13]. In the pioneering work on SOM-based meteorology applications, Malmgren et al. showed the potential of such a neural network system for identifying climate zones by organizing climate data, namely seasonal averages of precipitation and temperature over the course of 30 years [16]. In [5], the SOM was successfully applied to detect the dipole sea surface temperature anomaly pattern in the Indian Ocean. Oja et al. used the SOM to study the mutual relationships of HERVs and their similarities to other associated DNA elements [18]. A tree-structured SOM was designed to reduce the time complexity of search for effective image retrieval, with the user's relevance feedback incorporated interactively [10]. Through applications such as these, SOM-based approaches have gained popularity as a powerful data analysis technique and have achieved varying degrees of success [26].

One of the shortcomings of these methods, however, is the inherent randomness of the SOM and the indeterminism arising therefrom [13], which results in dissimilar SOM patterns when the SOM is trained on the same training set with identical parameters and leaves it a black box, especially to users without a related background. As a result, it dampens potential users' interest in pursuing further SOM applications. Efforts have been made to offer suggestions and guidelines on how to handle the randomness of the SOM, but it is desirable that the inherent randomness be eliminated or at least minimized [14]. With this in mind, this paper proposes a deterministic variant of the self-organizing map that eliminates the randomness of the standard approach. The maximum iterations parameter that was once necessary also becomes self-tuned as a byproduct of the randomness elimination process. These random eliminators are designed for the self-organizing map and are illustrated with a satellite cloud classification application, but they generalize to applications employing other learning algorithms.

In summary, the contributions of this paper are:

  • A deterministic self-organizing map is proposed to eliminate the randomness of the standard self-organizing map. This deterministic network is invariant when the same training set is used.

  • The proposed random eliminators are generalizable and applicable to other learning algorithms. The maximum iterations parameter, which previously required tuning, becomes self-tuned during the randomness elimination process.

  • The application of the proposed network to real-world satellite observations of clouds walks through the use of our method in practice, addressing the practical concerns of geoscientists and demonstrating the effectiveness and efficiency of our method.

In the following sections, related studies are first reviewed in Section II. We propose our method in Section III. Section IV presents the application; the performance is then analyzed and the reasons for the results discussed in Section V. Finally, Section VI concludes the paper.

II Related Work

II-A Deterministic Clustering

Clustering algorithms partition objects into groups based on their similarity. Many clustering algorithms face the indeterminism issue. For instance, the standard K-means algorithm [15, 28], one of the most widely used clustering algorithms, randomly chooses its initial centroids. This makes the algorithm very sensitive to its initial seed clusters, and running K-means with the same parameter configuration often produces very different results. There have been several studies on how to achieve deterministic clustering [25, 4]. [4] studies how to reconcile clustering results from different runs of the same algorithm and derive a consensus among them. [25] proposes a principal component analysis (PCA) based divisive hierarchical approach for deterministic K-means initialization. This is also the mainstream approach to achieving deterministic clustering with the SOM [1]. The high computational cost of determining the data-dependent PCA transform, however, hinders its application in high-dimensional situations such as the classification of satellite cloud regimes.

II-B Satellite Data Based Cloud Type Classification

The study of clouds, including their frequency of occurrence, location, and characteristics, plays a key role in the understanding of climate change. Thick clouds in the lower atmosphere primarily reflect the incoming solar radiation and consequently cool the surface of the Earth. On the other hand, thin clouds in the upper atmosphere readily transmit the incoming solar radiation while also trapping some of the outgoing infrared radiation emitted by the Earth's surface and radiating it back downward, consequently warming the atmosphere and the surface of the Earth.

There have been many studies on cloud type classification. [11, 12] used the maximum likelihood (ML) classification method to classify cloud types. In recent years, K-means has been widely used as one of the main approaches for cloud type clustering, while others have begun to employ the SOM for cloud studies. Our previous work [20, 19, 6] used the K-means approach to identify cloud regimes. In pioneering work, McDonald et al. [17] studied how to use the SOM to identify cloud regimes and reported more objective organization compared to K-means. Because neither approach is deterministic, usability challenges remain. This work is motivated by this critical practical concern. Collaborating with geoscientists, we aim to identify and interpret cloud regimes deterministically.

III The Deterministic Approach

III-A Standard Self-Organizing Map

An SOM [8] is made up of a set of nodes, each holding a representative feature vector called the prototype. The standard SOM starts by randomly initializing the prototype feature vector of each node in the map. From there, a sample vector is randomly selected and fed to the network, and its Euclidean distances to all prototype vectors are computed in order to find the neuron that most closely matches the current sample vector. The node whose prototype vector best represents that sample becomes the winning unit and is called the Best Match Unit (BMU). Next, the neurons belonging to the neighborhood set of the BMU are also activated, and the prototype vectors of all activated neurons are adjusted towards the input vector at the same time. The magnitude of this adjustment decreases with time and with distance from the BMU, in an attempt to preserve the topology relationships that exist within the input data. This whole process iterates until a predefined stopping condition is met. The common theme through the following sections is eliminating the randomness of the standard self-organizing map for stability and efficiency, thus simplifying the use of the SOM for automated cloud regime identification. A summary of the notation used in this paper is given in Table I.

Notation | Description
$W_{ij}(t)$ | The prototype vector of node $(i, j)$ in the map at time $t$
$V$ | The current sample's feature vector
BMU | The node that best matches the current sample's feature vector
$\alpha_0$ | Initial learning rate
$\alpha(t)$ | Learning rate at time $t$
$\sigma_0$ | Initial neighborhood radius
$\sigma(t)$ | Neighborhood radius at time $t$
TABLE I: Notation used for method description.
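To make the matching step concrete, below is a minimal Python sketch of the BMU search just described, assuming the prototypes are held in an (H, W, D) NumPy array; the array layout and names are our own conventions, not the paper's.

import numpy as np

# Best Match Unit (BMU) search: the node whose prototype vector has the
# smallest Euclidean distance to the current sample wins.
def find_bmu(prototypes, sample):
    dists = np.linalg.norm(prototypes - sample, axis=2)     # one distance per node
    return np.unravel_index(np.argmin(dists), dists.shape)  # (row, col) of the BMU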

III-B Update Procedure

The implementation of the self-organizing map algorithm demands the instantiation of its update procedure, and there are two key components involved: the neighborhood radius and the learning rate. The randomness rooted in the update procedure arises from the tunable parameters of distinct update functions, as different parameters result in dissimilar SOM patterns. It is desirable that this tuning process be eliminated or at least minimized [14]. Our approach therefore employs update functions in which the maximum number of iterations is the only tunable parameter, and this parameter is self-tuned by the devised staggered sample selection method discussed in Section III-E.

The first component, the neighborhood radius, comes in a variety of flavors [26]. Any node that is inside the neighborhood radius of a node being updated also gets updated to a degree. Our approach uses a circular neighborhood whose initial radius equals half the size of the smallest dimension of the SOM. Formally put:

$\sigma_0 = \min(\text{width}, \text{height}) / 2$   (1)

The radius of the SOM needs to decay with time so that the map stabilizes into its final organization. Here is the decay function used:

$\sigma(t) = \sigma_0 \, e^{-t / \lambda}$   (2)

where $\lambda = T_{\max} / \ln \sigma_0$ is the time constant and $e$ is the logarithmic base.

This decay function exponentially decays the original radius to 1 when $t$ reaches its maximum value, the maximum number of iterations specified in Section III-C. If the SOM continued to be trained at a radius of 1, the nodes immediately to the top, right, bottom, and left of the BMU would still be affected, but this is not the case because the equation only reaches 1 when the maximum iteration is reached, meaning training is finished. This neighborhood function proves to work well, as the amount by which one node changes those around it decreases with time, as it should [3].

The learning rate is the amount of impact a sample that best matches a node should have on that node and its neighbors. Like the neighborhood radius, the learning rate decays with time. Here is its function:

$\alpha(t) = \alpha_0 \, e^{-t / T_{\max}}$   (3)

where $T_{\max}$ is the maximum number of iterations, serving as the time constant, and $e$ is the logarithmic base.

This is the same exponential decay function as for the neighborhood radius, except that the time constant is now the given maximum number of iterations. The experiments conducted in this work show that speed and accuracy both increase as the initial learning rate decreases, up to a point; an initial rate of 0.1 works well. This is because a high rate causes more oscillation in the prototype vectors of the SOM nodes, as two or more samples may jockey for the main position in the same BMU. A high learning rate also causes a node to swing more toward the most recently matching sample than it should, meaning it "forgets" the impact of the other samples that matched to it before too quickly.

There is another factor, the influence, which adjusts the learning rate of a neighborhood node $(i, j)$ of a BMU $(k, l)$ by taking into account the actual distance of the neighboring node from the BMU. Here is the mathematical definition of influence used in our method; note that it also decays with time:

$\Theta_{(i,j)(k,l)}(t) = e^{-d_{(i,j)(k,l)}^{2} / (2\sigma(t)^{2})}$   (4)

where $\Theta_{(i,j)(k,l)}(t)$ is the influence on node $(i, j)$ by $(k, l)$ at time $t$, $d_{(i,j)(k,l)}$ is the distance from node $(i, j)$ to $(k, l)$, and $e$ is the logarithmic base.

As is evident, the influence decreases exponentially the further the neighboring node is from the BMU, via the ratio between the squared distance between the two nodes and twice the squared current neighborhood radius.

All of the pieces are combined into a single vector update equation, formally put:

$W_{ij}(t+1) = W_{ij}(t) + \Theta_{(i,j)(k,l)}(t) \, \alpha(t) \, (V - W_{ij}(t))$   (5)
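The following Python sketch puts Eqs. (1)-(5) together into one training step; the (H, W, D) array layout and the function signature are our own conventions, and the small guard on the time constant is an implementation detail, not part of the paper.

import numpy as np

def som_step(W, v, t, T_max, alpha0=0.1):
    # One update of the (H, W, D) prototype array W toward sample v.
    H, Wd, _ = W.shape
    sigma0 = min(H, Wd) / 2.0                        # Eq. (1)
    lam = T_max / max(np.log(sigma0), 1e-9)          # time constant of Eq. (2)
    sigma = sigma0 * np.exp(-t / lam)                # Eq. (2): decays to 1 at T_max
    alpha = alpha0 * np.exp(-t / T_max)              # Eq. (3)

    # Best Match Unit: the node closest to the sample in Euclidean distance.
    bi, bj = np.unravel_index(np.argmin(np.linalg.norm(W - v, axis=2)), (H, Wd))

    # Squared grid distance of every node to the BMU.
    ii, jj = np.indices((H, Wd))
    dist2 = (ii - bi) ** 2 + (jj - bj) ** 2

    # Gaussian influence of Eq. (4), applied only inside the current radius.
    mask = dist2 <= sigma ** 2
    theta = np.exp(-dist2 / (2.0 * sigma ** 2))
    W[mask] += (alpha * theta[mask])[:, None] * (v - W[mask])   # Eq. (5)
    return W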

III-C Stopping Condition

Considering that the speed at which the SOM can be trained hinges on the stopping condition, we consider two types of convergence mechanism in our approach. The first type simply uses a user-specified maximum iterations parameter as the upper bound on the training iterations, where an iteration is one presentation of the whole training set to the SOM. Training runs for the given number of iterations and then stops. The drawback of this approach is that it does not take the map activity into account, so training may proceed through many iterations while the SOM is actually producing no improvement [26]. If the system had some measurement of improvement after each iteration, it could stop earlier once it sees that no improvement can be made.

The second type of stopping condition, namely No Moves, defines "no improvement" in the SOM's status as no training sample changing its best match unit during a complete iteration over the training set [26]. Using this condition, the training process stops as soon as no improvement is seen. While this stopping condition could be used alone in some SOMs, our approach uses it in conjunction with the maximum iterations condition described above, because, as described in Section III-B on the proposed method's update procedure, the learning rate and neighborhood radius update equations require the maximum number of iterations to be known. In any event, using this condition alone could also result in infinite training if the update factors are not set up to decay correctly. For this reason, the maximum iterations condition can be thought of as a safety net hidden in the background that will rarely be needed but is nice to have just in case. It has to be said that while this stopping condition, as a proxy for the true goal, might not be fully optimal, just like other heuristic strategies, a local optimum returned by simple greedy search may be better than the global optimum [2]. The study of the SOM's behavior in this work has also shown that the point where the samples stop changing their BMUs is usually the point where the map configuration has peaked. On the occasions beyond "usually", there is a chance that it has not, and the user thus runs the risk of missing out on a slightly better map organization; it is really a matter of whether the user wants to wait through the maximum number of iterations for a little improvement, if any. There is another small caveat that emerged from observing the SOM's activity in this study: when this stopping condition is used, training usually halts before the specified maximum iterations is reached, and although the samples are distributed correctly, the actual prototype vector of a node may not reflect its samples as well as it should. Recall that an SOM works on the "best match principle", so although the prototype vector may not represent the samples in its node well, it can still be the best match for those samples compared with the rest of the nodes in the map. One way to cope with this is to turn the initial learning rate up a little. Although this flies in the face of what was found to work best in the general case in the learning rate discussion above, it tends to produce a slightly better map configuration in cases where the map converges too fast under this condition. The reason is that if the map converges too fast, the number of iterations is small; given that the training samples are shown to the SOM only a few times, they need to impact their best match unit's prototype vector quickly so that any sample shown to the SOM will find the prototype vector containing the other samples it usually should.
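As a sketch of how the two conditions combine, the loop below runs the No Moves test after every full pass while keeping the maximum-iterations bound as the safety net; run_pass and assign_bmus stand in for the procedures of Sections III-B and III-E and are assumptions of this sketch.

def train(W, samples, T_max, run_pass, assign_bmus):
    prev = None
    for t in range(T_max):               # safety-net upper bound
        run_pass(W, samples, t)          # one complete pass over the training set
        cur = assign_bmus(W, samples)    # list of BMU coordinates, one per sample
        if cur == prev:                  # "No Moves": no sample changed its BMU
            return W, t + 1              # stop early
        prev = cur
    return W, T_max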

III-D Initialization Method

Initializing a node refers to setting the values in the prototype vector of that node before training begins. Random initialization is the common technique: it simply means running through every value in the nodes' vectors that needs to be initialized and setting it to a random value. This technique is good for producing an even distribution, but because of the randomness a different distribution is produced every time [1]. Even though the cloud data samples will organize nicely once there have been enough iterations, chances are they will end up in different cloud type nodes when training completes. The layout of these nodes may be better or worse than on other runs; due to the randomness, there is no guarantee. To eliminate the initialization randomness of the standard SOM method, we set up the nodes with a smooth transition from the top-left to the bottom-right corner. The gradient initialization computes the initial values of the prototype vector of any node $(i, j)$ in the following way:

$W_{ij}(0) = d_{(0,0)(i,j)} / d_{\max}$   (6)

where $d_{\max}$ is the maximum distance possible on the map, that is, the distance from the top-left node to the bottom-right node.
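A minimal sketch of this initialization in Python, under our reading of Eq. (6) that every component of a node's prototype vector is set to the node's grid distance from the top-left node divided by the map diagonal:

import numpy as np

def gradient_init(height, width, dim):
    ii, jj = np.indices((height, width), dtype=float)
    d = np.hypot(ii, jj)                           # distance to the top-left node
    d_max = np.hypot(height - 1, width - 1)        # top-left to bottom-right corner
    return np.broadcast_to((d / d_max)[:, :, None], (height, width, dim)).copy()

W0 = gradient_init(4, 3, 42)   # e.g., a 4 x 3 map of 42-bin histogram prototypes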

III-E Sample Selection

It is important with most, if not all, neural networks that during training the samples are not fed to the network sequentially in the same order every time. Doing so may bias the network toward either the beginning or the ending input samples, depending on the type of network [27]. Random selection is the commonly used technique: as its name suggests, samples are randomly selected from the training set during the sample selection step of the standard SOM. The consequences of this randomness are that, while a good distribution should be achieved, it may not always be; moreover, it is hard to compare one training run with another, since the nodes' sets of samples, while normally similar, will be shifted around. The organization will thus be randomly produced each time with this method [26]. To produce consistent results, we devise a staggered selection method that tries to maintain the good characteristics of randomness while eliminating the actual randomness. The staggered approach gives every training sample an equal opportunity to start an iteration at some point during training, and it ensures that the samples are not shown to the learning algorithm in the same order during each training iteration. The idea is detailed in Algorithm 1.

Input: Training samples' index list of size n.
Output: Training order list;
           Maximum iterations of SOM.
Initialize: front index f ← 0; back index b ← n − 1; direction dir ← forward.
1  while f ≤ b do
2        if dir = forward then
3              s ← f;
4        else
5              s ← b;
6        end if
7        c ← s;
8        do
9              Train the network on the sample at index c;
10            if dir = forward then
11                  c ← (c + 1) mod n;
12            else
13                  c ← (c − 1) mod n;
14            end if
15      while c ≠ s;
16      if dir = reverse then
17            b ← b − 1;
18      else
19            f ← f + 1;
20      end if
21      dir ← the opposite direction;
22 end while
Algorithm 1 Staggered sample selection algorithm

In Algorithm 1, a front index ($f$) and a back index ($b$) are initialized to point to the first and last elements of the list of training samples, respectively. The current direction ($dir$) is initialized to forward. Two further indexes, the start index ($s$) and the current index ($c$), refer to the initial element and the current element, respectively. The staggered sample selection first (lines 2-6) determines the sample that bootstraps each iteration of the training: if the current direction is forward, the start index is set equal to the front index; otherwise, i.e., when $dir$ is reverse, the start index is set equal to the back index. Lines 8-15 complete one whole training iteration. The network is first trained on the sample the current index points to. Then, if the current direction is forward, the current index is incremented until it returns to the start index; any change in the current index that goes outside the range of the training list wraps around to the other side. The same scheme applies when the current direction is reverse, except that the current index is decremented. Lines 16-20 update the front or back index after each iteration depending on the current direction: if the current direction is forward, the front index is incremented; otherwise the back index is decremented. Finally, the current direction is reversed after each iteration as well; that is, if the current direction is forward it is set to reverse, and vice versa. The algorithm terminates when the front index is greater than the back index.

This staggered method of selection produces results comparable to its random counterpart, and it does so consistently: because nothing is random, every run yields the same result. In addition, by changing the start sample and reversing the order of input after each iteration, this method evens out the influence of any single training sample, because right after a sample is used first it becomes the last, then the second, then the second-to-last, and so on. Another convenience of this method is that the maximum number of iterations is determined by the size of the training list alone, making the maximum iterations parameter that was once necessary obsolete. In summary, this staggered method of selection is fast, consistent, and self-tunes the maximum iterations parameter required by the update equations and the stopping condition.
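A compact Python rendering of Algorithm 1, written as a generator of sample indices; the generator form is our own packaging of the pseudocode above.

def staggered_order(n):
    # Each pass visits all n samples once, starting alternately from the
    # front and the back of the list and reversing direction every pass,
    # so the number of passes equals n.
    front, back, forward = 0, n - 1, True
    while front <= back:
        start = front if forward else back
        step = 1 if forward else -1
        cur = start
        while True:
            yield cur                   # train on the sample at index cur here
            cur = (cur + step) % n      # wrap around the ends of the list
            if cur == start:            # one complete pass finished
                break
        if forward:
            front += 1
        else:
            back -= 1
        forward = not forward           # reverse the direction after every pass

# Example with 4 samples: 4 passes starting at indices 0, 3, 1, 2:
# [0, 1, 2, 3,  3, 2, 1, 0,  1, 2, 3, 0,  2, 1, 0, 3]
order = list(staggered_order(4))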

III-F Towards Self-Tuning

Despite its popularity as a powerful data analysis technique across a variety of communities, the self-organizing map remains a black box, especially to users without a related background, because of its parameter choices. Although efforts have been made to provide guidelines on how to tune the SOM, distinct choices of tunable parameters may result in dissimilar SOM patterns. It is thus desirable that the parameter choices become self-tuned, in order to further streamline the usage of self-organizing map based approaches [14]. In our proposed approach, an SOM can be produced by supplying just two parameters: the desired average number of samples per node (from which the SOM dimension can be derived) and the initial learning rate. The initial neighborhood radius, radius decay function, learning rate function, and maximum iterations parameter are all self-tuned during the randomness elimination process, which minimizes the tuning effort and thus simplifies the use of the SOM for geoscientists as well as other domain practitioners.
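A hypothetical sketch of this two-parameter interface; the near-square factorization of the node count is our assumption for illustration, not a rule given in the paper.

import math

def configure_som(n_samples, samples_per_node, alpha0=0.1):
    n_nodes = max(2, round(n_samples / samples_per_node))
    height = max(2, int(math.sqrt(n_nodes)))       # derive the map dimensions
    width = max(2, math.ceil(n_nodes / height))
    sigma0 = min(height, width) / 2.0              # Eq. (1), self-tuned
    T_max = n_samples                              # fixed by Algorithm 1
    return height, width, sigma0, alpha0, T_max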

IV Application

The study of clouds, including their frequency of occurrence, location, and characteristics, plays a key role in the understanding of climate variability and climate change. Clouds have complex impacts on the Earth's climate, since they interact with both the incoming solar radiation and the outgoing infrared radiation, with the interactions depending strongly on cloud altitude and thickness. In this sense, cloud optical thickness (COT) and cloud top pressure (CTP) are key variables for describing both the solar and infrared radiative effects of clouds. The International Satellite Cloud Climatology Project (ISCCP), a data set of passive cloud retrievals, employed these two variables to build a 2-D joint histogram [24], shown in Figure 1, to distinguish among different cloud types with distinct radiative effects.

Fig. 1: Assignment of traditional cloud types to 2-D joint histogram of COT and CTP (so-called ISCCP-like 2-D histogram) [24].

The 2-D joint histograms of satellite cloud retrievals have proven to be a useful dataset for performing and studying cloud classification. Because both the optical thickness and the top pressure of clouds may vary significantly on the scale of O(100 km), a rather expansive 2-D COT-CTP joint histogram is required to describe the co-variation of COT and CTP. The ISCCP-like MODIS 2-D joint histogram consists of 42 elements (= 6 classes of COT × 7 classes of CTP), with each element representing the occurrence of a specific COT-CTP combination as a cloud fraction (CF) ranging from 0 to 1. A big scientific challenge is to group the satellite images represented by the 2-D joint histograms into different clusters, one example of which is the "Cloud Regime" (CR) [19, 6].
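For illustration, the snippet below builds one such 42-element feature vector from per-pixel retrievals with NumPy; the retrievals are stand-ins and the bin edges are approximations of the 6 COT and 7 CTP class boundaries, not the exact ISCCP values.

import numpy as np

cot_edges = [0, 1.3, 3.6, 9.4, 23, 60, 380]           # 6 COT classes (illustrative)
ctp_edges = [0, 180, 310, 440, 560, 680, 800, 1100]   # 7 CTP classes in hPa (illustrative)

rng = np.random.default_rng(0)
cot = rng.uniform(0, 380, 1000)    # stand-in cloud optical thickness retrievals
ctp = rng.uniform(0, 1100, 1000)   # stand-in cloud-top pressures (hPa)

hist, _, _ = np.histogram2d(cot, ctp, bins=[cot_edges, ctp_edges])
hist /= cot.size            # each bin becomes a cloud fraction in [0, 1]
features = hist.ravel()     # the 42-element (6 x 7) input vector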

The "Cloud Regime" is a concept of dominant mixtures of cloud types represented by the means of similar co-variations in the 2-D joint histogram. Our previous works [20, 19] obtained CRs from K-means clustering analysis [15]. Figure 2 shows one optimal set of K-means centroids using the same data set over the same region (tropics; see Section V-A for details of the data set). It shows that tropical cloud variability can be explained by 5 high cloud regimes, 4 low cloud regimes, and 1 semi-clear regime with quite low CF. The relative frequency of occurrence (RFO) map indicates that the high cloud regimes usually occur over the tropical warm pool area, the intertropical convergence zone (ITCZ), and land areas, while low and thick clouds crowd the eastern side of the oceans (not shown). The semi-clear regime is very common all over the tropics.

Fig. 2: The cloud regime (CR) centroids of daily ISCCP joint histograms. The cloud fraction (CF) of each regime, namely the sum of the 42 bin CF values, is also provided.

V Experiments

This section analyzes how specific details of the proposed deterministic network affect its map configuration and running time when applied to satellite cloud classification.

V-A Dataset

In this experiment, we used the 2-D joint histogram of COT and CTP from the Moderate Resolution Imaging Spectroradiometer (MODIS) instrument aboard the Aqua satellite. The MODIS cloud data set (MYD08 D3 [7, 21]) provides Level-3 cloud products at daily timescales with 1° × 1° horizontal resolution. We used the latest version of the MODIS atmospheric data sets, "Collection 6" [22]. Specifically, we used the Level-3 2-D joint histogram in the tropics (15°S - 15°N) for one year (2005). The input dimension is thus 42 array elements, 360 × 30 spatial elements (grid cells), and 365 days. For the clustering analysis, missing data and completely cloud-free data (all 42 values zero) are excluded, leaving 3,445,612 records in total.
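A sketch of this record filtering, assuming the daily Level-3 histograms have been loaded into a (365, 30, 360, 42) array; the variable names and the NaN convention for missing cells are our own.

import numpy as np

rng = np.random.default_rng(0)
hists = rng.random((365, 30, 360, 42))   # stand-in for the MYD08 D3 histograms
X = hists.reshape(-1, 42)                # one 42-element record per grid cell and day
keep = ~np.isnan(X).any(axis=1) & (X.sum(axis=1) > 0)   # drop missing and
X = X[keep]                                             # completely cloud-free records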

V-B Cloud Classification Result

We first investigate the map configuration of our proposed network with respect to determinism and cloud regime classification. A 4 × 3 SOM was selected as suggested by [17], and the initial learning rate was set to 0.1 for all map experiments. Figure 3 and Figure 4 show the CR joint histograms and the associated relative frequency of occurrence (RFO) map for each node in the SOM from multiple executions of the same standard SOM with the same parameter configuration. They clearly show that the standard SOM produces different CR histograms when trained with the same training set and identical parameters. On the contrary, using the proposed algorithm, the results, shown in Figure 5 and Figure 6, are invariant when the same training set and dimensions are used. This confirms that our proposed random eliminators produce consistent and predictable results that make physical sense.

Compared to the K-means results shown in Figure 2, the SOM results produce reasonable CRs despite the different total number of CRs. For example, the CR histograms of the deterministic SOM result (Figure 5) contain all of the CR histogram characteristics shown in Figure 2. In the case of the standard SOM results (Figure 3), the CR output is slightly unsatisfactory because CR2 and CR3 there share a large degree of similarity in both histogram and RFO map patterns, with correlation coefficients of 0.60 and 0.74, respectively. This result shows that our deterministic SOM algorithm produces quality CRs.

(a) Standard SOM execution result 1.
(b) Standard SOM execution result 2.
(c) Standard SOM execution result 3.
Fig. 3: The SOM cloud type vectors displayed as joint histograms by three standard SOM execution results using the same parameter configuration.
(a) Standard SOM execution result 1.
(b) Standard SOM execution result 2.
(c) Standard SOM execution result 3.
Fig. 4: The relative frequency of occurrence (RFO) corresponding to Figure 3.
Fig. 5: The SOM cloud type vectors displayed as joint histograms using proposed deterministic SOM.
Fig. 6: The relative frequency of occurrence (RFO) corresponding to Figure 5.

V-C Execution Time

The running time of the satellite cloud classification task is mainly dominated by the time for training and classifying. Specific to the SOM, after training is complete, each processed cloud datum is labeled with its associated cloud type; the training speed is therefore responsible for the main difference in execution time. In this experiment, we verify the efficiency of the two proposed random eliminators in making the SOM deterministic: (1) the use of gradient initialization to compute the initial values of the prototype vector of each cloud type node (denoted GI); and (2) staggered sample selection to feed cloud data for training (denoted SSS). The performance difference from adding each eliminator is shown in Table II.

Eliminators | SOM | SOM+GI | SOM+GI+SSS
Time (minutes) | 41 | 36 | 35
TABLE II: Running time comparison of adding each eliminator for satellite cloud classification (GI: gradient initialization, SSS: staggered sample selection, SOM: standard SOM).

Table II shows that our proposed network, besides the desired deterministic property, also produces its results more efficiently as a bonus. Suffice it to say, this is because after gradient initialization a pattern is already present, so the nodes organize around it more quickly than they would when jostling randomly initialized nodes into a pattern while organizing themselves. The further inclusion of staggered sample selection incurs no extra runtime cost, yet it guarantees that the map configuration is not random while maintaining the good characteristics of random selection.

VI Conclusion and Future Work

This work is motivated by the usability concern caused by the inherent randomness of the SOM. To address this practical concern, we propose a deterministic self-organizing map with effective satellite cloud type organization and execution capabilities. The improvements from including the devised random eliminators are that not only do both the speed and the clustering quality of training improve, but the same resultant SOM is produced every time the network is run with identical training samples. These random eliminators are generalizable and can be used as supplements to other iteratively trained neural networks. We successfully applied our deterministic SOM to a real-world scientific application to demonstrate its effectiveness and efficiency. It is anticipated that the proposed deterministic SOM will simplify the usage of self-organizing map based approaches and stimulate more potential users' interest in pursuing further SOM applications. In the future, we plan to extend the proposed network in conjunction with our previous work [29] for streaming scenarios.

VII Acknowledgment

This work is supported by the grant "CyberTraining: DSE: Cross-Training of Researchers in Computing, Applied Mathematics and Atmospheric Sciences using Advanced Cyberinfrastructure Resources" from the National Science Foundation (grant no. OAC-1730250).

References

  • [1] A. A. Akinduko, E. M. Mirkes, and A. N. Gorban. Som: Stochastic initialization versus principal components. Information Sciences, 364:213–221, 2016.
  • [2] P. Domingos. A few useful things to know about machine learning. Communications of the ACM, 55(10):78–87, 2012.
  • [3] A. Flexer. On the use of self-organizing maps for clustering and visualization. Intelligent Data Analysis, 5(5):373–384, 2001.
  • [4] A. Goder and V. Filkov. Consensus clustering algorithms: Comparison and refinement. In Proceedings of the Meeting on Algorithm Engineering & Experiments, pages 109–117. Society for Industrial and Applied Mathematics, 2008.
  • [5] I. Iskandar, T. Tozuka, Y. Masumoto, and T. Yamagata. Impact of indian ocean dipole on intraseasonal zonal currents at 90 e on the equator as revealed by self-organizing map. Geophysical Research Letters, 35(14), 2008.
  • [6] D. Jin, L. Oreopoulos, and D. Lee. Regime-based evaluation of cloudiness in cmip5 models. Climate Dynamics, 48(1):89–112, Jan 2017.
  • [7] M. D. King, W. P. Menzel, Y. J. Kaufman, D. Tanré, B.-C. Gao, S. Platnick, S. A. Ackerman, L. A. Remer, R. Pincus, and P. A. Hubanks. Cloud and aerosol properties, precipitable water, and profiles of temperature and water vapor from modis. IEEE Transactions on Geoscience and Remote Sensing, 41(2):442–458, 2003.
  • [8] T. Kohonen. The self-organizing map. Proceedings of the IEEE, 78(9):1464–1480, 1990.
  • [9] T. Kohonen. Essentials of the self-organizing map. Neural networks, 37:52–65, 2013.
  • [10] J. Laaksonen, M. Koskela, and E. Oja. Picsom: Self-organizing maps for content-based image retrieval. In Neural Networks, 1999. IJCNN’99. International Joint Conference on, volume 4, pages 2470–2473. IEEE, 1999.
  • [11] J. Li, W. P. Menzel, Z. Yang, R. A. Frey, and S. A. Ackerman. High-spatial-resolution surface and cloud-type classification from modis multispectral band measurements. Journal of Applied Meteorology, 42(2):204–226, 2003.
  • [12] Z. Li, J. Li, W. P. Menzel, T. J. Schmit, and S. A. Ackerman. Comparison between current and future environmental satellite imagers on cloud classification using modis. Remote Sensing of Environment, 108(3):311–326, 2007.
  • [13] Y. Liu and R. H. Weisberg. A review of self-organizing map applications in meteorology and oceanography. In Self Organizing Maps-Applications and Novel Algorithm Design. InTech, 2011.
  • [14] Y. Liu, R. H. Weisberg, and C. N. Mooers. Performance evaluation of the self-organizing map for feature extraction. Journal of Geophysical Research: Oceans, 111(C5), 2006.
  • [15] J. MacQueen et al. Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, volume 1, pages 281–297. Oakland, CA, USA, 1967.
  • [16] B. A. Malmgren and A. Winter. Climate zonation in puerto rico based on principal components analysis and an artificial neural network. Journal of climate, 12(4):977–985, 1999.
  • [17] A. J. McDonald, J. J. Cassano, B. Jolly, S. Parsons, and A. Schuddeboom. An automated satellite cloud classification scheme using self-organizing maps: Alternative isccp weather states. Journal of Geophysical Research: Atmospheres, 121(21), 2016.
  • [18] M. Oja, P. Somervuo, S. Kaski, and T. Kohonen. Clustering of human endogenous retrovirus sequences with median self-organizing map. In Proc. WSOM, volume 3, 2003.
  • [19] L. Oreopoulos, N. Cho, D. Lee, and S. Kato. Radiative effects of global modis cloud regimes. Journal of Geophysical Research: Atmospheres, 121(5):2299–2317, 2016.
  • [20] L. Oreopoulos, N. Cho, D. Lee, S. Kato, and G. J. Huffman. An examination of the nature of global modis cloud regimes. Journal of Geophysical Research: Atmospheres, 119(13):8362–8383, 2014.
  • [21] S. Platnick, M. D. King, S. A. Ackerman, W. P. Menzel, B. A. Baum, J. C. Riédi, and R. A. Frey. The modis cloud products: Algorithms and examples from terra. IEEE Transactions on Geoscience and Remote Sensing, 41(2):459–473, 2003.
  • [22] S. Platnick, K. G. Meyer, M. D. King, G. Wind, N. Amarasinghe, B. Marchant, G. T. Arnold, Z. Zhang, P. A. Hubanks, R. E. Holz, et al. The modis cloud optical and microphysical products: Collection 6 updates and examples from terra and aqua. IEEE Transactions on Geoscience and Remote Sensing, 55(1):502–525, 2017.
  • [23] G. Pölzlbauer. Survey and comparison of quality measures for self-organizing maps. na, 2004.
  • [24] W. B. Rossow and R. A. Schiffer. Advances in understanding clouds from isccp. Bulletin of the American Meteorological Society, 80(11):2261–2288, 1999.
  • [25] T. Su and J. Dy. A deterministic method for initializing k-means clustering. In Tools with Artificial Intelligence, 2004. ICTAI 2004. 16th IEEE International Conference on, pages 784–786. IEEE, 2004.
  • [26] H. Yin. The self-organizing maps: background, theories, extensions and applications. In Computational intelligence: A compendium, pages 715–762. Springer, 2008.
  • [27] B. Zadrozny. Learning and evaluating classifiers under sample selection bias. In Proceedings of the twenty-first international conference on Machine learning, page 114. ACM, 2004.
  • [28] W. Zhang, J. Tang, and N. Wang. Using the machine learning approach to predict patient survival from high-dimensional survival data. In Bioinformatics and Biomedicine (BIBM), 2016 IEEE International Conference on, pages 1234–1238. IEEE, 2016.
  • [29] W. Zhang and J. Wang. A hybrid learning framework for imbalanced stream classification. In Big Data (BigData Congress), 2017 IEEE International Congress on, pages 480–487. IEEE, 2017.