I Introduction
In hyperspectral remote sensing problems, it is difficult to find an optimal segmentation algorithm that covers all the spectral bands. Some objects are recognized on specific spectral bands, whereas other objects may require the processing of different bands together. For example, an algorithm with a set of selected parameters may successfully detect objects such as water and shadow in the near-infrared (NIR) band, but may fail to detect objects which provide color or textural information, such as farms and buildings. Therefore, one may need to employ more than one segmentation output obtained from multiple spectral bands to extract various types of objects. Additionally, depending on the object types, one may need to employ more than one set of features in the segmentation algorithms.
In this study, we introduce a new approach for the segmentation fusion problem based on a consensus clustering algorithm, called Filtered Stochastic Best One Element Move (Filtered Stochastic BOEM) [1]. The proposed method can also be employed to find the optimal set of parameters of a segmentation algorithm for a dataset. We first apply different segmentation algorithms, or a single segmentation algorithm with different parameter sets, to a remote sensing image and obtain a set of candidate outputs. Then, we design a fusion strategy by adapting the Filtered Stochastic BOEM method. There are two major contributions of the proposed segmentation fusion method. The first is to formalize the Filtered Stochastic BOEM method as a segmentation fusion problem, where we design a new distance learning method. The second contribution is to embed the computation of the optimal cluster number into the Filtered Stochastic BOEM method. In the suggested framework, we assume that some of the segments in the candidate segmentation set represent the target objects of interest.
Three well-known segmentation algorithms, k-means, Graph Cuts [2, 3, 4] and Mean Shift [5, 6], are used as the base segmentation algorithms in order to segment benchmark hyperspectral image datasets. In the next section, we introduce our segmentation fusion method. We examine the suggested method with various experiments in Section 3. Section 4 concludes the paper.

II Filtered Stochastic BOEM Formulated for the Fusion of Segmentation Algorithms
Filtered Stochastic BOEM [1] is a consensus clustering algorithm which approximates a solution to the Median Partition Problem [7] by integrating BOEM [7] and Stochastic Gradient Descent (SGD) [8].

In the proposed segmentation fusion method, we first feed an image $X$ with $N$ pixels to $M$ segmentation algorithms. Each segmentation algorithm is employed on $X$ to obtain a set of segmentation outputs $\mathbb{S} = \{S_m\}_{m=1}^{M}$, where each $S_m$ is a segmentation output that assigns one of $K$ segment labels to each of the $N$ pixels, and $d(\cdot,\cdot)$ is a distance function between segmentations.

An initial segmentation $\hat{S}_0$ is selected from the segmentation set $\mathbb{S}$ consisting of $M$ segmentations using algorithms which employ search heuristics, such as Best of K (BOK) [7]. Then, a consensus segmentation $\hat{S}$ is computed by solving the following optimization problem: $\hat{S} = \arg\min_{S} \sum_{m=1}^{M} d(S, S_m)$. Given two segmentations $S$ and $S'$, the distance function $d(S, S')$ is defined as the symmetric difference distance (SDD) given by $d(S, S') = N_{01} + N_{10}$, where $N_{01}$ is the number of pixel pairs co-segmented in $S$ but not in $S'$, and $N_{10}$ is the number of pairs co-segmented in $S'$ but not in $S$. In order to compare segmentations with different numbers of pixels and segments, we use a normalized form of $d$, which is called the Average Sum of Distances (Average SoD),

$$\bar{d}(S, \mathbb{S}) = \frac{1}{M} \sum_{m=1}^{M} \frac{d(S, S_m)}{\binom{N}{2}}. \qquad (1)$$
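The SDD and its Average SoD normalization can be computed from contingency counts rather than by enumerating all pixel pairs. The sketch below is illustrative (the function names `sdd` and `average_sod` are ours, not from the source); it uses the standard identity that the number of co-segmented pairs inside a labeling equals the sum of per-segment pair counts.

```python
from collections import Counter
from math import comb

def sdd(S, Sp):
    """Symmetric difference distance between two labelings of the same
    pixels: pairs co-segmented in one labeling but not in the other."""
    assert len(S) == len(Sp)
    n_ij = Counter(zip(S, Sp))   # contingency counts
    pairs = lambda cnt: sum(comb(c, 2) for c in cnt.values())
    pairs_both = pairs(n_ij)     # N_11: co-segmented in both
    pairs_S = pairs(Counter(S))  # co-segmented in S
    pairs_Sp = pairs(Counter(Sp))
    # N_01 + N_10 = (pairs_S - N_11) + (pairs_Sp - N_11)
    return pairs_S + pairs_Sp - 2 * pairs_both

def average_sod(S, candidates):
    """Average SoD: SDD to each candidate, normalized by the number of
    pixel pairs so that segmentations of different sizes are comparable."""
    n = len(S)
    return sum(sdd(S, Sm) for Sm in candidates) / (len(candidates) * comb(n, 2))
```

For example, `sdd([0,0,1,1], [0,1,0,1])` counts the two pairs co-segmented only in the first labeling and the two pairs co-segmented only in the second, giving 4.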
At each iteration of the optimization algorithm, a new segmentation is computed. Specifically, a segmentation is randomly selected from the segmentation set. Then, the best one element move of the current segmentation is computed with respect to the objective of the optimization and applied to the current segmentation to generate a new segmentation. If there is no improvement on the best move, the current segmentation is returned by the algorithm.
Similar to the gradient descent method, the best one element move of the segmentation $\hat{S}_t$ at time $t$ is defined as the relabeling of a single pixel that yields the largest decrease of the objective, and can be evaluated by the change $f(\hat{S}_{t+1}) - f(\hat{S}_t)$, where $f(\hat{S}_t) = \sum_{m=1}^{M} d(\hat{S}_t, S_m)$ is the objective at time $t$. Using the assumption that single element updates do not change the objective function significantly, the gradient of the objective can be approximated by a move-value matrix $E_t$ with a scale parameter, where
$$E_t(i,j) = d(\hat{S}_t^{\,i \to j}, S_{m_t}),$$
$S_{m_t}$ is the randomly selected segmentation for updating the current BOEM, and $\hat{S}_t^{\,i \to j}$ denotes $\hat{S}_t$ with its $i^{th}$ element switched to the $j^{th}$ segment label. If an $N \times K$ matrix $\bar{E}_t$ is defined such that the entry at the $i^{th}$ row and the $j^{th}$ column, $\bar{E}_t(i,j)$, is the filtered value of $E_t(i,j)$, the move values can be approximated by
$$\bar{E}_t = \beta\,\bar{E}_{t-1} + (1-\beta)\,E_t \qquad (2)$$
if the segmentation $S_{m_t}$ is selected for updating at time $t$.
In the proposed Segmentation Fusion Algorithm, we initialize $\hat{S}_0$ at $t = 0$. Until $t$ reaches a given termination time $T$, we update the segmentation $\hat{S}_t$. We randomly select a segmentation $S_{m_t}$ from a pseudorandom permutation of the numbers $1, 2, \ldots, M$ until we traverse all the segmentations in $\mathbb{S}$. Then, we generate a new segmentation and repeat this operation until all of the permutations are traversed. We update $\bar{E}_t$ by aggregating $\bar{E}_{t-1}$ with the scaled $E_t$. The parameter $\beta$ controls the convergence rate and the performance of the algorithm. If $\beta = 0$, the algorithm becomes pure stochastic BOEM and the algorithm is memoryless. If $\beta \to 1$, the algorithm forgets slowly. However, Zheng, Kulkarni and Poor [1] reported that the algorithm may perform worse if $\beta$ is at either end of $[0, 1]$. Selection of the optimal $\beta$ values for segmentation fusion is explained in the next section. After $\bar{E}_t$ is updated, we compute the best one element move in order to update $\hat{S}_{t+1}$. We iterate the algorithm until the termination criterion is achieved.
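The update loop described above can be sketched as follows. This is a naive, illustrative implementation under our own assumptions (the function name `fused_segmentation` is ours; an efficient implementation would update the SDD incrementally per move rather than recomputing it for every candidate relabeling).

```python
import random
from collections import Counter
from math import comb

def sdd(S, Sp):
    # symmetric difference distance via contingency counts
    pairs = lambda cnt: sum(comb(c, 2) for c in cnt.values())
    return pairs(Counter(S)) + pairs(Counter(Sp)) - 2 * pairs(Counter(zip(S, Sp)))

def fused_segmentation(candidates, K, beta=0.5, T=200, seed=0):
    """Sketch of the filtered stochastic BOEM loop: keep a running
    move-value matrix E_bar, refresh it from one randomly chosen
    candidate per step as in (2), and apply the best one-element move."""
    rng = random.Random(seed)
    N = len(candidates[0])
    S = list(candidates[0])            # initial segmentation (BOK would pick the best)
    E_bar = [[0.0] * K for _ in range(N)]
    for _ in range(T):
        Sm = rng.choice(candidates)    # stochastic pick of one candidate
        # filtered update: E_bar(i,j) mixes the old value with the
        # distance to Sm after switching pixel i to label j
        for i in range(N):
            old = S[i]
            for j in range(K):
                S[i] = j
                E_bar[i][j] = beta * E_bar[i][j] + (1 - beta) * sdd(S, Sm)
            S[i] = old
        # best one-element move over the filtered matrix
        i_best, j_best = min(((i, j) for i in range(N) for j in range(K)),
                             key=lambda ij: E_bar[ij[0]][ij[1]])
        if E_bar[i_best][j_best] < E_bar[i_best][S[i_best]]:
            S[i_best] = j_best         # apply the move; otherwise keep S
    return S
```

When all candidates agree, no single-pixel move can lower the filtered move value, so the consensus equals the common labeling.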
II-A Distance Learning
In this section, we propose a method, called distance learning, that employs the training data to measure the distance between two segmentations obtained at the output of different segmentation algorithms. The proposed distance learning method is also flexible enough to measure the distance between two segmentations with different numbers of segments.
We first define the Rand Index ($RI$), which is used to estimate the quality of the segments. Given two segmentations $S$ and $S'$, $RI$ is defined as $RI(S, S') = \frac{N_{11} + N_{00}}{\binom{N}{2}}$, where $N_{11}$ is the number of pixel pairs co-segmented in both $S$ and $S'$, and $N_{00}$ is the number of pairs co-segmented in neither. However, $RI$ is not corrected for chance; for instance, the average distance between two random segmentations is not zero and the distance depends on the number of pixels [9]. Therefore, we assume that each segmentation may consist of a different number of segments. We define $n_i$ as the number of pixels in the $i^{th}$ segment of $S$, $n'_j$ as the number of pixels in the $j^{th}$ segment of $S'$, and $n_{ij}$ as the number of pixels in both the $i^{th}$ segment of $S$ and the $j^{th}$ segment of $S'$. In addition, we assume that $S$ and $S'$ are randomly drawn with a fixed number of segments, and a fixed number of pixels in each segment, according to a generalized hypergeometric distribution [10]. Then, an adjusted version of $RI$ called the Adjusted Rand Index ($ARI$) [10] is defined as
$$ARI(S, S') = \frac{\sum_{ij} \binom{n_{ij}}{2} - t_3}{\frac{1}{2}(t_1 + t_2) - t_3}, \qquad (3)$$
where $t_1 = \sum_i \binom{n_i}{2}$, $t_2 = \sum_j \binom{n'_j}{2}$, and $t_3 = \frac{2 t_1 t_2}{N(N-1)}$.
Note that, if we apply our assumptions for equal segment sizes in (3), we obtain (1) [7]. Instead, we compute $ARI$ for each pair of base segmentation algorithm outputs with different segment numbers, and the distance $d$ is computed from $ARI$ such that $d(S, S') = 1 - ARI(S, S')$ [7]. We call this method Distance Learning for BOEM (DL), in which we learn $d$ by computing $ARI$ using the data.
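As a concrete check of (3) and the DL distance, the following sketch computes $ARI$ from the contingency counts (the function names are illustrative, not from the source):

```python
from collections import Counter
from math import comb

def adjusted_rand_index(S, Sp):
    """ARI via the contingency table, following Eq. (3):
    t3 is the chance-expected number of agreeing pairs."""
    N = len(S)
    n_ij = Counter(zip(S, Sp))
    t1 = sum(comb(c, 2) for c in Counter(S).values())
    t2 = sum(comb(c, 2) for c in Counter(Sp).values())
    t3 = 2.0 * t1 * t2 / (N * (N - 1))
    top = sum(comb(c, 2) for c in n_ij.values()) - t3
    return top / (0.5 * (t1 + t2) - t3)

def dl_distance(S, Sp):
    # learned distance used by DL: d = 1 - ARI
    return 1.0 - adjusted_rand_index(S, Sp)
```

Identical labelings give $ARI = 1$; a labeling that is independent of the other can yield a negative $ARI$, which is exactly the chance correction the plain $RI$ lacks.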
An important assumption made in the derivation of (3) [11] is that the number of pixels in each segment is the same. However, this assumption may fail in the segmentation of images that contain complex targets, such as airports or harbors.
In order to relax this assumption, we employ a normalization method for quasi-distance functions, introduced by Luo et al. [12], as
$$d_{QD}(S, S') = \frac{d(S, S') - d_{\min}}{d_{\max} - d_{\min}}, \qquad (4)$$
where $d_{\min}$ and $d_{\max}$ are the minimal and maximal values of $d$. Luo et al. [12] state that the exact computation of these extremal values for an arbitrary segmentation distribution is not known, and they introduce several approximations. In the experiments, we employ (4) as the method called Quasi-distance Learning (QD). For the details of the algorithms used to solve (4), please refer to [12].
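A minimal sketch of the normalization in (4), under the assumption that $d_{\min}$ and $d_{\max}$ are approximated by the observed extremes of the pairwise distances (the helper names `qd_normalize` and `qd_matrix` are ours):

```python
def qd_normalize(d_val, d_min, d_max):
    """Min-max normalization of Eq. (4)."""
    return (d_val - d_min) / (d_max - d_min)

def qd_matrix(segmentations, d):
    """Pairwise distances between candidate segmentations, normalized
    by the observed minimal/maximal values as stand-ins for the exact
    (unknown) extremal values of the distance distribution."""
    M = len(segmentations)
    raw = {(a, b): d(segmentations[a], segmentations[b])
           for a in range(M) for b in range(M) if a != b}
    d_min, d_max = min(raw.values()), max(raw.values())
    return {k: qd_normalize(v, d_min, d_max) for k, v in raw.items()}
```

After normalization the closest observed pair maps to 0 and the farthest to 1, so distances learned from different base algorithms live on a common scale.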
An important difference between (3) and (4) is that, in (4), we consider the minimal and maximal values of the distances between the pairwise segmentations as the normalization factors in order to compute the distance between $S$ and $S'$. On the other hand, (3) considers the expected values of the distances between all of the segmentations in the computations.
If training data are available, then the statistics required by the learning methods, such as $d_{\min}$ and $d_{\max}$, can be computed on the training data and employed on the test data. However, one must ensure that the statistical properties of the training and test data are equivalent in order to employ the learning methods. We observe in the experiments that this equivalence requirement may not be satisfied in remote sensing datasets because of the variability of the images in space and time.
II-B Estimating the Number of Clusters and the β Parameter for BOEM
One of the crucial problems of image segmentation is to estimate the number of clusters, $K$, that form the different segments in the image. This problem remains difficult for the segmentation of remotely sensed images even if the images are labeled using expert knowledge.
In order to estimate $K$ in the base segmentation algorithms, several clustering validity indices can be employed [13]. In this section, we introduce a new method to estimate $K$ for segmentation fusion. For this purpose, we consider a segmentation index (SI) for BOEM as $SI(K) = \bar{d}(\hat{S}^K, \mathbb{S}^K)$, where $\mathbb{S}^K$ is the set of segmentations in which each segmentation contains segments with $K$ different labels, and $\hat{S}^K$ is the corresponding fused segmentation [14]. Then, we solve the following optimization problem,
$$\hat{K} = \underset{K \leq K_{\max}}{\arg\min}\; SI(K), \qquad (5)$$
where $K_{\max}$ is the maximum value of $K$ provided by the user. Vinh and Epps [14] compared Normalized Mutual Information and $ARI$ for the estimation of the segment number on several datasets. Since both measures agree on the segment number in various experiments, we employ $ARI$ in our experiments for estimating $\hat{K}$.
A similar approach is employed to estimate the parameter $\beta$. Given a set of values $B = \{\beta_l\}_{l=1}^{L}$, we introduce a beta index ($BI$) as $BI(\beta) = \bar{d}(\hat{S}_\beta, \mathbb{S})$, where $\hat{S}_\beta$ is the output segmentation of the Segmentation Fusion Algorithm implemented using $\beta$. Then, the optimal $\hat{\beta}$ is computed by solving the following optimization problem:
$$\hat{\beta} = \underset{\beta \in B}{\arg\min}\; BI(\beta). \qquad (6)$$
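The grid search in (6) can be sketched in a few lines; a minimal version, assuming a fusion routine `fuse(candidates, beta)` and an Average SoD routine `avg_sod(S, candidates)` are available (both names are ours, for illustration):

```python
def estimate_beta(fuse, candidates, avg_sod, betas):
    """Grid search of Eq. (6): run the fusion once per beta on the
    training data and keep the beta whose fused output stays closest
    to the candidate set under the Average SoD (the beta index BI)."""
    bi = {beta: avg_sod(fuse(candidates, beta), candidates) for beta in betas}
    return min(bi, key=bi.get)
```

The estimation of $\hat{K}$ in (5) is the same search over $K \leq K_{\max}$ instead of over the grid of $\beta$ values.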
III Experiments
We use two indices to measure the (dis)similarity between an output image and the ground truth of the images as performance criteria: i) the Rand Index ($RI$), and ii) the Adjusted Rand Index ($ARI$) [9], which takes values in $[-1, 1]$. When the output image and the ground truth image are identical, the $RI$ and the $ARI$ are equal to $1$. Moreover, the $ARI$ equals $0$ when the $RI$ equals its expected value.
Table I
       Average Base   Algorithm 1   DL      QD
RI     0.703          0.704         0.710   0.714
ARI    0.159          0.160         0.184   0.174
In the first set of experiments, we employ the proposed segmentation fusion algorithms on the multiband Thematic Mapper image which is provided by MultiSpec [15]. We split the image into training and test images: i) a subset of the pixels is taken as the training image, and ii) a disjoint subset of the pixels is taken as the test image. The ground truth of the images contains several clusters corresponding to different segments.
We first implement k-means on different bands in order to perform multimodal data fusion, and fix the termination time $T$ of Filtered Stochastic BOEM in advance. Assuming that we do not know the number of clusters in the image, we employ (5) using the training data in order to find the optimal $\hat{K}$, and we employ (6) to find the optimal $\hat{\beta}$. The results of the experiments on the test data of the Thematic Mapper image are given in Table I. In the Average Base column, the average performance values of the k-means algorithms are given. The performance values of the segmentation fusion algorithm are given in the column labeled Algorithm 1. We observe that the performance values of Algorithm 1 are similar to the arithmetic average of the performance values of the k-means algorithms. The performance values of the Distance Learning and Quasi-distance Learning algorithms are given in the DL and QD columns, respectively. Since the distance functions for Algorithm 1 are learned pairwise between segmentations in DL and QD, we observe that the performance values of DL and QD increase compared to Algorithm 1.
In the second set of experiments, we employ the k-means, Graph Cut and Mean Shift algorithms on the multiband training and test images. Now, the image segmentation problem is considered as a pixel clustering problem in a multidimensional feature space. We find $\hat{K}$ and $\hat{\beta}$ using the training data. The results on the test data are given in Table II. The performance values of Algorithm 1 are closer to the performance values of the Mean Shift algorithm, since the output image of Algorithm 1 is closer to the output segmentation of the Mean Shift algorithm. We observe that the $ARI$ values of DL are greater than the $ARI$ values of QD, since DL computes the distance functions by computing the $ARI$ values between the segmentations. However, the $RI$ values of QD are greater than the $RI$ values of DL, since QD calibrates the distance functions considering the distance measure of the $RI$.
Table II
       k-means   Graph Cut   Mean Shift
RI     0.715     0.717       0.714
ARI    0.125     0.132       0.176

       Algorithm 1   DL      QD
RI     0.714         0.710   0.724
ARI    0.176         0.180   0.178
In the third set of experiments, we employ the k-means algorithm on each band of the 12-band Moderate Dimension Image: June 1966 aircraft scanner Flightline C1 (portion of Southern Tippecanoe County, Indiana) [15]. We randomly select a subset of the pixels for training and use the remaining pixels for testing [15]. We find $\hat{K}$ and $\hat{\beta}$ using the training data. The results on the test data are given in Table III and Table IV. We observe that the performance values of Algorithm 1 are smaller than the average performance values of the base segmentation outputs. Since the distance functions are computed for each segmentation pair, we achieve better performance with the distance learning algorithms (DL and QD).
Table III
       Ch1     Ch2     Ch3     Ch4     Ch5     Ch6
RI     0.537   0.531   0.528   0.532   0.532   0.523
ARI    0.014   0.006   0.009   0.009   0.006   0.003

       Ch7     Ch8     Ch9     Ch10    Ch11    Ch12
RI     0.529   0.531   0.534   0.527   0.540   0.540
ARI    0.000   0.008   0.015   0.003   0.023   0.018
Table IV
       Average Base   Algorithm 1   DL      QD
RI     0.532          0.530         0.533   0.530
ARI    0.009          0.007         0.011   0.011
IV Conclusion
In this study, we introduce a new approach for the fusion of the segmentation outputs of several segmentation algorithms to achieve a consensus segmentation. The output of the segmentation fusion algorithm can therefore be interpreted as an image representing the information shared by a set of segmentation outputs obtained from various segmentation algorithms.
We construct the candidate segmentation set by applying the k-means, Mean Shift and Graph Cuts methods to the hyperspectral images. The parameter optimization of the segmentation is embedded into the Filtered Stochastic BOEM method. Additionally, the distance metrics are learned using the training data in order to enhance the segmentation performance without preselecting parameters or evaluating the outputs for specific targets. The performance of the suggested segmentation fusion algorithm demonstrates its efficacy in balancing oversegmented and undersegmented results.
References
 [1] H. Zheng, S. R. Kulkarni, and H. V. Poor, “Consensus clustering: The filtered stochastic bestoneelementmove algorithm,” in Proc. 45th Conf. Inf. Sci. Syst. (CISS), Baltimore, MD, Mar. 2011, pp. 1–6.
 [2] S. Bagon, “Matlab wrapper for graph cut,” Dec. 2006.
 [3] Y. Boykov and V. Kolmogorov, “An experimental comparison of mincut/maxflow algorithms for energy minimization in vision,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 9, pp. 1124–1137, Sep. 2004.
 [4] Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 11, pp. 1222–1239, Nov. 2001.

 [5] K. Fukunaga and L. Hostetler, “The estimation of the gradient of a density function, with applications in pattern recognition,” IEEE Trans. Inf. Theory, vol. 21, no. 1, pp. 32–40, Jan. 1975.
 [6] D. Comaniciu and P. Meer, “Mean shift: A robust approach toward feature space analysis,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 5, pp. 603–619, May 2002.
 [7] A. Goder and V. Filkov, “Consensus clustering algorithms: Comparison and refinement,” in Proc. ALENEX, J. I. Munro and D. Wagner, Eds., San Francisco, CA, USA, Jan. 2008, pp. 109–117.

 [8] L. Bottou, “Stochastic learning,” in Advanced Lectures on Machine Learning, O. Bousquet and U. von Luxburg, Eds., Lecture Notes in Artificial Intelligence, LNAI 3176, pp. 146–168, Springer-Verlag, Berlin, 2004.
 [9] L. Hubert and P. Arabie, “Comparing partitions,” Journal of Classification, vol. 2, pp. 193–218, 1985, doi: 10.1007/BF01908075.
 [10] L.I. Kuncheva and S.T. Hadjitodorov, “Using diversity in cluster ensembles,” in Proc. IEEE Int. Conf. on Systems, Man and Cybernetics, The Hague, Netherlands, Oct. 2004, vol. 2.
 [11] N. X. Vinh, J. Epps, and J. Bailey, “Information theoretic measures for clusterings comparison: Is a correction for chance necessary?,” in Proc. 26th Int. Conf. Machine Learning (ICML ’09), New York, NY, USA, 2009, pp. 1073–1080, ACM.
 [12] P. Luo, H. Xiong, G. Zhan, J. Wu, and Z. Shi, “Informationtheoretic distance measures for clustering validation: Generalization and normalization,” IEEE Trans. Knowl. Data Eng., vol. 21, no. 9, pp. 1249 –1262, Sep. 2009.
 [13] C. A. Sugar and G.M. James, “Finding the number of clusters in a dataset,” J. Am. Statistical Assoc., vol. 98, no. 463, pp. 750–763, 2003.
 [14] N. X. Vinh and J. Epps, “A novel approach for automatic number of clusters detection in microarray data based on consensus clustering,” in Proc. 19th IEEE Int. Conf. Bioinformat. Bioeng., Washington, DC, USA, 2009, BIBE ’09, pp. 84–91, IEEE Computer Society.
 [15] L. Biehl and D. Landgrebe, “Multispec: a tool for multispectral–hyperspectral image data analysis,” Comput. Geosci., vol. 28, pp. 1153–1159, Dec. 2002.