1 Introduction
Clustering problems arise from a variety of applications, such as document/web-page categorization [45], pattern recognition, biomedical analysis [41], data compression via vector quantization [36], and nearest neighbor search [19, 27]. In general, cluster analysis plays an indispensable role in understanding phenomena across different contexts. Given a set of samples in d-dimensional space, the task of clustering is to partition the data samples into subsets (called clusters) such that samples in the same cluster are more homogeneous, or closer to each other, than samples from different clusters. Traditionally, this task has been modeled as a distortion minimization problem in k-means [26]. The clustering procedure is organized into two steps. First, samples are assigned to their closest centers. Second, the center of each cluster is updated with the data samples assigned to it. These two steps are repeated until the cluster structure does not change in two consecutive iterations. This algorithm is simple and efficient, but it is unable to discover clusters that are not spherical in shape.
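The two-step procedure above can be sketched as follows. This is a minimal illustration, not the paper's implementation; initializing centers from the first k points and representing samples as tuples are simplifications for the sake of brevity.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lloyd_kmeans(points, k, iters=100):
    """Minimal Lloyd's k-means: alternate assignment and center updates.

    Seeding from the first k points is a simplifying assumption; a real
    implementation would use random or k-means++ initialization.
    """
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Step 1: assign each sample to its closest center.
        new_labels = [min(range(k), key=lambda j: dist2(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # cluster structure unchanged -> stop
            break
        labels = new_labels
        # Step 2: update each center as the mean of its assigned samples.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels
```

On two well-separated blobs this converges in a few iterations, but, as noted above, the distortion objective biases it towards spherical clusters.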
Aiming to identify clusters of arbitrary shapes, a series of algorithms has been proposed in the last two decades. Most of these algorithms [11, 6, 33, 24, 2, 32, 31] are conceived from the perspective of density distribution. Intuitively, samples within each cluster are concentrated, and clusters are separated by sparse regions. Among these algorithms, samples are either iteratively assigned to [11, 2, 33] or shifted towards the density peaks [6], and the clusters are thereby formed. However, the heuristic rules [11] or kernels [6] employed in these algorithms are unable to deal with the various density distributions encountered in practice. For this reason, their performance turns out to be unstable under different scenarios. Apart from density-based approaches, graph-based algorithms are also able to discover clusters of arbitrary shapes. Representative methods are Chameleon [22] and order-constrained transitive distance clustering [44]. In both of them, the connectivity between samples is carefully considered. According to the strategies presented in the papers, samples that are far away from each other are still clustered together as long as they are reachable from each other via a chain of closely connected bridging samples. Unfortunately, both require a matrix that keeps the pairwise distances between samples, which makes them unable to scale to large clustering tasks.
As a consequence, despite the numerous efforts of the last several decades, two major goals in cluster analysis, namely the ability to identify clusters of arbitrary shapes and scalability towards large-scale, high-dimensional data, are hardly ever achieved by one algorithm. In this paper, a simple but effective density-based solution is proposed. The basic idea is inspired by the phenomenon of land erosion by water. The boundaries between clusters are drawn gradually by a boundary erosion process, without any heuristic rules or kernels. This is particularly powerful when the boundaries between clusters are obscure at first sight. In addition, the bit-by-bit boundary erosion produces a sequential order following which the potential clusters can be reconstructed with the guidance of an r-NN graph. Boundary erosion is feasible in any metric space as long as the density of data samples can be estimated. Furthermore, we also demonstrate that this algorithm achieves satisfactory performance and high efficiency on large-scale, high-dimensional clustering tasks with the support of efficient k-NN graph construction [8].

2 Related Work
Since the proposal of k-means, a variety of clustering algorithms have been proposed in the past three decades, which are in general categorized into seven groups [43]: agglomerative [23], divisive, partitioning [26], density-based [11, 32, 2, 33, 15, 6], graph-based [22], and neural-network-based algorithms. In the literature, ensembles of several existing algorithms have also been used to boost performance [47, 46]. For comprehensive surveys, readers are referred to [43, 17]. In this section, our focus is on reviewing several typical algorithms that are able to identify clusters of arbitrary shapes, namely the density-based and graph-based algorithms.

Although the clustering problem has been modeled from different perspectives, people basically agree that clusters are composed of samples that are relatively concentrated and are separated from each other by relatively sparse regions. This perception is made without any specification of the distance measure on the input data. Density-based algorithms are in general designed in line with this perception. Although different in their details, the density-based algorithms aim to discover groups of samples that are continuously connected.
In general, two steps are involved in the density-based clustering process. First, the local density surrounding each sample is estimated. Given a sample p and a radius r, the density ρ(p) of sample p is defined as the number of samples q that fall into p's neighborhood of range r (as shown in Eqn. 1):

    ρ(p) = |{ q : m(p, q) ≤ r }|    (1)
The function m in Eqn. 1 returns the distance between p and q. In the second step, the clusters are formed in basically two different manners. For instance, in DBSCAN [11], a cluster is formed by expanding it from "core points" (points holding high density) to points with low density, while in mean-shift clustering [6], data samples are shifted iteratively from regions of low density towards the density peaks. In DBSCAN, the expansion process can be very sensitive to the parameters; for instance, two heterogeneous clusters may be falsely merged into one as a parameter changes slightly. In mean-shift, the shifting process can easily get stuck in a local optimum if there is no obvious density peak. In the approach of clustering based on density peaks (clusterDP) [33], data samples are directly assigned to the closest density peak, where each density peak is recognized as a cluster center. However, it faces a similar problem to mean-shift, since it is hard to identify the cluster center when there is no obvious density peak. Another pitfall of this approach is that the number of peaks to be selected as cluster centers has to be set manually.
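For concreteness, the static density estimate of Eqn. 1 can be sketched as below. The distance function is supplied by the caller, since the density-based formulation places no restriction on the metric; excluding p from its own neighborhood count is our convention, not something the equation fixes.

```python
def density(points, p, r, dist):
    """Eqn. 1 as stated: the number of samples q falling into p's
    neighborhood of range r. Whether p counts itself is a convention;
    it is excluded here."""
    return sum(1 for q in points if q is not p and dist(p, q) <= r)
```

Any metric can be plugged in as `dist`, e.g. Euclidean distance for spatial data or an edit distance for sequences.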
Recently, several clustering methods have been proposed to deal with high-dimensional data, which are hardly separable in the original space. These methods fall largely into two categories. One is built upon generative models [42, 21]. The other is sparse subspace clustering (SSC) [10, 20], which first projects the high-dimensional data into a sparse, low-dimensional representation; the projected data are then clustered via spectral clustering. The method in [7] in general follows a similar framework to fulfill clustering on high-dimensional data.

3 Clustering via Boundary Erosion
3.1 Motivation
For all the clustering algorithms discussed, the key step is to partition the data samples into groups. However, this is particularly challenging when the boundaries between different clusters are not obvious. In this paper, a boundary erosion procedure is proposed to address this kind of ambiguity. The idea is inspired by the natural phenomenon of land erosion by water. An illustration is given in Fig. 1. As shown in the figure, the erosion gradually makes the boundaries between clusters explicit, as water erodes the land bit by bit. More importantly, a sequential order indicating how the samples assemble one after another into a cluster is established based on the order in which the samples are eroded. With this sequential order, called the sequential boundary levels in this paper, the latent clusters can easily be reconstructed.
Notice that this idea is essentially different from the watershed transform [34], in the sense that the "water level" in our case does not rise up to bury the land; instead, it only erodes the land. The land on the outer part is eroded earlier than the inner land rather than being buried at the same time, even if they are at the same altitude.
3.2 Generating Boundary Levels via Erosion
To facilitate the boundary erosion, the density of a sample is estimated in a quite different manner from conventional algorithms. Namely, the density of each sample is estimated dynamically, by gradually eroding its neighbors away. To do this, a dynamic array D is maintained, which holds the samples along with their dynamic boundary densities. The dynamic boundary density is given in Eqn. 2:

    ρ̂(p) = |{ q ∈ D : m(p, q) ≤ r }|    (2)

As shown in Eqn. 2, the major difference from Eqn. 1 is that samples outside of D are not counted during density estimation. At the beginning, all the samples are put into the dynamic array D. For this reason, ρ̂(p) for each sample is initially the same as ρ(p) given in Eqn. 1.
The boundary erosion starts by deleting the sample with the lowest density in D (which corresponds to the boundaries we are most certain about). Each time, the data samples holding the lowest density are removed from D at once (it is possible that several samples hold the same density value). Due to the removal of a sample p, the dynamic boundary densities of its neighbors are affected according to Eqn. 2; the densities of p's neighbors in D are therefore recalculated and updated. Thereafter, the next sample holding the lowest dynamic boundary density is identified and removed from D. This process continues until D is empty. At each removal, a sequential boundary level is assigned to the samples being removed; samples removed at the same moment are assigned the same level. This erosion process is summarized in Alg. 1.

The erosion process invades inwards from the boundaries as more and more samples are eroded away. Samples that are initially not located on the cluster border are gradually exposed to the boundary erosion. The erosion continues until all the samples have been deleted from D. During the process, a sample lying in the interior automatically ruptures as the start of a new boundary when the current lowest dynamic boundary density equals the density of this sample. An illustration of the erosion process is given by movie S1 in the supplementary materials.
In the above process, samples are removed from D sequentially according to their dynamic boundary density, from low to high. Based on the order of removal from D, each sample is assigned a boundary level, which reflects both the original density (Eqn. 1) and the innerness of the sample as a cluster member. It is easy to see that a sample lying on the outside holds a lower boundary level than one lying further inside, even if they share the same ρ. This is the essential difference between our approach and the watershed transform [34].
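The erosion loop can be sketched compactly as follows. Since Alg. 1 itself is not reproduced in this excerpt, this is our approximation: brute-force neighbor lists stand in for the r-NN graph, a lazy-deletion heap plays the role of the dynamic array D (Eqn. 2 is realized implicitly by decrementing a neighbor's count on each removal), and samples sharing the current lowest density are batched into one level.

```python
import heapq

def boundary_levels(points, r, dist):
    """Sketch of the erosion: repeatedly remove the samples with the lowest
    dynamic boundary density, assigning each removal batch one level."""
    n = len(points)
    nn = [[j for j in range(n) if j != i and dist(points[i], points[j]) <= r]
          for i in range(n)]
    rho = [len(nn[i]) for i in range(n)]   # initial density, as in Eqn. 1
    alive = [True] * n
    heap = [(rho[i], i) for i in range(n)]
    heapq.heapify(heap)
    level = [0] * n
    step = 0
    while heap:
        d, i = heapq.heappop(heap)
        if not alive[i] or d != rho[i]:
            continue  # stale heap entry from an earlier density value
        # collect every sample currently holding the same lowest density
        batch = [i]
        while heap and heap[0][0] == d:
            d2, j = heapq.heappop(heap)
            if alive[j] and d2 == rho[j]:
                batch.append(j)
        step += 1
        for j in batch:
            alive[j] = False
            level[j] = step
            # removing j lowers the dynamic density of its live neighbors
            for q in nn[j]:
                if alive[q]:
                    rho[q] -= 1
                    heapq.heappush(heap, (rho[q], q))
    return level
```

On five collinear points spaced one unit apart with r slightly above 1, the endpoints are eroded first and the middle point last, so the levels rise monotonically from boundary to interior.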
Fig. 2(a) and Fig. 2(b) show the density estimated by Eqn. 1 and the boundary levels produced by the erosion process, respectively; the corresponding 3D views are shown in Fig. 2(c) and Fig. 2(d). As shown in the figure, the density estimated by Eqn. 1 is full of potholes. This is not surprising, since the density does not necessarily increase smoothly from the border to the center. A cluster expansion undertaken afterwards is easily trapped in the potholes distributed along the density slope, which is a common issue latent in the traditional approaches. This issue is avoided by the dynamic density estimation, which considers both the density and the innerness of a sample. A clear contrast is seen in the 3D views (Fig. 2(c) and Fig. 2(d)): the boundary levels produced by Alg. 1 turn out to be smooth within each emerging cluster.
In Alg. 1, the first step calculates the r-NN graph G, which keeps the nearest neighbors of each sample within range r; r is the only parameter, and it sets the scale of the neighborhood of each sample. Each entry of G keeps the list of nearest neighbors of a sample within its neighborhood, sorted in ascending order of distance to that sample. This facilitates the subsequent boundary erosion and labeling steps (Alg. 2). The time complexity of building the nearest neighbor lists for all the samples is quadratic in the number of input samples, which is on the same level as DBSCAN [11, 15] and the algorithm in [33]. The complexity of computing an approximate r-NN graph can be considerably decreased [8], which will be discussed in detail in a later section.
To support fast updating of ρ̂ for the neighbors affected by the removal of a sample (Alg. 1, Lines 10-13), a reverse nearest neighbor graph [8] is also maintained, in which each entry keeps the data samples whose nearest neighbor lists contain the given sample. Essentially, the reverse nearest neighbor graph is nothing more than a simple reorganization of G.
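The reorganization is a single pass over the forward graph, sketched below with neighbor lists of integer indices. Note that for an exact r-range graph under a symmetric metric the reverse graph coincides with the forward one; it differs, and is genuinely needed, when the neighbor lists are pruned or approximate.

```python
def reverse_graph(rnn):
    """Build the reverse nearest neighbor graph: rev[q] lists every sample p
    whose neighbor list rnn[p] contains q."""
    rev = [[] for _ in rnn]
    for p, neighbors in enumerate(rnn):
        for q in neighbors:
            rev[q].append(p)
    return rev
```

With this structure, the samples whose ρ̂ must be decremented when sample q is eroded are exactly `rev[q]`.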
Discussion
The advantages of boundary erosion are severalfold. First, the erosion always takes place in the region of lowest density; it therefore guarantees that the boundaries between clusters are drawn along the most likely regions. In this sense, the process reaches global optimality even though it follows a greedy strategy. Second, the bit-by-bit erosion allows the boundaries between clusters to be drawn gradually instead of all at once, which is appropriate when the boundaries are not clear at the beginning. More importantly, the gradual erosion produces an ordered sequence sorting the samples from boundary to center, the reverse of which lays out a roadmap for cluster expansion. In the whole process, no kernels or heuristic rules are introduced, which avoids unnecessary assumptions about the data distribution or the metric space.
3.3 Label Propagation
Once the sequence of boundary levels is produced, the clustering process becomes natural and can conveniently be undertaken. It is basically a process of cluster expansion that starts from the peaks of the boundary levels (given in Alg. 2). The propagation starts from the data sample with the highest boundary level. A data sample is assigned a new cluster label if none of its neighbors in G is labeled; otherwise, the sample is assigned the same cluster label as its closest neighbor that was labeled in a previous round. Likewise, the unlabeled samples are visited sequentially following the boundary levels from high to low. The process continues until all samples are assigned a label. In this process, the expansion of a cluster stops automatically when it reaches the cluster boundary, where samples from another cluster hold higher boundary levels. An illustration of this propagation procedure is given by movie S2 in the supplementary materials.
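The propagation rule just described can be sketched as follows. This is our reading of Alg. 2 (which is not reproduced in this excerpt): neighbor lists are assumed sorted by ascending distance, as Alg. 1 prepares them, so the first labeled neighbor in a list is the closest one.

```python
def propagate_labels(levels, nn):
    """Visit samples from the highest boundary level to the lowest.
    A sample starts a new cluster if none of its neighbors is labeled yet;
    otherwise it inherits the label of its closest labeled neighbor."""
    labels = [-1] * len(levels)
    next_label = 0
    for i in sorted(range(len(levels)), key=lambda i: -levels[i]):
        labeled = [q for q in nn[i] if labels[q] != -1]
        if labeled:
            labels[i] = labels[labeled[0]]  # closest labeled neighbor first
        else:
            labels[i] = next_label          # a new cluster peak emerges
            next_label += 1
    return labels
```

A connected neighborhood graph thus yields a single label, while disconnected components each seed their own cluster at their level peak.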
In summary, the proposed clustering process consists of three steps. First, given the neighborhood radius r, a nearest neighbor graph is built, in which a list of neighbors falling within range r is kept for each sample. Second, with the support of the nearest neighbor graph, the sequential boundary levels are produced by the boundary erosion process. Finally, clusters are produced by propagating cluster labels sequentially from samples holding high boundary levels to those holding lower ones.
Similar to DBSCAN [11], mean-shift [6], and clusterDP [33], our algorithm is able to identify clusters of arbitrary shapes as well as the outliers. However, the proposed approach is more attractive in several respects. On one hand, unlike DBSCAN, no heuristic rules are introduced, which makes the clustering insensitive to extra parameter settings. On the other hand, unlike mean-shift or clusterDP [33], no kernel is adopted in the density estimation, which makes it feasible for various types of metric spaces. Moreover, unlike DBSCAN or clusterDP [33], no cluster centers or cluster peaks are explicitly defined or specified; instead, similar to affinity propagation [13], the cluster peaks and the clusters emerge gradually. Furthermore, the algorithm places no restriction on the distance measure. As a consequence, unlike k-means [26], mean-shift [6], or the recent OCTD [44], it is feasible for various metric spaces as long as the density of samples can be estimated.

Boundary erosion shares a similar motivation with "border-peeling" [3]; however, they are essentially different in three major aspects. First, no kernel is introduced in our approach. Second, all samples are eroded away by the end of the erosion process, whereas in "border-peeling" core points are reserved for cluster expansion. Finally, "border-peeling" relies on DBSCAN to reconstruct the clusters, while clustering in boundary erosion is undertaken via label propagation with the guidance of an r-NN graph.
In the above label propagation process, the same r-NN graph is used as in the boundary erosion process (Alg. 1). Alternatively, it is feasible to use a different r-NN graph for the label propagation. In some cases, the density of a sample is very low; such samples are usually recognized as outliers by Alg. 1. However, in certain scenarios we may expect such outliers to be assigned to the clusters closest to them. To achieve this, the r-NN graph supplied to the expansion procedure is revised. In particular, a sample's neighbor list is augmented to its top-k nearest neighbors when the size of its nearest neighbor list is less than k, where k is another given parameter. In the experiment section, we show that this augmented propagation strategy is meaningful in certain circumstances.
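The augmentation amounts to a per-sample fallback, sketched below under the assumption that both an r-NN list and a full top-k list (sorted by distance) are available for every sample.

```python
def augment_neighbors(rnn_lists, knn_lists, k):
    """If a sample's r-NN list holds fewer than k neighbors, fall back to
    its top-k nearest neighbors, so that low-density samples (outliers)
    can still be absorbed by the closest cluster during propagation."""
    return [rnn if len(rnn) >= k else knn[:k]
            for rnn, knn in zip(rnn_lists, knn_lists)]
```

Samples with a sufficiently populated r-neighborhood are left untouched; only the sparse ones gain extra edges.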
According to our observation, boundary erosion fails only when samples from different clusters are mixed with each other. In this case, the assumption of the algorithm that clusters are separated by sparse regions actually breaks down. However, it is possible to address this issue with recent subspace embedding [20]: the input high-dimensional data are first projected to a lower-dimensional and separable space by DSC-Net-L2 [20], and boundary erosion is then applied to the projected data, as will be illustrated in the experiment section.
4 Clustering in Largescale
As presented in Alg. 1, the r-NN graph is required as a prerequisite of the boundary erosion process. The time complexity of calculating the r-NN graph for n samples can be as high as O(d·n²) in d dimensions. Moreover, in the worst case, the space complexity of keeping the r-NN graph is close to O(n²), since one cannot assume in advance how many neighbors are located within range r. As a consequence, the algorithm becomes computationally inefficient when both n and d are large. To address this issue, an approximate solution is presented in this section.
4.1 Clustering with Approximate rNN Graph
As shown above, it is computationally expensive to calculate an exact r-NN graph, particularly in high-dimensional and large-scale cases. Many attempts have been made to seek approximate solutions to this issue. Thanks to the progress made in recent years, with the NN-Descent algorithm presented in [8] it is possible to construct a k-NN graph of high accuracy with an empirical complexity of around O(n^1.14). More attractively, the algorithm places no restriction on the distance measure, which is precisely in line with our clustering algorithm.
In our practice, for large-scale clustering tasks the first step of Alg. 1 (i.e., Line 1) is modified: NN-Descent [8] is called to produce an approximate k-NN graph, and the k-NN list of each sample is then pruned according to the given parameter r, which results in an approximate r-NN graph G. The rest of the clustering process remains unaltered. In the experiment section, results on large-scale image clustering are presented.
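The pruning step is a one-liner given the k-NN graph in the usual form of per-sample lists of (neighbor, distance) pairs sorted by distance; that input format is our assumption here, not something the paper specifies.

```python
def prune_to_rnn(knn_graph, r):
    """Prune an approximate k-NN graph, given as per-sample lists of
    (neighbor, distance) pairs sorted by ascending distance, into an
    approximate r-NN graph by dropping neighbors farther away than r."""
    return [[q for q, d in nn if d <= r] for nn in knn_graph]
```

Because the input lists are distance-sorted, the resulting r-NN lists are also sorted, as the erosion and propagation steps expect.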
4.2 Complexity Analysis
As presented in the previous sections, clustering via boundary erosion consists of three steps. In the first step, an r-NN graph is built. The time complexity of building an exact r-NN graph is O(d·n²), which is feasible for low-dimensional, small-scale cases; for high-dimensional, large-scale tasks, NN-Descent [8] is adopted for the approximate r-NN graph construction, whose empirical complexity is around O(n^1.14) [8]. In the second step, the boundary erosion process operates on a dynamic array D. Each time, at least one sample with the lowest boundary density is removed from the array, and the samples in the array influenced by the removal (on average, the number of samples falling into the neighborhood of range r) are updated. For efficiency, the dynamic array can be implemented with a heap. The removal repeats at most n times, so the time complexity of this step is on the O(n·log n) level. In the label propagation step, it is clear that the complexity is only O(n). Overall, the complexity of the clustering algorithm is O(d·n²) if one expects an exact solution, which is suitable for small-scale tasks; for large-scale cases, the complexity is only around O(n^1.14) with the support of NN-Descent k-NN graph construction, which is even more efficient than conventional k-means.
5 Experiments
In the following, the performance of the proposed boundary erosion (BE) is studied in comparison with several state-of-the-art approaches on various evaluation benchmarks and tasks: synthetic data of different distributions, face image grouping, and clustering on biological as well as large-scale image data. For the large-scale image clustering part, our algorithm is implemented in C++, compiled with GCC 5.4, and run single-threaded on a PC with a 3.6 GHz CPU and 16 GB of memory.
5.1 Clustering on Synthetic Datasets
The first experiment is conducted on six synthetic datasets, all of which are 2D spatial data points drawn from probability distributions with non-spherical shapes. These datasets have been widely adopted to test the robustness of clustering algorithms. Results produced by our algorithm are presented in Fig. 3, along with the valid range of the parameter r for each dataset within which the same results are reproduced. As shown in the figure, the proposed algorithm is able to identify all the clusters as well as the outliers in each case. In particular, satisfactory results are observed on the challenging datasets S3, Path, and Jain, on which existing approaches hardly produce decent results.

Fig. 4 shows the results from the augmented propagation strategy. As shown in the figure, the results for datasets a, d, and f are the same as in the previous experiment, while for datasets b, c, and e, where outliers are present, the augmented propagation assigns the outliers to the clusters closest to them. This is meaningful when one prefers to produce clusters without isolated outliers. The results in Fig. 4 are also shown quantitatively in Tab. 1 in terms of clustering accuracy [44], compared with k-means (KMS), spectral clustering (SC), and order-constrained transitive distance clustering (OCTD). Perfect results are achieved on most of the datasets. Compared with the results from the most representative methods (in [44]), BE achieves the best performance in all cases.
In the following experiments, the results are produced with the r-NN graph without augmentation unless otherwise specified.
Mthd.    Agg      S3       Flame    Spiral   Path     Jain
KMS      87.92    85.58    84.17    33.97    74.34    78.28
SC       99.37    8.10     98.75    59.30    97.00    100.00
OCTD     99.87    U.A.     100.00   100.00   96.66    100.00
BE       100.00   95.80    100.00   100.00   100.00   100.00
5.2 Clustering on Biological Data
Mthd.        score    Para. settings
DIANA        0.991    metric = , k = 26
AGNES        0.987    complete-link, metric = , k = 25
HC           0.987    complete-link, k = 25
TC           0.986    T = 48.868
clusterDP    0.975    k = 25, dc = 258.645
clusterONE   0.946    s = 1, d = 0.0
MC           0.923    I = 2.196
kMedoids     0.912    k = 37
AP           0.910    dampfact = 0.845, preference = 80.827, maxits = 5000, convits = 500
DBSCAN       0.680    eps = 323.306, MinPts = 1
SC           0.656    k = 11
BE           0.998    r = 60
In this part, our algorithm is tested on the Brown dataset [41], a biological dataset. In this dataset, an affinity matrix keeping the pairwise distances between DNA sequences is supplied; the distances are derived from pairwise BLAST comparisons of the sequences of 232 proteins belonging to 29 groups of families. In this case, algorithms such as k-means are not feasible, since they only work in vector spaces. In this study, BE is compared with DIANA [23], AGNES [23], Hierarchical Clustering (HC) [38], Transitivity Clustering (TC) [40], clusterDP [33], clusterONE [30], Markov Clustering (MC) [9], k-Medoids (PAM) [23], Affinity Propagation (AP) [13], DBSCAN [11], and Spectral Clustering (SC) [35].

Twenty-eight clusters are produced by our algorithm with r = 60. The score is 0.998, which is nearly perfect; this is also the best performance ever reported according to [41], as shown in Table 2. Affinity propagation [13] only achieves 0.910, which is considerably worse than our algorithm, and our algorithm also outperforms clusterDP [33] by more than 2%.
5.3 Clustering on Highdimensional Data
Our algorithm is also tested on two face datasets, namely the Olivetti Face Database (ORL) [37] and Extended Yale B (EYaleB) [25], and two visual object image datasets, namely COIL-20 [29] and COIL-100 [28]. These four datasets contain 400, 2,432, 1,440, and 7,400 images from 40, 38, 20, and 100 visual object groups, respectively. On these datasets, clustering algorithms are expected to identify images that are from the same object group. Since the images are not directly separable by their pixel intensities (i.e., RGB), they are projected to a low-dimensional feature space by DSC-Net-L2 [20], and our algorithm (BE) is adopted in the final clustering stage. In the experiments, DSC-Net-L2 in combination with BE (denoted DSC+BE) is compared with sparse subspace clustering (SSC) [10], DSC-Net-L2 in combination with spectral clustering (denoted DSC+SC), and the standard configuration of DSC-Net-L2-based clustering [20], in which a discriminative variant of spectral clustering is integrated. The clustering error rates [20] are shown in Table 3. As shown in the table, DSC+BE outperforms or is very close to the best results ever reported on these datasets.
Datasets  SSC  DSC  DSC+SC  DSC+BE 

ORL  32.50  14.00  15.16  12.23 
EYaleB  27.51  2.67  11.92  4.52 
COIL20  14.86  5.14  9.00  3.82 
COIL100  45.00  30.96  34.99  31.67 
Overall, superior performance is achieved by BE in all experiments and on different categories of data, which is essentially attributed to its extraordinary capability of identifying clusters of arbitrary shapes and the generality of its model.
5.4 Image Clustering in Largescale
In this section, the effectiveness of the proposed clustering algorithm is verified on an image clustering/linking task. A subset of YFCC100M [39] with 1.1 million images in total is adopted for evaluation. The images are represented with deep features from HybridNet [1], reduced to 128 dimensions by PCA. For the clustering, NN-Descent is called to build the approximate r-NN graph for the 1.1 million YFCC images. In the experiments, the top-k is fixed to 5 and r is set to 0.70 for YFCC 1.1 million. The augmented propagation is adopted in the cluster expansion stage, which avoids isolating similar images that are under severe transformations. It takes around 20 minutes for the r-NN graph construction and 1.2 minutes to produce 474,500 groups; in contrast, the same task would take more than 100 hours with k-means. Most of the clusters produced by our algorithm are meaningful. There are 4,268 clusters that contain more than 3 images. Since no ground truth is available, only three sample groups are shown in Fig. 6(e). As shown in the figure, the algorithm performs reasonably well even with the support of only an approximate r-NN graph. According to our observation, the small clusters (of size less than 10) are composed of near-duplicate images, which is highly helpful for large-scale image linking tasks.
6 Conclusion
Boundary erosion is a process that unravels the natural structure of the potential clusters. The erosion starts from the boundaries of the clusters and invades inwards until it reaches all the density peaks. It thereby produces a sequential order following which the clusters are naturally reconstructed. In the whole process, only one parameter is involved, namely the neighborhood radius r. The density peaks, the corresponding clusters, and the cluster boundaries emerge automatically. The effectiveness of the algorithm has been verified on various clustering tasks at different scales. Due to its simplicity, generality, and speed, as well as its superior performance across various datasets, this algorithm will find its value in a variety of science and engineering tasks.
7 Acknowledgments
This work is supported by the National Natural Science Foundation of China under grant 61572408.
References

[1] G. Amato, F. Falchi, C. Gennaro, and F. Rabitti. YFCC100M hybridnet fc6 deep features for content-based image retrieval. In The 2016 ACM Workshop on Multimedia COMMONS, pages 11–18, 2016.
 [2] M. Ankerst, M. M. Breunig, H.-P. Kriegel, and J. Sander. OPTICS: ordering points to identify the clustering structure. In ACM SIGMOD Record, pages 49–60. ACM, 1999.
 [3] N. Bar, H. Averbuch-Elor, and D. Cohen-Or. Border-peeling clustering. CoRR, abs/1612.04869, 2016.
 [4] H. Chang and D.-Y. Yeung. Robust path-based spectral clustering. Pattern Recognition, 41(1):191–203, January 2008.
 [5] H. Chang and D.-Y. Yeung. Robust path-based spectral clustering. Pattern Recognition, 41(1):191–203, 2008.
 [6] Y. Cheng. Mean shift, mode seeking, and clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(8):790–799, August 1995.

[7] K. G. Dizaji, A. Herandi, C. Deng, W. Cai, and H. Huang. Deep clustering via joint convolutional autoencoder embedding and relative entropy minimization. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 5747–5756. IEEE, 2017.
 [8] W. Dong, C. Moses, and K. Li. Efficient k-nearest neighbor graph construction for generic similarity measures. In International Conference on World Wide Web, pages 577–586, March 2011.
 [9] S. Dongen. A cluster algorithm for graphs. Technical report, CWI (Centre for Mathematics and Computer Science), Amsterdam, The Netherlands, 2000.
 [10] E. Elhamifar and R. Vidal. Sparse subspace clustering: Algorithm, theory, and applications. IEEE transactions on pattern analysis and machine intelligence, 35(11):2765–2781, 2013.
 [11] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD), pages 226–231, 1996.
 [12] P. Fränti and O. Virmajoki. Iterative shrinking method for clustering problems. Pattern Recognition, 39(5):761–775, May 2006.
 [13] B. J. Frey and D. Dueck. Clustering by passing messages between data points. Science, 315:972–976, February 2007.
 [14] L. Fu and E. Medico. FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data. BMC Bioinformatics, 8(3), January 2007.
 [15] J. Gan and Y. Tao. Dbscan revisited: Misclaim, unfixability, and approximation. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pages 519–530, 2015.
 [16] A. Gionis, H. Mannila, and P. Tsaparas. Clustering aggregation. ACM Transactions on Knowledge Discovery from Data, 1(1), March 2007.
 [17] R. Greenlaw and S. Kantabutra. Survey of clustering: Algorithms and applications. International Journal of Information Retrieval and Resources, 3(2):1–29, April 2013.
 [18] A. K. Jain and M. H. Law. Data clustering: A user’s dilemma. PReMI, 3776:1–10, 2005.
 [19] H. Jégou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1):117–128, January 2011.
 [20] P. Ji, T. Zhang, H. Li, M. Salzmann, and I. Reid. Deep subspace clustering networks. In Advances in Neural Information Processing Systems, pages 23–32, 2017.

[21] Z. Jiang, Y. Zheng, H. Tan, B. Tang, and H. Zhou. Variational deep embedding: An unsupervised and generative approach to clustering. In International Joint Conference on Artificial Intelligence, 2017.
 [22] G. Karypis, E.-H. S. Han, and V. Kumar. Chameleon: Hierarchical clustering using dynamic modeling. Computer, 32(8):68–75, August 1999.
 [23] L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, 1990.
 [24] H.-P. Kriegel, P. Kröger, J. Sander, and A. Zimek. Density-based clustering. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(3):231–240, 2011.
 [25] K.-C. Lee, J. Ho, and D. J. Kriegman. Acquiring linear subspaces for face recognition under variable lighting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5):684–698, 2005.
 [26] S. P. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28:129–137, March 1982.
 [27] M. Muja and D. G. Lowe. Scalable nearest neighbor algorithms for high dimensional data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36:2227–2240, 2014.
 [28] S. A. Nene, S. K. Nayar, and H. Murase. Columbia object image library (COIL-100). Technical Report CUCS-006-96, 1996.
 [29] S. A. Nene, S. K. Nayar, and H. Murase. Columbia object image library (COIL-20). Technical Report CUCS-005-96, 1996.
 [30] T. Nepusz, H. Yu, and A. Paccanaro. Detecting overlapping protein complexes in protein-protein interaction networks. Nature Methods, 9(5):471–472, 2012.
 [31] C. Otto, D. Wang, and A. Jain. Clustering millions of faces by identity. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
 [32] T. Pei, A. Jasra, D. J. Hand, A.X. Zhu, and C. Zhou. DECODE: a new method for discovering clusters of different densities in spatial data. Data Mining and Knowledge Discovery, 18(3):337–369, 2009.
 [33] A. Rodriguez and A. Laio. Clustering by fast search and find of density peaks. Science, 344(6191):1492–1496, June 2014.
 [34] J. B. Roerdink and A. Meijster. The watershed transform: Definitions, algorithms and parallelization strategies. Fundamenta Informaticae, 41(2):187–228, April 2000.
 [35] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, August 2000.
 [36] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In IEEE International Conference on Computer Vision, pages 1470–1477, October 2003.
 [37] F. S. Samaria and A. C. Harter. Parameterisation of a stochastic model for human face identification. In IEEE Workshop on Applications of Computer Vision, pages 138–142, 1994.
 [38] R Core Team. R: A language and environment for statistical computing. Technical report, R Foundation for Statistical Computing, 2012.
 [39] B. Thomee, D. A. Shamma, G. Friedland, B. Elizalde, K. Ni, D. Poland, D. Borth, and L.-J. Li. YFCC100M: The new data in multimedia research. Communications of the ACM, 59(2):64–73, February 2016.
 [40] T. Wittkop, D. Emig, S. Lange, S. Rahmann, M. Albrecht, J. H. Morris, S. Böcker, J. Stoye, and J. Baumbach. Partitioning biological data with transitivity clustering. Nature Methods, 7(6):419–420, 2010.
 [41] C. Wiwie, J. Baumbach, and R. Röttger. Comparing the performance of biomedical clustering methods. Nature Methods, 12(11):1033–1038, 2015.
 [42] J. Xie, R. Girshick, and A. Farhadi. Unsupervised deep embedding for clustering analysis. In International Conference on Machine Learning, pages 478–487, 2016.
 [43] R. Xu and D. Wunsch II. Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16(3):645–678, May 2005.
 [44] Z. Yu, W. Liu, W. Liu, Y. Yang, M. Li, and B. V. K. V. Kumar. On order-constrained transitive distance clustering. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI'16, pages 2293–2299. AAAI Press, 2016.
 [45] Y. Zhao and G. Karypis. Empirical and theoretical comparisons of selected criterion functions for document clustering. Machine Learning, 55:311–331, 2004.
 [46] L. Zheng, T. Li, and C. Ding. A framework for hierarchical ensemble clustering. ACM Transactions on Knowledge Discovery from Data, 9(2):9:1–9:23, September 2014.
 [47] Z.H. Zhou. Ensemble Methods: Foundations and Algorithms. Chapman & Hall/CRC, 1st edition, 2012.