Toward Multi-Diversified Ensemble Clustering of High-Dimensional Data

10/09/2017
by Dong Huang, et al.

The emergence of high-dimensional data in various areas has brought new challenges to ensemble clustering research. To deal with the curse of dimensionality, considerable effort in ensemble clustering has gone into incorporating various subspace-based techniques. Beyond this emphasis on subspaces, however, rather limited attention has been paid to the potential diversity in similarity/dissimilarity metrics. It remains a surprisingly open problem in ensemble clustering how to create and aggregate a large number of diversified metrics and, furthermore, how to jointly exploit the multi-level diversity in the large number of metrics, subspaces, and clusters in a unified framework. To tackle this problem, this paper proposes a novel multi-diversified ensemble clustering approach. In particular, we create a large number of diversified metrics by randomizing a scaled exponential similarity kernel; these metrics are then coupled with random subspaces to form a large set of metric-subspace pairs. From the similarity matrices derived from these metric-subspace pairs, an ensemble of diversified base clusterings is constructed. Further, an entropy-based criterion is adopted to explore the cluster-wise diversity in ensembles, and the consensus function is built on this criterion. Experimental results on twenty high-dimensional datasets confirm the superiority of our approach over the state-of-the-art.
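The metric-subspace pairing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The kernel follows the commonly used scaled exponential similarity form, exp(-d_ij^2 / (mu * eps_ij)), where eps_ij averages the pairwise distance with each point's mean distance to its k nearest neighbours. The function names, the choice to randomize `mu` and `k`, their sampling ranges, and the `subspace_ratio` parameter are all assumptions made for illustration; the abstract only states that the kernel is randomized and coupled with random subspaces.

```python
import math
import random


def scaled_exp_similarity(points, mu, k):
    """Scaled exponential similarity: exp(-d_ij^2 / (mu * eps_ij))."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Mean distance from each point to its k nearest neighbours (itself excluded).
    knn_mean = [
        sum(sorted(row[:i] + row[i + 1:])[:k]) / k for i, row in enumerate(dist)
    ]
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Local scale: average of both neighbourhood means and the pair distance.
            eps = (knn_mean[i] + knn_mean[j] + dist[i][j]) / 3.0
            value = math.exp(-dist[i][j] ** 2 / (mu * eps)) if eps > 0 else 1.0
            sim[i][j] = sim[j][i] = value
    return sim


def metric_subspace_similarities(data, n_pairs, subspace_ratio=0.5, seed=0):
    """Build one similarity matrix per randomly drawn (metric, subspace) pair."""
    rng = random.Random(seed)
    n_features = len(data[0])
    dim = max(1, int(subspace_ratio * n_features))
    results = []
    for _ in range(n_pairs):
        # Random subspace: a random subset of the feature indices.
        features = sorted(rng.sample(range(n_features), dim))
        # Randomized kernel parameters (ranges are illustrative assumptions).
        mu = rng.uniform(0.2, 0.8)
        k = rng.randint(1, min(5, len(data) - 1))
        projected = [[row[f] for f in features] for row in data]
        results.append((features, mu, k, scaled_exp_similarity(projected, mu, k)))
    return results


# Toy data: two well-separated groups of three points in four dimensions.
data = [
    [0.0, 0.1, 0.0, 0.2], [0.1, 0.0, 0.2, 0.0], [0.2, 0.2, 0.1, 0.1],
    [10.0, 10.1, 10.0, 10.2], [10.1, 10.0, 10.2, 10.0], [10.2, 10.2, 10.1, 10.1],
]
pairs = metric_subspace_similarities(data, n_pairs=5, seed=1)
```

Each returned similarity matrix would then be handed to a base clusterer of choice to produce one member of the ensemble; the clustering step and the entropy-based consensus function are left out of this sketch.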


