Determinantal consensus clustering

02/07/2021
by Serge Vicente, et al.

Randomly restarting a given algorithm produces many partitions, which are then combined to yield a consensus clustering. Ensemble methods such as consensus clustering have been recognized as more robust approaches to data clustering than single clustering algorithms. We propose the use of determinantal point processes (DPPs) for the random restart of clustering algorithms that are based on initial sets of center points, such as k-medoids or k-means. The relation between DPPs and kernel-based methods makes DPPs suitable for describing and quantifying similarity between objects. DPPs favor diversity of the center points within subsets, so subsets with more similar points are less likely to be generated than subsets with very distinct points. Currently, the most popular sampling technique is to sample center points uniformly at random. We show through extensive simulations that, unlike DPP sampling, this technique fails both to ensure diversity and to obtain good coverage of all data facets. These two properties are key to DPPs achieving good performance with small ensembles. Simulations with artificial datasets and applications to real datasets show that determinantal consensus clustering outperforms classical algorithms such as k-medoids and k-means consensus clustering, which are based on uniform random sampling of center points.
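To make the diversity property concrete, the sketch below illustrates how a DPP weights a candidate set of center points by the determinant of the corresponding kernel submatrix, so that a set of near-duplicate points receives a much smaller weight than a set of well-separated points. This is an illustration under stated assumptions, not the procedure used in the paper: the Gaussian kernel, the `bandwidth` parameter, the fixed subset size k, the brute-force enumeration of all subsets (feasible only for tiny datasets), and all function names are hypothetical choices made here for clarity.

```python
import itertools
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Similarity of two points: 1 for identical points, near 0 for distant ones."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def subset_weight(points, subset, bandwidth=1.0):
    """Unnormalized DPP weight: determinant of the kernel submatrix of `subset`."""
    sub = [[gaussian_kernel(points[i], points[j], bandwidth) for j in subset]
           for i in subset]
    return determinant(sub)


def sample_centers_dpp(points, k, rng, bandwidth=1.0):
    """Draw k center indices with probability proportional to the subset weight.

    Assumption: exact sampling by enumerating every size-k subset, which is
    only practical for very small datasets.
    """
    subsets = list(itertools.combinations(range(len(points)), k))
    weights = [max(subset_weight(points, s, bandwidth), 0.0) for s in subsets]
    return rng.choices(subsets, weights=weights, k=1)[0]


# Toy data (hypothetical): two tight groups of points on a line.
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]

close_pair = subset_weight(points, (0, 1))  # two nearly identical centers
far_pair = subset_weight(points, (0, 3))    # one center from each group

rng = random.Random(0)
centers = sample_centers_dpp(points, 2, rng)
```

In this toy setting, `far_pair` is far larger than `close_pair`, so determinant-weighted draws pick one center from each group far more often than uniform sampling of pairs would. Each such draw could then seed one run of k-medoids or k-means, with the resulting partitions combined into a consensus.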
