Tk-merge: Computationally Efficient Robust Clustering Under General Assumptions

01/17/2022
by   Luca Insolia, et al.

We address general-shaped clustering problems under very weak parametric assumptions using a two-step hybrid robust clustering algorithm that combines trimmed k-means with hierarchical agglomeration. The algorithm has low computational complexity and effectively identifies clusters even in the presence of data contamination. We also present natural generalizations of the approach, as well as an adaptive, data-driven procedure for estimating the amount of contamination. In our numerical simulations and real-world applications, our proposal outperforms state-of-the-art robust, model-based methods; the applications cover color quantization for image analysis, human mobility patterns based on GPS data, biomedical images of diabetic retinopathy, and functional data across weather stations.
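
To make the two-step idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a plain Lloyd-style trimmed k-means that discards the fraction alpha of points farthest from their nearest center, followed by single-linkage agglomeration of the resulting groups based on the smallest distance between their retained points. The function names (trimmed_kmeans, merge_clusters, tk_merge), the parameters (alpha, n_groups), and the choice of single linkage are all illustrative assumptions that do not come from the abstract.

```python
import numpy as np


def trimmed_kmeans(X, k, alpha, n_iter=50, seed=0):
    """Lloyd-style trimmed k-means (illustrative sketch).

    Returns the centers and labels, where label -1 marks trimmed points.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_keep = n - int(np.floor(alpha * n))  # number of points retained
    centers = X[rng.choice(n, size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        keep = np.argsort(d2.min(axis=1))[:n_keep]  # closest points only
        new_centers = centers.copy()
        for j in range(k):
            members = keep[labels[keep] == j]
            if members.size > 0:
                new_centers[j] = X[members].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment: trim the farthest points given the final centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    trimmed = np.ones(n, dtype=bool)
    trimmed[np.argsort(d2.min(axis=1))[:n_keep]] = False
    labels[trimmed] = -1
    return centers, labels


def merge_clusters(X, labels, k, n_groups):
    """Single-linkage agglomeration of the k groups (assumed linkage)."""
    D = np.full((k, k), np.inf)
    for a in range(k):
        for b in range(a + 1, k):
            A, B = X[labels == a], X[labels == b]
            if len(A) and len(B):
                diff = A[:, None, :] - B[None, :, :]
                D[a, b] = D[b, a] = np.sqrt((diff ** 2).sum(axis=2)).min()
    group = np.arange(k)
    # Only non-empty groups take part in the agglomeration.
    active = [a for a in range(k) if np.any(labels == a)]
    while len(active) > n_groups:
        sub = D[np.ix_(active, active)]
        i, j = np.unravel_index(np.argmin(sub), sub.shape)
        a, b = active[i], active[j]
        # Single linkage: the merged group keeps the smaller distance.
        D[a, :] = np.minimum(D[a, :], D[b, :])
        D[:, a] = D[a, :]
        D[a, a] = np.inf
        D[b, :] = np.inf
        D[:, b] = np.inf
        group[group == b] = a
        active.remove(b)
    final = np.full(labels.shape, -1)  # trimmed points stay at -1
    for new_id, a in enumerate(active):
        final[np.isin(labels, np.flatnonzero(group == a))] = new_id
    return final


def tk_merge(X, k, alpha, n_groups, seed=0):
    """Two-step sketch: trimmed k-means, then hierarchical merging."""
    _, labels = trimmed_kmeans(X, k, alpha, seed=seed)
    return merge_clusters(X, labels, k, n_groups)
```

In this sketch k is deliberately larger than n_groups, so the first step over-segments the data and the second step recombines the pieces into general-shaped clusters. Like any Lloyd-style procedure, it can stop at a local optimum, so in practice several random starts would likely be needed.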


