A New Coreset Framework for Clustering

04/13/2021
by Vincent Cohen-Addad, et al.

Given a metric space, the (k,z)-clustering problem consists of finding k centers such that the sum of the distances, each raised to the power z, from every point to its closest center is minimized. This encapsulates the famous k-median (z=1) and k-means (z=2) clustering problems. Designing small-space sketches of the data that approximately preserve the cost of the solutions, also known as coresets, has been an important research direction over the last 15 years. In this paper, we present a new, simple coreset framework that simultaneously improves upon the best known bounds for a large variety of settings, ranging from Euclidean spaces and doubling metrics to minor-free metrics and general metrics.
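To make the objective concrete, here is a minimal sketch of the (k,z)-clustering cost as the abstract defines it, assuming points in the Euclidean plane. The function name `clustering_cost`, the toy data, and the weighted subset are hypothetical illustrations; the subset is hand-picked and is not the construction from the paper.

```python
from math import dist  # Euclidean distance, Python 3.8+


def clustering_cost(points, centers, z, weights=None):
    """Sum over points of weight * (distance to closest center) ** z."""
    if weights is None:
        weights = [1.0] * len(points)  # unweighted input: every point counts once
    return sum(
        w * min(dist(p, c) for c in centers) ** z
        for p, w in zip(points, weights)
    )


# Toy input (assumed for illustration): four points, k = 2 candidate centers.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
centers = [(0.0, 1.0), (10.0, 2.0)]

k_median_cost = clustering_cost(points, centers, z=1)  # z = 1: k-median
k_means_cost = clustering_cost(points, centers, z=2)   # z = 2: k-means

# A hand-picked weighted subset standing in for a sketch of the data.
subset = [(0.0, 0.0), (10.0, 4.0)]
subset_weights = [2.0, 2.0]
subset_median_cost = clustering_cost(subset, centers, z=1, weights=subset_weights)
subset_means_cost = clustering_cost(subset, centers, z=2, weights=subset_weights)
```

On this toy input the weighted subset reproduces the full cost for the one candidate solution checked. A coreset has to approximately preserve the cost for every candidate set of k centers, which this sketch does not verify.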

research
02/25/2022

Towards Optimal Lower Bounds for k-median and k-means Coresets

Given a set of points in a metric space, the (k,z)-clustering problem co...
research
04/29/2019

Accurate MapReduce Algorithms for k-median and k-means in General Metric Spaces

Center-based clustering is a fundamental primitive for data analysis and...
research
03/02/2023

Coresets for Clustering in Geometric Intersection Graphs

Designing coresets–small-space sketches of the data preserving cost of t...
research
11/15/2022

Improved Coresets for Euclidean k-Means

Given a set of n points in d dimensions, the Euclidean k-means problem (...
research
02/16/2022

Distributed k-Means with Outliers in General Metrics

Center-based clustering is a pivotal primitive for unsupervised learning...
research
06/03/2013

Distributed k-Means and k-Median Clustering on General Topologies

This paper provides new algorithms for distributed clustering for two po...
research
10/27/2021

Uniform Concentration Bounds toward a Unified Framework for Robust Clustering

Recent advances in center-based clustering continue to improve upon the ...
