Improved Coresets for Euclidean k-Means

11/15/2022
by Vincent Cohen-Addad, et al.

Given a set of n points in d dimensions, the Euclidean k-means problem (resp. the Euclidean k-median problem) consists of finding k centers such that the sum of squared distances (resp. sum of distances) from every point to its closest center is minimized. Arguably the most popular way of dealing with this problem in the big data setting is to first compress the data by computing a weighted subset known as a coreset and then run any algorithm on this subset. The guarantee of the coreset is that, for any candidate solution, the cost on the coreset is within a (1±ε) factor of the cost on the original instance. The current state-of-the-art coreset size is Õ(min(k^2·ε^{-2}, k·ε^{-4})) for Euclidean k-means and Õ(min(k^2·ε^{-2}, k·ε^{-3})) for Euclidean k-median. The best known lower bound for both problems is Ω(k·ε^{-2}). In this paper, we improve the upper bounds to Õ(min(k^{3/2}·ε^{-2}, k·ε^{-4})) for k-means and Õ(min(k^{4/3}·ε^{-2}, k·ε^{-3})) for k-median. In particular, ours is the first provable bound that breaks through the k^2 barrier while retaining an optimal dependency on ε.
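To make the cost functions and the coreset guarantee from the abstract concrete, here is a minimal Python sketch. The function names (`cost`, `uniform_sample`, `is_within`) are hypothetical and chosen for illustration only. The uniform sampler is a naive stand-in, not the construction analyzed in the paper, and it carries no worst-case guarantee. Note also that a true coreset must satisfy the (1±ε) bound for every candidate set of k centers, whereas the check below tests a single one.

```python
import math
import random


def cost(points, centers, weights=None, z=2):
    """Weighted clustering cost: z=2 gives k-means, z=1 gives k-median."""
    if weights is None:
        weights = [1.0] * len(points)
    total = 0.0
    for p, w in zip(points, weights):
        # Distance from p to its closest center.
        nearest = min(math.dist(p, c) for c in centers)
        total += w * nearest ** z
    return total


def uniform_sample(points, m, seed=0):
    """Naive stand-in for a coreset (an assumption, not the paper's method):
    m points sampled uniformly without replacement, each weighted n/m."""
    rng = random.Random(seed)
    sample = rng.sample(points, m)
    return sample, [len(points) / m] * m


def is_within(points, subset, weights, centers, eps, z=2):
    """Check the (1 ± eps) guarantee for ONE candidate solution."""
    full = cost(points, centers, z=z)
    approx = cost(subset, centers, weights, z=z)
    return abs(approx - full) <= eps * full


if __name__ == "__main__":
    rng = random.Random(1)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]
    centers = [(-1.0, 0.0), (1.0, 0.0)]
    subset, weights = uniform_sample(data, 100)
    print(cost(data, centers), cost(subset, centers, weights))
    print(is_within(data, subset, weights, centers, eps=0.2))
```

The weights n/m keep the sampled cost an unbiased estimate of the full cost for a fixed set of centers. The bounds quoted in the abstract concern how small a weighted subset can be while the guarantee holds simultaneously for all candidate solutions.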


