Local Search Yields a PTAS for k-Means in Doubling Metrics

03/29/2016
by Zachary Friggstad, et al.

The most well-known and ubiquitous clustering problem, encountered in nearly every branch of science, is undoubtedly k-means: given a set of data points and a parameter k, select k centres and partition the data points into k clusters around these centres so that the sum of squared distances from the points to their cluster centres is minimized. Typically these data points lie in R^d for some d ≥ 2. k-means and the first algorithms for it were introduced in the 1950s. Since then, hundreds of papers have studied this problem and many algorithms have been proposed for it. The most commonly used algorithm is known as Lloyd-Forgy, also referred to as "the" k-means algorithm; it and its various extensions often work very well in practice. However, they may produce solutions whose cost is arbitrarily large compared to the optimum. Kanungo et al. [2004] analyzed a simple local search heuristic to obtain a polynomial-time algorithm with approximation ratio 9+ϵ for any fixed ϵ>0 for k-means in Euclidean space. Finding an algorithm with a better approximation guarantee has remained one of the biggest open questions in this area, in particular whether one can get a true PTAS in fixed-dimension Euclidean space. We settle this problem by showing that a simple local search algorithm provides a PTAS for k-means in R^d for any fixed d. More precisely, for any error parameter ϵ>0, the local search algorithm that considers swaps of up to ρ = d^{O(d)}·ϵ^{-O(d/ϵ)} centres at a time finds a solution using exactly k centres whose cost is at most (1+ϵ) times the optimum. Finally, we provide the first demonstration that local search yields a PTAS for the uncapacitated facility location problem and for k-median with non-uniform opening costs in doubling metrics.
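To make the swap-based local search described above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: candidate centres are restricted to the input points, the starting solution is simply the first k points, and any strictly improving swap is accepted. The abstract does not specify the candidate set, the initialisation, or the improvement threshold, and the function names `kmeans_cost` and `local_search_kmeans` are hypothetical. The parameter `rho` plays the role of ρ, the maximum number of centres exchanged in one step.

```python
from itertools import combinations


def kmeans_cost(points, centres):
    """Sum over all points of the squared distance to the nearest centre."""
    return sum(
        min(sum((p - c) ** 2 for p, c in zip(pt, ctr)) for ctr in centres)
        for pt in points
    )


def local_search_kmeans(points, k, rho=1):
    """Multi-swap local search for k-means.

    Assumptions (not taken from the paper): centres are chosen among the
    input points, the initial solution is the first k points, and a swap
    is accepted whenever it strictly lowers the cost.
    """
    n = len(points)
    current = set(range(k))
    cost = kmeans_cost(points, [points[i] for i in current])

    improved = True
    while improved:
        improved = False
        outside = [i for i in range(n) if i not in current]
        # Try exchanging `size` centres at once, for size = 1 .. rho.
        for size in range(1, rho + 1):
            for removed in combinations(sorted(current), size):
                for added in combinations(outside, size):
                    candidate = (current - set(removed)) | set(added)
                    new_cost = kmeans_cost(
                        points, [points[i] for i in candidate]
                    )
                    if new_cost < cost:
                        current, cost, improved = candidate, new_cost, True
                        break
                if improved:
                    break
            if improved:
                break

    return [points[i] for i in sorted(current)], cost


if __name__ == "__main__":
    # Three well-separated pairs of points; k = 3.
    pts = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (20, 1)]
    centres, cost = local_search_kmeans(pts, k=3, rho=1)
    print(centres, cost)
```

On this toy input the search ends with one centre in each pair. Every swap keeps exactly k centres, matching the "exactly k centres" guarantee in the abstract. The sketch enumerates all swaps naively, so it is only practical for very small inputs and small `rho`.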

research
11/08/2021

An Improved Local Search Algorithm for k-Median

We present a new local-search algorithm for the k-median clustering prob...
research
08/24/2017

A Fast Approximation Scheme for Low-Dimensional k-Means

We consider the popular k-means problem in d-dimensional Euclidean space...
research
02/27/2019

Reconciliation k-median: Clustering with Non-Polarized Representatives

We propose a new variant of the k-median problem, where the objective fu...
research
07/14/2018

Exact Algorithms and Lower Bounds for Stable Instances of Euclidean k-Means

We investigate the complexity of solving stable or perturbation-resilien...
research
04/25/2018

HG-means: A scalable hybrid genetic algorithm for minimum sum-of-squares clustering

Minimum sum-of-squares clustering (MSSC) is a widely used clustering mod...
research
03/27/2020

Crystal Structure Prediction via Oblivious Local Search

We study Crystal Structure Prediction, one of the major problems in comp...
research
02/18/2020

k-means++: few more steps yield constant approximation

The k-means++ algorithm of Arthur and Vassilvitskii (SODA 2007) is a sta...