ε-Coresets for Clustering (with Outliers) in Doubling Metrics

04/07/2018
by   Lingxiao Huang, et al.

We study the problem of constructing ε-coresets for the (k, z)-clustering problem in a doubling metric M(X, d). An ε-coreset is a weighted subset S ⊆ X with weight function w : S → R_{≥0} such that, for any k-subset C ∈ [X]^k, it holds that ∑_{x ∈ S} w(x) · d^z(x, C) ∈ (1 ± ε) · ∑_{x ∈ X} d^z(x, C). We present an efficient algorithm that constructs an ε-coreset for the (k, z)-clustering problem in M(X, d), where the size of the coreset depends only on the parameters k, z, ε and the doubling dimension ddim(M). To the best of our knowledge, this is the first efficient ε-coreset construction of size independent of |X| for general clustering problems in doubling metrics. To this end, we establish the first relation between the doubling dimension of M(X, d) and the shattering dimension (or VC-dimension) of the range space induced by the distance d. Such a relation was not known before, since one can easily construct instances in which neither one can be bounded by (some function of) the other. Surprisingly, we show that if we allow a small (1 ± ε)-distortion of the distance function d, and consider the notion of τ-error probabilistic shattering dimension, we can prove an upper bound of O(ddim(M) · (1/ε) + 1/τ) for the probabilistic shattering dimension, even for weighted doubling metrics. We believe this new relation is of independent interest and may find other applications. We also study robust coresets and centroid sets in doubling metrics. Our robust coreset construction leads to new results in clustering and property testing, and the centroid sets can be used to accelerate local search algorithms for clustering problems.
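The ε-coreset condition stated in the abstract can be checked directly on tiny inputs. The following is a minimal sketch of that check only, not the paper's construction algorithm: it enumerates every k-subset C of X by brute force and compares the weighted cost on S with the full cost on X. The function names (`cost`, `is_eps_coreset`), the floating-point tolerance `tol`, and the example metric are assumptions introduced here for illustration.

```python
from itertools import combinations


def cost(points, weights, centers, d, z):
    """Weighted (k, z)-clustering cost: sum of w(x) * d(x, C)^z,
    where d(x, C) is the distance from x to its nearest center in C."""
    return sum(
        w * min(d(x, c) for c in centers) ** z
        for x, w in zip(points, weights)
    )


def is_eps_coreset(X, S, w, k, z, eps, d, tol=1e-12):
    """Brute-force check of the epsilon-coreset condition.

    Enumerates every k-subset C of X, so this is only feasible for very
    small inputs. `tol` is a hypothetical floating-point slack, not part
    of the definition.
    """
    unit = [1.0] * len(X)
    for C in combinations(X, k):
        full = cost(X, unit, C, d, z)    # sum over x in X of d^z(x, C)
        approx = cost(S, w, C, d, z)     # sum over x in S of w(x) * d^z(x, C)
        if approx < (1 - eps) * full - tol or approx > (1 + eps) * full + tol:
            return False
    return True


# Example on the real line with d(x, y) = |x - y| (an assumed toy metric).
X = [0.0, 0.0, 1.0, 1.0]
S = [0.0, 1.0]
d = lambda x, y: abs(x - y)

exact = is_eps_coreset(X, S, [2.0, 2.0], k=1, z=2, eps=0.0, d=d)
underweighted = is_eps_coreset(X, S, [1.0, 1.0], k=1, z=2, eps=0.1, d=d)
```

In this toy instance each point of S stands in for two identical points of X, so weights of 2 reproduce the cost exactly, while weights of 1 halve it and violate the (1 ± ε) bound. The check is exponential in k and is meant only to make the definition concrete.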


