Sets Clustering

03/09/2020
by Ibrahim Jubran, et al.

The input to the sets-k-means problem is an integer k ≥ 1 and a set 𝒫 = {P_1, ..., P_n} of sets in R^d. The goal is to compute a set C of k centers (points) in R^d that minimizes the sum ∑_{P∈𝒫} min_{p∈P, c∈C} ‖p−c‖^2 of squared distances to these sets. An ε-core-set for this problem is a weighted subset of 𝒫 that approximates this sum up to a 1±ε factor, for every set C of k centers in R^d. We prove that such a core-set of O(log^2 n) sets always exists, and can be computed in O(n log n) time, for every input 𝒫 and every fixed d, k ≥ 1 and ε ∈ (0,1). The result generalizes easily to any metric space, distances to the power of z > 0, and M-estimators that handle outliers. Applying an inefficient but optimal algorithm to this core-set allows us to obtain the first PTAS ((1+ε)-approximation) for the sets-k-means problem that takes time near linear in n. This is the first such result even for sets-mean on the plane (k=1, d=2). Open source code and experimental results for document classification and facility location are also provided.
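
To make the objective concrete, the following is a minimal sketch of how the sets-k-means cost could be evaluated for a given set of centers, with optional per-set weights as a core-set would carry. It is an illustration only and not the authors' released code; the names `sq_dist` and `sets_kmeans_cost` and the toy data are assumptions introduced here.

```python
from typing import Optional, Sequence

Point = Sequence[float]


def sq_dist(p: Point, c: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((pi - ci) ** 2 for pi, ci in zip(p, c))


def sets_kmeans_cost(
    sets: Sequence[Sequence[Point]],
    centers: Sequence[Point],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum, over the input sets, of the smallest squared distance
    between any point of the set and any center (hypothetical helper)."""
    if weights is None:
        # Unweighted input: every set counts once.
        weights = [1.0] * len(sets)
    return sum(
        w * min(sq_dist(p, c) for p in s for c in centers)
        for w, s in zip(weights, sets)
    )


# Toy input: n = 3 sets in the plane, k = 2 centers.
sets = [[(0, 0), (10, 10)], [(1, 0)], [(5, 5), (4, 4)]]
centers = [(0, 0), (5, 5)]
cost = sets_kmeans_cost(sets, centers)
weighted_cost = sets_kmeans_cost(sets, centers, weights=[2.0, 3.0, 1.0])
```

Each set contributes only through its closest point-center pair, so the first and third sets above cost nothing and the second contributes a squared distance of 1. An ε-core-set would be a weighted subset of the sets for which the weighted call stays within a 1±ε factor of the unweighted cost on the full input, for every choice of `centers`.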

research
03/16/2019

k-Means Clustering of Lines for Big Data

The k-means for lines is a set of k centers (points) that minimizes the ...
research
03/06/2022

Coresets for Data Discretization and Sine Wave Fitting

In the monitoring problem, the input is an unbounded stream P=p_1,p_2⋯ o...
research
11/04/2021

Introduction to Coresets: Approximated Mean

A strong coreset for the mean queries of a set P in ℝ^d is a small weigh...
research
11/26/2020

Faster Projective Clustering Approximation of Big Data

In projective clustering we are given a set of n points in R^d and wish ...
research
11/18/2020

Introduction to Core-sets: an Updated Survey

In optimization or machine learning problems we are given a set of items...
research
02/21/2018

Coresets For Monotonic Functions with Applications to Deep Learning

Coreset (or core-set) in this paper is a small weighted subset Q of the ...
research
07/23/2018

Minimizing Sum of Non-Convex but Piecewise log-Lipschitz Functions using Coresets

We suggest a new optimization technique for minimizing the sum ∑_i=1^n f...
