Faster Projective Clustering Approximation of Big Data

11/26/2020
by Adiel Statman, et al.

In projective clustering we are given a set of n points in R^d and wish to cluster them to a set S of k linear subspaces in R^d according to some given distance function. An ε-coreset for this problem is a weighted (scaled) subset of the input points such that, for every such possible S, the sum of these distances is approximated up to a factor of (1+ε). We propose to reduce the size of existing coresets by giving the first O(log(m)) approximation for the case of m lines clustering in O(ndm) time, compared to the existing (m) solution. We then project the points onto these lines and prove that, for a sufficiently large m, we obtain a coreset for projective clustering. Our algorithm also generalizes to handle outliers. Experimental results and open code are also provided.
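
To make the objective in the abstract concrete, the following is a minimal sketch of the cost being approximated, restricted to lines through the origin. It is not the authors' algorithm or their released code. It assumes the Euclidean distance (the abstract only says "some given distance function") and reads the (1+ε) guarantee as a relative error of at most ε. The names `dist_to_line`, `clustering_cost`, and `is_eps_approx` are hypothetical and chosen for illustration.

```python
import math


def dist_to_line(p, u):
    """Euclidean distance from point p to the line through the origin spanned by u.

    Assumption: Euclidean distance; the paper allows other distance functions.
    """
    norm_u = math.sqrt(sum(x * x for x in u))
    # Signed length of the projection of p onto the direction of u.
    proj = sum(a * b for a, b in zip(p, u)) / norm_u
    # Pythagoras: squared distance = ||p||^2 - proj^2 (clamped against rounding).
    sq = sum(a * a for a in p) - proj * proj
    return math.sqrt(max(sq, 0.0))


def clustering_cost(points, lines, weights=None):
    """Weighted sum, over all points, of the distance to the nearest line."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(
        w * min(dist_to_line(p, u) for u in lines)
        for p, w in zip(points, weights)
    )


def is_eps_approx(full_cost, coreset_cost, eps):
    """Check the coreset guarantee for one candidate set of lines.

    Assumption: the guarantee is read as |full - coreset| <= eps * full.
    """
    return abs(full_cost - coreset_cost) <= eps * full_cost
```

For example, with the points (1,0), (2,0), (0,3), (1,1) and the two coordinate axes as lines, only (1,1) lies off both lines, at distance 1 from each. A weighted subset is an ε-coreset only if `is_eps_approx` holds for every candidate set of lines, not just for the one tested.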


