k-Means Clustering of Lines for Big Data

03/16/2019
by Yair Marom, et al.

The k-means for lines is a set of k centers (points) that minimizes the sum of squared distances to a given set of n lines in R^d. This is a straightforward generalization of the k-means problem, where the input is a set of n points. Related problems minimize the sum of (non-squared) distances, use other norms or M-estimators, or ignore the t farthest points (outliers) from the k centers. We suggest the first provable PTAS algorithms for these problems that compute a (1+ε)-approximation in time O(n log(n)/ε^2) for any given ε ∈ (0, 1) and constant integers k, d, t ≥ 1, including support for streaming and distributed input. Experimental results on the Amazon EC2 cloud and open-source code are also provided.
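To make the objective concrete, the following is a minimal sketch of the cost function only, not of the paper's approximation algorithm. It assumes each line is stored as a pair (p, u) of a point on the line and a direction vector; the function names and the toy input are hypothetical and chosen for illustration. The optional parameter t illustrates the outlier variant by dropping the t largest distances.

```python
import numpy as np

def point_to_line_sq_dist(c, p, u):
    """Squared Euclidean distance from point c to the line {p + s*u : s real}."""
    u = u / np.linalg.norm(u)          # normalize the direction
    w = c - p
    perp = w - np.dot(w, u) * u        # component of w orthogonal to the line
    return float(np.dot(perp, perp))

def lines_kmeans_cost(centers, lines, t=0):
    """Sum over lines of the squared distance to the nearest center.

    If t > 0, the t lines farthest from the centers are ignored
    (an illustrative take on the outlier variant).
    """
    dists = sorted(
        min(point_to_line_sq_dist(c, p, u) for c in centers)
        for p, u in lines
    )
    return sum(dists[:len(dists) - t])

# Toy input in R^2 (assumed for illustration): three axis-parallel lines.
lines = [
    (np.array([0.0, 0.0]), np.array([1.0, 0.0])),  # the line y = 0
    (np.array([0.0, 2.0]), np.array([1.0, 0.0])),  # the line y = 2
    (np.array([5.0, 0.0]), np.array([0.0, 1.0])),  # the line x = 5
]

cost_one_center = lines_kmeans_cost([np.array([5.0, 1.0])], lines)
cost_two_centers = lines_kmeans_cost(
    [np.array([5.0, 0.0]), np.array([5.0, 2.0])], lines
)
cost_one_outlier = lines_kmeans_cost([np.array([5.0, 1.0])], lines, t=1)
```

With a single center at (5, 1), the two horizontal lines each contribute a squared distance of 1 and the vertical line contributes 0. Placing two centers at the intersections (5, 0) and (5, 2) covers all three lines at zero cost.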


research
03/09/2020

Sets Clustering

The input to the sets-k-means problem is an integer k≥ 1 and a set P={P_...
research
11/26/2020

Faster Projective Clustering Approximation of Big Data

In projective clustering we are given a set of n points in R^d and wish ...
research
11/18/2020

Introduction to Core-sets: an Updated Survey

In optimization or machine learning problems we are given a set of items...
research
07/23/2018

Minimizing Sum of Non-Convex but Piecewise log-Lipschitz Functions using Coresets

We suggest a new optimization technique for minimizing the sum ∑_i=1^n f...
research
03/16/2022

Oversampling is a necessity for RBF-collocation method of lines

We study a radial basis functions least-squares (RBF-LS), a.k.a. kernel-...
research
02/27/2019

Provable Approximations for Constrained ℓ_p Regression

The ℓ_p linear regression problem is to minimize f(x)=||Ax-b||_p over x∈...
research
04/29/2019

Accurate MapReduce Algorithms for k-median and k-means in General Metric Spaces

Center-based clustering is a fundamental primitive for data analysis and...
