Introduction to Core-sets: an Updated Survey

11/18/2020
by Dan Feldman, et al.

In optimization or machine learning problems we are given a set of items, usually points in some metric space, and the goal is to minimize or maximize an objective function over some space of candidate solutions. For example, in clustering problems, the input is a set of points in some metric space, and a common goal is to compute a set of centers in some other space (points, lines) that minimizes the sum of distances to these points. In database queries, we may need to compute such a sum for a specific query set of k centers. However, traditional algorithms cannot handle modern systems that require parallel real-time computations over infinite distributed streams from sensors such as GPS, audio or video that arrive at a cloud, or over networks of weaker devices such as smartphones or robots. A core-set is a "small data" summarization of the input "big data", where every possible query has approximately the same answer on both data sets. Generic techniques enable efficient coreset maintenance for streaming, distributed and dynamic data. Traditional algorithms can then be applied to these coresets to maintain approximately optimal solutions. The challenge is to design coresets with a provable tradeoff between their size and approximation error. This survey summarizes such constructions retrospectively, aiming to unify and simplify the state of the art.
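As a small illustration of "every query has the same answer on both data sets", consider the special case of a single center and squared Euclidean distances. There, three statistics of the input (the number of points, their vector sum, and the sum of their squared norms) answer every query exactly, and two such summaries merge by addition, which is what makes them convenient for streaming or distributed data. The sketch below assumes this special case only; it is not presented as the survey's construction, and the names `MeanSummary`, `summarize`, and `exact_cost` are hypothetical. For k > 1 centers or non-squared distances, a summary generally gives only approximate answers.

```python
from dataclasses import dataclass
from typing import List, Sequence

Point = Sequence[float]


@dataclass
class MeanSummary:
    """Hypothetical constant-size summary for single-center, squared-distance queries."""
    n: int                 # number of summarized points
    vec_sum: List[float]   # coordinate-wise sum of the points
    sq_norm_sum: float     # sum of squared Euclidean norms of the points

    def cost(self, q: Point) -> float:
        # Uses the identity
        #   sum_p ||p - q||^2 = sum_p ||p||^2 - 2 * q . (sum_p p) + n * ||q||^2
        dot = sum(qi * si for qi, si in zip(q, self.vec_sum))
        q_sq = sum(qi * qi for qi in q)
        return self.sq_norm_sum - 2.0 * dot + self.n * q_sq

    def merge(self, other: "MeanSummary") -> "MeanSummary":
        # Summaries of disjoint chunks combine by adding their statistics.
        return MeanSummary(
            n=self.n + other.n,
            vec_sum=[a + b for a, b in zip(self.vec_sum, other.vec_sum)],
            sq_norm_sum=self.sq_norm_sum + other.sq_norm_sum,
        )


def summarize(points: Sequence[Point]) -> MeanSummary:
    """Build the summary in one pass; assumes a non-empty list of equal-length points."""
    dim = len(points[0])
    vec_sum = [0.0] * dim
    sq_norm_sum = 0.0
    for p in points:
        for i, x in enumerate(p):
            vec_sum[i] += x
            sq_norm_sum += x * x
    return MeanSummary(n=len(points), vec_sum=vec_sum, sq_norm_sum=sq_norm_sum)


def exact_cost(points: Sequence[Point], q: Point) -> float:
    """Reference answer computed on the full data: sum of squared distances to q."""
    return sum(sum((x - c) ** 2 for x, c in zip(p, q)) for p in points)


if __name__ == "__main__":
    data = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (4.0, 4.0)]
    query = (1.0, 1.0)
    # Summaries of two chunks, merged, answer the query like the full data does.
    merged = summarize(data[:2]).merge(summarize(data[2:]))
    print(exact_cost(data, query), merged.cost(query))
```

The summary has size independent of the number of input points, at the price of supporting only this one family of queries; the tradeoff between summary size and approximation error mentioned above arises once the query family is richer.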


