Introduction to Coresets: Approximated Mean

11/04/2021
by Alaa Maalouf, et al.

A strong coreset for the mean queries of a set P in ℝ^d is a small weighted subset C ⊆ P that provably approximates the sum of squared distances from P to any center (point) x ∈ ℝ^d. A weak coreset is also a small weighted subset C of P, whose mean approximates the mean of P. While the mean of P can be easily computed in linear time, its coreset can be used to solve harder constrained versions of the problem, and is at the heart of generalizations such as coresets for k-means clustering. In this paper, we survey most of the mean coreset construction techniques, and suggest a unified analysis methodology for providing and explaining classical and modern results, including step-by-step proofs. In particular, we collect folklore and scattered related results, some of which are not formally stated elsewhere. Throughout this survey, we present, explain, and prove a set of techniques, reductions, and algorithms that are widespread and crucial in this field. When applied to the (relatively simple) mean problem, such techniques are much simpler to grasp. The survey may help guide new researchers unfamiliar with the field, and introduce them to the basic foundations of coresets through a simple, yet fundamental, problem. Experts in this area might appreciate the unified analysis flow and the comparison table for existing results. Finally, to encourage and help practitioners and software engineers, we provide full open source code for all presented algorithms.
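
To make the two definitions above concrete, here is a minimal Python sketch. It is not one of the constructions surveyed in the paper: it assumes plain uniform sampling with weights n/m, which carries no guarantee by itself, and the function names `weighted_cost` and `uniform_sample_coreset` are hypothetical, chosen only for illustration. It measures the two quantities the definitions refer to: the distance between the weighted mean of C and the mean of P (the weak-coreset criterion), and the ratio between the weighted sum of squared distances of C and that of P for one query center x (the strong-coreset criterion, checked here for a single x rather than for all x).

```python
import numpy as np


def weighted_cost(points, weights, x):
    """Weighted sum of squared Euclidean distances from `points` to center `x`."""
    diffs = points - x
    return float(np.sum(weights * np.sum(diffs * diffs, axis=1)))


def uniform_sample_coreset(P, m, rng):
    """Illustrative only: pick m points of P uniformly without replacement
    and give each the weight n/m, so the weights sum to n."""
    n = P.shape[0]
    idx = rng.choice(n, size=m, replace=False)
    return P[idx], np.full(m, n / m)


rng = np.random.default_rng(0)
P = rng.normal(size=(10_000, 3))          # synthetic input set in R^3
C, w = uniform_sample_coreset(P, 500, rng)

# Weak-coreset criterion: the weighted mean of C should be close to the mean of P.
mean_P = P.mean(axis=0)
mean_C = np.average(C, axis=0, weights=w)
weak_error = float(np.linalg.norm(mean_P - mean_C))

# Strong-coreset criterion, evaluated at one arbitrary query center x.
x = rng.normal(size=3)
cost_P = weighted_cost(P, np.ones(len(P)), x)
cost_C = weighted_cost(C, w, x)
strong_ratio = cost_C / cost_P            # close to 1 means a good approximation at x

print(f"weak error: {weak_error:.4f}, cost ratio at x: {strong_ratio:.4f}")
```

A strong coreset must keep this ratio within 1 ± ε for every center x simultaneously, which is a stricter requirement than matching the mean alone.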

research
03/09/2020

Sets Clustering

The input to the sets-k-means problem is an integer k≥ 1 and a set P={P_...
research
10/19/2019

Introduction to Coresets: Accurate Coresets

A coreset (or core-set) of an input set is its small summation, such tha...
research
06/11/2019

Fast and Accurate Least-Mean-Squares Solvers

Least-mean squares (LMS) solvers such as Linear / Ridge / Lasso-Regressi...
research
06/08/2020

An Algorithmic Introduction to Clustering

This paper tries to present a more unified view of clustering, by identi...
research
05/19/2023

AutoCoreset: An Automatic Practical Coreset Construction Framework

A coreset is a tiny weighted subset of an input set, that closely resemb...
research
11/30/2015

Coresets for Kinematic Data: From Theorems to Real-Time Systems

A coreset (or core-set) of a dataset is its semantic compression with re...
research
03/06/2022

Coresets for Data Discretization and Sine Wave Fitting

In the monitoring problem, the input is an unbounded stream P=p_1,p_2⋯ o...
