Discrepancy, Coresets, and Sketches in Machine Learning

06/11/2019
by   Zohar Karnin, et al.

This paper defines the notion of class discrepancy for families of functions and shows that low-discrepancy classes admit small offline and streaming coresets. We provide general techniques for bounding the class discrepancy of machine learning problems. As corollaries of the general technique, we bound the discrepancy (and therefore the coreset complexity) of logistic regression, sigmoid activation loss, matrix covariance, kernel density, and any analytic function of the dot product or the squared distance. Our results prove the existence of ε-approximation coresets of size O(√d/ε) for the above problems. This resolves the long-standing open problem regarding the coreset complexity of Gaussian kernel density estimation. We provide two more related but independent results. The first is an exponential improvement of the widely used merge-and-reduce trick, which gives improved streaming sketches for any low-discrepancy problem. The second is an extremely simple deterministic algorithm for finding low-discrepancy sequences (and therefore coresets) for any positive semi-definite kernel. This paper establishes some explicit connections between class discrepancy, coreset complexity, learnability, and streaming algorithms.
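
The abstract does not spell out the deterministic kernel algorithm, so the following is only a minimal sketch of how a low-discrepancy signing can yield a coreset, under stated assumptions. It uses one natural greedy rule: each point receives the sign opposite to its kernel correlation with the points already signed. Whether this matches the paper's algorithm is an assumption; the function names (gaussian_kernel, greedy_signs, kernel_discrepancy_sq, halve) and the unit-bandwidth Gaussian kernel are hypothetical choices made for illustration.

```python
import math
import random


def gaussian_kernel(x, y):
    # Unit-bandwidth Gaussian kernel (an illustrative choice); K(x, x) = 1.
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)))


def greedy_signs(points, kernel):
    # Assumed greedy rule: give point i the sign that opposes its kernel
    # correlation with the already-signed prefix of the sequence.
    signs = []
    for x in points:
        corr = sum(s * kernel(x, points[j]) for j, s in enumerate(signs))
        signs.append(-1 if corr > 0 else 1)
    return signs


def kernel_discrepancy_sq(points, signs, kernel):
    # Squared norm of sum_i s_i * phi(x_i) in the kernel's feature space,
    # computed as the double sum of s_i * s_j * K(x_i, x_j).
    return sum(
        si * sj * kernel(xi, xj)
        for si, xi in zip(signs, points)
        for sj, xj in zip(signs, points)
    )


def halve(points, signs):
    # Keep only the positively signed points as a smaller summary.
    return [x for x, s in zip(points, signs) if s > 0]


# Small deterministic demo on synthetic 2-D points.
rng = random.Random(0)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
signs = greedy_signs(points, gaussian_kernel)
disc_sq = kernel_discrepancy_sq(points, signs, gaussian_kernel)
summary = halve(points, signs)
```

With this rule every cross term added at step i is non-positive, so the squared discrepancy never exceeds the sum of the diagonal entries K(x_i, x_i), which equals the number of points for a kernel normalized so that K(x, x) = 1. Applying halve repeatedly is the usual route from a signing to a coreset; the sketch makes no claim about the resulting approximation error.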

research
07/15/2020

New Nearly-Optimal Coreset for Kernel Density Estimation

Given a point set P⊂ℝ^d, kernel density estimation for Gaussian kernel i...
research
09/15/2023

The discrepancy of greater-than

The discrepancy of the n × n greater-than matrix is shown to be π/2 ln n...
research
11/21/2021

Low-Discrepancy Points via Energetic Variational Inference

In this paper, we propose a deterministic variational inference approach...
research
02/18/2021

Domain Adaptive Learning Based on Sample-Dependent and Learnable Kernels

Reproducing Kernel Hilbert Space (RKHS) is the common mathematical platf...
research
05/01/2023

Estimating the Density Ratio between Distributions with High Discrepancy using Multinomial Logistic Regression

Functions of the ratio of the densities p/q are widely used in machine l...
research
04/05/2014

Density Estimation via Adaptive Partition and Discrepancy Control

Given iid samples from some unknown continuous density on hyper-rectangl...
research
09/23/2015

Density Estimation via Discrepancy

Given i.i.d samples from some unknown continuous density on hyper-rectan...
