On Coresets for Regularized Loss Minimization

05/26/2019
by Ryan Curtin, et al.

We design and mathematically analyze sampling-based algorithms for regularized loss minimization problems that are implementable in popular computational models for large data, in which access to the data is restricted in some way. Our main result is that if the regularizer's effect does not become negligible as the norm of the hypothesis scales, and as the data scales, then a uniform sample of modest size is with high probability a coreset. In the case that the loss function is either logistic regression or soft-margin support vector machines, and the regularizer is one of the common recommended choices, this result implies that a uniform sample of size O(d√n) is with high probability a coreset of n points in ℝ^d. We contrast this upper bound with two lower bounds. The first lower bound shows that our analysis of uniform sampling is tight; that is, a smaller uniform sample will likely not be a coreset. The second lower bound shows that in some sense uniform sampling is close to optimal, as significantly smaller coresets do not generally exist.
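
As a rough illustration of the uniform-sampling idea in the abstract, the sketch below draws a sample of size d√n, reweights each sampled point by n/m, and compares the regularized logistic loss on the full data with the loss on the weighted sample for one fixed hypothesis. Everything specific here is an assumption made for illustration and is not taken from the paper: the synthetic Gaussian data, the squared 2-norm regularizer, the choice lam = √n, sampling without replacement, and the helper names `logistic_loss`, `regularized_loss`, and `uniform_coreset`. A coreset guarantee concerns all hypotheses simultaneously, whereas this check looks at only one.

```python
import math
import random


def logistic_loss(beta, x, y):
    """Numerically stable log(1 + exp(-y * <beta, x>))."""
    margin = y * sum(b * xi for b, xi in zip(beta, x))
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def regularized_loss(beta, points, weights, lam):
    """Weighted logistic loss plus lam * ||beta||_2^2 (assumed regularizer)."""
    data_term = sum(
        w * logistic_loss(beta, x, y) for (x, y), w in zip(points, weights)
    )
    return data_term + lam * sum(b * b for b in beta)


def uniform_coreset(points, m, rng):
    """Draw m points uniformly without replacement; weight each by n / m."""
    n = len(points)
    sample = rng.sample(points, m)
    return sample, [n / m] * m


rng = random.Random(0)
n, d = 10_000, 5

# Synthetic data (an assumption): Gaussian features with norm about 1,
# labels chosen uniformly from {-1, +1}.
points = [
    ([rng.gauss(0, 1) / math.sqrt(d) for _ in range(d)], rng.choice([-1, 1]))
    for _ in range(n)
]

m = int(d * math.sqrt(n))   # sample size on the order of d * sqrt(n)
lam = math.sqrt(n)          # assumed regularization weight

sample, weights = uniform_coreset(points, m, rng)

# Evaluate one arbitrary hypothesis on the full data and on the sample.
beta = [rng.gauss(0, 1) for _ in range(d)]
full = regularized_loss(beta, points, [1.0] * n, lam)
approx = regularized_loss(beta, sample, weights, lam)
relative_error = abs(full - approx) / full
print(f"m = {m}, relative error = {relative_error:.4f}")
```

The regularizer is evaluated exactly in both cases, so only the data term carries sampling error; this loosely mirrors the abstract's condition that the regularizer's effect must not become negligible.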

research
12/24/2020

A Tight Lower Bound for Uniformly Stable Algorithms

Leveraging algorithmic stability to derive sharp generalization bounds i...
research
06/10/2018

On closeness to k-wise uniformity

A probability distribution over {-1, 1}^n is (eps, k)-wise uniform if, rou...
research
02/20/2022

Tight Bounds for Sketching the Operator Norm, Schatten Norms, and Subspace Embeddings

We consider the following oblivious sketching problem: given ϵ∈ (0,1/3) ...
research
10/29/2021

Improving Generalization Bounds for VC Classes Using the Hypergeometric Tail Inversion

We significantly improve the generalization bounds for VC classes by usi...
research
04/18/2023

Optimal PAC Bounds Without Uniform Convergence

In statistical learning theory, determining the sample complexity of rea...
research
09/19/2018

Improved Bounds for the Traveling Salesman Problem with Neighborhoods on Uniform Disks

Given a set of n disks of radius R in the Euclidean plane, the Traveling...
research
03/20/2020

Sample Complexity Result for Multi-category Classifiers of Bounded Variation

We control the probability of the uniform deviation between empirical an...
