Online Lewis Weight Sampling

07/17/2022
by David P. Woodruff, et al.

The seminal work of Cohen and Peng introduced Lewis weight sampling to the theoretical computer science community, yielding fast row sampling algorithms for approximating d-dimensional subspaces of ℓ_p up to (1+ϵ) error. Several works have extended this important primitive to other settings, including the online coreset, sliding window, and adversarial streaming models. However, these results hold only for p∈{1,2}, and the results for p=1 require a suboptimal Õ(d^2/ϵ^2) samples. In this work, we design the first nearly optimal ℓ_p subspace embeddings for all p∈(0,∞) in the online coreset, sliding window, and adversarial streaming models. In all three models, our algorithms store Õ(d^{1∨(p/2)}/ϵ^2) rows. This answers a substantial generalization of the main open question of [BDMMUWZ2020], and gives the first results for all p∉{1,2}. Towards our result, we give the first analysis of "one-shot" Lewis weight sampling, in which rows are sampled proportionally to their Lewis weights, with sample complexity Õ(d^{p/2}/ϵ^2) for p>2. Previously, this scheme was only known to have sample complexity Õ(d^{p/2}/ϵ^5), whereas Õ(d^{p/2}/ϵ^2) was known only if a more sophisticated recursive sampling scheme is used. The recursive sampling cannot be implemented online, thus necessitating an analysis of one-shot Lewis weight sampling. Our analysis uses a novel connection to online numerical linear algebra. As an application, we obtain the first one-pass streaming coreset algorithms for (1+ϵ)-approximation of important generalized linear models, such as logistic regression and p-probit regression. Our upper bounds are parameterized by a complexity parameter μ introduced by [MSSW2018], and we give the first lower bounds showing that a linear dependence on μ is necessary.
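
To make the sampling primitive concrete, here is a minimal Python sketch of offline ℓ_p Lewis weight computation via the Cohen-Peng-style fixed-point iteration, followed by one-shot row sampling proportional to those weights. This is an illustration only, not the paper's algorithm: the function names (`lewis_weights`, `one_shot_lewis_sample`), the iteration count, and the pseudoinverse-based implementation are assumptions made for the example, and the (m·p_i)^{-1/p} rescaling is the standard correction that keeps ‖SAx‖_p^p unbiased.

```python
import numpy as np

def lewis_weights(A, p, iters=30):
    """Approximate the l_p Lewis weights of the rows of A via the
    Cohen--Peng fixed-point iteration
        w_i <- (a_i^T (A^T W^{1-2/p} A)^{-1} a_i)^{p/2},
    which is known to converge for p in (0, 4); larger p requires more
    careful methods."""
    n, d = A.shape
    w = np.ones(n)
    for _ in range(iters):
        # A^T W^{1-2/p} A, formed via the reweighted matrix B = W^{1/2 - 1/p} A.
        B = (w[:, None] ** (0.5 - 1.0 / p)) * A
        G_inv = np.linalg.pinv(B.T @ B)
        # q_i = a_i^T (A^T W^{1-2/p} A)^{-1} a_i for every row i.
        q = np.einsum("ij,jk,ik->i", A, G_inv, A)
        w = np.clip(q, 1e-12, None) ** (p / 2.0)
    return w

def one_shot_lewis_sample(A, p, m, seed=None):
    """One-shot sampling: draw m rows i.i.d. with probabilities proportional
    to the Lewis weights, rescaling row i by (m * p_i)^{-1/p} so that
    ||S A x||_p^p is an unbiased estimator of ||A x||_p^p."""
    rng = np.random.default_rng(seed)
    w = lewis_weights(A, p)
    probs = w / w.sum()
    idx = rng.choice(A.shape[0], size=m, replace=True, p=probs)
    scales = (m * probs[idx]) ** (-1.0 / p)
    return scales[:, None] * A[idx]

# Tiny usage example: compare ||SAx||_p against ||Ax||_p for a random x.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((2000, 10))
    x = rng.standard_normal(10)
    SA = one_shot_lewis_sample(A, p=3, m=400, seed=1)
    print(np.linalg.norm(A @ x, ord=3), np.linalg.norm(SA @ x, ord=3))
```

Note that in the online coreset, sliding window, and adversarial streaming models studied in the paper, the full matrix is not available up front, so the weights cannot be computed as above; the sketch only illustrates the offline one-shot sampling scheme whose ϵ-dependence the paper's analysis tightens.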
