Adaptive Sketches for Robust Regression with Importance Sampling

07/16/2022
by Sepideh Mahabadi, et al.

We introduce data structures for solving robust regression through stochastic gradient descent (SGD) by sampling gradients with probability proportional to their norm, i.e., importance sampling. Although SGD is widely used for large-scale machine learning, it can converge slowly because uniform sampling produces high-variance gradient estimates. Importance sampling can significantly decrease the variance, but it is usually difficult to implement because computing the sampling probabilities requires additional passes over the data, in which case standard gradient descent (GD) could be used instead. In this paper, we introduce an algorithm that approximately samples T gradients of dimension d from nearly the optimal importance sampling distribution for a robust regression problem over n rows. Thus, our algorithm effectively runs T steps of SGD with importance sampling while using sublinear space and making just a single pass over the data. Our techniques also extend to performing importance sampling for second-order optimization.
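To make the sampling rule concrete, the following is a minimal sketch of SGD with importance sampling for a robust regression objective. It is not the paper's algorithm: it recomputes every gradient norm exactly at each step, which takes a full pass over the data and linear space, precisely the cost the paper's sketches are meant to avoid. The Huber loss, the function names `huber_grad` and `importance_sampled_sgd`, and the parameters `steps`, `lr`, and `delta` are all assumptions chosen for illustration.

```python
import numpy as np


def huber_grad(residual, delta=1.0):
    """Derivative of the Huber loss with respect to the residual."""
    return np.clip(residual, -delta, delta)


def importance_sampled_sgd(A, b, steps=2000, lr=0.05, delta=1.0, seed=0):
    """SGD where row i is sampled with probability proportional to the
    norm of its gradient. Illustrative only: probabilities are computed
    exactly with a full pass per step, not with a streaming sketch."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    row_norms = np.linalg.norm(A, axis=1)
    x = np.zeros(d)
    for _ in range(steps):
        # Gradient of row i is psi_i * a_i, so its norm is |psi_i| * ||a_i||.
        psi = huber_grad(A @ x - b, delta)
        grad_norms = np.abs(psi) * row_norms
        total = grad_norms.sum()
        if total == 0.0:
            break  # every per-row gradient is zero
        p = grad_norms / total
        i = rng.choice(n, p=p)
        # Reweight by 1 / (n * p_i) so the estimate of the average
        # gradient stays unbiased.
        x -= lr * psi[i] * A[i] / (n * p[i])
    return x
```

Calling `importance_sampled_sgd(A, b)` with an n-by-d NumPy array `A` and a length-n vector `b` returns a length-d estimate. Dividing by `n * p[i]` is what keeps the estimator unbiased under non-uniform sampling; without that reweighting the iterates would be pulled toward the rows with the largest gradients.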

research
01/13/2014

Stochastic Optimization with Importance Sampling

Uniform sampling of training data has been commonly used in traditional ...
research
06/30/2015

Online Learning to Sample

Stochastic Gradient Descent (SGD) is one of the most widely used techniq...
research
01/16/2013

Adaptive Importance Sampling for Estimation in Structured Domains

Sampling is an important tool for estimating large, complex sums and int...
research
02/12/2015

Weighted SGD for ℓ_p Regression with Randomized Preconditioning

In recent years, stochastic gradient descent (SGD) methods and randomize...
research
11/20/2015

Variance Reduction in SGD by Distributed Importance Sampling

Humans are able to accelerate their learning by selecting training mater...
research
03/23/2021

Stochastic Reweighted Gradient Descent

Despite the strong theoretical guarantees that variance-reduced finite-s...
research
09/12/2018

Efficient uniform generation of random derangements with the expected distribution of cycle lengths

We show how to generate random derangements with the expected distributi...
