Gradient-based Sampling: An Adaptive Importance Sampling for Least-squares

03/02/2018
by Rong Zhu, et al.

In modern data analysis, random sampling is an efficient and widely used strategy for overcoming the computational difficulties caused by large sample sizes. In previous studies, researchers performed random sampling that depends on the input data but is independent of the response variable; however, the response variable may also be informative for sampling. In this paper we propose an adaptive sampling method, called gradient-based sampling, which depends on both the input data and the output, for fast solution of least-squares (LS) problems. We draw data points from the full data by random sampling according to their gradient values. This sampling is computationally cheap, since the running time for computing the sampling probabilities is reduced to O(nd), where n is the full sample size and d is the dimension of the input. Theoretically, we establish an error bound for general importance sampling with respect to the LS solution from the full data. The result establishes the improved performance of our gradient-based sampling. Synthetic and real data sets are used to argue empirically that gradient-based sampling has a clear advantage over existing sampling methods in both statistical efficiency and computational cost.
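The abstract says only that points are drawn "according to their gradient values." The following is a minimal sketch under one plausible reading, not the authors' code. It fits a pilot estimate beta0 on a small uniform subsample and sets each point's probability proportional to the norm of its squared-loss gradient, ||x_i (y_i - x_i . beta0)|| = |y_i - x_i . beta0| * ||x_i||. It then draws r points with replacement and solves the LS problem reweighted by 1/(r * pi_i). The pilot size `r0`, subsample size `r`, the uniform-pilot step, and all function names are hypothetical assumptions. The solver is a plain-Python normal-equations routine meant only for small d. Computing the probabilities visits each of the n points once at O(d) cost, which matches the O(nd) figure quoted above.

```python
import math
import random


def solve_linear_system(A, b):
    """Solve the small d x d system A z = b by Gaussian elimination with partial pivoting."""
    d = len(b)
    M = [list(A[i]) + [b[i]] for i in range(d)]  # augmented matrix [A | b]
    for col in range(d):
        pivot = max(range(col, d), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, d):
            factor = M[row][col] / M[col][col]
            for c in range(col, d + 1):
                M[row][c] -= factor * M[col][c]
    z = [0.0] * d
    for row in range(d - 1, -1, -1):  # back substitution
        z[row] = (M[row][d] - sum(M[row][c] * z[c] for c in range(row + 1, d))) / M[row][row]
    return z


def weighted_least_squares(X, y, w):
    """Minimize sum_i w_i * (y_i - x_i . beta)^2 via the normal equations."""
    d = len(X[0])
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    for xi, yi, wi in zip(X, y, w):
        for j in range(d):
            b[j] += wi * xi[j] * yi
            for k in range(d):
                A[j][k] += wi * xi[j] * xi[k]
    return solve_linear_system(A, b)


def gradient_probabilities(X, y, beta0):
    """pi_i proportional to ||x_i * (y_i - x_i . beta0)|| = |residual_i| * ||x_i||.

    One pass over the data at O(d) per point, i.e. O(nd) in total.
    """
    scores = []
    for xi, yi in zip(X, y):
        residual = yi - sum(a * c for a, c in zip(xi, beta0))
        scores.append(abs(residual) * math.sqrt(sum(a * a for a in xi)))
    total = sum(scores)
    if total == 0.0:  # pilot fits every point exactly: fall back to uniform sampling
        return [1.0 / len(X)] * len(X)
    return [s / total for s in scores]


def gradient_based_sampling(X, y, r0, r, seed=0):
    """Hypothetical two-step estimator: uniform pilot fit, then gradient-based subsample."""
    rng = random.Random(seed)
    n = len(X)
    pilot = rng.sample(range(n), r0)  # step 1: uniform pilot subsample (assumption)
    beta0 = weighted_least_squares([X[i] for i in pilot], [y[i] for i in pilot], [1.0] * r0)
    pi = gradient_probabilities(X, y, beta0)  # step 2: O(nd) sampling probabilities
    idx = rng.choices(range(n), weights=pi, k=r)  # step 3: draw r points with replacement
    # Step 4: weights 1/(r * pi_i) make the subsampled objective unbiased for the full-data one.
    w = [1.0 / (r * pi[i]) for i in idx]
    return weighted_least_squares([X[i] for i in idx], [y[i] for i in idx], w)
```

For example, with 2,000 points generated as y = 2*x1 - 3*x2 plus Gaussian noise, `gradient_based_sampling(X, y, r0=100, r=300)` fits on only 400 rows in total. After the O(nd) probability pass, the final solve costs O(rd^2 + d^3), which is small when r and d are much smaller than n. At realistic scale, a numerical library would replace the hand-written solver.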
