More Efficient Estimation for Logistic Regression with Optimal Subsample

02/08/2018
by HaiYing Wang, et al.

When facing large amounts of data, subsampling is a practical technique for extracting useful information. For this purpose, Wang et al. (2017) developed an Optimal Subsampling Method under the A-optimality Criterion (OSMAC) for logistic regression that samples more informative data points with higher probabilities. However, the original OSMAC estimator uses the inverses of the optimal subsampling probabilities as weights in the likelihood function. This reduces the contributions of the more informative data points, and the resulting estimator may lose efficiency. In this paper, we propose a more efficient estimator based on the OSMAC subsample that does not weight the likelihood function. Both asymptotic and numerical results show that the new estimator is more efficient. In addition, our focus in this paper is inference for the true parameter, while Wang et al. (2017) focus on approximating the full data estimator. We also develop a new algorithm based on Poisson sampling, which does not require approximating the optimal subsampling probabilities all at once. This is computationally advantageous when the available random-access memory is not enough to hold the full data. Interestingly, the asymptotic distributions also show that Poisson sampling produces a more efficient estimator if the sampling rate, that is, the ratio of the subsample size to the full data sample size, does not converge to zero. We also obtain the unconditional asymptotic distribution for the estimator based on Poisson sampling.
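To make the two ingredients named in the abstract concrete, the following is a minimal sketch of Poisson subsampling followed by an inverse-probability-weighted logistic fit, i.e. the weighted baseline that the abstract attributes to the original OSMAC estimator. It is not the paper's proposed unweighted estimator, whose details the abstract does not give. Several pieces are assumptions made only for illustration: the score `|y - p(x)| * ||x||` used as a stand-in for the optimal subsampling probabilities, the pilot-based normalizer `psi`, the simulated data, and all function names (`fit_logistic`, `poisson_subsample`).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, w=None, n_iter=25):
    """Newton's method for logistic regression with optional weights w."""
    n, d = X.shape
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    beta = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (w * (y - p))                       # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def poisson_subsample(X, y, beta_pilot, r, psi, rng, chunk=5000):
    """Scan the data chunk by chunk; keep each point independently.

    Assumed inclusion probability: min(1, r * |y - p(x)| * ||x|| / psi),
    where psi approximates the sum of the scores over the full data.
    Only one chunk of probabilities is computed at a time.
    """
    rows, probs = [], []
    for start in range(0, len(y), chunk):
        Xc, yc = X[start:start + chunk], y[start:start + chunk]
        score = np.abs(yc - sigmoid(Xc @ beta_pilot)) * np.linalg.norm(Xc, axis=1)
        pi = np.minimum(1.0, r * score / psi)
        keep = rng.random(len(yc)) < pi
        rows.append(start + np.flatnonzero(keep))
        probs.append(pi[keep])
    return np.concatenate(rows), np.concatenate(probs)

# Simulated full data (hypothetical; stands in for data too large for memory).
rng = np.random.default_rng(0)
N, r = 20000, 2000
beta_true = np.array([0.5, -0.5, 0.5])
X = rng.normal(size=(N, 3))
y = (rng.random(N) < sigmoid(X @ beta_true)).astype(float)

# Pilot step: a small uniform subsample gives a rough estimate and normalizer.
pilot = rng.choice(N, size=500, replace=False)
beta_pilot = fit_logistic(X[pilot], y[pilot])
pilot_score = (np.abs(y[pilot] - sigmoid(X[pilot] @ beta_pilot))
               * np.linalg.norm(X[pilot], axis=1))
psi = N * pilot_score.mean()

# Poisson subsample, then the inverse-probability-weighted fit.
idx, pi = poisson_subsample(X, y, beta_pilot, r, psi, rng)
beta_weighted = fit_logistic(X[idx], y[idx], w=1.0 / pi)
print(len(idx), beta_weighted)
```

Because each point is kept or dropped independently, the realized subsample size is random with expectation close to `r`, and the inclusion probabilities never need to be held in memory for the full data at once. The weights `1 / pi` shrink the influence of exactly those points that were sampled because they are informative, which is the efficiency loss the abstract describes.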


research
10/08/2022

Unweighted estimation based on optimal sample under measurement constraints

To tackle massive data, subsampling is a practical approach to select th...
research
07/06/2020

Surprise sampling: improving and extending the local case-control sampling

Fithian and Hastie (2014) proposed a new sampling scheme called local ca...
research
05/17/2022

Sampling with replacement vs Poisson sampling: a comparative study in optimal subsampling

Faced with massive data, subsampling is a commonly used technique to imp...
research
06/18/2018

Optimal Subsampling Algorithms for Big Data Generalized Linear Models

To fast approximate the maximum likelihood estimator with massive data, ...
research
05/23/2019

Divide-and-Conquer Information-Based Optimal Subdata Selection Algorithm

The information-based optimal subdata selection (IBOSS) is a computation...
research
08/06/2019

Semiparametric Wavelet-based JPEG IV Estimator for endogenously truncated data

A new and an enriched JPEG algorithm is provided for identifying redunda...
research
02/22/2023

A Note on "Towards Efficient Data Valuation Based on the Shapley Value"

The Shapley value (SV) has emerged as a promising method for data valuat...
