Hyperparameter Selection for Subsampling Bootstraps

06/02/2020 ∙ by Yingying Ma, et al.

As massive data analysis becomes increasingly prevalent, subsampling methods such as BLB (Bag of Little Bootstraps) serve as powerful tools for assessing the quality of estimators for massive data. However, the performance of subsampling methods is highly influenced by the selection of tuning parameters (e.g., the subset size and the number of resamples per subset). In this article we develop a hyperparameter selection methodology that can be used to select tuning parameters for subsampling methods. Specifically, through a careful theoretical analysis, we find an analytically simple and elegant relationship between the asymptotic efficiency of various subsampling estimators and their hyperparameters, which leads to an optimal choice of the hyperparameters. More specifically, any arbitrarily specified hyperparameter set can be improved to a new set at no extra CPU time cost, while the resulting estimator's statistical efficiency can be much improved. Both simulation studies and real data analysis demonstrate the advantage of our method.
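To make the tuning parameters named above concrete, the following is a minimal sketch of the Bag of Little Bootstraps as it is commonly described, applied to the standard error of a sample mean. It is not the selection methodology of this article. The function name `blb_standard_error` and the parameter names `subset_size`, `n_subsets`, and `n_resamples` are hypothetical labels chosen for illustration, and the choice of the mean as the estimator is an assumption.

```python
import numpy as np


def blb_standard_error(x, subset_size, n_subsets, n_resamples, seed=0):
    """Sketch of BLB for the standard error of the mean (illustrative only).

    subset_size : number of distinct points in each little subset
    n_subsets   : number of subsets drawn from the full data
    n_resamples : number of resamples per subset
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    uniform = np.full(subset_size, 1.0 / subset_size)
    per_subset_se = []
    for _ in range(n_subsets):
        # Draw a small subset without replacement from the full data.
        subset = rng.choice(x, size=subset_size, replace=False)
        # Each resample has nominal size n but touches only subset_size
        # distinct points, so it is stored as multinomial counts.
        counts = rng.multinomial(n, uniform, size=n_resamples)
        # Weighted mean of each resample, shape (n_resamples,).
        means = counts @ subset / n
        # Quality measure for this subset: spread of the resampled means.
        per_subset_se.append(means.std(ddof=1))
    # Average the per-subset quality measures.
    return float(np.mean(per_subset_se))
```

The computational cost of this sketch grows with `n_subsets * n_resamples * subset_size`, which is why different hyperparameter sets with the same budget can be compared on statistical efficiency alone.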




