Divide-and-Conquer Information-Based Optimal Subdata Selection Algorithm

05/23/2019
by   HaiYing Wang, et al.

Information-based optimal subdata selection (IBOSS) is a computationally efficient method for selecting informative data points from large data sets by processing the full data column by column. However, when a data set is too large to be processed in the available memory of a machine, the IBOSS procedure cannot be implemented directly. This paper develops a divide-and-conquer IBOSS approach to solve this problem: the full data set is divided into smaller partitions that can each be loaded into memory, and a subset of data is then selected from each partition using the IBOSS algorithm. We derive both finite-sample and asymptotic properties of the resulting estimator. The asymptotic results show that if the full data set is partitioned randomly and the number of partitions is not very large, then the resulting estimator has the same estimation efficiency as the original IBOSS estimator. We also carry out numerical experiments to evaluate the empirical performance of the proposed method.
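The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on assumptions the abstract does not state: a linear regression model fitted by ordinary least squares, the usual IBOSS selection rule (for each covariate in turn, take the r rows with the smallest and the r rows with the largest values among rows not yet chosen, with r = k / (2p)), and a final fit on the union of the per-partition subdata. The names `iboss_select`, `dc_iboss_fit`, and `k_per_partition` are hypothetical.

```python
import numpy as np


def iboss_select(X, k):
    """Return row indices of roughly k points picked by an IBOSS-style rule.

    Assumed rule: for each column, take the r smallest and r largest values
    among rows not selected so far, where r = k // (2 * p).
    """
    n, p = X.shape
    r = max(1, k // (2 * p))
    available = np.ones(n, dtype=bool)
    chosen = []
    for j in range(p):
        idx = np.flatnonzero(available)
        if idx.size <= 2 * r:
            # Too few rows left to split into two tails: take them all.
            chosen.extend(idx.tolist())
            available[idx] = False
            break
        # A full sort keeps the sketch short; a partial sort would be cheaper.
        order = np.argsort(X[idx, j], kind="stable")
        picks = np.concatenate([idx[order[:r]], idx[order[-r:]]])
        chosen.extend(picks.tolist())
        available[picks] = False
    return np.array(chosen, dtype=int)


def dc_iboss_fit(partitions, k_per_partition):
    """Select subdata from each (X, y) partition, pool it, and fit OLS.

    Only one partition needs to be in memory at a time if `partitions`
    is a generator. Returns coefficients with the intercept first.
    """
    X_sub, y_sub = [], []
    for X, y in partitions:
        keep = iboss_select(X, k_per_partition)
        X_sub.append(X[keep])
        y_sub.append(y[keep])
    Xs = np.vstack(X_sub)
    ys = np.concatenate(y_sub)
    design = np.column_stack([np.ones(len(ys)), Xs])
    beta, *_ = np.linalg.lstsq(design, ys, rcond=None)
    return beta
```

To mirror the random-partitioning condition in the abstract, the rows can be shuffled before being split into blocks (for example with `np.random.default_rng().permutation` followed by `np.array_split`), and each block is then passed to `dc_iboss_fit` as an `(X, y)` pair.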

research
10/08/2022

Unweighted estimation based on optimal sample under measurement constraints

To tackle massive data, subsampling is a practical approach to select th...
research
06/22/2022

Diversity Subsampling: Custom Subsamples from Large Data Sets

Subsampling from a large data set is useful in many supervised learning ...
research
05/21/2020

Optimal Distributed Subsampling for Maximum Quasi-Likelihood Estimators with Massive Data

Nonuniform subsampling methods are effective to reduce computational bur...
research
02/08/2018

More Efficient Estimation for Logistic Regression with Optimal Subsample

Facing large amounts of data, subsampling is a practical technique to ex...
research
12/22/2018

Distributed sequential method for analyzing massive data

To analyse a very large data set containing lengthy variables, we adopt ...
research
07/26/2022

Functional Regression with Intensively Measured Longitudinal Outcomes: A New Lens through Data Partitioning

Modern longitudinal data from wearable devices consist of biological sig...
research
04/24/2019

Efficient Simulation Budget Allocation for Subset Selection Using Regression Metamodels

This research considers the ranking and selection (R&S) problem of selec...
