Distributed sequential method for analyzing massive data

12/22/2018
by Zhanfeng Wang, et al.

To analyse a very large data set containing a large number of variables, we adopt a sequential estimation idea and propose a parallel divide-and-conquer method. We run several conventional sequential estimation procedures separately and integrate their results in a way that preserves the desired statistical properties. Additionally, using a criterion from statistical experimental design, we adopt adaptive sample selection, together with an adaptive shrinkage estimation method, to simultaneously accelerate the estimation procedure and identify the effective variables. We confirm the validity of our methods through theoretical justifications and numerical results on synthetic data sets. We then apply the proposed method to three real data sets, including those pertaining to appliance energy use and particulate matter concentration.
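The abstract does not state which design criterion drives the sample selection or how the per-block results are combined, so the sketch below is only an illustration of the general divide-and-conquer idea under stated assumptions, not the authors' procedure. It assumes a linear regression model, uses greedy D-optimal selection (add the row maximizing x^T M^{-1} x, which by the matrix determinant lemma gives the largest increase in det(M)) as a stand-in for the unspecified criterion, and merges block estimates by information-weighted averaging. The function names `d_optimal_subset`, `block_fit`, and `divide_and_conquer` are hypothetical, and the adaptive shrinkage step is omitted.

```python
import numpy as np


def d_optimal_subset(X, n_select, n_init):
    """Hypothetical greedy D-optimal selection within one block.

    Starts from the first n_init rows, then repeatedly adds the row that
    maximizes x^T M^{-1} x, where M = X_S^T X_S for the selected set S.
    Since det(M + x x^T) = det(M) * (1 + x^T M^{-1} x), this is the row
    giving the largest increase in det(M).
    """
    n, p = X.shape
    if not p <= n_init <= n_select <= n:
        raise ValueError("need p <= n_init <= n_select <= n")
    selected = list(range(n_init))  # assumed warm start: first n_init rows
    remaining = np.ones(n, dtype=bool)
    remaining[selected] = False
    M = X[selected].T @ X[selected]
    while len(selected) < n_select:
        M_inv = np.linalg.inv(M)
        # score_i = x_i^T M^{-1} x_i for every row of the block
        scores = np.einsum("ij,jk,ik->i", X, M_inv, X)
        scores[~remaining] = -np.inf  # never pick a row twice
        i = int(np.argmax(scores))
        selected.append(i)
        remaining[i] = False
        M += np.outer(X[i], X[i])
    return np.array(selected)


def block_fit(X, y, n_select):
    """OLS on the selected rows of one block; returns (beta, information)."""
    idx = d_optimal_subset(X, n_select, n_init=X.shape[1])
    Xs, ys = X[idx], y[idx]
    info = Xs.T @ Xs  # information matrix, up to the noise variance
    beta = np.linalg.solve(info, Xs.T @ ys)
    return beta, info


def divide_and_conquer(X, y, n_blocks, n_select):
    """Fit each block separately, then combine with information weights.

    The loop stands in for independent workers; each block could be
    fitted in parallel because blocks share no state.
    """
    betas, infos = [], []
    for Xk, yk in zip(np.array_split(X, n_blocks), np.array_split(y, n_blocks)):
        beta_k, info_k = block_fit(Xk, yk, n_select)
        betas.append(beta_k)
        infos.append(info_k)
    total_info = sum(infos)
    weighted = sum(I @ b for I, b in zip(infos, betas))
    return np.linalg.solve(total_info, weighted)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 5))
    beta_true = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
    y = X @ beta_true + rng.normal(scale=0.1, size=2000)
    print(divide_and_conquer(X, y, n_blocks=4, n_select=100))
```

Because I_k b_k = X_S^T y_S for each block, the information-weighted combination equals OLS on the union of the selected rows. In particular, when every row of every block is selected, it reproduces the full-data OLS estimate exactly, which makes it a convenient sanity check for this kind of integration rule.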
