A race-DC in Big Data

11/27/2019
by Lu Lin, et al.

The strategy of divide-and-combine (DC) has been widely used in the area of big data. Bias correction is crucial in the DC procedure for validly aggregating locally biased estimators, especially when the number of batches of data is large. This paper establishes a race-DC through a residual-adjustment composition estimate (race). The race-DC applies to various types of biased estimators, including but not limited to the Lasso, ridge, and principal component estimators in linear regression and the least squares estimator in nonlinear regression. The resulting global estimator is strictly unbiased under the linear model, has its bias reduced at an accelerated rate under the nonlinear model, and can achieve theoretical optimality when the number of batches of data is large. Moreover, the race-DC is computationally simple because it is a least squares estimator in a pro forma linear regression. Detailed simulation studies demonstrate that the resulting global estimator is substantially bias-corrected, that its performance is comparable to that of the oracle estimator, and that it performs much better than competing methods.
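To make the bias problem concrete, the sketch below shows a naive DC baseline, not the race-DC: a ridge estimate is computed on each batch and the local estimates are simply averaged. It assumes NumPy, a linear model with a standard normal design, and an arbitrary ridge penalty `lam=10.0`; the helper names `local_ridge` and `naive_dc_ridge` are hypothetical. Each local ridge estimate is shrunk toward zero by roughly the factor (X_k'X_k + λI)^{-1} X_k'X_k. Averaging does not undo this shrinkage, so for a fixed total sample size the bias grows as the number of batches grows.

```python
import numpy as np


def local_ridge(X, y, lam):
    """Ridge estimate (X'X + lam*I)^{-1} X'y computed on a single batch."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def naive_dc_ridge(X, y, n_batches, lam):
    """Naive divide-and-combine: split rows into batches, fit ridge locally, average.

    This is NOT the race-DC; it only illustrates why plain averaging of
    locally biased estimators keeps (rather than removes) their bias.
    """
    X_blocks = np.array_split(X, n_batches)
    y_blocks = np.array_split(y, n_batches)
    estimates = [local_ridge(Xb, yb, lam) for Xb, yb in zip(X_blocks, y_blocks)]
    return np.mean(estimates, axis=0)


# Hypothetical simulation setup: y = X @ beta + noise with beta = (1, ..., 1).
rng = np.random.default_rng(0)
n, p = 100_000, 5
beta = np.ones(p)
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)

# Same data, more batches -> smaller batches -> stronger shrinkage per batch.
for K in (10, 100, 1000):
    est = naive_dc_ridge(X, y, K, lam=10.0)
    print(K, "batches, estimate - beta:", np.round(est - beta, 3))
```

With 10 batches the averaged estimate should sit close to `beta`. With 1000 batches of 100 rows each, every coordinate should be visibly shrunk toward zero. This is the kind of aggregation bias that a bias-corrected combination such as the race-DC is meant to remove; the sketch does not implement the paper's correction.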


research
04/16/2019

A Global Bias-Correction DC Method for Biased Estimation under Memory Constraint

This paper introduces a global bias-correction divide-and-conquer (GBC-D...
research
01/29/2022

Global Bias-Corrected Divide-and-Conquer by Quantile-Matched Composite for General Nonparametric Regressions

The issues of bias-correction and robustness are crucial in the strategy...
research
10/16/2021

Exact Bias Correction for Linear Adjustment of Randomized Controlled Trials

In an influential critique of empirical practice, Freedman (2008) showed...
research
12/29/2021

A comment and erratum on "Excess Optimism: How Biased is the Apparent Error of an Estimator Tuned by SURE?"

We identify and correct an error in the paper "Excess Optimism: How Bias...
research
06/04/2019

How many variables should be entered in a principal component regression equation?

We study least squares linear regression over N uncorrelated Gaussian fe...
research
08/07/2017

Learning Theory of Distributed Regression with Bias Corrected Regularization Kernel Network

Distributed learning is an effective way to analyze big data. In distrib...
research
07/14/2020

Ordinal Regression with Fenton-Wilkinson Order Statistics: A Case Study of an Orienteering Race

In sports, individuals and teams are typically interested in final ranki...
