Distributed Statistical Inference for Massive Data

05/29/2018
by Song Xi Chen, et al.

This paper considers distributed statistical inference for general symmetric statistics in the context of massive data, where the data can be stored on multiple platforms in different locations. To facilitate effective computation and avoid expensive communication among platforms, we formulate distributed statistics that can be computed over smaller data blocks. The statistical properties of the distributed statistics are investigated in terms of the mean square error of estimation and their asymptotic distributions with respect to the number of data blocks. In addition, we propose two distributed bootstrap algorithms that are computationally effective and able to capture the underlying distribution of the distributed statistics. Numerical simulations and real data applications of the proposed approaches are provided to demonstrate their empirical performance.
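As a rough illustration of computing a statistic over smaller data blocks, the sketch below splits a sample into blocks, evaluates a symmetric statistic on each block separately, and combines the block-level values by a size-weighted average. The aggregation rule, the helper names `split_into_blocks` and `distributed_statistic`, and the choice of the sample variance as the example statistic are assumptions made for illustration; the abstract does not spell out the exact construction.

```python
import random
import statistics
from typing import Callable, List, Sequence


def split_into_blocks(data: Sequence[float], num_blocks: int) -> List[List[float]]:
    """Partition data into num_blocks contiguous blocks of near-equal size."""
    if not 1 <= num_blocks <= len(data):
        raise ValueError("num_blocks must be between 1 and len(data)")
    base, extra = divmod(len(data), num_blocks)
    blocks: List[List[float]] = []
    start = 0
    for k in range(num_blocks):
        # The first `extra` blocks receive one additional observation.
        size = base + (1 if k < extra else 0)
        blocks.append(list(data[start:start + size]))
        start += size
    return blocks


def distributed_statistic(
    blocks: Sequence[Sequence[float]],
    statistic: Callable[[Sequence[float]], float],
) -> float:
    """Size-weighted average of a statistic evaluated on each block.

    Assumption: weighting by block size is one natural way to combine
    block-level values; it is not taken from the abstract.
    """
    total = sum(len(block) for block in blocks)
    return sum(len(block) * statistic(block) for block in blocks) / total


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
    blocks = split_into_blocks(sample, num_blocks=20)

    # The sample variance is a symmetric statistic: it does not depend
    # on the order of the observations within a block.
    block_based = distributed_statistic(blocks, statistics.variance)
    full_sample = statistics.variance(sample)
    print(f"block-based: {block_based:.4f}, full sample: {full_sample:.4f}")
```

Each block can be processed where it is stored, and only one number per block (plus the block size) has to be communicated. For a linear statistic such as the mean, the size-weighted average reproduces the full-sample value exactly; for a nonlinear statistic such as the variance, the two values generally differ slightly.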


