Efficient Estimation for Generalized Linear Models on a Distributed System with Nonrandomly Distributed Data

04/06/2020
by Feifei Wang, et al.

Distributed systems are widely used in practice for very large-scale data analysis tasks. In this work, we target the estimation problem for generalized linear models on a distributed system with nonrandomly distributed data. We develop a Pseudo-Newton-Raphson algorithm for efficient estimation. In this algorithm, we first obtain a pilot estimator based on a small random sample collected from different Workers. We then conduct a one-step update based on the derivatives of the log-likelihood functions, computed in each Worker at the pilot estimator. The final one-step estimator is proved to be as statistically efficient as the global estimator, even with nonrandomly distributed data. In addition, it is computationally efficient in terms of both communication cost and storage usage. Based on the one-step estimator, we also develop a likelihood ratio test for hypothesis testing. The theoretical properties of the one-step estimator and the corresponding likelihood ratio test are investigated. Their finite-sample performance is assessed through simulations. Finally, an American Airline dataset is analyzed on a Spark cluster for illustration purposes.
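The two-stage procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it takes logistic regression as the generalized linear model, simulates the Workers as in-memory NumPy arrays rather than a Spark cluster, and makes the data nonrandomly distributed by sorting on one covariate before splitting. The function names (`local_derivatives`, `newton_mle`, `one_step_estimator`) and all sizes are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_derivatives(X, y, beta):
    """Gradient and Hessian of the logistic log-likelihood on one data block."""
    p = sigmoid(X @ beta)
    grad = X.T @ (y - p)
    hess = -(X * (p * (1.0 - p))[:, None]).T @ X
    return grad, hess

def newton_mle(X, y, n_iter=25):
    """Plain Newton-Raphson maximum likelihood fit, started from zero."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad, hess = local_derivatives(X, y, beta)
        beta = beta - np.linalg.solve(hess, grad)
    return beta

def one_step_estimator(workers, pilot_size_per_worker, rng):
    # Stage 1: pilot estimator from a small random sample drawn on every worker.
    Xs, ys = [], []
    for X, y in workers:
        idx = rng.choice(len(y), size=pilot_size_per_worker, replace=False)
        Xs.append(X[idx])
        ys.append(y[idx])
    pilot = newton_mle(np.vstack(Xs), np.concatenate(ys))

    # Stage 2: each worker evaluates its derivatives at the pilot estimator;
    # the master sums them and takes a single Newton-Raphson step.
    grad_sum = np.zeros_like(pilot)
    hess_sum = np.zeros((pilot.size, pilot.size))
    for X, y in workers:
        grad, hess = local_derivatives(X, y, pilot)
        grad_sum += grad
        hess_sum += hess
    return pilot, pilot - np.linalg.solve(hess_sum, grad_sum)

# Simulated data (assumed sizes and coefficients, for illustration only).
rng = np.random.default_rng(0)
n, beta_true = 20000, np.array([0.5, -1.0, 1.0])
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = (rng.random(n) < sigmoid(X @ beta_true)).astype(float)

# Nonrandom distribution: sort by a covariate, then split across 4 workers.
order = np.argsort(X[:, 1])
workers = [(X[idx], y[idx]) for idx in np.array_split(order, 4)]

pilot, one_step = one_step_estimator(workers, pilot_size_per_worker=250, rng=rng)
global_mle = newton_mle(X, y)  # benchmark that uses all data in one place

print("pilot     :", pilot)
print("one-step  :", one_step)
print("global MLE:", global_mle)
```

In this sketch each worker sends back only a gradient vector and a Hessian matrix whose sizes depend on the number of covariates, not on the local sample size, which is where the saving in communication cost comes from. Sampling the same number of rows from each worker is a simplification that relies on the workers holding equally sized blocks.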


research
01/17/2020

Communication-Efficient Distributed Estimator for Generalized Linear Models with a Diverging Number of Covariates

Distributed statistical inference has recently attracted immense attenti...
research
06/09/2022

Trimmed Maximum Likelihood Estimation for Robust Learning in Generalized Linear Models

We study the problem of learning generalized linear models under adversa...
research
08/14/2019

Least Squares Approximation for a Distributed System

In this work we develop a distributed least squares approximation (DLSA)...
research
03/28/2019

Large Deviations of Linear Models with Regularly-Varying Tails: Asymptotics and Efficient Estimation

We analyze the Large Deviation Probability (LDP) of linear factor models...
research
09/29/2022

Mixed-effects location-scale model based on generalized hyperbolic distribution

Motivated by better modeling of intra-individual variability in longitud...
research
11/04/2015

A Distributed One-Step Estimator

Distributed statistical inference has recently attracted enormous attent...
research
09/09/2020

Analysis of Deviance for Hypothesis Testing in Generalized Partially Linear Models

In this study, we develop nonparametric analysis of deviance tools for g...