Doubly Distributed Supervised Learning and Inference with High-Dimensional Correlated Outcomes

07/16/2020
by Emily C. Hector, et al.

This paper presents a unified framework for supervised learning and inference with high-dimensional correlated outcomes, based on a divide-and-conquer approach. We propose a general class of estimators that can be implemented in a fully distributed and parallelized computational scheme. Modelling, computational and theoretical challenges posed by high-dimensional correlated outcomes are overcome in three steps: dividing the data at both the outcome and subject levels; estimating the parameter of interest from each block of data using any of a broad class of supervised learning procedures; and combining the block estimators into a closed-form meta-estimator. This meta-estimator is asymptotically equivalent to the estimator obtained by Hansen's (1982) generalized method of moments (GMM), yet it does not require the entire dataset to be reloaded onto a common server. We provide rigorous theoretical justification for the use of distributed estimators with correlated outcomes by studying the asymptotic behaviour of the combined estimator with both a fixed and a diverging number of data divisions. Simulations illustrate the finite-sample performance of the proposed method, and we provide an R package for ease of implementation.
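To give a feel for why a closed-form combination step can stand in for a full-data fit, here is a minimal sketch of the simplest case: a single scalar parameter estimated separately on each data block, with an estimate C of the joint covariance of those block estimates assumed to be available. The blocks are combined by generalized least squares, theta_c = (1'C^{-1}1)^{-1} 1'C^{-1} theta_hat, which reduces to inverse-variance weighting when the blocks are independent and uses the off-diagonal terms of C when blocks share correlated outcomes. This is an illustrative stand-in, not the paper's exact formula: the authors' meta-estimator is derived from GMM and handles vector-valued parameters. The function names `solve` and `combine_block_estimates` are hypothetical.

```python
# Hypothetical sketch: closed-form GLS combination of block estimates of one
# scalar parameter. Assumes the block estimates and their joint covariance
# matrix C are already available; this is not the paper's exact GMM formula.

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    # Augmented copy [matrix | rhs] so the caller's inputs are not modified.
    aug = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate entries below the pivot.
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def combine_block_estimates(estimates, cov):
    """Return (combined estimate, its variance) for a common scalar parameter.

    With a = (1, ..., 1) and symmetric C:
      theta_c      = (a' C^{-1} a)^{-1} a' C^{-1} theta_hat
      Var(theta_c) = (a' C^{-1} a)^{-1}
    """
    ones = [1.0] * len(estimates)
    w = solve(cov, ones)   # w = C^{-1} a, one weight per block
    precision = sum(w)     # a' C^{-1} a
    theta_c = sum(wi * ti for wi, ti in zip(w, estimates)) / precision
    return theta_c, 1.0 / precision


if __name__ == "__main__":
    # Two independent blocks: reduces to inverse-variance weighting.
    print(combine_block_estimates([2.0, 7.0], [[1.0, 0.0], [0.0, 4.0]]))
    # Two correlated blocks: the off-diagonal covariance changes the weights.
    print(combine_block_estimates([1.0, 3.0], [[2.0, 1.0], [1.0, 2.0]]))
```

In the first call the combined estimate is 3.0 with variance 0.8; in the second it is 2.0 with variance 1.5. Only the block estimates and the covariance matrix travel to the combining step, which is the property that lets the raw data stay on separate servers.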

research
10/07/2019

A Distributed and Integrated Method of Moments for High-Dimensional Correlated Data Analysis

This paper is motivated by a regression analysis of electroencephalograp...
research
11/30/2020

Joint integrative analysis of multiple data sources with correlated vector outcomes

We propose a distributed quadratic inference function framework to joint...
research
11/11/2020

Learning a high-dimensional classification rule using auxiliary outcomes

Correlated outcomes are common in many practical problems. Based on a de...
research
07/24/2022

Statistical inference for high-dimensional generalized estimating equations

We propose a novel inference procedure for linear combinations of high-d...
research
09/07/2020

Doubly Robust Semiparametric Difference-in-Differences Estimators with High-Dimensional Data

This paper proposes a doubly robust two-stage semiparametric difference-...
research
01/17/2020

Communication-Efficient Distributed Estimator for Generalized Linear Models with a Diverging Number of Covariates

Distributed statistical inference has recently attracted immense attenti...
research
10/05/2022

Fused mean structure learning in data integration with dependence

Motivated by image-on-scalar regression with data aggregated across mult...