Heterogeneity-aware and communication-efficient distributed statistical inference

12/20/2019
by Rui Duan, et al.

In multicenter research, individual-level data are often protected against sharing across sites. To overcome this barrier, many distributed algorithms that require sharing only aggregated information have been developed. Existing distributed algorithms usually assume that the data are homogeneously distributed across sites. This assumption ignores the fact that data collected at different sites may come from different sub-populations and environments, which can make the data distribution heterogeneous across sites. Ignoring this heterogeneity may lead to erroneous statistical inference. In this paper, we propose distributed algorithms that account for heterogeneous distributions by allowing site-specific nuisance parameters. The proposed methods extend the surrogate likelihood approach to the heterogeneous setting by applying a novel density ratio tilting method to the efficient score function. The proposed algorithms maintain the same communication cost as existing communication-efficient algorithms. We establish a non-asymptotic risk bound for the proposed distributed estimator and derive its limiting distribution in the two-index asymptotic setting. In addition, we show that the asymptotic variance of the estimator attains the Cramér-Rao lower bound. Finally, a simulation study shows that the proposed algorithms achieve higher estimation accuracy than several existing methods.
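For orientation, the surrogate likelihood idea that the abstract builds on can be sketched in its plain homogeneous form. This is not the authors' heterogeneous algorithm: it has no site-specific nuisance parameters and no density ratio tilting, and it only illustrates how a single round of gradient sharing can stand in for pooling individual-level data. The one-parameter logistic model, the simulated data, and every function and variable name below are assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad(theta, data):
    """Average log-likelihood gradient of a one-parameter logistic model."""
    return sum(x * (y - sigmoid(theta * x)) for x, y in data) / len(data)

def hess(theta, data):
    """Average log-likelihood second derivative (always negative here)."""
    return -sum(x * x * sigmoid(theta * x) * (1.0 - sigmoid(theta * x))
                for x, _ in data) / len(data)

def newton(grad_fn, hess_fn, theta0, steps=50):
    """Find a root of grad_fn by Newton's method."""
    theta = theta0
    for _ in range(steps):
        theta -= grad_fn(theta) / hess_fn(theta)
    return theta

# Hypothetical simulated data: 5 sites with 200 observations each.
rng = random.Random(0)
TRUE_THETA = 1.0

def simulate_site(n):
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < sigmoid(TRUE_THETA * x) else 0
        data.append((x, y))
    return data

sites = [simulate_site(200) for _ in range(5)]
local = sites[0]

# Step 1: the local site computes an initial estimate from its own data.
theta_bar = newton(lambda t: grad(t, local), lambda t: hess(t, local), 0.0)

# Step 2: every site sends back only its gradient at theta_bar
# (one number per site, no individual-level data).
site_grads = [grad(theta_bar, d) for d in sites]
global_grad = sum(site_grads) / len(site_grads)  # sites have equal sizes

# Step 3: the local site shifts its own gradient so that it matches the
# pooled gradient at theta_bar, then maximizes the resulting surrogate.
shift = global_grad - site_grads[0]

def surrogate_grad(theta):
    return grad(theta, local) + shift

theta_sur = newton(surrogate_grad, lambda t: hess(t, local), theta_bar)

print("local estimate:    ", theta_bar)
print("surrogate estimate:", theta_sur)
```

The only quantities that cross site boundaries are `theta_bar` and one gradient value per site. In the heterogeneous setting described above, the proposed methods replace this plain gradient correction with one based on the efficient score function and density ratio tilting, at the same communication cost.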
