Fused mean structure learning in data integration with dependence

10/05/2022
by Emily C. Hector, et al.

Motivated by image-on-scalar regression with data aggregated across multiple sites, we consider a setting in which multiple independent studies each collect multiple dependent vector outcomes, with potential mean model parameter homogeneity between studies and outcome vectors. To determine the validity of jointly analyzing these data sources, we must learn which of these data sources share mean model parameters. We propose a new model fusion approach that delivers improved flexibility, statistical performance and computational speed over existing methods. Our proposed approach specifies a quadratic inference function within each data source and fuses mean model parameter vectors in their entirety based on a new formulation of a pairwise fusion penalty. We establish theoretical properties of our estimator and propose an asymptotically equivalent weighted oracle meta-estimator that is more computationally efficient. Simulations and application to the ABIDE neuroimaging consortium highlight the flexibility of the proposed approach. An R package is provided for ease of implementation.
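The abstract describes the estimator only at a high level. As a hedged illustration, consider a penalized objective of the generic form Σ_k Q_k(β_k) + λ Σ_{j<k} ||β_j − β_k||_2, where Q_k is the quadratic inference function for data source k. Penalizing the unsquared Euclidean norm of each whole difference vector can shrink that difference exactly to zero, so two sources' mean parameter vectors fuse in their entirety rather than coordinate by coordinate. This is an assumed, textbook group-fusion form. It is not the paper's new penalty formulation, which the abstract does not spell out. The Python sketch below computes that generic penalty. Given a fusion pattern, it also computes a precision-weighted average of the per-source estimates as a stand-in for the weighted meta-estimator. The function names, the diagonal-variance simplification, and the example numbers are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def group_fusion_penalty(betas: List[Vector], lam: float) -> float:
    """Generic pairwise fusion penalty: lam * sum_{j<k} ||beta_j - beta_k||_2.

    ASSUMPTION: a standard group-fusion penalty, not the paper's formulation.
    The unsquared L2 norm of the whole difference vector is what allows an
    entire pair of parameter vectors to be fused at once.
    """
    total = 0.0
    for j in range(len(betas)):
        for k in range(j + 1, len(betas)):
            diff = [a - b for a, b in zip(betas[j], betas[k])]
            total += math.sqrt(sum(d * d for d in diff))
    return lam * total


def weighted_meta_estimate(
    betas: List[Vector],
    variances: List[Vector],
    groups: Dict[str, List[int]],
) -> Dict[str, List[float]]:
    """Precision-weighted average of per-source estimates within each fused group.

    ASSUMPTION: variances[k][i] is a diagonal variance for coordinate i of
    beta_k (a simplification of full inverse-covariance weighting), and
    `groups` maps a hypothetical group label to the source indices fused together.
    """
    result: Dict[str, List[float]] = {}
    for label, members in groups.items():
        dim = len(betas[members[0]])
        estimate = []
        for i in range(dim):
            # Each source contributes in proportion to its precision (1 / variance).
            weights = [1.0 / variances[k][i] for k in members]
            numerator = sum(w * betas[k][i] for w, k in zip(weights, members))
            estimate.append(numerator / sum(weights))
        result[label] = estimate
    return result
```

For example, with three sources whose estimates are [1.0, 2.0], [1.0, 2.0] and [4.0, 6.0], the identical first pair adds nothing to the penalty. Each cross pair adds a distance of 5.0, so λ = 0.5 gives a penalty of 5.0. Pooling only within learned groups is what keeps a joint analysis valid when some sources do not share mean parameters.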

research
11/30/2020

Joint integrative analysis of multiple data sources with correlated vector outcomes

We propose a distributed quadratic inference function framework to joint...
research
08/28/2021

A robust fusion-extraction procedure with summary statistics in the presence of biased sources

Information from various data sources is increasingly available nowadays...
research
08/28/2023

Data fusion using weakly aligned sources

We introduce a new data fusion method that utilizes multiple data source...
research
12/26/2019

Communication-Efficient Integrative Regression in High-Dimensions

We consider the task of meta-analysis in high-dimensional settings in wh...
research
10/12/2021

Identification and estimation of nonignorable missing outcome mean without identifying the full data distribution

We consider the problem of making inference about the population outcome...
research
11/30/2020

Data Fusion for Joining Income and Consumption Information Using Different Donor-Recipient Distance Metrics

Data fusion describes the method of combining data from (at least) two i...
research
07/16/2020

Doubly Distributed Supervised Learning and Inference with High-Dimensional Correlated Outcomes

This paper presents a unified framework for supervised learning and infe...
