Paradoxes and resolutions for semiparametric fusion of individual and summary data

10/01/2022
by Wenjie Hu, et al.

Suppose individual-level data are available from an internal study, together with various types of summary statistics from relevant external studies. External summary statistics have been used as constraints on the internal data distribution, which promises to improve statistical inference for the internal data. However, the additional use of external summary data can lead to paradoxical results: efficiency loss may occur if the uncertainty of the summary statistics is not negligible, and large estimation bias can emerge even if the bias of the external summary statistics is small. We investigate these paradoxical results in a semiparametric framework. We establish the semiparametric efficiency bound for estimating a general functional of the internal data distribution and show that it is no larger than the bound that uses only internal data. We propose a data-fused efficient estimator that achieves this bound, which resolves the efficiency paradox. This data-fused estimator is further regularized with an adaptive lasso penalty so that the resulting estimator achieves the same asymptotic distribution as the oracle estimator that uses only unbiased summary statistics, which resolves the bias paradox. Simulations and an application to a Helicobacter pylori infection dataset illustrate the proposed methods.
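
The efficiency paradox described above can be illustrated with a small toy simulation. The sketch below is not the estimator proposed in the paper; it is a simple control-variate example under assumptions made here for illustration: the target is the internal mean of Y, the external summary statistic is the sample mean of a covariate X from m external observations, both populations share the same distribution, and the function name `simulate` and all parameter values are hypothetical. Treating the external mean as an exact constraint ignores its sampling noise and can inflate the variance, whereas down-weighting the adjustment by m / (n + m) accounts for that noise.

```python
import numpy as np


def simulate(n=200, m=100, rho=0.8, reps=2000, seed=0):
    """Monte Carlo variance of three estimators of E[Y] (true value 0).

    Toy setting (an assumption, not the paper's method): X and Y are
    standard normal with correlation rho; the external study reports
    only the sample mean of X computed from m observations.
    """
    rng = np.random.default_rng(seed)
    est = {"internal_only": [], "naive_constraint": [], "weighted_fusion": []}
    for _ in range(reps):
        # Internal individual-level data.
        x = rng.standard_normal(n)
        y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)
        # External summary statistic: a noisy estimate of E[X] = 0.
        mu_ext = rng.standard_normal(m).mean()

        beta = np.cov(x, y)[0, 1] / x.var(ddof=1)  # slope of Y on X
        gap = x.mean() - mu_ext

        est["internal_only"].append(y.mean())
        # Treats mu_ext as if it were known exactly.
        est["naive_constraint"].append(y.mean() - beta * gap)
        # Shrinks the adjustment to reflect the noise in mu_ext.
        est["weighted_fusion"].append(y.mean() - beta * (m / (n + m)) * gap)
    return {name: float(np.var(vals)) for name, vals in est.items()}


if __name__ == "__main__":
    for name, var in simulate().items():
        print(f"{name:18s} variance = {var:.5f}")
```

With these default settings (external sample smaller than the internal one), the naive constrained estimator should show a larger variance than the internal-only estimator, while the weighted version should show a smaller one. The bias paradox is not covered by this sketch.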


