Regression inference for multiple populations by integrating summary-level data using stacked imputations

06/12/2021
by Tian Gu, et al.

There is a growing need for flexible, general frameworks that integrate individual-level data with external summary information to improve statistical inference. This paper proposes an imputation-based methodology whose goal is to fit an outcome regression model with all available variables in the internal study while using summary information from external models that may have used only a subset of the predictors. The method allows for heterogeneity of covariate effects across the external populations. The proposed approach generates synthetic outcome data in each population, uses stacked multiple imputation to create a long dataset with complete covariate information, and finally analyzes the imputed data with weighted regression. This flexible and unified approach attains four objectives: (i) incorporating supplementary information from a broad class of externally fitted predictive models or established risk calculators, which may be based on parametric regression or machine learning methods, as long as the external model can generate outcome values given covariates; (ii) improving the statistical efficiency of the estimated coefficients in the internal study; (iii) improving predictions by using even partial information from models that use only a subset of the covariates available in the internal study; and (iv) providing valid statistical inference for the external population, whose covariate effects may differ from those of the internal population. Applications include prostate cancer risk prediction models using novel biomarkers that are measured only in the internal study.
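The three steps named in the abstract (synthetic outcomes, stacked multiple imputation, weighted regression) can be illustrated with a deliberately simplified sketch. It is not the authors' implementation. It assumes a single external population, a continuous outcome, one covariate `x` shared with the external model, one biomarker `b` measured only internally, and a hypothetical external linear model given by `ext_coef` and `ext_sd`. All names and numeric values are illustrative assumptions, and the sketch omits the heterogeneity handling and the variance estimation that the paper addresses.

```python
import numpy as np

rng = np.random.default_rng(0)

def wls(design, y, weights):
    """Weighted least squares via square-root-weight rescaling."""
    w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * w[:, None], y * w, rcond=None)
    return coef

# Simulated internal study: outcome y, shared covariate x, biomarker b.
n = 200
x = rng.normal(size=n)
b = 0.5 * x + rng.normal(size=n)
y = 1.0 + 2.0 * x + 1.5 * b + rng.normal(size=n)

# Hypothetical external model for y given x only (assumed values):
# intercept, slope, and residual standard deviation.
ext_coef = np.array([1.0, 2.75])
ext_sd = 1.8

# Step 1: synthetic data. Resample internal x and draw outcomes from
# the external model; b is unobserved for these rows.
n_syn = 400
x_syn = rng.choice(x, size=n_syn, replace=True)
y_syn = ext_coef[0] + ext_coef[1] * x_syn + rng.normal(scale=ext_sd, size=n_syn)

# Step 2: imputation model for b given (y, x), fitted on internal data.
imp_design = np.column_stack([np.ones(n), y, x])
gamma = wls(imp_design, b, np.ones(n))
resid = b - imp_design @ gamma
imp_sd = np.sqrt(resid @ resid / (n - imp_design.shape[1]))

# Step 3: M imputations of b for the synthetic rows, stacked into one
# long dataset; each stacked row gets weight 1/M.
M = 10
syn_design = np.column_stack([np.ones(n_syn), y_syn, x_syn])
b_blocks = []
for _ in range(M):
    b_imp = syn_design @ gamma + rng.normal(scale=imp_sd, size=n_syn)
    b_blocks.append(np.concatenate([b, b_imp]))
b_all = np.concatenate(b_blocks)
x_all = np.tile(np.concatenate([x, x_syn]), M)
y_all = np.tile(np.concatenate([y, y_syn]), M)
weights = np.full(len(y_all), 1.0 / M)

# Step 4: weighted regression of y on (1, x, b) over the stacked data.
full_design = np.column_stack([np.ones(len(y_all)), x_all, b_all])
coef = wls(full_design, y_all, weights)
print("stacked estimate (intercept, x, b):", coef)
```

The point estimate from the stacked fit is only part of the picture: the naive standard errors from a weighted regression on stacked data are not valid as they stand, which is why the sketch reports coefficients only.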

research
10/20/2020

An ensemble meta-prediction framework to integrate multiple external models into a current study

Disease risk prediction models are used throughout clinical biomedicine....
research
02/22/2023

Incorporating External Risk Information with the Cox Model under Population Heterogeneity: Applications to Trans-Ancestry Polygenic Hazard Scores

Polygenic hazard score (PHS) models designed for European ancestry (EUR)...
research
10/01/2022

Paradoxes and resolutions for semiparametric fusion of individual and summary data

Suppose we have available individual data from an internal study and var...
research
04/25/2023

Multi-study factor regression model: an application in nutritional epidemiology

Diet is a risk factor for many diseases. In nutritional epidemiology, st...
research
01/04/2018

Cluster-weighted latent class modeling

Usually in Latent Class Analysis (LCA), external predictors are taken to...
research
03/06/2023

Integrative data analysis where partial covariates have complex non-linear effects by using summary information from a real-world data

A full parametric and linear specification may be insufficient to captur...
research
02/28/2022

Estimating Model Performance on External Samples from Their Limited Statistical Characteristics

Methods that address data shifts usually assume full access to multiple ...
