Missing data interpolation in integrative multi-cohort analysis with disparate covariate information

11/01/2022
by Ekaterina Smirnova, et al.

Integrative analysis of datasets generated by multiple cohorts is a widely used approach for increasing the sample size, the precision of population estimators, and the generalizability of analysis results in epidemiological studies. However, an individual cohort often did not collect every variable of interest for the integrative analysis as part of its original study. Such cohort-level missingness poses methodological challenges to integrative analysis, because missing variables have traditionally either (1) been removed from the data for a complete-case analysis, or (2) been completed by missing-data interpolation techniques that borrow data with the same covariate distribution from other studies. In most integrative-analysis studies, neither approach is optimal: the first discards the majority of study covariates, and the second requires identifying which cohorts follow the same distribution, which is difficult. We propose a novel approach to identify the studies with the same distribution that could be used to complete the cohort-level missing information. Our methodology relies on (1) identifying sub-groups of cohorts with similar covariate distributions using cohort-identity random forest prediction models followed by clustering, and then (2) applying a recursive pairwise distribution test for high-dimensional data to these sub-groups. Extensive simulation studies show that cohorts with the same distribution are correctly grouped together in almost all simulation settings. Applying our methods to two ECHO-wide Cohort Studies shows that the cohorts grouped together reflect similarities in study design. The methods are implemented in the R software package relate.
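The abstract does not say which high-dimensional distribution test is used in step (2) or exactly how the recursion proceeds. Purely as an illustration of the idea, the sketch below swaps in an energy-distance permutation test and a hypothetical greedy rule. The rule repeatedly merges the pair of cohort groups that looks most alike, pools their data, and stops once every remaining pair is rejected at level `alpha`. The function names (`energy_statistic`, `energy_permutation_test`, `group_cohorts`) and the merging rule are assumptions for this example and are not the API of the relate package. Step (1), the random-forest cohort-identity model and clustering, is not reproduced here.

```python
import numpy as np


def energy_statistic(x: np.ndarray, y: np.ndarray) -> float:
    """V-statistic energy distance between samples x (n, p) and y (m, p).

    Equals 2*E|X - Y| - E|X - X'| - E|Y - Y'| over the empirical samples;
    it is zero when the two empirical distributions coincide.
    """
    def mean_dist(a: np.ndarray, b: np.ndarray) -> float:
        # Mean Euclidean distance over all pairs (a_i, b_j), diagonal included.
        diff = a[:, None, :] - b[None, :, :]
        return float(np.sqrt((diff ** 2).sum(axis=-1)).mean())

    return 2.0 * mean_dist(x, y) - mean_dist(x, x) - mean_dist(y, y)


def energy_permutation_test(x: np.ndarray, y: np.ndarray,
                            n_perm: int = 199, seed: int = 0) -> float:
    """Permutation p-value for H0: x and y are drawn from the same distribution."""
    rng = np.random.default_rng(seed)
    observed = energy_statistic(x, y)
    pooled = np.vstack([x, y])
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        # Randomly reassign rows to the two samples, keeping the sample sizes.
        idx = rng.permutation(len(pooled))
        if energy_statistic(pooled[idx[:n]], pooled[idx[n:]]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def group_cohorts(cohorts: dict, alpha: float = 0.05,
                  n_perm: int = 199, seed: int = 0) -> list:
    """Hypothetical greedy grouping of cohorts by pairwise distribution tests.

    cohorts maps a cohort name to its (n_i, p) matrix of shared covariates.
    Returns a list of groups, each a sorted list of cohort names.
    """
    # Each group holds its member names and its pooled covariate matrix.
    groups = [([name], x) for name, x in cohorts.items()]
    while len(groups) > 1:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                p = energy_permutation_test(groups[i][1], groups[j][1], n_perm, seed)
                if best is None or p > best[0]:
                    best = (p, i, j)
        p, i, j = best
        if p <= alpha:
            break  # every remaining pair differs significantly: stop merging
        merged = (groups[i][0] + groups[j][0],
                  np.vstack([groups[i][1], groups[j][1]]))
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return [sorted(names) for names, _ in groups]
```

To use the sketch, pass a dictionary such as `{"cohort_1": X1, "cohort_2": X2, ...}` in which every matrix has the same covariate columns. Pooling the data after each merge means that later tests compare whole groups rather than individual cohorts. The sketch applies no multiple-testing correction across the pairwise tests, so `alpha` is only a nominal per-test level.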
