Semiparametric Gaussian Copula Regression modeling for Mixed Data Types (SGCRM)

05/13/2022
by   Debangan Dey, et al.

Many clinical and epidemiological studies encode participant-level information as a collection of continuous, truncated, ordinal, and binary variables. To gain novel insights into the complex interactions among these variables, there is a critical need for flexible frameworks for joint modeling of mixed data types. We propose Semiparametric Gaussian Copula Regression modeling (SGCRM), which models the joint dependence structure among observed continuous, truncated, ordinal, and binary variables and constructs conditional models with any of these four data types as outcomes, with a guarantee that the derived conditional models are mutually consistent. The Semiparametric Gaussian Copula (SGC) mechanism assumes that observed SGC variables are generated by (i) monotonically transforming the marginals of a latent multivariate normal random variable and (ii) dichotomizing or truncating these transformed marginals. SGCRM estimates the correlation matrix of the latent normal variables by inverting "bridges" between Kendall's Tau rank correlations of the observed mixed data type variables and the latent Gaussian correlations. We derive a novel bridging result for a general ordinal variable. In addition to the previously established asymptotic consistency, we establish asymptotic normality of the latent correlation estimators. We also establish asymptotic normality of the SGCRM regression estimators and provide a computationally efficient way to calculate their asymptotic covariances. We propose computationally efficient methods for predicting SGC latent variables and for imputing missing data. Using data from the National Health and Nutrition Examination Survey (NHANES), we illustrate SGCRM and compare it with traditional conditional regression models, including truncated Gaussian regression, ordinal probit, and probit models.
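To make the "bridge" idea concrete, here is a minimal sketch for the simplest case only: two continuous variables. For a Gaussian copula, the well-known relation between Kendall's Tau and the latent correlation is tau = (2/pi) * arcsin(rho), so the inverse bridge is rho = sin(pi * tau / 2). The function names (`kendall_tau`, `latent_corr_continuous`, `simulate`), the simulated data, and the chosen monotone transforms are all illustrative assumptions; this is not the authors' implementation, and the bridges for truncated, ordinal, and binary variables described in the abstract are not covered here.

```python
import math
import random


def kendall_tau(x, y):
    """Kendall's tau-a by direct O(n^2) pair counting (no tie correction)."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            if prod > 0:
                score += 1      # concordant pair
            elif prod < 0:
                score -= 1      # discordant pair
    return 2.0 * score / (n * (n - 1))


def latent_corr_continuous(tau):
    """Inverse bridge for a continuous-continuous pair: rho = sin(pi * tau / 2)."""
    return math.sin(math.pi * tau / 2.0)


def simulate(n, rho, seed=0):
    """Draw latent bivariate normals with correlation rho, then apply
    monotone transforms (hypothetical choices) to get the observed data."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        xs.append(math.exp(z1))   # strictly increasing transform
        ys.append(z2 ** 3)        # strictly increasing transform
    return xs, ys


true_rho = 0.6
xs, ys = simulate(n=800, rho=true_rho, seed=1)
tau_hat = kendall_tau(xs, ys)
rho_hat = latent_corr_continuous(tau_hat)
print(f"tau_hat={tau_hat:.3f}, rho_hat={rho_hat:.3f}, true rho={true_rho}")
```

Because Kendall's Tau depends only on ranks, the monotone transforms do not change it, which is why the latent correlation can be recovered without estimating the transforms themselves.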


research
08/20/2021

latentcor: An R Package for estimating latent correlations from mixed data types

We present `latentcor`, an R package for correlation estimation from dat...
research
09/17/2018

Rank-based approach for estimating correlations in mixed ordinal data

High-dimensional mixed data as a combination of both continuous and ordi...
research
10/31/2019

Connecting population-level AUC and latent scale-invariant R^2 via Semiparametric Gaussian Copula and rank correlations

Area Under the Curve (AUC) is arguably the most popular measure of class...
research
08/05/2020

A flexible and efficient algorithm for joint imputation of general data

Imputation of data with general structures (e.g., data with continuous, ...
research
07/17/2019

Patient-specific Conditional Joint Models of Shape, Image Features and Clinical Indicators

We propose and demonstrate a joint model of anatomical shapes, image fea...
research
05/06/2021

Longitudinal modeling of age-dependent latent traits with generalized additive latent and mixed models

We present generalized additive latent and mixed models (GALAMMs) for an...
research
11/21/2022

High-Dimensional Undirected Graphical Models for Arbitrary Mixed Data

Graphical models are an important tool in exploring relationships betwee...
