Pursuing Sources of Heterogeneity in Modeling Clustered Population

03/10/2020
by Yan Li, et al.

Researchers often have to deal with heterogeneous populations with mixed regression relationships, increasingly so in the era of data explosion. In such problems, when there are many candidate predictors, it is of interest not only to identify the predictors that are associated with the outcome but also to distinguish the true sources of heterogeneity, i.e., to identify the predictors that have different effects among the clusters and thus are the true contributors to the formation of the clusters. We clarify the concept of a source of heterogeneity in a way that accounts for potential scale differences among the clusters, and we propose a regularized finite mixture effects regression that achieves heterogeneity pursuit and feature selection simultaneously. As the name suggests, the problem is formulated under an effects-model parameterization, in which the cluster labels are missing and the effect of each predictor on the outcome is decomposed into a common effect term and a set of cluster-specific terms. Constrained sparse estimation of these effects identifies both the variables with common effects and those with heterogeneous effects. We propose an efficient algorithm and show that our approach achieves both estimation and selection consistency. Simulation studies further demonstrate the effectiveness of our method under various practical scenarios. Three applications are presented: an imaging genetics study linking genetic factors and brain neuroimaging traits in Alzheimer's disease, a public health study exploring the association between suicide risk among adolescents and their school district characteristics, and a sports analytics study examining how the salary levels of baseball players are associated with their performance and contractual status.
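To make the effects-model parameterization concrete, the sketch below takes a hypothetical matrix of cluster-specific coefficients B[j][k] (predictor j, cluster k) and splits each row into a common effect plus cluster-specific deviations, then labels each predictor as irrelevant, common, or heterogeneous. It is an illustration under stated assumptions, not the authors' estimator: the (weighted) sum-to-zero identifiability constraint, the function names `decompose_effects` and `classify_predictors`, and the tolerance `tol` are all assumptions introduced here. The paper estimates these effects jointly, with missing cluster labels and sparsity constraints; this sketch only shows what the decomposition means once coefficients are known.

```python
from typing import List, Optional, Tuple


def decompose_effects(
    B: List[List[float]], weights: Optional[List[float]] = None
) -> Tuple[List[float], List[List[float]]]:
    """Split cluster-specific coefficients into common + cluster-specific parts.

    B[j][k] is a hypothetical coefficient of predictor j in cluster k.
    Returns (b0, g) with B[j][k] == b0[j] + g[j][k] and, as an ASSUMED
    identifiability constraint, sum_k weights[k] * g[j][k] == 0.
    """
    n_clusters = len(B[0])
    w = weights if weights is not None else [1.0] * n_clusters
    total = sum(w)
    # Common effect: weighted average of the cluster-specific coefficients.
    b0 = [sum(wk * bjk for wk, bjk in zip(w, row)) / total for row in B]
    # Cluster-specific terms: deviations from the common effect.
    g = [[bjk - b0j for bjk in row] for b0j, row in zip(b0, B)]
    return b0, g


def classify_predictors(
    B: List[List[float]], weights: Optional[List[float]] = None, tol: float = 1e-8
) -> List[str]:
    """Label each predictor as 'irrelevant', 'common', or 'heterogeneous'."""
    b0, g = decompose_effects(B, weights)
    labels = []
    for b0j, gj in zip(b0, g):
        if any(abs(x) > tol for x in gj):
            labels.append("heterogeneous")  # effect differs across clusters
        elif abs(b0j) > tol:
            labels.append("common")  # same nonzero effect in every cluster
        else:
            labels.append("irrelevant")  # no effect in any cluster
    return labels


# Toy coefficients: 3 predictors (rows) x 3 clusters (columns).
B = [
    [1.0, 1.0, 1.0],   # same effect everywhere
    [0.0, 0.0, 0.0],   # no effect
    [2.0, -1.0, 0.5],  # effect varies by cluster
]
print(classify_predictors(B))  # ['common', 'irrelevant', 'heterogeneous']
```

In the toy example, only the third predictor has nonzero cluster-specific terms (1.5, -1.5, 0.0 around a common effect of 0.5), so it is the only source of heterogeneity. Note that the abstract's notion of a source of heterogeneity also accounts for scale differences among the clusters, which this simplified sketch ignores.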
