Bayesian indicator variable selection of multivariate response with heterogeneous sparsity for multi-trait fine mapping
Variable selection has been played a critical role in contemporary statistics and scientific discoveries. Numerous regularization and Bayesian variable selection methods have been developed in the past two decades for variable selection, but they mainly target at only one response. As more data being collected nowadays, it is common to obtain and analyze multiple correlated responses from the same study. Running separate regression for each response ignores their correlation thus multivariate analysis is recommended. Existing multivariate methods select variables related to all responses without considering the possible heterogeneous sparsity of different responses, i.e. some features may only predict a subset of responses but not the rest. In this paper, we develop a novel Bayesian indicator variable selection method in multivariate regression model with a large number of grouped predictors targeting at multiple correlated responses with possibly heterogeneous sparsity patterns. The method is motivated by the multi-trait fine mapping problem in genetics to identify the variants that are causal to multiple related traits. Our new method is featured by its selection at individual level, group level as well as specific to each response. In addition, we propose a new concept of subset posterior inclusion probability for inference to prioritize predictors that target at subset(s) of responses. Extensive simulations with varying sparsity and heterogeneity levels and dimension have shown the advantage of our method in variable selection and prediction performance as compared to existing general Bayesian multivariate variable selection methods and Bayesian fine mapping methods. We also applied our method to a real data example in imaging genetics and identified important causal variants for brain white matter structural change in different regions.
READ FULL TEXT