Large-scale inference of correlation among mixed-type biological traits with phylogenetic multivariate probit models

by Zhenyu Zhang, et al.

Inferring concerted changes among biological traits along an evolutionary history remains an important yet challenging problem. Besides adjusting for spurious correlation induced by the shared history, the task also requires sufficient flexibility and computational efficiency to incorporate multiple continuous and discrete traits as data size increases. To accomplish this, we jointly model mixed-type traits by assuming latent parameters for binary outcome dimensions at the tips of an unknown tree informed by molecular sequences. This gives rise to a phylogenetic multivariate probit model. With large sample sizes, posterior computation under this model is problematic, as it requires repeated sampling from a high-dimensional truncated normal distribution. Current best practices employ multiple-try rejection sampling, which suffers from slow mixing and a computational cost that scales quadratically in sample size. We develop a new inference approach that exploits 1) the bouncy particle sampler based on piecewise deterministic Markov processes and 2) novel dynamic programming that reduces the cost of likelihood and gradient evaluations to linear in sample size. In an application with 535 HIV viruses and 24 traits that necessitates sampling from a 12,840-dimensional truncated normal, our method makes it possible to estimate the across-trait correlation and detect factors that affect the pathogen's capacity to cause disease. This inference framework is also applicable to a broader class of covariance structures beyond comparative biology.
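To make the truncated-normal step concrete, here is a minimal Python sketch of the latent-variable view for a single taxon with three binary traits. It is an illustration under stated assumptions, not the authors' method: the correlation matrix `corr`, the outcome vector `y`, and the helper `sample_latent_by_rejection` are hypothetical, the tree-induced covariance across taxa is omitted, and the sampler is naive rejection rather than the multiple-try scheme or the bouncy particle sampler named in the abstract. The last line only checks the dimension arithmetic, 535 taxa times 24 traits.

```python
import numpy as np

def sample_latent_by_rejection(y, corr, rng, max_tries=100_000):
    """Draw z ~ N(0, corr) conditioned on sign(z) == y (hypothetical helper).

    In a probit model each binary outcome is the sign of a latent normal
    variable, so conditioning on the observed outcomes truncates the normal
    to one orthant. Naive rejection keeps proposing until a draw lands there.
    """
    chol = np.linalg.cholesky(corr)
    for tries in range(1, max_tries + 1):
        z = chol @ rng.standard_normal(len(y))
        if np.all(np.sign(z) == y):
            return z, tries
    raise RuntimeError("no draw accepted within max_tries")

rng = np.random.default_rng(0)

# Assumed across-trait correlation for three binary traits (illustrative only).
corr = np.array([[1.0, 0.5, 0.2],
                 [0.5, 1.0, 0.3],
                 [0.2, 0.3, 1.0]])
y = np.array([1, -1, 1])  # observed binary outcomes coded as +1 / -1

z, tries = sample_latent_by_rejection(y, corr, rng)

# Dimension of the latent vector in the HIV application: taxa x traits.
latent_dim = 535 * 24
```

With only three dimensions a draw is accepted after a handful of proposals. The acceptance probability generally shrinks rapidly as the dimension grows, which is one way to see why naive approaches become impractical at the 12,840-dimensional scale described above.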



Hamiltonian zigzag accelerates large-scale inference for conditional dependencies between complex biological traits

Inferring dependencies between complex biological traits while accountin...

Inferring phenotypic trait evolution on large trees with many incomplete measurements

Comparative biologists are often interested in inferring covariation bet...

A copula-based set-variant association test for bivariate continuous or mixed phenotypes

In genome wide association studies (GWAS), researchers are often dealing...

CRP-Tree: A phylogenetic association test for binary traits

An important problem in evolutionary genomics is to investigate whether ...

Mixed-normal limit theorems for multiple Skorohod integrals in high-dimensions, with application to realized covariance

This paper develops mixed-normal approximations for probabilities that v...

Fast Multivariate Probit Estimation via a Two-Stage Composite Likelihood

The multivariate probit is popular for modeling correlated binary data, ...

Statistical Species Identification

Identification of taxa can be significantly assisted by statistical clas...
