Random-effects substitution models for phylogenetics via scalable gradient approximations

03/23/2023
by Andrew F. Magee, et al.

Phylogenetic and discrete-trait evolutionary inference depend heavily on appropriate characterization of the underlying substitution process. In this paper, we present random-effects substitution models that extend common continuous-time Markov chain models into a richer class of processes capable of capturing a wider variety of substitution dynamics. As these random-effects substitution models often require many more parameters than their usual counterparts, inference can be both statistically and computationally challenging. Thus, we also propose an efficient approach to compute an approximation to the gradient of the data likelihood with respect to all unknown substitution model parameters. We demonstrate that this approximate gradient enables scaling of both sampling-based inference (Bayesian inference via HMC) and maximization-based inference (MAP estimation) under random-effects substitution models across large trees and state spaces. Applied to a dataset of 583 SARS-CoV-2 sequences, an HKY model with random effects shows strong signals of nonreversibility in the substitution process, and posterior predictive model checks clearly show that it is more adequate than a reversible model. When analyzing the pattern of phylogeographic spread of 1441 influenza A virus (H3N2) sequences between 14 regions, a random-effects phylogeographic substitution model infers that air travel volume adequately predicts almost all dispersal rates. A random-effects state-dependent substitution model reveals no evidence for an effect of arboreality on the swimming mode in the tree frog subfamily Hylinae. On a dataset of 28 taxa spanning the Metazoa, a random-effects amino acid substitution model finds, in seconds, evidence of notable departures from the current best-fit amino acid model. We show that our gradient-based inference approach is over an order of magnitude more time-efficient than conventional approaches.
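To make the idea of extending a base model with random effects concrete, the following is a minimal sketch, not the authors' implementation. It assumes (the abstract does not spell this out) that each off-diagonal rate of a base HKY rate matrix is multiplied by the exponential of its own random effect, so that all-zero effects recover the base model. The function names, the parameter values, and the multiplicative form are illustrative assumptions.

```python
import math

NUCLEOTIDES = "ACGT"
# Purine<->purine and pyrimidine<->pyrimidine changes (transitions).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky_rates(kappa, pi):
    """Off-diagonal HKY rates: kappa * pi[j] for transitions, pi[j] otherwise."""
    n = len(NUCLEOTIDES)
    rates = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(NUCLEOTIDES):
        for j, b in enumerate(NUCLEOTIDES):
            if i != j:
                scale = kappa if (a, b) in TRANSITIONS else 1.0
                rates[i][j] = scale * pi[j]
    return rates


def random_effects_rate_matrix(base, eps):
    """ASSUMED form: q[i][j] = base[i][j] * exp(eps[i][j]) for i != j.

    The diagonal is set so that every row sums to zero, as required
    for a continuous-time Markov chain rate matrix.
    """
    n = len(base)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = base[i][j] * math.exp(eps[i][j])
        q[i][i] = -sum(q[i])  # q[i][i] is still 0.0 here
    return q


def satisfies_detailed_balance(q, pi, tol=1e-12):
    """Check pi[i] * q[i][j] == pi[j] * q[j][i] for the GIVEN pi only."""
    n = len(q)
    return all(
        abs(pi[i] * q[i][j] - pi[j] * q[j][i]) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )


# Hypothetical parameter values, chosen only for illustration.
pi = [0.3, 0.2, 0.2, 0.3]
base = hky_rates(kappa=2.0, pi=pi)

zero_eps = [[0.0] * 4 for _ in range(4)]
q_base = random_effects_rate_matrix(base, zero_eps)

# A single nonzero effect on the A -> G rate breaks the symmetry.
one_eps = [row[:] for row in zero_eps]
one_eps[0][2] = 0.5
q_re = random_effects_rate_matrix(base, one_eps)

print(satisfies_detailed_balance(q_base, pi))
print(satisfies_detailed_balance(q_re, pi))
```

With all effects at zero the sketch reduces to plain HKY, which satisfies detailed balance with respect to its base frequencies. A single nonzero effect already violates that balance, which illustrates how such a model can accommodate asymmetric rates. In a 4-state model this adds 12 free effects, and the count grows quadratically with the number of states, which is the scaling problem the gradient approximation is meant to address.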

research
06/20/2021

Bayesian inference for continuous-time hidden Markov models with an unknown number of states

We consider the modeling of data generated by a latent continuous-time M...
research
06/27/2018

A Robustified posterior for Bayesian inference on a large number of parallel effects

Many modern experiments, such as microarray gene expression and genome-w...
research
08/02/2018

Efficient Bayesian Inference of Sigmoidal Gaussian Cox Processes

We present an approximate Bayesian inference approach for estimating the...
research
09/28/2015

Unbiased Bayesian Inference for Population Markov Jump Processes via Random Truncations

We consider continuous time Markovian processes where populations of ind...
research
06/19/2018

Large-Scale Stochastic Sampling from the Probability Simplex

Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popul...
research
05/29/2019

Gradients do grow on trees: a linear-time O( N )-dimensional gradient for statistical phylogenetics

Calculation of the log-likelihood stands as the computational bottleneck...
research
08/30/2023

Scalable Estimation of Probit Models with Crossed Random Effects

Crossed random effects structures arise in many scientific contexts. The...
