Many-core algorithms for high-dimensional gradients on phylogenetic trees

03/08/2023
by   Karthik Gangavarapu, et al.

The rapid growth in genomic pathogen data spurs the need for efficient inference techniques, such as Hamiltonian Monte Carlo (HMC) in a Bayesian framework, to estimate the parameters of phylogenetic models whose dimension increases with the number of sequences N. HMC requires repeated calculation of the gradient of the data log-likelihood with respect to (wrt) all branch-length-specific (BLS) parameters, which traditionally takes 𝒪(N^2) operations using the standard pruning algorithm. A recent study proposes an approach that calculates this gradient in 𝒪(N), enabling researchers to take advantage of gradient-based samplers such as HMC. The CPU implementation of this approach makes the gradient calculation computationally tractable for nucleotide-based models, but it falls short in performance for models with a larger state space, such as codon models. Here, we describe novel massively parallel algorithms that calculate the gradient of the log-likelihood wrt all BLS parameters on graphics processing units (GPUs) and achieve many-fold speedups over previous CPU implementations. We benchmark these GPU algorithms on three computing systems using three evolutionary inference examples (carnivores, dengue and yeast) and observe a greater than 128-fold speedup over the CPU implementation for codon-based models and a greater than 8-fold speedup for nucleotide-based models. As a practical demonstration, we also estimate the timing of the first introduction of West Nile virus into the continental United States under a codon model with a relaxed molecular clock from 104 full viral genomes, an inference task that was previously intractable. We provide an implementation of our GPU algorithms in BEAGLE v4.0.0, an open source library for statistical phylogenetics that enables parallel calculations on multi-core CPUs and GPUs.
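To make the 𝒪(N^2) versus 𝒪(N) contrast concrete: differentiating the pruning likelihood one branch at a time costs one 𝒪(N) tree traversal per branch, while a linear-time scheme reuses two sets of partial likelihoods for every branch at once. The following is a minimal single-site, serial sketch of that two-pass idea. It is not the paper's GPU algorithm or the BEAGLE API. It assumes a JC69 nucleotide model with branch lengths as the only BLS parameters, and the function names, the dictionary-based tree encoding, and the toy tree are all hypothetical choices made for illustration.

```python
import math

STATES = 4                      # nucleotides A, C, G, T
PI = [0.25] * STATES            # JC69 stationary frequencies (assumed model)


def jc69(t):
    """Return the JC69 transition matrix P(t) and its derivative dP/dt."""
    e = math.exp(-4.0 * t / 3.0)
    P = [[0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e
          for j in range(STATES)] for i in range(STATES)]
    dP = [[-e if i == j else e / 3.0
           for j in range(STATES)] for i in range(STATES)]
    return P, dP


def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(STATES)) for i in range(STATES)]


def vecmat(v, M):
    return [sum(v[i] * M[i][j] for i in range(STATES)) for j in range(STATES)]


def loglik_and_gradient(root, children, length, tip_state):
    """Log-likelihood and d(logL)/d(branch length) for every branch."""
    post, msg = {}, {}

    def up(v):
        # Post-order pass: post[v][s] = P(data below v | state s at v).
        if v in tip_state:
            post[v] = [float(s == tip_state[v]) for s in range(STATES)]
        else:
            post[v] = [1.0] * STATES
            for c in children[v]:
                up(c)
                post[v] = [a * b for a, b in zip(post[v], msg[c])]
        if v != root:
            # msg[v][s] = P(data below v | state s at the parent of v).
            msg[v] = matvec(jc69(length[v])[0], post[v])

    up(root)
    lik = sum(PI[s] * post[root][s] for s in range(STATES))

    grad = {}

    def down(u, pre_u):
        # Pre-order pass: pre_u[s] = P(data outside subtree of u, state s at u).
        for v in children.get(u, []):
            q = list(pre_u)
            for w in children[u]:
                if w != v:      # fold in sibling subtrees (one for binary trees)
                    q = [a * b for a, b in zip(q, msg[w])]
            P, dP = jc69(length[v])
            # Only the transition matrix on branch v depends on length[v].
            d_msg = matvec(dP, post[v])
            grad[v] = sum(a * b for a, b in zip(q, d_msg)) / lik
            down(v, vecmat(q, P))

    down(root, PI)
    return math.log(lik), grad


# Toy example: ((B, C)X, A)root with observed states A=0, B=1, C=0.
children = {"root": ["A", "X"], "X": ["B", "C"]}
length = {"A": 0.1, "X": 0.2, "B": 0.3, "C": 0.15}
tip_state = {"A": 0, "B": 1, "C": 0}
logL, grad = loglik_and_gradient("root", children, length, tip_state)
print(logL, grad)
```

Each node is visited once per pass, so on a binary tree the whole gradient costs a constant number of small matrix-vector products per branch. Those per-branch products are independent across alignment sites and grow with the square of the state-space size, which is why codon models (61 states in place of 4) stand to benefit most from parallel hardware.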

research
05/29/2019

Gradients do grow on trees: a linear-time O(N)-dimensional gradient for statistical phylogenetics

Calculation of the log-likelihood stands as the computational bottleneck...
research
02/03/2022

m-CUBES: An efficient and portable implementation of multi-dimensional integration for GPUs

The task of multi-dimensional numerical integration is frequently encoun...
research
06/11/2019

Relaxed random walks at scale

Relaxed random walk (RRW) models of trait evolution introduce branch-spe...
research
10/25/2021

Scalable Bayesian divergence time estimation with ratio transformations

Divergence time estimation is crucial to provide temporal signals for da...
research
07/29/2020

Accelerating Multi-attribute Unsupervised Seismic Facies Analysis With RAPIDS

Classification of seismic facies is done by clustering seismic data samp...
research
05/15/2021

Shrinkage-based random local clocks with scalable inference

Local clock models propose that the rate of molecular evolution is const...
research
05/11/2019

Massive parallelization boosts big Bayesian multidimensional scaling

Big Bayes is the computationally intensive co-application of big data an...