The Tajima heterochronous n-coalescent: inference from heterochronously sampled molecular data

04/14/2020
by Lorenzo Cappello, et al.

The observed sequence variation at a locus carries information about the evolutionary history of the sample and about past population size dynamics. The standard Kingman coalescent model on genealogies (timed trees that represent the ancestry of the sample) is used in a generative model of molecular sequence variation to infer evolutionary parameters. However, the state space of Kingman's genealogies grows superexponentially with the sample size n, making inference computationally infeasible even for small n. We introduce a new coalescent model, the Tajima heterochronous n-coalescent, whose genealogical space has a substantially smaller cardinality. This process allows the analysis of samples collected at different times, a situation that is both common in applications (e.g. ancient DNA and RNA from rapidly evolving pathogens like viruses) and statistically desirable (variance reduction and parameter identifiability). We propose an algorithm to calculate the likelihood efficiently and present a Bayesian nonparametric procedure to infer the population size trajectory. We provide a new MCMC sampler to explore the space of Tajima's genealogies and model parameters. We compare our procedure with state-of-the-art methodologies in simulations and applications. We use our method to re-examine the scientific question of how Beringian bison went extinct, analyzing modern and ancient molecular sequences of bison in North America, and to reconstruct the population size trajectory of SARS-CoV-2 from viral sequences collected in France and Germany.
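To give a feel for the cardinality gap the abstract refers to, the sketch below compares two standard combinatorial counts for the simpler isochronous case, where all n samples are taken at the same time. It assumes that Kingman tree topologies are counted as ranked labeled binary trees, n!(n-1)!/2^(n-1), and that Tajima tree topologies are counted as ranked unlabeled binary tree shapes, given by the Euler zigzag number E_{n-1}. These formulas and the function names are illustrative assumptions, not taken from the abstract, and the heterochronous spaces treated in the paper have different counts.

```python
from math import factorial


def kingman_topologies(n: int) -> int:
    """Ranked labeled binary tree topologies on n isochronous tips.

    Assumed formula (standard result): n! * (n-1)! / 2^(n-1).
    """
    return factorial(n) * factorial(n - 1) // 2 ** (n - 1)


def euler_zigzag(k: int) -> int:
    """Euler zigzag number E_k, computed with the Seidel (boustrophedon) triangle."""
    row = [1]
    for _ in range(k):
        new_row = [0]
        # Each new row holds running sums over the previous row read backwards.
        for value in reversed(row):
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]


def tajima_topologies(n: int) -> int:
    """Ranked unlabeled binary tree shapes on n isochronous tips.

    Assumed formula (standard result): Euler zigzag number E_{n-1}.
    """
    return euler_zigzag(n - 1)


if __name__ == "__main__":
    # Print both counts side by side for a few small sample sizes.
    for n in (4, 6, 8, 10):
        print(n, kingman_topologies(n), tajima_topologies(n))
```

Under these assumptions, n = 10 already gives 2,571,912,000 ranked labeled topologies against 7,936 ranked unlabeled shapes, which illustrates why a coarser genealogical space can make likelihood calculation and MCMC exploration more tractable.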

research
03/28/2019

Estimating effective population size changes from preferentially sampled genetic sequences

Coalescent theory combined with statistical modeling allows us to estima...
research
12/28/2020

Deep Evolutionary Learning for Molecular Design

In this paper, we propose a deep evolutionary learning (DEL) process tha...
research
02/14/2019

Sequential importance sampling for multi-resolution Kingman-Tajima coalescent counting

Statistical inference of evolutionary parameters from molecular sequence...
research
11/15/2017

Exact Limits of Inference in Coalescent Models

Recovery of population size history from sequence data and testing of hy...
research
05/20/2023

Power and sample size calculations for testing the ratio of reproductive values in phylogenetic samples

The quality of the inferences we make from pathogen sequence data is det...
research
12/13/2017

Geometry of the sample frequency spectrum and the perils of demographic inference

The sample frequency spectrum (SFS), which describes the distribution of...
research
03/25/2020

A micro-macro Markov chain Monte Carlo method for molecular dynamics using reaction coordinate proposals II: indirect reconstruction

We introduce a new micro-macro Markov chain Monte Carlo method (mM-MCMC)...
