Multivariate phase-type theory for the site frequency spectrum

01/13/2021
by Asger Hobolth, et al.

Linear functions of the site frequency spectrum (SFS) play a major role in understanding and investigating genetic diversity. Estimators of the mutation rate (e.g. based on the total number of segregating sites or on the average number of pairwise differences) and tests for neutrality (e.g. Tajima's D) are perhaps the best-known examples. The distribution of linear functions of the SFS is important for constructing confidence intervals for the estimators and for determining significance thresholds for neutrality tests. These distributions are often approximated by simulation. In this paper we use multivariate phase-type theory to specify, characterize and calculate the distribution of linear functions of the site frequency spectrum. In particular, we show that many of the classical estimators of the mutation rate follow a discrete phase-type distribution. Neutrality tests, however, are generally not discrete phase-type distributed. For neutrality tests we derive the probability generating function using continuous multivariate phase-type theory, and numerically invert the function to obtain the distribution. A main result is an analytically tractable formula for the probability generating function of the SFS. The phase-type methodology is implemented in the R package phasty, and R code reproducing our results is available as an accompanying vignette.
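As a concrete illustration, here is a minimal Python sketch. It is not the phasty package or the paper's code, and the function names are hypothetical. It assumes the standard neutral coalescent with constant population size and the infinite-sites model, with θ the scaled mutation rate. It writes Watterson's estimator and the average number of pairwise differences as linear functions of the SFS vector (ξ_1, …, ξ_{n-1}). It then computes the exact distribution of the number of segregating sites S. While k lineages remain, the number of mutations before the next coalescence is geometric with success probability (k-1)/(k-1+θ), so S is a sum of independent geometric variables. This is a simple example of a discrete phase-type distribution, though the paper's own phase-type representation may be set up differently.

```python
from math import comb


def harmonic(n, power=1):
    """Return sum_{i=1}^{n-1} 1 / i**power (a_n for power=1, b_n for power=2)."""
    return sum(1.0 / i**power for i in range(1, n))


def watterson(sfs):
    """Watterson's estimator S / a_n.

    sfs[i-1] = xi_i, the number of sites where the derived allele is carried
    by exactly i of the n sampled sequences, so len(sfs) == n - 1.
    """
    n = len(sfs) + 1
    return sum(sfs) / harmonic(n)


def tajima_pi(sfs):
    """Average pairwise differences: sum_i i * (n - i) * xi_i / C(n, 2)."""
    n = len(sfs) + 1
    return sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / comb(n, 2)


def segregating_sites_pmf(n, theta, max_s):
    """Return [P(S = 0), ..., P(S = max_s)] for a sample of n sequences.

    Assumes the standard coalescent with infinite sites. While k lineages
    remain, mutations (rate k*theta/2) compete with coalescence (rate
    k*(k-1)/2), so the number of mutations in that epoch is geometric on
    {0, 1, ...} with success probability p_k = (k-1) / (k-1+theta).
    S is the sum over k = 2..n of these independent geometrics, obtained here
    by repeated convolution truncated at max_s.
    """
    pmf = [1.0] + [0.0] * max_s  # empty sum: point mass at 0
    for k in range(2, n + 1):
        p = (k - 1) / (k - 1 + theta)
        geo = [p * (1 - p) ** m for m in range(max_s + 1)]
        pmf = [sum(pmf[j] * geo[s - j] for j in range(s + 1))
               for s in range(max_s + 1)]
    return pmf


if __name__ == "__main__":
    n, theta = 10, 2.0
    pmf = segregating_sites_pmf(n, theta, max_s=200)
    mean = sum(s * q for s, q in enumerate(pmf))
    print("P(S = 0) =", pmf[0])
    print("E[S] =", mean, "vs theta * a_n =", theta * harmonic(n))
```

Under these assumptions the mean and variance of S should agree, up to truncation at `max_s`, with the classical values θa_n and θa_n + θ²b_n, where a_n = Σ 1/i and b_n = Σ 1/i² over i = 1, …, n-1. Plugging the expected spectrum E[ξ_i] = θ/i into either estimator returns θ. Neutrality statistics such as Tajima's D are not covered by this sketch. For those, the abstract describes deriving a probability generating function and inverting it numerically.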
