A nonparametric HMM for genetic imputation and coalescent inference

11/02/2016
by   Lloyd T. Elliott, et al.

Genetic sequence data are well described by hidden Markov models (HMMs) in which latent states correspond to clusters of similar mutation patterns. Theory from statistical genetics suggests that these HMMs are nonhomogeneous (their transition probabilities vary along the chromosome) and have large support for self transitions. We develop a new nonparametric model of genetic sequence data, based on the hierarchical Dirichlet process, which supports these self transitions and nonhomogeneity. Our model provides a parameterization of the genetic process that is more parsimonious than the more general nonparametric models that have previously been applied to population genetics. We provide truncation-free MCMC inference for our model using a new auxiliary sampling scheme for Bayesian nonparametric HMMs. In a series of experiments on male X chromosome data from the Thousand Genomes Project, and on data simulated from a population bottleneck, we show the benefits of our model over the popular finite model fastPHASE, which can itself be seen as a parametric truncation of our model. We find that the number of HMM states found by our model is correlated with the time to the most recent common ancestor in population bottlenecks. This work demonstrates the flexibility of Bayesian nonparametrics applied to large and complex genetic data.
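To make the two structural properties named in the abstract concrete, the following is a minimal finite sketch, not the paper's nonparametric model or its MCMC sampler. It assumes a fixed number of states K, biallelic sites, and a fastPHASE-style transition in which the chain stays in its current state with a site-specific probability and otherwise jumps to a state drawn from shared weights. The names `transition_matrix`, `forward_loglik`, `stay_probs`, `weights` and `emit` are hypothetical and chosen only for illustration.

```python
import numpy as np

def transition_matrix(stay_prob, weights):
    """K x K row-stochastic matrix for one pair of adjacent sites.

    With probability stay_prob the chain keeps its state (self transition);
    otherwise it jumps to a state drawn from the shared weights.
    """
    weights = np.asarray(weights, dtype=float)
    K = len(weights)
    return stay_prob * np.eye(K) + (1.0 - stay_prob) * np.tile(weights, (K, 1))

def forward_loglik(obs, stay_probs, weights, emit):
    """Log-likelihood of one binary haplotype via the scaled forward recursion.

    obs        : length-T sequence of 0/1 alleles
    stay_probs : length T-1, one value per gap between sites (nonhomogeneity)
    weights    : length-K state weights, summing to 1
    emit       : T x K array, emit[t, k] = P(allele 1 at site t | state k)
    """
    obs = np.asarray(obs)
    weights = np.asarray(weights, dtype=float)
    emit = np.asarray(emit, dtype=float)
    # Per-site likelihood of the observed allele under each state.
    lik = np.where(obs[:, None] == 1, emit, 1.0 - emit)
    alpha = weights * lik[0]
    loglik = 0.0
    for t in range(1, len(obs)):
        c = alpha.sum()            # normalising constant, kept for the log-likelihood
        loglik += np.log(c)
        alpha = (alpha / c) @ transition_matrix(stay_probs[t - 1], weights) * lik[t]
    return loglik + np.log(alpha.sum())

# Toy usage: 3 sites, 2 states, a likely switch between sites 2 and 3.
toy_emit = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]])
toy_ll = forward_loglik([1, 1, 0], [0.95, 0.5], [0.6, 0.4], toy_emit)
```

Because `stay_probs` holds one value per gap between sites, the transition matrix changes along the sequence, and values close to 1 place most of the probability mass on self transitions. In this sketch K is fixed in advance, whereas the model in the abstract leaves the number of states unbounded.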


