Scaled process priors for Bayesian nonparametric estimation of the unseen genetic variation

by Federico Camerlenghi, et al.

There is growing interest in estimating the number of unseen features, driven mostly by applications in the biological sciences. A recent work highlighted the upside and the downside of the popular stable-Beta process prior, and generalizations thereof, in Bayesian nonparametric inference for the unseen-features problem: i) the downside is the limited use of the sampling information in the posterior distributions, which depend on the observable sample only through the sample size; ii) the upside is the analytical tractability and interpretability of the posterior distributions, which are Poisson distributions whose parameters are simple to compute and depend only on the sample size and the prior's parameter. In this paper, we introduce and investigate an alternative nonparametric prior, referred to as the stable-Beta scaled process prior. It is the first prior that enriches the posterior distribution of the number of unseen features by including the sampling information on the number of distinct features in the observable sample, while maintaining the same analytical tractability and interpretability as the stable-Beta process prior. Our prior leads to a negative Binomial posterior distribution whose parameters depend on the sample size, the observed number of distinct features and the prior's parameter, providing estimates that are simple, linear in the sampling information and computationally efficient. We apply our approach to synthetic and real genetic data, showing that it outperforms parametric and nonparametric competitors in terms of estimation accuracy.
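To make the contrast between the two posterior families concrete, the following minimal sketch compares the mean and variance of a Poisson distribution with those of a negative Binomial distribution. It is only an illustration of the two distribution families named above: the function names and the parameters `lam`, `r` and `p` are hypothetical placeholders, and the paper's actual mapping from the sample size, the number of distinct features and the prior's parameter to these quantities is not reproduced here.

```python
import math


def poisson_summary(lam):
    """Mean and variance of a Poisson(lam) distribution (they coincide)."""
    return {"mean": lam, "variance": lam}


def negative_binomial_summary(r, p):
    """Mean and variance of a negative Binomial distribution.

    Convention assumed here: the count of failures before the r-th success,
    with success probability p. The shape r may be any positive real number.
    """
    mean = r * (1 - p) / p
    variance = r * (1 - p) / p ** 2
    return {"mean": mean, "variance": variance}


def negative_binomial_pmf(k, r, p):
    """P(K = k) under the same convention, computed on the log scale."""
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + r * math.log(p) + k * math.log(1 - p)
    )
    return math.exp(log_pmf)


# Placeholder values chosen so that both distributions have the same mean.
poisson = poisson_summary(10.0)
neg_bin = negative_binomial_summary(r=10.0, p=0.5)
```

With these placeholder values both distributions have mean 10, while the negative Binomial has variance 20 against 10 for the Poisson. For a given mean, the negative Binomial is always the more dispersed of the two, since its variance equals the mean divided by `p`.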



