Bayesian Nonparametric Inference for "Species-sampling" Problems

03/11/2022
by Cecilia Balocchi, et al.

"Species-sampling" problems (SSPs) refer to a broad class of statistical problems that, given an observable sample from an unknown population of individuals belonging to some species, call for estimating features of the unknown species composition of additional unobservable samples from the same population. Among SSPs, the problems of estimating coverage probabilities, the number of unseen species and coverages of prevalences have emerged over the past three decades as the objects of numerous studies, in both methods and applications, mostly within the biological sciences but also in machine learning, electrical engineering and information theory. In this paper, we present an overview of Bayesian nonparametric (BNP) inference for these three SSPs under the popular Pitman–Yor process (PYP) prior: i) we introduce each SSP in the classical (frequentist) nonparametric framework, and review its posterior analyses in the BNP framework; ii) we improve on the computation and interpretability of existing posterior distributions, typically expressed through complicated combinatorial numbers, by establishing novel posterior representations in terms of simple compound Binomial and Hypergeometric distributions. The critical question of estimating the discount and scale parameters of the PYP prior is also considered and investigated, establishing a general property of Bayesian consistency with respect to the hierarchical Bayes and empirical Bayes approaches: the discount parameter can always be estimated consistently, whereas the scale parameter cannot be estimated consistently, thus advising caution in posterior inference. We conclude our work by discussing other SSPs, and presenting some emerging generalizations of SSPs, mostly in biological sciences, which deal with "feature-sampling" problems, multiple populations of individuals sharing species and classes of Markov chains.
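
As a rough illustration of the simplest SSP mentioned above (the probability that the next observation belongs to a previously unseen species), the sketch below contrasts the classical Good-Turing estimate with the standard predictive probability of a new species under a PYP prior. It is not taken from the paper: the function names, the toy sample, and the parameter values `sigma` (discount) and `theta` (scale) are assumptions chosen for illustration only.

```python
from collections import Counter


def good_turing_missing_mass(sample):
    """Good-Turing estimate of the unseen-species probability: m1 / n,
    where m1 is the number of species observed exactly once."""
    n = len(sample)
    counts = Counter(sample)
    m1 = sum(1 for c in counts.values() if c == 1)
    return m1 / n


def pyp_missing_mass(sample, sigma, theta):
    """Predictive probability of a new species under a PYP(sigma, theta)
    prior, given n observations with k distinct species:
    (theta + k * sigma) / (theta + n)."""
    if not (0 <= sigma < 1) or theta <= -sigma:
        raise ValueError("require 0 <= sigma < 1 and theta > -sigma")
    n = len(sample)
    k = len(set(sample))
    return (theta + k * sigma) / (theta + n)


# Hypothetical toy sample: n = 7 individuals, k = 4 species, 2 singletons.
sample = ["a", "a", "b", "c", "c", "c", "d"]
gt = good_turing_missing_mass(sample)
pyp = pyp_missing_mass(sample, sigma=0.5, theta=1.0)
print(f"Good-Turing: {gt:.4f}, PYP: {pyp:.4f}")
```

The PYP value depends on the sample only through `n` and `k`, while Good-Turing uses the number of singletons; in practice `sigma` and `theta` would have to be estimated, which is where the consistency caveat about the scale parameter becomes relevant.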


research
03/14/2022

A Bayesian Nonparametric Approach to Species Sampling Problems with Ordering

Species-sampling problems (SSPs) refer to a vast class of statistical pr...
research
06/29/2021

Scaled process priors for Bayesian nonparametric estimation of the unseen genetic variation

There is a growing interest in the estimation of the number of unseen fe...
research
04/19/2018

Bayesian nonparametric analysis of Kingman's coalescent

Kingman's coalescent is one of the most popular models in population gen...
research
02/11/2021

The Bernstein-von Mises theorem for the Pitman-Yor process of nonnegative type

The Pitman-Yor process is a nonparametric species sampling prior with nu...
research
04/07/2021

Near-optimal estimation of the unseen under regularly varying tail populations

Given n samples from a population of individuals belonging to different ...
research
07/12/2017

Estimating the unseen from multiple populations

Given samples from a distribution, how many new elements should we expec...
research
02/27/2019

A Good-Turing estimator for feature allocation models

Feature allocation models generalize species sampling models by allowing...
