Hierarchical Species Sampling Models

by Federico Bassetti, et al.

This paper introduces a general class of hierarchical nonparametric prior distributions. The random probability measures are constructed by a hierarchy of generalized species sampling processes with possibly non-diffuse base measures. The proposed framework provides a general probabilistic foundation for hierarchical random measures with either atomic or mixed base measures and allows for studying their properties, such as the distribution of the marginal and total number of clusters. We show that hierarchical species sampling models have a Chinese Restaurant Franchise representation and can be used as prior distributions to undertake Bayesian nonparametric inference. We provide a method to sample from the posterior distribution together with some numerical illustrations. Our class of priors includes some new hierarchical mixture priors such as the hierarchical Gnedin measures, and other well-known prior distributions such as the hierarchical Pitman-Yor and the hierarchical normalized random measures.




1 Introduction

Cluster structures in multiple groups of observations can be modelled by means of hierarchical random probability measures, or hierarchical processes, that allow for heterogeneous clustering effects across groups and for sharing clusters among the groups. As an effect of the heterogeneity, in these models the number of clusters in each group (marginal number of clusters) can differ and, due to cluster sharing, the number of clusters in the entire sample (total number of clusters) can be smaller than the sum of the marginal numbers of clusters. An important example of hierarchical random measure is the Hierarchical Dirichlet Process (HDP), introduced in the seminal paper of Teh et al. (2006). The HDP involves a simple Bayesian hierarchy where the common base measure for a set of Dirichlet processes is itself distributed according to a Dirichlet process. This means that the joint law of a vector of random probability measures $(p_1,\dots,p_I)$ is

$$p_i \mid p_0 \overset{\text{iid}}{\sim} DP(\theta_1, p_0), \quad i=1,\dots,I, \qquad p_0 \mid H_0 \sim DP(\theta_0, H_0), \tag{1}$$

where $DP(\theta, p)$ denotes the Dirichlet process with base measure $p$ and concentration parameter $\theta > 0$. Once the joint law of $(p_1,\dots,p_I)$ has been specified, observations $[\xi_{i,j}]_{i=1,\dots,I;\, j\ge 1}$ are assumed to be conditionally independent given $(p_1,\dots,p_I)$ with

$$\xi_{i,j} \mid (p_1,\dots,p_I) \sim p_i, \qquad i=1,\dots,I,\ j\ge 1.$$

If the observations take values in a Polish space, then this is equivalent to partial exchangeability of the array $[\xi_{i,j}]$ (de Finetti’s representation theorem). Hierarchical processes are widely used as prior distributions in Bayesian nonparametric inference (see Teh and Jordan (2010) and references therein), by assuming that the $\xi_{i,j}$ are hidden variables describing the clustering structure of the data and that the observations $Y_{i,j}$ in the $i$-th group, $i=1,\dots,I$, are conditionally independent given $\xi_{i,j}$ with

$$Y_{i,j} \mid \xi_{i,j} \sim f(\cdot \mid \xi_{i,j}),$$

where $f$ is a suitable kernel density.

In this paper we provide a new class of hierarchical random probability measures constructed by a hierarchy of generalized species sampling sequences, and call them Hierarchical Species Sampling Models (HSSM). A species sampling sequence is an exchangeable sequence whose directing measure is a discrete random probability

$$p = \sum_{j\ge 1} q_j\, \delta_{Z_j}, \tag{2}$$

where $(Z_j)_{j\ge 1}$ and $(q_j)_{j\ge 1}$ are stochastically independent, the $Z_j$ are i.i.d. with common distribution $H$ and the non-negative weights $q_j$ sum to one almost surely. By Kingman’s theory on exchangeable partitions, any random sequence of positive weights such that $\sum_{j\ge 1} q_j \le 1$ can be associated to an exchangeable random partition of the integers $(\Pi_n)_{n\ge 1}$. Moreover, the law of an exchangeable random partition is completely described by the so-called exchangeable partition probability function (EPPF). Hence, the law of the random probability measure $p$ defined above turns out to be parametrized by an EPPF $\mathfrak{q}$ and by a base measure $H$, which is diffuse in species sampling sequences, and possibly non-diffuse in our generalized species sampling construction.

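A vector of random measures in the HSSM class can be described as follows. Denote by $SSrp(\mathfrak{q}, H)$ the law of the random probability measure $p$ defined in (2) and let $(p_0, p_1,\dots,p_I)$ be a vector of random probability measures such that

$$p_i \mid p_0 \overset{\text{iid}}{\sim} SSrp(\mathfrak{q}, p_0), \quad i=1,\dots,I, \qquad p_0 \sim SSrp(\mathfrak{q}_0, H_0), \tag{3}$$

where $H_0$ is a base measure and $\mathfrak{q}_0$ and $\mathfrak{q}$ are two EPPFs. A HSSM is an array of observations $[\xi_{i,j}]$ conditionally independent given $(p_1,\dots,p_I)$ with $\xi_{i,j} \mid (p_1,\dots,p_I) \sim p_i$, $i=1,\dots,I$ and $j\ge 1$.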

The proposed framework is general enough to provide a probabilistic foundation for both existing and novel hierarchical random measures, also allowing for non-diffuse base measures. Our HSSM class includes the HDP, its generalization given by the Hierarchical Pitman-Yor process (HPYP) (see Teh (2006); Du et al. (2010); Lim et al. (2016)) and the hierarchical normalized random measures with independent increments (HNRMI), recently studied in Camerlenghi et al. (2018). Among the novel measures, we study the hierarchical generalization of the Gnedin process (Gnedin (2010)) and of finite mixtures (e.g., Miller and Harrison (2017)), and the asymmetric hierarchical constructions with the two EPPFs $\mathfrak{q}_0$ and $\mathfrak{q}$ of different type (Du et al. (2010); Buntine and Mishra (2014)). We also consider new HSSMs with non-diffuse base measure in the spike-and-slab class of priors introduced by George and McCulloch (1993) and now widely studied in Bayesian parametric (e.g., Castillo et al. (2015), Rockova and George (2017) and Rockova (2018)) and nonparametric (e.g., Canale et al. (2017)) inference. Finally, note that non-diffuse base measures are also used in other random measure models (e.g., see Prunster and Ruggiero (2013)). Although these models are not in our class, the hierarchical species sampling specification can be used to generalize them.

By exploiting the properties of hierarchical species sampling sequences, we are able to provide the finite sample distribution of the number of clusters for each group of observations and of the total number of clusters. Moreover, we provide some new asymptotic results when the number of observations goes to infinity, thus extending the asymptotic approximations for species sampling given in Pitman (2006) and for hierarchical normalized random measures given in Camerlenghi et al. (2018).

We show that the measures in the proposed class have a Chinese Restaurant Franchise representation, which is appealing for applications to Bayesian nonparametrics, since it sheds light on the clustering mechanism of the processes and suggests a simple sampling algorithm for posterior computations whenever the EPPFs $\mathfrak{q}_0$ and $\mathfrak{q}$ are known explicitly. In the Chinese Restaurant Franchise metaphor, observations are attributed to “customers”, identified by the indexes $(i,j)$, where $i=1,\dots,I$ denote the restaurants (groups), and the customers are clustered according to “tables”. Hence, the first step of the clustering process (from now on, bottom level) is the restaurant-specific seating plan. Tables are then clustered, at a higher level of the hierarchy (top level), by means of “dishes” served to the tables. In a nutshell, observations driven by a HSSM can be described as follows:

$$\xi_{i,j} = \phi_{d_{i,c_{i,j}}}, \qquad \phi_n \overset{\text{iid}}{\sim} H_0,$$

where $c_{i,j}$ is the (random) table at which the $j$-th “customer” of “restaurant” $i$ sits, $d_{i,c}$ is the (random) index of the “dish” served at table $c$ in restaurant $i$ and the $\phi_n$ are the “dishes” drawn from the base probability measure $H_0$. The distribution of the observation process is completely described once the law of the process $[d_{i,c}, c_{i,j}]$ is specified. We will see that the process $[d_{i,c}, c_{i,j}]$ plays a role similar to the one of the random partition associated with exchangeable species sampling sequences.
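The two-level seating mechanism described above can be sketched in a few lines of Python. This is only an illustrative simulation under stated assumptions: both levels use the Pitman-Yor predictive rule (so the sketch corresponds to a hierarchical Pitman-Yor special case), and the names `py_seat`, `simulate_crf` and `base_draw` are hypothetical helpers, not notation from the paper.

```python
import random

def py_seat(counts, sigma, theta, rng):
    """Return the index of an existing block, or len(counts) for a new block,
    following the Pitman-Yor predictive rule (assumed here for both levels)."""
    k = len(counts)
    if k == 0:
        return 0  # the first item always opens a new block
    # existing block c has weight n_c - sigma, a new block has theta + sigma * k
    weights = [c - sigma for c in counts] + [theta + sigma * k]
    return rng.choices(range(k + 1), weights=weights)[0]

def simulate_crf(group_sizes, sigma, theta, sigma0, theta0, base_draw, seed=0):
    """Simulate observations group by group via tables (bottom) and dishes (top)."""
    rng = random.Random(seed)
    dish_counts = []  # number of tables, over all restaurants, serving each dish
    dishes = []       # dish values drawn from the base measure
    xi = []
    for n_i in group_sizes:
        table_counts = []  # customers per table in this restaurant
        table_dish = []    # dish index served at each table
        row = []
        for _ in range(n_i):
            c = py_seat(table_counts, sigma, theta, rng)
            if c == len(table_counts):
                # new table: its dish is chosen at the top level
                table_counts.append(0)
                d = py_seat(dish_counts, sigma0, theta0, rng)
                if d == len(dish_counts):
                    dish_counts.append(0)
                    dishes.append(base_draw(rng))
                dish_counts[d] += 1
                table_dish.append(d)
            table_counts[c] += 1
            row.append(dishes[table_dish[c]])
        xi.append(row)
    return xi, dishes
```

For instance, `simulate_crf([50, 50], 0.25, 1.0, 0.1, 2.0, lambda rng: rng.gauss(0.0, 1.0))` returns two groups whose observations share values, because tables in different restaurants can be served the same dish.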

The paper is organized as follows. Section 2 reviews exchangeable random partitions, generalized species sampling sequences and species sampling random probability measures. Some examples are discussed and new results obtained under the assumption of non-diffuse base measure. Section 3 introduces hierarchical species sampling models, presents some special cases and shows some properties such as the Chinese restaurant franchise representation, which are useful for applications to Bayesian nonparametric inference. Section 4 provides the finite-sample and the asymptotic distribution of the marginal and total number of clusters under both assumptions of diffuse and non-diffuse base measure. A Gibbs sampler for hierarchical species sampling mixtures is established in Section 5. Section 6 presents some simulation studies and a real data application.

2 Background material

Exchangeable random partitions provide an important probabilistic tool for a wide range of theoretical and applied problems. They have been used in various fields such as population genetics Ewens (1972); Kingman (1980); Donnelly (1986); Hoppe (1984), combinatorics, algebra and number theory Donnelly and Grimmett (1993); Diaconis and Ram (2012); Arratia et al. (2003), machine learning Teh (2006); Wood et al. (2009), psychology Navarro et al. (2006), and model-based clustering Lau and Green (2007); Müller and Quintana (2010). In Bayesian nonparametrics they are used to describe the latent clustering structure of infinite mixture models, see e.g. Hjort et al. (2010) and the references therein. For a comprehensive review on exchangeable random partitions from a probabilistic perspective see Pitman (2006).

Our HSSM build on exchangeable random partitions and related processes, such as species sampling sequences and species sampling random probability measures. We present their definitions and some properties which will be used in this paper.

2.1 Exchangeable partitions

A (set) partition $\pi_n$ of $[n] := \{1,\dots,n\}$ is an unordered collection $(\pi_{1,n},\dots,\pi_{k,n})$ of disjoint non-empty subsets (blocks) of $[n]$ such that $\cup_{j=1}^{k} \pi_{j,n} = [n]$. A partition $\pi_n$ has $|\pi_n| = k$ blocks (with $1 \le |\pi_n| \le n$) and we denote by $|\pi_{c,n}|$ the number of elements of the block $c = 1,\dots,k$. We denote by $\mathcal{P}_n$ the collection of all partitions of $[n]$ and, given a partition, we list its blocks in ascending order of their smallest element. In other words, a partition $\pi_n \in \mathcal{P}_n$ is coded with elements in order of appearance.

A sequence of random partitions, $\Pi = (\Pi_n)_{n\ge 1}$, is called a random partition of $\mathbb{N}$ if for each $n$ the random variable $\Pi_n$ takes values in $\mathcal{P}_n$ and, for $m < n$, the restriction of $\Pi_n$ to $[m]$ is $\Pi_m$ (consistency property). A random partition of $\mathbb{N}$ is said to be exchangeable if for every $n$ the distribution of $\Pi_n$ is invariant under the action of all permutations (acting on $\Pi_n$ in the natural way).

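Exchangeable random partitions are characterized by the fact that their distribution depends on $\Pi_n$ only through its block sizes. In fact, a random partition on $\mathbb{N}$ is exchangeable if and only if its distribution can be expressed by an exchangeable partition probability function (EPPF). An EPPF is a family of symmetric functions $\mathfrak{q}$ defined on the integers $(n_1,\dots,n_k)$, with $\sum_{i=1}^{k} n_i = n$, that satisfy the addition rule

$$\mathfrak{q}(n_1,\dots,n_k) = \sum_{j=1}^{k} \mathfrak{q}(n_1,\dots,n_j+1,\dots,n_k) + \mathfrak{q}(n_1,\dots,n_k,1)$$

(see Pitman (2006)). Footnote: an EPPF can be seen as a family of symmetric functions $\mathfrak{q}^n_k(\cdot)$ defined on $C_{n,k} = \{(n_1,\dots,n_k) \in \mathbb{N}^k : \sum_{i=1}^{k} n_i = n\}$. To lighten the notation we simply write $\mathfrak{q}$ in place of $\mathfrak{q}^n_k$. Alternatively, one can think of $\mathfrak{q}$ as a function on $\cup_{n} \cup_{k=1}^{n} C_{n,k}$.

In particular, if $\Pi$ is an exchangeable random partition of $\mathbb{N}$, there exists an EPPF $\mathfrak{q}$ such that for every $n$ and $\pi_n \in \mathcal{P}_n$

$$P\{\Pi_n = \pi_n\} = \mathfrak{q}(|\pi_{1,n}|,\dots,|\pi_{k,n}|),$$

where $k = |\pi_n|$. In other words, $\mathfrak{q}(n_1,\dots,n_k)$ corresponds to the probability that $\Pi_n$ is equal to any particular partition of $[n]$ having $k$ distinct blocks with frequencies $(n_1,\dots,n_k)$.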

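Given an EPPF $\mathfrak{q}$, one deduces the corresponding sequence of predictive distributions. Starting with $\Pi_1 = \{1\}$, given $\Pi_n = \pi_n$ (with $|\pi_n| = k$), the conditional probability of adding a new block (containing $n+1$) to $\Pi_n$ is

$$\nu_n(|\pi_{1,n}|,\dots,|\pi_{k,n}|) := \frac{\mathfrak{q}(|\pi_{1,n}|,\dots,|\pi_{k,n}|,1)}{\mathfrak{q}(|\pi_{1,n}|,\dots,|\pi_{k,n}|)}, \tag{6}$$

while the conditional probability of adding $n+1$ to the $c$-th block of $\Pi_n$ (for $c = 1,\dots,k$) is

$$\omega_{n,c}(|\pi_{1,n}|,\dots,|\pi_{k,n}|) := \frac{\mathfrak{q}(|\pi_{1,n}|,\dots,|\pi_{c,n}|+1,\dots,|\pi_{k,n}|)}{\mathfrak{q}(|\pi_{1,n}|,\dots,|\pi_{k,n}|)}. \tag{7}$$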
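As a small numerical illustration, the predictive probabilities can be computed from any EPPF as ratios of EPPF values (new block: the EPPF with an extra block of size one divided by the current EPPF; existing block: the EPPF with that block size increased by one divided by the current EPPF). The sketch below assumes the Ewens EPPF, $\theta^k \prod_c (n_c-1)! / (\theta)_n$, purely as an example; the function names are hypothetical.

```python
import math

def log_eppf_ewens(sizes, theta=1.0):
    """Log of the Ewens EPPF: theta^k * prod (n_c - 1)! / (theta)_n."""
    n, k = sum(sizes), len(sizes)
    log_rising = math.lgamma(theta + n) - math.lgamma(theta)  # log (theta)_n
    return k * math.log(theta) - log_rising + sum(math.lgamma(s) for s in sizes)

def predictive(log_eppf, sizes):
    """Return (prob. of a new block, list of prob. of joining each block)."""
    base = log_eppf(sizes)
    new = math.exp(log_eppf(sizes + [1]) - base)
    old = [
        math.exp(log_eppf(sizes[:c] + [sizes[c] + 1] + sizes[c + 1:]) - base)
        for c in range(len(sizes))
    ]
    return new, old
```

With block sizes `[3, 1, 2]` and `theta = 1.5`, the new-block probability should equal $\theta/(\theta+n) = 1.5/7.5$, and the probabilities sum to one, which is a direct consequence of the addition rule.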


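An important class of exchangeable random partitions is the Gibbs-type partitions, introduced in Gnedin and Pitman (2005) and characterized by an EPPF with a product form, that is

$$\mathfrak{q}(n_1,\dots,n_k) := V_{n,k} \prod_{c=1}^{k} (1-\sigma)_{n_c-1},$$

where $(x)_n = x(x+1)\cdots(x+n-1)$ is the rising factorial (or Pochhammer polynomial), $\sigma < 1$ and $V_{n,k}$ are positive real numbers such that $V_{1,1} = 1$ and

$$(n - \sigma k)\, V_{n+1,k} + V_{n+1,k+1} = V_{n,k}$$

for every $n \ge 1$ and $1 \le k \le n$. Hereafter, we report some important examples of Gibbs-type random partitions.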

Example 1 (Pitman-Yor two-parameter distribution).

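A noteworthy example of Gibbs-type EPPF is the so-called Pitman-Yor two-parameter family, $PY(\sigma,\theta)$. It is defined by

$$\mathfrak{q}(n_1,\dots,n_k) := \frac{\prod_{i=1}^{k-1} (\theta + i\sigma)}{(\theta+1)_{n-1}} \prod_{c=1}^{k} (1-\sigma)_{n_c-1},$$

where $(x)_n$ is the rising factorial, $n = \sum_{c} n_c$, and either $0 \le \sigma < 1$ and $\theta > -\sigma$, or $\sigma < 0$ and $\theta = |\sigma| m$ for some integer $m$, see Pitman (1995); Pitman and Yor (1997). This leads to the following predictive rules

$$\nu_n(n_1,\dots,n_k) = \frac{\theta + \sigma k}{\theta + n}, \qquad \omega_{n,c}(n_1,\dots,n_k) = \frac{n_c - \sigma}{\theta + n}, \quad c=1,\dots,k.$$

The Pitman-Yor two-parameter family generalizes the Ewens distribution Ewens (1972), which is obtained for $\sigma = 0$:

$$\mathfrak{q}(n_1,\dots,n_k) := \frac{\theta^k}{(\theta)_n} \prod_{c=1}^{k} (n_c-1)!.$$

If $\sigma < 0$ and $\theta = |\sigma| m$, then $\mathfrak{q}(n_1,\dots,n_k) = 0$ for $k > \min\{n, m\}$, which means that the maximum number of blocks in a random partition of length $n$ is $\min\{n, m\}$ with probability one. It is possible to show that these random partitions can be obtained by sampling $n$ individuals from a population composed of $m$ different species with proportions distributed according to a symmetric Dirichlet distribution of parameter $|\sigma|$, see Gnedin and Pitman (2005).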
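The bound on the number of blocks in the negative-discount case can be checked by simulation. The sketch below grows a partition sequentially with the Pitman-Yor predictive rule (a new block has weight $\theta + \sigma k$, existing block $c$ has weight $n_c - \sigma$); the function name `py_partition_sizes` and the parameter values in the usage note are illustrative assumptions.

```python
import random

def py_partition_sizes(n, sigma, theta, seed=0):
    """Block sizes of a random partition of n items grown by the
    Pitman-Yor predictive rule (unnormalized weights)."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n):
        k = len(sizes)
        # guard against tiny negative values caused by floating point
        new_weight = max(0.0, theta + sigma * k)
        weights = [s - sigma for s in sizes] + [new_weight]
        c = rng.choices(range(k + 1), weights=weights)[0]
        if c == k:
            sizes.append(1)   # open a new block
        else:
            sizes[c] += 1     # join an existing block
    return sizes
```

With `sigma = -0.5` and `theta = 1.5` (so that $m = 3$), the new-block weight vanishes once three blocks exist, so `py_partition_sizes(200, -0.5, 1.5)` never returns more than three blocks.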

Example 2 (Partitions induced by Mixtures of Finite Mixtures).

In Gnedin and Pitman (2005), it has been proved that any Gibbs-type EPPF with is a mixture of partitions with respect to , with mixing probability measure on the positive integers. In this case , where and


These Gibbs-type EPPFs can also be obtained by considering the random partitions induced by the so-called Mixture of Finite Mixtures (MFM), see Miller and Harrison (2017). When $\sigma = -1$, Gnedin (2010) exhibits a mixing distribution on the positive integers for which the EPPF has a closed form; this special case is described in the following example.

Example 3 (Gnedin model).

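Gnedin (2010) introduced a sequence of exchangeable partitions with explicit predictive weights

$$\omega_{n,c}(n_1,\dots,n_k) = \frac{(n_c+1)(n-k+\gamma)}{n^2+\gamma n+\zeta}, \qquad \nu_n(n_1,\dots,n_k) = \frac{k^2-\gamma k+\zeta}{n^2+\gamma n+\zeta},$$

for joining the $c$-th block and for opening a new block, respectively, where the parameter $(\gamma,\zeta)$ must be chosen such that $\gamma \ge 0$ and $k^2 - \gamma k + \zeta$ is (i) either (strictly) positive for all $k \in \mathbb{N}$ or (ii) positive for $k \in \{1,\dots,k_0-1\}$ and has a root at $k_0$. In fact, the Gnedin model, denoted with $GN(\gamma,\zeta)$, can be deduced as a special case of the Gibbs-type EPPF with negative $\sigma$ described in the previous example. As shown in Theorem 1 of Gnedin (2010), these random partitions have representation (9) with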


where are the complex roots of the equation , that is . See also Cerquetti (2013).

Example 4 (Poisson-Kingman partitions).

Using the ranked random discrete distribution derived from an inhomogeneous Poisson point process, Pitman (2003) introduced a very broad class of EPPFs, the so-called Poisson-Kingman exchangeable partition probability functions,


with Lévy density , where and is a measure on (absolutely continuous with respect to the Lebesgue measure) such that


and . This EPPF is related to normalized homogeneous completely random measures of James et al. (2009) (see Example 7 and Appendix A for details).

2.2 Species Sampling Models

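Kingman’s theory of random partitions sets up a one-to-one correspondence (Kingman’s correspondence) between EPPFs and distributions for decreasing sequences of random variables $(q_k^{\downarrow})_{k\ge 1}$ with $q_k^{\downarrow} \ge 0$ and $\sum_k q_k^{\downarrow} \le 1$ a.s., by using the notion of random partition induced by a sequence of random variables.

A sequence of random variables $(\zeta_n)_{n\ge 1}$ induces a random partition on $\mathbb{N}$ by the equivalence classes $i \sim j$ if and only if $\zeta_i = \zeta_j$. Note that if $(\zeta_n)_{n\ge 1}$ is exchangeable then the induced random partition is also exchangeable.

Kingman’s correspondence theorem states that for any exchangeable random partition $\Pi$ with EPPF $\mathfrak{q}$, there exists a random decreasing sequence $(q_k^{\downarrow})_{k\ge 1}$ with $q_k^{\downarrow} \ge 0$ and $\sum_k q_k^{\downarrow} \le 1$, such that if $(I_n)_{n\ge 1}$ are conditionally independent allocation variables with

$$P\{I_n = j \mid (q_k^{\downarrow})_k\} = q_j^{\downarrow} \ \ (j \ge 1), \qquad P\{I_n = -n \mid (q_k^{\downarrow})_k\} = 1 - \sum_{j\ge 1} q_j^{\downarrow}, \tag{15}$$

the partition induced by $(I_n)_{n\ge 1}$ has the same law as $\Pi$. See Kingman (1978) and Aldous (1985). When $(q_k^{\downarrow})_k$ is such that $\sum_k q_k^{\downarrow} = 1$ a.s., Kingman’s correspondence can be made more explicit by the following result: let $\mathfrak{q}$ be the EPPF of a random partition built following the above construction and let $(q_j)_{j\ge 1}$ be any (possibly random) permutation of $(q_k^{\downarrow})_k$, then

$$\mathfrak{q}(n_1,\dots,n_k) = \sum_{(j_1,\dots,j_k)} E\Big[\prod_{i=1}^{k} q_{j_i}^{n_i}\Big], \tag{16}$$

where $(j_1,\dots,j_k)$ ranges over all ordered $k$-tuples of distinct positive integers, see equation (2.14) in Pitman (2006).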

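We call Species Sampling random probability of parameters $\mathfrak{q}$ and $H$, $p \sim SSrp(\mathfrak{q}, H)$, a random distribution $p = \sum_{j\ge 1} q_j\, \delta_{Z_j}$, where $(Z_j)_{j\ge 1}$ are i.i.d. with common distribution $H$ (not necessarily diffuse) and the EPPF defined by $(q_j)_{j\ge 1}$ via (16) is $\mathfrak{q}$. Such random probability measures are sometimes called species sampling models.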

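Given an EPPF $\mathfrak{q}$ and a diffuse probability measure $H$ (i.e., $H(\{x\}) = 0$ for every $x \in \mathbb{X}$) on a Polish space $\mathbb{X}$, an exchangeable sequence $(\xi_n)_{n\ge 1}$ taking values in $\mathbb{X}$ is a Species Sampling Sequence, $SSS(\mathfrak{q}, H)$, if the law of $(\xi_n)_{n\ge 1}$ is characterized by the predictive system:

  • $P\{\xi_1 \in \cdot\} = H(\cdot)$;

  • the conditional distribution of $\xi_{n+1}$ given $(\xi_1,\dots,\xi_n)$ is

    $$P\{\xi_{n+1} \in \cdot \mid \xi_1,\dots,\xi_n\} = \sum_{c=1}^{k} \omega_{n,c}\, \delta_{\xi^*_c}(\cdot) + \nu_n\, H(\cdot), \tag{17}$$

    where $(\xi^*_1,\dots,\xi^*_k)$ is the vector of distinct observations in order of appearance, $\omega_{n,c} = \omega_{n,c}(|\Pi_{1,n}|,\dots,|\Pi_{k,n}|)$, $\nu_n = \nu_n(|\Pi_{1,n}|,\dots,|\Pi_{k,n}|)$, $k = |\Pi_n|$, $\Pi_n$ is the random partition induced by $(\xi_1,\dots,\xi_n)$ and $\nu_n$ and $\omega_{n,c}$ are related with $\mathfrak{q}$ by (6)-(7).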

See Pitman (1996).

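In fact, as shown in Proposition 11 of Pitman (1996), $(\xi_n)_{n\ge 1}$ is a $SSS(\mathfrak{q}, H)$ if and only if the $\xi_n$ are conditionally i.i.d. given $p$, with common distribution

$$p = \sum_{j\ge 1} q_j\, \delta_{Z_j} + \Big(1 - \sum_{j\ge 1} q_j\Big) H, \tag{18}$$

where $\sum_{j\ge 1} q_j \le 1$ a.s., $(Z_j)_{j\ge 1}$ and $(q_j)_{j\ge 1}$ are stochastically independent and the $Z_j$ are i.i.d. with common diffuse distribution $H$. The random probability measure in (18) is said to be proper if $\sum_{j\ge 1} q_j = 1$ a.s. In this case, $p$ is a $SSrp(\mathfrak{q}, H)$ and the EPPF of the exchangeable partition induced by $(\xi_n)_{n\ge 1}$ is $\mathfrak{q}$, where $\mathfrak{q}$ and $(q_j)_{j\ge 1}$ are related by (16). For more details see Pitman (1996).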

The name species sampling sequence is due to the following interpretation: think of $(\xi_n)_{n\ge 1}$ as an infinite population of individuals belonging to possibly infinitely many different species. The number of partition blocks $|\Pi_n|$ takes on the interpretation of the number of distinct species in the sample $(\xi_1,\dots,\xi_n)$, the $\xi^*_c$ are the observed distinct species types and the $|\Pi_{c,n}|$ are the corresponding species frequencies. In this species metaphor, $\nu_n$ is the probability of observing a new species at the $(n+1)$-th sample, while $\omega_{n,c}$ is the probability of observing an already sampled species of type $\xi^*_c$.

2.3 Generalized species sampling sequences

Usually a species sampling sequence is defined by the predictive system (17) assuming that the measure $H$ is diffuse (i.e. non-atomic). While this assumption is essential for the results recalled above to be valid, an exchangeable sequence can be defined by sampling from a $SSrp(\mathfrak{q}, H)$ for any measure $H$. More precisely, we say that a sequence $(\xi_n)_{n\ge 1}$ is a generalized species sampling sequence, $gSSS(\mathfrak{q}, H)$, if the variables $\xi_n$ are conditionally i.i.d. (given $p$) with law $p \sim SSrp(\mathfrak{q}, H)$ or, equivalently, if the directing measure of $(\xi_n)_{n\ge 1}$ is $p$. From the previous discussion, it should be clear that if $(\xi_n)_{n\ge 1}$ is a $gSSS(\mathfrak{q}, H)$ with $H$ diffuse, then it is a $SSS(\mathfrak{q}, H)$, see Proposition 13 in Pitman (1996). When $H$ is not diffuse, the relationship between the random partition induced by the sequence and the EPPF $\mathfrak{q}$ is not as simple as in the non-atomic case. Understanding this relation and the partition structure of the $gSSS$ is of paramount importance in order to define and study hierarchical models of type (3), since the random measure $p_0$ in the hierarchical specification (3) is almost surely discrete (i.e. atomic). Moreover, properties of these sequences are also relevant for studying nonparametric prior distributions with mixed base measure, such as the Pitman-Yor process with spike-and-slab base measure introduced in Canale et al. (2017).

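Given a random partition $\Pi$, let $\mathscr{C}_j(\Pi)$ be the random index of the block containing $j$, that is

$$\mathscr{C}_j(\Pi) = \sum_{c\ge 1} c\, \mathbb{I}\{j \in \Pi_{c,j}\}, \tag{19}$$

or equivalently $\mathscr{C}_j(\Pi) = c$ if $j \in \Pi_{c,n}$ for some (and hence all) $n \ge j$. In the rest of the paper, if $\pi_n = [\pi_{1,n},\dots,\pi_{k,n}]$ is a partition of $[n]$ and $\mathfrak{q}$ is any EPPF, we will write $\mathfrak{q}(\pi_n)$ in place of $\mathfrak{q}(|\pi_{1,n}|,\dots,|\pi_{k,n}|)$.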

Proposition 1.

Let , with proper and not necessarily diffuse. Let be the allocation sequence defined in (15) by taking the weights of in decreasing order. Assume that and (in the definition of ) are independent. Finally, let be a random partition, with EPPF equal to , also independent of . We define, for every ,


  • the sequence is a ;

  • the sequences and have the same law;

  • for any in the Borel $\sigma$-field of

Remark 1.

If is a and is not diffuse, then is not necessarily the EPPF induced by . To see this, take , and . Let be the random partition induced by and a random partition with EPPF . Using Proposition 1, one gets , if , which shows that the EPPF of is not .
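The effect described in Remark 1 can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: it uses the construction suggested by Proposition 1 (draw a latent partition from the EPPF, then attach to each block an independent label from the base measure), takes the Ewens EPPF for the latent partition, and uses a purely atomic base measure on a finite set. The function name and arguments are hypothetical.

```python
import random

def gsss_with_atomic_base(n, theta, atoms, probs, seed=0):
    """Sample n values by labelling the blocks of a latent Ewens partition
    with i.i.d. draws from an atomic base measure.
    Returns the sample and the number of latent blocks."""
    rng = random.Random(seed)
    sizes, labels, xi = [], [], []
    for _ in range(n):
        k = len(sizes)
        # Ewens predictive rule: existing block c ~ n_c, new block ~ theta
        c = rng.choices(range(k + 1), weights=sizes + [theta])[0]
        if c == k:
            sizes.append(0)
            labels.append(rng.choices(atoms, weights=probs)[0])
        sizes[c] += 1
        xi.append(labels[c])
    return xi, len(sizes)
```

With two atoms, for example `gsss_with_atomic_base(100, 2.0, [0, 1], [0.5, 0.5])`, the sample contains at most two distinct values even though the latent partition typically has many more blocks: distinct latent blocks that receive the same label merge, so the partition induced by the observations is coarser than the latent one.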

When the base measure is not diffuse, the representation in Proposition 1 can be used to derive the EPPF of the partition induced by any . Since this property is not used in the rest of the paper we leave it for further research. Here we focus on the distribution of the number of distinct observations in , i.e. , for any base measure . We specialize the result for the spike-and-slab type of base measures, which have been used by Suarez and Ghosal (2016) in DP and by Canale et al. (2017) in PY processes.

Corollary 1.

Let the assumptions of Proposition 1 hold.

  • If (for ) is the probability of observing exactly distinct values in the vector and let be the random partition induced by then,

  • If the base measure is in the spike-and-slab class, i.e. where , is a point of and is a diffuse measure on , then

Corollary 2.

Under the assumptions of Proposition 1, for every and one has with and


where and are related to by (6)-(7),

Remark 2.

Equation (20) in Corollary 2 differs substantially from the predictive system in (17) due to the conditioning on the latent partition . Nevertheless, if is diffuse then a.s., conditioning on is the same as conditioning on and is equal to the -th distinct observation in order of appearance. Hence, in this case, (20) reduces to (17).

Hereafter, we discuss some examples of and which will be used in our hierarchical constructions.

Example 5 (Pitman-Yor and Dirichlet processes).

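If $\mathfrak{q}$ is the two-parameter Pitman-Yor distribution $PY(\sigma,\theta)$ and $p \sim SSrp(\mathfrak{q}, H)$, then $p$ is a Pitman-Yor process (PYP), denoted with $PYP(\sigma,\theta,H)$, where $\sigma$ and $\theta$ are the discount and concentration parameters, respectively, and $H$ is the base measure (see Pitman (1996)). To see this equivalence, recall the description of the PYP in terms of its stick-breaking construction

$$p = \sum_{j\ge 1} q_j\, \delta_{Z_j}, \tag{21}$$

where $(Z_j)_{j\ge 1}$ is an i.i.d. sequence of random variables with law $H$ and $q_j = V_j \prod_{l=1}^{j-1} (1 - V_l)$, with $V_j \sim Beta(1-\sigma, \theta + j\sigma)$ independent random variables. From (21), it is clear that a $PYP(\sigma,\theta,H)$ is a $SSrp(\mathfrak{q}, H)$. Moreover, it is well known that the EPPF associated to the weights defined above is the Pitman-Yor EPPF of Example 1 (see Pitman (1995, 1996); Pitman and Yor (1997)). As a special case, if $\mathfrak{q}$ is the Ewens distribution, $PY(0,\theta)$, and $p \sim SSrp(\mathfrak{q}, H)$, then $p$ is a Dirichlet process (DP) denoted with $DP(\theta, H)$. Note that this is true even if $H$ has atoms.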
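A truncated version of the stick-breaking construction can be sketched as follows, assuming the standard Pitman-Yor stick variables $V_j \sim Beta(1-\sigma, \theta + j\sigma)$ and $0 \le \sigma < 1$, $\theta > -\sigma$. The truncation level `n_atoms`, the helper `base_draw` and the function name are illustrative assumptions; the returned `remaining` mass is the weight not yet assigned to any atom.

```python
import random

def py_stick_breaking(sigma, theta, n_atoms, base_draw, seed=0):
    """Truncated stick-breaking weights and atoms of a Pitman-Yor process."""
    rng = random.Random(seed)
    weights, atoms = [], []
    remaining = 1.0  # length of the stick still to be broken
    for j in range(1, n_atoms + 1):
        v = rng.betavariate(1.0 - sigma, theta + j * sigma)
        weights.append(remaining * v)  # q_j = V_j * prod_{l<j} (1 - V_l)
        remaining *= 1.0 - v
        atoms.append(base_draw(rng))   # Z_j drawn i.i.d. from the base measure
    return weights, atoms, remaining
```

Setting `sigma = 0` gives the usual Dirichlet process sticks. Since `base_draw` is arbitrary, the same sketch applies to a base measure with atoms, for instance `lambda rng: rng.choice([0, 1])`.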

Example 6 (Mixture of finite mixture processes).

If is the distribution described in Example 2 and , then is a mixture of finite mixture process denoted with and can be written as


where , is a p.m. on , , and


see Miller and Harrison (2017). For and given by (12), an analytical expression for is available (see Example 3) and the process is called Gnedin process (GP) denoted with .

Example 7 (Normalized completely random measures).

Assume and let be a measure satisfying (14), a Poisson-Kingman EPPF defined in (13) and a probability measure (possibly not diffuse) on . If , then is a normalized homogeneous completely random measure, , of parameters . See Appendix A for the definition. The sequence obtained by sampling from , i.e. a , is a sequence from a . All these facts are well known when is a non-atomic measure (see James et al. (2009)). The results for general measures and are implicitly contained, although not explicitly stated, in Sangalli (2006). A detailed proof of the general case is given in Proposition 13 of Appendix A.

3 Hierarchical Species Sampling Models

In this section we introduce hierarchical species sampling models (HSSMs) and study the relationship between HSSMs and a general class of hierarchical random measures which contains some well-known random measures (e.g., the HDP of Teh et al. (2006)). Some examples of HSSM are provided and some relevant properties of the HSSMs are given, such as the clustering structure and the induced random partitions.

3.1 HSSM definition, properties and examples

In the following definition a hierarchy of exchangeable random partitions is used to build hierarchical species sampling models.

Definition 3.

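Let $\mathfrak{q}$ and $\mathfrak{q}_0$ be two EPPFs and $H_0$ a probability distribution on the Polish space $\mathbb{X}$. We define an array $[\xi_{i,j}]_{i=1,\dots,I;\, j\ge 1}$ as a Hierarchical Species Sampling model, $HSSM(\mathfrak{q}, \mathfrak{q}_0, H_0)$, of parameters $(\mathfrak{q}, \mathfrak{q}_0, H_0)$, if for every vector of integer numbers $(n_1,\dots,n_I)$ and every collection of Borel sets $\{A_{i,j} : i=1,\dots,I;\ j=1,\dots,n_i\}$ it holds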


with . Moreover, the directing random measures of the array are called Hierarchical Species Sampling random measures, .

The next result states a relationship between hierarchies of and hierarchies of random probabilities, which are widely used in Bayesian nonparametrics, thus motivating the choice of name Hierarchical Species Sampling Model (HSSM) for the stochastic representation in Definition 3.

Proposition 2.

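Let $(p_0, p_1,\dots,p_I)$ be a vector of random probability measures such that

$$p_i \mid p_0 \overset{\text{iid}}{\sim} SSrp(\mathfrak{q}, p_0), \quad i=1,\dots,I, \qquad p_0 \sim SSrp(\mathfrak{q}_0, H_0),$$

where $H_0$ is a base measure. Let $[\xi_{i,j}]$ be conditionally independent given $(p_1,\dots,p_I)$ with $\xi_{i,j} \mid (p_1,\dots,p_I) \sim p_i$, where $i=1,\dots,I$ and $j\ge 1$. Then, $[\xi_{i,j}]$ is a $HSSM(\mathfrak{q}, \mathfrak{q}_0, H_0)$.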

Proposition 2 provides a probabilistic foundation to a wide class of hierarchical random measures. It is worth noticing that the base measure is not necessarily diffuse and, thanks to the properties of the SSrp and of the gSSS (see Proposition 1), the hierarchical random measures in Proposition 2 are well defined also for non-diffuse (e.g., atomic or mixed) probability measures . Our result is general enough to be valid for many of the existing hierarchical random measures (e.g., Teh et al. (2006), Teh (2006), Bacallado et al. (2017)). As with species sampling sequences, HSSMs enjoy some exchangeability and clustering properties stated in the following proposition, where, recalling (19), denotes the random index of the block of the random partition that contains .

Proposition 3.

Let be i.i.d. exchangeable partitions with EPPF and a random probability measure independent of . If are conditionally i.i.d. with law given , then the random variables


are partially exchangeable and satisfy Eq. (24). Furthermore, if is a then the sequence is a .

The stochastic representation given in Proposition 3 allows us to find a simple representation of the HSSM clustering structure (see Section 3.2). In Bayesian nonparametric inference, such a representation turns out to be very useful because it leads to a generative interpretation of the nonparametric priors in the HSSM class, and also makes it possible to design general procedures for approximate posterior inference (see Section 5).

The definition of Hierarchical Species Sampling models includes some well-known hierarchical processes and allows for the definition of new processes, as shown in the following set of examples.

Example 8 (Hierarchical Pitman-Yor process).

Let and denote PY and DP processes, respectively, given in Example 5. A vector of dependent random measures , with law characterized by the following hierarchical structure


is called Hierarchical Pitman-Yor Process, , of parameters