A multiset is a set with possible repetitions of its elements. A popular class of models for random multisets in Bayesian nonparametric applications is the class of negative binomial processes, which have been applied as topic models in document analysis and as latent factor models for image segmentation and object detection in computer vision, among other applications [HR2014, BMPJpre, ZHDC12]. In this article, we study exchangeable sequences of point processes $X_1,X_2,\ldots$ on a measurable space $\Omega$ that are rendered conditionally i.i.d. negative binomial processes by a random measure $H$ called the base measure. Borrowing language from the theory of exchangeable sequences, we say that $H$ directs the exchangeable sequence $X_1,X_2,\ldots$. Unconditionally, the measures $X_i$ will in general not be negative binomial processes, and we therefore refer to $X_1,X_2,\ldots$ as an exchangeable sequence of multisets directed by $H$.
In this work, we present algorithms to construct $X_1,X_2,\ldots$ from any exchangeable sequence of Bernoulli processes $Y_1,Y_2,\ldots$ directed by $H$. (We review Bernoulli processes in Section 2.) So long as the total mass of $H$ is almost surely (a.s.) finite, our constructions are also finitary. That is, even if the support of $H$ is a.s. infinite, our construction of each $X_i$ is, with probability one, entirely determined by the finite set of atoms in the support of some finite prefix of $Y_1,Y_2,\ldots$. In particular, our construction makes no direct use of $H$, which therefore need not even be represented explicitly. Such constructions are useful for several reasons, notably:
In Bayesian nonparametric applications, we are interested in cases when $H$ has a countably infinite set of atoms, e.g., the popular beta process [Hjort1990, TJ2007].
Different models may be imposed on the base measure for various applications, e.g., generalizations of the beta process [TG2009, Roy13CUP, HR14Gibbs] and hierarchies thereof [TJ2007, Roy13CUP], in which case it is convenient to have a black-box method.
For the case when $H$ is a beta process (a precise definition is given in Section 3), a finitary construction for $X_1,X_2,\ldots$ was given by HR2014 (as well as by ZMPS2016 for a reparameterization of the beta process), which takes advantage of the conjugacy between beta processes and negative binomial processes [BMPJpre, ZHDC12, Hjort1990, Kim1999]. However, this approach does not generalize easily to other classes of base measures. Therefore, instead of tailoring constructions to different cases, our approach provides a black-box method to construct $X_1,X_2,\ldots$, assuming only that we have access to some exchangeable sequence of Bernoulli processes directed by $H$.
Finitary constructions for exchangeable sequences of Bernoulli processes are known for several different classes of directing random base measures. For example, when $H$ is a beta process, one finitary construction for $Y_1,Y_2,\ldots$ is provided by the Indian buffet process (IBP) [GG06, GGS2007]. TG2009 generalized the beta process to the stable beta process and provided a finitary construction for $Y_1,Y_2,\ldots$ in this case by generalizing the IBP to the stable IBP (studied further by BJP2012), which was shown to exhibit power-law behavior in latent feature modeling applications. Roy13CUP provided a further generalization to a large class of random base measures called generalized beta processes, along with a corresponding generalization of the IBP. A special subclass called Gibbs-type beta processes (corresponding to a Gibbs-type IBP) was studied by HR14Gibbs, which broadened the range of attainable power-law behaviors beyond those achieved with the stable IBP.
Another useful modeling paradigm is obtained by organizing random base measures into hierarchies (see TJ2007 for the prototypical example). Such random base measures are useful in admixture or mixed-membership models, where there is latent structure shared between several distinct groups of data. Roy13CUP provided a finitary construction for $Y_1,Y_2,\ldots$ directed by a hierarchy of generalized beta processes, which, as discussed, includes hierarchies of all previously mentioned random base measures as special cases. In Section 3, we illustrate the application of our construction when the directing random measure is a hierarchy of beta processes.
Finally, we note that alternative methods to construct $X_1,X_2,\ldots$ directed by random base measures with a countably infinite number of atoms may be obtained with stick-breaking constructions [TGG07, PZWGC2010] or inverse Lévy measure methods [WI1998]. These constructions truncate the number of atoms in the underlying base measure and are therefore not exact, so Markov chain Monte Carlo (MCMC) techniques must be introduced to remove this error, as in [BMPJpre, ZHDC12]. Again, these approaches must be tailored to each specific case and are only available if such alternative representations of the random base measure exist. Moreover, the resulting representation is only exact in the asymptotic regime of the Markov chain. Our approach instead avoids representing the underlying random base measure altogether, which has practical benefits (in addition to being a black-box method), as no MCMC subroutines need to be implemented to simulate $X_1,X_2,\ldots$.
The remainder of the article is organized as follows. We provide background and formally define notation in Section 2. In Section 3, we present our black-box construction in the case when the parameter $r$ (of the law of the negative binomial process) is an integer, where the construction has an intuitive interpretation. We conclude in Section 4 by applying a rejection sampling subroutine to generalize our constructions to any parameter $r>0$.
2. Notation and background
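The focus of this article is on exchangeable sequences of random multisets and their de Finetti (mixing) measures. Let $\Omega$ be a complete, separable metric space equipped with its Borel $\sigma$-algebra $\mathcal{A}$, let $\mathbb{Z}_+:=\{0,1,2,\ldots\}$ denote the non-negative integers, and put $\overline{\mathbb{Z}}_+:=\mathbb{Z}_+\cup\{\infty\}$. We represent multisets of $\Omega$ by $\overline{\mathbb{Z}}_+$-valued random measures. In particular, by a point process, we will mean a random measure $X$ on $(\Omega,\mathcal{A})$ such that $X(E)$ is a $\overline{\mathbb{Z}}_+$-valued random variable for every $E\in\mathcal{A}$. Because $\Omega$ is Borel, we may write $X=\sum_{k=1}^{\kappa}\delta_{\gamma_k}$ for some random element $\kappa$ in $\overline{\mathbb{Z}}_+$ and random elements $\gamma_1,\gamma_2,\ldots$ (not necessarily distinct) in $\Omega$. We will take $X$ to represent the multiset of its unique elements $s$ with corresponding multiplicities $X\{s\}$.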
2.1. Completely random measures
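We build on the theory of completely random measures [Kallenberg2002, Ch. 12; Kingman1967]. Recall that every completely random measure $\mu$ on $\Omega$ can be written as a sum of three independent parts
$$\mu=\alpha+\sum_{s\in D}\vartheta_s\,\delta_s+\int_{\Omega\times(0,\infty)}x\,\delta_s\,\eta(\mathrm{d}s,\mathrm{d}x)\quad\text{a.s.},$$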
called the diffuse, fixed, and ordinary components, respectively, where:
$\alpha$ is a non-random, non-atomic measure;
$D\subseteq\Omega$ is a non-random countable set whose elements are referred to as the fixed atoms and whose masses $\vartheta_s$, $s\in D$, are independent random variables in $\mathbb{R}_+$ (the non-negative real numbers);
$\eta$ is a Poisson process on $\Omega\times(0,\infty)$ whose intensity measure $\nu$ is $\sigma$-finite and has diffuse projections onto $\Omega$, i.e., the measure $\nu(\cdot\times(0,\infty))$ on $\Omega$ is non-atomic.
2.2. Base measures
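Let $\mathcal{B}$ denote the space of $\sigma$-finite measures on $(\Omega,\mathcal{A})$ whose atoms have measure less than one, equipped with the $\sigma$-algebra generated by the projection maps $B\mapsto B(E)$, for all $E\in\mathcal{A}$. Elements in $\mathcal{B}$ are called base measures. For the remainder of the article, fix a base measure $B$ in $\mathcal{B}$ given by
$$B=B_c+\sum_{s\in A}b_s\,\delta_s$$
for some non-atomic measure $B_c$; a countable set $A\subseteq\Omega$; and constants $b_s$ in $(0,1)$, $s\in A$.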
2.3. Negative binomial processes
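We say that a random variable $Z$ in $\mathbb{Z}_+$ has a negative binomial distribution with parameters $r>0$ and $p\in[0,1)$, written $Z\sim\mathrm{NB}(r,p)$, if its probability mass function (p.m.f.) is given by
$$\Pr\{Z=k\}=\frac{(r)_k}{k!}\,p^k(1-p)^r,\qquad k\in\mathbb{Z}_+,$$
where $(r)_k:=r(r+1)\cdots(r+k-1)$ denotes the $k$-th rising factorial (and its analytic continuation $(r)_k=\Gamma(r+k)/\Gamma(r)$).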
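The p.m.f. can be checked numerically. The Python sketch below (the function names are ours, not the article's) evaluates the rising factorial through the Gamma function, one standard way to realize its analytic continuation for real $r>0$, and computes the negative binomial p.m.f. in log space so that large $k$ does not overflow.

```python
import math


def rising_factorial(r: float, k: int) -> float:
    """(r)_k = r (r + 1) ... (r + k - 1), computed as Gamma(r + k) / Gamma(r) for real r > 0."""
    return math.exp(math.lgamma(r + k) - math.lgamma(r))


def nb_pmf(k: int, r: float, p: float) -> float:
    """P{Z = k} for Z ~ NB(r, p), i.e. (r)_k / k! * p**k * (1 - p)**r.

    Assumes r > 0, 0 < p < 1 and k a non-negative integer; evaluated in log space.
    """
    log_pmf = (math.lgamma(r + k) - math.lgamma(r) - math.lgamma(k + 1)
               + k * math.log(p) + r * math.log1p(-p))
    return math.exp(log_pmf)


if __name__ == "__main__":
    r, p = 2.5, 0.4
    probs = [nb_pmf(k, r, p) for k in range(200)]
    print(sum(probs))                                   # total mass, should be 1 up to round-off
    print(sum(k * q for k, q in enumerate(probs)))      # mean, compare with r p / (1 - p)
```

For $r=1$ the p.m.f. reduces to $p^k(1-p)$, the geometric distribution, which gives a quick spot check.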
Definition 2.1 (negative binomial process).
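We call a point process $X$ on $\Omega$ a negative binomial process with parameter $r>0$ and base measure $B$, written $X\sim\mathrm{NBP}(r,B)$, if it is purely atomic and completely random with fixed component
$$\sum_{s\in A}\zeta_s\,\delta_s,\qquad\zeta_s\overset{\text{ind}}{\sim}\mathrm{NB}(r,b_s),\quad s\in A,$$
and with an ordinary component that has intensity measure
$$\nu_X(\mathrm{d}s,\mathrm{d}x):=r\,B_c(\mathrm{d}s)\,\delta_1(\mathrm{d}x)\quad\text{on }\Omega\times(0,\infty).$$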
The fixed component of this process was originally defined in [BMPJpre, ZHDC12], and by ThibauxThesis for the case when $r=1$, corresponding to a geometric process. The ordinary component was additionally specified in HR2014; we note that it is simply a Poisson (point) process on $\Omega$ with intensity measure $rB_c$, and in Section 3 we will see that this specification is natural.
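We may alternatively characterize the law of a negative binomial process by its Laplace functional; the following may be verified with an application of the Lévy-Khinchin theorem (see [HR2014, Sec. 2.2]).
Let $X\sim\mathrm{NBP}(r,B)$. The Laplace functional of the law of $X$ is given by
$$\mathbb{E}\big[e^{-X(f)}\big]=\exp\Big(-r\int_\Omega\big(1-e^{-f(s)}\big)\,B_c(\mathrm{d}s)\Big)\prod_{s\in A}\Big(\frac{1-b_s}{1-b_s e^{-f(s)}}\Big)^{r}$$
for every measurable function $f:\Omega\to\mathbb{R}_+$, where $X(f):=\int f\,\mathrm{d}X$.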
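To make the Laplace functional concrete, the following Python sketch compares its closed form with a Monte Carlo estimate obtained by simulating a negative binomial process directly from its fixed and ordinary components. It assumes, purely for illustration, that $\Omega=[0,1]$, that the diffuse part of the base measure is a multiple of the uniform distribution, and that there are finitely many fixed atoms; the helper names are hypothetical.

```python
import numpy as np


def nbp_laplace(f, r, atoms, masses, diffuse_mass, grid=10_000):
    """Closed-form E[exp(-X(f))] for X ~ NBP(r, B).

    Illustrative assumption: Omega = [0, 1] and
    B = diffuse_mass * Uniform[0, 1] + sum_s masses[s] * delta_{atoms[s]}.
    """
    u = (np.arange(grid) + 0.5) / grid                      # midpoint rule for the B_c integral
    log_diffuse = -r * diffuse_mass * np.mean(1.0 - np.exp(-f(u)))
    b = np.asarray(masses, dtype=float)
    fa = f(np.asarray(atoms, dtype=float))
    log_fixed = r * np.sum(np.log1p(-b) - np.log1p(-b * np.exp(-fa)))
    return float(np.exp(log_diffuse + log_fixed))


def sample_nbp(rng, r, atoms, masses, diffuse_mass):
    """One draw of X ~ NBP(r, B) as (locations, multiplicities), same B as above."""
    # Fixed component: independent NB(r, b_s) masses. NumPy counts failures before n
    # successes with success probability p, so NB(r, b) here is negative_binomial(r, 1 - b).
    fixed = rng.negative_binomial(r, 1.0 - np.asarray(masses, dtype=float))
    # Ordinary component: Poisson process with intensity r * B_c (all masses equal to one).
    n = rng.poisson(r * diffuse_mass)
    locs = np.concatenate([np.asarray(atoms, dtype=float), rng.uniform(0.0, 1.0, size=n)])
    mult = np.concatenate([fixed, np.ones(n, dtype=np.int64)])
    return locs, mult


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f = lambda x: 2.0 * x                                    # a non-negative test function
    args = (2.5, [0.2, 0.7], [0.3, 0.6], 1.5)                # r, atoms, b_s, B_c(Omega)
    draws = [sample_nbp(rng, *args) for _ in range(20_000)]
    mc = np.mean([np.exp(-np.sum(m * f(l))) for l, m in draws])
    print(nbp_laplace(f, *args), mc)                         # should agree to about 2 decimals
```

NumPy's `negative_binomial(n, p)` counts failures before `n` successes, so the sketch maps the parameters as $n=r$ and $p=1-b_s$ to match the convention used here.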
2.4. Bernoulli processes
As mentioned in the introduction, our algorithms require an exchangeable sequence of Bernoulli processes, a class of completely random measures defined in this context by Hjort1990 and TJ2007; these should not be confused with the classic Bernoulli process studied in statistics and probability.
Definition 2.2 (Bernoulli process).
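We call a point process $Y$ on $\Omega$ a Bernoulli process with base measure $B$, written $Y\sim\mathrm{BeP}(B)$, if it is purely atomic and completely random with fixed component
$$\sum_{s\in A}\xi_s\,\delta_s,\qquad\xi_s\overset{\text{ind}}{\sim}\mathrm{Bernoulli}(b_s),\quad s\in A,$$
and with an ordinary component that has intensity measure
$$\nu_Y(\mathrm{d}s,\mathrm{d}x):=B_c(\mathrm{d}s)\,\delta_1(\mathrm{d}x)\quad\text{on }\Omega\times(0,\infty).$$
Note that the ordinary component here is a Poisson process on $\Omega$ with intensity measure $B_c$. Also note that the Bernoulli process is a.s. simple (i.e., has unit-valued atomic masses), and it is a.s. finite whenever $B(\Omega)<\infty$.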
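A Bernoulli process with a finite base measure can be simulated directly from the two components. This Python sketch assumes the illustrative setting $\Omega=[0,1]$, a diffuse part equal to a multiple of the uniform distribution, and finitely many fixed atoms; since the process is a.s. simple, it returns only the support. The function name is hypothetical.

```python
import numpy as np


def sample_bep(rng, atoms, masses, diffuse_mass):
    """One draw of Y ~ BeP(B), returned as a sorted array of atom locations.

    Illustrative assumption: Omega = [0, 1] and
    B = diffuse_mass * Uniform[0, 1] + sum_s masses[s] * delta_{atoms[s]}.
    """
    atoms = np.asarray(atoms, dtype=float)
    # Fixed component: atom s is present independently with probability b_s.
    keep = rng.random(atoms.size) < np.asarray(masses, dtype=float)
    # Ordinary component: Poisson(B_c(Omega)) many i.i.d. locations from B_c / B_c(Omega).
    fresh = rng.uniform(0.0, 1.0, size=rng.poisson(diffuse_mass))
    return np.sort(np.concatenate([atoms[keep], fresh]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sizes = [sample_bep(rng, [0.25, 0.5], [0.3, 0.9], 2.0).size for _ in range(10_000)]
    print(np.mean(sizes))        # compare with B(Omega) = 2.0 + 0.3 + 0.9 = 3.2
```

The expected number of atoms equals the total mass of the base measure, which is why a finite total mass yields an a.s. finite process.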
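We now summarize the article more formally: our results provide an algorithm that is parameterized by some $r>0$, takes as input an exchangeable sequence of simple point processes $Y_1,Y_2,\ldots$ on $\Omega$, and outputs a sequence of point processes $X_1,X_2,\ldots$ on $\Omega$. If $Y_1,Y_2,\ldots$ satisfies
$$Y_1,Y_2,\ldots\mid H\ \overset{\text{iid}}{\sim}\ \mathrm{BeP}(H)$$
for some random element $H$ in $\mathcal{B}$, then $X_1,X_2,\ldots$ satisfies
$$X_1,X_2,\ldots\mid H\ \overset{\text{iid}}{\sim}\ \mathrm{NBP}(r,H).$$
Importantly, we need not explicitly represent $H$, and we note in particular that, other than being $\sigma$-finite, the random base measure $H$ may be arbitrary: it need be neither purely atomic nor completely random.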
3. A negative binomial urn scheme
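We now present our main construction for the case when $r$ is a positive integer, followed by a few demonstrative examples. Let $r\in\mathbb{N}$ and let $(Y_{i,j})_{i,j\in\mathbb{N}}$ be an array of simple point processes on $\Omega$. For every $i\in\mathbb{N}$,
1. Put $\Psi_i:=\bigcup_{j\in\mathbb{N}}\operatorname{supp}(Y_{i,j})$, where $\operatorname{supp}(Y)$ denotes the support of $Y$;
2. We may write $\Psi_i=\{\gamma_{i,1},\gamma_{i,2},\ldots,\gamma_{i,\kappa_i}\}$ for some random element $\kappa_i$ in $\overline{\mathbb{Z}}_+$ and a.s. unique random elements $\gamma_{i,1},\gamma_{i,2},\ldots$ in $\Omega$;
3. Define a sequence of random variables $\zeta_{i,1},\zeta_{i,2},\ldots$ in $\overline{\mathbb{Z}}_+$ where, for every $k\le\kappa_i$,
$$\zeta_{i,k}:=\sum_{j=1}^{\tau_{i,k}}Y_{i,j}\{\gamma_{i,k}\},\qquad\tau_{i,k}:=\inf\Big\{n\in\mathbb{N}:\sum_{j=1}^{n}\big(1-Y_{i,j}\{\gamma_{i,k}\}\big)=r\Big\}.$$

For every $i\in\mathbb{N}$, put $X_i:=\sum_{k=1}^{\kappa_i}\zeta_{i,k}\,\delta_{\gamma_{i,k}}$. These definitions imply that the measure $X_i$ is a.s. concentrated on a subset of $\bigcup_{j=1}^{r}\operatorname{supp}(Y_{i,j})$, i.e., $X_i\big(\Omega\setminus\bigcup_{j=1}^{r}\operatorname{supp}(Y_{i,j})\big)=0$ a.s., because an atom absent from $Y_{i,1},\ldots,Y_{i,r}$ records $r$ failures before any success.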
Definition 3.1 (negative binomial urn scheme).
We call $X_1,X_2,\ldots$ a negative binomial urn scheme induced by $(Y_{i,j})_{i,j\in\mathbb{N}}$ with parameter $r$.
For intuition, if we think of $(Y_{i,j}\{\gamma_{i,k}\})_{j\in\mathbb{N}}$ in Item 3 as a sequence of independent Bernoulli trials, each with the same unknown success probability, then $\zeta_{i,k}$ simply counts the number of successes in the sequence before $r$ failures, i.e., it has a negative binomial distribution. The construction of negative binomial variates from Bernoulli variates is central to the article (see Appendix A for a precise statement), and the following result may be thought of as an infinite dimensional analogue of this construction.
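The construction of a single row can be sketched in Python. In this sketch each $Y_{i,j}$ is represented as a Python set of atoms (an assumption of the sketch); only atoms in the first $r$ processes are tracked, and further processes are read lazily until every tracked atom has recorded its $r$-th failure, which mirrors the finitary nature of the construction. The row generator draws i.i.d. Bernoulli processes for a fixed, non-random base measure with two labelled fixed atoms and a diffuse part on $[0,1]$; all names are hypothetical.

```python
from typing import Dict, Hashable, Iterator, List, Set

import numpy as np


def nb_urn_row(row: Iterator[Set[Hashable]], r: int) -> Dict[Hashable, int]:
    """X_i from one row Y_{i,1}, Y_{i,2}, ... of simple point processes (each a set of atoms).

    Every atom's count is its number of appearances before it has been absent r times.
    Assumes each atom's success probability is below one, so the loop stops a.s.
    """
    ys: List[Set[Hashable]] = [next(row) for _ in range(r)]
    tracked = set().union(*ys)                 # only these atoms can get a positive count
    successes = {s: 0 for s in tracked}
    failures = {s: 0 for s in tracked}
    j = 0
    while tracked:
        if j == len(ys):
            ys.append(next(row))               # draw Y_{i,j+1} only when it is needed
        for s in list(tracked):
            if s in ys[j]:
                successes[s] += 1
            else:
                failures[s] += 1
                if failures[s] == r:           # r-th failure: the count for s is final
                    tracked.discard(s)
        j += 1
    return {s: k for s, k in successes.items() if k > 0}


def bernoulli_row(rng, fixed: Dict[str, float], diffuse_mass: float) -> Iterator[Set[Hashable]]:
    """I.i.d. BeP(B) draws for an assumed B: labelled fixed atoms plus diffuse_mass * Uniform[0, 1]
    (ordinary atoms are represented by their float locations)."""
    while True:
        y: Set[Hashable] = {s for s, b in fixed.items() if rng.random() < b}
        y.update(rng.uniform(0.0, 1.0, size=rng.poisson(diffuse_mass)).tolist())
        yield y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xs = [nb_urn_row(bernoulli_row(rng, {"a": 0.5, "b": 0.2}, 1.0), 2) for _ in range(5_000)]
    print(np.mean([x.get("a", 0) for x in xs]))                          # compare r b / (1 - b) = 2.0
    print(np.mean([sum(isinstance(s, float) for s in x) for x in xs]))   # compare r * B_c(Omega) = 2.0
```

With a non-random base measure, the theorem that follows predicts that each $X_i$ is a negative binomial process, so the fixed atom with mass $b$ should have mean count near $rb/(1-b)$ and the ordinary atoms should all have multiplicity one.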
Theorem 3.1. Let $r\in\mathbb{N}$, let $H$ be a random element in $\mathcal{B}$, and let $(Y_{i,j})_{i,j\in\mathbb{N}}$ be an exchangeable array of Bernoulli processes directed by $H$. Let $X_1,X_2,\ldots$ be a negative binomial urn scheme induced by $(Y_{i,j})$ with parameter $r$. Then, conditioned on $H$, the $X_i$ are i.i.d. negative binomial processes with parameter $r$ and base measure $H$.
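Proof. It is clear that the random measures $X_1,X_2,\ldots$ are conditionally independent given $H$, since they are built from disjoint rows of the array. Fix $i\in\mathbb{N}$. We must show that $X_i\mid H\sim\mathrm{NBP}(r,H)$. Let $f:\Omega\to\mathbb{R}_+$ be a measurable function and recall the notation $X_i(f)=\int f\,\mathrm{d}X_i$. We have
$$e^{-X_i(f)}=g(Y_{i,1},Y_{i,2},\ldots)$$
for some measurable function $g$. Let $\mathcal{L}:=\Pr\{(Y_{i,j})_{j\in\mathbb{N}}\in\cdot\mid H\}$, and note that $\mathcal{L}$ is $\sigma(H)$-measurable, so there exists a measurable function $\phi$ such that $\mathcal{L}=\phi(H)$ a.s. We have $\phi(H)=\mathrm{BeP}(H)^{\infty}$ a.s., where the right-hand side denotes the infinite dimensional product measure, and so by the disintegration theorem [Kallenberg2002, Thm. 6.4] we have
$$\mathbb{E}\big[e^{-X_i(f)}\mid H\big]=\Phi(H)\quad\text{a.s.},\qquad\text{where }\Phi(B):=\int g\,\mathrm{d}\,\mathrm{BeP}(B)^{\infty}.$$
We may therefore characterize $\Phi(B)$ for a non-random base measure $B$ using the structure of $\mathrm{BeP}(B)$ (i.e., without regard to the random base measure $H$), and if we show that it has the form of the Laplace functional of the (law of the) negative binomial process $\mathrm{NBP}(r,B)$, then the identity above extends the result to the randomization $B=H$.

Let $H=B$ a.s. for some non-random measure $B\in\mathcal{B}$ whose set of atoms we denote by $A$. For every $s\in A$, the sequence $(Y_{i,j}\{s\})_{j\in\mathbb{N}}$ is i.i.d. $\mathrm{Bernoulli}(b_s)$, independently over $s$, and $X_i\{s\}$ counts its successes before $r$ failures. Hence, by Appendix A, the $X_i\{s\}$, $s\in A$, are independent random variables and
$$X_i\{s\}\sim\mathrm{NB}(r,b_s),\qquad s\in A.$$
Let $Y'_{i,j}:=Y_{i,j}(\cdot\setminus A)$, for every $j\in\mathbb{N}$, be the restrictions of the Bernoulli processes to their ordinary components (which we recall are independent from the fixed components). Then the $Y'_{i,j}$ are independent Poisson processes (independent also from $(X_i\{s\})_{s\in A}$), each with intensity measure $B_c$. Because this common intensity is diffuse, a.s. no two of them share an atom, so an atom of $Y'_{i,j}$ is a success at trial $j$ and a failure at every other trial; it therefore contributes $1$ to $X_i$ if $j\le r$ and $0$ otherwise. It follows that $X_i(\cdot\setminus A)=Y'$ a.s., where $Y':=\sum_{j=1}^{r}Y'_{i,j}$. It is straightforward to verify that $Y'$ is a.s. simple, and so $Y'$ is a Poisson process with intensity measure $rB_c$. Then
$$\begin{aligned}\mathbb{E}\big[e^{-X_i(f)}\big]&=\mathbb{E}\big[e^{-Y'(f)}\big]\prod_{s\in A}\mathbb{E}\big[e^{-f(s)X_i\{s\}}\big]\\&=\exp\Big(-r\int_\Omega\big(1-e^{-f(s)}\big)\,B_c(\mathrm{d}s)\Big)\prod_{s\in A}\Big(\frac{1-b_s}{1-b_s e^{-f(s)}}\Big)^{r},\end{aligned}$$
where the factors in the second term of the last line are obtained from the Laplace transform of the negative binomial distribution. This is the Laplace functional of the law of the negative binomial process with parameter $r$ and base measure $B$, as desired. ∎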
We now provide two illustrative examples: when the directing measure $H$ is (1) the beta process [Hjort1990] and (2) a hierarchy of beta processes [TJ2007]. Both of these random base measures are purely atomic and completely random; however, we note that in general the directing measure in Theorem 3.1 may have a diffuse component or may not even be completely random.
For the remainder of the section, let $c:\Omega\to\mathbb{R}_+$ be a non-negative measurable function, which we call a concentration function.
Definition 3.2 (beta process).
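We call a random base measure $H$ in $\mathcal{B}$ a beta process with concentration function $c$ and base measure $B$, and we write $H\sim\mathrm{BP}(c,B)$, if it is purely atomic and completely random with fixed component
$$\sum_{s\in A}\vartheta_s\,\delta_s,\qquad\vartheta_s\overset{\text{ind}}{\sim}\mathrm{Beta}\big(c(s)\,b_s,\;c(s)(1-b_s)\big),\quad s\in A,$$
and with an ordinary component that has intensity measure
$$\nu_H(\mathrm{d}s,\mathrm{d}x):=c(s)\,x^{-1}(1-x)^{c(s)-1}\,\mathrm{d}x\,B_c(\mathrm{d}s)\quad\text{on }\Omega\times(0,1].$$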
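Because the measure $\nu_H$ in Definition 3.2 is not finite when $B_c\neq0$, the beta process then has an infinite number of atoms a.s. However, consider the following construction: Let $Y_1,Y_2,\ldots$ be a sequence of simple point processes on $\Omega$, where $Y_1\sim\mathrm{BeP}(B)$ and
$$Y_{n+1}\mid Y_1,\ldots,Y_n\sim\mathrm{BeP}\Big(\frac{c\,B+\sum_{j=1}^{n}Y_j}{c+n}\Big),\qquad n\in\mathbb{N},$$
where $\frac{cB+\sum_{j\le n}Y_j}{c+n}$ denotes the measure $E\mapsto\int_E\big(c(s)+n\big)^{-1}\big(c(s)\,B(\mathrm{d}s)+\sum_{j\le n}Y_j(\mathrm{d}s)\big)$. TJ2007 showed that $Y_1,Y_2,\ldots$ is an exchangeable sequence of Bernoulli processes directed by a beta process $H\sim\mathrm{BP}(c,B)$, and that the combinatorial structure of this sequence is in a sense described by the Indian buffet process [GG06, GGS2007], which has found many uses in latent feature modeling applications [GG2011]. Arranging this sequence arbitrarily into an array and passing it into the construction of Theorem 3.1, we would obtain an exchangeable sequence of multisets directed by $H$, which is an alternative to the construction already provided for this case by HR2014.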
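For intuition, the predictive rule above can be simulated in the special case of a constant concentration $c>0$ and a base measure equal to $\gamma$ times the uniform distribution on $[0,1]$, with no fixed atoms (assumptions made only for this sketch); with $c=1$ this is the Indian buffet process. An atom already seen $k$ times among the first $m$ processes is included in the next one with probability $k/(c+m)$, and $\mathrm{Poisson}(c\gamma/(c+m))$ new atoms are drawn from the diffuse part. The function name is hypothetical.

```python
from typing import Dict, List, Set

import numpy as np


def beta_bernoulli_sequence(rng: np.random.Generator, n: int, c: float, gamma: float) -> List[Set[float]]:
    """First n terms Y_1, ..., Y_n of the sequential construction, assuming a constant
    concentration c > 0 and B = gamma * Uniform[0, 1] (no fixed atoms)."""
    counts: Dict[float, int] = {}      # atom -> number of earlier Y_j containing it
    ys: List[Set[float]] = []
    for m in range(n):                 # m = number of processes drawn so far
        # Previously seen atoms: included with probability (times seen) / (c + m).
        y = {s for s, k in counts.items() if rng.random() < k / (c + m)}
        # Diffuse part c * B / (c + m): Poisson(c * gamma / (c + m)) brand-new atoms.
        y.update(rng.uniform(0.0, 1.0, size=rng.poisson(c * gamma / (c + m))).tolist())
        for s in y:
            counts[s] = counts.get(s, 0) + 1
        ys.append(y)
    return ys


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    runs = [beta_bernoulli_sequence(rng, 5, c=1.0, gamma=3.0) for _ in range(4_000)]
    print(np.mean([len(ys[0]) for ys in runs]), np.mean([len(ys[4]) for ys in runs]))  # both near gamma
```

Because each term is marginally a Bernoulli process with base measure $B$, the average number of atoms in every $Y_m$ should be close to $\gamma$, which gives a quick check on exchangeability. The output could be arranged into an array and passed to the negative binomial urn scheme.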
Hierarchies of random base measures have also found many uses in Bayesian nonparametrics as admixture or mixed-membership models [TJ2007]. In particular, we call a random base measure $H$ in $\mathcal{B}$ a hierarchy of beta processes if there exists a beta process $H_0$ such that
$$H\mid H_0\sim\mathrm{BP}(c,H_0).$$
Roy13CUP provides a construction for an exchangeable sequence of Bernoulli processes directed by $H$, which takes as input the exchangeable sequence $Y_1,Y_2,\ldots$ of Bernoulli processes directed by $H_0\sim\mathrm{BP}(c,B)$ defined above and outputs a sequence $Z_1,Z_2,\ldots$ of simple point processes on $\Omega$, built from $Y_1,Y_2,\ldots$ by the recursion given in Roy13CUP.
Proposition 3.1 (one-parameter process; Roy13CUP).
There exists an a.s. unique random element $H$ in $\mathcal{B}$ such that, conditioned on $H_0$, $H$ is a beta process with concentration function $c$ and base measure $H_0$. Furthermore, conditioned on $H$, the $Z_n$ are i.i.d. Bernoulli processes with base measure $H$.
Roy calls $Z_1,Z_2,\ldots$ a one-parameter process induced by $Y_1,Y_2,\ldots$ with concentration function $c$, and, comparing Theorem 3.1 with Proposition 3.1, we can think of the negative binomial urn scheme as a negative binomial extension of the one-parameter process. The following construction for an exchangeable sequence of negative binomial processes directed by a hierarchy of beta processes follows straightforwardly from Theorem 3.1 and Proposition 3.1:
Let $Z_1,Z_2,\ldots$ and $H$ be as in Proposition 3.1, let $r\in\mathbb{N}$, and arbitrarily arrange $Z_1,Z_2,\ldots$ into an array $(Z_{i,j})_{i,j\in\mathbb{N}}$. Let $X_1,X_2,\ldots$ be a negative binomial urn scheme induced by $(Z_{i,j})$ with parameter $r$. Then, conditioned on $H$, the $X_i$ are i.i.d. negative binomial processes with parameter $r$ and base measure $H$.
It is straightforward to see that the one-parameter process can be repeatedly applied to produce an exchangeable sequence of Bernoulli processes (and thus negative binomial processes) directed by an arbitrarily deep hierarchy of beta processes.
As discussed in the introduction, Roy13CUP also defined a generalization of the beta process to a broad class of random base measures called generalized beta processes, which contains the beta process, the stable beta process [TG2009, BJP2012], and the Gibbs-type beta process [HR14Gibbs] as special cases. A finitary construction for an exchangeable sequence of Bernoulli processes directed by a generalized beta process (as well as by its hierarchies) is provided therein. Passing such a sequence through the negative binomial urn scheme, as we did above for hierarchies of beta processes, immediately yields an exchangeable sequence of negative binomial processes directed by any member of these subclasses, e.g., stable beta processes, hierarchies of stable beta processes, hierarchies of Gibbs-type beta processes, etc.
4. Generalization to $r>0$
The negative binomial urn scheme in Definition 3.1 is only valid for positive integer values of $r$, and we now generalize this construction to any positive parameter value $r>0$. Recall that we require a method to simulate a negative binomial variate $Z\sim\mathrm{NB}(r,p)$ given only an i.i.d. sequence of $p$-coins, i.e., an i.i.d. sequence $\xi_1,\xi_2,\ldots$ of $\mathrm{Bernoulli}(p)$ variates, where $p$ is unknown. For integer $r$, we simply recorded the number of heads before $r$ tails in the sequence, as in Item 3. For non-integer $r$, however, we require a different approach. In the following algorithm, we propose a variate (simulated with the $p$-coins) and accept or reject the proposal with a simple rejection sampler. Recall that $(r)_k$ denotes the $k$-th rising factorial (and its analytic continuation).
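Algorithm 4.1. Let $r>0$ and let $\xi_1,\xi_2,\ldots$ be an i.i.d. sequence of $p$-coins. Set $\bar r:=\lceil r\rceil$.
1. Simulate $K\sim\mathrm{NB}(\bar r,p)$ with $p$-coins, i.e., let $K$ be the number of heads before the $\bar r$-th tail among the coins not yet used, and independently draw $U\sim\mathrm{Uniform}[0,1]$.
2. If $U\le(r)_K/(\bar r)_K$, then set $Z:=K$ and stop;
3. If $U>(r)_K/(\bar r)_K$, then discard $K$ and GO TO step 1.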
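Algorithm 4.1 outputs $Z\sim\mathrm{NB}(r,p)$, and the expected number of iterations is
$$(1-p)^{r-\bar r}.$$
Proof. This is a rejection sampler [robert1999monte] with proposal distribution $q:=\mathrm{NB}(\bar r,p)$, target distribution $\pi:=\mathrm{NB}(r,p)$, and a constant $M\ge1$ chosen such that $\pi(k)\le M\,q(k)$ for every $k\in\mathbb{Z}_+$. Choosing $M:=(1-p)^{r-\bar r}$, the probability that a proposed sample $K=k$ is accepted is
$$\frac{\pi(k)}{M\,q(k)}=\frac{(r)_k\,p^k(1-p)^{r}}{(1-p)^{r-\bar r}\,(\bar r)_k\,p^k(1-p)^{\bar r}}=\frac{(r)_k}{(\bar r)_k}\le1,$$
which does not depend on the unknown $p$; the inequality holds because $r\le\bar r$. The number of rejected samples is geometrically distributed with mean $M-1=(1-p)^{r-\bar r}-1$, and the number of iterations therefore has mean $M=(1-p)^{r-\bar r}$. ∎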
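A Python sketch of Algorithm 4.1 follows. Coins are represented as an iterator of 0/1 values, the acceptance step uses an auxiliary uniform variable, and the function names are hypothetical. The acceptance probability $(r)_K/(\lceil r\rceil)_K$ is computed with log-gamma functions, and the function also returns the number of iterations so that it can be compared with $(1-p)^{r-\lceil r\rceil}$.

```python
import math
from typing import Iterator, Tuple

import numpy as np


def nb_factory(coins: Iterator[int], r: float, rng: np.random.Generator) -> Tuple[int, int]:
    """Sketch of Algorithm 4.1. Returns (Z, iterations), where Z ~ NB(r, p) and p is the
    unknown head probability of `coins` (1 = head/success, 0 = tail/failure)."""
    rbar = math.ceil(r)
    iterations = 0
    while True:
        iterations += 1
        # Step 1: K ~ NB(rbar, p) from the coins: count heads before the rbar-th tail.
        k = tails = 0
        while tails < rbar:
            if next(coins):
                k += 1
            else:
                tails += 1
        # Step 2: accept K with probability (r)_K / (rbar)_K <= 1, which does not involve p.
        log_ratio = math.lgamma(r + k) - math.lgamma(r) - math.lgamma(rbar + k) + math.lgamma(rbar)
        if rng.random() <= math.exp(log_ratio):
            return k, iterations
        # Step 3: reject and go back to step 1; later iterations read fresh coins.


def p_coins(rng: np.random.Generator, p: float) -> Iterator[int]:
    """An endless stream of i.i.d. Bernoulli(p) coins."""
    while True:
        yield int(rng.random() < p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    r, p = 1.5, 0.3
    out = [nb_factory(p_coins(rng, p), r, rng) for _ in range(20_000)]
    print(np.mean([z for z, _ in out]), r * p / (1 - p))                  # sample vs exact mean
    print(np.mean([t for _, t in out]), (1 - p) ** (r - math.ceil(r)))    # iterations vs expected
```

For integer $r$ the acceptance ratio is identically one, so the sketch reduces to counting heads before $r$ tails, as in the integer construction.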
In analogy to the work on Bernoulli factories [keane1994bernoulli, nacu2005fast, latuszynski2011], where one wants to simulate $\phi(p)$-coins (for some function $\phi$) from $p$-coins when $p$ is unknown, we call Algorithm 4.1 a negative binomial factory and write
$$Z\sim\mathrm{NBF}_r(\xi_1,\xi_2,\ldots)$$
to denote that $Z$ is simulated from a negative binomial factory with parameter $r$ and input sequence $\xi_1,\xi_2,\ldots$. We generalize Definition 3.1 to parameters $r>0$ using random variables from a negative binomial factory in the following construction, which slightly alters the negative binomial urn scheme. Let $r>0$ and let $(Y_{i,j})_{i,j\in\mathbb{N}}$ be an array of simple point processes on $\Omega$. For every $i\in\mathbb{N}$,
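Put $\bar r:=\lceil r\rceil$ and $\Psi_i:=\bigcup_{j\in\mathbb{N}}\operatorname{supp}(Y_{i,j})$, and let $(\zeta_{i,s})_{s\in\Psi_i}$ be a collection of random variables, conditionally independent given $(Y_{i,j})_{j\in\mathbb{N}}$, with
$$\zeta_{i,s}\sim\mathrm{NBF}_r\big(Y_{i,1}\{s\},Y_{i,2}\{s\},\ldots\big),\qquad s\in\Psi_i;$$
Define the point process $X_i$ on $\Omega$ where, for every $s\in\Omega$,
$$X_i\{s\}:=\zeta_{i,s}\quad\text{if }s\in\Psi_i,$$
and $X_i\{s\}:=0$, otherwise. That is, put $X_i:=\sum_{s\in\Psi_i}\zeta_{i,s}\,\delta_s$.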
When $r>0$ is arbitrary, the conclusion of Theorem 3.1 holds with this construction of $X_1,X_2,\ldots$.
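Proof. Fix $i\in\mathbb{N}$. The proof parallels that of Theorem 3.1, except that the negative binomial variates $X_i\{s\}$, $s\in A$, are now produced by the negative binomial factory of Algorithm 4.1, which outputs $\mathrm{NB}(r,b_s)$ variates as shown above. This verifies the form of the fixed component of $X_i$; however, verifying the form of the ordinary component differs slightly.
Recall the definition of the non-random measure $B$ in $\mathcal{B}$ with set of atoms $A$ and non-atomic part $B_c$. We must show that the ordinary component of $X_i$ is still a Poisson process with intensity measure $rB_c$.
Recall the definitions of $Y'_{i,j}:=Y_{i,j}(\cdot\setminus A)$, for every $j\in\mathbb{N}$, which are independent Poisson processes on $\Omega$ with intensity measure $B_c$. We have that $Y':=\sum_{j=1}^{\bar r}Y'_{i,j}$ is a Poisson process on $\Omega$ with intensity measure $\bar rB_c$, and by construction $X_i(\cdot\setminus A)\le Y'$. Because the $Y'_{i,j}$ are independent with a common diffuse intensity, a.s. no two of them share an atom, so for every $s\in\operatorname{supp}(Y')$ the sequence $(Y_{i,j}\{s\})_{j\in\mathbb{N}}$ satisfies
$$\sum_{j\in\mathbb{N}}Y_{i,j}\{s\}=1\quad\text{a.s.};$$
that is, only one entry in the sequence is equal to one a.s., which must occur within the first $\bar r$ entries. Therefore, independently for every $s\in\operatorname{supp}(Y')$, executing step 1 of Algorithm 4.1 with input sequence $(Y_{i,j}\{s\})_{j\in\mathbb{N}}$ will compute $K=1$ on the first iteration.
The algorithm therefore outputs $1$ on its first iteration with probability
$$\frac{(r)_1}{(\bar r)_1}=\frac{r}{\bar r};$$
otherwise it outputs $0$ a.s., since every coin read on later iterations is a tail, so $K=0$, which is accepted with probability $(r)_0/(\bar r)_0=1$. It follows that $X_i(\cdot\setminus A)$ is a.s. simple and, by a Poisson process thinning argument, that it is a Poisson process with intensity measure $\frac{r}{\bar r}\cdot\bar rB_c=rB_c$, as desired. ∎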
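The generalized construction can be sketched in Python by running a negative binomial factory separately on each atom's coin sequence $Y_{i,1}\{s\},Y_{i,2}\{s\},\ldots$. In this sketch (all names hypothetical, with the same set-of-atoms representation and illustrative base measure as in the integer case), only atoms in the first $\lceil r\rceil$ processes are passed to the factory, because any other atom shows $\lceil r\rceil$ tails first and is always assigned zero; the row is materialized lazily so that each factory call reads only as far as it needs.

```python
import math
from typing import Dict, Hashable, Iterator, List, Set

import numpy as np


class LazyRow:
    """Y_{i,1}, Y_{i,2}, ... (0-based here), drawn from a generator only when first accessed."""

    def __init__(self, gen: Iterator[Set[Hashable]]):
        self.gen = gen
        self.ys: List[Set[Hashable]] = []

    def __getitem__(self, j: int) -> Set[Hashable]:
        while len(self.ys) <= j:
            self.ys.append(next(self.gen))
        return self.ys[j]


def coins_for(row: LazyRow, s: Hashable) -> Iterator[int]:
    """The coin sequence Y_{i,1}{s}, Y_{i,2}{s}, ... for a single atom s."""
    j = 0
    while True:
        yield int(s in row[j])
        j += 1


def nb_factory(coins: Iterator[int], r: float, rng) -> int:
    """Propose K ~ NB(ceil(r), p) from the coins; accept with probability (r)_K / (ceil(r))_K."""
    rbar = math.ceil(r)
    while True:
        k = tails = 0
        while tails < rbar:
            if next(coins):
                k += 1
            else:
                tails += 1
        log_ratio = math.lgamma(r + k) - math.lgamma(r) - math.lgamma(rbar + k) + math.lgamma(rbar)
        if rng.random() <= math.exp(log_ratio):
            return k


def nb_urn_row_general(gen: Iterator[Set[Hashable]], r: float, rng) -> Dict[Hashable, int]:
    """X_i for real r > 0, returned as {atom: multiplicity} with zero entries dropped."""
    row = LazyRow(gen)
    # Atoms outside the first ceil(r) processes see ceil(r) tails first, so they always get 0.
    candidates = set().union(*(row[j] for j in range(math.ceil(r))))
    x = {s: nb_factory(coins_for(row, s), r, rng) for s in candidates}
    return {s: k for s, k in x.items() if k > 0}


def bernoulli_row(rng, fixed: Dict[str, float], diffuse_mass: float) -> Iterator[Set[Hashable]]:
    """I.i.d. BeP(B) draws for an assumed B: labelled fixed atoms plus diffuse_mass * Uniform[0, 1]."""
    while True:
        y: Set[Hashable] = {s for s, b in fixed.items() if rng.random() < b}
        y.update(rng.uniform(0.0, 1.0, size=rng.poisson(diffuse_mass)).tolist())
        yield y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xs = [nb_urn_row_general(bernoulli_row(rng, {"a": 0.4}, 1.0), 1.5, rng) for _ in range(5_000)]
    print(np.mean([x.get("a", 0) for x in xs]))                          # compare r b / (1 - b) = 1.0
    print(np.mean([sum(isinstance(s, float) for s in x) for x in xs]))   # compare r * B_c(Omega) = 1.5
```

For $r=1.5$ the ordinary atoms of $X_i$ should all have multiplicity one and number about $rB_c(\Omega)$ per row, in line with the thinning argument in the proof.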
Appendix A Decompositions of negative binomial distributions
Here we present several basic results on the negative binomial distribution. The following classic result can be shown using moment generating functions:
Proposition A.1 (Sums of negative binomial random variables).
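Let $r_1,\ldots,r_n$ be a finite sequence of positive real numbers and let $p\in[0,1)$. Let $Z_1,\ldots,Z_n$ be a collection of independent random variables with
$$Z_k\sim\mathrm{NB}(r_k,p),\qquad k\le n.$$
Then $\sum_{k=1}^{n}Z_k\sim\mathrm{NB}(r,p)$, where $r:=\sum_{k=1}^{n}r_k$. ∎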
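The proposition can be checked exactly, up to floating-point round-off, by convolving truncated p.m.f.s: the first $K$ entries of a discrete convolution depend only on the first $K$ entries of each factor. The parameter values in this Python sketch are arbitrary and the helper names are our own.

```python
import math


def nb_pmf(k: int, r: float, p: float) -> float:
    # (r)_k / k! * p**k * (1 - p)**r, computed in log space; assumes r > 0 and 0 < p < 1
    return math.exp(math.lgamma(r + k) - math.lgamma(r) - math.lgamma(k + 1)
                    + k * math.log(p) + r * math.log1p(-p))


def convolve(a, b):
    # p.m.f. of the sum of two independent Z_+-valued variables, truncated to len(a) terms
    return [sum(a[i] * b[n - i] for i in range(n + 1)) for n in range(len(a))]


if __name__ == "__main__":
    rs, p, K = [0.7, 1.8, 2.25], 0.35, 60
    pmf = [nb_pmf(k, rs[0], p) for k in range(K)]
    for r_k in rs[1:]:
        pmf = convolve(pmf, [nb_pmf(k, r_k, p) for k in range(K)])
    target = [nb_pmf(k, sum(rs), p) for k in range(K)]
    print(max(abs(u - v) for u, v in zip(pmf, target)))   # only floating-point round-off remains
```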
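For $r\in\mathbb{N}$, the negative binomial distribution has an interpretation as describing the number of successes before $r$ failures in a sequence of independent Bernoulli trials, with the probability of success in each trial equal to $p$:
Let $r\in\mathbb{N}$ and $p\in[0,1)$, let $\xi_1,\xi_2,\ldots$ be an i.i.d. sequence of Bernoulli random variables with success probability $p$, and let $Z$ be the random variable in $\mathbb{Z}_+$ given by
$$Z:=\sum_{j=1}^{\tau_r}\xi_j,\qquad\tau_r:=\inf\Big\{n\in\mathbb{N}:\sum_{j=1}^{n}(1-\xi_j)=r\Big\}.$$
Then $Z\sim\mathrm{NB}(r,p)$.
Proof. First consider when $r=1$, so that
$$Z=\tau_1-1.$$
Because the $\xi_j$ are i.i.d., it follows that
$$\Pr\{Z=k\}=\Pr\{\xi_1=\cdots=\xi_k=1,\ \xi_{k+1}=0\}=p^k(1-p),\qquad k\in\mathbb{Z}_+,$$
which is the p.m.f. of the geometric distribution with parameter $p$, i.e., of $\mathrm{NB}(1,p)$. The remainder of the proof follows by induction. Assume $r>1$. Put
$$Z':=\sum_{j=\tau_{r-1}+1}^{\tau_r}\xi_j,$$
so that $Z=\sum_{j=1}^{\tau_{r-1}}\xi_j+Z'$ a.s. By the induction hypothesis, the first term has the $\mathrm{NB}(r-1,p)$ distribution. By the same argument as above, applied to the i.i.d. sequence $\xi_{\tau_{r-1}+1},\xi_{\tau_{r-1}+2},\ldots$ (which is independent of $\xi_1,\ldots,\xi_{\tau_{r-1}}$ because $\tau_{r-1}$ is a stopping time), we have that $Z'\sim\mathrm{NB}(1,p)$, independently of the first term, and by Proposition A.1, the result follows. ∎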
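A short simulation illustrates this interpretation: count the heads before the $r$-th tail in a stream of $p$-coins and compare the empirical frequencies with the negative binomial p.m.f. This Python sketch uses hypothetical helper names, and the comparison is only statistical.

```python
import math
import random
from typing import Iterable, Iterator


def nb_pmf(k: int, r: float, p: float) -> float:
    """(r)_k / k! * p**k * (1 - p)**r, assuming r > 0 and 0 < p < 1."""
    return math.exp(math.lgamma(r + k) - math.lgamma(r) - math.lgamma(k + 1)
                    + k * math.log(p) + r * math.log1p(-p))


def successes_before_failures(coins: Iterable[int], r: int) -> int:
    """Number of heads (1s) observed before the r-th tail (0) in a stream of coins."""
    heads = tails = 0
    for c in coins:
        if c:
            heads += 1
        else:
            tails += 1
            if tails == r:
                return heads
    raise ValueError("coin stream ended before the r-th failure")


def coin_stream(rng: random.Random, p: float) -> Iterator[int]:
    """Endless i.i.d. Bernoulli(p) coins."""
    while True:
        yield 1 if rng.random() < p else 0


if __name__ == "__main__":
    rng = random.Random(0)
    r, p, n = 3, 0.4, 20_000
    zs = [successes_before_failures(coin_stream(rng, p), r) for _ in range(n)]
    for k in range(5):
        print(k, zs.count(k) / n, round(nb_pmf(k, r, p), 4))   # empirical vs exact
```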
We thank Krzysztof Łatuszyński for helpful discussions.