1 Introduction
In this paper, we introduce a new three-parameter generalization of the Indian buffet process (IBP). The IBP, first introduced by [2], is an exchangeable distribution over binary matrices with a finite number of rows but an infinite number of columns. In the context of latent feature models, a binary matrix describes the feature allocation for objects by letting the corresponding entry equal 1 if the th object possesses the th feature and 0 otherwise. The IBP and its two- and three-parameter generalizations [9, 8] have been widely used in various applications [e.g., 4, 6, 5, 1].
It is well known that the expected number of features increases at a certain rate (logarithmic or polynomial) as the number of objects increases under the one-, two-, and three-parameter IBPs [2, 9, 8]. Therefore, these IBPs, which can produce many unnecessary features, may not be suitable for modelling data sets that are believed to have a finite number of features. For example, in macroeconomic applications, fluctuations in data such as stock returns can be attributed to a small number of important sources, so it is natural to assume that the number of features is fixed even if the data dimension increases [7].
In this paper, we propose a new stochastic process for latent feature models under which the distribution of the number of features converges to a fixed distribution. Under the proposed process, fewer unnecessary features are generated than under the standard IBPs, and thus both the interpretability and the predictive ability of the model can be improved.
1.1 Convergent Indian buffet process
Our proposed variant of the IBP, which we call the convergent Indian buffet process (CIBP), can be described by the following restaurant analogy.
Definition 1 (The restaurant analogy of the CIBP).
Let , and . We call the stochastic process given below the restaurant analogy of :

The first customer tries dishes, where denotes the beta function with parameters and .

For every , the th customer

tries each previously tasted dish independently according to
(1.1) where is the number of previous customers (before th customer) who have tried the th dish;

and tries
(1.2) new dishes.

The restaurant analogy leads to the binary matrix whose number of rows equals the number of customers and whose number of columns is unbounded, where the th element of the binary matrix is equal to 1 if the th customer tried the th dish and 0 otherwise. We denote by the distribution of the binary matrix induced by the above restaurant analogy. In this paper, we discuss properties and construction of the CIBP.
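To make the generative mechanism concrete, the following minimal sketch (in Python) simulates a restaurant analogy of this form. It treats the dish-tasting probability (1.1) and the new-dish Poisson mean (1.2) as user-supplied functions, so no particular parametric form is assumed; the function and argument names are ours.

```python
import numpy as np

def simulate_buffet(n_customers, prob_existing_dish, mean_new_dishes, rng=None):
    """Generic restaurant-analogy simulator.

    prob_existing_dish(m_k, i): probability that the i-th customer (1-indexed)
        tries a dish already tried by m_k previous customers -- plug in (1.1).
    mean_new_dishes(i): Poisson mean of the number of new dishes tried by the
        i-th customer -- plug in (1.2); for i = 1 this is the mean number of
        dishes tried by the first customer.
    Returns the binary matrix Z (customers x dishes).
    """
    rng = np.random.default_rng(rng)
    dish_counts = []          # m_k: how many previous customers tried dish k
    rows = []
    for i in range(1, n_customers + 1):
        row = [int(rng.random() < prob_existing_dish(m, i)) for m in dish_counts]
        n_new = int(rng.poisson(mean_new_dishes(i)))
        row += [1] * n_new
        dish_counts = [m + z for m, z in zip(dish_counts, row)] + [1] * n_new
        rows.append(row)
    Z = np.zeros((n_customers, len(dish_counts)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z
```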
1.2 Organization
The rest of the paper is organized as follows. In Section 2, we show that the number of features under the CIBP follows the Poisson distribution, with mean monotonically increasing but converging to a certain value as the number of objects goes to infinity; the name convergent IBP derives from this property. We also describe the connection between the CIBP and the two-parameter IBP. In Section 3, we provide two alternative representations of the CIBP: the first is based on a hierarchical distribution of Poisson, Beta and Bernoulli distributions, and the second is based on random measures. In Section 4, as an application, we use the CIBP as the prior distribution on the factor loading matrix for Bayesian estimation of a sparse factor model, and we provide a straightforward posterior computation algorithm together with some numerical examples. In Section 5, we give the proofs of the results of Section 3. Section 6 concludes the paper.
1.3 Notation
We denote by the indicator function. Let be the set of real numbers and be the set of positive numbers. Let be the set of natural numbers. For , we let . For notational convenience, we let be the ratio of two beta functions defined as
2 Properties
2.1 Distribution of the number of features
In this section, we show that the number of features (i.e., dishes) under the CIBP follows a Poisson distribution whose mean converges to a finite value as the number of objects increases. Let be the number of nonzero columns of , which represents the number of features. Formally, we can define
where denotes the th column of . The following proposition describes the distribution of .
Proposition 2.1.
If , then
(2.1) 
where Moreover, the Poisson mean monotonically increases and converges to as , which, in particular, implies that
converges to the random variable
in distribution.
Proof.
From the restaurant analogy of the CIBP, we have that
Therefore, by the additive property of independent Poisson random variables,
From the identity , we have
which implies (2.1). The fact that follows from that .
For the second assertion, note that
where denotes the gamma function. Since , it follows that as . ∎
2.2 Exchangeability
Exchangeability of the IBP makes the corresponding posterior computation algorithms tractable. The CIBP is also an exchangeable distribution, as shown in the following corollary, which is a direct consequence of Propositions 3.1 and 3.2 presented in the next section.
Corollary 2.2.
Assume that a dimensional binary matrix follows
. Then the random vectors
are exchangeable, where denotes the th row of the matrix .
2.3 Connection to the two-parameter IBP
The restaurant analogy of the two-parameter IBP with parameters and is as follows: The first customer tries dishes. The th customer for tries each previously tasted dish independently according to and tries new dishes. We denote by the distribution induced by the above restaurant analogy.
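For concreteness, here is how a draw from the standard two-parameter IBP can be obtained with the generic `simulate_buffet` sketch from Section 1.1, assuming the usual analogy in which the i-th customer tries a dish with m_k previous tasters with probability m_k/(c + i - 1) and tries Poisson(alpha*c/(c + i - 1)) new dishes; the symbols alpha and c below are our own parameter names.

```python
# Two-parameter IBP via the generic restaurant simulator sketched in Section 1.1;
# "alpha" and "c" are our own parameter names, not necessarily the paper's.
alpha, c = 3.0, 2.0
Z_ibp = simulate_buffet(
    n_customers=20,
    prob_existing_dish=lambda m_k, i: m_k / (c + i - 1),
    mean_new_dishes=lambda i: alpha * c / (c + i - 1),
)
print(Z_ibp.shape)  # (20, total number of dishes generated)
```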
By comparing the restaurant analogies of and , we then have the following proposition that connects these two stochastic processes.
Proposition 2.3.
For two dimensional binary matrices and , converges to in distribution as and .
Proof.
We visualize the result of the above proposition. Figure 1 shows four binary matrices generated by and with , but with , and . We can see that the IBP tends to generate more features than the CIBP.
3 Alternative representations
In this section, we provide two alternative representations of the CIBP. The first one is based on a hierarchical distribution of Poisson, Beta and Bernoulli distributions and the second one is based on random measures. The proofs of all the results in this section are deferred to Section 5.
3.1 Hierarchical representation
In this section we show that the CIBP is equivalent to the following hierarchical distribution.
Definition 2 (Hierarchical representation of the CIBP).
Let , and
. We call the probability distribution given below the hierarchical representation of
:(3.1)  
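A minimal sketch of a finite Poisson–Beta–Bernoulli hierarchy of this kind is given below: the number of columns is Poisson, each column receives a Beta-distributed inclusion probability, and the entries are conditionally Bernoulli. The hyperparameter names are placeholders and should be replaced by the quantities appearing in (3.1).

```python
import numpy as np

def sample_hierarchical(n, poisson_mean, beta_a, beta_b, rng=None):
    """Draw an n-row binary matrix from a Poisson-Beta-Bernoulli hierarchy.

    The hyperparameters are placeholders for whatever (3.1) prescribes:
    K ~ Poisson(poisson_mean), mu_k ~ Beta(beta_a, beta_b) independently,
    z_ik ~ Bernoulli(mu_k) independently.
    """
    rng = np.random.default_rng(rng)
    K = rng.poisson(poisson_mean)
    mu = rng.beta(beta_a, beta_b, size=K)
    return (rng.random((n, K)) < mu).astype(int)
```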
To state the result rigorously, we need the concept of lof-equivalence classes. Under the latent feature model, the ordering of the features does not affect the likelihood of the data. Hence, we say that two dimensional binary matrices are equivalent if they are identical up to a permutation of columns. It is convenient to choose a representative of every equivalence class by the left-ordering procedure. The left-ordering procedure maps each dimensional binary matrix to its left-ordered version whose columns are ordered by the score , which is defined by
i.e., the columns are ordered so that We call the equivalence class defined by the left-ordering procedure the lof-equivalence class, and we denote the lof-equivalence class of a binary matrix by .
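The left-ordering procedure is straightforward to implement. The sketch below assumes the usual lof score, obtained by reading each column from top to bottom as a binary number, and sorts the columns in decreasing order of this score.

```python
import numpy as np

def left_order(Z):
    """Return the left-ordered form of a binary matrix Z (rows = objects).

    Each column is scored by reading it top-to-bottom as a binary number
    (the usual lof score); columns are then sorted in decreasing score, so
    matrices in the same lof-equivalence class map to the same matrix.
    """
    Z = np.asarray(Z)
    n = Z.shape[0]
    weights = 2 ** np.arange(n - 1, -1, -1)    # 2^(n-1), ..., 2^0
    scores = weights @ Z                       # one score per column
    order = np.argsort(-scores, kind="stable") # decreasing score
    return Z[:, order]
```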
We introduce some useful notation. Let which is the set of dimensional binary vectors and where is the zero vector. For each , we define
(3.2) 
where denotes the th column of . In words, is the number of columns equal to the binary vector . Note that . Moreover, let
be the number of rows that have the th feature.
In the next proposition, we provide the explicit form of the probability mass function of the lof-equivalence class .
Proposition 3.1.
If a dimensional random binary matrix follows the distribution in (3.1), then
(3.3) 
From Proposition 3.1, we can show that the restaurant analogy and the hierarchical representation of the CIBP are equivalent.
Proposition 3.2.
Suppose that a dimensional binary matrix follows the hierarchical distribution presented in (3.1). Then the lof-equivalence class follows .
3.2 Random measure representation
In this section, we provide another representation of the CIBP, which is based on random measures.
We first briefly review completely random measures. Let be a Polish space with its Borel field and let be the set of all measures on with its Borel field. A completely random measure (CRM) on is a random measure such that, for any disjoint measurable sets , the corresponding values are mutually independent. Every CRM can be decomposed into three independent parts:
where is a nonrandom measure, are fixed atoms in , are independent random variables on and is a Poisson process on . Here we only consider purely atomic CRMs such that . We write
if is the purely atomic CRM represented by with for and for some probability measures on and on . In particular, we write if with .
It is well known that the two-parameter IBP, with and , has the following random measure representation:
for some smooth probability measure , i.e., . Here, denotes the Bernoulli process with mean , which is equivalent to on with
where denotes a point mass at 1, and denotes the Beta process which is equivalent to on with
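For a purely atomic measure with weights in [0, 1], a draw from the Bernoulli process keeps each atom independently with probability equal to its weight. A minimal sketch of this sampling step, producing one row of the binary matrix, is:

```python
import numpy as np

def bernoulli_process(weights, rng=None):
    """Draw Z ~ BeP(B) for an atomic measure B = sum_k weights[k] * delta_{psi_k}.

    Each atom psi_k is kept independently with probability weights[k]; the
    result is the indicator vector of the kept atoms, i.e., the row of the
    binary matrix associated with one object.
    """
    rng = np.random.default_rng(rng)
    weights = np.asarray(weights)
    return (rng.random(weights.shape) < weights).astype(int)
```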
We introduce another stochastic process represented by a random measure, which will be shown to be related to the CIBP.
Definition 3 (Random measure representation of the CIBP).
Let , and . We call the stochastic process given below the random measure representation of :
(3.4)  
with
(3.5) 
for some smooth probability measure .
The next theorem shows that the hierarchical representation in Definition 2 and random measure representation in Definition 3 of the CIBP are equivalent.
Proposition 3.3.
Let be random measures following the distribution given in (3.4)
. Then the joint distribution of
is given by(3.6) 
where there are atoms such that for , and denotes the density of .
The function is integrable on , which means that the number of features is finite.
4 Application to Bayesian sparse factor models
In this section, we consider an application of the CIBP prior distribution to Bayesian estimation of the factor model.
4.1 Model and prior
We consider the following factor model where a dimensional random vector is distributed as
(4.1) 
with , being a factor loading matrix, a dimensional factor and
a noise variance.
We consider the following prior on the loading matrix . Let be the th entry of the dimensional loading matrix . We impose a prior distribution based on the CIBP such that
where and . That is, we impose on the binary matrix . We refer to the above distribution on as , which is an abbreviation of spike-and-slab CIBP.
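In other words, a loading is exactly zero wherever the corresponding binary entry is zero and is drawn from a slab distribution otherwise. The sketch below illustrates this construction; the Gaussian slab and its scale are placeholders for the slab specified above.

```python
import numpy as np

def sample_loading_matrix(Z, slab_sd=1.0, rng=None):
    """Draw a loading matrix from a spike-and-slab prior given a binary matrix Z.

    Z (D x K) is a draw from the CIBP (e.g., from a restaurant-analogy
    simulator); each loading is exactly zero where Z is zero and is drawn
    from a slab where Z is one.  The Gaussian slab with standard deviation
    slab_sd is a placeholder for the slab distribution used in the paper.
    """
    rng = np.random.default_rng(rng)
    slab = rng.normal(0.0, slab_sd, size=Z.shape)
    return Z * slab
```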
4.2 Posterior computation
We provide a Markov chain Monte Carlo (MCMC) algorithm for sampling from the posterior distribution under the
prior on and the inverse gamma prior on . Let be the number of nonzero columns of the loading matrix . The MCMC algorithm is as follows:
Sample for and .
The factor loading is sampled from the conditional posterior
where
Sample for and .
When we sample , we use the fact that the CIBP is exchangeable to assume that the th customer is the last customer to enter the restaurant. Therefore, for each , is sampled with probability
where . We then sample for each of the infinitely many all-zero columns. To do this, we use a Metropolis–Hastings (MH) step as follows. We propose and from the proposal distribution
Then we accept the proposal with probability
where
If the proposal is accepted, we update
Sample for .
The latent variable is sampled from
where . (A code sketch of this update, under standard assumptions, follows the algorithm.)
Sample .
The noise variance is sampled from
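As an illustration of the factor-sampling step above, the following sketch gives the standard Gibbs update for a linear Gaussian factor model, assuming a standard normal prior on the factors; the exact conditional used in the algorithm may differ from this assumed form.

```python
import numpy as np

def sample_factors(Y, Lam, sigma2, rng=None):
    """Gibbs update for latent factors in y_i = Lam @ eta_i + N(0, sigma2 * I).

    Under the (assumed) prior eta_i ~ N(0, I), the conditional posterior is
    eta_i | y_i ~ N(V @ Lam.T @ y_i / sigma2, V) with
    V = (I + Lam.T @ Lam / sigma2)^{-1}.
    Y is (D, n) with one observation per column; the result is (K, n).
    """
    rng = np.random.default_rng(rng)
    K = Lam.shape[1]
    V = np.linalg.inv(np.eye(K) + Lam.T @ Lam / sigma2)
    L = np.linalg.cholesky(V)                    # V = L @ L.T
    means = V @ Lam.T @ Y / sigma2
    return means + L @ rng.standard_normal(means.shape)
```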
4.3 Simulation
We conduct simulation to compare the CIBP and the twoparameter IBP when they are used as prior distributions for the sparse factor model.
We generate simulated data sets as follows. For each value , we generate a dimensional loading matrix with the number of nonzero rows
. The loadings in the sampled nonzero rows are generated from the uniform distribution on
. Then we sample random vectors from the multivariate normal distribution with mean
and variance independently. We repeat this generating procedure 100 times.
For each synthetic data set, we compute the posterior distribution under the and prior, respectively. For the CIBP prior, we set , and . For the IBP prior, we set , which is equal to . In Figure 2, we present the posterior mean of the number of factors under the and prior, respectively, for over 100 replications. As the dimension increases, the prior tends to overestimate the number of factors more severely. In contrast, the prior provides accurate estimates of the number of factors for all values of
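A sketch of the data-generating procedure described above is given below; the uniform range, the noise variance, and the problem sizes are placeholders rather than the values used in the experiments.

```python
import numpy as np

def make_dataset(D, K_true, n, n_nonzero_rows, sigma2=1.0, rng=None):
    """Generate one synthetic data set along the lines of Section 4.3.

    A D x K_true loading matrix is built with n_nonzero_rows nonzero rows
    (drawn uniformly), and n observations are sampled from a mean-zero
    multivariate normal with covariance Lam @ Lam.T + sigma2 * I.
    All numerical choices here are placeholders.
    """
    rng = np.random.default_rng(rng)
    Lam = np.zeros((D, K_true))
    nonzero = rng.choice(D, size=n_nonzero_rows, replace=False)
    Lam[nonzero] = rng.uniform(-2.0, 2.0, size=(n_nonzero_rows, K_true))
    cov = Lam @ Lam.T + sigma2 * np.eye(D)
    Y = rng.multivariate_normal(np.zeros(D), cov, size=n)   # (n, D)
    return Y, Lam
```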
5 Proofs for Section 3
5.1 Proof of Proposition 3.1
Proof.
Recall that . If , we have that
where the second equality follows from reordering the columns such that if and otherwise. Recall that . Therefore, since the cardinality of the lof-equivalence class is , the probability of a lof-equivalence class of given is given by
If , it is clear that .
Let denote the probability mass function of , i.e., for . Marginalizing over , we have that
The summation term of the preceding display can be written as
(5.1)  
where we use the identity for the second equality. Lastly, from the identity , it follows that
(5.2)  
5.2 Proof of Proposition 3.2
Proof.
The proof is by induction. Let be the th row of . For , from a Poisson likelihood, we have
where is the number of nonzero elements of . This is the same as (3.3) with and .
For , consider the conditional distribution of given , which is given by
(5.3)  
where , is the number of new features sampled by the th customer and is the set of dishes taken by the th customer, i.e., . Let and . By the inductive hypothesis, we have
Since for and otherwise, we have
and similarly,
Therefore,
(5.4)  