1 Introduction
There has been increasing interest in count modeling using the Poisson process, geometric process [1, 2, 3, 4] and recently the negative binomial (NB) process [5, 6]. Notably, it has been independently shown in [5] and [6] that the NB process, originally constructed for count analysis, can be naturally applied for mixture modeling of grouped data $x_1,\dots,x_J$, where each group $x_j=\{x_{ji}\}_{i=1,\dots,N_j}$. For a territory long occupied by the hierarchical Dirichlet process (HDP) [7] and related models, the inference of which may require substantial bookkeeping and suffer from slow convergence [7], the discovery of the NB process for mixture modeling can be significant. As the seemingly distinct problems of count and mixture modeling are united under the NB process framework, new opportunities emerge for better data fitting, more efficient inference and more flexible model constructions. However, neither [5] nor [6] explores the properties of the NB distribution deeply enough to achieve fully tractable closed-form inference. Of particular concern is the NB dispersion parameter, which was simply fixed or empirically set [6], or inferred with a Metropolis-Hastings algorithm [5]. Under these limitations, both papers fail to reveal the connections of the NB process to the HDP, and thus may lead to false assessments when comparing their modeling abilities.
We perform joint count and mixture modeling under the NB process framework, using completely random measures [1, 8, 9] that are simple to construct and amenable to posterior computation. We propose to augment-and-conquer the NB process: by “augmenting” a NB process into both the gamma-Poisson and compound-Poisson representations, we “conquer” the unification of count and mixture modeling, the analysis of fundamental model properties, and the derivation of efficient Gibbs sampling inference. We make two additional contributions: 1) we construct a gamma-NB process, analyze its properties and show how its normalization leads to the HDP, highlighting its unique theoretical, structural and computational advantages relative to the HDP; 2) we show that a variety of NB processes can be constructed with distinct model properties, for which the shared random measure can be selected from completely random measures such as the gamma, beta, and beta-Bernoulli processes; we compare their performance on topic modeling, a typical example for mixture modeling of grouped data, and show the importance of inferring both the NB dispersion and probability parameters, which respectively govern the overdispersion level and the variance-to-mean ratio in count modeling.
1.1 Poisson process for count and mixture modeling
Before introducing the NB process, we first illustrate how the seemingly distinct problems of count and mixture modeling can be united under the Poisson process. Denote $\Omega$ as a measure space and, for each Borel set $A\subset\Omega$, denote $X_j(A)$ as a count random variable describing the number of observations in $x_j$ that reside within $A$. Given grouped data $x_1,\dots,x_J$, for any measurable disjoint partition $A_1,\dots,A_Q$ of $\Omega$, we aim to jointly model the count random variables $\{X_j(A_q)\}$. A natural choice would be to define a Poisson process $X_j\sim\mathrm{PP}(G)$, with a shared completely random measure $G$ on $\Omega$, such that $X_j(A)\sim\mathrm{Pois}(G(A))$ for each $A\subset\Omega$. Denote $\gamma_0=G(\Omega)$ and $\tilde G=G/\gamma_0$. Following Lemma 4.1 of [5], the joint distributions of $X_j(A_1),\dots,X_j(A_Q)$
are equivalent under the following two expressions:
$$X_j(A_q)\sim\mathrm{Pois}\big(G(A_q)\big),\quad \forall q\in\{1,\dots,Q\};\qquad(1)$$
$$X_j(\Omega)\sim\mathrm{Pois}(\gamma_0),\quad \big[X_j(A_1),\dots,X_j(A_Q)\big]\sim\mathrm{Mult}\big(X_j(\Omega);\,\tilde G(A_1),\dots,\tilde G(A_Q)\big).\qquad(2)$$
Thus the Poisson process provides not only a way to generate independent counts from each $A_q$, but also a mechanism for mixture modeling, which allocates the observations into any measurable disjoint partition of $\Omega$, conditioning on $X_j(\Omega)$ and the normalized mean measure $\tilde G$.
To complete the model, we may place a gamma process [9] prior on the shared measure as $G\sim\mathrm{GaP}(c,G_0)$, with concentration parameter $c$ and base measure $G_0$, such that $G(A)\sim\mathrm{Gamma}(G_0(A),1/c)$ for each $A\subset\Omega$, where $G_0$ can be continuous, discrete or a combination of both. Note that $\tilde G$ now becomes a Dirichlet process (DP) as $\tilde G\sim\mathrm{DP}(\gamma_0,\tilde G_0)$, where $\gamma_0=G_0(\Omega)$ and $\tilde G_0=G_0/\gamma_0$. The normalized gamma representation of the DP is discussed in [10, 11, 9] and has been used to construct the group-level DPs for an HDP [12]. The Poisson process has an equal-dispersion assumption for count modeling. As shown in (2), the construction of Poisson processes with a shared gamma process mean measure implies the same mixture proportions across groups, which is essentially the same as the DP when used for mixture modeling when the total counts are not treated as random variables. This motivates us to consider adding an additional layer or using a distribution other than the Poisson to model the counts. As shown below, the NB distribution is an ideal candidate, not only because it allows overdispersion, but also because it can be augmented into both gamma-Poisson and compound-Poisson representations.
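The equivalence between expressions (1) and (2) can be checked numerically. The following Python sketch (the function names are ours, not the paper's) evaluates the joint PMF of a partition's counts both ways and confirms that they agree:

```python
import math

def poisson_pmf(n, lam):
    """Poisson probability mass function."""
    return math.exp(-lam) * lam**n / math.factorial(n)

def independent_poisson_joint(counts, rates):
    """Joint PMF under (1): independent Poisson counts per partition cell."""
    prob = 1.0
    for n, lam in zip(counts, rates):
        prob *= poisson_pmf(n, lam)
    return prob

def total_then_multinomial_joint(counts, rates):
    """Joint PMF under (2): total count ~ Poisson(sum of rates),
    then a multinomial split with the normalized rates."""
    total, lam_sum = sum(counts), sum(rates)
    coef = math.factorial(total)
    for n in counts:
        coef //= math.factorial(n)
    prob = coef * poisson_pmf(total, lam_sum)
    for n, lam in zip(counts, rates):
        prob *= (lam / lam_sum) ** n
    return prob

counts, rates = [3, 1, 2], [0.7, 1.1, 0.4]
p1 = independent_poisson_joint(counts, rates)
p2 = total_then_multinomial_joint(counts, rates)
assert abs(p1 - p2) < 1e-12
```

The two expressions cancel algebraically to the same product of Poisson terms, which is why the check holds to machine precision rather than only in expectation.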
2 Augment-and-Conquer the Negative Binomial Distribution
The NB distribution $m\sim\mathrm{NB}(r,p)$ has the probability mass function (PMF) $f_M(m)=\frac{\Gamma(m+r)}{m!\,\Gamma(r)}p^m(1-p)^r$ for $m\in\{0,1,2,\dots\}$. It has a mean $\mu=rp/(1-p)$ smaller than the variance $\sigma^2=rp/(1-p)^2$, with the variance-to-mean ratio (VMR) as $(1-p)^{-1}$ and the overdispersion level (ODL, the coefficient of the quadratic term in $\sigma^2=\mu+r^{-1}\mu^2$) as $r^{-1}$. It has been widely investigated and applied in numerous scientific studies [13, 14, 15]. The NB distribution can be augmented into a gamma-Poisson construction as $m\sim\mathrm{Pois}(\lambda),\ \lambda\sim\mathrm{Gamma}\big(r,p/(1-p)\big)$, where the gamma distribution is parameterized by its shape $r$ and scale $p/(1-p)$. It can also be augmented under a compound-Poisson representation [16] as $m=\sum_{t=1}^{l}u_t,\ u_t\sim\mathrm{Log}(p),\ l\sim\mathrm{Pois}\big(-r\ln(1-p)\big)$, where $u\sim\mathrm{Log}(p)$ is the logarithmic distribution [17] with probability-generating function (PGF) $C_U(z)=\ln(1-pz)/\ln(1-p)$. In a slight abuse of notation, but for added conciseness, in the following discussion we use $m\sim\sum_{t=1}^{l}\mathrm{Log}(p)$ to denote $m=\sum_{t=1}^{l}u_t,\ u_t\sim\mathrm{Log}(p)$. The inference of the NB dispersion parameter $r$ has long been a challenge [13, 18, 19]. In this paper, we first place a gamma prior on it as $r\sim\mathrm{Gamma}(r_1,1/c_1)$. We then use Lemma 2.1 (below) to infer a latent count $l$ for each $m\sim\mathrm{NB}(r,p)$, conditioning on $m$ and $r$. Since $l\sim\mathrm{Pois}\big(-r\ln(1-p)\big)$ by construction, we can use the gamma-Poisson conjugacy to update $r$. Using Lemma 2.2 (below), we can further infer an augmented latent count $l'$ for each $l$, and then use these latent counts to update $r_1$, assuming $r_1\sim\mathrm{Gamma}(r_2,1/c_2)$. Using Lemmas 2.1 and 2.2, we can continue this process repeatedly, suggesting that we may build a NB process to model data that have subgroups within groups. The conditional posterior of the latent count $l$ was first derived by us in [20], but was not given an analytical form there. Below we explicitly derive the PMF of $l$, shown in (3), and find that it exactly represents the distribution of the random number of tables occupied by $m$ customers in a Chinese restaurant process with concentration parameter $r$ [21, 22, 7]. We denote $l\sim\mathrm{CRT}(m,r)$ as a Chinese restaurant table (CRT) count random variable with such a PMF and, as proved in the supplementary material, we can sample it as $l=\sum_{n=1}^{m}b_n,\ b_n\sim\mathrm{Bernoulli}\big(r/(n-1+r)\big)$.
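Both augmentations are easy to exercise numerically. The sketch below (the helper names and parameter values are illustrative, not from the paper) draws NB(r, p) variables through the gamma-Poisson route and through the compound-Poisson route, and checks the sample mean and variance against $rp/(1-p)$ and $rp/(1-p)^2$:

```python
import math
import random

rng = random.Random(1)

def sample_poisson(lam):
    """Poisson draw via unit-rate exponential interarrival times."""
    t, k = 0.0, 0
    while True:
        t += rng.expovariate(1.0)
        if t > lam:
            return k
        k += 1

def sample_log(p):
    """Logarithmic(p) draw via inverse CDF; PMF(k) = p**k / (k * -log(1-p))."""
    u, k, cdf, norm = rng.random(), 0, 0.0, -math.log(1.0 - p)
    while True:
        k += 1
        cdf += p**k / (k * norm)
        if u < cdf or k > 1000:   # cap guards against rounding at the tail
            return k

def nb_gamma_poisson(r, p):
    """m ~ NB(r, p) via lambda ~ Gamma(shape=r, scale=p/(1-p)), m ~ Pois(lambda)."""
    return sample_poisson(rng.gammavariate(r, p / (1.0 - p)))

def nb_compound_poisson(r, p):
    """m ~ NB(r, p) via l ~ Pois(-r log(1-p)), m = sum of l Logarithmic(p) draws."""
    l = sample_poisson(-r * math.log(1.0 - p))
    return sum(sample_log(p) for _ in range(l))

r, p, n = 2.0, 0.4, 50_000
mean_exact = r * p / (1.0 - p)         # NB mean
var_exact = r * p / (1.0 - p) ** 2     # NB variance; VMR = 1/(1-p), ODL = 1/r
for sampler in (nb_gamma_poisson, nb_compound_poisson):
    draws = [sampler(r, p) for _ in range(n)]
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / n
    assert abs(mean - mean_exact) < 0.05 and abs(var - var_exact) < 0.2
```

Either route gives the same marginal distribution; they differ only in which latent variable is exposed, which is precisely what the augment-and-conquer strategy exploits.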
Both the gamma-Poisson and compound-Poisson augmentations of the NB distribution and Lemmas 2.1 and 2.2 are key ingredients of this paper. We will show that these augment-and-conquer methods not only unite count and mixture modeling and provide efficient inference, but also, as shown in Section 3, let us examine the posteriors to understand fundamental properties of the NB processes, clearly revealing connections to previous nonparametric Bayesian mixture models.
Lemma 2.1.
Denote $|s(m,l)|$ as unsigned Stirling numbers of the first kind [17]. Augment $m\sim\mathrm{NB}(r,p)$ under the compound-Poisson representation as $m\sim\sum_{t=1}^{l}\mathrm{Log}(p),\ l\sim\mathrm{Pois}\big(-r\ln(1-p)\big)$; then the conditional posterior of $l$ has PMF
$$\Pr(l=j\mid m,r)=\frac{\Gamma(r)}{\Gamma(m+r)}\,|s(m,j)|\,r^{j},\quad j=0,1,\dots,m.\qquad(3)$$
Proof.
The PMF of the sum of $l$ iid $\mathrm{Log}(p)$ random variables is $\Pr(m\mid l)=\frac{l!\,|s(m,l)|}{m!}\frac{p^{m}}{[-\ln(1-p)]^{l}}$ [17]. Combining this with $l\sim\mathrm{Pois}\big(-r\ln(1-p)\big)$ gives $\Pr(l\mid m,r)\propto|s(m,l)|\,r^{l}$, and normalizing with the identity $\sum_{l=0}^{m}|s(m,l)|\,r^{l}=\Gamma(m+r)/\Gamma(r)$ yields (3). ∎
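The PMF in (3) can be verified directly: computing the unsigned Stirling numbers by their standard recurrence, the probabilities sum to one via the rising-factorial identity used in the proof. A small Python check (our own helper names):

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling1_unsigned(n, k):
    """Unsigned Stirling numbers of the first kind via the recurrence
    |s(n, k)| = (n - 1) |s(n - 1, k)| + |s(n - 1, k - 1)|."""
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0 or k > n:
        return 0
    return (n - 1) * stirling1_unsigned(n - 1, k) + stirling1_unsigned(n - 1, k - 1)

def crt_pmf(j, m, r):
    """Eq. (3): Pr(l = j | m, r) = Gamma(r) / Gamma(m + r) * |s(m, j)| * r**j."""
    return math.gamma(r) / math.gamma(m + r) * stirling1_unsigned(m, j) * r**j

# The PMF sums to one by the identity sum_j |s(m, j)| r**j = Gamma(m+r)/Gamma(r).
for m, r in [(1, 1.0), (8, 1.7), (12, 0.3)]:
    total = sum(crt_pmf(j, m, r) for j in range(m + 1))
    assert abs(total - 1.0) < 1e-10

# With a single customer there is exactly one table: Pr(l = 1 | m = 1, r) = 1.
assert abs(crt_pmf(1, 1, 2.5) - 1.0) < 1e-12
```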
Lemma 2.2.
Let $m\sim\mathrm{NB}(r,p)$ with $r\sim\mathrm{Gamma}(r_1,1/c_1)$, and denote $p'=\frac{-\ln(1-p)}{c_1-\ln(1-p)}$; then $m$ can also be generated from a compound distribution as
$$m\sim\sum_{t=1}^{l}\mathrm{Log}(p),\quad l\sim\mathrm{NB}(r_1,p').\qquad(4)$$
Proof.
Augmenting $m\sim\mathrm{NB}(r,p)$ leads to $m\sim\sum_{t=1}^{l}\mathrm{Log}(p),\ l\sim\mathrm{Pois}\big(-r\ln(1-p)\big)$. Marginalizing out $r\sim\mathrm{Gamma}(r_1,1/c_1)$ leads to $l\sim\mathrm{NB}(r_1,p')$, which together with $m\sim\sum_{t=1}^{l}\mathrm{Log}(p)$ gives (4). ∎
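Lemma 2.2 can also be checked by simulation: drawing $m$ directly as $\mathrm{NB}(r,p)$ with $r\sim\mathrm{Gamma}(r_1,1/c_1)$, and drawing $m$ through the compound route (4), should give matching distributions. The sketch below (illustrative parameter values, our own helper names) compares the two sample means against the analytical value $r_1 p/(c_1(1-p))$:

```python
import math
import random

rng = random.Random(4)

def sample_poisson(lam):
    """Poisson draw via unit-rate exponential interarrival times."""
    t, k = 0.0, 0
    while True:
        t += rng.expovariate(1.0)
        if t > lam:
            return k
        k += 1

def sample_log(p):
    """Logarithmic(p) draw via inverse CDF."""
    u, k, cdf, norm = rng.random(), 0, 0.0, -math.log(1.0 - p)
    while True:
        k += 1
        cdf += p**k / (k * norm)
        if u < cdf or k > 1000:
            return k

r1, c1, p, n = 2.0, 1.0, 0.4, 50_000
p_prime = -math.log(1.0 - p) / (c1 - math.log(1.0 - p))

def m_direct():
    """m ~ NB(r, p) with r ~ Gamma(r1, 1/c1), drawn via gamma-Poisson."""
    r = rng.gammavariate(r1, 1.0 / c1)
    return sample_poisson(rng.gammavariate(r, p / (1.0 - p)))

def m_compound():
    """Route (4): l ~ NB(r1, p'), m = sum of l Logarithmic(p) draws."""
    l = sample_poisson(rng.gammavariate(r1, p_prime / (1.0 - p_prime)))
    return sum(sample_log(p) for _ in range(l))

mean_exact = r1 * p / (c1 * (1.0 - p))   # analytical mean of m under both routes
mean_a = sum(m_direct() for _ in range(n)) / n
mean_b = sum(m_compound() for _ in range(n)) / n
assert abs(mean_a - mean_exact) < 0.08 and abs(mean_b - mean_exact) < 0.08
```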
3 Gamma-Negative Binomial Process
We explore sharing the NB dispersion across groups while the probability parameters are group dependent. We define a NB process $X_j\sim\mathrm{NBP}(G,p_j)$ such that $X_j(A)\sim\mathrm{NB}\big(G(A),p_j\big)$ for each $A\subset\Omega$, and construct a gamma-NB process for joint count and mixture modeling by placing a gamma process prior on the shared dispersion measure, $G\sim\mathrm{GaP}(c,G_0)$, which can be augmented as a gamma-gamma-Poisson process as
$$X_j\sim\mathrm{PP}(\Theta_j),\quad \Theta_j\sim\mathrm{GaP}\big((1-p_j)/p_j,\,G\big),\quad G\sim\mathrm{GaP}(c,G_0).\qquad(5)$$
In the above, PP(·) and GaP(·) represent the Poisson and gamma processes, respectively, as defined in Section 1.1. Using Lemma 2.2, the gamma-NB process can also be augmented as
(6)  
(7) 
These three augmentations allow us to derive a sequence of closed-form update equations for inference with the gamma-NB process. From (5), for each $A\subset\Omega$ we have $X_j(A)\sim\mathrm{NB}\big(G(A),p_j\big)$ marginally; thus, under a $\mathrm{Beta}(a_0,b_0)$ prior, the conditional posterior of $p_j$ is
$$(p_j\mid -)\sim\mathrm{Beta}\big(a_0+X_j(\Omega),\,b_0+G(\Omega)\big).\qquad(8)$$
Define $L_j\sim\mathrm{CRTP}(X_j,G)$ as a CRT process such that $L_j(A)\sim\mathrm{CRT}\big(X_j(A),G(A)\big)$ for each $A\subset\Omega$. Applying Lemma 2.1 on (6) and (7), we have
(9) 
If $G_0$ is a continuous base measure and $\gamma_0=G_0(\Omega)$ is finite, we have
(10) 
which is equal to the total number of used discrete atoms; if $G_0$ is discrete, a similar expression holds. In either case, with the gamma-Poisson conjugacy on (6) and (7), we have
(11)  
(12) 
Since the data are exchangeable within group $j$, the predictive distribution of a point, conditioning on the current counts and $G$, with $\Theta_j$ marginalized out, can be expressed as
(13) 
3.1 Relationship with the hierarchical Dirichlet process
Using the equivalence between (1) and (2) and normalizing all the gamma processes in (5), we can re-express (5) as
(14) 
which is an HDP [7]. Thus the normalized gamma-NB process leads to an HDP, yet we cannot return from the HDP to the gamma-NB process without modeling the total masses and probability parameters as random variables. Theoretically, they are distinct in that the gamma-NB process is a completely random measure, assigning independent random variables to any disjoint Borel sets of $\Omega$, whereas the HDP is not. Practically, the gamma-NB process can exploit conjugacy to achieve analytical conditional posteriors for all latent parameters. The inference of the HDP is a major challenge and is usually solved through alternative constructions such as the Chinese restaurant franchise (CRF) and stick-breaking representations [7, 23]. In particular, without analytical conditional posteriors, the inference of the concentration parameters is nontrivial [7, 24] and they are often simply fixed [23]. Under the CRF metaphor, one concentration parameter governs the random number of tables occupied by customers in each restaurant independently; further, if the base probability measure is continuous, the other governs the random number of dishes selected by the tables of all restaurants. One may apply the data augmentation method of [22] to sample them. However, if the base probability measure is discrete, which is of practical value and becomes a continuous base measure in the limit [11, 7, 24], then using the method of [22] to sample the concentration parameter is only approximately correct, which may result in a biased estimate in practice, especially if the number of atoms is not large enough. By contrast, in the gamma-NB process, the shared gamma process $G$ can be analytically updated with (12), and $G(\Omega)$ plays the role of the concentration parameter in the HDP, which is readily available as
(15)
and, as in (11), regardless of whether the base measure is continuous, the total mass has an analytical gamma posterior whose shape parameter is governed by the augmented CRT counts. Equation (15) also intuitively shows how the NB probability parameters govern the variations among groups in the gamma-NB process. In the HDP, the probability parameter is not explicitly modeled, and since its value becomes irrelevant when taking the normalized constructions in (14), it is usually treated as a nuisance parameter and perceived as 0.5 when needed for interpretation purposes. Fixing the scale is also considered in [12] to construct an HDP, whose group-level DPs are normalized from gamma processes with fixed scale parameters; it is also shown in [12] that improved performance can be obtained for topic modeling by learning the scale parameters with a log-Gaussian process prior. However, no analytical conditional posteriors are provided and Gibbs sampling is not considered a viable option in [12].
3.2 Augment-and-conquer inference for joint count and mixture modeling
For a finite continuous base measure, the gamma process $G\sim\mathrm{GaP}(c,G_0)$ can also be defined with its Lévy measure on a product space $\mathbb{R}_{+}\times\Omega$, expressed as $\nu(d\theta\,d\omega)=\theta^{-1}e^{-c\theta}\,d\theta\,G_0(d\omega)$ [9]. Since the Poisson intensity $\nu^{+}=\nu(\mathbb{R}_{+}\times\Omega)=\infty$ and $\int\!\!\int\theta\,\nu(d\theta\,d\omega)$ is finite, a draw from this process can be expressed as $G=\sum_{k=1}^{\infty}r_k\delta_{\omega_k}$ [9]. Here we consider a discrete base measure as $G_0=\sum_{k=1}^{K}\frac{\gamma_0}{K}\delta_{\omega_k}$; then we have $G=\sum_{k=1}^{K}r_k\delta_{\omega_k}$, $r_k\sim\mathrm{Gamma}(\gamma_0/K,1/c)$, which becomes a draw from the gamma process with a continuous base measure as $K\rightarrow\infty$. Let $x_{ji}$ be observation $i$ in group $j$, linked to a mixture component $\omega_k$ through a distribution $F(\cdot\mid\omega_k)$. Denote $z_{ji}$ as the index of the component associated with $x_{ji}$ and $n_{jk}$ as the number of observations in group $j$ allocated to component $k$; we can express the gamma-NB process with the discrete base measure as
(16) 
where marginally we have $n_{jk}\sim\mathrm{NB}(r_k,p_j)$. Using the equivalence between (1) and (2), we can equivalently express the observations and counts in the above model through a multinomial allocation of each group's total count. Since the data are fully exchangeable, rather than drawing the allocation vector once, we may equivalently draw the index
(17) 
for each observation $x_{ji}$ and then let $n_{jk}=\sum_{i}\delta(z_{ji}=k)$. This provides further insight into how the seemingly disjoint problems of count and mixture modeling are united under the NB process framework. Following (8)-(12), the block Gibbs sampling is straightforward to write as
(18) 
which has computational complexity similar to that of the direct assignment block Gibbs sampling of the CRF-HDP [7, 24]. If the base distribution is conjugate to the likelihood $F(\cdot\mid\omega_k)$, then the posterior of $\omega_k$ would be analytical.
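To make the update cycle concrete, here is a minimal finite-$K$ sketch in Python of the conjugate updates for a gamma-NB count matrix (the variable names and hyperparameter values are our own; it omits the topic-assignment step (17) and the updates of the remaining hyperparameters, so it illustrates the conjugate structure rather than the paper's full topic-model sampler):

```python
import math
import random

rng = random.Random(0)

def sample_crt(m, r):
    """l ~ CRT(m, r) as a sum of independent Bernoulli draws (see Lemma 2.1)."""
    return sum(rng.random() < r / (n - 1 + r) for n in range(1, m + 1))

def gibbs_sweep(n, r, a0=0.01, b0=0.01, gamma0=1.0, c=1.0):
    """One sweep of conjugate updates for a finite-K gamma-NB model of a
    J x K count matrix n, with n[j][k] ~ NB(r[k], p[j]) and
    r[k] ~ Gamma(gamma0 / K, 1 / c) a priori."""
    J, K = len(n), len(n[0])
    # p_j | - ~ Beta(a0 + sum_k n_jk, b0 + sum_k r_k), by beta-NB conjugacy.
    r_sum = sum(r)
    p = [rng.betavariate(a0 + sum(n[j]), b0 + r_sum) for j in range(J)]
    # l_jk | - ~ CRT(n_jk, r_k), by Lemma 2.1.
    l = [[sample_crt(n[j][k], r[k]) for k in range(K)] for j in range(J)]
    # r_k | - ~ Gamma(gamma0/K + sum_j l_jk, 1 / (c - sum_j log(1 - p_j))),
    # by gamma-Poisson conjugacy on the augmented table counts.
    scale = 1.0 / (c - sum(math.log(1.0 - pj) for pj in p))
    r = [rng.gammavariate(gamma0 / K + sum(l[j][k] for j in range(J)), scale)
         for k in range(K)]
    return r, p

# Synthetic counts: 3 groups, 4 components.
n = [[5, 0, 2, 1], [3, 1, 0, 0], [7, 2, 1, 0]]
r = [1.0] * 4
for _ in range(20):
    r, p = gibbs_sweep(n, r)
assert all(rk > 0 for rk in r) and all(0.0 < pj < 1.0 for pj in p)
```

Every step is a draw from a standard distribution, which is the practical payoff of the augmentations: no Metropolis-Hastings step is needed for the dispersion parameters.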
Using (1) and (2) and normalizing the gamma distributions, (16) can be re-expressed as
(19) 
which loses the count modeling ability and becomes a finite representation of the HDP, the inference of which is not conjugate and has to be solved under alternative representations [7, 24]. This also implies that, by using the Dirichlet process as its foundation, traditional mixture modeling may discard useful count information from the very beginning.
4 The Negative Binomial Process Family and Related Algorithms
The gamma-NB process shares the NB dispersion across groups. Since the NB distribution has two adjustable parameters, we may explore alternative ideas, with the NB probability measure shared across groups as in [6], or with both the dispersion and probability measures shared as in [5]. These constructions are distinct from both the gamma-NB process and the HDP in that the shared measure has space-dependent scales, and thus its normalization no longer follows a Dirichlet process.
It is natural to let the probability measure be drawn from a beta process [25, 26], which can be defined by its Lévy measure on a product space $[0,1]\times\Omega$ as $\nu(dp\,d\omega)=cp^{-1}(1-p)^{c-1}\,dp\,B_0(d\omega)$. A draw from the beta process $B\sim\mathrm{BP}(c,B_0)$, with concentration parameter $c$ and base measure $B_0$, can be expressed as $B=\sum_{k=1}^{\infty}p_k\delta_{\omega_k}$. A beta-NB process [5, 6] can be constructed by letting $n_{jk}\sim\mathrm{NB}(r_j,p_k)$; under this construction, the NB probability measure is shared and the NB dispersion parameters are group dependent. As in [5], we may also consider a marked-beta-NB process (we may also consider a beta marked-gamma-NB process, whose performance is found to be very similar), in which both the NB probability and dispersion measures are shared: each point of the beta process is marked with an independent gamma random variable, so that a draw from the marked-beta process carries atoms $(r_k,p_k)\delta_{\omega_k}$ and the NB process has $n_{jk}\sim\mathrm{NB}(r_k,p_k)$. Since the beta and NB processes are conjugate, the posterior of the probability measure is tractable, as shown in [5, 6]. If it is believed that there is an excessive number of zeros, governed by a process other than the NB process, we may introduce a zero-inflated NB process, where a Bernoulli process [26] draw masks the atoms of a marked-beta process, i.e., $n_{jk}\sim\mathrm{NB}(r_k z_{jk},p_j)$ with $z_{jk}\sim\mathrm{Bernoulli}(\pi_k)$. This construction can be linked to the model in [27] with appropriate normalization, with the advantages that there is no need to fix the probability parameters and the inference is fully tractable. The zero-inflated construction can also be linked to models for real-valued data using the Indian buffet process (IBP) or beta-Bernoulli process spike-and-slab prior [28, 29, 30, 31].
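A draw from a marked-beta-NB process can be approximated with a finite number of atoms. The following sketch (the truncation level, hyperparameter values and variable names are our own choices for illustration) draws atom probabilities from a finite beta approximation, marks each atom with a gamma variable, and generates the group counts through the gamma-Poisson augmentation:

```python
import random

rng = random.Random(3)
K, c, gamma0, J = 50, 1.0, 5.0, 3   # finite truncation; illustrative values

def sample_poisson(lam):
    """Poisson draw via unit-rate exponential interarrival times."""
    t, k = 0.0, 0
    while True:
        t += rng.expovariate(1.0)
        if t > lam:
            return k
        k += 1

# Finite-K approximation of a marked-beta process draw: each atom omega_k
# carries a probability p_k and an independent gamma mark r_k.
p_atoms = [rng.betavariate(c * gamma0 / K, c * (1.0 - gamma0 / K)) for _ in range(K)]
r_atoms = [rng.gammavariate(1.0, 1.0) for _ in range(K)]

# Group counts n_jk ~ NB(r_k, p_k), drawn via the gamma-Poisson augmentation.
counts = [[sample_poisson(rng.gammavariate(r, p / (1.0 - p)))
           for r, p in zip(r_atoms, p_atoms)]
          for _ in range(J)]

assert len(counts) == J and all(len(row) == K for row in counts)
assert all(n >= 0 for row in counts for n in row)
```

Because both $r_k$ and $p_k$ are shared across the $J$ groups, the per-atom mean and overdispersion are coupled across groups, which is the sharing mechanism this construction is designed to exhibit.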
4.1 Related Algorithms
To show how the NB processes can be diversely constructed and to make connections to previous parametric and nonparametric mixture models, we show in Table 1 a variety of NB processes, which differ in how the dispersion and probability measures are shared. For a deeper understanding of how the counts are modeled, we also show in Table 1 both the VMR and ODL implied by these settings. We consider topic modeling of a document corpus, a typical example of mixture modeling of grouped data, where each bag-of-words document constitutes a group, each word is an exchangeable group member, and $F(x_{ji}\mid\omega_k)$ is simply the probability of word $x_{ji}$ in topic $\omega_k$.
We consider six differently constructed NB processes in Table 1: (i) Related to latent Dirichlet allocation (LDA) [32] and Dirichlet Poisson factor analysis (Dir-PFA) [5], NB-LDA is also a parametric topic model that requires tuning the number of topics. However, it uses document-dependent dispersion and probability parameters to automatically learn the smoothing of the gamma-distributed topic weights, and it shares statistical strength between documents through a hierarchical prior, with closed-form Gibbs sampling inference. Thus even the most basic parametric LDA topic model can be improved under the NB count modeling framework. (ii) The NB-HDP model is related to the HDP [7], and since the probability parameter is irrelevant in the HDP due to normalization, we set it in the NB-HDP as 0.5, the usually perceived value before normalization. The NB-HDP model is comparable to the DILN-HDP [12], which constructs the group-level DPs with normalized gamma processes whose scale parameters are set as one. (iii) The NB-FTM model introduces an additional beta-Bernoulli process under the NB process framework to explicitly model zero counts. It is the same as the sparse-gamma-gamma-PFA (SPFA) in [5] and is comparable to the focused topic model (FTM) [27], which is constructed from the IBP compound DP; nevertheless, they use essentially the same likelihoods and priors for inference. The zero-inflated NB process improves over them by allowing the probability parameters to be inferred, which generally yields better data fitting. (iv) The Gamma-NB process explores the idea that the dispersion measure is shared across groups, and it improves over the NB-HDP by allowing the learning of the probability parameters. It reduces to the HDP [7] by normalizing both the group-level and the shared gamma processes. (v) The Beta-NB process explores sharing the probability measure across groups, and it improves over the beta negative binomial process (BNBP) proposed in [6] by allowing inference of the dispersion parameters. (vi) The Marked-Beta-NB process is comparable to the BNBP proposed in [5], with the distinction that it allows analytical update of the dispersion measure.
The constructions and inference of the various NB processes and related algorithms in Table 1 all follow (16) and the update equations of Section 3.2, respectively, with additional details presented in the supplementary material.
Note that, as shown in [5], NB process topic models can also be considered as factor analysis of the term-document count matrix under the Poisson likelihood, with each topic as a factor loading that sums to one and the topic weights as factor scores, which can be further linked to nonnegative matrix factorization [33] and a gamma-Poisson factor model [34]. If, in addition to the mixture proportions, the absolute values of the weights and counts are also of interest, then NB process based joint count and mixture models would apparently be more appropriate than HDP based mixture models.
Algorithms  VMR  ODL  Related Algorithms
NB-LDA  $(1-p_j)^{-1}$  $r_j^{-1}$  LDA [32], Dir-PFA [5]
NB-HDP  $2$  $r_k^{-1}$  HDP [7], DILN-HDP [12]
NB-FTM  $2$  $(r_k b_{jk})^{-1}$  FTM [27], SPFA [5]
Beta-NB  $(1-p_k)^{-1}$  $r_j^{-1}$  BNBP [5], BNBP [6]
Gamma-NB  $(1-p_j)^{-1}$  $r_k^{-1}$  CRF-HDP [7, 24]
Marked-Beta-NB  $(1-p_k)^{-1}$  $r_k^{-1}$  BNBP [5]
5 Example Results
Motivated by Table 1, we consider topic modeling using a variety of NB processes, which differ in which parameters are learned and consequently in how the VMR and ODL of the latent counts are modeled. We compare them with LDA [32, 35] and CRF-HDP [7, 24]. For fair comparison, they are all implemented with block Gibbs sampling using a discrete base measure with $K$ atoms, and for the first fifty iterations, the Gamma-NB process with fixed dispersion and probability parameters is used for initialization. For LDA and NB-LDA, we search $K$ for optimal performance; for the other models, we set $K$ as an upper bound. For LDA, we set the topic proportion Dirichlet smoothing parameter following the topic model toolbox provided for [35]. We consider 2500 Gibbs sampling iterations, with the last 1500 samples collected. Under the NB processes, each word would be assigned to a topic based on both the topic-specific word probabilities and the topic weights; each topic is drawn from a symmetric Dirichlet base measure over the unique terms in the vocabulary, with a small smoothing parameter. We consider the Psychological Review corpus (http://psiexp.ss.uci.edu/research/programsdata/toolbox.htm), restricting the vocabulary to terms that occur in five or more documents. The corpus includes 1281 abstracts from 1967 to 2003, with 2,566 unique terms and 71,279 total word counts. We randomly select a percentage of the words from each document for training and learn a document-dependent probability for each term, averaged over the collected samples. We use these probabilities to calculate the per-word perplexity on the held-out words as in [5]. The final results are averaged from five random training/testing partitions. Note that the perplexity per test word is the fair metric to compare topic models.
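The held-out metric is the standard per-word perplexity, the exponentiated negative average log-probability of the held-out tokens; a minimal sketch:

```python
import math

def per_word_perplexity(probs):
    """Per-word perplexity on held-out words: exp of the negative average
    log-probability assigned to each held-out token."""
    return math.exp(-sum(math.log(q) for q in probs) / len(probs))

# Sanity check: uniform probabilities over a vocabulary of size V give
# perplexity exactly V (here V = 2566, matching the corpus vocabulary size).
V = 2566
assert abs(per_word_perplexity([1.0 / V] * 100) - V) < 1e-6
```

Lower perplexity means the model assigns higher probability to unseen words; the uniform baseline equals the vocabulary size, so any useful topic model should fall well below it.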
However, when the actual Poisson rates or count distribution parameters, rather than the mixture proportions, are of interest, a NB process based joint count and mixture model is clearly more appropriate than an HDP based mixture model.
Figure 1 compares the performance of various algorithms. The Marked-Beta-NB process has the best performance, closely followed by the Gamma-NB process, CRF-HDP and Beta-NB process. With an appropriate $K$, the parametric NB-LDA may outperform the nonparametric NB-HDP and NB-FTM as the training data percentage increases, a somewhat unexpected but intuitive result: even by learning both the NB dispersion and probability parameters in a document-dependent manner, we may get better data fitting than with nonparametric models that share the NB dispersion parameters across documents but fix the NB probability parameters.
Figure 2 shows the model parameters learned by the various algorithms under the NB process framework, revealing distinct sharing mechanisms and model properties. When document-dependent $(r_j,p_j)$ are used, as in NB-LDA, different documents are only weakly coupled, and the modeling results show that a typical document in this corpus usually has a small $r_j$ and a large $p_j$, thus a large ODL and a large VMR, indicating highly overdispersed counts on its topic usage. When a shared $p_k$ is used to model the latent counts, as in the Beta-NB process, the transition between active and non-active topics is very sharp: $p_k$ is either close to one or close to zero. That is because $p_k$ controls the mean $r_j p_k/(1-p_k)$ and the VMR $(1-p_k)^{-1}$ on topic $k$, so a popular topic must also have a large $p_k$ and thus large overdispersion measured by the VMR; since the counts are usually overdispersed, particularly in this corpus, a middle-range $p_k$, indicating an appreciable mean and small overdispersion, is not favored by the model and is thus rarely observed. When a shared $r_k$ is used, as in the Gamma-NB process, the transition is much smoother, with $r_k$ gradually decreasing. The reason is that $r_k$ controls the mean $r_k p_j/(1-p_j)$ and the ODL $r_k^{-1}$ on topic $k$, so popular topics must also have large $r_k$ and thus small overdispersion measured by the ODL, while unpopular topics are modeled with small $r_k$ and thus large overdispersion, allowing rarely and lightly used topics. Therefore, we can expect that sharing $r_k$ would allow more topics than sharing $p_k$, as confirmed in Figure 1 (a): the Gamma-NB process learns 177 active topics, significantly more than the 107 of the Beta-NB process. With this analysis, we can conclude that the mean and the amount of overdispersion (measured by the VMR or ODL) for the usage of topic $k$ are positively correlated under a shared $p_k$ and negatively correlated under a shared $r_k$.
When both $r_k$ and $p_k$ are shared, as in the Marked-Beta-NB process, more diverse combinations of mean and overdispersion are allowed, as both $r_k$ and $p_k$ are now responsible for the mean $r_k p_k/(1-p_k)$. For example, there could be not only a large mean with small overdispersion (large $r_k$ and small $p_k$), but also a large mean with large overdispersion (small $r_k$ and large $p_k$). Thus this sharing may combine the advantages of using only $r_k$ or only $p_k$ to model topic $k$, as confirmed by the superior performance of the Marked-Beta-NB process over the Beta-NB and Gamma-NB processes. When zero inflation is used, as in the NB-FTM model, our results show that we usually have a small usage probability and a large $r_k$, indicating that topic $k$ is sparsely used across the documents, but once it is used, the amount of variation in usage is small. This modeling property might be helpful when there is an excessive number of zeros that may not be well modeled by the NB process alone. In our experiments, we find that the more direct sharing approaches generally yield better results, but this might not be the case when an excessive number of zeros is better explained with the underlying beta-Bernoulli or IBP processes, e.g., when the training words are scarce.
It is also interesting to compare the Gamma-NB and NB-HDP. From a mixture-modeling viewpoint, fixing $p_j=0.5$ is natural, as $p_j$ becomes irrelevant after normalization. However, from a count modeling viewpoint, this makes the restrictive assumption that each count vector has the same VMR of 2, and the experimental results in Figure 1 confirm the importance of learning $p_j$ together with $r_k$. It is also interesting to examine (15), which can be viewed as the concentration parameter of the HDP: allowing the adjustment of $p_j$ permits a more flexible model assumption on the amount of variation between the topic proportions, and thus potentially better data fitting.
6 Conclusions
We propose a variety of negative binomial (NB) processes to jointly model counts across groups, which can be naturally applied for mixture modeling of grouped data. The proposed NB processes are completely random measures, in that they assign independent random variables to disjoint Borel sets of the measure space, as opposed to the hierarchical Dirichlet process (HDP), whose measures on disjoint Borel sets are negatively correlated. We discover augment-and-conquer inference methods: by “augmenting” a NB process into both the gamma-Poisson and compound-Poisson representations, we are able to “conquer” the unification of count and mixture modeling, the analysis of fundamental model properties and the derivation of efficient Gibbs sampling inference. We demonstrate that the gamma-NB process, which shares the NB dispersion measure across groups, can be normalized to produce the HDP, and we show in detail its theoretical, structural and computational advantages over the HDP. We examine the distinct sharing mechanisms and model properties of various NB processes, with connections to existing algorithms, and with experimental results on topic modeling showing the importance of modeling both the NB dispersion and probability parameters.
Acknowledgments
The research reported here was supported by ARO, DOE, NGA, and ONR, and by DARPA under the MSEE and HIST programs.
References
[1] J. F. C. Kingman. Poisson Processes. Oxford University Press, 1993.
[2] M. K. Titsias. The infinite gamma-Poisson feature model. In NIPS, 2008.
[3] R. J. Thibaux. Nonparametric Bayesian Models for Machine Learning. PhD thesis, UC Berkeley, 2008.
[4] K. T. Miller. Bayesian Nonparametric Latent Feature Models. PhD thesis, UC Berkeley, 2011.
[5] M. Zhou, L. Hannah, D. Dunson, and L. Carin. Beta-negative binomial process and Poisson factor analysis. In AISTATS, 2012.
[6] T. Broderick, L. Mackey, J. Paisley, and M. I. Jordan. Combinatorial clustering and the beta negative binomial process. arXiv:1111.1802v3, 2012.
[7] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. JASA, 2006.
[8] M. I. Jordan. Hierarchical models, nested models and completely random measures. 2010.
[9] R. L. Wolpert, M. A. Clyde, and C. Tu. Stochastic expansions using continuous dictionaries: Lévy adaptive regression kernels. Annals of Statistics, 2011.
[10] T. S. Ferguson. A Bayesian analysis of some nonparametric problems. Ann. Statist., 1973.
[11] H. Ishwaran and M. Zarepour. Exact and approximate sum-representations for the Dirichlet process. Can. J. Statist., 2002.
[12] J. Paisley, C. Wang, and D. M. Blei. The discrete infinite logistic normal distribution. Bayesian Analysis, 2012.
[13] C. I. Bliss and R. A. Fisher. Fitting the negative binomial distribution to biological data. Biometrics, 1953.
[14] A. C. Cameron and P. K. Trivedi. Regression Analysis of Count Data. Cambridge, UK, 1998.
[15] R. Winkelmann. Econometric Analysis of Count Data. Springer, Berlin, 5th edition, 2008.
[16] M. H. Quenouille. A relation between the logarithmic, Poisson, and negative binomial series. Biometrics, 1949.
[17] N. L. Johnson, A. W. Kemp, and S. Kotz. Univariate Discrete Distributions. John Wiley & Sons, 2005.
[18] S. J. Clark and J. N. Perry. Estimation of the negative binomial parameter by maximum quasi-likelihood. Biometrics, 1989.
[19] M. D. Robinson and G. K. Smyth. Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics, 2008.
[20] M. Zhou, L. Li, D. Dunson, and L. Carin. Lognormal and gamma mixed negative binomial regression. In ICML, 2012.
[21] C. E. Antoniak. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann. Statist., 1974.
[22] M. D. Escobar and M. West. Bayesian density estimation and inference using mixtures. JASA, 1995.
[23] C. Wang, J. Paisley, and D. M. Blei. Online variational inference for the hierarchical Dirichlet process. In AISTATS, 2011.
[24] E. Fox, E. Sudderth, M. Jordan, and A. Willsky. Developing a tempered HDP-HMM for systems with state persistence. MIT LIDS, TR #2777, 2007.
[25] N. L. Hjort. Nonparametric Bayes estimators based on beta processes in models for life history data. Ann. Statist., 1990.
[26] R. Thibaux and M. I. Jordan. Hierarchical beta processes and the Indian buffet process. In AISTATS, 2007.
[27] S. Williamson, C. Wang, K. A. Heller, and D. M. Blei. The IBP compound Dirichlet process and its application to focused topic modeling. In ICML, 2010.
[28] T. L. Griffiths and Z. Ghahramani. Infinite latent feature models and the Indian buffet process. In NIPS, 2005.
[29] M. Zhou, H. Chen, J. Paisley, L. Ren, L. Li, Z. Xing, D. Dunson, G. Sapiro, and L. Carin. Nonparametric Bayesian dictionary learning for analysis of noisy and incomplete images. IEEE TIP, 2012.
[30] M. Zhou, H. Yang, G. Sapiro, D. Dunson, and L. Carin. Dependent hierarchical beta process for image interpolation and denoising. In AISTATS, 2011.
[31] L. Li, M. Zhou, G. Sapiro, and L. Carin. On the integration of topic modeling and dictionary learning. In ICML, 2011.
[32] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. J. Mach. Learn. Res., 2003.
[33] D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In NIPS, 2000.
[34] J. Canny. GaP: a factor model for discrete data. In SIGIR, 2004.
[35] T. L. Griffiths and M. Steyvers. Finding scientific topics. PNAS, 2004.
Appendix A Generating a CRT random variable
Lemma A.1.
A CRT random variable $l\sim\mathrm{CRT}(m,r)$ can be generated as the summation of independent Bernoulli random variables:
$$l=\sum_{n=1}^{m}b_n,\quad b_n\sim\mathrm{Bernoulli}\!\left(\frac{r}{n-1+r}\right).\qquad(20)$$
Proof.
Since $l$ is the summation of independent Bernoulli random variables, its PGF becomes
$$C_l(z)=\prod_{n=1}^{m}\frac{n-1+rz}{n-1+r}=\frac{\Gamma(r)}{\Gamma(m+r)}\cdot\frac{\Gamma(m+rz)}{\Gamma(rz)}.$$
Expanding $\Gamma(m+rz)/\Gamma(rz)=\sum_{l'=0}^{m}|s(m,l')|(rz)^{l'}$ shows that $\Pr(l=l'\mid m,r)=\frac{\Gamma(r)}{\Gamma(m+r)}\,|s(m,l')|\,r^{l'}$, which matches (3). ∎
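Lemma A.1 gives a simple O(m) sampler. A Python sketch (our own function names) draws CRT variables as sums of independent Bernoullis and checks the Monte Carlo mean against the exact value $\sum_{n=1}^{m} r/(n-1+r)$:

```python
import random

rng = random.Random(5)

def sample_crt(m, r):
    """Eq. (20): l = sum of b_n, b_n ~ Bernoulli(r / (n - 1 + r)), n = 1..m."""
    return sum(rng.random() < r / (n - 1 + r) for n in range(1, m + 1))

m, r, n_draws = 8, 1.7, 100_000
mean_exact = sum(r / (n - 1 + r) for n in range(1, m + 1))
draws = [sample_crt(m, r) for _ in range(n_draws)]
assert all(1 <= l <= m for l in draws)   # at least one table once m >= 1
assert abs(sum(draws) / n_draws - mean_exact) < 0.02
```

This is the sampler invoked wherever the Gibbs updates require $l\sim\mathrm{CRT}(m,r)$; since the first Bernoulli has success probability one, the draw is always at least 1 when $m\ge 1$.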
Appendix B Dir-PFA and LDA
The Dirichlet Poisson factor analysis (Dir-PFA) model [5] is constructed as