1 Introduction
Spatial regression models have been universally used in many different fields such as environmental science (Hu and Bradley, 2018), biological science (Zhang and Lawson, 2011), and econometrics (Brunsdon et al., 1996) to explore the relation between a response variable and a set of predictors over a region. One of the most important tasks for a spatial regression model is to capture the spatial dependent structure for the response variable. Cressie (1992) first proposed a regression model with Gaussian spatial random effects. Diggle et al. (1998) extended linear mixed effect model to generalized linear mixed regression. In both models, spatial random effects are accounted for by the intercepts, and the regression coefficients are assumed to be constant over space. Brunsdon et al. (1996) proposed a geographically weighted regression (GWR) to capture spatially varying pattern in regression coefficients. The idea of GWR has been subsequently extended to the Cox model framework by Xue et al. (2019). GWR does not give any distribution assumption for spatially varying coefficients. Furthermore, Gelfand et al. (2003)
incorporated spatial Gaussian process to linear regression model to build a spatially varying coefficients regression model. The literatures mentioned above all assume that each location has its own set of regression parameters, which sometimes leads to overfitting. Heterogeneity effects over the space of interest has not been taken into account. Detection of clustered covariate effects has significant merits in many different fields, such as environmental science, spatial econometrics, and disease mapping. For example, a country’s demographical information often has different economic statuses and development patterns. More advanced regions and less developed regions could be put into separate clusters and analyzed. Such heterogeneity pattern is of great interest to regional economics researchers.
Spatial cluster detection method such as the scan statistic based method (Kulldorff and Nagarwalla, 1995; Jung et al., 2007) is a remedy for spatial heterogeneity detection. A scan statistic is constructed via a likelihood ratio statistic to test the potential clusters. Another important approach for spatial heterogeneity detection is to use the Bayesian framework to pursue spatial cluster (Carlin et al., 2014; Li et al., 2010). These two important approaches mainly focus on estimating cluster configurations of spatial response. Recently, methods for cluster detection of spatial regression coefficients have been proposed to detect the homogeneity of the covariates effects among sub areas. Lee et al. (2017) detected the spatial cluster under the GWR framework based on a hypothesis testing procedure. Lee et al. (2019) extended their framework to a spatial mixed effects model. The clustering configuration in both works is detected via a series of hypothesis testing. The slopes and intercepts estimation are performed in two consecutive stages. Li and Sang (2019) incorporated spatial neighborhood information based on minimum spanning trees in a penalized approach to detect spatially clustered coefficients.
There are several major challenges in developing clustering algorithms for Poisson spatially clustered regression coefficients. First, imposing certain spatial contiguous constraints on the clustering configuration to facilitate interpretations in the spatially clustered coefficients regression. Furthermore, in many regional science applications, spatially contiguous constraints should not dominate global clustering configuration. In other words, the clustering results should contain both spatially contiguous pattern and spatially disconnected pattern. The aforementioned methods (Lee et al., 2017, 2019; Li and Sang, 2019) guarantee spatial contiguity, but fail to obtain globally discontiguous clusters which allow two clusters with long geographical distance belonging to the same cluster.
Second, an important consideration in clustering algorithm is to estimate the number of clusters. The frequentist approach requires twostage procedures which assumes the knowledge of the number of clusters or estimate it using either of BIC or crossvalidation. Bayesian inference provides a probabilistic framework for simultaneous inference of the number of clusters and the clustering configurations. Nonparametric Bayesian approaches, such as the Dirichlet process mixture (DPM) model
(Ferguson, 1973), offer choices to estimate the number of clusters and the clustering configurations simultaneously. Ma et al. (2019) proposed a Bayesian clustered regression for spatially dependent data based on Dirichlet process mixture model. But their methods do not contain consistent estimator of the number of the clusters due to inconsistency of the Dirichlet process mixture model (Miller and Harrison, 2013). To solve this over clustering problem of DPM, Miller and Harrison (2018) proposed mixture of finite mixtures (MFM) model which puts a prior on the number of clusters. Xie and Xu (2019)developed a general class of Bayesian repulsive Gaussian mixture models to well separate the different cluster.
Lu et al. (2018) proposed a powered Chinese restaurant process (pCRP), which encourages the data points to go to existing clusters. Most existing literatures give the remedy for over clustering problem. But they do not consider other information such as geographical information which will help the clustering performance. From Tobler’s first law of geography (Tobler, 1970), “Everything is related to everything else, but near things are more related than distant things,” it is reasonable to consider the similar pattern of the data due to similar environmental circumstances.Finally, a significant concern of Bayesian nonparametric algorithm is computational efficiency. A reversible jump Markov chain Monte Carlo (MCMC) algorithm (Green, 1995) is proposed to search variable dimensional parameter space based on assigning a prior on the number of clusters. Such algorithms are not efficient to implement and automate. Collapsed Gibbs sampling algorithm (Neal, 2000) requires MetropolisHastings algorithm (Chib and Greenberg, 1995) or incorporating auxiliary variables (Neal, 2000) when the base distribution is not conjugate. Therefore, Gaussian prior for regression coefficients is not an efficient choice because of its nonconjugacy.
To address these challenges, in this paper, we develop a Markov random field (MRF) constraint MFM (MRFMFM) model to capture the spatial homogeneity in regression coefficients. Our main contributions in this paper are in three folds. First, we develop a new nonparametric Bayesian methods for spatially clustered coefficients Poisson regression which leveraging geographical information based on Markov random field constraint MFM model. To the best of our knowledge, this is the first time to leverage geographical information in Bayesian modelbased clustering algorithm for Poisson regression. MRFMFM is able to capture both locally spatially contiguous clusters and globally discontiguous cluster simultaneously. The Pólya urn scheme, exchangeability, and estimation consistency of MRFMFM are fully discussed in this article. Second, a multivariate log gamma (MLG) process (Bradley et al., 2018) is chosen as the base distribution of MRFMFM for Poisson regression model. Hence, an efficient MCMC algorithm is developed for our model without MetropolisHastings algorithm (Chib and Greenberg, 1995) or incorporating auxiliary variables (Neal, 2000). We firstly introduce MLG process in nonparametric Bayesian algorithm. Finally, our proposed Bayesian approach reveals interesting features of the countylevel premature data in the state of Georgia, which provides important information to study homogeneity effects of public health.
The remainder of the paper is organized as follows. In Section 2, we develop a spatial clustered coefficients Poisson regression model with MRFMFMMLG process and examine the properties of our proposed methods. In Section 3, an MCMC sampling algorithm is developed and post MCMC inference is discussed. Extensive simulation studies are carried out in Section 4. For illustration, our proposed methodology is applied to premature deaths data in the state of Georgia in Section 5. Finally, we conclude this paper with a brief discussion in Section 6. For ease of exposition, proofs and additional technical results are given in the supplementary materials.
2 Methodology
In this section, we will first introduce the clustered coefficients regression for count value data based on Poisson regression. In addition, MRFMFM are proposed for spatial clustered coefficients regression model in order to leverage geographical information. The theoretical properties of MRFMFM are fully examined. The Bayesian hierarchical model in conjunction with MLG is proposed in the end of this section.
2.1 Spatial Poisson Regression
Consider a Poisson regression model with spatially varying coefficients as follows
(1) 
where is a dimensional regression coefficients at location . From Gelfand et al. (2003), a Gaussian process prior can be assigned on regression coefficients to obtain spatially varying pattern. Compared with spatially varying pattern, heterogeneity pattern of covariate effects over subareas is also universally discussed in many different fields, such as real estate applications, spatial econometrics, and environmental science. The model in (1) can be rewritten as
(2) 
where . Through the popular Chinese restaurant process, are defined through the following conditional distribution (Pólya urn scheme, Blackwell et al., 1973):
(3) 
where is the size of cluster .
While CRP has a very attractive feature of simultaneous estimation on the number of clusters and the cluster configuration, a striking consequence of this has been recently discovered (Miller and Harrison, 2018) where it is shown that the CRP produces extraneous clusters in the posterior leading to inconsistent estimation of the number of clusters even when the sample size grows to infinity. A modification of the CRP called Mixture of finite mixtures (MFM) model is proposed to circumvent this issue (Miller and Harrison, 2018):
(4) 
where
is a proper probability mass function on
and is a pointmass at . Compared to the CRP, the introduction of new tables is slowed down by the factor , which allows a modelbased pruning of the tiny extraneous clusters. is a coefficient that need to be precomputed:where , and . under (4) can be defined in a Pólya urn scheme similar to CRP:
(5) 
Where is the number of existing clusters.
2.2 Markov Random Field Constrained MFM
For the spatial data, leveraging geographical information is an important task. We apply the pairwise MRF in the level of coefficients to bring spatial interactions. Based on MRF, we can leverage information from nearby locations for Bayesian nonparametric clustering algorithm. This model can be more adapted to the count data since directly applying Poisson MRF can only incorporate negative correlations among response variables (Yang et al., 2012). Consider an undirected random graph , where is the vertex set while is the set of graph edges, with weights on the corresponding edges. Each vertex
is associated with a random variable
for . The pairwise MRF model is defined as(6)  
where is the normalizing constant. For example, for Gaussian MRF, and ; while for binary MRF, which is the celebrated Ising model, and . We can then decompose the pairwise MRF into vertexwise term and interaction term , then
(7)  
where and is the set of all cliques for the random graph . For the spatial clustered coefficient regression introduced in the next subsection, we study the component defined in equation (7) with a MFM prior.
2.3 Theoretical Properties
Our first theorem provides the generalized urnmodel induced by MRFMFM, thus a collapsed Gibbs sampler can be applied.
Theorem 2.1.
Remark 2.1.
This theorem shows how the MRF constraints directly affect the Pólya urn sampling scheme compared with MFM. For example, when , with . Then the spatial smoothness can be controlled by the magnitude of .
In nonparametric Bayesian Statistics such as Dirichlet Process, the
exchangeable random variable is an important concept. When are infinite exchangeable, for any finite ,(10) 
where is the set of all permutations for the index set . If are i.i.d. sampled from the distribution , then they are exchangeable, but the converse is not true. Examples for exchangeable random variables that are not independent are Pólya’s Urn (Blackwell et al., 1973) and Gaussian random variables that has the same marginal distribution and the same correlation between any two variables. The famous de Finetti’s Theorem (De Finetti, 1929) reveals the intrinsic characterization of exchangeable random variables: there is a latent random variable , such that sampled from is equivalent to the following sampling procedure:
(11) 
where only depends on . In other words, exchangeable random variables are conditionally i.i.d. given their latent label. There are two levels of labeling during the sampling procedure, the first labels are created by the Dirichlet prior, while the second one comes from the fact that are sampled from exchangeable but could be dependent distribution, thus the dependence of can be introduced.
Our next theorem establishes the key point on why the MFM can be extended to MRFMFM without loss of consistency when , are not i.i.d but only exchangeable. By marginalizing the coefficients , the distribution of can be totally decided by the cluster configuration .
Theorem 2.2.
Assume are exchangeable random variables. Given the cluster configuration , the data and the number of components are independent.
Remark 2.2.
Compared with MFM, we generalize the same result from i.i.d. to the exchangeable cases. Since the dependence between is totally decided by , when are exchangeable, all the play the same role in generating . Then are marginalized, the cluster configuration covers the same information with the number of components and the latent labels .
Consider the pairwise interactions, we model the conditional cost functions as
(12) 
where is the smoothness parameter, denotes the set of the neighbors of observation . When , the MRFMFM reduces to MFM (Miller and Harrison, 2018). Note that since is continuous, with probability , are exchangeable random variables.
Our Final theorem shows that the estimation of the number of components is consistent asymptotically under MRFMFM.
Theorem 2.3.
Remark 2.3.
This theorem demonstrates the efficacy of our proposed MRFMFM compared with Dirichlet process mixture model with Markov random fields (DPMRF)(Orbanz and Buhmann, 2008). For DPMRF, there could be a lot of small spurious clusters due to inconsistency of the Dirichlet process mixture model (Miller and Harrison, 2013). However, since a prior distribution for the number of components is specified for our method, the posterior number of components is properly regularized, and can achieve consistency when the model is wellspecified.
2.4 Spatial Clustered Coefficient Regression for Count Value Data
For the MRFMFM, the natural choice of the base distribution of
is the multivariate normal distribution. However, the multivariate normal distribution is not the conjugate prior for Poisson regression. Using multivariate normal distribution as the base distribution requires MetropolisHastings updates or auxiliary parameters
(Neal, 2000) in Gibbs sampling algorithm for MRFMFM. Bradley et al. (2018)developed a multivariate loggamma distribution (MLG), which is conjugate with the Poisson distribution. Based on MLG, an MRFMFMMLG is proposed for spatial clustered coefficients for Poisson regression.
We now review the multivariate loggamma distribution from Bradley et al. (2018). We define the
dimensional random vector
, which consists of mutually independent loggamma random variables with shape and scale parameters organized into the dimensional vectors , and , respectively. Then define the dimensional random vector as follows(14) 
where the matrix and . Bradley et al. (2018) called as the multivariate loggamma random vector. The random vector
has the following probability density function:
(15) 
where “det” represents the determinant function. As a shorthand we use the notation, “” for the probability density function in (15). We give a MLG prior with parameters , then the posterior distribution of is given as:
(16) 
where , , ,, and . Compared with Gaussian prior of , the MLG prior of does not require MetropolisHastings algorithm (Chib and Greenberg, 1995) or Slice sampler (Neal et al., 2003) to obtain the posterior distribution of . Another good property for MLG prior is the multivariate loggamma distribution has asymptotic relation with the multivariate normal distribution. Bradley et al. (2018) show that if , and then converges in distribution to a multivariate normal random vector with mean and covariance matrix as . In practice, is sufficiently large for this normal approximation.
Next, we look at the Theorem 2 from Bradley et al. (2018), let , and partition this ndimensional random vector so that , where is gdimensional and is (ng)dimensional. Additionally, consider the class of MLG random vectors that satisfy the following:
(17) 
where in general is a matrix of zeros; is a identity matrix;
(18) 
is the QR decomposition of the
matrix H; the matrix satisfies , the matrix satisfies , and ; is a upper triangular matrix; and . Hence, the marginal distribution of the gdimensional random vector is given by(19) 
where the normalizing constant is
(20) 
so the is equal to the pdf in equation (19). The posterior distribution of with MLG prior under Poisson regression framework is the .
Adapting the MRFMFM in conjunction with MLG to the spatial Poisson regression setting, we focus on the clustering of spatiallyvarying coefficients , where is the dimensional coefficient vector for location . In our setting, we assume that the parameter vectors can be clustered into groups, i.e., , by letting the multivariate loggamma as the base measure for , the hierarchal model can be written as
(21) 
The choice of the hyperparameters and probability distribution of the number of clusters will be disccused in Section
3.3 Bayesian Inference
MCMC is used to draw samples from the posterior distributions of the model parameters. In this section we present the sampling scheme, the posterior inference of cluster belongings, and measures to evaluate the estimation performance and clustering accuracy.
3.1 The MCMC Sampling Schemes
Our goal is to sample from the posterior distribution of the unknown parameters , and . We choose and in (21), , and for all the simulations and real data analysis, where is a ndimensional vector with 0, is a ndimensional vector with 1, and is a ndimensional identity matrix. The sampler is presented in Algorithm (1), which efficiently cycles through the full conditional distributions of for and , where . The details of the full conditional distributions are in the supplementary materials. The marginalization over can avoid complicated reversible jump MCMC algorithms or even allocation samplers.
Parameter  Form 

3.2 Inference of MCMC results
The posterior mean or median of clustering configurations is not suitable. Dahl’s method (Dahl, 2006) provides a remedy for posterior inference of the clustering configurations . The inference of Dahl’s method is based on the posterior samples of membership matrices, , where is the total number of Monte Carlo iterations. The definition of membership matrix is
(22) 
with for all . indicates observations and are in the same cluster in the th posterior sample after burnin iterations. Based on the posterior samples of membership matrices , a Euclidean mean for membership matrices is calculated by
The posterior iteration with the least squares distance to is obtained by
(23) 
The estimated parameters, together with the cluster assignments , are obtained from th post burnin iteration.
The second measure used in our performance evaluation is the Rand index Rand (1971), which can be used to measure the accuracy of clustering. The rand index is obtained by using Rpackage fossil (Vavrek, 2011). The RI ranges from 0 to 1 with a higher value indicating better agreement between the two partitions. In particular, indicates that two cluster assignments are identical in terms of modulo labeling of the nodes.
In the proposed spatial clustered coefficients model, the tuning parameter in Markov random fields needs to be selected. The Logarithm of the PseudoMarginal Likelihood (LPML) (Ibrahim et al., 2013) is applied for tuning parameter selection. The LPML can be obtained through the Conditional Predictive Ordinate (CPO) values. Let denote the observations with the th subject response deleted. The CPO for the th subject is defined as:
(24) 
where
and is the normalizing constant. Following (Gelfand and Dey, 1994), a Monte Carlo estimate of the CPO can be obtained as:
(25) 
where is th posterior samples of . An estimate of the LPML can subsequently be calculated as:
(26) 
A model with a larger LPML value is more preferred.
4 Simulation
In this section, we firstly detail the simulation settings in Section 4.1, as well as how performance of the proposed method is evaluated. In addition, extensive simulation results are shown in Section 4.2.
4.1 Simulation Setting and Evaluation Metrics
We use the spatial structure of the state of Georgia, which contains 159 counties in Georgia. We consider two different spatial cluster designs. The first design is a threecluster setting as illustrated in Figure 1, where geographical proximity becomes the factor that determines true cluster configuration. Another design is the twocluster partition in Figure 2. The first cluster consists of two disjoint parts in the top and bottom region of Georgia state. The counties in the middle are in second cluster. It is designed to mimic a common premature deaths pattern that geographically distant regions can share similar distribution pattern, and geographical proximity is not the sole determining factor for homogeneity in premature deaths distribution.
For each design, we have two different scenarios. The first scenario for each design is without spatial random effects. The second scenario for each design is with spatial random effects. We assume the spatial random effects follow a multivariate Normal distribution with mean zero and exponential covariogram. In total, we have four different scenarios in our simulation study. The details of data generation are given as

, where , , , , .

, where , , , , . , where , we set and .

, where , , , .

, where , , , . , where , we set and .
The final clustering performance is evaluated based on the estimated number of clusters and Rand Index (RI). The RI is calculated for each iteration for each replicate, and we calculate an average RI over all replicates. Also, the final number of clusters estimated is computed for each replicate. A total of 100 sets of data are generated under different scenarios. We run 5,000 iterations MCMC chain and burnin the first 1,000 for each replicates.
4.2 Simulation Results
We generate sets of data from the model in four different scenarios. To each replicated data set, we fit the MFMMLG and MRFMFMMLG with different values of the smoothness parameter and select the best smoothness parameter for each replicate based on LPML. In addition, we present the average value of RI for each case over 100 replicates to compare the clustering performance with MFMMLG model based on RI. We also compare model fitness of our proposed model with MFMMLG model in terms of LPML. We see that our model outperforms the MFMMLG model in term of LPML on four different scenarios. We also evaluate the performance in terms of estimation results of the number of clusters. We report the proportion of times the true cluster recovered among the 100 replicates. For two clusters without spatial random effect design, we find out our model can recover the true number of clusters 100% of the replicates. And the MFMMLG model can recover 99% of the replicates. In this case, both models perform well in number of clusters estimation. But our model outperforms the MFMMLG model in terms of LPML value. For the two clusters design with spatial random effects design, we see that our model can recover the true number of clusters 99% of the replicates, but the MFMMLG model only recover 45% of the replicates. For three clusters without spatial random effect design, we find out our model can recover the true number of clusters 98% of the replicates. On the other hand, MFMMLG model recover 89% of the replicates. In this case, although the rand index of our model is slight worse than MFMMLG but we think our method overcome the overcluster issue. For three clusters with spatial random effect design, we find out our model can recover the true number of clusters 53% of the replicates. However the MFMMLG model only recover 3% of the replicates.
All results including comparison of LPML, rand index, and estimating the number of clusters for each design are provided in Figure 3, 4, 5, and 6.
From the results shown in Figures above, our method can successfully estimate the true number of clusters. Otherwise, the MFM will overestimate the number of clusters when there exist spatial random effects.
Furthermore, we show the average mean square error (AMSE) of our proposed method and MFMMLG in Table 1.
Method  Parameter  No Spatial Random effect  With Spatial Random effect  

Two Clusters  Three Clusters  Two Clusters  Three Clusters  
MRFMFMMLG  0.0848  0.2508  0.0966  0.3918  
0.0839  0.2435  0.0967  0.3814  
MFMMLG  0.1170  0.2841  0.3675  0.6996  
0.1164  0.2781  0.3668  0.6898 
From the results in Table 1, we see that in four different scenarios, our proposed methods outperform MFM in terms of coefficient estimation. For the data generated from the model with spatial random effect, the improvement of our proposed methods is more obvious.
5 Illustration: Premature Deaths in Georgia
In this section, we will analyze premature deaths in state of Georgia as the illustration of our proposed methods.
5.1 Data Description
We consider analyzing influential factors for the number of premature deaths in state of Georgia using the proposed methods. The dataset is available at www.countyhealthrankings.org with 159 observations corresponding to the 159 counties in state of Georgia of the year 2015. For each county, the dependent variable is the number of the premature deaths of each county. The premature death is the death that occurs before the average age of death in a certain population. The Figure presents the number of the premature death of each county. In the United States, the average age of death is about 75 years. The dependent variable is the number of life lost per 100,000 population before age 75 in each county. The two covariates we consider in this paper is PM 2.5 () and food environment index (). PM 2.5 is the average daily density of fine particulate matter in micrograms per cubic meter. The food environment index is the index of factors that contribute to a healthy food environment, 0 (worst) to 10 (best). Figure 8 presents a visualization of the two covariates on the Georgia map.
5.2 Analysis
In this section, we apply the proposed methodology to present a detailed analysis of premature death data in the state of Georgia. First, we rescale the data to a decent range, because the variance in the Poisson distribution is equal to the mean. The count of the premature death is scaled to in hundreds. We run 15,000 MCMC iterations and burnin the first 5,000 iterations. The smoothing parameter is tuned over the grid
. All other parameters are set to be consistent with the simulation study. The final clustering result is set to be the one corresponding to the largest LPML (Ibrahim et al., 2013), hence we choose the smoothing parameter equal 0.3. The 159 counties turned out to be put into six clusters as illustrated in Figure 9. The number of the counties in each cluster are 148, 2, 1, 5, 2 and 1, respectively. We also compare our model with best LPML to MFM, Latent Gaussian Process (LGP) (Hadfield et al., 2010), conditional autoregressive (CAR) (Lee, 2013) models. The LPML values of all candidate models are shown in Table 2. Based on the Table 2, our proposed model outperforms other models. From the estimation results shown in 9, we see that for all the counties higher PM 2.5 will cause higher premature deaths. For most counties, better food environment will cause less premature death.Method  LPML 

MRFMFM ()  2221.45 
MFM  3614.38 
LGP  2461.31 
CAR  5015.93 
Cluster  

1  0.422  0.032  0.174 
2  1.445  1.018  1.792 
3  1.776  0.158  0.645 
4  0.928  0.937  1.496 
5  1.546  0.179  0.410 
6  0.393  0.221  0.170 
6 Discussion
In this paper, we have proposed a Bayesian spatial clustered coefficients regression for count value data. Our proposed MRFMFMMLG has two merits. First, MRFMFMMLG leverages the geographical information to detect the spatial homogeneity pattern of regression coefficients. Second, an efficient MCMC algorithm is developed for Poisson regression with spatial clustered coefficients. The usage of proposed method is illustrated in simulation studies, where it shows accurate estimation performance, and also outperforms other popular models. For premature death data, our method dominates the other benchmark methods in terms of LPML.
In addition, three topics beyond the scope of this paper are worth further investigation. First, in our MCMC algorithm, one numerical integration is required for Gibbs sampling. Proposing an efficient calculation algorithm of the numerical integration will broad the applications of our proposed methods. Furthermore, different clusters may have different sparsity patterns of the covariates. Incorporating spatial clustered sparsity structure of regression coefficients into the model will enable selection and identification of most important covariates. One parameter of Markov random field is require to be selected. Proposing a hierarchical model for tuning parameter is also an interesting future work.
Acknowledgement
Dr. Hu’s Research was supported by the Dean’s office of the College of Liberal Arts and Sciences at University of Connecticut.
Appendix A Proof of the Theorem 2.1
By Bayes’ theorem, we have:
(27) 
As shown in Miller and Harrison (2018), by conditioning on the different possible situations of the cluster for the new observations, we have
(28) 
Let . When considering the full conditional distribution
(29) 
where only depends on for . Note that
(30) 
where specifies the cluster that belongs to. With the property in equation (30) and the assumption that is continuous, almost surely for . Then given the density for and any subset for the domain of ,
(31)  
where the constant only depends on the constant part of . Hence, the full conditional of can be derived
(32) 
Appendix B Proof of the Theorem 2.2
We show that the interaction term does not change the conditional independence among data and the number of components given the cluster configuration when all are exchangeable.
Let , based on the definition of and , we have
(33) 
where , are the distinct values of decided by and , and . Given , the transformation from variable to , is totally decided, so when marginalizing the unused , given any function , we have the identity
(34) 
Note that are exchangeable based on assumption, then the density after marginalizing can be seen
(35)  
where is a function only depends on and . In addition, directly follows from de Finetti’s Theorem; for , we apply the Fubini’s theorem; is because the expression depends only on through since there is no correspondence between and after integrating out
Comments
There are no comments yet.