Bayesian Clustered Coefficients Regression with Auxiliary Covariates Assistant Random Effects

04/25/2020 ∙ by Guanyu Hu, et al. ∙ Microsoft 0

In regional economics research, a problem of interest is to detect similarities between regions, and estimate their shared coefficients in economics models. In this article, we propose a mixture of finite mixtures (MFM) clustered regression model with auxiliary covariates that account for similarities in demographic or economic characteristics over a spatial domain. Our Bayesian construction provides both inference for number of clusters and clustering configurations, and estimation for parameters for each cluster. Empirical performance of the proposed model is illustrated through simulation experiments, and further applied to a study of influential factors for monthly housing cost in Georgia.



There are no comments yet.


page 16

page 24

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Analysis of spatial data referenced over different locations has received widespread attention in many fields such as environmental science (Hu and Bradley, 2018; Yang et al., 2019), social science (Bradley et al., 2018), and biostatistics (Xu et al., 2019). There are two major approaches to analyzing spatial data. The first approach, where spatial variations in the outcome are accounted for by an additive spatial random effect term at each location, has been studied both for the linear model (Cressie, 1992) and generalized linear model (Diggle et al., 1998) settings. The second approach, a regression model with spatially varying coefficients, is developed to capture spatial variations within the covariate effects themselves. For the spatially varying coefficients model, there are two major approaches for estimation of the regression coefficients. One is geographically weighted regression (GWR; Brunsdon et al., 1996)

, the basic idea of which is to assign different weights to the observations based on a certain measure of distance between them and the target location. This work also has various extensions in generalized linear regression

(Nakaya et al., 2005) and analysis of survival data (Hu and Huffer, 2020; Xue et al., 2019). It has been further extended to multiscale models that allow kernels of different variables to be on different scales, which greatly enhances its model flexibility (Fotheringham et al., 2017). Another major approach is to give a Gaussian process prior to the spatially varying coefficients (Gelfand et al., 2003), which provides a natural and flexible way to view the coefficient surface as a realization from a spatial process. This work is universally applied into different models such as Poisson regression model (Reich et al., 2010) and survival models (Hu et al., 2020).

Most recently, heterogeneous covariate effects in many different fields, such as real estate applications, spatial econometrics, and environmental science are receiving increasing attention. For example, a country’s big cities and small cities could be put into separate clusters and analyzed, as each subgroup share more similarities in development patterns, and thus similar covariate effects in regression models can be expected. Such clustering information is of great interest to regional economics researchers. Existing frequentist approaches include those based on scan statistics (Kulldorff and Nagarwalla, 1995), two-step spatial hypothesis testing (Lee et al., 2017, 2019), and a penalized method based on minimum spanning tree (Li and Sang, 2019). Under the Bayesian paradigm, an integrated framework (Ma et al., 2019a) to detect clusters in the covariate effects as well as producing the parameter estimates for spatially dependent data are proposed, in which clustering is done via the Dirichlet process mixture model (DPM; Ferguson, 1973). The DPM, however, has turned out to produce extremely small clusters and make the estimation for number of clusters inconsistent (Miller and Harrison, 2013). The mixture of finite mixture (MFM) model proposed by Miller and Harrison (2018) provides a remedy to over-clustering problem for Bayesian nonparametric methods. Both DPM and MFM allow for uncertainty in the number of clusters instead of relying on a given number, which needs to be tuned based on certain criteria.

While spatial random effects have been used to account for the influence of geographical proximity on the similarity of outcomes for two neighboring observations, in existing approaches, the correlation between spatial random effect terms only depends on distance, and all other factors are ignored. For improvement in describing spatial correlation, some works have been done to bring in auxiliary information to help the estimation of spatial regression model (White and Ghosh, 2009; Lee et al., 2014; Gao and Bradley, 2019). These works focus on the estimation of the edge based on some covariates or spatial structure. The regression relationship between the -dimensional covariance matrix and auxiliary information has been explored by Zou et al. (2017) and Liu et al. (2020), which can help reveal the true correlation structure of spatial data. Given the importance of spatial random effects in a spatial regression model, their covariance structure need to be appropriately specified.

In this work, we propose a Bayesian clustered linear regression model with MFM, which when compared to the DPM, consistently estimates the number of clusters. The estimation of parameters remains precise. To our best knowledge, we firstly introduce the MFM in clustered coefficients regression model. In addition, information from auxiliary covariates is incorporated into a linear mixed effects Dirichlet regression to better model the correlation structure of random effects.

The remainder of the paper is organized as follows. In Section 2, we present the spatial clustered linear regression model with MFM. In Section 3 , a MCMC sampling algorithm based on nimble (de Valpine et al., 2017) and post MCMC inference are discussed. Extensive simulation studies are performed in Section 4. For illustration, our proposed methodology is applied to Georgia housing cost data in Section 5. We conclude this paper with a brief discussion in Section 6.

2 Method

2.1 Spatial Linear Regression

The basic geostatistical model (Gelfand and Schliep, 2016) for observations made over a spatial domain can be written as:


where denotes the

-dimensional vector of spatial responses observed at locations

, denotes the  matrix of covariates, is the vector of coefficients, is a vector of spatial random effects, and is the “nugget effect” with being the precision of the response (Chapter 6, Carlin et al., 2014). The above spatial regression model can be formulated alternatively as

where denotes the spatial structure with a covariance matrix 

, and N and MVN denote the univariate and multivariate normal distributions, respectively. The covariance matrix

is often defined to be , with constructed using the great circle distance (GCD) matrix among different locations via the following three popular weighting schemes:


where is a bandwidth parameter that controls the spatial correlation.

Unlike in model (1), where variations in outcomes for spatial units are accounted for by the vector of spatial random effects, the spatially varying coefficients model (Gelfand et al., 2003) attributes such effects to variations in the parameters over the spatial domain, i.e., the parameter  itself varies. Such a model is formulated as:


where is vector of covariates at location of the th subject , and  is assumed to be generated from a -variate spatial process model. With observations for , the model can be written as

where , is an block diagonal matrix whose -th diagonal entry is , , and with

being the identity matrix. To characterize the 

-variate spatial process that generates , we rewrite the model as


where is a vector, is an -dimensional matrix measuring spatial correlations between the  observed locations, is a covariance matrix associated with an observation vector at any spatial location, and  denotes the Kronecker product.

In the above construction, the correlation between coefficients for locations is solely dependent on the matrix , which is often constructed using the distance matrix. In many economics applications, two distant regions can still share similarity in terms of demographics, transportation, and other characteristics, and thus have similar economic development patterns. See Ma et al. (2019b) for an example of Chinese provinces.

2.2 Mixture of Finite Mixture Model

Based on the heterogeneity pattern, we focus on the clustering of spatially-varying coefficients. A latent clustering structure can be introduced to accommodate the spatial heterogeneity on parameters of sub-areas. Let us denote the cluster belongings of the th observation as for . The Dirichlet process (DP; Ferguson, 1973) offers a nonparametric approach for capturing heterogeneity effects in the data. The DP prior for the cluster belonging of the th observation can be written as


where ,

denotes the random probability weight,

is the Dirac  with point mass at , and

is the concentration parameter. The joint distribution for

can also be written as a conditional distribution, known as the Chinese restaurant process (CRP; Pitman, 1995; Neal, 2000). The distribution of  is marginally represented by the stick-breaking construction of Sethuraman (1991) as


Miller and Harrison (2013) showed that the posterior distribution on the number of clusters does not converge to the true number of components, and extraneous clusters are often produced by the CRP. Later, Miller and Harrison (2018) proposed a modification of the CRP called a mixture of finite mixtures (MFM) model to circumvent this issue, which can be formulated as


where  is a proper probability mass function (p.m.f) on . A default choice of is a distribution truncated to be positive (Miller and Harrison, 2018), which is assumed through the rest of the paper. Analogous to the stick-breaking representation in (6), the MFM also has a similar construction. If we choose and in (7), the mixture weights  can be constructed as:

  1. Generate ,

  2. ,

  3. , for ,

  4. .

Gibbs samplers are easily constructed in stick-breaking framework (Ishwaran and James, 2001). For ease of exposition, we refer to the formulation in (7) as .

2.3 Auxiliary Covariates Assistant Covariance Matrix

In the regression model (4) where spatial random effects are present, their covariance structure often depend on the geographical distance between pairs of locations, which in general indicates that closer locations have stronger correlation, and is in accordance with Tobler’s first law of geography that “everything is related to everything else, but near things are more related than distant things”. In most economics problems, however, spatial proximity might not be the sole indicator for similarity, as there can be geographically distant locations that share similar demographical characteristics. For example, despite the relatively short distance between New York City and Albany, the difference between their population densities is greater than that between New York City and San Francisco, which are more than 2000 miles apart (see To incorporate such similarities into the covariance structure for random effects, motivated by covariance regression (Zou et al., 2017) and Bayesian model averaging (Raftery et al., 1997), we propose the following auxiliary covariates assistant covariance (ACAC) matrix for random effects in a mixture regression model:



is a constant accounting for the overall magnitude of variance,

, and is the similarity matrix of the th auxiliary covariate. Entries of the similarity matrix have values between 0 and 1, and are usually decreasing with respect to the absolute difference between values of the auxiliary covariates. The three aforementioned weighting schemes in (2) can be used to define . For example, an exponential decay similarity matrix  can be constructed so that its -th element is


where is the range parameter for the exponential kernel, and denotes the Euclidean distance. In order to solve the identifiability issue, we set a constraint for :

Proposition 1

If we have positive definite matrices , and a sequence positive numbers which satisfy and for , then the matrix is positive definite.

Based on the constraint in (10), a Dirichlet prior is assigned to . The prior distribution of is given as


where is the Dirichlet distribution with parameter .

2.4 Bccr-Acacr

Combining the MFM and ACAC matrix, we have the BCCR-ACACR model hierarchically as follows, for :


where , with being the dimension of the covariates , is the similarity matrix of the corresponding auxiliary covariate defined in (9), and MFM the clustering method introduced in Section 2.2

. The choice of hyperparameters will be discussed in Section 


3 Bayesian Inference

3.1 Bayesian Computation

Let denote the set of unknown parameters in the proposed model, and we assume that they are independent a priori. Therefore, we assign commonly used priors for these parameters: , , , , , , and . With the prior distributions specified above, the posterior distribution of these unknown parameters based on the data is given by

The analytical form of the posterior distribution of

is unavailable. Therefore, we employ the Markov chain Monte Carlo (MCMC) sampling algorithm to sample from the posterior distribution, and then obtain the posterior estimates for the unknown parameters. Computation is facilitated by the

nimble package in R, which uses syntax similar to WinBUGS and JAGS, but generates C++ code for faster computation.

3.2 Posterior Inference and Diagnostic

In the proposed spatial model, since the covariance matrix of the spatial random effects can be constructed in different ways, including the unity, exponential, and Gaussian weighting schemes in (2), a model selection criteria needs to be used for deciding which form of the covariance matrix is the most suitable for the data. A commonly used Bayesian model selection criterion, logarithm of the pseudo-marginal likelihood (LPML; Ibrahim et al., 2013) can be used for this purpose. The LPML can be obtained through the conditional predictive ordinate (CPO) values, which are the Bayesian estimates for the probability of observing  in the future after other observations are made. Let denote the observations with the th subject response removed. The CPO for the th subject is defined as:



with being the normalizing constant. A Monte Carlo estimate of the CPO is:


where is the total number of Monte Carlo iterations. Based on , the LPML can be estimated as:


A larger LPML value indicates better model fit.

4 Simulation

4.1 Simulation Settings and Evaluation Metrics

We study the estimation performance as well as the clustering performance in this section. Two designs of true cluster configuration of Georgia counties are considered. The first case is similar to in Ma et al. (2019a), where there are, respectively, 51, 49, and 59 counties in each cluster. The second is less balanced with 26, 44, and 89 counties in each cluster. The two partition schemes used in designing the simulation study are visualized in Fig.1.

Figure 1: Visualization of the random and regional true cluster designs.

Three data generation models are considered. Under the first model, no spatial random effect is present, and data are generated from a spatially clustered coefficients setting, i.e.,


where is an covariate matrix, is a coefficient matrix such that its th column is the parameter vector corresponding to location , and is the noise term assumed to follow with . The second model includes both spatial clustered parameters and spatial random effects, and can be characterized as


where is the spatial random effect solely dependent on distance, such that

Finally in the third model, is generated in a way similar to (17):


where is the distance- and auxiliary covariates-dependent vector of spatial random effects such that

with and being the two auxiliary covariates. The two similarity matrices are constructed using (9) with and . The true parameters for the similarity matrices are set to relatively small values compared to following Zou et al. (2017). For both partition schemes shown in Fig. 1, the true parameter vector for cluster 1 is set to , for cluster 2 , and for cluster 3 (1, -2, -1). For each of the three designs and two true cluster partitions, 100 replicates of simulation are performed. In each replicate, five covariates are generated i.i.d. from for each location.

The proposed approach is evaluated in terms of both estimation and clustering. For estimation of the vector of coefficients,

, we employ the mean absolute bias (MAB), mean standard deviation (MSD), and mean of mean squared error (MMSE) for assessment:

MAB (19)
MSD (20)
MMSE (21)

where denotes the posterior estimate for the th coefficient of county in the th replicate, , is the true underlying parameter value, and denotes the indicator function.

For each replicate, we set the chain length to 25,000 with thinning interval 2. The first 9,500 of retained samples are discarded as burn-in, and we use the remaining 3,000 iterations for posterior inference. The final cluster belonging inferred for each county is taken as the first mode of the posterior samples for , To evaluate the clustering performance, i.e., whether the final clusters align well with the truth, the Rand index (RI; Rand, 1971) is utilized. An explanation of the RI has been given in Ma et al. (2019a). Computation of the RI is done using the R package fossil (Vavrek, 2011).

4.2 Simulation Results

The RIs for each design and data generation model are reported in Table 1. We also present the histogram of the final number of clusters inferred for each combination of cluster design and data generation model in Fig. 2.

It can be seen from Table 1 that for both designs, our proposed method perform the best under data generation model 1, where there is no spatial random effect, and only clustered regression coefficients are present. Under data generation models 2 and 3, however, our proposed model still maintains RI values greater than 0.7, i.e., over 70% of the time, two counties that are in the same cluster are correctly grouped together. From the histograms in Fig. 2, we see that over in no less than 60 out of the 100 for each model and design combination, the number of clusters is correctly inferenced. In particular, for model 1 and design 2, in 92 replicates, the final is 3.

Model 1 Model 2 Model 3
Design 1 0.877 0.730 0.727
Design 2 0.925 0.765 0.748
Table 1: RIs under each cluster design and data generation model.
Figure 2: Histograms of under each combination of cluster design and data generation model.

The performance measures for parameter estimation are presented in Table 2. It can be seen that with the increase in the complexity of data generation model, the MAB, MSD, and MMSE all increase. The values, however, when compared to the true underlying parameter values, remain within reasonable range, and demonstrate favorable performance of our proposed approach.

Design Model Parameter MAB MSD MMSE
Design 1 1 0.306 0.654 0.586
1 0.221 0.521 0.313
1 0.205 0.393 0.232
Design 1 2 0.839 0.909 1.607
2 0.602 0.932 1.040
2 0.613 0.606 0.714
Design 1 3 0.818 0.937 1.605
3 0.584 0.905 0.995
3 0.588 0.588 0.694
Design 2 1 0.205 0.368 0.394
1 0.181 0.461 0.245
1 0.146 0.251 0.157
Design 2 2 0.646 0.567 1.159
2 0.621 0.849 1.057
2 0.493 0.395 0.495
Design 2 3 0.647 0.606 1.152
3 0.680 0.891 1.135
3 0.492 0.427 0.504
Table 2: MAB, MSD, and MMSE for posterior parameter estimates under each cluster design and data generation model.

5 Real Data Analysis

5.1 Georgia Housing Cost Data

In this section, the proposed methodology is used to analyze the factors on monthly housing cost in Georgia. The dataset can be accessed at For each of the 159 counties, the median monthly housing cost for occupied housing units is observed. In addition, several independent variables are available: the unemployment percentage for adults between 18 and 64 years of age (), the average per individual real and personal property taxes (), the median home market value in thousand dollars (), the White race population percentage (), the median age (), and population size in thousands ().

In our analysis, the first three economy-related covariates, , and , are used in the spatial regression part, while the remaining three demographic covariates are used in constructing the covariance matrix of spatial random effects. The final model is written as, for ,

where the -th element of for is for , respectively, while for , the entry is . The priors of the unknown parameters are assigned as mentioned in Section 3. Similar to in the simulation study, after burning in the first 9,500 of 12,500 iterations, 3,000 MCMC samples are collected the parameters.

5.2 Analysis Results

We firstly apply the LPML to select the most suitable covariance structure of spatial effects  for the model. The LPML values of the proposed auxiliary covariates assistant covariance matrix, the unity scheme, the exponential scheme and the Gaussian scheme are shown in Table 3. Comparison of the LPML values leads to the conclusion that the proposed auxiliary covariates assisted covariance matrix provides the most suitable approximation for the covariance structure of the spatial random effects for this dataset, as it has the largest LPML value among the candidate covariance structures. Therefore the auxiliary covariates assisted covariance matrix is used in all subsequent analyses.

Auxiliary Covariates Assisted Unity Exponential Gaussian
LPML -189.78 -206.37 -194.03 -241.74
Table 3: LPML values for different covariance structures of spatial effects.

Three clusters of the coefficients in the spatial regression part  are identified through the MFM approach, whose posterior estimates are shown in Table 4 and the cluster belongings of the 159 counties are visualized in Fig. 3. Cluster 1 includes 10 counties and cluster 3 consists of 5 counties, while the rest 144 counties all fall within cluster 2. Taking a closer look, cluster 1 consists of Fulton, Douglas, Paulding, Henry, Newton, Barrow, Chattahoochee Lee, Effingham and Liberty, which are all relatively economically developed counties in terms of per capita income (among the top 50 according to 2015 United States Census Data and the 2006-2010 American Community Survey 5-Year Estimates) except Liberty. Cluster 3 consists of Fannin, Union, Towns, Rabun, and Clay. Both clusters include neighboring counties and non-adjacent counties, which again echos the finding in our simulation study that the proposed method takes into consideration both spatial adjacency and the inherent similarity between covariates that influence the spatial random effects.

Coefficient Cluster 1 Cluster 2 Cluster 3
(Intercept) 2.492 -4.943 -0.801
(-0.859, 5.288) (-5.532, -4.252) (-3.521, 3.521)
(Unemployment Rate) -0.280 0.578 -1.011
(-2.697, 2.996) (0.130, 0.997) (-3.381, 1.197)
(Tax) -1.391 -0.259 -0.441
(-3.507, 0.583) (-0.588, 0.047) (-1.633, 1.089)
(Home Market Value) 1.526 4.816 1.416
(-0.417, 3.539) (4.444, 5.177) (-0.821, 3.227)
Table 4: Parameter estimates and their 95% HPD intervals for the three clusters identified.
Figure 3: Clusters produced by the proposed approach.

From Table 4, for counties belonging to cluster 2, the percentage of unemployment and the median house market price can help explain the change of median monthly housing cost. However, for the other counties, neither the factors we selected has impact on the dependent variable. Also, the intercept term for cluster 2 is noticeably negative, indicating a difference in the overall level of housing cost between counties in cluster 2 and those in the other two clusters.

Table 5 shows the posterior estimates of the overall variance term of spatial random effects, , and coefficients for the similarity matrices. By comparing the posterior estimates of in the auxiliary covariates assisted covariance matrix of the spatial effects, we can see that the similarity matrices defined by the size of population and the percentage of White race population have greater impact on the covariance matrix of spatial effects.

Parameters Posterior estimate SD 95% HPD interval
0.430 0.141 (0.230, 0.787)
0.011 0.001 (0.001, 0.030)
(White percentage) 0.286 0.151 (0.062, 0.624)
(median age) 0.173 0.133 (0.014, 0.515)
(population size) 0.516 0.165 (0.208, 0.819)
(GCD) 0.014 0.012 (0.000, 0.045)
2.750 0.499 (1.080, 3.818)
Table 5: Posterior estimates and their 95% HPD intervals for other parameters.

6 Discussion

In this paper, we propose a Bayesian clustered coefficients regression model with auxiliary covariates assistant random effects. Our proposed model has two practical merits. First, our model simultaneously estimates the number of clusters and clustering configurations of regression coefficients. Second, auxiliary covariates information are included in our random effects model. The usage of proposed method is illustrated in simulation studies, where it shows accurate estimation and clustering performance. For Georgia housing cost data, our method dominates the other benchmark methods in terms of LPML.

In addition, three topics beyond the scope of this paper are worth further investigation. First, in our real data application, auxiliary covariates are selected based on their natures, which is not always available or clearly categorized in all possible applications. Proposing a quantitative criterion for auxiliary covariates determination is an interesting future work. Furthermore, different clusters may have different sparsity patterns of the covariates. Incorporating different sparsity structure of regression coefficients into the model will enable selection and identification of most important covariates. Finally, considering geographical information for clustering detection (Hu et al., 2020; Zhao et al., 2020; Geng and Hu, 2020) is also devoted to future research.


  • Bradley et al. (2018) Bradley, J. R., S. H. Holan, C. K. Wikle, et al. (2018). Computationally efficient multivariate spatio-temporal models for high-dimensional count-valued data (with discussion). Bayesian Analysis 13(1), 253–310.
  • Brunsdon et al. (1996) Brunsdon, C., A. S. Fotheringham, and M. E. Charlton (1996). Geographically weighted regression: a method for exploring spatial nonstationarity. Geographical Analysis 28(4), 281–298.
  • Carlin et al. (2014) Carlin, B. P., A. E. Gelfand, and S. Banerjee (2014). Hierarchical Modeling and Analysis for Spatial Data. Chapman and Hall/CRC.
  • Cressie (1992) Cressie, N. (1992). Statistics for spatial data. Terra Nova 4(5), 613–617.
  • de Valpine et al. (2017) de Valpine, P., D. Turek, C. J. Paciorek, C. Anderson-Bergman, D. T. Lang, and R. Bodik (2017). Programming with models: writing statistical algorithms for general model structures with NIMBLE. Journal of Computational and Graphical Statistics 26(2), 403–413.
  • Diggle et al. (1998) Diggle, P. J., J. A. Tawn, and R. Moyeed (1998). Model-based geostatistics. Journal of the Royal Statistical Society: Series C (Applied Statistics) 47(3), 299–350.
  • Ferguson (1973) Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. Annals of Statistics 1(2), 209–230.
  • Fotheringham et al. (2017) Fotheringham, A. S., W. Yang, and W. Kang (2017). Multiscale geographically weighted regression (mgwr). Annals of the American Association of Geographers 107(6), 1247–1265.
  • Gao and Bradley (2019) Gao, H. and J. R. Bradley (2019). Bayesian analysis of areal data with unknown adjacencies using the stochastic edge mixed effects model. Spatial Statistics 31, 100357.
  • Gelfand et al. (2003) Gelfand, A. E., H.-J. Kim, C. Sirmans, and S. Banerjee (2003). Spatial modeling with spatially varying coefficient processes. Journal of the American Statistical Association 98(462), 387–396.
  • Gelfand and Schliep (2016) Gelfand, A. E. and E. M. Schliep (2016). Spatial statistics and Gaussian processes: A beautiful marriage. Spatial Statistics 18, 86–104.
  • Geng and Hu (2020) Geng, L. and G. Hu (2020). Bayesian spatial homogeneity pursuit for survival data with an application to the seer respiration cancer. arXiv preprint arXiv:2003.03006.
  • Hu and Bradley (2018) Hu, G. and J. Bradley (2018). A Bayesian spatial-temporal model with latent multivariate log-gamma random effects with application to earthquake magnitudes. Stat 7(1), e179. e179 sta4.179.
  • Hu et al. (2020) Hu, G., J. Geng, Y. Xue, and H. Sang (2020). Bayesian spatial homogeneity pursuit of functional data: an application to the U.S. income distribution. arXiv preprint arXiv:2002.06663.
  • Hu and Huffer (2020) Hu, G. and F. Huffer (2020). Modified Kaplan–Meier estimator and Nelson–Aalen estimator with geographical weighting for survival data. Geographical Analysis 52(1), 28–48.
  • Hu et al. (2020) Hu, G., Y. Xue, and F. Huffer (2020). A comparison of Bayesian accelerated failure time models with spatially varying coefficients. arXiv preprint arXiv:2002.03251.
  • Ibrahim et al. (2013) Ibrahim, J. G., M.-H. Chen, and D. Sinha (2013). Bayesian Survival Analysis. Springer Science & Business Media.
  • Ishwaran and James (2001) Ishwaran, H. and L. F. James (2001). Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association 96(453), 161–173.
  • Kulldorff and Nagarwalla (1995) Kulldorff, M. and N. Nagarwalla (1995). Spatial disease clusters: detection and inference. Statistics in Medicine 14(8), 799–810.
  • Lee et al. (2014) Lee, D., A. Rushworth, and S. K. Sahu (2014).

    A Bayesian localized conditional autoregressive model for estimating the health effects of air pollution.

    Biometrics 70(2), 419–429.
  • Lee et al. (2017) Lee, J., R. E. Gangnon, and J. Zhu (2017). Cluster detection of spatial regression coefficients. Statistics in Medicine 36(7), 1118–1133.
  • Lee et al. (2019) Lee, J., Y. Sun, and H. H. Chang (2019). Spatial cluster detection of regression coefficients in a mixed-effects model. Environmetrics, e2578.
  • Li and Sang (2019) Li, F. and H. Sang (2019). Spatial homogeneity pursuit of regression coefficients for large datasets. Journal of the American Statistical Association 114(527), 1050–1062.
  • Liu et al. (2020) Liu, J., Y. Ma, and H. Wang (2020).

    Semiparametric model for covariance regression analysis.

    Computational Statistics & Data Analysis 142, 106815.
  • Ma et al. (2019a) Ma, Z., Y. Xue, and G. Hu (2019a). Heterogeneous regression models for clusters of spatial dependent data. arXiv preprint arXiv:1907.02212.
  • Ma et al. (2019b) Ma, Z., Y. Xue, and G. Hu (2019b). Nonparametric analysis of income distributions among different regions based on energy distance with applications to China Health and Nutrition Survey data. Communications for Statistical Applications and Methods 26(1), 57–67.
  • Miller and Harrison (2013) Miller, J. W. and M. T. Harrison (2013). A simple example of Dirichlet process mixture inconsistency for the number of components. In Advances in Neural Information Processing Systems, pp. 199–206.
  • Miller and Harrison (2018) Miller, J. W. and M. T. Harrison (2018). Mixture models with a prior on the number of components. Journal of the American Statistical Association 113(521), 340–356.
  • Nakaya et al. (2005) Nakaya, T., A. S. Fotheringham, C. Brunsdon, and M. Charlton (2005). Geographically weighted Poisson regression for disease association mapping. Statistics in Medicine 24(17), 2695–2717.
  • Neal (2000) Neal, R. M. (2000). Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics 9(2), 249–265.
  • Pitman (1995) Pitman, J. (1995). Exchangeable and partially exchangeable random partitions. Probability Theory and Related Fields 102(2), 145–158.
  • Raftery et al. (1997) Raftery, A. E., D. Madigan, and J. A. Hoeting (1997). Bayesian model averaging for linear regression models. Journal of the American Statistical Association 92(437), 179–191.
  • Rand (1971) Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association 66(336), 846–850.
  • Reich et al. (2010) Reich, B. J., M. Fuentes, A. H. Herring, and K. R. Evenson (2010). Bayesian variable selection for multivariate spatially varying coefficient regression. Biometrics 66(3), 772–782.
  • Sethuraman (1991) Sethuraman, J. (1991). A constructive definition of Dirichlet priors. Statistics Sinica 4(2), 639–650.
  • Vavrek (2011) Vavrek, M. J. (2011). fossil: Palaeoecological and palaeogeographical analysis tools. Palaeontologia Electronica 14(1), 1T. R package version 0.3.0.
  • White and Ghosh (2009) White, G. and S. K. Ghosh (2009). A stochastic neighborhood conditional autoregressive model for spatial data. Computational Statistics & Data Analysis 53(8), 3033–3046.
  • Xu et al. (2019) Xu, Z., J. R. Bradley, and D. Sinha (2019). Latent multivariate log-gamma models for high-dimensional multi-type responses with application to daily fine particulate matter and mortality counts. arXiv preprint arXiv:1909.02528.
  • Xue et al. (2019) Xue, Y., E. D. Schifano, and G. Hu (2019). Geographically weighted Cox regression for prostate cancer survival data in Louisiana. Geographical Analysis. Forthcoming.
  • Yang et al. (2019) Yang, H.-C., G. Hu, and M.-H. Chen (2019). Bayesian variable selection for Pareto regression models with latent multivariate log gamma process with applications to earthquake magnitudes. Geosciences 9(4), 169.
  • Zhao et al. (2020) Zhao, P., H.-C. Yang, D. K. Dey, and G. Hu (2020). Bayesian spatial homogeneity pursuit regression for count value data. arXiv preprint arXiv:2002.06678.
  • Zou et al. (2017) Zou, T., W. Lan, H. Wang, and C.-L. Tsai (2017). Covariance regression analysis. Journal of the American Statistical Association 112(517), 266–281.