Bayesian Spatial Homogeneity Pursuit Regression for Count Value Data

02/16/2020
by   Peng Zhao, et al.
University of Connecticut
0

Spatial regression models are ubiquitous in many different areas such as environmental science, geoscience, and public health. Exploring relationships between response variables and covariates with complex spatial patterns is a very important work. In this paper, we propose a novel spatially clustered coefficients regression model for count value data based on nonparametric Bayesian methods. Our proposed method detects the spatial homogeneity of the Poisson regression coefficients. A Markov random field constraint mixture of finite mixtures prior provides a consistent estimator of the number of the clusters of regression coefficients with the geographically neighborhood information. The theoretical properties of our proposed method are established. An efficient Markov chain Monte Carlo algorithm is developed by using multivariate log gamma distribution as a base distribution. Extensive simulation studies are carried out to examine empirical performance of the proposed method. Additionally, we analyze Georgia premature deaths data as an illustration of the effectiveness of our approach.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 18

page 20

page 21

page 25

07/06/2019

Bayesian Nonparametric Nonhomogeneous Poisson Process with Applications to USGS Earthquake Data

Intensity estimation is a common problem in statistical analysis of spat...
07/05/2020

Bayesian Hierarchical Spatial Regression Models for Spatial Data in the Presence of Missing Covariates with Applications

In many applications, survey data are collected from different survey ce...
02/16/2020

Bayesian Spatial Homogeneity Pursuit of Functional Data: an Application to the U.S. Income Distribution

An income distribution describes how an entity's total wealth is distrib...
04/25/2020

Bayesian Clustered Coefficients Regression with Auxiliary Covariates Assistant Random Effects

In regional economics research, a problem of interest is to detect simil...
12/18/2020

Identifying latent groups in spatial panel data using a Markov random field constrained product partition model

Understanding the heterogeneity over spatial locations is an important p...
09/01/2021

Spatially and Robustly Hybrid Mixture Regression Model for Inference of Spatial Dependence

In this paper, we propose a Spatial Robust Mixture Regression model to i...
01/03/2019

Nonparametric graphical model for counts

Although multivariate count data are routinely collected in many applica...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Spatial regression models have been universally used in many different fields such as environmental science (Hu and Bradley, 2018), biological science (Zhang and Lawson, 2011), and econometrics (Brunsdon et al., 1996) to explore the relation between a response variable and a set of predictors over a region. One of the most important tasks for a spatial regression model is to capture the spatial dependent structure for the response variable. Cressie (1992) first proposed a regression model with Gaussian spatial random effects. Diggle et al. (1998) extended linear mixed effect model to generalized linear mixed regression. In both models, spatial random effects are accounted for by the intercepts, and the regression coefficients are assumed to be constant over space. Brunsdon et al. (1996) proposed a geographically weighted regression (GWR) to capture spatially varying pattern in regression coefficients. The idea of GWR has been subsequently extended to the Cox model framework by Xue et al. (2019). GWR does not give any distribution assumption for spatially varying coefficients. Furthermore, Gelfand et al. (2003)

incorporated spatial Gaussian process to linear regression model to build a spatially varying coefficients regression model. The literatures mentioned above all assume that each location has its own set of regression parameters, which sometimes leads to overfitting. Heterogeneity effects over the space of interest has not been taken into account. Detection of clustered covariate effects has significant merits in many different fields, such as environmental science, spatial econometrics, and disease mapping. For example, a country’s demographical information often has different economic statuses and development patterns. More advanced regions and less developed regions could be put into separate clusters and analyzed. Such heterogeneity pattern is of great interest to regional economics researchers.

Spatial cluster detection method such as the scan statistic based method (Kulldorff and Nagarwalla, 1995; Jung et al., 2007) is a remedy for spatial heterogeneity detection. A scan statistic is constructed via a likelihood ratio statistic to test the potential clusters. Another important approach for spatial heterogeneity detection is to use the Bayesian framework to pursue spatial cluster (Carlin et al., 2014; Li et al., 2010). These two important approaches mainly focus on estimating cluster configurations of spatial response. Recently, methods for cluster detection of spatial regression coefficients have been proposed to detect the homogeneity of the covariates effects among sub areas. Lee et al. (2017) detected the spatial cluster under the GWR framework based on a hypothesis testing procedure. Lee et al. (2019) extended their framework to a spatial mixed effects model. The clustering configuration in both works is detected via a series of hypothesis testing. The slopes and intercepts estimation are performed in two consecutive stages. Li and Sang (2019) incorporated spatial neighborhood information based on minimum spanning trees in a penalized approach to detect spatially clustered coefficients.

There are several major challenges in developing clustering algorithms for Poisson spatially clustered regression coefficients. First, imposing certain spatial contiguous constraints on the clustering configuration to facilitate interpretations in the spatially clustered coefficients regression. Furthermore, in many regional science applications, spatially contiguous constraints should not dominate global clustering configuration. In other words, the clustering results should contain both spatially contiguous pattern and spatially disconnected pattern. The aforementioned methods (Lee et al., 2017, 2019; Li and Sang, 2019) guarantee spatial contiguity, but fail to obtain globally discontiguous clusters which allow two clusters with long geographical distance belonging to the same cluster.

Second, an important consideration in clustering algorithm is to estimate the number of clusters. The frequentist approach requires two-stage procedures which assumes the knowledge of the number of clusters or estimate it using either of BIC or cross-validation. Bayesian inference provides a probabilistic framework for simultaneous inference of the number of clusters and the clustering configurations. Nonparametric Bayesian approaches, such as the Dirichlet process mixture (DPM) model

(Ferguson, 1973), offer choices to estimate the number of clusters and the clustering configurations simultaneously. Ma et al. (2019) proposed a Bayesian clustered regression for spatially dependent data based on Dirichlet process mixture model. But their methods do not contain consistent estimator of the number of the clusters due to inconsistency of the Dirichlet process mixture model (Miller and Harrison, 2013). To solve this over clustering problem of DPM, Miller and Harrison (2018) proposed mixture of finite mixtures (MFM) model which puts a prior on the number of clusters. Xie and Xu (2019)

 developed a general class of Bayesian repulsive Gaussian mixture models to well separate the different cluster.

Lu et al. (2018) proposed a powered Chinese restaurant process (pCRP), which encourages the data points to go to existing clusters. Most existing literatures give the remedy for over clustering problem. But they do not consider other information such as geographical information which will help the clustering performance. From Tobler’s first law of geography (Tobler, 1970), “Everything is related to everything else, but near things are more related than distant things,” it is reasonable to consider the similar pattern of the data due to similar environmental circumstances.

Finally, a significant concern of Bayesian nonparametric algorithm is computational efficiency. A reversible jump Markov chain Monte Carlo (MCMC) algorithm (Green, 1995) is proposed to search variable dimensional parameter space based on assigning a prior on the number of clusters. Such algorithms are not efficient to implement and automate. Collapsed Gibbs sampling algorithm (Neal, 2000) requires Metropolis-Hastings algorithm (Chib and Greenberg, 1995) or incorporating auxiliary variables (Neal, 2000) when the base distribution is not conjugate. Therefore, Gaussian prior for regression coefficients is not an efficient choice because of its non-conjugacy.

To address these challenges, in this paper, we develop a Markov random field (MRF) constraint MFM (MRF-MFM) model to capture the spatial homogeneity in regression coefficients. Our main contributions in this paper are in three folds. First, we develop a new nonparametric Bayesian methods for spatially clustered coefficients Poisson regression which leveraging geographical information based on Markov random field constraint MFM model. To the best of our knowledge, this is the first time to leverage geographical information in Bayesian model-based clustering algorithm for Poisson regression. MRF-MFM is able to capture both locally spatially contiguous clusters and globally discontiguous cluster simultaneously. The Pólya urn scheme, exchangeability, and estimation consistency of MRF-MFM are fully discussed in this article. Second, a multivariate log gamma (MLG) process (Bradley et al., 2018) is chosen as the base distribution of MRF-MFM for Poisson regression model. Hence, an efficient MCMC algorithm is developed for our model without Metropolis-Hastings algorithm (Chib and Greenberg, 1995) or incorporating auxiliary variables (Neal, 2000). We firstly introduce MLG process in nonparametric Bayesian algorithm. Finally, our proposed Bayesian approach reveals interesting features of the county-level premature data in the state of Georgia, which provides important information to study homogeneity effects of public health.

The remainder of the paper is organized as follows. In Section 2, we develop a spatial clustered coefficients Poisson regression model with MRF-MFM-MLG process and examine the properties of our proposed methods. In Section 3, an MCMC sampling algorithm is developed and post MCMC inference is discussed. Extensive simulation studies are carried out in Section 4. For illustration, our proposed methodology is applied to premature deaths data in the state of Georgia in Section 5. Finally, we conclude this paper with a brief discussion in Section 6. For ease of exposition, proofs and additional technical results are given in the supplementary materials.

2 Methodology

In this section, we will first introduce the clustered coefficients regression for count value data based on Poisson regression. In addition, MRF-MFM are proposed for spatial clustered coefficients regression model in order to leverage geographical information. The theoretical properties of MRF-MFM are fully examined. The Bayesian hierarchical model in conjunction with MLG is proposed in the end of this section.

2.1 Spatial Poisson Regression

Consider a Poisson regression model with spatially varying coefficients as follows

(1)

where is a dimensional regression coefficients at location . From Gelfand et al. (2003), a Gaussian process prior can be assigned on regression coefficients to obtain spatially varying pattern. Compared with spatially varying pattern, heterogeneity pattern of covariate effects over subareas is also universally discussed in many different fields, such as real estate applications, spatial econometrics, and environmental science. The model in (1) can be rewritten as

(2)

where . Through the popular Chinese restaurant process, are defined through the following conditional distribution (Pólya urn scheme, Blackwell et al., 1973):

(3)

where is the size of cluster .

While CRP has a very attractive feature of simultaneous estimation on the number of clusters and the cluster configuration, a striking consequence of this has been recently discovered (Miller and Harrison, 2018) where it is shown that the CRP produces extraneous clusters in the posterior leading to inconsistent estimation of the number of clusters even when the sample size grows to infinity. A modification of the CRP called Mixture of finite mixtures (MFM) model is proposed to circumvent this issue (Miller and Harrison, 2018):

(4)

where

is a proper probability mass function on

and is a point-mass at . Compared to the CRP, the introduction of new tables is slowed down by the factor , which allows a model-based pruning of the tiny extraneous clusters. is a coefficient that need to be precomputed:

where , and . under (4) can be defined in a Pólya urn scheme similar to CRP:

(5)

Where is the number of existing clusters.

2.2 Markov Random Field Constrained MFM

For the spatial data, leveraging geographical information is an important task. We apply the pairwise MRF in the level of coefficients to bring spatial interactions. Based on MRF, we can leverage information from nearby locations for Bayesian nonparametric clustering algorithm. This model can be more adapted to the count data since directly applying Poisson MRF can only incorporate negative correlations among response variables (Yang et al., 2012). Consider an undirected random graph , where is the vertex set while is the set of graph edges, with weights on the corresponding edges. Each vertex

is associated with a random variable

for . The pairwise MRF model is defined as

(6)

where is the normalizing constant. For example, for Gaussian MRF, and ; while for binary MRF, which is the celebrated Ising model, and . We can then decompose the pairwise MRF into vertex-wise term and interaction term , then

(7)

where and is the set of all cliques for the random graph . For the spatial clustered coefficient regression introduced in the next subsection, we study the component defined in equation (7) with a MFM prior.

2.3 Theoretical Properties

Our first theorem provides the generalized urn-model induced by MRF-MFM, thus a collapsed Gibbs sampler can be applied.

Theorem 2.1.

Under the definition in equation, (6) and (7), if is a continuous distribution and , the distributions of given is proportional to

(8)

with

(9)

where , are the distinct values taken by and , and .

Remark 2.1.

This theorem shows how the MRF constraints directly affect the Pólya urn sampling scheme compared with MFM. For example, when , with . Then the spatial smoothness can be controlled by the magnitude of .

In nonparametric Bayesian Statistics such as Dirichlet Process, the

exchangeable random variable is an important concept. When are infinite exchangeable, for any finite ,

(10)

where is the set of all permutations for the index set . If are i.i.d. sampled from the distribution , then they are exchangeable, but the converse is not true. Examples for exchangeable random variables that are not independent are Pólya’s Urn (Blackwell et al., 1973) and Gaussian random variables that has the same marginal distribution and the same correlation between any two variables. The famous de Finetti’s Theorem (De Finetti, 1929) reveals the intrinsic characterization of exchangeable random variables: there is a latent random variable , such that sampled from is equivalent to the following sampling procedure:

(11)

where only depends on . In other words, exchangeable random variables are conditionally i.i.d. given their latent label. There are two levels of labeling during the sampling procedure, the first labels are created by the Dirichlet prior, while the second one comes from the fact that are sampled from exchangeable but could be dependent distribution, thus the dependence of can be introduced.

Our next theorem establishes the key point on why the MFM can be extended to MRF-MFM without loss of consistency when , are not i.i.d but only exchangeable. By marginalizing the coefficients , the distribution of can be totally decided by the cluster configuration .

Theorem 2.2.

Assume are exchangeable random variables. Given the cluster configuration , the data and the number of components are independent.

Remark 2.2.

Compared with MFM, we generalize the same result from i.i.d. to the exchangeable cases. Since the dependence between is totally decided by , when are exchangeable, all the play the same role in generating . Then are marginalized, the cluster configuration covers the same information with the number of components and the latent labels .

Consider the pairwise interactions, we model the conditional cost functions as

(12)

where is the smoothness parameter, denotes the set of the neighbors of observation . When , the MRF-MFM reduces to MFM (Miller and Harrison, 2018). Note that since is continuous, with probability , are exchangeable random variables.

Our Final theorem shows that the estimation of the number of components is consistent asymptotically under MRF-MFM.

Theorem 2.3.

If , then under the assumption in Theorem 2.2, we have

(13)

as .

Remark 2.3.

This theorem demonstrates the efficacy of our proposed MRF-MFM compared with Dirichlet process mixture model with Markov random fields (DP-MRF)(Orbanz and Buhmann, 2008). For DP-MRF, there could be a lot of small spurious clusters due to inconsistency of the Dirichlet process mixture model (Miller and Harrison, 2013). However, since a prior distribution for the number of components is specified for our method, the posterior number of components is properly regularized, and can achieve consistency when the model is well-specified.

2.4 Spatial Clustered Coefficient Regression for Count Value Data

For the MRF-MFM, the natural choice of the base distribution of

is the multivariate normal distribution. However, the multivariate normal distribution is not the conjugate prior for Poisson regression. Using multivariate normal distribution as the base distribution requires Metropolis-Hastings updates or auxiliary parameters

(Neal, 2000) in Gibbs sampling algorithm for MRF-MFM. Bradley et al. (2018)

developed a multivariate log-gamma distribution (MLG), which is conjugate with the Poisson distribution. Based on MLG, an MRF-MFM-MLG is proposed for spatial clustered coefficients for Poisson regression.

We now review the multivariate log-gamma distribution from Bradley et al. (2018). We define the

-dimensional random vector

, which consists of mutually independent log-gamma random variables with shape and scale parameters organized into the -dimensional vectors , and , respectively. Then define the -dimensional random vector as follows

(14)

where the matrix and . Bradley et al. (2018) called as the multivariate log-gamma random vector. The random vector

has the following probability density function:

(15)

where “det” represents the determinant function. As a shorthand we use the notation, “” for the probability density function in (15). We give a MLG prior with parameters , then the posterior distribution of is given as:

(16)

where , , ,, and . Compared with Gaussian prior of , the MLG prior of does not require Metropolis-Hastings algorithm (Chib and Greenberg, 1995) or Slice sampler (Neal et al., 2003) to obtain the posterior distribution of . Another good property for MLG prior is the multivariate log-gamma distribution has asymptotic relation with the multivariate normal distribution. Bradley et al. (2018) show that if , and then converges in distribution to a multivariate normal random vector with mean and covariance matrix as . In practice, is sufficiently large for this normal approximation.

Next, we look at the Theorem 2 from Bradley et al. (2018), let , and partition this n-dimensional random vector so that , where is g-dimensional and is (n-g)-dimensional. Additionally, consider the class of MLG random vectors that satisfy the following:

(17)

where in general is a matrix of zeros; is a identity matrix;

(18)

is the QR decomposition of the

matrix H; the matrix satisfies , the matrix satisfies , and ; is a upper triangular matrix; and . Hence, the marginal distribution of the g-dimensional random vector is given by

(19)

where the normalizing constant is

(20)

so the is equal to the pdf in equation (19). The posterior distribution of with MLG prior under Poisson regression framework is the .

Adapting the MRF-MFM in conjunction with MLG to the spatial Poisson regression setting, we focus on the clustering of spatially-varying coefficients , where is the -dimensional coefficient vector for location . In our setting, we assume that the parameter vectors can be clustered into groups, i.e., , by letting the multivariate log-gamma as the base measure for , the hierarchal model can be written as

(21)

The choice of the hyperparameters and probability distribution of the number of clusters will be disccused in Section 

3.

3 Bayesian Inference

MCMC is used to draw samples from the posterior distributions of the model parameters. In this section we present the sampling scheme, the posterior inference of cluster belongings, and measures to evaluate the estimation performance and clustering accuracy.

3.1 The MCMC Sampling Schemes

Our goal is to sample from the posterior distribution of the unknown parameters , and . We choose and in (21), , and for all the simulations and real data analysis, where is a n-dimensional vector with 0, is a n-dimensional vector with 1, and is a n-dimensional identity matrix. The sampler is presented in Algorithm (1), which efficiently cycles through the full conditional distributions of for and , where . The details of the full conditional distributions are in the supplementary materials. The marginalization over can avoid complicated reversible jump MCMC algorithms or even allocation samplers.

1:procedure c-MRF-MFM-MLG
2:     Initialize and
3:     for each iteration=1 to B do
4:         Update conditional on in a closed form as
where,
Parameter Form
5:         Update conditional on for each in (1,…,n), we can get closed form expression for :
where denotes the partition obtained by removing and
where,
6:     end for
7:end procedure
Algorithm 1 Collapsed sampler for MRF-MFM-MLG

3.2 Inference of MCMC results

The posterior mean or median of clustering configurations is not suitable. Dahl’s method (Dahl, 2006) provides a remedy for posterior inference of the clustering configurations . The inference of Dahl’s method is based on the posterior samples of membership matrices, , where is the total number of Monte Carlo iterations. The definition of membership matrix is

(22)

with for all . indicates observations and are in the same cluster in the th posterior sample after burn-in iterations. Based on the posterior samples of membership matrices , a Euclidean mean for membership matrices is calculated by

The posterior iteration with the least squares distance to is obtained by

(23)

The estimated parameters, together with the cluster assignments , are obtained from th post burn-in iteration.

The second measure used in our performance evaluation is the Rand index Rand (1971), which can be used to measure the accuracy of clustering. The rand index is obtained by using R-package fossil (Vavrek, 2011). The RI ranges from 0 to 1 with a higher value indicating better agreement between the two partitions. In particular, indicates that two cluster assignments are identical in terms of modulo labeling of the nodes.

In the proposed spatial clustered coefficients model, the tuning parameter in Markov random fields needs to be selected. The Logarithm of the Pseudo-Marginal Likelihood (LPML) (Ibrahim et al., 2013) is applied for tuning parameter selection. The LPML can be obtained through the Conditional Predictive Ordinate (CPO) values. Let denote the observations with the th subject response deleted. The CPO for the th subject is defined as:

(24)

where

and is the normalizing constant. Following (Gelfand and Dey, 1994), a Monte Carlo estimate of the CPO can be obtained as:

(25)

where is th posterior samples of . An estimate of the LPML can subsequently be calculated as:

(26)

A model with a larger LPML value is more preferred.

4 Simulation

In this section, we firstly detail the simulation settings in Section 4.1, as well as how performance of the proposed method is evaluated. In addition, extensive simulation results are shown in Section 4.2.

4.1 Simulation Setting and Evaluation Metrics

We use the spatial structure of the state of Georgia, which contains 159 counties in Georgia. We consider two different spatial cluster designs. The first design is a three-cluster setting as illustrated in Figure 1, where geographical proximity becomes the factor that determines true cluster configuration. Another design is the two-cluster partition in Figure  2. The first cluster consists of two disjoint parts in the top and bottom region of Georgia state. The counties in the middle are in second cluster. It is designed to mimic a common premature deaths pattern that geographically distant regions can share similar distribution pattern, and geographical proximity is not the sole determining factor for homogeneity in premature deaths distribution.

Figure 1: Simulation design with three underlying clusters.
Figure 2: Simulation design with two underlying clusters, where the first cluster is composed of two geographically disjoint regions.

For each design, we have two different scenarios. The first scenario for each design is without spatial random effects. The second scenario for each design is with spatial random effects. We assume the spatial random effects follow a multivariate Normal distribution with mean zero and exponential covariogram. In total, we have four different scenarios in our simulation study. The details of data generation are given as

  1. , where , , , , .

  2. , where , , , , . , where , we set and .

  3. , where , , , .

  4. , where , , , . , where , we set and .

The final clustering performance is evaluated based on the estimated number of clusters and Rand Index (RI). The RI is calculated for each iteration for each replicate, and we calculate an average RI over all replicates. Also, the final number of clusters estimated is computed for each replicate. A total of 100 sets of data are generated under different scenarios. We run 5,000 iterations MCMC chain and burn-in the first 1,000 for each replicates.

4.2 Simulation Results

We generate sets of data from the model in four different scenarios. To each replicated data set, we fit the MFM-MLG and MRF-MFM-MLG with different values of the smoothness parameter and select the best smoothness parameter for each replicate based on LPML. In addition, we present the average value of RI for each case over 100 replicates to compare the clustering performance with MFM-MLG model based on RI. We also compare model fitness of our proposed model with MFM-MLG model in terms of LPML. We see that our model outperforms the MFM-MLG model in term of LPML on four different scenarios. We also evaluate the performance in terms of estimation results of the number of clusters. We report the proportion of times the true cluster recovered among the 100 replicates. For two clusters without spatial random effect design, we find out our model can recover the true number of clusters 100% of the replicates. And the MFM-MLG model can recover 99% of the replicates. In this case, both models perform well in number of clusters estimation. But our model outperforms the MFM-MLG model in terms of LPML value. For the two clusters design with spatial random effects design, we see that our model can recover the true number of clusters 99% of the replicates, but the MFM-MLG model only recover 45% of the replicates. For three clusters without spatial random effect design, we find out our model can recover the true number of clusters 98% of the replicates. On the other hand, MFM-MLG model recover 89% of the replicates. In this case, although the rand index of our model is slight worse than MFM-MLG but we think our method overcome the over-cluster issue. For three clusters with spatial random effect design, we find out our model can recover the true number of clusters 53% of the replicates. However the MFM-MLG model only recover 3% of the replicates.

All results including comparison of LPML, rand index, and estimating the number of clusters for each design are provided in Figure 3, 4, 5, and 6.

Figure 3: Two Clusters Design Without Spatial Random Effect. Left top: LPML plot; Right top: Rand Index plot. Left bottom: Number of cluster cover by MRF-MFM model. Right bottom: Number of clusters covered by MFM model.
Figure 4: Two Clusters Design With Spatial Random Effect. Left top: LPML plot; Right top: Rand Index plot. Left bottom: Number of cluster cover by MRF-MFM model. Right bottom: Number of clusters covered by MFM model.
Figure 5: Three Clusters Design Without Spatial Random Effect. Left top: LPML plot; Right top: Rand Index plot. Left bottom: Number of cluster cover by MRF-MFM model. Right bottom: Number of clusters covered by MFM model.
Figure 6: Three Clusters Design With Spatial Random Effect. Left top: LPML plot; Right top: Rand Index plot. Left bottom: Number of cluster cover by MRF-MFM model. Right bottom: Number of clusters covered by MFM model.

From the results shown in Figures above, our method can successfully estimate the true number of clusters. Otherwise, the MFM will overestimate the number of clusters when there exist spatial random effects.

Furthermore, we show the average mean square error (AMSE) of our proposed method and MFM-MLG in Table 1.

Method Parameter No Spatial Random effect With Spatial Random effect
Two Clusters Three Clusters Two Clusters Three Clusters
MRF-MFM-MLG 0.0848 0.2508 0.0966 0.3918
0.0839 0.2435 0.0967 0.3814
MFM-MLG 0.1170 0.2841 0.3675 0.6996
0.1164 0.2781 0.3668 0.6898
Table 1: AMSE for Estimation under All Scenarios

From the results in Table 1, we see that in four different scenarios, our proposed methods outperform MFM in terms of coefficient estimation. For the data generated from the model with spatial random effect, the improvement of our proposed methods is more obvious.

5 Illustration: Premature Deaths in Georgia

In this section, we will analyze premature deaths in state of Georgia as the illustration of our proposed methods.

5.1 Data Description

We consider analyzing influential factors for the number of premature deaths in state of Georgia using the proposed methods. The dataset is available at www.countyhealthrankings.org with 159 observations corresponding to the 159 counties in state of Georgia of the year 2015. For each county, the dependent variable is the number of the premature deaths of each county. The premature death is the death that occurs before the average age of death in a certain population. The Figure presents the number of the premature death of each county. In the United States, the average age of death is about 75 years. The dependent variable is the number of life lost per 100,000 population before age 75 in each county. The two covariates we consider in this paper is PM 2.5 () and food environment index (). PM 2.5 is the average daily density of fine particulate matter in micrograms per cubic meter. The food environment index is the index of factors that contribute to a healthy food environment, 0 (worst) to 10 (best). Figure 8 presents a visualization of the two covariates on the Georgia map.

Figure 7: Count of the Premature Death (in hundreds) Visualized on the County Map of Georgia.
Figure 8: Visualizations of PM 2.5 and Food Environment Index on the County Map of Georgia.

5.2 Analysis

In this section, we apply the proposed methodology to present a detailed analysis of premature death data in the state of Georgia. First, we rescale the data to a decent range, because the variance in the Poisson distribution is equal to the mean. The count of the premature death is scaled to in hundreds. We run 15,000 MCMC iterations and burnin the first 5,000 iterations. The smoothing parameter is tuned over the grid

. All other parameters are set to be consistent with the simulation study. The final clustering result is set to be the one corresponding to the largest LPML (Ibrahim et al., 2013), hence we choose the smoothing parameter equal 0.3. The 159 counties turned out to be put into six clusters as illustrated in Figure 9. The number of the counties in each cluster are 148, 2, 1, 5, 2 and 1, respectively. We also compare our model with best LPML to MFM, Latent Gaussian Process (LGP) (Hadfield et al., 2010), conditional autoregressive (CAR) (Lee, 2013) models. The LPML values of all candidate models are shown in Table 2. Based on the Table 2, our proposed model outperforms other models. From the estimation results shown in 9, we see that for all the counties higher PM 2.5 will cause higher premature deaths. For most counties, better food environment will cause less premature death.

Figure 9: Left top: Illustration of clusters identified by the proposed method for counties.; Left bottom:; Right top:; Right bottom:.
Method LPML
MRF-MFM () -2221.45
MFM -3614.38
LGP -2461.31
CAR -5015.93
Table 2: LPML for Different Models of Georgia Data
Cluster
1 -0.422 0.032 0.174
2 1.445 1.018 -1.792
3 -1.776 0.158 0.645
4 -0.928 0.937 -1.496
5 -1.546 0.179 0.410
6 0.393 0.221 0.170
Table 3: Dalh’s method estimates for the six clusters of Georgia Data

6 Discussion

In this paper, we have proposed a Bayesian spatial clustered coefficients regression for count value data. Our proposed MRF-MFM-MLG has two merits. First, MRF-MFM-MLG leverages the geographical information to detect the spatial homogeneity pattern of regression coefficients. Second, an efficient MCMC algorithm is developed for Poisson regression with spatial clustered coefficients. The usage of proposed method is illustrated in simulation studies, where it shows accurate estimation performance, and also outperforms other popular models. For premature death data, our method dominates the other benchmark methods in terms of LPML.

In addition, three topics beyond the scope of this paper are worth further investigation. First, in our MCMC algorithm, one numerical integration is required for Gibbs sampling. Proposing an efficient calculation algorithm of the numerical integration will broad the applications of our proposed methods. Furthermore, different clusters may have different sparsity patterns of the covariates. Incorporating spatial clustered sparsity structure of regression coefficients into the model will enable selection and identification of most important covariates. One parameter of Markov random field is require to be selected. Proposing a hierarchical model for tuning parameter is also an interesting future work.

Acknowledgement

Dr. Hu’s Research was supported by the Dean’s office of the College of Liberal Arts and Sciences at University of Connecticut.

Appendix A Proof of the Theorem 2.1

By Bayes’ theorem, we have:

(27)

As shown in Miller and Harrison (2018), by conditioning on the different possible situations of the cluster for the new observations, we have

(28)

Let . When considering the full conditional distribution

(29)

where only depends on for . Note that

(30)

where specifies the cluster that belongs to. With the property in equation (30) and the assumption that is continuous, almost surely for . Then given the density for and any subset for the domain of ,

(31)

where the constant only depends on the constant part of . Hence, the full conditional of can be derived

(32)

Appendix B Proof of the Theorem 2.2

We show that the interaction term does not change the conditional independence among data and the number of components given the cluster configuration when all are exchangeable.
Let , based on the definition of and , we have

(33)

where , are the distinct values of decided by and , and . Given , the transformation from variable to , is totally decided, so when marginalizing the unused , given any function , we have the identity

(34)

Note that are exchangeable based on assumption, then the density after marginalizing can be seen

(35)

where is a function only depends on and . In addition, directly follows from de Finetti’s Theorem; for , we apply the Fubini’s theorem; is because the expression depends only on through since there is no correspondence between and after integrating out