Spatial point patterns are usually divided into three cases: regularity/repulsiveness, complete spatial randomness, and aggregation/clustering. There is a wide selection of point process models suitable for these situations, see e.g. the overview in clustering_regularity_thin and the references therein. However, some point patterns show repulsiveness between the points at small scale and aggregation at a larger scale, see clustering_regularity_thin for a detailed discussion. In this regard, clustering_regularity_thin suggested a model for this situation obtained by a dependent thinning of a repulsive point process. It is also possible to construct certain Gibbs point processes with this behaviour, see e.g. clustering_regularity_Gibbs.
1.1 The log Gaussian Cox Strauss process
In this paper, we present a model for regularity at small scale and aggregation at larger scale which is a combination of a pairwise interaction point process and a log Gaussian Cox process. It is constructed by the following two steps.
First, we consider a pairwise interaction point process defined as follows. Let X be a spatial point process viewed as a finite random subset of a given bounded region (we think of as an observation window). Then X is a pairwise interaction point process if X follows a density (with respect to the unit rate Poisson process on ) of the form
for all point patterns with (if then is the empty point pattern), where the notation means the following: is a so-called first order interaction function; is a so-called second order interaction function; denotes usual Euclidean distance; and is the normalising constant which is required to be positive and finite. Usually, , in which case the density is well defined and results in a model for repulsion between the points. The first order interaction function may be used to model systematic aggregation of points.
Second, we consider a doubly stochastic construction, by replacing with a random function in order to introduce random aggregation to the model. This is an extension of a Cox process (the case , cf. cox55), and such a model was considered in kmjm08 when is the stochastic intensity function of a shot noise Cox process. Instead, we use the random intensity function of a log Gaussian Cox process (LGCP, see LGCP), which is a popular model for random aggregation. Specifically, we let
where is a Gaussian random field (GRF) with constant mean and exponential covariance function
is the variance andis a scale parameter. For , the flexible stochastic process may account for aggregation caused by unobserved covariates. Note that if .
For the second order interaction function in (1), kmjm08 used a piecewise linear function, whereas we will use the much simpler second order interaction function of a Strauss process (strauss1975; kelly1976). This gives us a density for X (with respect to the unit rate Poisson process) of the form
is the parameter vector. Here, the expectation is with respect to the GRF;is the normalising constant obtained by conditioning on Z; denotes the indicator function; and we use the convention . The parameter is called the interaction radius and the parameter controls the repulsion between points. This model for X will be referred to as an LGCP-Strauss process.
The model includes some well-known special cases:
Conditioned on Z, X is an inhomogeneous Strauss process.
If , X is a usual Strauss process. If in addition , X is a hard core Gibbs process with hard core parameter ; or if in addition , X is a homogeneous Poisson process on with intensity .
If , X is an LGCP.
The following coupling result becomes useful when interpreting the meaning of and when we later discuss simulation of the LGCP-Strauss process. To stress the dependence of , we write . Then, using a dependent thinning technique (KendallMoeller) it follows that there exists a coupling of the LGCP-Strauss processes for all such that whenever . In particular, the special case of the LGCP (item (c) above) dominates any of the LGCP-Strauss processes . The intensity of is (LGCP), so provides an upper bound on the expected number of points in . Here, denotes the area of .
Note that if we do not have any of the above special cases (a)–(c), both the intensity and other moment characteristics ofX, the density (3), and the Papangelou conditional intensity (see e.g. textbook)
are not expressible in closed form. Therefore, in general, usual approaches for estimation based on likelihood, pseudo-likelihood, composite likelihood, and minimum contrasts(see the review in JMRW2017) are not feasible for the LGCP-Strauss process. This makes statistical inference challenging.
1.2 Objectives and outline
In this paper, we show how to use approximate Bayesian computation (ABC) to make statistical inference for spatial point process models in general and provide further details for the LGCP-Strauss process. In brief, ABC is a flexible method for approximate inference in a Bayesian framework, which does not require the likelihood to be expressible in closed form. Instead, it is based on the ability to make simulations under the assumed model, which are then compared to the observed data by using summary statistics.
In previous work on ABC in the setting of spatial point process models, ABCppAlan explained how ABC can be used for Strauss process models and determinantal point process models. For the Strauss process model they estimated the interaction radius using maximum profile pseudo likelihood and then kept the interaction radius fixed at this estimate during the ABC procedure. Further, ABCppfunctional presented an ABC method using functional summary statistics such as the pair correlation function, which they exemplified for a Thomas process model and a marked point process model. Finally, ABCshadow presented an ABC method applicable to spatial point process models with a continuously differentiable likelihood function.
In contrast to ABCppAlan, the method we use for statistical inference is based entirely on ABC, and unlike ABCppAlan and ABCppfunctional we do not fix any of the unknown parameters during the ABC procedure. Furthermore, we provide a detailed discussion of the choice of summary statistics for ABC when making statistical inference for the LGCP-Strauss process. This discussion is equally relevant if using ABC in connection with other point process models which are special cases of the LGCP-Strauss process such as the LGCP or Strauss process. We also suggest a method for model validation and comparison based on posterior predictions and global envelopes. We use this in a simulation study to assess the quality of ABC results for LGCP-Strauss processes and to investigate whether realisations of the LGCP-Strauss process can be distinguished from LGCPs and Strauss processes.
The remainder of this paper is organized as follows. In Section 2, our chosen method for ABC model fitting is specified. Section 3 presents simulated examples of LGCP-Strauss processes with corresponding ABC analyses. Section 4 contains a real data example using a point pattern of oak trees which suffer from frost shake. Section 5 concludes with a brief summary and paths for future work.
The open source software R(R) is used for all statistical computations. Most plots are created with the R-package ggplot2 (ggplot) and some of the functionalities of the R-package spatstat (spatstat) are used to handle spatial point patterns.
2 ABC for spatial point process models
2.1 The general case
Consider a spatial point process X defined on a bounded region
and which follows a parametric model with parameter vector. Assume a realisation of X is observed. Our chosen procedure for ABC is specified in Algorithm 1 below. It is inspired by ABCppAlan and is based on the semi-automatic approach by ABCsemi. ABCppAlan
used a Markov chain Monte Carlo method for the ABC sampling whereas we choose ABC rejection sampling, because of its simplicity and ability to be run in parallel.
In Algorithm 1, is the number of points in a point pattern x, and in the first and last for loop we demand that for each simulated x. This is not strictly necessary for ABC, but it is a way to insure that summary statistics are not calculated for point patterns with very few points. Most summary statistics for spatial point patterns can only be calculated or considered reliable if there is a reasonable number of points in the point pattern. In the examples of Sections 3 and 4, was found to be sufficient.
In the second for loop of Algorithm 1, the linear models approximating the posterior means , , are fitted in a two-step procedure, which is a special case of a relaxed Lasso (relaxedlasso)lasso: Chapter 2). Let , , be the resulting estimate of and set . Second, the summary statistics in
are used as predictors in a linear model fitted with ordinary least squares, which results in the final model for approximating.
2.2 The special case of LGCP-Strauss process models
Consider again an LGCP-Strauss process X on the observation window with density (3), which depends on the unknown parameter vector . One requirement for ABC is the ability to simulate data under this model for a given . We do this in two steps: First, a realisation z of Z is simulated (see e.g. schlather99). In R, this can be done with the function RFsimulate from the R-package RandomFields (RF2; RF1). Second, a realisation of X given is simulated using an MCMC algorithm, namely a birth-death Metropolis-Hastings algorithm (bdMetropolis : specifically, a birth is proposed with probability
: specifically, a birth is proposed with probabilityand otherwise a death is proposed; for a birth proposal, the new point is generated from a density proportional to ; and for a death proposal, the point to die is selected uniformly from the current point pattern).
Another requirement for ABC is the selection of appropriate summary statistics , for Algorithm 1. Since the parameter especially affects the number of points in a point pattern generated by an LGCP-Strauss process, we include the number of observed points as a summary statistic. Further, the -function , where denotes inter-point distance and is Ripley’s -function (ripley76; ripley77), summarises many important aspects of the second order moment properties of X. Since for a Poisson process, one usually considers . If (), this indicates that X is regular/repulsive (aggregated/clustered) at inter-point distances . A simulation study suggested that for realisations of an LGCP-Strauss process, the empirical estimate of often has a global minimum when is close to the interaction radius , at least when there is strong to moderate repulsion in the model. In this regard, the bottom row of Figure 1 shows some empirical -functions associated with realisations of LGCP-Strauss processes. We therefore also include summary statistics related to the function (see (b)-(c) below).
Furthermore, the parameters and of the LGCP-Strauss process affect the clustering in the process. To supply some summary statistics describing this, assume for ease of exposition that is a square with side length . Then we split into squares of side length , , and let be the number of points falling in . We choose summary statistics which describe how varies (see (d) below) and which are calculated for a user-specified finite range of -values.
Specifically, for a point pattern x (either or one of the simulated point patterns in Algorithm 1), we chose the following summary statistics.
, , and , where is a non-parametric estimate of the -function evaluated over a user-specified finite range of -values.
evaluated at equally spaced values of between 0 and .
, , and , where again means empirical variance.
We have chosen these specific forms of the summary statistics based on some numerical experiments. In the examples of Sections 3 and 4, and . This means that the vector of summary statistics has dimension equal to .
3 Simulated examples
3.1 Simulated data and prior specification
The top panels of Figure 1 show examples of simulated realisations of the LGCP-Strauss processes on the unit square for three different values of ; the remaining parameters are specified in the caption. The bottom panels of Figure 1 show the corresponding estimated -functions using Ripley’s isotropic edge correction (ripley77). As expected, the point patterns exhibit both regularity and aggregation, with an increasing degree of regularity at small to moderate distances as decreases, but a similar degree of aggregation at large distances.
The ABC procedure in Algorithm 1 requires specification of (proper) prior distributions for the parameters in order to draw prior samples of the parameters. These samples are then used to simulate LGCP-Strauss processes with the given parameters. The more points a simulated point pattern has, the more computationally expensive the simulation procedure will be (see below). Therefore, we choose the prior distributions in such a way that these simulated point patterns will not yield unreasonably many points compared to the number of points in our observed point patterns. Specifically, when considering the prior distributions of and , recall that is an upper bound on the expected number of points in the unit square. So, for the examples in this section, independent uniform prior distributions are chosen for on the interval , on , on , on , and on .
In order to use the MCMC algorithm when a realisation of the GRF is given (see Section 2.2), it is necessary to choose a burn-in which can be used for all simulations in the ABC procedure. In order to choose this burn-in, we considered 30 samples of the parameters drawn from the prior distributions; used the MCMC algorithm for all these samples; and considered trace plots of the number of points and -close pairs. Figure 7 shows these plots for three different prior samples for illustration. It seems that the higher the number of points, the slower the convergence. The burn-in should be high enough for the MCMC algorithm to have converged given any prior sample, but increasing the burn-in will also increase the computation time. Considering all 30 examples, appears to be an appropriate overall burn-in. We furthermore choose to initiate the MCMC algorithm at the empty point pattern or at a realisation of an inhomogeneous Poisson process on with intensity function (these initial states are extreme because of the coupling result mentioned in Section 1.1). The simulations in the top panels of Figure 1 and all following simulations are iteration of the MCMC algorithm initiated at the empty point pattern.
3.2 Posterior results
shows kernel density estimates of the resulting (approximate) marginal posterior distributions of the parameters, using a Gaussian kernel and a bandwidth chosen with the method bybwselection. From Figure 2 we see the following. In all cases, the posterior mean and median are close and in most cases they agree with the true parameter value. As the true value of increases, the posterior distribution of
becomes more and more left skew. The approximate posterior distributions forand
look rather similar to their prior uniform distributions. When the true value ofis 0, the posterior distribution of is very concentrated near 0, but it becomes more and more symmetric around the true value of as this increases. The spread of the posterior distribution of seems to increase as the true value of increases. Overall, the ABC procedure seems to be most successful for estimating and . Note that when fitting a Strauss process to a point pattern, ABCppAlan first estimated by maximum pseudo-likelihood and then used this value of in their ABC procedure; in contrast we found no need to fix when fitting an LGCP-Strauss process with our ABC procedure.
In geostatistics, it is known that the scale and variance parameters of an exponential covariance function for a GRF are unidentifiable (see e.g. covar_unidentifiable). This might explain why the ABC procedure is not so successful when it comes to identifying these parameters. Therefore, we made the same analysis as in Figure 2 when is given. However, the posterior marginal distributions of the remaining parameters (not shown) looked very similar to those in Figure 2.
We now investigate how the ABC procedure for fitting an LGCP-Strauss process works when the data is generated from some of the special cases of this process. For this purpose, we simulated a realisation of an LGCP with parameters , and , and a realisation of a Strauss process with parameters , and . Notice that when simulating under an LGCP, there is no need to employ the MCMC algorithm described at the beginning of Section 2.2. We used the faster method implemented in the function rLGCP from the package spatstat (spatstat). We used the same ABC procedure as above for fitting an LGCP-Strauss process to these point patterns and the posterior results can be seen in Figure 3. For the point pattern generated from an LGCP, the true value of seems to be identified well when fitting the LGCP-Strauss process. The posterior marginal distribution of is rather concentrated near , and a plot of the posterior samples of and (not shown) shows that very small values of appear together with very small values of . This indicates that the fitted LGCP-Strauss process is close to the special case of an LGCP, which is the true model. Again, it seems to be difficult to identify and .
For the point pattern generated from a Strauss process, the marginal posterior distribution for is very concentrated near zero, which is the true value. The true value of seems to be well identified whereas the median and mean of the marginal posterior distributions of and are somewhat higher than the true values. It appears that for this example, the ABC procedure is not as successful for as it was for the point patterns in Figure 1. For the Strauss process, should be irrelevant, and a plot of the posterior samples of and (not shown) shows that is irrelevant for small values of .
3.3 Model checking and comparison
We are interested in whether the point patterns in Figure 1 can be distinguished from realisations of an LGCP and a Strauss process, so for comparison we also fitted an LGCP and a Strauss process to each point pattern, using the ABC procedure in Algorithm 1. We used the same summary statistics as for the LGCP-Strauss process and the same prior distributions on the relevant parameters (that is, the parameters , and when fitting the LGCP, and the parameters , and when fitting the Strauss process). Again, when simulating under an LGCP, we used the faster method implemented in spatstat.
For model checking and comparison we suggest to make global envelope tests based on posterior predictions as follows. For each ABC realisation of , a realisation x of the process in question given is simulated. For each x, a functional summary statistic is estimated. These empirical curves are then used to construct global envelopes and corresponding tests based on extreme rank lengths (GET2017; GET2018: note that we only used 1 000 simulations instead of the recommended 2499, because the ABC procedure is rather time consuming). The R-package GET (GET2017) was used for this purpose.
In order to compare the fitted LGCP-Strauss, LGCP, and Strauss process models, we used global envelopes based on posterior predictions and the empirical - and -function, with where is the empty space function and is the nearest-neighbour distribution function (see Jfunction). We also tried to use the - and -functions for model validation but these functional summary statistics were unable to distinguish between the models.
Figure 4 shows combined global envelopes for the - and -function, meaning that, under the LGCP-Strauss process, the probability that both empirical curves are within their respective envelopes is approximately . To combine the envelopes we have used the two-step combining procedure described in GETinR. Note that the -function can only be estimated reliably for all simulations in the interval , whereas the -function can be estimated reliably on a larger interval.
In all cases, the -values of the global envelope tests are highest in the situation of the LGCP-Strauss process, which may indicate that they provide the best fit to data. The LGCP is rejected in the cases where and because the empirical -functions in these cases are above the global envelopes at small inter-point distances. This indicates that the point patterns are more regular at small inter-point distances than what would be expected under the fitted LGCPs. For the case (the case with weakest inhibition), the LGCP cannot be rejected. Notice that the -values of these tests are increasing as increases which is in agreement with the fact that the LGCP-Strauss process approaches the special case of an LGCP.
The Strauss process model is rejected in all three cases because the empirical -function clearly shows that the point patterns are more clustered at moderate to large inter-point distances than what can be modelled with a Strauss process. In the case , the empirical -function also shows this, but for the remaining two cases the -function is contained completely within the envelopes.
Overall, it appears that the -function is best at criticizing the LGCP and the -function is best at criticizing the Strauss process. The later may have something to do with the fact that the -function can only be estimated on a relatively small interval. So, it is less likely to capture the aggregation, which happens on a larger scale, than the -function which can be estimated on a bigger interval. When we use the -function for model validation we keep in mind that it was also used in the ABC procedure which might lead us to conclude that the model fits better to data than it actually does.
In order to investigate how this model validation and comparison works when the LGCP-Strauss process is overfitting, we also fitted an LGCP to the first point pattern in Figure 3 and a Strauss process model to the second point pattern in Figure 3 and compare them to the fitted LGCP-Strauss process models (the global envelopes are not shown). For the realisation of an LGCP, the -values of the combined global envelope test for the fitted LGCP-Strauss and LGCP were and , respectively. Since the data is generated from an LGCP, both models should fit the data equally well, so the higher -value for the LGCP-Strauss process is probably a result of the fact that it is overfitting. For the realisation of a Strauss process model, the -values of the combined global envelope test for the fitted LGCP-Strauss and Strauss process were and , respectively. In this example, the -values do not reveal the fact that the LGCP-Strauss process is overfitting.
4 Data example
The first panel in Figure 5 shows the locations of 256 oak trees which suffer from frost shake (frost shake refers to cracks in the trunk of the tree) in a m rectangular region of Allogny in France. This data set is part of the Allogony data set from the R-package ads (ads). The data set of the oaks was analysed in parmodelAllogny using a parametric point process model with regularity on the small scale and aggregation on the large scale. The LGCP-Strauss model should also be able to model this behaviour.
We used Algorithm 1 on this oak data set. Here, independent uniform prior distributions are chosen for on the interval , on , on , on , and on . Notice that the observation window for the oak data is much larger than the ones in Section 3.1, and the prior distributions are chosen to take this into account. Furthermore, when calculating the summary statistics for the ABC procedure, , , are now rectangular sets of the same size (see Section 2.2). Trace plots as those in Figure 7 (supplied in an appendix) suggested that iterations of the MCMC algorithm is a sufficient burn-in for this example. Again, a pilot sample of simulations was used and the resulting ABC posterior sample consists of draws from the approximate posterior distribution.
The marginal posterior distributions, which are estimated from the ABC sample, can be seen in Figure 5. They are all clearly different from their uniform priors. The posterior distributions of and look approximately normal, whilst the posterior distributions of , and are right skew. Note that the posterior distribution of indicates strong repulsion between the points. The posterior distribution of , particularly its heavy tail, suggests some aggregation among the splited oaks.
The first plot in Figure 6 shows 95% combined global envelopes for the fitted LGCP-Strauss process as described in Section 3. The overall behaviour of the observed point pattern seems to be captured well by the LGCP-Strauss process, and the -value is very high. For comparison, the remaining plots in Figure 6 show the corresponding envelopes for a fitted LGCP, a fitted Strauss process model, and the model fitted in parmodelAllogny, respectively. The LGCP and Strauss process models are fitted with the ABC procedure in Algorithm 1, and the envelopes are based on posterior predictions. The model in parmodelAllogny was not fitted in a Bayesian setup, so simulations under this model are not posterior predictions. For simulating this model, we used the technique suggested in parmodelAllogny. The combined global envelopes indicate that the LGCP model provides a poor fit to data. The tests conclude that both the Strauss process model and the model fitted in parmodelAllogny fit well. However, the -values are lower than the corresponding -value for the LGCP-Strauss process, indicating that the later may provide a better fit.
5 Summary and future work
We have proposed a novel spatial point process model which enables capturing of regularity through pairwise interactions and aggregation through a Gaussian process realization. This doubly stochastic spatial point process generalizes both the customary log Gaussian Cox process and the customary Gibbs process. Because the likelihood is intractable for this model we have developed model fitting through an ABC method. We have provided both simulation investigation and a real data application in order to reveal the behaviour of process realizations and also our ability to fit the model and do full inference for given point pattern realizations.
Future work can consider marked point patterns or so-called multi-type versions of our model (see e.g. textbook). Such multi-type modelling may allow attraction or inhibition within types but also introduce attraction or inhibition between types. A different direction would consider space-time versions. That is, a realization of the process is seen as a spatial point pattern by integrating over a window of time.
The research of the first two authors was supported by The Danish Council for Independent Research — Natural Sciences, grant DFF – 7014-00074 ‘Statistics for point processes in space and beyond’. The second author was also supported by the ‘Centre for Stochastic Geometry and Advanced Bioimaging’, funded by grant 8721 from the Villum Foundation.
Appendix A Trace plots for accessing the burn-in for the simulation algorithm
Figure 7 shows trace plots of the number of points and -close pairs for the MCMC algorithm when simulating an LGCP-Strauss process for different draws of the parameter vector from it’s prior distribution which is described in Section 3. For each prior sample of , a realisation z of the GRF was simulated, and the MCMC algorithm was used to simulate the LGCP-Strauss process given . This analysis was used to choose an appropriate burn-in in Section 3.