1 Introduction
Spatial point patterns are usually divided into three cases: regularity/repulsiveness, complete spatial randomness, and aggregation/clustering. There is a wide selection of point process models suitable for these situations, see e.g. the overview in clustering_regularity_thin and the references therein. However, some point patterns show repulsiveness between the points at small scale and aggregation at a larger scale, see clustering_regularity_thin for a detailed discussion. In this regard, clustering_regularity_thin suggested a model for this situation obtained by a dependent thinning of a repulsive point process. It is also possible to construct certain Gibbs point processes with this behaviour, see e.g. clustering_regularity_Gibbs.
1.1 The log Gaussian Cox Strauss process
In this paper, we present a model for regularity at small scale and aggregation at larger scale which is a combination of a pairwise interaction point process and a log Gaussian Cox process. It is constructed by the following two steps.
First, we consider a pairwise interaction point process defined as follows. Let X be a spatial point process viewed as a finite random subset of a given bounded region $W \subset \mathbb{R}^2$ (we think of $W$ as an observation window). Then X is a pairwise interaction point process if X follows a density (with respect to the unit rate Poisson process on $W$) of the form

(1) $f(x) = \frac{1}{c} \prod_{i=1}^{n} \beta(x_i) \prod_{1 \le i < j \le n} \varphi(\|x_i - x_j\|)$

for all point patterns $x = \{x_1, \ldots, x_n\} \subset W$ with $n \ge 0$ (if $n = 0$ then $x$ is the empty point pattern), where the notation means the following: $\beta : W \to [0, \infty)$ is a so-called first order interaction function; $\varphi : [0, \infty) \to [0, \infty)$ is a so-called second order interaction function; $\|\cdot\|$ denotes usual Euclidean distance; and $c$ is the normalising constant, which is required to be positive and finite. Usually, $\varphi \le 1$, in which case the density is well defined and results in a model for repulsion between the points. The first order interaction function $\beta$ may be used to model systematic aggregation of points.
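To make the form of (1) concrete, the following minimal Python sketch (illustrative only; the paper's own computations are in R, and the function name and signature here are our own) evaluates the log of the unnormalised density when $\beta$ is constant and $\varphi$ is of Strauss type, $\varphi(d) = \gamma$ for $d \le R$ and $\varphi(d) = 1$ otherwise:

```python
import itertools
import math

def log_unnorm_strauss(points, beta, gamma, R):
    """Log unnormalised density of the form (1) with constant beta and
    Strauss interaction: n*log(beta) + s_R(x)*log(gamma), where s_R(x)
    counts pairs of points at distance <= R (convention 0^0 = 1)."""
    n = len(points)
    s = sum(1 for p, q in itertools.combinations(points, 2)
            if math.dist(p, q) <= R)
    if gamma == 0.0:
        # gamma = 0 forbids any R-close pair (hard core case)
        return float("-inf") if s > 0 else n * math.log(beta)
    return n * math.log(beta) + s * math.log(gamma)
```

With $\gamma = 1$ the interaction term vanishes and the log density reduces to $n \log \beta$, the Poisson case.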
Second, we consider a doubly stochastic construction, replacing $\beta$ with a random function in order to introduce random aggregation to the model. This is an extension of a Cox process (the case $\varphi = 1$, cf. cox55), and such a model was considered in kmjm08 when $\beta$ is the stochastic intensity function of a shot noise Cox process. Instead, we use the random intensity function of a log Gaussian Cox process (LGCP, see LGCP), which is a popular model for random aggregation. Specifically, we let

(2) $\beta(u) = \exp(Z(u)), \quad u \in W,$

where $Z = \{Z(u) : u \in W\}$ is a Gaussian random field (GRF) with constant mean $\mu$ and exponential covariance function $c(u, v) = \sigma^2 \exp(-\|u - v\|/s)$ for $u, v \in W$. Here, $\sigma^2$ is the variance and $s > 0$ is a scale parameter. For $\sigma > 0$, the flexible stochastic process $\exp(Z)$ may account for aggregation caused by unobserved covariates. Note that $\beta = \exp(\mu)$ is constant if $\sigma = 0$.

For the second order interaction function $\varphi$ in (1), kmjm08 used a piecewise linear function, whereas we will use the much simpler second order interaction function of a Strauss process (strauss1975; kelly1976). This gives us a density for X (with respect to the unit rate Poisson process) of the form
(3) $f(x) = \mathrm{E}\left[\frac{1}{c(Z)} \prod_{i=1}^{n} \exp(Z(x_i)) \, \gamma^{s_R(x)}\right], \quad s_R(x) = \sum_{1 \le i < j \le n} \mathbf{1}[\|x_i - x_j\| \le R],$

where $\theta = (\mu, \sigma, s, \gamma, R)$ is the parameter vector. Here, the expectation is with respect to the GRF; $c(Z)$ is the normalising constant obtained by conditioning on Z; $\mathbf{1}[\cdot]$ denotes the indicator function; and we use the convention $0^0 = 1$. The parameter $R > 0$ is called the interaction radius and the parameter $\gamma \in [0, 1]$ controls the repulsion between points. This model for X will be referred to as an LGCP-Strauss process.

The model includes some well-known special cases:

(a) Conditioned on Z, X is an inhomogeneous Strauss process.

(b) If $\sigma = 0$, X is a usual Strauss process. If in addition $\gamma = 0$, X is a hard core Gibbs process with hard core parameter $R$; or if in addition $\gamma = 1$, X is a homogeneous Poisson process on $W$ with intensity $\exp(\mu)$.

(c) If $\gamma = 1$, X is an LGCP.
The following coupling result becomes useful when interpreting the meaning of $\gamma$ and when we later discuss simulation of the LGCP-Strauss process. To stress the dependence on $\gamma$, we write $X = X_\gamma$. Then, using a dependent thinning technique (KendallMoeller), it follows that there exists a coupling of the LGCP-Strauss processes $X_\gamma$ for all $\gamma \in [0, 1]$ such that $X_{\gamma_1} \subseteq X_{\gamma_2}$ whenever $\gamma_1 \le \gamma_2$. In particular, the special case of the LGCP $X_1$ (item (c) above) dominates any of the LGCP-Strauss processes $X_\gamma$. The intensity of $X_1$ is $\exp(\mu + \sigma^2/2)$ (LGCP), so $|W| \exp(\mu + \sigma^2/2)$ provides an upper bound on the expected number of points in $X_\gamma$. Here, $|W|$ denotes the area of $W$.
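The GRF $Z$ entering (2) can be simulated directly on a modest set of locations by factorising its covariance matrix; the paper uses RFsimulate from the R package RandomFields for this, so the following Python sketch is purely illustrative and the function name is our own:

```python
import numpy as np

def simulate_exp_grf(locs, mu, sigma, scale, rng):
    """Draw one realisation of a GRF with constant mean mu and covariance
    sigma^2 * exp(-||u - v|| / scale) at the (m, 2) array of locations,
    via a Cholesky factorisation of the m x m covariance matrix."""
    d = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)
    cov = sigma**2 * np.exp(-d / scale)
    # a small diagonal jitter keeps the factorisation numerically stable
    chol = np.linalg.cholesky(cov + 1e-9 * np.eye(len(locs)))
    return mu + chol @ rng.standard_normal(len(locs))
```

The Cholesky approach is exact but scales cubically in the number of locations; dedicated packages use faster methods on fine grids.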
Note that if we do not have any of the above special cases (a)-(c), then the intensity and other moment characteristics of X, the density (3), and the Papangelou conditional intensity (see e.g. textbook) are not expressible in closed form. Therefore, in general, the usual approaches to estimation based on likelihood, pseudo-likelihood, composite likelihood, and minimum contrasts (see the review in JMRW2017) are not feasible for the LGCP-Strauss process. This makes statistical inference challenging.

1.2 Objectives and outline
In this paper, we show how to use approximate Bayesian computation (ABC) to make statistical inference for spatial point process models in general and provide further details for the LGCPStrauss process. In brief, ABC is a flexible method for approximate inference in a Bayesian framework, which does not require the likelihood to be expressible in closed form. Instead, it is based on the ability to make simulations under the assumed model, which are then compared to the observed data by using summary statistics.
In previous work on ABC in the setting of spatial point process models, ABCppAlan explained how ABC can be used for Strauss process models and determinantal point process models. For the Strauss process model, they estimated the interaction radius by maximum profile pseudo-likelihood and then kept it fixed at this estimate during the ABC procedure. Further, ABCppfunctional presented an ABC method using functional summary statistics such as the pair correlation function, which they exemplified for a Thomas process model and a marked point process model. Finally, ABCshadow presented an ABC method applicable to spatial point process models with a continuously differentiable likelihood function.
In contrast to ABCppAlan, the method we use for statistical inference is based entirely on ABC, and unlike ABCppAlan and ABCppfunctional we do not fix any of the unknown parameters during the ABC procedure. Furthermore, we provide a detailed discussion of the choice of summary statistics for ABC when making statistical inference for the LGCPStrauss process. This discussion is equally relevant if using ABC in connection with other point process models which are special cases of the LGCPStrauss process such as the LGCP or Strauss process. We also suggest a method for model validation and comparison based on posterior predictions and global envelopes. We use this in a simulation study to assess the quality of ABC results for LGCPStrauss processes and to investigate whether realisations of the LGCPStrauss process can be distinguished from LGCPs and Strauss processes.
The remainder of this paper is organized as follows. In Section 2, our chosen method for ABC model fitting is specified. Section 3 presents simulated examples of LGCPStrauss processes with corresponding ABC analyses. Section 4 contains a real data example using a point pattern of oak trees which suffer from frost shake. Section 5 concludes with a brief summary and paths for future work.
The open source software R (R) is used for all statistical computations. Most plots are created with the R package ggplot2 (ggplot), and some of the functionalities of the R package spatstat (spatstat) are used to handle spatial point patterns.

2 ABC for spatial point process models
2.1 The general case
Consider a spatial point process X defined on a bounded region $W \subset \mathbb{R}^2$ and following a parametric model with parameter vector $\theta$. Assume a realisation of X is observed. Our chosen procedure for ABC is specified in Algorithm 1 below. It is inspired by ABCppAlan and is based on the semi-automatic approach of ABCsemi. ABCppAlan used a Markov chain Monte Carlo method for the ABC sampling, whereas we choose ABC rejection sampling because of its simplicity and ability to be run in parallel.
In Algorithm 1, $n(x)$ is the number of points in a point pattern $x$, and in the first and last for loop we demand that $n(x) \ge n_{\min}$ for each simulated $x$, where $n_{\min}$ is a user-specified lower bound. This is not strictly necessary for ABC, but it is a way to ensure that summary statistics are not calculated for point patterns with very few points. Most summary statistics for spatial point patterns can only be calculated, or considered reliable, if there is a reasonable number of points in the point pattern. In the examples of Sections 3 and 4, the chosen value of $n_{\min}$ was found to be sufficient.
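A bare-bones version of the rejection step, including the redraw rule for patterns with too few points, might look as follows (an illustrative Python sketch with generic prior, simulator, and summary-statistic functions, not the paper's R implementation):

```python
import numpy as np

def abc_rejection(obs_stats, prior_sampler, simulator, summaries,
                  n_draws, accept_frac, min_points, rng):
    """Plain ABC rejection: keep the accept_frac of prior draws whose
    summary statistics are closest (in scaled Euclidean distance) to
    those of the observed pattern. Simulated patterns with fewer than
    min_points points are redrawn, mirroring the bound n(x) >= n_min."""
    thetas, stats = [], []
    while len(thetas) < n_draws:
        theta = prior_sampler(rng)
        x = simulator(theta, rng)
        if len(x) < min_points:
            continue  # too few points for reliable summary statistics
        thetas.append(theta)
        stats.append(summaries(x))
    stats = np.asarray(stats, dtype=float)
    scale = stats.std(axis=0) + 1e-12  # put summaries on a common scale
    dist = np.linalg.norm((stats - obs_stats) / scale, axis=1)
    keep = np.argsort(dist)[: max(1, int(accept_frac * n_draws))]
    return [thetas[i] for i in keep]
```

Because each prior draw is processed independently, this loop parallelises trivially, which is the advantage of rejection sampling noted above.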
In the second for loop of Algorithm 1, the linear models approximating the posterior means $\mathrm{E}[\theta_i \mid \text{summary statistics}]$ are fitted in a two-step procedure, which is a special case of a relaxed lasso (relaxedlasso): First, a model is fitted with lasso regression, where the penalty term is chosen based on a cross-validation argument using the 'one-standard-error rule' (see e.g. lasso: Chapter 2). Second, the summary statistics with non-zero estimated lasso coefficients are used as predictors in a linear model fitted with ordinary least squares, which results in the final model for approximating the posterior mean.
2.2 The special case of LGCP-Strauss process models
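Before specialising to the LGCP-Strauss case, the two-step regression fit described at the end of Section 2.1 can be sketched as follows (an illustrative Python sketch with a minimal coordinate-descent lasso standing in for the cross-validated fit; all names are our own):

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Minimal coordinate-descent lasso (X standardised, y centred)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_norm2 = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]  # partial residual
            rho = X[:, j] @ r
            # soft-thresholding update for coordinate j
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_norm2[j]
    return beta

def relaxed_lasso_fit(X, y, lam):
    """Two-step fit: the lasso selects summary statistics, then ordinary
    least squares refits an unpenalised model on the selected columns."""
    selected = np.flatnonzero(lasso_cd(X, y, lam) != 0.0)
    coef = np.zeros(X.shape[1])
    if selected.size:
        coef[selected], *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
    return coef
```

The OLS refit removes the shrinkage bias that the lasso imposes on the selected coefficients, which is the point of the relaxed-lasso construction.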
Consider again an LGCP-Strauss process X on the observation window $W$ with density (3), which depends on the unknown parameter vector $\theta$. One requirement for ABC is the ability to simulate data under this model for a given $\theta$. We do this in two steps: First, a realisation $z$ of $Z$ is simulated (see e.g. schlather99). In R, this can be done with the function RFsimulate from the R package RandomFields (RF2; RF1). Second, a realisation of X given $Z = z$ is simulated using an MCMC algorithm, namely a birth-death Metropolis-Hastings algorithm (bdMetropolis): specifically, a birth is proposed with probability 1/2 and otherwise a death is proposed; for a birth proposal, the new point is generated from a density proportional to $\exp(z)$; and for a death proposal, the point to die is selected uniformly from the current point pattern.

Another requirement for ABC is the selection of appropriate summary statistics for Algorithm 1. Since the parameter $\mu$ especially affects the number of points in a point pattern generated by an LGCP-Strauss process, we include the number of observed points as a summary statistic. Further, the function $L(r) - r$, where $r \ge 0$ denotes interpoint distance and $L(r) = \sqrt{K(r)/\pi}$ is a transformation of Ripley's $K$-function (ripley76; ripley77), summarises many important aspects of the second order moment properties of X. Since $L(r) = r$ for a Poisson process, one usually considers $L(r) - r$. If $L(r) - r < 0$ ($L(r) - r > 0$), this indicates that X is regular/repulsive (aggregated/clustered) at interpoint distances $r$. A simulation study suggested that for realisations of an LGCP-Strauss process, the empirical estimate of $L(r) - r$ often has a global minimum when $r$ is close to the interaction radius $R$, at least when there is strong to moderate repulsion in the model. In this regard, the bottom row of Figure 1 shows some empirical $L(r) - r$ functions associated with realisations of LGCP-Strauss processes. We therefore also include summary statistics related to this function (see (b)-(c) below).
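An illustrative Python sketch of a birth-death Metropolis-Hastings sampler of this kind is given below, for simplicity with uniform (rather than $\exp(z)$-proportional) birth proposals on the unit square and assuming $0 < \gamma \le 1$; it is a sketch of the idea, not the paper's implementation:

```python
import math
import random

def bd_mh_strauss(logz, gamma, R, n_steps, rng):
    """Birth-death Metropolis-Hastings sketch for a density proportional
    to exp(sum_i logz(x_i)) * gamma^{s_R(x)} on the unit square.
    Births propose a uniformly placed new point; deaths remove a
    uniformly chosen existing point."""
    pts = []

    def n_close(u, xs):
        # number of points of xs within distance R of u
        return sum(1 for v in xs if math.dist(u, v) <= R)

    for _ in range(n_steps):
        if rng.random() < 0.5:  # birth proposal
            u = (rng.random(), rng.random())
            log_ratio = (logz(u) + n_close(u, pts) * math.log(gamma)
                         - math.log(len(pts) + 1))
            if math.log(rng.random() + 1e-300) < log_ratio:
                pts.append(u)
        elif pts:  # death proposal
            i = rng.randrange(len(pts))
            u = pts[i]
            rest = pts[:i] + pts[i + 1:]
            log_ratio = (math.log(len(pts))
                         - logz(u) - n_close(u, rest) * math.log(gamma))
            if math.log(rng.random() + 1e-300) < log_ratio:
                pts = rest
    return pts
```

With $\gamma = 1$ and constant $\log z$, the chain targets a homogeneous Poisson process, which gives a simple sanity check of the Hastings ratios.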
Furthermore, the parameters $\sigma$ and $s$ of the LGCP-Strauss process affect the clustering in the process. To supply some summary statistics describing this, assume for ease of exposition that $W$ is a square with side length $\ell$. Then, for each integer $k$ in a user-specified finite range, we split $W$ into $k^2$ squares $C_{k,1}, \ldots, C_{k,k^2}$ of side length $\ell/k$ and let $N_{k,i}$ be the number of points falling in $C_{k,i}$. We choose summary statistics which describe how the counts $N_{k,i}$ vary (see (d) below).
Specifically, for a point pattern $x$ (either the observed point pattern or one of the simulated point patterns in Algorithm 1), we chose the following summary statistics.

(a) The number of points $n(x)$.

(b) The minimum of $\hat L(r) - r$, the value of $r$ at which this minimum is attained, and the maximum of $\hat L(r) - r$, where $\hat L$ is a nonparametric estimate of the $L$-function evaluated over a user-specified finite range of $r$ values.

(c) $\hat L(r) - r$ evaluated at equally spaced values of $r$ between 0 and the upper endpoint of the user-specified range.

(d) The empirical variances of the quadrat counts $N_{k,i}$, $i = 1, \ldots, k^2$, for each $k$ in the user-specified range.
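The quadrat-count statistics in (d) are straightforward to compute; a minimal Python sketch (with our own function name, assuming a square window) is:

```python
def quadrat_variances(points, side, ks):
    """For each k in ks, split the side-by-side square into k*k equal
    quadrats and return the empirical variance of the quadrat counts
    (large variances signal aggregation at that scale)."""
    out = []
    for k in ks:
        counts = [0] * (k * k)
        for x, y in points:
            i = min(int(x / side * k), k - 1)
            j = min(int(y / side * k), k - 1)
            counts[i * k + j] += 1
        mean = sum(counts) / len(counts)
        out.append(sum((c - mean) ** 2 for c in counts) / (len(counts) - 1))
    return out
```

A pattern with one point per quadrat gives variance zero, while a pattern with all points in one quadrat gives a large variance, matching the intended interpretation.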
We have chosen these specific forms of the summary statistics based on some numerical experiments, which also guided the ranges of $r$ and $k$ values used in the examples of Sections 3 and 4; these choices determine the dimension of the vector of summary statistics.
3 Simulated examples
3.1 Simulated data and prior specification
The top panels of Figure 1 show examples of simulated realisations of the LGCP-Strauss process on the unit square for three different values of $\gamma$; the remaining parameters are specified in the caption. The bottom panels of Figure 1 show the corresponding estimated $L(r) - r$ functions using Ripley's isotropic edge correction (ripley77). As expected, the point patterns exhibit both regularity and aggregation, with an increasing degree of regularity at small to moderate distances as $\gamma$ decreases, but a similar degree of aggregation at large distances.
The ABC procedure in Algorithm 1 requires specification of (proper) prior distributions for the parameters in order to draw prior samples of the parameters. These samples are then used to simulate LGCP-Strauss processes with the given parameters. The more points a simulated point pattern has, the more computationally expensive the simulation procedure will be (see below). Therefore, we choose the prior distributions in such a way that the simulated point patterns will not yield unreasonably many points compared to the number of points in our observed point patterns. Specifically, when considering the prior distributions of $\mu$ and $\sigma$, recall that $\exp(\mu + \sigma^2/2)$ is an upper bound on the expected number of points in the unit square. So, for the examples in this section, independent uniform prior distributions on bounded intervals are chosen for $\mu$, $\sigma$, $s$, $\gamma$, and $R$, with the intervals for $\mu$ and $\sigma$ guided by this bound.
In order to use the MCMC algorithm when a realisation of the GRF is given (see Section 2.2), it is necessary to choose a burn-in which can be used for all simulations in the ABC procedure. To choose this burn-in, we considered 30 samples of the parameters drawn from the prior distributions, ran the MCMC algorithm for each of these samples, and inspected trace plots of the number of points and of close pairs. Figure 7 shows these plots for three different prior samples for illustration. It seems that the higher the number of points, the slower the convergence. The burn-in should be high enough for the MCMC algorithm to have converged given any prior sample, but increasing the burn-in also increases the computation time. Considering all 30 examples, we settled on an overall burn-in that appeared appropriate. We furthermore choose to initiate the MCMC algorithm either at the empty point pattern or at a realisation of an inhomogeneous Poisson process on $W$ with intensity function $\exp(z)$ (these initial states are extreme because of the coupling result mentioned in Section 1.1). The simulations in the top panels of Figure 1 and all following simulations are iterations of the MCMC algorithm initiated at the empty point pattern.
3.2 Posterior results
We used Algorithm 1 on the three point patterns in Figure 1, with the same choices of the number of prior draws and the fraction of accepted draws as in ABCppAlan. Figure 2 shows kernel density estimates of the resulting (approximate) marginal posterior distributions of the parameters, using a Gaussian kernel and a bandwidth chosen with the method of bwselection. From Figure 2 we see the following. In all cases, the posterior mean and median are close, and in most cases they agree with the true parameter value. As the true value of $\gamma$ increases, the posterior distribution of $\gamma$ becomes more and more left-skewed. The approximate posterior distributions for $\sigma$ and $s$ look rather similar to their uniform prior distributions. When the true value of $\gamma$ is 0, the posterior distribution of $\gamma$ is very concentrated near 0, but it becomes more and more symmetric around the true value of $\gamma$ as this increases. The spread of the posterior distribution of $R$ seems to increase as the true value of $\gamma$ increases. Overall, the ABC procedure seems to be most successful for estimating $\gamma$ and $R$. Note that when fitting a Strauss process to a point pattern, ABCppAlan first estimated the interaction radius by maximum pseudo-likelihood and then used this value in their ABC procedure; in contrast, we found no need to fix $R$ when fitting an LGCP-Strauss process with our ABC procedure.

In geostatistics, it is known that the scale and variance parameters of an exponential covariance function for a GRF are unidentifiable (see e.g. covar_unidentifiable). This might explain why the ABC procedure is not so successful when it comes to identifying these parameters. Therefore, we repeated the analysis in Figure 2 with one of these parameters given. However, the posterior marginal distributions of the remaining parameters (not shown) looked very similar to those in Figure 2.
We now investigate how the ABC procedure for fitting an LGCP-Strauss process works when the data are generated from some of the special cases of this process. For this purpose, we simulated a realisation of an LGCP (specified by $\mu$, $\sigma$ and $s$) and a realisation of a Strauss process (specified by $\beta$, $\gamma$ and $R$). Notice that when simulating under an LGCP, there is no need to employ the MCMC algorithm described at the beginning of Section 2.2; we used the faster method implemented in the function rLGCP from the package spatstat (spatstat). We used the same ABC procedure as above for fitting an LGCP-Strauss process to these point patterns, and the posterior results can be seen in Figure 3. For the point pattern generated from an LGCP, the true value of $\mu$ seems to be identified well when fitting the LGCP-Strauss process. The posterior marginal distribution of $\gamma$ is rather concentrated near 1, and a plot of the posterior samples of $\gamma$ and $R$ (not shown) shows that very small values of $\gamma$ appear only together with very small values of $R$. This indicates that the fitted LGCP-Strauss process is close to the special case of an LGCP, which is the true model. Again, it seems to be difficult to identify $\sigma$ and $s$.
For the point pattern generated from a Strauss process, the marginal posterior distribution of $\sigma$ is very concentrated near zero, which is the true value. The true value of $R$ seems to be well identified, whereas the medians and means of some of the remaining marginal posterior distributions are somewhat higher than the true values. It appears that for this example, the ABC procedure is not as successful for all parameters as it was for the point patterns in Figure 1. For the Strauss process, the scale parameter $s$ should be irrelevant, and a plot of the posterior samples of $\sigma$ and $s$ (not shown) confirms that $s$ is irrelevant for small values of $\sigma$.
3.3 Model checking and comparison
We are interested in whether the point patterns in Figure 1 can be distinguished from realisations of an LGCP and a Strauss process, so for comparison we also fitted an LGCP and a Strauss process to each point pattern, using the ABC procedure in Algorithm 1. We used the same summary statistics as for the LGCP-Strauss process and the same prior distributions on the relevant parameters (that is, $\mu$, $\sigma$ and $s$ when fitting the LGCP, and $\beta$, $\gamma$ and $R$ when fitting the Strauss process). Again, when simulating under an LGCP, we used the faster method implemented in spatstat.
For model checking and comparison we suggest making global envelope tests based on posterior predictions as follows. For each ABC realisation of $\theta$, a realisation $x$ of the process in question given $\theta$ is simulated. For each $x$, a functional summary statistic is estimated. These empirical curves are then used to construct global envelopes and corresponding tests based on extreme rank lengths (GET2017; GET2018; note that we only used 1 000 simulations instead of the recommended 2 499, because the ABC procedure is rather time consuming). The R package GET (GET2017) was used for this purpose.
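As a rough illustration of the idea behind global rank envelopes (the GET package uses the more refined extreme rank length measure; this simplified Python sketch ranks curves only by their most extreme pointwise rank):

```python
import numpy as np

def global_rank_envelope(curves, alpha):
    """Simplified global rank envelope: rank each of the s curves by its
    most extreme pointwise rank across the argument grid, keep the
    (1 - alpha) fraction of most central curves, and return their
    pointwise minimum and maximum. (The GET package refines the ordering
    of tied curves via extreme rank lengths.)"""
    s, _ = curves.shape
    low_rank = curves.argsort(axis=0).argsort(axis=0)  # 0 = smallest value
    depth = np.minimum(low_rank, s - 1 - low_rank).min(axis=1)
    order = np.argsort(-depth)  # most central curves first
    keep = curves[order[: int(np.ceil((1 - alpha) * s))]]
    return keep.min(axis=0), keep.max(axis=0)
```

The observed curve is then rejected at level alpha if it leaves the envelope anywhere on the argument grid, which is what makes the test global rather than pointwise.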
In order to compare the fitted LGCP-Strauss, LGCP, and Strauss process models, we used global envelopes based on posterior predictions and the empirical $L$- and $J$-functions, with $J(r) = (1 - G(r))/(1 - F(r))$, where $F$ is the empty space function and $G$ is the nearest-neighbour distribution function (see Jfunction). We also tried to use the $F$- and $G$-functions for model validation, but these functional summary statistics were unable to distinguish between the models.

Figure 4 shows combined global envelopes for the $L$- and $J$-functions, meaning that, under the LGCP-Strauss process, the probability that both empirical curves are within their respective envelopes is approximately the nominal level. To combine the envelopes we have used the two-step combining procedure described in GETinR. Note that the $J$-function can only be estimated reliably for all simulations on a relatively small interval of $r$ values, whereas the $L$-function can be estimated reliably on a larger interval.
In all cases, the $p$-values of the global envelope tests are highest for the LGCP-Strauss process, which may indicate that it provides the best fit to the data. The LGCP is rejected in the two cases with the smallest true values of $\gamma$ because the empirical $J$-functions in these cases are above the global envelopes at small interpoint distances. This indicates that the point patterns are more regular at small interpoint distances than would be expected under the fitted LGCPs. In the case with the weakest inhibition, the LGCP cannot be rejected. Notice that the $p$-values of these tests increase with the true value of $\gamma$, which is in agreement with the fact that the LGCP-Strauss process then approaches the special case of an LGCP.

The Strauss process model is rejected in all three cases because the empirical $L$-function clearly shows that the point patterns are more clustered at moderate to large interpoint distances than what can be modelled with a Strauss process. In one case the empirical $J$-function also shows this, but for the remaining two cases the $J$-function is contained completely within the envelopes.

Overall, it appears that the $J$-function is best at criticising the LGCP and the $L$-function is best at criticising the Strauss process. The latter may have something to do with the fact that the $J$-function can only be estimated on a relatively small interval, so it is less likely to capture the aggregation, which happens at a larger scale, than the $L$-function, which can be estimated on a bigger interval. When we use the $L$-function for model validation, we keep in mind that it was also used in the ABC procedure, which might lead us to conclude that the model fits the data better than it actually does.
In order to investigate how this model validation and comparison works when the LGCP-Strauss process is overfitting, we also fitted an LGCP to the first point pattern in Figure 3 and a Strauss process model to the second point pattern in Figure 3, and compared them to the fitted LGCP-Strauss process models (the global envelopes are not shown). For the realisation of an LGCP, the $p$-value of the combined global envelope test was higher for the fitted LGCP-Strauss process than for the fitted LGCP. Since the data is generated from an LGCP, both models should fit the data equally well, so the higher $p$-value for the LGCP-Strauss process is probably a result of the fact that it is overfitting. For the realisation of a Strauss process model, the $p$-values of the combined global envelope tests for the fitted LGCP-Strauss and Strauss process did not differ in a way that reveals the fact that the LGCP-Strauss process is overfitting.
4 Data example
The first panel in Figure 5 shows the locations of 256 oak trees which suffer from frost shake (frost shake refers to cracks in the trunk of the tree) in a rectangular region of Allogny in France. This data set is part of the Allogny data set from the R package ads (ads). The oak data were analysed in parmodelAllogny using a parametric point process model with regularity at small scale and aggregation at large scale. The LGCP-Strauss model should also be able to capture this behaviour.
We used Algorithm 1 on this oak data set. Here, independent uniform prior distributions on bounded intervals are again chosen for $\mu$, $\sigma$, $s$, $\gamma$, and $R$. Notice that the observation window for the oak data is much larger than the ones in Section 3.1, and the prior distributions are chosen to take this into account. Furthermore, when calculating the summary statistics for the ABC procedure, the quadrats $C_{k,i}$ are now rectangular sets of the same size (see Section 2.2). Trace plots as those in Figure 7 (supplied in an appendix) suggested a sufficient burn-in for the MCMC algorithm in this example. Again, a pilot sample of simulations was used, and the resulting ABC posterior sample consists of the accepted draws from the approximate posterior distribution.
The marginal posterior distributions, which are estimated from the ABC sample, can be seen in Figure 5. They are all clearly different from their uniform priors. Some of the posterior distributions look approximately normal, whilst the others are right-skewed. Note that the posterior distribution of $\gamma$ indicates strong repulsion between the points. The posterior distribution of $\sigma$, particularly its heavy tail, suggests some aggregation among the frost-shaken oaks.
The first plot in Figure 6 shows 95% combined global envelopes for the fitted LGCP-Strauss process as described in Section 3. The overall behaviour of the observed point pattern seems to be captured well by the LGCP-Strauss process, and the $p$-value is very high. For comparison, the remaining plots in Figure 6 show the corresponding envelopes for a fitted LGCP, a fitted Strauss process model, and the model fitted in parmodelAllogny, respectively. The LGCP and Strauss process models are fitted with the ABC procedure in Algorithm 1, and the envelopes are based on posterior predictions. The model in parmodelAllogny was not fitted in a Bayesian setup, so simulations under this model are not posterior predictions; for simulating from this model, we used the technique suggested in parmodelAllogny. The combined global envelopes indicate that the LGCP model provides a poor fit to the data. The tests conclude that both the Strauss process model and the model fitted in parmodelAllogny fit well. However, their $p$-values are lower than the corresponding $p$-value for the LGCP-Strauss process, indicating that the latter may provide a better fit.
5 Summary and future work
We have proposed a novel spatial point process model which enables capturing of regularity through pairwise interactions and aggregation through a Gaussian process realization. This doubly stochastic spatial point process generalizes both the customary log Gaussian Cox process and the customary Gibbs process. Because the likelihood is intractable for this model we have developed model fitting through an ABC method. We have provided both simulation investigation and a real data application in order to reveal the behaviour of process realizations and also our ability to fit the model and do full inference for given point pattern realizations.
Future work can consider marked point patterns or so-called multi-type versions of our model (see e.g. textbook). Such multi-type modelling may allow attraction or inhibition within types but also introduce attraction or inhibition between types. A different direction would be to consider space-time versions, where a realization of the process is seen as a spatial point pattern obtained by integrating over a window of time.
Acknowledgements
The research of the first two authors was supported by The Danish Council for Independent Research — Natural Sciences, grant DFF-7014-00074 'Statistics for point processes in space and beyond'. The second author was also supported by the 'Centre for Stochastic Geometry and Advanced Bioimaging', funded by grant 8721 from the Villum Foundation.
References
Appendix A Trace plots for assessing the burn-in for the simulation algorithm
Figure 7 shows trace plots of the number of points and of close pairs for the MCMC algorithm when simulating an LGCP-Strauss process for different draws of the parameter vector $\theta$ from its prior distribution, which is described in Section 3. For each prior sample of $\theta$, a realisation $z$ of the GRF was simulated, and the MCMC algorithm was used to simulate the LGCP-Strauss process given $Z = z$. This analysis was used to choose an appropriate burn-in in Section 3.