# Infill asymptotics and bandwidth selection for kernel estimators of spatial intensity functions

We investigate the asymptotic mean squared error of kernel estimators of the intensity function of a spatial point process. We show that when $n$ independent copies of a point process in $\mathbb{R}^d$ are superposed, the optimal bandwidth $h_n$ is of the order $n^{-1/(d+4)}$ under appropriate smoothness conditions on the kernel and true intensity function. We apply the Abramson principle to define adaptive kernel estimators and show that asymptotically the optimal adaptive bandwidth is of the order $n^{-1/(d+8)}$ under appropriate smoothness conditions.

## 1 Introduction

Often the first step in the analysis of a spatial point pattern is to estimate its intensity function. Various non-parametric estimators are available to do so. Some techniques are based on local neighbourhoods of a point, expressed for example by its nearest neighbours [7], its Voronoi [11] or Delaunay tessellation [13, 14]. By far the most popular technique, however, is kernel smoothing [6]. Specifically, let $\Phi$ be a point process that is observed in a bounded open subset $W$ of $\mathbb{R}^d$ and assume that its first order moment measure exists as a $\sigma$-finite Borel measure and is absolutely continuous with respect to Lebesgue measure with a Radon–Nikodym derivative $\lambda$ known as its intensity function. A kernel estimator of $\lambda$ based on $\Phi \cap W$ takes the form

$$\hat\lambda(x_0; h) = \hat\lambda(x_0; h, \Phi, W) = \frac{1}{h^d} \sum_{y \in \Phi \cap W} \kappa\left( \frac{x_0 - y}{h} \right), \quad x_0 \in W. \tag{1}$$

The function $\kappa$ is supposed to be a kernel, that is, a $d$-dimensional symmetric probability density function [15, p. 13]. The choice of bandwidth $h > 0$ determines the amount of smoothing. In principle, the support of the scaled kernel $\kappa((x_0 - \cdot)/h)$ could overlap the complement of $W$. Therefore, various edge corrections have been proposed [2, 9]. In the sequel, though, we will be concerned with very small bandwidths, so this aspect may be ignored.
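As a minimal sketch of how estimator (1) can be evaluated, the snippet below sums a rescaled kernel over the observed points. The function names, the choice of a box kernel on the unit disc and the toy pattern are illustrative assumptions, not part of the paper, and no edge correction is applied.

```python
import math

def box_kernel(z):
    """Uniform density on the closed unit disc in R^2 (illustrative kernel choice)."""
    return 1.0 / math.pi if z[0] ** 2 + z[1] ** 2 <= 1.0 else 0.0

def kernel_intensity(x0, points, h, kernel=box_kernel):
    """Evaluate (1): h^{-d} times the sum of kernel((x0 - y) / h) over observed points y."""
    d = len(x0)
    total = 0.0
    for y in points:
        z = [(a - b) / h for a, b in zip(x0, y)]
        total += kernel(z)
    return total / h ** d

# Hypothetical toy pattern in the unit square; three points lie within 0.1 of x0.
pattern = [(0.50, 0.50), (0.52, 0.47), (0.90, 0.10), (0.45, 0.55)]
estimate = kernel_intensity((0.5, 0.5), pattern, h=0.1)
print(estimate)
```

With the box kernel the estimate is simply the number of points within distance $h$ of $x_0$ divided by the area of the disc of radius $h$.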

The aim of this paper is to derive asymptotic expansions for the bias and variance of (1) in terms of the bandwidth. This problem is well understood for probability density functions; there exists a vast literature, for example the textbooks [3, 15, 16] and the references therein. In a spatial context, bandwidth selection is dominated by ad hoc [2] and non-parametric methods [5]. To the best of our knowledge, the first rigorous study of bandwidth selection is that by Lo [10], who studies infill asymptotics for spatial patterns consisting of independent and identically distributed points. Our goal is to extend his approach to point processes that may exhibit interactions between their points and to investigate adaptive versions thereof.

The plan of this paper is as follows. In Section 2 we focus on the regime in which $n$ independent copies of the same point process are superposed and the bandwidth $h_n$ tends to zero as $n$ tends to infinity but does not depend on the points of the pattern. We derive Taylor expansions and deduce the asymptotically optimal bandwidth. Intuitively, however, one feels that in sparse regions more smoothing is necessary than in regions that are rich in points. Indeed, in the context of estimating a probability density function, Abramson [1] proposed to scale the bandwidth in inverse proportion to the square root of the density. Analogously, in Section 3 we let the bandwidth be inversely proportional to the square root of the intensity function and show that by doing so the bias can be reduced. For the sake of readability, all proofs are deferred to Section 4.

## 2 Infill asymptotics

Let $\Phi_1, \Phi_2, \dots$ be independent and identically distributed point processes for which the first order moment measure exists, is locally finite and admits an intensity function $\lambda$. For $n \in \mathbb{N}$, let

$$Y_n = \bigcup_{i=1}^n \Phi_i$$

denote the union. Upon taking the limit for $n \to \infty$, one obtains an asymptotic regime known as ‘infill asymptotics’ [12]. Since the $\Phi_i$ are independent, the intensity function of $Y_n$ is $n \lambda$. Therefore $\lambda(x_0)$, $x_0 \in W$, may be estimated by

$$\hat\lambda(x_0) := \frac{\hat\lambda(x_0; h, Y_n, W)}{n} = \frac{1}{n} \sum_{i=1}^n \hat\lambda(x_0; h, \Phi_i, W). \tag{2}$$

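The superposition estimator (2) can be illustrated with a small simulation. The sketch below assumes homogeneous Poisson processes on the unit square and a box kernel; the helper names, the intensity value 50 and the random seed are hypothetical choices made only for illustration.

```python
import math
import random

def sample_poisson_pattern(rate, rng):
    """Homogeneous Poisson process on the unit square with intensity `rate`.

    The number of points is drawn by counting rate-`rate` exponential
    arrivals in [0, 1), which gives a Poisson(rate) count.
    """
    count, t = 0, rng.expovariate(rate)
    while t < 1.0:
        count += 1
        t += rng.expovariate(rate)
    return [(rng.random(), rng.random()) for _ in range(count)]

def box_estimate(x0, points, h):
    """Estimator (1) in R^2 with the uniform kernel on the unit disc."""
    inside = sum(1 for y in points
                 if (x0[0] - y[0]) ** 2 + (x0[1] - y[1]) ** 2 <= h * h)
    return inside / (math.pi * h * h)

def superposed_estimate(x0, patterns, h):
    """Estimator (2): the average of the per-pattern estimates."""
    return sum(box_estimate(x0, p, h) for p in patterns) / len(patterns)

rng = random.Random(1)
patterns = [sample_poisson_pattern(50.0, rng) for _ in range(400)]
x0, h = (0.5, 0.5), 0.1
average = superposed_estimate(x0, patterns, h)

# Equivalent form: estimate on the union Y_n, divided by n.
union = [y for p in patterns for y in p]
from_union = box_estimate(x0, union, h) / len(patterns)
print(average, from_union)
```

Both forms of (2) coincide up to floating point error, and for a constant intensity the estimate should fluctuate around the true value.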
###### Lemma 1

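Let $\Phi$ be a point process observed in a bounded open subset $W \subset \mathbb{R}^d$ whose factorial moment measures exist up to second order and are absolutely continuous with intensity function $\lambda$ and second order product densities $\rho^{(2)}$. Let $\kappa$ be a kernel. Then the first two moments of (1) are

$$E\left[ \hat\lambda(x_0; h, \Phi, W) \right] = \frac{1}{h^d} \int_W \kappa\left( \frac{x_0 - u}{h} \right) \lambda(u) \, du$$

and

$$E\left[ \left( \hat\lambda(x_0; h, \Phi, W) \right)^2 \right] = \frac{1}{h^{2d}} \int_W \int_W \kappa\left( \frac{x_0 - u}{h} \right) \kappa\left( \frac{x_0 - v}{h} \right) \rho^{(2)}(u, v) \, du \, dv + \frac{1}{h^{2d}} \int_W \kappa\left( \frac{x_0 - u}{h} \right)^2 \lambda(u) \, du.$$

The proof follows directly from the definition of product densities, see for example [4, Section 4.3.3]. Provided $\lambda$ is strictly positive, the variance of $\hat\lambda(x_0; h, \Phi, W)$ can be expressed in terms of the pair correlation function $g$ defined by $g(u, v) = \rho^{(2)}(u, v) / (\lambda(u) \lambda(v))$ as

$$\frac{1}{h^{2d}} \left[ \int_{W \times W} \kappa\left( \frac{x_0 - u}{h} \right) \kappa\left( \frac{x_0 - v}{h} \right) (g(u, v) - 1) \lambda(u) \lambda(v) \, du \, dv + \int_W \kappa\left( \frac{x_0 - u}{h} \right)^2 \lambda(u) \, du \right].$$

For Poisson processes, the first integral vanishes as $g \equiv 1$.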

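In this paper, we will restrict ourselves to kernels that belong to the Beta class

$$\kappa^\gamma(x) = \frac{(1 - x^T x)^\gamma}{c(d, \gamma)} \, 1\{ x \in b(0, 1) \}, \quad x \in \mathbb{R}^d, \tag{3}$$

for $\gamma \geq 0$. Here $b(0, 1)$ is the closed unit ball in $\mathbb{R}^d$ centred at the origin. The normalising constant will be abbreviated by

$$c(d, \gamma) = \int_{b(0,1)} (1 - x^T x)^\gamma \, dx = \frac{\pi^{d/2} \, \Gamma(\gamma + 1)}{\Gamma(d/2 + \gamma + 1)}, \quad d \in \mathbb{N}, \ \gamma \geq 0. \tag{4}$$

Note that Beta kernels are supported on the compact unit ball and that their smoothness is governed by the parameter $\gamma$. Indeed, the box kernel defined by $\gamma = 0$ is constant and therefore continuous on the interior of the unit ball; the Epanechnikov kernel corresponding to the choice $\gamma = 1$ is Lipschitz continuous. For $\gamma > k$ the function $\kappa^\gamma$ is $k$ times continuously differentiable on $\mathbb{R}^d$.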
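To make the Beta class concrete, here is a short sketch that implements the normalising constant (4) and a kernel of the form $(1 - x^T x)^\gamma / c(d, \gamma)$ on the unit ball, which is the form suggested by (4). The function names are hypothetical, and the one-dimensional integration check uses a plain midpoint rule.

```python
import math

def beta_norm_const(d, gamma):
    """Normalising constant c(d, gamma) from equation (4)."""
    return math.pi ** (d / 2) * math.gamma(gamma + 1) / math.gamma(d / 2 + gamma + 1)

def beta_kernel(x, gamma):
    """Beta kernel on the closed unit ball, assumed to be (1 - x'x)^gamma / c(d, gamma)."""
    d = len(x)
    r2 = sum(c * c for c in x)
    if r2 > 1.0:
        return 0.0
    return (1.0 - r2) ** gamma / beta_norm_const(d, gamma)

def integrate_1d(gamma, m=100000):
    """Midpoint-rule integral of the one-dimensional Beta kernel over [-1, 1]."""
    step = 2.0 / m
    return sum(beta_kernel((-1.0 + (i + 0.5) * step,), gamma) for i in range(m)) * step

total_box = integrate_1d(0.0)           # box kernel
total_epanechnikov = integrate_1d(1.0)  # Epanechnikov kernel
print(total_box, total_epanechnikov, beta_norm_const(1, 1.0))
```

For $d = 1$ and $\gamma = 1$ the constant equals $4/3$, which recovers the familiar Epanechnikov density $\tfrac{3}{4}(1 - x^2)$.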

The following Lemma collects further basic properties of the Beta kernels. The proof can be found in Section 4.1.

###### Lemma 2

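For the Beta kernels $\kappa^\gamma$, $\gamma \geq 0$, defined in equation (3), the integrals

$$\int_{\mathbb{R}} x_i \, \kappa^\gamma(x) \, dx_i = 0 = \int_{b(0,1)} x_i x_j \, \kappa^\gamma(x) \, dx_1 \cdots dx_d$$

vanish for all $i, j \in \{1, \dots, d\}$ such that $i \neq j$. Furthermore

$$Q(d, \gamma) := \int_{\mathbb{R}^d} \kappa^\gamma(x)^2 \, dx = \frac{c(d, 2\gamma)}{c(d, \gamma)^2}$$

is finite and so are, for all $i \in \{1, \dots, d\}$,

$$V(d, \gamma) := \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_i^2 \, \kappa^\gamma(x) \, dx_1 \cdots dx_d = \frac{1}{d + 2\gamma + 2},$$

$$V_4(d, \gamma) := \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_i^4 \, \kappa^\gamma(x) \, dx_1 \cdots dx_d = \frac{3}{(d + 2\gamma + 2)(d + 2\gamma + 4)}$$

as well as, for $i \neq j$ and $d \geq 2$,

$$V_2(d, \gamma) := \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_i^2 x_j^2 \, \kappa^\gamma(x) \, dx_1 \cdots dx_d = \frac{1}{(d + 2\gamma + 2)(d + 2\gamma + 4)}.$$

Their values do not depend on the particular choices of $i$ and $j$.

For the important special case $d = 2$,

$$Q(2, \gamma) = \frac{(\gamma + 1)^2}{(2\gamma + 1) \pi}.$$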
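The closed forms in Lemma 2 are easy to sanity-check numerically. The sketch below does so in two dimensions for the Epanechnikov case $\gamma = 1$, using a midpoint rule on a regular grid; the grid size and function names are arbitrary illustrative choices, and the kernel is assumed to have the form $(1 - x^T x)^\gamma / c(2, \gamma)$ on the unit disc with $c(2, \gamma) = \pi / (\gamma + 1)$.

```python
import math

def beta_kernel_2d(x, y, gamma):
    """Two-dimensional Beta kernel, using c(2, gamma) = pi / (gamma + 1)."""
    r2 = x * x + y * y
    if r2 > 1.0:
        return 0.0
    return (1.0 - r2) ** gamma * (gamma + 1.0) / math.pi

def kernel_moments_2d(gamma, m=400):
    """Midpoint-rule approximations of V, V4, V2 and Q over [-1, 1]^2."""
    step = 2.0 / m
    v = v4 = v2 = q = 0.0
    for i in range(m):
        x = -1.0 + (i + 0.5) * step
        for j in range(m):
            y = -1.0 + (j + 0.5) * step
            k = beta_kernel_2d(x, y, gamma)
            if k == 0.0:
                continue
            v += x * x * k
            v4 += x ** 4 * k
            v2 += x * x * y * y * k
            q += k * k
    area = step * step
    return v * area, v4 * area, v2 * area, q * area

d, gamma = 2, 1.0
v, v4, v2, q = kernel_moments_2d(gamma)
v_exact = 1.0 / (d + 2 * gamma + 2)
v4_exact = 3.0 / ((d + 2 * gamma + 2) * (d + 2 * gamma + 4))
v2_exact = 1.0 / ((d + 2 * gamma + 2) * (d + 2 * gamma + 4))
q_exact = (gamma + 1) ** 2 / ((2 * gamma + 1) * math.pi)
print(v, v_exact, v4, v4_exact, v2, v2_exact, q, q_exact)
```

The numerical values agree with the stated formulas up to the discretisation error of the grid.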

Lemma 1 can be used to derive the mean squared error of (2). Its proof can be found in Section 4.2.

###### Proposition 1

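Let $\Phi_1, \Phi_2, \dots$ be independent and identically distributed point processes observed in a bounded open subset $W \subset \mathbb{R}^d$. Assume that their factorial moment measures exist up to second order and are absolutely continuous with strictly positive intensity function $\lambda$ and second order product densities $\rho^{(2)}$. Write $Y_n$ for the union, $n \in \mathbb{N}$, and let $\kappa^\gamma$ be a Beta kernel (3) with $\gamma \geq 0$. Then the mean squared error of (2) is given by

$$\begin{aligned} \mathrm{mse} \, \hat\lambda(x_0) = {} & \left( \frac{1}{h^d} \int_{b(x_0, h) \cap W} \kappa^\gamma\left( \frac{x_0 - u}{h} \right) \lambda(u) \, du - \lambda(x_0) \right)^2 \\ & + \frac{1}{n h^{2d}} \int \int_{(b(x_0, h) \cap W)^2} \kappa^\gamma\left( \frac{x_0 - u}{h} \right) \kappa^\gamma\left( \frac{x_0 - v}{h} \right) (g(u, v) - 1) \lambda(u) \lambda(v) \, du \, dv \\ & + \frac{1}{n h^{2d}} \int_{b(x_0, h) \cap W} \kappa^\gamma\left( \frac{x_0 - u}{h} \right)^2 \lambda(u) \, du. \end{aligned}$$

The first term in the above expression is the squared bias. It depends on $\lambda$ and the bandwidth $h$ but not on $g$. The remaining terms come from the variance and depend on $\lambda$, on $g$, on $h$ and on $n$.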

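Our aim in the remainder of this section is to derive an asymptotic expansion of the mean squared error for bandwidths $h_n$ that depend on $n$ in such a way that $h_n \to 0$ as $n \to \infty$. In order to achieve this, first recall some basic facts from analysis. Let $E$ be an open subset of $\mathbb{R}^n$ and denote by $C^k(E)$ the class of functions $f: E \to \mathbb{R}$ for which all partial derivatives up to order $k$ exist and are continuous on $E$. For such functions the order of taking partial derivatives may be interchanged and the Taylor theorem states that if $x \in E$ and $x + t h \in E$ for all $0 \leq t \leq 1$, then a $\theta \in (0, 1)$ can be found such that

$$f(x + h) - f(x) = \sum_{r=1}^{k-1} \frac{1}{r!} D^r f(x)(h^{(r)}) + \frac{1}{k!} D^k f(x + \theta h)(h^{(k)}), \tag{5}$$

where $h^{(r)}$ is the $r$-tuple $(h, \dots, h)$ and

$$D^r f(x)(h^{(r)}) := \sum_{j_1, \dots, j_r = 1}^n h_{j_1} \cdots h_{j_r} D_{j_1 \cdots j_r} f(x)$$

for $r = 1, \dots, k$.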

We are now ready to state the main result of this section, generalising [10, Theorem 2] for the union of independent random points. The proof can be found in Section 4.2.

###### Theorem 1

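Let $\Phi_1, \Phi_2, \dots$ be i.i.d. point processes observed in a bounded open subset $W \subset \mathbb{R}^d$ with well-defined intensity function $\lambda$ and pair correlation function $g$. Suppose that $g$ is bounded and that $\lambda$ is twice continuously differentiable with second order partial derivatives $\lambda_{ij} = D_{ij} \lambda$, $i, j = 1, \dots, d$, that are Hölder continuous with index $\alpha$ on $W$, that is, there exists some $C > 0$ such that for all $i, j = 1, \dots, d$:

$$|\lambda_{ij}(x) - \lambda_{ij}(y)| \leq C \, \|x - y\|^\alpha, \quad x, y \in W.$$

Consider the estimator $\hat\lambda(x_0)$ based on the unions $Y_n$, $n \in \mathbb{N}$, and Beta kernel $\kappa^\gamma$, $\gamma \geq 0$, with bandwidth $h_n$ chosen in such a way that, as $n \to \infty$, $h_n \to 0$ and $n h_n^d \to \infty$. Then, for $x_0 \in W$, as $n \to \infty$,

1. $\mathrm{bias} \, \hat\lambda(x_0) = \dfrac{h_n^2 \, V(d, \gamma)}{2} \displaystyle\sum_{i=1}^d \lambda_{ii}(x_0) + O(h_n^{2 + \alpha})$.

2. $\mathrm{Var} \, \hat\lambda(x_0) = \dfrac{\lambda(x_0) \, Q(d, \gamma)}{n h_n^d} + O\left( \dfrac{1}{n h_n^{d-1}} \right)$.

The bias depends on the second order partial derivatives of the unknown intensity function and on the smoothness parameter $\alpha$. The smoothness of the kernel, measured by $\gamma$, also plays a role. The leading term of the variance depends on $\lambda(x_0)$ and on the smoothness of the kernel.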
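The leading terms can be compared with the exact moments of Lemma 1 in a simple case. The sketch below assumes a Poisson process (so $g \equiv 1$) in one dimension with the illustrative intensity $\lambda(u) = 100 e^u$, the Epanechnikov kernel ($\gamma = 1$) and a window large enough that edge effects play no role. It evaluates the exact bias and variance by a midpoint rule and compares them with $h^2 V(1, \gamma) \lambda''(x_0) / 2$ and $\lambda(x_0) Q(1, \gamma) / (n h)$. All names and numerical values are hypothetical.

```python
import math

def c1(gamma):
    """Normalising constant c(1, gamma) from equation (4)."""
    return math.sqrt(math.pi) * math.gamma(gamma + 1.0) / math.gamma(gamma + 1.5)

def kappa(z, gamma):
    """One-dimensional Beta kernel."""
    return (1.0 - z * z) ** gamma / c1(gamma) if abs(z) <= 1.0 else 0.0

def poisson_bias_and_variance(lam, x0, h, n, gamma, m=20000):
    """Exact bias and variance of (2) for Poisson processes, via midpoint quadrature."""
    step = 2.0 * h / m
    first = second = 0.0
    for i in range(m):
        u = x0 - h + (i + 0.5) * step
        k = kappa((x0 - u) / h, gamma)
        first += k * lam(u)
        second += k * k * lam(u)
    mean = first * step / h
    variance = second * step / (n * h * h)
    return mean - lam(x0), variance

gamma, x0, h, n = 1.0, 0.5, 0.05, 1000

def lam(u):
    """Illustrative intensity; its second derivative equals lam itself."""
    return 100.0 * math.exp(u)

bias, variance = poisson_bias_and_variance(lam, x0, h, n, gamma)
lead_bias = h ** 2 * lam(x0) / (2.0 * (2.0 * gamma + 3.0))   # V(1, gamma) = 1 / (2 gamma + 3)
lead_var = lam(x0) * c1(2.0 * gamma) / c1(gamma) ** 2 / (n * h)
print(bias / lead_bias, variance / lead_var)
```

For a bandwidth this small both ratios are close to one, in line with the expansions.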

Theorem 1 readily yields the asymptotically optimal bandwidth, cf. Section 4.2.

###### Corollary 1

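Consider the setting of Theorem 1. Then

$$\mathrm{mse} \, \hat\lambda(x_0) = \frac{h_n^4 \, V(d, \gamma)^2}{4} \left( \sum_{i=1}^d \lambda_{ii}(x_0) \right)^2 + \frac{\lambda(x_0) \, Q(d, \gamma)}{n h_n^d} + O(h_n^{4 + \alpha}) + O\left( \frac{1}{n h_n^{d-1}} \right).$$

The asymptotic mean squared error is optimised at

$$h_n^*(x_0) = \frac{1}{n^{1/(d+4)}} \left( \frac{d \, \lambda(x_0) \, Q(d, \gamma)}{V(d, \gamma)^2 \left( \sum_{i=1}^d \lambda_{ii}(x_0) \right)^2} \right)^{1/(d+4)}.$$

In words, $h_n^*$ is of the order $n^{-1/(d+4)}$. Clearly $h_n^*$ tends to zero as $n \to \infty$. Moreover, $n (h_n^*)^d$ is of the order $n$ to the power $4/(d+4)$ and therefore tends to infinity with $n$. For the special case $d = 2$,

$$h_n^*(x_0) = \frac{1}{n^{1/6}} \left( \frac{8 \, \lambda(x_0) (\gamma + 1)^2 (\gamma + 2)^2}{(2\gamma + 1) \, \pi \, (\lambda_{11}(x_0) + \lambda_{22}(x_0))^2} \right)^{1/6}.$$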
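The optimal bandwidth of Corollary 1 is straightforward to evaluate once $\lambda(x_0)$ and the sum of the second order partial derivatives are known or estimated. The following sketch codes the general formula and the two leading terms of the mean squared error; the function names and the numerical inputs are hypothetical placeholders rather than values from the paper.

```python
import math

def beta_norm_const(d, gamma):
    """Normalising constant c(d, gamma) from equation (4)."""
    return math.pi ** (d / 2) * math.gamma(gamma + 1) / math.gamma(d / 2 + gamma + 1)

def optimal_bandwidth(n, d, gamma, lam0, laplacian0):
    """h_n^*(x0) from Corollary 1; laplacian0 is the sum of lambda_ii(x0)."""
    q = beta_norm_const(d, 2 * gamma) / beta_norm_const(d, gamma) ** 2
    v = 1.0 / (d + 2 * gamma + 2)
    return (d * lam0 * q / (n * v ** 2 * laplacian0 ** 2)) ** (1.0 / (d + 4))

def asymptotic_mse(h, n, d, gamma, lam0, laplacian0):
    """Leading squared bias plus leading variance term of Corollary 1."""
    q = beta_norm_const(d, 2 * gamma) / beta_norm_const(d, gamma) ** 2
    v = 1.0 / (d + 2 * gamma + 2)
    return h ** 4 * v ** 2 * laplacian0 ** 2 / 4.0 + lam0 * q / (n * h ** d)

# Hypothetical inputs: n copies, planar data, Epanechnikov kernel.
n, d, gamma, lam0, laplacian0 = 500, 2, 1.0, 100.0, 40.0
h_star = optimal_bandwidth(n, d, gamma, lam0, laplacian0)

# Closed form for d = 2 stated in Corollary 1.
h_star_2d = (8 * lam0 * (gamma + 1) ** 2 * (gamma + 2) ** 2
             / ((2 * gamma + 1) * math.pi * laplacian0 ** 2 * n)) ** (1.0 / 6.0)
print(h_star, h_star_2d)
```

In practice the unknown quantities would have to be replaced by pilot estimates, so the resulting bandwidth is only as reliable as those plug-in values.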

The following Proposition generalises [10, Proposition 5]. Its proof can be found in Section 4.2.

###### Proposition 2

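Let $\Phi_1, \Phi_2, \dots$ be i.i.d. point processes observed in a bounded open subset $W \subset \mathbb{R}^d$ with well-defined intensity function $\lambda$ and pair correlation function $g$. Suppose that $g$ is bounded and that $\lambda$ is twice continuously differentiable with second order partial derivatives $\lambda_{ij}$, $i, j = 1, \dots, d$, that are Hölder continuous with index $\alpha$ on $W$. Consider $\hat\lambda(x_0)$ based on the unions $Y_n$, $n \in \mathbb{N}$, and Beta kernel $\kappa^\gamma$, $\gamma \geq 0$, with bandwidth $h_n$ chosen in such a way that as $n \to \infty$, $h_n \to 0$ and $n h_n^d \to \infty$. Then, for $x_0 \in W$, as $n \to \infty$,

$$\hat\lambda(x_0) = \lambda(x_0) + h_n^2 \, \frac{\sum_{i=1}^d \lambda_{ii}(x_0)}{2 (d + 2\gamma + 2)} + O(h_n^{2 + \alpha}) + \sqrt{\lambda(x_0) \, Q(d, \gamma)} \; O_P\left( n^{-1/2} h_n^{-d/2} \right).$$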

## 3 Adaptive infill asymptotics

Up to now, estimators based on (1) were considered in which the same bandwidth $h$ was applied at every point. However, at least intuitively, it seems clear that the bandwidth should be smaller in regions with many points and larger when points are scarce. This suggests that the bandwidth should be decreasing in $\lambda$.

Motivated by similar considerations in the context of density estimation, Abramson [1] suggested considering point-dependent bandwidths of the form $h(x) = h / c(x)$ for $c(x)$ equal to the square root of the probability density function. He found that a significant reduction in bias could be obtained by the use of such adaptive bandwidths. Our aim in this section is to show that a similar result holds for spatial intensity function estimation.

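Define an estimator

$$\tilde\lambda(x_0) = \frac{1}{n} \sum_{i=1}^n \tilde\lambda(x_0; h, \Phi_i, W)$$

of $\lambda(x_0)$, $x_0 \in W$, that is the average of data-adaptive estimators

$$\tilde\lambda(x_0; h, \Phi_i, W) = \sum_{y \in \Phi_i \cap W} \frac{c(y)^d}{h^d} \, \kappa\left( \frac{x_0 - y}{h} \, c(y) \right). \tag{6}$$

As in Section 2, $\kappa$ is a kernel and the $\Phi_i$, $i = 1, \dots, n$, are independent and identically distributed point processes on $\mathbb{R}^d$ observed in a bounded non-empty open subset $W$ for which the first order moment measure exists and admits an intensity function $\lambda$; $c$ is assumed to be a measurable positive-valued weight function on $W$.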
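A minimal sketch of the adaptive estimator (6) follows. It reads (6) as applying the local bandwidth $h / c(y)$ to the kernel centred at each data point $y$, which is an interpretation of the formula rather than code from the paper; the function names, the box kernel and the toy pattern are illustrative assumptions.

```python
import math

def box_kernel(z):
    """Uniform density on the closed unit disc in R^2 (illustrative kernel choice)."""
    return 1.0 / math.pi if z[0] ** 2 + z[1] ** 2 <= 1.0 else 0.0

def adaptive_intensity(x0, points, h, weight, kernel=box_kernel):
    """Evaluate (6): sum over y of c(y)^d / h^d * kernel((x0 - y) * c(y) / h)."""
    d = len(x0)
    total = 0.0
    for y in points:
        cy = weight(y)                       # c(y) > 0; local bandwidth is h / c(y)
        z = [(a - b) * cy / h for a, b in zip(x0, y)]
        total += cy ** d * kernel(z)
    return total / h ** d

# Hypothetical toy pattern in the unit square.
pattern = [(0.50, 0.50), (0.52, 0.47), (0.90, 0.10), (0.45, 0.55)]
x0, h = (0.5, 0.5), 0.1

fixed = adaptive_intensity(x0, pattern, h, weight=lambda y: 1.0)    # c = 1 recovers (1)
halved = adaptive_intensity(x0, pattern, h, weight=lambda y: 2.0)   # bandwidth h / 2 everywhere
print(fixed, halved)
```

With a constant weight $c \equiv 2$ the estimator coincides with the fixed-bandwidth estimator (1) at bandwidth $h/2$; a genuinely adaptive choice such as the Abramson weight would require a pilot estimate of the intensity.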

The next result summarises the first two moments.

###### Lemma 3

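Let $\Phi$ be a point process observed in a bounded open subset $W \subset \mathbb{R}^d$, whose factorial moment measures exist up to second order and are absolutely continuous with intensity function $\lambda$ and second order product densities $\rho^{(2)}$. Let $\kappa$ be a kernel. Then the first two moments of (6) are

$$E\left[ \tilde\lambda(x_0; h, \Phi, W) \right] = \frac{1}{h^d} \int_W c(u)^d \, \kappa\left( \frac{x_0 - u}{h} \, c(u) \right) \lambda(u) \, du$$

and

$$E\left[ \left( \tilde\lambda(x_0; h, \Phi, W) \right)^2 \right] = \frac{1}{h^{2d}} \int_{W^2} c(u)^d c(v)^d \, \kappa\left( \frac{x_0 - u}{h} \, c(u) \right) \kappa\left( \frac{x_0 - v}{h} \, c(v) \right) \rho^{(2)}(u, v) \, du \, dv + \frac{1}{h^{2d}} \int_W c(u)^{2d} \, \kappa\left( \frac{x_0 - u}{h} \, c(u) \right)^2 \lambda(u) \, du.$$

The proof follows directly from the definition of product densities, see for example [4, Section 4.3.3]. For the special case $c \equiv 1$, we retrieve Lemma 1.

Provided $\lambda$ is strictly positive, the variance of $\tilde\lambda(x_0)$, the average of the $\tilde\lambda(x_0; h, \Phi_i, W)$, can be expressed in terms of the pair correlation function $g$ as

$$\begin{aligned} \mathrm{Var} \, \tilde\lambda(x_0) = {} & \frac{1}{n h^{2d}} \int_W c(u)^{2d} \, \kappa\left( \frac{x_0 - u}{h} \, c(u) \right)^2 \lambda(u) \, du \\ & + \frac{1}{n h^{2d}} \int_W \int_W c(u)^d c(v)^d \, \kappa\left( \frac{x_0 - u}{h} \, c(u) \right) \kappa\left( \frac{x_0 - v}{h} \, c(v) \right) (g(u, v) - 1) \lambda(u) \lambda(v) \, du \, dv. \end{aligned} \tag{7}$$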

We are now ready to state the first main result of this section in analogy to [1, Theorem, p. 1218]. The proof can be found in Section 4.3.

###### Theorem 2

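Let $\Phi_1, \Phi_2, \dots$ be i.i.d. point processes observed in a bounded open subset $W \subset \mathbb{R}^d$ with well-defined intensity function $\lambda$ and pair correlation function $g$. Suppose that $g$ is bounded and that $\lambda$ is bounded, bounded away from zero and twice continuously differentiable on $W$ with bounded second order partial derivatives $\lambda_{ij}$, $i, j = 1, \dots, d$.

Consider the estimator $\tilde\lambda(x_0)$ with

$$c(x) = \sqrt{\frac{\lambda(x)}{\lambda(x_0)}}$$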

based on the unions , , and Beta kernel , , with bandwidth chosen in such a way that, as , and . Then, for , as ,

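1. $\mathrm{bias} \, \tilde\lambda(x_0) = o(h_n^2)$.

2. $\mathrm{Var} \, \tilde\lambda(x_0) = \dfrac{\lambda(x_0) \, Q(d, \gamma)}{n h_n^d} + O\left( \dfrac{1}{n h_n^{d-1}} \right)$.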

In comparison with Theorem 1, the variance is the same as that for a non-adaptive bandwidth. The bias term on the other hand is of a smaller order. Note that, since the leading bias term is not specified, Theorem 2 cannot be used to calculate an asymptotically optimal bandwidth. To remedy this, stronger smoothness assumptions seem needed.

###### Theorem 3

Let $\Phi_1, \Phi_2, \dots$ be i.i.d. point processes observed in a bounded open subset $W \subset \mathbb{R}^d$ with well-defined intensity function $\lambda$ and pair correlation function $g$. Suppose that $g$ is bounded and that $\lambda$ is bounded, bounded away from zero and five times continuously differentiable on $W$ with bounded partial derivatives.

Consider the estimator $\tilde\lambda(x_0)$ with

$$c(x) = \sqrt{\frac{\lambda(x)}{\lambda(x_0)}}$$

based on the unions , , and Beta kernel , , with bandwidth chosen in such a way that, as , and . Then, for , as ,

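1. $\mathrm{bias} \, \tilde\lambda(x_0) = h_n^4 \, \lambda(x_0) \displaystyle\int_{\mathbb{R}^d} A(u; x_0) \, du + o(h_n^4)$, where

$$\begin{aligned} A(u; x_0) = {} & \frac{D g_u(1)}{24} D^4 c(x_0)(u, u, u, u) + \frac{D^4 g_u(1)}{24} \left( D c(x_0) u \right)^4 \\ & + \frac{D^2 g_u(1)}{2} \left\{ \frac{1}{3} D c(x_0) u \, D^3 c(x_0)(u, u, u) + \frac{1}{4} \left( D^2 c(x_0)(u, u) \right)^2 \right\} \\ & + \frac{D^3 g_u(1)}{4} \left( D c(x_0) u \right)^2 D^2 c(x_0)(u, u) \end{aligned}$$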

and .

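2. $\mathrm{Var} \, \tilde\lambda(x_0) = \dfrac{\lambda(x_0) \, Q(d, \gamma)}{n h_n^d} + O\left( \dfrac{1}{n h_n^{d-1}} \right)$.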

For the important special cases $d = 1$ and $d = 2$, the expression for the leading bias term may be simplified. All the proofs are given in Section 4.3.

###### Proposition 3

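Consider the framework of Theorem 3 in one dimension ($d = 1$). Then the coefficient of $h_n^4$ in the expansion of the bias of $\tilde\lambda(x_0)$ is

$$\frac{\lambda(x_0) \, V_4(1, \gamma)}{24} \left[ - \frac{\lambda^{(iv)}(x_0)}{\lambda(x_0)} + 8 \, \frac{\lambda'''(x_0) \lambda'(x_0)}{\lambda(x_0)^2} + 6 \, \frac{(\lambda''(x_0))^2}{\lambda(x_0)^2} - 36 \, \frac{\lambda''(x_0) (\lambda'(x_0))^2}{\lambda(x_0)^3} + 24 \, \frac{(\lambda'(x_0))^4}{\lambda(x_0)^4} \right]$$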

where and the superscript indicates the fourth order derivative.

###### Proposition 4

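Consider the framework of Theorem 3 in two dimensions ($d = 2$). Then the coefficient of $h_n^4$ in the expansion of the bias of $\tilde\lambda(x_0)$ is

$$\lambda(x_0) \left\{ V_4(2, \gamma) \, C_4 + V_2(2, \gamma) \, C_2 \right\},$$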

with , and constants

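$$C_4 = \sum_{i=1}^2 \left[ - \frac{1}{12} D_{iiii} c(x_0) + D_i c(x_0) D_{iii} c(x_0) + \frac{3}{4} (D_{ii} c(x_0))^2 - 6 (D_i c(x_0))^2 D_{ii} c(x_0) + 5 (D_i c(x_0))^4 \right]$$

and

$$\begin{aligned} C_2 = {} & 30 (D_1 c(x_0))^2 (D_2 c(x_0))^2 - 6 (D_1 c(x_0))^2 D_{22} c(x_0) - 6 (D_2 c(x_0))^2 D_{11} c(x_0) \\ & - 24 D_1 c(x_0) D_2 c(x_0) D_{12} c(x_0) + 3 D_1 c(x_0) D_{122} c(x_0) + 3 D_2 c(x_0) D_{112} c(x_0) \\ & + \frac{3}{2} D_{11} c(x_0) D_{22} c(x_0) + 3 (D_{12} c(x_0))^2 - \frac{1}{2} D_{1122} c(x_0). \end{aligned}$$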

Theorem 3 immediately yields the asymptotically optimal bandwidth, which should be compared with that in Corollary 1.

###### Corollary 2

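Consider the setting of Theorem 3. Then

$$\mathrm{mse} \, \tilde\lambda(x_0) = \lambda(x_0)^2 \left( \int_{\mathbb{R}^d} A(u; x_0) \, du \right)^2 h_n^8 + \frac{\lambda(x_0) \, Q(d, \gamma)}{n h_n^d} + o(h_n^8) + O\left( \frac{1}{n h_n^{d-1}} \right).$$

The asymptotic mean squared error is optimised at

$$h_n^*(x_0) = \frac{1}{n^{1/(d+8)}} \left( \frac{d \, Q(d, \gamma)}{8 \, \lambda(x_0) \left( \int_{\mathbb{R}^d} A(u; x_0) \, du \right)^2} \right)^{1/(d+8)}.$$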

The optimal bandwidth and the weights depend on the unknown intensity function. In practice, a non-parametric pilot estimator (for example the one proposed in [5]) would be plugged in.
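The adaptive bandwidth of Corollary 2 can be coded in the same way as the non-adaptive one, treating the bias integral $\int A(u; x_0) \, du$ as an externally supplied number. In the sketch below that number, the intensity value and the kernel parameter are purely hypothetical placeholders; nothing here computes $A$ itself.

```python
import math

def beta_norm_const(d, gamma):
    """Normalising constant c(d, gamma) from equation (4)."""
    return math.pi ** (d / 2) * math.gamma(gamma + 1) / math.gamma(d / 2 + gamma + 1)

def kernel_roughness(d, gamma):
    """Q(d, gamma) = c(d, 2 gamma) / c(d, gamma)^2."""
    return beta_norm_const(d, 2 * gamma) / beta_norm_const(d, gamma) ** 2

def adaptive_optimal_bandwidth(n, d, gamma, lam0, bias_integral):
    """h_n^*(x0) from Corollary 2; bias_integral stands for the integral of A(u; x0)."""
    q = kernel_roughness(d, gamma)
    return (d * q / (8.0 * n * lam0 * bias_integral ** 2)) ** (1.0 / (d + 8))

def adaptive_asymptotic_mse(h, n, d, gamma, lam0, bias_integral):
    """Leading squared bias plus leading variance term of Corollary 2."""
    q = kernel_roughness(d, gamma)
    return (lam0 * bias_integral) ** 2 * h ** 8 + lam0 * q / (n * h ** d)

# Hypothetical inputs, chosen only to exercise the formulas.
n, d, gamma, lam0, bias_integral = 1000, 2, 6.0, 100.0, 0.5
h_adaptive = adaptive_optimal_bandwidth(n, d, gamma, lam0, bias_integral)

# Rates of the optimised mean squared error: adaptive versus fixed bandwidth.
adaptive_rate = 8.0 / (d + 8)   # mse of order n^{-8/(d+8)}
fixed_rate = 4.0 / (d + 4)      # mse of order n^{-4/(d+4)}
print(h_adaptive, adaptive_rate, fixed_rate)
```

Comparing the two exponents shows why the adaptive scheme is attractive: for $d = 2$ the optimised mean squared error decays like $n^{-4/5}$ rather than $n^{-2/3}$, provided the stronger smoothness assumptions hold.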

To conclude this section, we present the analogue of Proposition 2. The proof can be found in Section 4.3.

###### Proposition 5

Let be i.i.d. point processes observed in a bounded open subset with well-defined intensity function and pair correlation function . Suppose that is bounded and that is bounded, bounded away from zero and five times continuously differentiable on with bounded partial derivatives. Consider with based on the unions , , and Beta kernel , , with bandwidth chosen in such a way that as , and . Then, for , as ,

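$$\tilde\lambda(x_0) = \lambda(x_0) + h_n^4 \, \lambda(x_0) \int_{\mathbb{R}^d} A(u; x_0) \, du + o(h_n^4) + \sqrt{\lambda(x_0) \, Q(d, \gamma)} \; O_P\left( n^{-1/2} h_n^{-d/2} \right)$$

where $A(u; x_0)$ is as defined in Theorem 3.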

## 4 Proofs and technicalities

### 4.1 Auxiliary lemmas for the Beta kernel

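Proof of Lemma 2: The first two claims follow from the symmetry of the Beta kernel. Furthermore

$$Q(d, \gamma) = \int_{\mathbb{R}^d} \kappa^\gamma(x)^2 \, dx = \frac{1}{c(d, \gamma)^2} \int_{b(0,1)} (1 - \|x\|^2)^{2\gamma} \, dx = \frac{c(d, 2\gamma)}{c(d, \gamma)^2}.$$

Due to the symmetry of the Beta kernel it is clear that the definitions of $V(d, \gamma)$, $V_4(d, \gamma)$ and $V_2(d, \gamma)$ do not depend on the choices of $i$ and $j$. First consider the case $d = 1$. By the symmetry of $\kappa^\gamma$ and a change of variables $v = x^2$, it follows that

$$V(1, \gamma) = \int_{-\infty}^{\infty} x^2 \kappa^\gamma(x) \, dx = \frac{2}{c(1, \gamma)} \int_0^1 v (1 - v)^\gamma \frac{1}{2 v^{1/2}} \, dv = \frac{B(\frac{3}{2}, \gamma + 1)}{c(1, \gamma)} = \frac{1}{2\gamma + 3}.$$

Similarly,

$$V_4(1, \gamma) = \int_{-\infty}^{\infty} x^4 \kappa^\gamma(x) \, dx = \frac{2}{c(1, \gamma)} \int_0^1 v^2 (1 - v)^\gamma \frac{1}{2 v^{1/2}} \, dv = \frac{B(\frac{5}{2}, \gamma + 1)}{c(1, \gamma)} = \frac{3}{(2\gamma + 3)(2\gamma + 5)}.$$

For dimensions $d > 1$, write $V(d, \gamma)$ and $V_4(d, \gamma)$ as a repeated integral and note that the innermost integral takes the form

$$\int_{\left\{ \frac{s^2}{1 - \|x\|_{d-1}^2} \leq 1 \right\}} s^\alpha \left( 1 - \|x\|_{d-1}^2 - s^2 \right)^\gamma ds$$

for $\alpha = 2, 4$, where $\|x\|_{d-1}$ denotes the norm of the first $d - 1$ coordinates. By the symmetry and a change of variables $v = s^2 / (1 - \|x\|_{d-1}^2)$, it follows that

$$V(d, \gamma) = \frac{B(\frac{3}{2}, \gamma + 1)}{c(d, \gamma)} \, c\left( d - 1, \gamma + \tfrac{3}{2} \right)$$

and

$$V_4(d, \gamma) = \frac{B(\frac{5}{2}, \gamma + 1)}{c(d, \gamma)} \, c\left( d - 1, \gamma + \tfrac{5}{2} \right)$$

in accordance with the claim.

Finally for $d \geq 2$, $V_2(d, \gamma)$ can be written as

$$\int_{\{ \|x\|_{d-1}^2 \leq 1 \}} \frac{x_{d-1}^2}{c(d, \gamma)} \left( \int_{\left\{ \frac{s^2}{1 - \|x\|_{d-1}^2} \leq 1 \right\}} s^2 \left( 1 - \|x\|_{d-1}^2 - s^2 \right)^\gamma ds \right) dx_1 \cdots dx_{d-1}.$$

The inner integral is equal to $B(\frac{3}{2}, \gamma + 1) \, (1 - \|x\|_{d-1}^2)^{\gamma + 3/2}$ so

$$V_2(d, \gamma) = \frac{B(\frac{3}{2}, \gamma + 1)}{c(d, \gamma)} \, c\left( d - 1, \gamma + \tfrac{3}{2} \right) V\left( d - 1, \gamma + \tfrac{3}{2} \right)$$

in accordance with the claim.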
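The reduction formulas used in this proof can be checked against the closed forms of Lemma 2 with a few lines of code. The sketch below evaluates the Beta function through Gamma functions and compares both sides for several dimensions and kernel parameters; the function names and the tested parameter values are arbitrary illustrative choices.

```python
import math

def beta_norm_const(d, gamma):
    """Normalising constant c(d, gamma) from equation (4)."""
    return math.pi ** (d / 2) * math.gamma(gamma + 1) / math.gamma(d / 2 + gamma + 1)

def beta_fn(a, b):
    """Euler Beta function B(a, b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def v_reduced(d, gamma):
    """V(d, gamma) via the reduction to dimension d - 1 (or directly for d = 1)."""
    if d == 1:
        return beta_fn(1.5, gamma + 1) / beta_norm_const(1, gamma)
    return beta_fn(1.5, gamma + 1) * beta_norm_const(d - 1, gamma + 1.5) / beta_norm_const(d, gamma)

def v4_reduced(d, gamma):
    """V4(d, gamma) via the reduction to dimension d - 1 (or directly for d = 1)."""
    if d == 1:
        return beta_fn(2.5, gamma + 1) / beta_norm_const(1, gamma)
    return beta_fn(2.5, gamma + 1) * beta_norm_const(d - 1, gamma + 2.5) / beta_norm_const(d, gamma)

def v2_reduced(d, gamma):
    """V2(d, gamma) for d >= 2, using V(d - 1, gamma + 3/2) for the outer integral."""
    return v_reduced(d, gamma) * v_reduced(d - 1, gamma + 1.5)

max_error = 0.0
for d in (1, 2, 3, 4):
    for gamma in (0.0, 0.5, 1.0, 2.5):
        a, b = d + 2 * gamma + 2, d + 2 * gamma + 4
        max_error = max(max_error, abs(v_reduced(d, gamma) - 1.0 / a))
        max_error = max(max_error, abs(v4_reduced(d, gamma) - 3.0 / (a * b)))
        if d >= 2:
            max_error = max(max_error, abs(v2_reduced(d, gamma) - 1.0 / (a * b)))
print(max_error)
```

The discrepancies are at the level of floating point rounding, which supports the stated closed forms for the tested parameter values.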

In the sequel, the following additional properties of the Beta kernels will be needed.

###### Lemma 4

Consider the Beta kernels with defined in equation (3). Then, for all ,

 ∫RduiDiκγ(u)du1⋯dud=−1,

the integrals of second order products in with respect to vanish and for distinct ,

 ∫Rduiu2jDiκγ(u)du1⋯dud = −V(d,γ) ∫Rdu3iDiκγ(u)du1⋯dud = −3V(d,