Kernel Mean Embedding Based Hypothesis Tests for Comparing Spatial Point Patterns

05/31/2019 · by Raif M. Rustamov, et al.

This paper introduces an approach for detecting differences in the first-order structures of spatial point patterns. The proposed approach leverages the kernel mean embedding in a novel way by introducing its approximate version tailored to spatial point processes. While the original embedding is infinite-dimensional and implicit, our approximate embedding is finite-dimensional and comes with explicit closed-form formulas. With its help we reduce the pattern comparison problem to the comparison of means in the Euclidean space. Hypothesis testing is based on conducting t-tests on each dimension of the embedding and combining the resulting p-values using the harmonic mean p-value combination technique. The main advantages of the proposed approach are that it can be applied to both single and replicated pattern comparisons, and that neither bootstrap nor permutation procedures are needed to obtain or calibrate the p-values. Our experiments show that the resulting tests are powerful and the p-values are well-calibrated; two applications to real world data are presented.


1 Introduction

Comparison of spatial point patterns is of practical importance in a number of scientific fields including ecology, epidemiology, and criminology. For example, such comparisons may reveal differential effects of the environment on plant species spread, uncover spatial variation in disease risk, or detect seasonal differences in crime locations. While exploratory analyses are vital for obtaining deep insights about pattern differences, such analyses can be subjective unless supplemented with formal hypothesis tests.

In this paper we are interested in comparing the first-order structures [FirstOrderPatComp] of point patterns. To be more precise, consider two point processes $X_1$ and $X_2$ over a region $S \subseteq \mathbb{R}^2$ with first-order intensities $\lambda_1(s)$ and $\lambda_2(s)$. Given realizations from these processes, we would like to detect whether there are statistically significant differences in the first-order intensities. However, testing for the equality $\lambda_1(s) = \lambda_2(s)$ is not flexible enough. For example, when studying the spatial variation in disease risk, the diseased population is only a small fraction compared to the control population; naturally, the corresponding observed patterns will differ significantly in the overall counts of points, yet this is irrelevant to the substantive question. The more appropriate null hypothesis posits that there exists a constant $c$ such that $\lambda_1(s) = c\,\lambda_2(s)$. Equality within a constant factor means that the intensities have the same functional form of spatial variation. To avoid dealing with the nuisance parameter $c$, one can normalize the intensities to integrate to 1, giving rise to probability distributions $P_1$ and $P_2$ over the region $S$; in [FirstOrderPatComp] these are called the densities of event locations. Now, our null hypothesis is equivalent to the equality $P_1 = P_2$, which is an instance of the two-sample hypothesis testing problem (see, e.g., [ANDERSON199441]).

In practice it is desirable to have nonparametric hypothesis testing approaches to pattern comparison that: a) capture a particular aspect of difference; b) can be applied to both single and replicated patterns; and c) do not depend on resampling methods for (re-)calibration. Early nonparametric tests for pattern comparison [DigglePatComp1, UteHahnJASA] probe for differences in the $K$-functions [RipleyKFunction] of point patterns. Being based on a second-order property, the detected differences conflate the spatial variation in intensities with the interaction properties. Concentrating on the first-order properties, [rel_risk1, rel_risk2, rel_risk3] estimate the logarithm of the ratio between the intensities using kernel density estimation. Other approaches rely on counting events within pre-specified areas [ANDRESEN2009333, ALBAFERNANDEZ2016352]. The recent work [FirstOrderPatComp] detects differences in the first-order structure by looking at the $L_2$-distance between kernel density estimates of the probability distributions above (i.e. $P_1$ and $P_2$). All of these first-order comparison approaches are limited to single patterns, and in practical situations most of them are calibrated with resampling methods.

In this paper, we introduce an approach that leverages the kernel mean embedding (KME) [mmd, MMD_review] to test for the equality $P_1 = P_2$, which allows us to detect differences in the first-order structure of point patterns. Our approach is based on introducing an approximate version of the kernel mean embedding, aKME. While the original KME is infinite-dimensional and implicit, our approximate kernel mean embedding is finite-dimensional and comes with explicit closed-form formulas. With the help of aKME, we reduce the pattern comparison problem to the comparison of means in the Euclidean space. We carry out the latter by applying $t$-tests on each coordinate of the embedding, and obtain an overall $p$-value using the harmonic mean $p$-value combination technique [HarmonicP1958, HarmonicP], leading to well-calibrated and powerful tests as confirmed by the simulations. The main advantages of the proposed approach are that it can be applied to both single and replicated pattern comparisons, and that neither bootstrap nor permutation procedures are needed to obtain or calibrate the $p$-values.

The ideas developed in this paper are in line with the recent surge of interest in applying reproducing kernel Hilbert space techniques to the comparison of probability distributions. For example, the Maximum Mean Discrepancy (MMD) is a measure of divergence between distributions [mmd] which has already found numerous applications in statistics and machine learning. This has led to the general approach of kernel mean embeddings; see, for example, the recent review [MMD_review] and citations therein. Our approximate embedding has its roots in the Random Fourier Features [Rahimi:2007:RFL:2981562.2981710] and their application to the MMD [Fastmmd], although the scheme proposed here is tailored to the two-dimensional setting and has the ability to provide high-order approximations. There has already been some interest in applying the reproducing kernel methodology to spatial point processes, with roots going back to the 1980s [bartoszynski1981, silverman1982] and more recent work in [flaxman2017, jitkrittum_linear-time_2017, SteinPapangelou]. We discuss some of the connections between the reproducing kernel machinery and the kernel density estimation based methods commonly used with spatial point patterns in Section 2.

The main contributions of this paper are the proposed approximate kernel mean embedding (Section 3) and the hypothesis testing framework for comparison of point patterns (Section 4). After investigating the empirical properties of the resulting tests on simulated data (Section 5.1), we present applications of the methodology to two real world datasets (Section 5.2).

2 Preliminaries

Kernel Mean Embedding

Mathematically rigorous development of the kernel mean embedding requires the machinery of the reproducing kernel Hilbert spaces, and the interested reader is referred to [MMD_review]. For our purposes, it will be sufficient to concentrate on the dual version of the definition where the kernel mean embedding is expressed in terms of the feature maps.

Given a data instance $x \in \mathcal{X}$ (in our context $x$ will be a point in some region $S$ of $\mathbb{R}^2$), a nonlinear transformation $\phi : \mathcal{X} \to \mathcal{H}$ can be used to lift this point into a feature space $\mathcal{H}$ that is usually high/infinite dimensional. When the entire dataset is lifted like this, the hope is that a good feature map would reveal structures in the dataset that were not apparent in the original space $\mathcal{X}$. For example, in the context of classification in machine learning, two classes that are hard to separate in the original space may turn out to be linearly separable in the feature space. This is the foundation of kernel based approaches in machine learning (see, e.g., [ElementsStatisticalLearning]), with the idea going back to [cover_theorem].

Assuming that an inner product $\langle \cdot, \cdot \rangle$ is available on $\mathcal{H}$, with each feature map $\phi$ one can associate the kernel function defined by $k(x, y) = \langle \phi(x), \phi(y) \rangle$. It turns out that the converse is true as well: for each positive semi-definite (psd) kernel function $k$ there exists a corresponding inner-product space $\mathcal{H}$ together with a feature map $\phi$ satisfying $k(x, y) = \langle \phi(x), \phi(y) \rangle$.
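
As a toy illustration of this correspondence (ours, not from the paper), the quadratic kernel $k(x, y) = (x^\top y)^2$ on $\mathbb{R}^2$ admits the explicit three-dimensional feature map $\phi(x) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)$, and the kernel value can be recovered as an inner product of the lifted points:

#Toy check (ours): explicit feature map for the quadratic kernel k(x,y) = (x'y)^2 on R^2
phi <- function(x) c(x[1]^2, sqrt(2) * x[1] * x[2], x[2]^2)
x <- c(0.3, -1.2); y <- c(2.0, 0.7)
sum(phi(x) * phi(y))   #inner product in the feature space
(sum(x * y))^2         #kernel evaluated directly; the two values coincide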

For a given psd kernel $k$, the kernel mean embedding (KME) of a probability distribution $P$ over $\mathcal{X}$ is defined as the expectation

$$\mu_P = \mathbb{E}_{x \sim P}[\phi(x)].$$

An important property of this embedding is that when $k$ is a characteristic kernel, the embedding is injective [MMD_review]. In other words, for two distributions $P_1$ and $P_2$ over $\mathcal{X}$, the equality $\mu_{P_1} = \mu_{P_2}$ is satisfied if and only if $P_1 = P_2$. This means that, in the context of two-sample hypothesis testing, the problem of testing whether $P_1 = P_2$ can intuitively be reduced to testing $\mu_{P_1} = \mu_{P_2}$, albeit in the infinite-dimensional space $\mathcal{H}$.

Two-Sample Tests with KME

To avoid the infinite-dimensionality, and absent an explicit formula for $\phi$, one can use the kernel trick to devise an actual hypothesis testing procedure. Consider the distance (as induced by the inner product in $\mathcal{H}$) between the KMEs of $P_1$ and $P_2$, which is known as the Maximum Mean Discrepancy (MMD) [mmd]:

$$\mathrm{MMD}^2(P_1, P_2) = \| \mu_{P_1} - \mu_{P_2} \|_{\mathcal{H}}^2.$$

The kernel trick, namely the application of the equality $k(x, y) = \langle \phi(x), \phi(y) \rangle$, allows one to express the MMD in terms of $k$ without requiring an explicit formula for the feature map $\phi$, namely,

$$\mathrm{MMD}^2(P_1, P_2) = \mathbb{E}_{x, x' \sim P_1}[k(x, x')] - 2\, \mathbb{E}_{x \sim P_1,\, y \sim P_2}[k(x, y)] + \mathbb{E}_{y, y' \sim P_2}[k(y, y')].$$

In practice this statistic is estimated from the samples by replacing the expectations with sample means, with the nuance that the first and third terms can either include or exclude the diagonal terms; the former gives a biased and the latter an unbiased estimator. Large enough values of the estimated $\mathrm{MMD}^2$ provide evidence against the null hypothesis $P_1 = P_2$.

This has been an extremely fruitful approach to two-sample testing, as witnessed by the vast amount of follow-up work; see the references in [mmd, MMD_review]. The distribution of the MMD statistic under the null hypothesis is not asymptotically normal, nor does it have a practically useful explicit form, except in big data settings where one can compute an incomplete estimator of MMD that is asymptotically normal; these cases are described in generality in [MMDyamada2018post]. As a result, in typical scenarios some kind of approximation or resampling procedure is required to perform the test, see [mmd] and pointers therein.
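
The following minimal R sketch (ours; the helper names gauss_kernel() and mmd2() are not from the paper, and the sample sizes and kernel width are illustrative) spells out the plug-in estimation described above, with the biased version keeping the diagonal terms and the unbiased version dropping them:

#Sketch (ours) of the plug-in MMD^2 estimators with a Gaussian RBF kernel
gauss_kernel <- function(A, B, sigma) {
  #A, B: one point per row; returns the matrix of kernel values k(a_i, b_j)
  D2 <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * A %*% t(B)
  exp(-D2 / (2 * sigma^2))
}
mmd2 <- function(X, Y, sigma, biased = TRUE) {
  Kxx <- gauss_kernel(X, X, sigma)
  Kyy <- gauss_kernel(Y, Y, sigma)
  Kxy <- gauss_kernel(X, Y, sigma)
  n <- nrow(X); m <- nrow(Y)
  if (biased) {
    mean(Kxx) - 2 * mean(Kxy) + mean(Kyy)                 #diagonal terms included
  } else {
    (sum(Kxx) - sum(diag(Kxx))) / (n * (n - 1)) -
      2 * mean(Kxy) +
      (sum(Kyy) - sum(diag(Kyy))) / (m * (m - 1))         #diagonal terms excluded
  }
}
set.seed(1)
X <- matrix(runif(200), ncol = 2)        #100 points, uniform on the unit square
Y <- matrix(rbeta(200, 2, 1), ncol = 2)  #100 points from a different distribution
mmd2(X, Y, sigma = 0.25)
mmd2(X, Y, sigma = 0.25, biased = FALSE)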

Connections to Kernel Density Estimation

The machinery described above has a close connection to the kernel density estimation commonly used with spatial point patterns. For example, let us consider the two-sample test statistic proposed in [ANDERSON199441], which was used by [Duong8382] in a three-dimensional setting and later adapted to the spatial point pattern comparison problem in [FirstOrderPatComp]. This statistic is constructed by first obtaining kernel density estimates of the two distributions and then computing the squared $L_2$-distance between the density estimates. Its connection to MMD is described in [mmd, Section 3.3.1] and can be exemplified as follows: if, say, the kernel density estimate is based on the Gaussian RBF kernel of width $\sigma$, then this statistic is equivalent to the MMD computed with the same kernel but of width $\sqrt{2}\,\sigma$. Overall, the MMD statistic is strictly more general than the statistic of [ANDERSON199441] because it allows using a larger variety of kernels; another benefit is the stronger consistency guarantees that stem from the kernel mean embedding interpretation. Similarly to the MMD, the statistic of [ANDERSON199441] is not asymptotically normal under the null; nevertheless, a normal approximation was suggested in [Duong8382]. However, this approximation leads to calibration problems: [FirstOrderPatComp] reported conservativeness of the resulting test, which led them to rely on the bootstrap for calibration.
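
To make this connection concrete, the following self-contained sketch (ours; the sample sizes, bandwidth, and integration grid are illustrative choices) numerically integrates the squared difference between two Gaussian kernel density estimates with bandwidth sigma and compares it with the biased MMD^2 computed with a Gaussian RBF kernel of width sqrt(2)*sigma; in the bivariate case the two quantities agree up to the density-normalization factor 1/(4*pi*sigma^2):

#Sketch (ours) of the L2 / MMD connection in the bivariate case
set.seed(2)
sigma <- 0.3
X <- matrix(runif(100), ncol = 2)
Y <- matrix(runif(160), ncol = 2)
sqdist <- function(A, B) outer(rowSums(A^2), rowSums(B^2), "+") - 2 * A %*% t(B)
#squared L2 distance between the two Gaussian KDEs, by numerical integration on a grid
grd <- as.matrix(expand.grid(seq(-1.5, 2.5, 0.02), seq(-1.5, 2.5, 0.02)))
kde <- function(P) rowMeans(exp(-sqdist(grd, P) / (2 * sigma^2))) / (2 * pi * sigma^2)
l2sq <- sum((kde(X) - kde(Y))^2) * 0.02^2
#biased MMD^2 with the Gaussian kernel of width sqrt(2)*sigma
tau <- sqrt(2) * sigma
mmd2_biased <- mean(exp(-sqdist(X, X) / (2 * tau^2))) -
  2 * mean(exp(-sqdist(X, Y) / (2 * tau^2))) +
  mean(exp(-sqdist(Y, Y) / (2 * tau^2)))
c(l2sq, mmd2_biased / (4 * pi * sigma^2))   #the two numbers should nearly coincide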

Of course, instead of the $L_2$-distance, other distances can be computed between the kernel density estimates to obtain statistics that capture complementary aspects of the differences between distributions. Similarly, one can realize that the MMD captures only one aspect of the discrepancy between the kernel mean embeddings. For example, the kernel Fisher discriminant analysis (KFDA) test statistic [KernelFisherDA1] is another statistic that can be computed via the kernel trick and used to test the equality $P_1 = P_2$. Similarly to the relationship between the MMD and the squared $L_2$-distance, the KFDA is connected to the $\chi^2$-distance between kernel density estimates. An insightful review of this type of connections can be found in [KernelFisherDA2].

3 Approximate Kernel Mean Embedding

Instead of relying on the kernel trick, in this section we take an orthogonal path to avoiding the infinite-dimensionality of the kernel mean embedding. Namely, specializing to the case $\mathcal{X} \subseteq \mathbb{R}^2$, we construct a finite-dimensional approximate feature map $\phi_D : \mathbb{R}^2 \to \mathbb{R}^D$ such that $k(x, y) \approx \phi_D(x)^\top \phi_D(y)$. As a result, testing $P_1 = P_2$ can be reduced to testing $\mu_{P_1} = \mu_{P_2}$ in the $D$-dimensional Euclidean space.

For the rest of the paper we will concentrate on the Gaussian RBF kernel $k(x, y) = \exp\left(-\|x - y\|^2 / (2\sigma^2)\right)$, where $x, y \in \mathbb{R}^2$. To avoid clutter in the derivations the kernel width is assumed to be $\sigma = 1$; the case of general $\sigma$ is mentioned below. Our goal is to construct an approximate feature map for the Gaussian RBF kernel tailored to the two-dimensional case. As a starting point, we will use the following representation for the Gaussian kernel (see, e.g., [SkolkovoKernelApprox]),

$$k(x, y) = \frac{1}{2\pi} \int_{\mathbb{R}^2} e^{-\|\omega\|^2/2} \cos\big(\omega^\top (x - y)\big)\, d\omega,$$

where $\omega \in \mathbb{R}^2$. To compress the notation, let $g(\omega) = \cos\big(\omega^\top (x - y)\big)$, where the dependence on $x$ and $y$ is suppressed from the notation. We change to the radial coordinates $\omega = r\, u_\theta$, and notice that $g$ is an even function to obtain:

$$k(x, y) = \frac{1}{\pi} \int_{0}^{\pi} \int_{0}^{\infty} e^{-r^2/2}\, g(r\, u_\theta)\, r \, dr \, d\theta,$$

where $u_\theta$ is a unit vector in $\mathbb{R}^2$ making an angle of $\theta$ with the $x$-axis. We approximate the integration with respect to $\theta$ by the simple mean over a set of $R$ equidistant angles $\theta_i$, $i = 1, \ldots, R$. Let $\theta_i = \pi (i - 1)/R$ to get

$$k(x, y) \approx \frac{1}{R} \sum_{i=1}^{R} \int_{0}^{\infty} e^{-r^2/2}\, g(r\, u_{\theta_i})\, r \, dr.$$

Now we use a radial version of the Gauss-Hermite quadrature to compute the integral over $r$ [jackel-2005a]. For a given integer $J$, denoting the quadrature roots by $\rho_j$ and the corresponding weights by $w_j$, $j = 1, \ldots, J$, we obtain¹

$$k(x, y) \approx \frac{1}{R} \sum_{i=1}^{R} \sum_{j=1}^{J} w_j \cos\big(\rho_j\, u_{\theta_i}^\top (x - y)\big).$$

The crucial point is that the terms containing $x$ and $y$ are untangled: expanding the cosine of the difference, we can write $k(x, y)$ approximately as a dot product in the embedding space, $k(x, y) \approx \phi_D(x)^\top \phi_D(y)$, where $D = 2RJ$, and $\phi_D$ is explicitly given by the coordinates

$$\sqrt{w_j / R}\, \cos\big(\rho_j\, u_{\theta_i}^\top x\big), \qquad \sqrt{w_j / R}\, \sin\big(\rho_j\, u_{\theta_i}^\top x\big), \qquad i = 1, \ldots, R, \; j = 1, \ldots, J.$$

¹ The values of the roots and weights can be retrieved from the following URL: http://www.jaeckel.org/RootsAndWeightsForRadialGaussHermiteQuadrature.cpp

Figure 1: The approximate embedding can be seen as projecting the point pattern onto a line followed by the application of sine and cosine functions with the appropriate frequencies. The overall dimensionality of the resulting embedding is $D = 2RJ$. Figure adapted from [Rahimi:2007:RFL:2981562.2981710].

Figure 1 (adapted from [Rahimi:2007:RFL:2981562.2981710]) gives an intuitive depiction of this construction. An R implementation of the embedding step can be found in Appendix A. The circular ensemble interpretation of [Fastmmd] applies to our construction as well, giving further insights and clarifying the relationship to the MMD.
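
As a quick numerical sanity check of the construction (ours; it re-uses the $J = 4$ roots and weights listed in Appendix A, truncated here for brevity), the dot product of the embedded points should closely track the exact Gaussian kernel for points whose separation is moderate relative to the kernel width:

#Sanity check (ours): phi_D(x) . phi_D(y) should approximate exp(-||x - y||^2 / 2)
#Roots and weights for J = 4 are those of Appendix A (truncated here)
rho <- c(0.396120568480997, 1.176734671418592, 2.201067662918958, 3.483610998028662)
w   <- c(0.227998108673060, 0.538464759426994, 0.221577913365317, 0.011959218534629)
R <- 4
theta <- pi * (0:(R - 1)) / R
U <- rbind(cos(theta), sin(theta))                     #unit vectors u_theta (2 x R)
W <- do.call(cbind, lapply(rho, function(r) r * U))    #frequency vectors rho_j * u_theta (2 x RJ)
coeff <- rep(sqrt(w / R), each = R)                    #sqrt(w_j / R), one per column of W
phi_D <- function(x) {
  proj <- as.numeric(x %*% W)                          #projections rho_j * u_theta . x
  c(cos(proj), sin(proj)) * c(coeff, coeff)
}
x <- c(0.2, 0.7); y <- c(0.6, 0.1)
sum(phi_D(x) * phi_D(y))        #approximate kernel value
exp(-sum((x - y)^2) / 2)        #exact Gaussian kernel with sigma = 1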

While there is a clear connection to the Random Fourier Features [Rahimi:2007:RFL:2981562.2981710], our construction is tailored to the two-dimensional setting and is based on a deterministic quadrature rule that quickly provides a high-order approximation. In contrast, the construction in [Rahimi:2007:RFL:2981562.2981710] and its improvements [KernelApprox1, KernelApprox2, SkolkovoKernelApprox] are geared towards high-dimensional data and, thus, depend on stochasticity to avoid the curse of dimensionality. Nevertheless, the hypothesis tests proposed in Section 4 do generalize to such embeddings and, consequently, can be applied in high-dimensional settings.

We note that the case of the Gaussian kernel with width $\sigma$ can be accommodated by replacing $x$ with $x/\sigma$ in the feature map. To avoid committing to a single $\sigma$, in practice we construct multiple feature maps corresponding to different values of the kernel width and concatenate them together.

Finally, having introduced the approximate feature map, we can define the approximate kernel mean embedding (aKME) of a probability distribution $P$ over the region $S$ as $\mathbb{E}_{x \sim P}[\phi_D(x)]$. From a theoretical point of view, this approximate construction loses the injectivity of the kernel mean embedding and, consequently, the consistency of the test. However, as we will see in the experimental section, this does not lead to any apparent issues with the testing power in practical situations. We believe that this is related to the approximation quality of the proposed embedding, whereby one is able to capture most of the information about the distribution in a small number of dimensions.

3.1 Spatial Point Pattern aKME

Let $X$ be a point process on the region $S$ with first-order intensity function $\lambda(s)$. The intensity function of $X$ can be normalized to give a probability distribution $f$ on $S$:

$$f(s) = \frac{\lambda(s)}{\int_S \lambda(u)\, du}. \qquad (3.1)$$

In [FirstOrderPatComp] this is called the density of event locations. The aKME of the point process $X$ is defined via the embedding of $f$, namely, $\mu = \mathbb{E}_{s \sim f}[\phi_D(s)]$. In practice, we only have access to a point process via its realizations, and so we must be able to estimate the aKME in an unbiased manner from a point pattern; this is the focus of the current subsection.

Consider a realization $\{s_1, \ldots, s_n\}$ of the point process $X$. The estimator that we will be interested in is the one that replaces the expectation appearing in the aKME formula by the sample mean:

$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} \phi_D(s_i). \qquad (3.2)$$

We require this estimate to be unbiased, $\mathbb{E}[\hat{\mu}] = \mu$, where the expectation is taken over the realizations of the process $X$. Unbiasedness is essential as it ensures that the estimate captures information about the functional form of the intensity function rather than some other properties of the point process.
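
In code, the estimator (3.2) is simply a column-wise mean of the embedded points. A minimal sketch (ours), assuming the get_aKME() function from Appendix A is in scope:

#Sketch (ours): the aKME estimate (3.2) is the column mean of the embedded points
#assumes get_aKME() from Appendix A has been defined
library(spatstat)
pat <- rpoispp(100)                                       #an example pattern on the unit square
akme_hat <- colMeans(get_aKME(pat, sigmas = c(1/16, 1/8, 1/4)))
length(akme_hat)                                          #dimensionality of the concatenated embedding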

For Poisson processes it is easy to see that unbiasedness holds, yet writing out an explicit proof will make clear which properties of the point process are relevant to unbiasedness. To this end, let $X$ be a Poisson process with intensity $\lambda(s) = c\, f(s)$, where the constant $c$ is the integral in the denominator of Eq. (3.1). We will condition on the number of points in the realization, $N = n$. As is well known, the conditioned process can be seen as a Binomial process on $S$, which we denote by $X_n$; the intensity of $X_n$ is given by $n f(s)$. We can write,

$$\mathbb{E}[\hat{\mu}] = \mathbb{E}_N\Big[ \mathbb{E}_{X_N}\Big[ \tfrac{1}{N} \sum_{s \in X_N} \phi_D(s) \,\Big|\, N \Big] \Big].$$

Using Campbell's formula we have

$$\mathbb{E}_{X_n}\Big[ \sum_{s \in X_n} \phi_D(s) \Big] = \int_S \phi_D(s)\, n f(s)\, ds = n\, \mathbb{E}_{s \sim f}[\phi_D(s)] = n\, \mu.$$

Substituting this back, we get

$$\mathbb{E}[\hat{\mu}] = \mathbb{E}_N\big[ \tfrac{1}{N} \cdot N\, \mu \big] = \mu,$$

which proves unbiasedness.

The crucial ingredient in the derivation is the following: conditioning on the number of points in the pattern does not have an effect on the functional form of the intensity. For example, above it went from $c\, f(s)$ to $n\, f(s)$, which is a change by a constant factor. While theoretically it is possible to construct point processes where this does not hold, the simplicity of this requirement makes it possible to use domain knowledge to assess how likely the condition is to hold in practice. For example, in plant biology applications the functional form of the intensity is determined by environmental factors such as light conditions, the level of underground water, and the quality of soil, which makes it plausible to assume that the condition holds.

The non-Poisson case is relevant to the proposed hypothesis test for replicated pattern comparison. Different replications commonly have differing numbers of points. If the condition above is not satisfied—the point pattern properties depend profoundly on the number of points—then the replicated pattern comparison cannot be performed by any test, unless all of the involved patterns have exactly the same number of points. In this latter case our test can be used too because for a fixed number of points unbiasedness holds directly by Campbell’s formula.

4 Comparing Spatial Point Patterns with aKME

Consider two point processes $X_1$ and $X_2$ in $S$ with first-order intensity functions $\lambda_1(s)$ and $\lambda_2(s)$. We would like to test the null hypothesis that there exists a constant $c$ such that $\lambda_1(s) = c\,\lambda_2(s)$. Equality within a constant factor means that the intensities of the two processes have the same functional form. This is different from testing $\lambda_1(s) = \lambda_2(s)$ because our null hypothesis can hold true even if the realizations from $X_1$ and $X_2$ have vastly differing numbers of points. We can compute the probability distributions $P_1$ and $P_2$ from the intensities $\lambda_1(s)$ and $\lambda_2(s)$ using the formula Eq. (3.1). Clearly, the null hypothesis above can be written as $P_1 = P_2$, allowing us to use the aKME machinery and reducing the problem to testing whether $\mu_{P_1} = \mu_{P_2}$ holds in the $D$-dimensional Euclidean space.

Single Pattern Comparison

Assume that we observe two point patterns $x = \{s_1, \ldots, s_{n_1}\}$ and $y = \{t_1, \ldots, t_{n_2}\}$ from two Poisson processes $X_1$ and $X_2$ on the region $S$ with true intensity functions $\lambda_1(s)$ and $\lambda_2(s)$. We would like to test the null hypothesis that there exists a constant $c$ such that $\lambda_1(s) = c\,\lambda_2(s)$.

Applying the aKME machinery, we reduce the test to checking whether $\mu_{P_1} = \mu_{P_2}$ in the $D$-dimensional Euclidean space. While the classic Hotelling's test seems like a natural choice, in practice even the computation of the precision matrix turns out to be unstable due to the nonlinear relationships between the coordinates of the approximate feature map. This is why we perform the test by applying $t$-tests independently to each coordinate and combining the resulting $p$-values as explained below.

More precisely, for each coordinate $d = 1, \ldots, D$, by the Poisson assumption we can consider $\{\phi_D^{(d)}(s_i)\}_{i=1}^{n_1}$ and $\{\phi_D^{(d)}(t_i)\}_{i=1}^{n_2}$ to be two independent samples of sizes $n_1$ and $n_2$ respectively; here $\phi_D^{(d)}$ is the $d$-th coordinate of the approximate feature map. Our goal is to compare the corresponding sample means. Note that since the approximate feature maps are expressed in terms of sines and cosines, their range is bounded, and so the classical Central Limit Theorem can be applied to deduce the approximate normality of the sample means when the sample sizes are big enough. As a result, we can apply the Behrens-Fisher-Welch $t$-test (without assuming equality of variances) to obtain the corresponding $p$-value $p_d$.

All together we end up with a set of $D$ $p$-values, one per coordinate of the aKME. In principle, one can resort to multiple testing procedures to reject or retain the global null. However, it is useful to compute an overall $p$-value for the global null hypothesis so that it can be used in downstream multiple testing procedures. To obtain such an overall $p$-value we use the recently popularized harmonic mean $p$-value combination approach [HarmonicP1958, HarmonicP]. Namely, one defines

$$p_{\mathrm{hmp}} = g\left( \frac{D}{\sum_{d=1}^{D} 1/p_d} \right), \qquad (4.1)$$

where $g$ is a function whose precise form is described in [HarmonicP]. While at the theoretical level this approach assumes independence between the $p$-values, as shown in [HarmonicP] it is very robust when there are dependencies. We observed this to be true with our $p$-values as well: despite the dependencies, we found that the overall $p$-value obtained in this manner resulted in well-calibrated tests with better power than simple alternatives such as the Bonferroni adjustment for multiple testing. Other $p$-value combination approaches such as Fisher's and Stouffer's methods (see, e.g., [pvalue_combo1, pvalue_combo2]) failed to give well-calibrated $p$-values, likely due to their strong reliance on the independence assumption.

Replicated Pattern Comparison

While the above test was based on the Poisson assumption, we now consider another setting where the only assumption on the point processes is that the sample mean gives an unbiased estimate of the aKME. Here, for two such general processes $X_1$ and $X_2$ on the region $S$ with true intensity functions $\lambda_1(s)$ and $\lambda_2(s)$, we observe sets of patterns $x_1, \ldots, x_{m_1}$ and $y_1, \ldots, y_{m_2}$ independently drawn from $X_1$ and $X_2$ respectively. Our goal is to test whether $\lambda_1(s) = c\,\lambda_2(s)$ for some constant $c$, which is reduced to testing $\mu_{P_1} = \mu_{P_2}$ exactly as before.

In this setting, we can consider the per-pattern aKME estimates $\hat{\mu}(x_1), \ldots, \hat{\mu}(x_{m_1})$ and $\hat{\mu}(y_1), \ldots, \hat{\mu}(y_{m_2})$ as independent samples of sizes $m_1$ and $m_2$. This allows us to test for the equality of the $d$-th coordinates using the $t$-test. Here, the validity of the $t$-test stems from the independence of replications and holds for general processes; this should be contrasted with the previous test, which derives its validity from the independence of samples in the Poisson process. The set of $p$-values obtained from the $t$-tests is combined once again using Eq. (4.1) to obtain an overall $p$-value.

We finish with an important remark about the nature of this test. The point processes $X_1$ and $X_2$ do not have to be of the same type for the null hypothesis to hold. The test is concerned only with the functional form of the intensity, and not with the type of the process. For example, if $X_1$ is a homogeneous inhibition process and $X_2$ is a homogeneous cluster process, then the null hypothesis still holds due to the first-order intensity functions being the same or differing by a constant factor.
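
A minimal end-to-end sketch of the replicated comparison (ours; it assumes the get_aKME() function from Appendix A, and the group sizes, kernel widths, and process parameters are illustrative) is given below. The example compares Poisson replicates against clustered replicates, both with constant intensity, so the null hypothesis of the preceding remark holds:

#Sketch (ours) of the replicated pattern comparison; assumes get_aKME() from Appendix A
library(spatstat)
library(harmonicmeanp)
replicated_test <- function(patterns1, patterns2, sigmas = c(1/16, 1/8, 1/4)) {
  #one aKME estimate (one row) per replicated pattern
  A1 <- t(sapply(patterns1, function(p) colMeans(get_aKME(p, sigmas))))
  A2 <- t(sapply(patterns2, function(p) colMeans(get_aKME(p, sigmas))))
  #Welch t-test per embedding coordinate, then harmonic mean combination
  p_ttests <- sapply(seq_len(ncol(A1)), function(d) t.test(A1[, d], A2[, d])$p.value)
  p.hmp(p_ttests)
}
#illustrative example: Poisson vs clustered replicates, both with constant intensity,
#so the null hypothesis (same functional form of the intensity) holds
group1 <- lapply(1:25, function(i) rpoispp(100))
group2 <- lapply(1:25, function(i) rMatClust(kappa = 20, scale = 0.1, mu = 5))
replicated_test(group1, group2)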

5 Experiments

Our goal in this section is to investigate the size and the power of the proposed tests. We also demonstrate two applications to real world data. The aKME embedding is constructed using four radial projections and four roots for the polar Gauss-Hermite formula (i.e. $R = 4$ and $J = 4$ in the notation of Section 3), resulting in $D = 2RJ = 32$. To avoid the selection of the kernel width parameter, we concatenate together the aKMEs corresponding to $\sigma = 1/16$, $1/8$, and $1/4$ when the point pattern domain is the unit square; the dimensionality of the concatenated aKME is $3 \times 32 = 96$.

5.1 Simulations

Single Pattern Comparison

Our simulation setting for single pattern comparison closely follows the design of [YongtaoGuan2008]. We generate a homogeneous Poisson process (CSR) and inhomogeneous Poisson processes (Linear and Sine) on the unit square $[0, 1]^2$. For the inhomogeneous processes the intensity functions are, respectively, a linear and a sinusoidal function of the abscissa $x$, with a parameter $a$ taking the values 1 and 2. All of the intensity functions are normalized so that the expected number of points per realization is roughly 100 or 400. Each type of model is simulated 2000 times.
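
For concreteness, patterns of this kind can be generated with spatstat along the following lines (ours; the linear and sine intensity surfaces below are illustrative stand-ins rather than the exact forms used in the simulations):

#Sketch (ours): CSR and inhomogeneous Poisson patterns on the unit square
#the linear and sine intensity surfaces are illustrative, not the paper's exact forms
library(spatstat)
n_expected <- 100
csr    <- rpoispp(n_expected)
linear <- rpoispp(function(x, y) 2 * n_expected * x, lmax = 2 * n_expected)
sine   <- rpoispp(function(x, y) n_expected * (1 + sin(2 * pi * x)), lmax = 2 * n_expected)
c(csr$n, linear$n, sine$n)   #realized point counts, each close to n_expected on average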

Model     Mean Point Count    a    0.01    0.05    0.10    Guan (0.10)
Size
CSR       100                      0.012   0.060   0.105
          400                      0.012   0.056   0.099
Linear    100                 1    0.010   0.054   0.096   0.096
                              2    0.014   0.060   0.110   0.130
          400                 1    0.008   0.046   0.088   0.094
                              2    0.008   0.050   0.090   0.098
Sine      100                 1    0.012   0.056   0.105   0.100
                              2    0.012   0.047   0.085   0.108
          400                 1    0.010   0.054   0.100   0.082
                              2    0.007   0.053   0.095   0.114
Power
CSR       100                      0.644   0.746   0.790
          400                      0.924   0.966   0.978
Linear    100                 1    0.550   0.627   0.678   0.200
                              2    0.690   0.754   0.806   0.482
          400                 1    0.844   0.918   0.944   0.466
                              2    0.912   0.956   0.971   0.994
Sine      100                 1    0.780   0.873   0.909   0.728
                              2    0.815   0.886   0.916   1.000
          400                 1    1.000   1.000   1.000   1.000
                              2    1.000   1.000   1.000   1.000
Table 1: Rejection rates for the single pattern comparison test when the Poisson assumption holds. The columns 0.01, 0.05, and 0.10 correspond to the nominal levels; the last column gives the rejection rates of Guan's test at the 0.10 level.

Table 1 lists rejection rates at the nominal levels 0.01, 0.05, and 0.10. The top part of the table corresponds to the case where the patterns being compared come from the same intensity model. The resulting rejection rates give the size of the test; note that all of them are close to the nominal sizes, which confirms that our test is well-calibrated. In the bottom half of the table the patterns being compared come from different intensity models; the corresponding rejection rates give the power of the test. For comparison, in the last column we include the rejection rates at the 0.10 level taken from [YongtaoGuan2008] (specifically, their Table 1; the values for CSR were not available). Note that while our test compares two point patterns, Guan's test is a goodness-of-fit test, whereby it compares a pattern against a model, and so it has access to the theoretical quantities, which, in principle, gives it a competitive advantage. Nevertheless, it can be seen from the table that our test compares very favorably to Guan's test in terms of power.

Model         0.01    0.05    0.10
Hardcore-1    0.012   0.046   0.090
Hardcore-2    0.008   0.035   0.073
Hardcore-3    0.002   0.009   0.020
Cluster-1     0.056   0.164   0.271
Cluster-2     0.160   0.372   0.508
Cluster-3     0.413   0.686   0.780
Table 2: Single pattern testing depends on the validity of the Poisson assumption. When it is violated, the test size does not match the nominal rate; inhibition and clustering have opposite effects on the size.

Model         0.01    0.05    0.10
Cluster-1     0.011   0.051   0.086
Cluster-2     0.014   0.048   0.086
Cluster-3     0.015   0.050   0.078
Table 3: Size of the single pattern comparison test when the Poisson assumption is violated due to clustering. Using the effective sample size in the $t$-tests for clustering processes results in sizes that are close to the nominal rate.

Next we investigate what happens when the Poisson assumption is violated. To this end we run comparisons between patterns from a homogeneous Poisson process and patterns from homogeneous non-Poisson processes. This is a null case, because the functional form of the intensity is the same (constant) in all of these cases. The non-Poisson processes we consider are generated using the spatstat package [spatstat] and are as follows. We use Matérn's Model II inhibition process with three settings of the inhibition distance parameter; these are referred to as Hardcore-1, 2, 3. We also generate realizations from Matérn's cluster process with the radius parameter value of 0.1 and three settings of the mean number of points per cluster; these are called Cluster-1, 2, 3 respectively. The underlying intensity for all of the processes is set to yield an average of 100 points per pattern.
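
These null configurations can be generated with spatstat roughly as follows (ours; the parameter values shown are illustrative and not the exact settings used in the experiments):

#Sketch (ours): homogeneous non-Poisson patterns; parameter values are illustrative
library(spatstat)
hardcore <- rMaternII(kappa = 200, r = 0.03)              #Matern Model II inhibition
cluster  <- rMatClust(kappa = 20, scale = 0.1, mu = 5)    #Matern cluster process
c(hardcore$n, cluster$n)                                  #realized point counts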

Table 2 lists the resulting null rejection rates, which correspond to the size of the test. We see that the inhibition process results in the test becoming more conservative; we conjecture that this is true in general, and so, when inhibition can be argued based on domain knowledge, the rejections obtained via our test are still valid. In contrast, clustering quickly leads to an anti-conservative test. The latter is not surprising, as the clustered points cannot be seen as independently drawn. One way of looking at this is that the effective sample size [EffectiveSampleSize0, EffectiveSampleSize1, EffectiveSampleSize2] in a clustered process is lower than the number of points in the pattern. Next, we investigate whether correcting the $t$-tests for the sample size can bring the test size to the nominal level. We made a back-of-the-envelope computation of the effective sample size by the formula $n_{\mathrm{eff}} = n (1 - e^{-\mu})/\mu$, where $n$ is the number of points in the pattern and $\mu$ is the mean number of points per cluster; note that this is different from the naive $n/\mu$ due to clusters with zero points being deleted. The resulting sizes are shown in Table 3, and match the nominal rates more closely. Of course, in practice, one pattern is not enough to deduce whether the apparent aggregation is due to the heterogeneity of the intensity or due to genuine clustering; thus, some modeling assumptions on the point process would be required in order to carry out such a correction.

Replicated Pattern Comparison

Since the replicated pattern comparison test does not require the Poisson assumption, we generate seven classes of point patterns, each in a homogeneous and an inhomogeneous version, yielding fourteen types of patterns in total. For the homogeneous versions, we generate CSR, Hardcore-1, 2, 3, and Cluster-1, 2, 3. To obtain the inhomogeneous versions of these, we start with a homogeneous version and independently thin it with a spatially varying retention probability; this is based on [10.2307/4541321]. As a result, we obtain inhomogeneous point patterns whose intensity inherits the functional form of the retention probability. All of the generating processes were adjusted so that the expected number of points per realization is roughly the same across pattern types. The first row in Table 4 lists the rejection rates corresponding to the cases when the patterns compared have the same (in-)homogeneity type, without regard for the class of the pattern. This corresponds to the null case, and we can see that the sizes are close to the nominal levels. The second row lists the rejection rates when the (in-)homogeneity types differ, and so corresponds to the power of the test.
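
The thinning step can be carried out with spatstat's rthin(); a sketch (ours, with an illustrative retention surface rather than the one used in the experiments):

#Sketch (ours): inhomogeneous version via independent thinning with a spatially
#varying retention probability (the retention surface below is illustrative)
library(spatstat)
homog   <- rMatClust(kappa = 40, scale = 0.1, mu = 5)
inhomog <- rthin(homog, function(x, y) (1 + x) / 2)       #retention probability in [0.5, 1]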

          0.01    0.05    0.10
Size      0.008   0.045   0.090
Power     0.764   0.900   0.938
Table 4: Rejection rates for the replicated pattern comparison test at the nominal levels 0.01, 0.05, and 0.10.

5.2 Real-World Data

Single Pattern: Cancer Data

We apply the single pattern comparison test to a dataset from an epidemiological study on the locations of larynx and lung cancer occurrences in the Chorley-Ribble area of Lancashire, England during the years 1974-1983; the source of the data is [DiggleIncinerator]. Figure 2 plots the 58 cases of larynx cancer and 978 cases of lung cancer together with the location of an industrial incinerator in this area. Following [DiggleIncinerator], we assume that the inhomogeneous Poisson process model applies, and that the distribution of the lung cancer cases can be used as a surrogate for the susceptible population.

Figure 2: Locations of larynx (red dots) and lung (blue pluses) cancers together with the location of the industrial incinerator (black circle cross).
Figure 3: Geospatial window used for the Chicago Crime experiment. The window coincides with the OpenStreetMap tile at zoom level of 10 that covers the city of Chicago and the surrounding region. Map data copyrighted OpenStreetMap contributors and available from https://www.openstreetmap.org.

Let the corresponding true intensity functions be $\lambda_1(s)$ and $\lambda_2(s)$ for the larynx and lung cancer processes, respectively. If there is an effect of the relative location of the incinerator on larynx cancer (and not on lung cancer), then the functional forms of these intensities would be different, i.e. there would be some non-constant function $g(s)$ such that $\lambda_1(s) = g(s)\,\lambda_2(s)$. If, furthermore, $g$ depends on the distance to the incinerator, this would be evidence of a differential effect of the incinerator on the two types of cancer.

Considering the larynx cancer locations as the point pattern $x$ and the lung cancer locations as the pattern $y$, we would like to test whether the functional form of the intensity is the same for these patterns. Note that the null hypothesis states that there is a constant $c$ such that the intensities of the two point processes satisfy $\lambda_1(s) = c\,\lambda_2(s)$. Applying our methodology, we find a $p$-value of approximately 0.9 (see the output of the code in Appendix A), which leads us to retain the null hypothesis; this is in agreement with re-analyses of this data such as [DigglePatComp1].

Replicated Patterns: Chicago Crime

In this experiment we use the dataset provided by the Chicago Data Portal (data.cityofchicago.org) that reflects reported incidents of crime that occurred in the City of Chicago. We consider all of the crimes during the year 2018 within a geospatial window (shown in Figure 3), and the goal is to compare the spatial patterns of each crime type between weekdays and weekends. We wanted to pick days that are maximally spaced out and avoid ambiguity (e.g. Friday nights), so we decided to compare the following days of the week: Tuesday, Thursday, and Saturday. The comparison between Tuesday and Thursday serves as a sanity check, as we do not expect to see any differences between them.

For each day and type of crime we generate the crime patterns, and filter by the count of points, keeping the patterns that have more than 20 points. Next we count how many distinct patterns are available for each day and type of crime; we keep the crime types that have at least 20 patterns for each of Tuesday, Thursday, and Saturday. The types of crimes remaining after this filtering are shown in Table 5 together with the number of days and the average count of points per pattern.

The results of the comparisons are shown in the last two columns of Table 5. We run the Benjamini-Hochberg [FDR] procedure on the 18 resulting $p$-values at the false discovery rate of 0.1, and the rejected hypotheses are indicated by the starred $p$-values. As expected, no differences were detected between the Tuesday and Thursday patterns. On the other hand, we see that there are statistically significant differences between the Tuesday and Saturday patterns in the following categories of crime: theft, criminal damage, deceptive practice, and battery.
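
For reference, the Benjamini-Hochberg selection step amounts to the following (a sketch using the 18 $p$-values from Table 5):

#Benjamini-Hochberg selection at FDR level 0.1 on the 18 p-values of Table 5
pvals <- c(0.125, 0.179, 0.825, 0.566, 0.099, 0.915, 0.475, 0.987, 0.788,   #Tue vs Thu
           0.001, 0.002, 0.006, 0.014, 0.051, 0.613, 0.657, 0.725, 0.968)   #Tue vs Sat
which(p.adjust(pvals, method = "BH") <= 0.1)   #indices 10-13: the four Tue-vs-Sat rejections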

Crime Type            Tuesday          Thursday         Saturday         Tue vs Thu    Tue vs Sat
                      #Days   #Pts     #Days   #Pts     #Days   #Pts     p-value       p-value
Theft                 40      144.7    40      150.3    40      148.6    0.125         0.001*
Criminal Damage       40      88.0     40      87.2     40      115.5    0.179         0.002*
Deceptive Practice    40      43.0     40      42.8     40      36.2     0.825         0.006*
Battery               40      50.5     40      53.6     40      61.4     0.566         0.014*
Narcotics             40      19.9     40      19.4     40      22.9     0.099         0.051
Other Offense         40      28.1     40      27.1     40      30.1     0.915         0.613
Motor Veh. Theft      40      25.9     40      25.4     40      22.8     0.475         0.657
Assault               40      40.3     40      41.4     40      37.9     0.987         0.725
Burglary              40      32.8     40      31.6     40      28.1     0.788         0.968
Table 5: Application of the replicated pattern comparison to the Chicago Crime dataset. The starred entries correspond to the hypotheses rejected by the BH procedure at the FDR level of 0.1.

6 Conclusion

We have introduced an approach to detect differences in the first-order structure of spatial point patterns. The proposed approach leverages the kernel mean embedding in a novel way, by introducing its approximate version. Hypothesis testing is based on conducting $t$-tests on each dimension of the approximate embedding and combining them using the harmonic mean approach. Our experiments confirm that the resulting tests are powerful and the $p$-values are well-calibrated. Two applications to real world data have been presented.

An interesting avenue for future research would be to leverage the directionality of the projections that correspond to subsets of the dimensions of the aKME (cf. Figure 1). One can envision a technique where anisotropy in the first-order structure difference would be detected by appropriately combining the $p$-values. Another aspect would be to investigate the scales at which the differences happen, this time combining the $p$-values for each scale. The harmonic mean approach is suitable for such multi-facet and multi-level analyses as shown in [HarmonicP]. Another direction is studying the proposed approach in the context of goodness-of-fit testing of the first-order structure. Since our approach allows for vastly differing numbers of points in the patterns, one can compute a low-variance approximation to the theoretical aKME of the model distribution by constructing a pattern via drawing a large number of points from that distribution. Finally, it would be desirable to extend the proposed technique to the high-dimensional setting, where the curse of dimensionality would be an important factor that can affect the testing power.

References

Appendix A R Implementation of Single Pattern Comparison

Here we provide an R implementation of the single pattern comparison test for the example demonstrated in Section 5.2, namely comparison of the larynx and lung cancer location patterns.

library(spatstat)
library(harmonicmeanp)
radialGaussHermiteData <- function(k) {
  if (k == 4) {
    Roots4 = c(
      0.3961205684809970482046989062,
      1.1767346714185921750900701015,
      2.2010676629189581946543092946,
      3.4836109980286615716953100856
    )
    Weights4 = c(
      0.2279981086730596270303131898,
      0.538464759426994159094177725,
      0.2215779133653168055615931299,
      0.0119592185346294083139159552
    )
    list(x = Roots4, w = Weights4)
  } else {
    stop(
      "Please refer to the values at http://www.jaeckel.org/RootsAndWeightsForRadialGaussHermiteQuadrature.cpp"
    )
  }
}
get_aKME <- function(p,
                     sigmas,
                     n_radial = 4,
                     n_linear = 4) {
  #radial unit vectors
  theta = seq(0, pi, length.out = n_radial + 1)[-(n_radial + 1)]
  W0 = rbind(cos(theta), sin(theta))
  #quadrature roots and weights
  temp = radialGaussHermiteData(n_linear)
  Roots = temp$x
  Weights = temp$w
  W = do.call('cbind', lapply(seq_along(Roots), function(i)
    Roots[i] * W0))
  coeff = rep(sqrt(Weights / n_radial), each = nrow(W0))
  #project the points on the lines, c.f. Figure 1
  tt = cbind(p$x, p$y) %*% W
  #embedding for each sigma
  emb = lapply(seq_along(sigmas), function(i)
    cbind(cos(tt / sigmas[i]) * coeff, sin(tt / sigmas[i]) * coeff))
  #concatenated embedding
  do.call('cbind', emb)
}
#load data
d = chorley
d1 = split(d)$lung
d2 = split(d)$larynx
#we use several Gaussian widths at once
sigma_list0 = c(1 / 16, 1 / 8, 1 / 4) #these values are for [0,1]x[0,1] window
sigma_list = diameter(Window(d)) * sigma_list0 / sqrt(2)
aKME1 = get_aKME(d1, sigma_list)
aKME2 = get_aKME(d2, sigma_list)
#p-values for t-tests on each dimension of the embeddings
p_ttests = sapply(seq_len(ncol(aKME1)), function(i)
  t.test(aKME1[, i], aKME2[, i])$p.value)
#overall p-value
p_harmonic = p.hmp(p_ttests)
p_harmonic
#0.9053312