Statistical thresholding for a wavelet-based estimation of the intensity of a Poisson process.
We take a wavelet based approach to the analysis of point processes and the estimation of the first order intensity under a continuous time setting. A multiresolution analysis of a point process is formulated which motivates the definition of homogeneity at different scales of resolution, termed J-th level homogeneity. Further to this, the activity in a point processes' first order behavior at different scales of resolution is also defined and termed L-th level innovation. Likelihood ratio tests for both these properties are proposed with asymptotic distributions provided, even when only a single realization of the point process is observed. The test for L-th level innovation forms the basis for a collection of statistical strategies for thresholding coefficients in a wavelet based estimator of the intensity function. These thresholding strategies are shown to outperform the existing local hard thresholding strategy on a range of simulation scenarios.READ FULL TEXT VIEW PDF
Statistical thresholding for a wavelet-based estimation of the intensity of a Poisson process.
The development of wavelet theory has been one of the most significant advances in signal and image processing. Wavelets’ ability to decompose an object at different scales makes them ideal for understanding underlying structures in random processes. Based on their success in analyzing time series (Percival and Walden, 2000), there has been an ever increasing interest in applying wavelets to point processes (e.g. Brillinger, 1997; Cohen, 2014). Representing a point process as , a random integer indicating the number of events that have occurred in the set , one may use the notation to be equal to for , for and (Daley and Vere-Jones, 1988). Wavelets have most commonly been used to estimate the first order intensity (rate) function defined as . Here, denotes the differential process . This is based on the fact we can represent any function as a linear combination of basis functions. Namely, for some and father and mother wavelet pair ,
where and , provided . To estimate , the task becomes estimating the coefficients and , where is the usual inner product on . This can be achieved by computing the stochastic integrals and , where is the set of random event times of the process. Both and
can easily be shown to be unbiased estimators ofand , respectively. Restricting the wavelet reconstruction up to some maximum resolution in (1), one can construct the estimator
, or when using wavelets to estimate probability density functions(Härdle et al., 1998)
, it is then typical that shrinkage or thresholding procedures are applied to the coefficients to reduce the variance of the estimator.
Estimating the intensity of a point process has of course been addressed numerous times in either parametric (e.g. Rathbun and Cressie, 1994) or non-parametric methods (e.g. Brillinger, 1975; Aalen, 1978; Ramlau-Hansen, 1983; Patil and Wood, 2004). In the specific case of wavelet based estimation, a non-parametric method, the approaches can be split into discrete-time and continuous-time methods. Discrete time methods (e.g. Timmermann and Nowak, 1999; Kolaczyk, 1999; Kolaczyk and Dixon, 2000; Fryzlewicz and Nason, 2004) typically apply a discrete wavelet transform (DWT) to the aggregated process , where and then perform a shinkage procedure. Besbeas et al. (2004) offers a comprehensive review of discrete time methods and provides a simulation study comparing various thresholding schemes.
Under the continuous time framework, the setting of this paper, Brillinger (1997) proposes the estimator in (2), as well as an estimator for the second-order intensity. The shrinkage procedure is proposed where
is an estimate of the standard error inand is the Tukey function. Although applied to California earthquake data, the properties of the estimator are not studied in any detail. De Miranda (2008) offers the first proper treatment of the continuous time formulation, providing the characteristic and density functions for the estimators of the coefficients and in terms of the basis for any continuous compactly supported wavelet of known closed form. This result is theoretically interesting but cannot be readily exploited as, apart from the Haar family, wavelets that fulfil all these criteria are rare and exotic. This work is extended in de Miranda and Morettin (2011)
to provide first and second order moments for the linear (no thresholding) intensity estimator for any compactly supported wavelet of known closed form. With
representing the characteristic function of the set, they also propose a hard threshold ( typically set to 3) but it is given little treatment.
Further thresholding procedures have been proposed in Bigot et al. (2013) under a Meyer wavelet basis and in Reynaud-Bouret and Rivoirard (2010) under any biorthogonal wavelet basis. Both of these estimators are shown to achieve near optimal performance in the asymptotic setting that , the number of observed independent realizations of the point process, goes to infinity. Further, the thresholding procedure of Reynaud-Bouret and Rivoirard (2010) does not require a compactly supported and bounded intensity to achieve asymptotic optimality. However, both thresholds are proportional to and are therefore only non-zero when , a highly restrictive condition for application purposes where one may only ever be able to observe a single realization. A thresholding procedure that can be applied in the setting but for which the statistical properties are still tractable is therefore clearly desirable. In this paper, we consider a wavelet based multiresolution analysis of a point process to propose statistical thresholding procedures of the intensity function. Statistical thresholding has previously been considered in Abramovich and Benjamini (1995) in the classical wavelet regression setting. Here we adapt it for point processes and show it is capable of providing estimates with just a single realization of the process (), while being grounded in a statistically principled and tractable framework.
In Section 2 we provide a background to wavelet estimation of point process intensities. We extend existing results to show that the linear wavelet estimator of
has a scaled Poisson distribution under a Poisson process and the Haar wavelet basis . Then in Section3 we develop the theoretical framework for a wavelet-based multiresolution analysis of a point process. Considering the first order properties of a point process to be due to activity on different scales, under the Haar basis we define different levels of homogeneity, which we term -th level homogeneity in reference to the particular scale
at which we are analyzing the point process. We provide a likelihood ratio test (LRT) for these different levels of homogeneity for the class of Poisson processes, providing the asymptotic distribution for the LRT statistic under the null hypothesis. We then consider a more general test for whether the intensity function exhibits activity at a particular scale, which we term-th level innovation. Again, we provide a LRT for this property for the class of Poisson processes under the Haar wavelet basis .
In Section 4, we demonstrate how the LRT for -th level innovation can be used as a method of statistical thresholding for wavelet coefficients, for which we propose three different forms: local, intermediate and global. Importantly, we demonstrate that under our LRT framework increasing
and increasing the intensity of the process are equivalent to one another, and hence indistinguishable in the asymptotic analysis. We are therefore able to use the asymptotic distributions to draw reliable inference and threshold the intensity in thesetting. We finish by providing a comprehensive simulation study comparing the three different statistical thresholding procedures presented in this paper with the hard thresholding procedure given in de Miranda and Morettin (2011). We demonstrate that one or more of the proposed statistical thresholding procedures outperform this hard thresholding in almost all circumstances.
A discussion on how the estimation and statistical thresholding procedures presented in this paper can be extended to Daubechies D4 wavelets can be found in Appendix A. Further discussions on the LRTs, including boundary cases can be found in Appendix B, all proofs are provided in Appendix C, and results of a comprehensive simulation study can be found in Appendix D.
In this section we provide a brief background to wavelet estimation of point process intensities. We will restrict ourselves to simple point processes, i.e. point processes that satisfy almost surely for all .
We summarize here essential definitions and results on wavelets that need to be stated prior to their application to the intensity function. The theory presented here follows the work of Meyer (1992).
A multiresolution approximation of is an increasing sequence , , of closed linear subspaces of with the following properties:
is dense in ;
for all and , ;
for all and , ;
there exists a function , such that the sequence , is a Riesz basis of the space .
It is also shown in Meyer (1992) that for a Riesz basis of , the sequence defined by is the canonical orthonormal basis of , where and
are the Fourier transforms ofand , respectively. is called either the father wavelet or scaling function. In this paper, we are concerned with point processes on the real line, and therefore we focus on the space . Defining to be the orthogonal complement of in , Definition 2.1 allows us to write
The spaces each have the basis and are called the approximation spaces. The spaces are called detail spaces and each have the orthonormal basis , where is called the mother wavelet and is constructed from the father wavelet. The mappings are called dyadic transformations. Consequently, a fundamental result from (3) is that for any , the set forms an orthonormal basis for . Furthermore, for any a function can be decomposed as
This identity, which illustrates the idea of multiscale analysis, will be used to decompose the first order intensity of a point process. In practice, a function is often approximated by its projection onto a specific approximation space , with . Expansion (4) is then reduced to:
As we increase , the function approximates with ever increasing accuracy such that as , where is the norm.
Consider a point process with a piecewise continuous intensity function , typically restricted to a finite length observation window . We write the following wavelet expansion for this intensity (de Miranda and Morettin, 2011; Brillinger, 1997):
where is fixed and called the coarse resolution level, and . We are required to estimate the coefficients and which we do so with and where are the event times for one realization of a point process on the time interval . Hence the general linear estimator of the intensity function based on its wavelet expansion is:
For a compactly supported wavelet function, Campbell’s theorem (Daley and Vere-Jones, 1988, Chapter 6) gives us
showing the coefficient estimators to be unbiased. This is a linear estimator as it involves no shrinkage of the coefficients.
For obvious computational reasons, we can not in practice use an infinite wavelet basis to reconstruct the intensity (the intensity may only be fully reconstructed when we know that its decomposition is actually finite). Therefore, we firstly have to choose a maximum resolution level . This maximum level plays a role in the bias-variance tradeoff of the estimator. Low values of result in a smooth (high bias, low variance) estimator, whereas large values of result in a noisy (low bias, high variance) estimator. The linear estimator then becomes the estimator of the projection of onto the space , and is noted from now on. In practice we usually set the coarsest level of resolution to 0. Also, with compactly supported wavelets and events restricted to a finite length observation window , the subset of translation indexes satisfying is finite.
A non-linear estimator is obtained by adding a coefficient shrinkage term, determined from a thresholding strategy. The use of shrinkage methods in the classical wavelet regression setting is well studied (e.g. Donoho et al., 1995) and is used as a smoothing method to suppress contributing terms from fine scales which typically contain noise. For point process intensity estimation, while we do not have a noise term per se, shrinkage strategies are again desirable for smoothing, with fine scale terms typically having high variance.
When reconstructing the intensity of a point process, we have two desirable properties for a wavelet function. The first is that it should have a closed-form expression; it will be shown that this is required to compute the estimator of the intensity function. Second, the wavelet should be compactly supported; this is because invariably we can only observe the point process on a finite interval and therefore compactly supported wavelets allow us to only consider a finite set of dyadic translations. In Figure 2.1 we show three examples of wavelet families; these are the Haar, Daubechies D4 and Meyer wavelets. Each family exhibits either one or both characteristics.
The Haar mother and father wavelets are defined as
These wavelets can be extended to the support with an orthonormality preserving rescaling and dyadic transforms of the type . Henceforth, we will drop the subscript and assume all wavelets are scaled for the support . Now consider a point process on . By construction, Haar wavelets have disjoint supports across all translations for a fixed scale, which justifies simplification of the indexes. The linear estimator of the intensity function based on its Haar wavelet expansion becomes:
Under the Haar wavelet basis, at scale and a translation we have .
In the case of Haar wavelets, we are able to derive the distribution of the estimator . The approximation space of interest, , naturally induces a subdivision of the interval . The elements of this subdivision, , are the supports of the Haar wavelets at scale and form disjoint subintervals of . The Haar reconstruction of the intensity and its linear estimator are piecewise constant functions, with forms and , respectively. Hence we can establish the exact distribution for this estimator under a Poisson process model.
Under the Haar wavelet basis and for an inhomogeneous Poisson process of intensity on , are independent random variables distributed as
are independent random variables distributed as
The proof can be found in Appendix C.2. The result can also naturally be extended to any other point process with a square integrable intensity function for which the distribution of the event counts in any time interval is known (e.g. a binomial point process). It follows that , for all . We will now use Proposition 2.1 to develop likelihood ratio tests for two newly defined multiscale properties of a Poisson process.
In this section, we will develop the theoretical framework for a wavelet-based multiresolution analysis of a point process. Considering the first order properties of a point process to be due to activity on different scales, under the Haar basis we define different levels of homogeneity under a multiresolution framework. We call this -th level homogeneity, and provide a likelihood ratio test for it for the class of Poisson processes.
Under a compactly supported wavelet family, we then consider a more general setting to describe any activity of the intensity function at a particular scale, which we term -th level innovation. We provide a likelihood ratio test for this property for the class of Poisson processes under the Haar basis. In Section 4, we will demonstrate how this test can be used as a method of thresholding coefficients in our wavelet estimator of the intensity function. In this section, it will be always assumed that the intensity is piecewise continuous and .
We use the Haar wavelet basis (rescaled if T is different than 1), because of its intuitive interpretation, its simplicity to implement and its amenability to statistical analysis. We consider the projection of the intensity on the Haar approximation space . With Haar wavelets, the reconstruction of the intensity at scale
is a piecewise constant function, and hence we can define a wavelet reconstruction vectorwhere is the value of on the subinterval , , . We use this formulation to define a property we call -th level homogeneity.
A point process on with intensity is considered level homogeneous if the reconstruction of the intensity at resolution with Haar wavelets, or its projection on , is constant on . That is, .
th-level homogeneity was introduced in Taleb and Cohen (2016) in terms of the projection of the intensity on . We propose that it is instead more convenient to base it on , i.e. every point process is level homogeneous as the projected intensity on is always a constant on . The concept of -th level homogeneity goes side by side with the idea of a multiresolution analysis of the intensity function, providing a natural way of studying on what scales the intensity function appears constant and hence the point process homogeneous, and on what scales the intensity function exhibits variability. If we define as the class of level homogeneous point processes, we have . Indeed we know from Remark 1 that for Haar wavelets, and therefore if .
Let be a point process with intensity . Then is constant almost everywhere on (i.e. almost everywhere) if and only if for all .
See Appendix C.3 for the proof. To avoid any confusion, we say that a point process with intensity is strictly homogeneous on when for all . Proposition 3.1 illustrates how strict homogeneity can be loosely interpreted as the limit extension of th-level homogeneity. Furthermore, Definition 3.1 naturally leads us to define th-level inhomogeneity.
A point process on with intensity is considered level inhomogeneous if it is level homogeneous and not level homogeneous.
We immediately remark that a level inhomogeneous point process cannot be level homogeneous for all . -th level homogeneity and inhomogeneity together describe the global behavior of a point process when viewed at a particular scale.
As the scope of this work is to analyse point processes in a multiscale fashion, we are not interested in testing the strict homogeneity of a Poisson process, which is the limit case for Definition 3.1 and has been thoroughly addressed in previous studies (e.g. Bain et al., 1985; Ng and Cook, 1999). We are instead aiming to statistically determine the resolution level where inhomogeneous behaviour appears. Recall that the choice of Haar wavelets implies that the wavelet reconstruction of the intensity , as well as the intensity estimator , are piecewise constant functions on the dyadic partition . Although a piecewise analysis has also been carried out in Fierro and Tapia (2011) as a basis for a similar LRT, the wavelet approach presented here gives a natural, multiresolution scheme for defining the subdivision of the process. We begin by considering the LRT for equal means of scaled Poisson distributions, the results of which we can then utilize to test -th level homogeneity of Poisson processes. This provides a comprehensive and rigorous treatment of the ideas first proposed in Taleb and Cohen (2016).
Let be a set of iid scaled Poisson random vectors, each with independent components of form , . The scale parameter is known and fixed so is parametrized by the vector . We consider testing the null hypothesis against the alternative hypothesis that states is not true. The LRT statistic is defined as
where is the likelihood of the data given parameter vector .
Let , with being the likelihood ratio statistic defined in (8). Then we have
where is the maximum likelihood estimator (MLE) for , the constant mean under the null hypothesis , and is the MLE for (), under the alternative hypothesis .
See Appendix C.4 for the proof. If there exists at least one index such that , we use the convention . Further discussion on the absence of points within intervals can be found in Appendix B.2. Now let be the number of free parameters under the null hypothesis and let be the number of free parameters under the alternative hypothesis , then under the null hypothesis and regularity conditions on the likelihood functions that are met here, as sample size (see Wilks, 1938; Van der Vaart, 2000). In this setting, and . In practice, the case is frequently encountered, and therefore we establish a more general and applicable result for the asymptotic distribution of .
Let () be independent and identically distributed dimensional random vectors where each is constructed from independent components . Let where is the likelihood ratio statistic defined in (8). Then the distribution of statistic is invariant to simultaneous changes in parameters and provided that all products , , remain constant. Furthermore, if , then as .
See Appendix C.6 for the proof111It has been shown in Feng et al. (2012) that the classic asymptotic distributional result for the test statistic does not hold if we are restricting ourselves to the case and low values of ( in their study). This refutes the opposite claim in Brown and Zhao (2002), which possibly resulted from a confusion between the number of parameters and the number of independent realizations of the Poisson vector.. It will now be shown that this result illustrates the practical advantage of Haar wavelets as it ensures that only one realization of the process is enough to conduct a LRT for -th level homogeneity.
Now let be a collection of independent realizations of the same Poisson process . Let be the set of independent random vectors where is the vector of all subinterval estimates of the intensity from . From Proposition 2.1, is a vector of independent scaled Poisson random variables and is therefore parametrized by the vector . We look to test the null hypothesis which states is level homogeneous, i.e. for some , against the alternative hypothesis which states is not true. The LRT statistic in this case is given as:
where is the likelihood of the data given parameter vector . Now using Proposition 3.2 we can write
where , statistic is the maximum likelihood estimator (MLE) of and is the MLE for (), under the alternative hypothesis . In this particular setting we have and , giving as asymptotically distributed with degrees of freedom under the conditions of Theorem 3.1. We reject -th level homogeneity at significance level if where , the critical value, is the upper point of the distribution.
Here, we demonstrate the LRT for -th level homogeneity through simulations. We consider a class of inhomogeneous Poisson processes on a time interval . These processes share a similar piecewise triangular intensity represented in Figure 3.1 and are defined as following:
where and , and is the index of the subinterval in which belongs. Parameter is the absolute value of the gradient and is the number of “triangles”. The intensity takes values between and and its mean value is the parameter . By construction, the quantity does not depend on , the process is level homogeneous and level inhomogeneous. We set the significance level of our test at , with , i.e. we observe just a single realization. The empirical type 1 error and power of the LRT (over 10000 simulations) at different values of are shown in Figure 3.1 as a function of , with .
In the example represented in Figure 3.1 where the process is level homogeneous, the empirical type 1 error lies close to the level as expected. When and -th level homogeneity no longer holds, the empirical power converges to 1 when . This behavior is expected as well. Indeed, this intensity model is proportional to and therefore its Haar reconstruction at any scale satisfies as well as . Since statistic tends to infinity as increases towards infinity when and a fixed , then the power of the LRT converges to 1. Hence the observed convergence of the empirical power to 1 when is fixed and increases towards infinity as ensured by Theorem 3.1. Similarly, the value of parameter influences the speed of this convergence. Moreover, we note the power decreases as we increase because the mass of the null distribution is displaced to the right as increases, making it harder for the test to distinguish between the two hypotheses.
In Section 2.1, we presented the decomposition where is the orthogonal complement of in and often called the detail or innovation space. With -th level homogeneity we focused on the behavior displayed on any space , which brings together contributions from several resolutions. Projecting on for increasing , we explore the intensity function in progressively finer resolutions. To characterize this, we introduce the concept of -th level innovation. We consider a wavelet family with compact support, and a point process on . For a particular scale , we note the subspace of the detail space generated by dyadic transformations of the mother wavelet whose support is included in . For example, for Haar wavelets, .
Let be a point process with intensity and let be a compactly supported wavelet family. We then say that possesses a level L innovation under if and only if there exist such that and
The justification behind only considering is motivated by the analysis of homogeneous Poisson processes. Since a homogeneous Poisson process is level homogeneous for all , we desire that it similarly displays no -th level innovation irrespective of and the wavelet family used. With a constant intensity on observation window , wavelets with non compact support will always produce an infinite number of non-zero wavelet coefficients and unbiasedness of their estimators is not guaranteed. Furthermore, compactly supported wavelets whose support is only partially contained within will also admit non-zero wavelet coefficients. -th level innovation is dependent on the wavelet family used to reconstruct the intensity. In the case of Haar wavelets, we will show in Section 3.4 it has an intuitive interpretation as the absence of any change in the integrated intensity between the left and right hand sides of the Haar wavelet. Admittedly, such an interpretation becomes less intuitive with alternative wavelets. We further comment that although defined according to a specific scale, -th level innovation also has an inherent temporal component. The translation index of non-zero coefficients given by wavelets in indicates the time localization of the corresponding innovation.
For the Haar wavelet, there is the following equivalence:
A point process is level homogeneous and possesses a level innovation.
A point process is level inhomogeneous.
We are now interested in testing for -th level innovation based on Definition 3.3 using the null hypothesis : “A point process possesses no -th level innovation under a wavelet family ”. To do so, we consider the vector of empirical wavelet coefficients corresponding to the wavelet basis for , which under the null hypothesis will be zero mean. As for -th level homogeneity, we define a likelihood ratio test for -level innovation under the Poisson process model and Haar wavelets. This test will again be a special case of a more general setting for multivariate Poisson random variables.
If a point process is level inhomogeneous, then such a test should take place for any given scale (as by Remark 2 we know there must be innovation at level ). Consider a subdivision of defined as in Section 2.2.2. Let be a collection of independent realizations of the same Poisson process on with intensity function , and let be a collection of independent random vectors , where is the event count for process in . With , for the Haar wavelets . Each count is distributed as where . Therefore, the estimators , are independent realizations of a scaled Skellam distribution (or Poisson difference distribution), each with parameters and . Since has mean , Definition 3.3 is then equivalent to the following property: “There exist such that is Skellam distributed with parameters ”. We can therefore build a likelihood ratio test for testing the null hypothesis : “ for all ”.
Since there does not exist an explicit expression for the MLE of the parameter given Skellam distributed random variables (instead having to be numerically approximated (Alzaid and Omair, 2010)), it is more appealing to design a likelihood ratio test based on the event counts themselves. This leads us to first consider a LRT for the general setting of testing pairwise equality of means of Poisson distributions, which will then be used for the specific setting of testing -th level innovation.
We define here a LRT for the pairwise equality of the means of a multivariate Poisson distribution. Let be a set of iid Poisson random vectors, each with independent components of form , . We consider testing the null hypothesis , against the alternative hypothesis that states is not true. The LRT statistic is defined as
where is the likelihood of the data given parameter vector .
Let , with being the likelihood ratio statistic defined in (9). Then