Kernel and wavelet density estimators on manifolds and more general metric spaces

05/12/2018 ∙ by G. Cleanthous, et al. ∙ Newcastle University

We consider the problem of estimating the density of observations taking values in classical or nonclassical spaces such as manifolds and more general metric spaces. Our setting is quite general but also sufficiently rich in allowing the development of smooth functional calculus with well localized spectral kernels, Besov regularity spaces, and wavelet type systems. Kernel and both linear and nonlinear wavelet density estimators are introduced and studied. Convergence rates for these estimators are established, which are analogous to the existing results in the classical setting of real-valued variables.




1. Introduction

A great deal of effort is nowadays invested in solving statistical problems where the data are located in quite complex domains such as matrix spaces or surfaces (manifolds). A seminal example in this direction is the case of spherical data. Developments in this domain have been motivated by a number of important applications. We only mention here some of the statistical challenges posed by astrophysical data: denoising of signals, testing stationarity, rotation invariance or Gaussianity of signals, investigating the fundamental properties of the cosmic microwave background (CMB), inpainting of the CMB in zones on the sphere obstructed by other radiations, producing cosmological maps, exploring clusters of galaxies or point sources, investigating the true nature of ultra high energy cosmic rays (UHECR). We refer the reader to the overview by Starck, Murtagh, and Fadili [27] of the use of various wavelet tools in this domain as well as the work of some of the authors in this direction [1] and [22].

Dealing with complex data requires tools and statistical methods more sophisticated than the existing ones. In particular, these tools should capture the natural topology and geometry of the application domain.

Our contribution is essentially theoretical; however, our statements will be illustrated by examples drawn from different fields of application.

Our purpose in this article is to study the density estimation problem: one observes $X_1, \dots, X_n$, i.i.d. random variables defined on a space $\mathcal{M}$, and the problem is to find a good estimator of the common density function.

This problem has a long history in mathematical statistics, especially when the set $\mathcal{M}$ is $\mathbb{R}^d$ or a cube in $\mathbb{R}^d$ (see e.g. the monograph [29] and the references therein). Here we will consider very general spaces such as Riemannian manifolds or spaces of matrices or graphs and prove that, under some assumptions, we can build an estimation theory with estimation procedures, regularity sets and upper bound evaluations quite parallel to what has been neatly done in $\mathbb{R}^d$. In particular we intend to develop kernel methods with upper bounds and oracle properties as well as wavelet thresholding estimators with adaptive behavior.

To roughly summarize the basic assumptions made in this work: some of them concern the basic dimensional structure of the set (doubling conditions), whereas others are devoted to constructing an environment in which regularity spaces can be defined and kernels or wavelets can be constructed.

This setting is quite general but at the same time sufficiently rich to allow the development of a smooth functional calculus with well localized spectral kernels, Besov regularity spaces, and wavelet type systems. Naturally, the classical setting on $\mathbb{R}^d$ and the one on the sphere are contained in this general framework, but various other settings are covered as well. In particular, spaces of matrices, Riemannian manifolds, and convex subsets of (non-compact) Riemannian manifolds are covered.

As will be shown in this general setting, a regularity scale and a general nonparametric density estimation theory can be developed in full generality just as in the standard case of or . This undertaking requires the development of new techniques and methods that break new ground in the density estimation problem. Our main contributions are as follows:

In a general setting described below, we introduce kernel density estimators sufficiently concentrated to establish oracle inequalities and $L^p$-error rates of convergence for probability density functions lying in Besov spaces.

We also develop linear wavelet density estimators and obtain $L^p$-error estimates for probability density functions in general Besov smoothness spaces.

We establish $L^p$-error estimates on nonlinear wavelet density estimators with hard thresholding in our general geometric setting. We obtain such estimates for probability density functions in general Besov spaces.

To put the results from this article in perspective we next compare them with the results in [2]. The geometric settings in both articles are comparable and the two papers study adaptive methods. In [2] different standard statistical models (regression, white noise model, density estimation) are considered in a Bayesian framework. The methods are different (because we do not consider here Bayesian estimators) and the results are also different (since, again, we are not interested here in a concentration result of the posterior distribution). It is noteworthy that the results in the so-called dense case exhibit the same rates of convergence. It is also important to observe the wide adaptation properties of the thresholding estimates here, which make it possible to obtain minimax rates of convergence in the so-called sparse case, which was not possible in [2].

The organization of this article is as follows: In Section 2, we describe our general setting of a doubling measure metric space in the presence of a self-adjoint operator whose heat kernel has Gaussian localization and the Markov property. We provide motivation and inspiration for our developments and we present some first examples, both elementary and more involved. In Section 3, we review some basic facts related to our setting such as smooth functional calculus, the construction of wavelet frames, Besov spaces, and other background. This section can be read quickly by a reader more motivated by the introduction of estimation procedures. We develop kernel density estimators in Section 4 and establish $L^p$-error estimates for probability density functions in general Besov spaces. We also introduce and study linear wavelet density estimators. In Section 5, we introduce and study adaptive wavelet threshold density estimators. We establish $L^p$-error estimates for probability density functions in Besov spaces. Section 7 is an appendix, where we place the proofs of some claims from previous sections.

Notation: Throughout will denote the indicator function of the set and . We denote by positive constants that may vary at every occurrence. Most of these constants will depend on some parameters that may be indicated in parentheses. We will also denote by as well as , constants that will remain unchanged throughout. The relation means that there exists a constant such that . We will also use the notation , and , , will stand for the set of all functions with continuous derivatives of order up to on .

2. Setting and motivation

We assume that $(\mathcal{M}, \rho, \mu)$ is a metric measure space equipped with a distance $\rho$ and a positive Radon measure $\mu$.

Let $X_1, \dots, X_n$ be independent identically distributed (i.i.d.) random variables on $\mathcal{M}$ with common probability having a density function (pdf) $f$ with respect to the measure $\mu$. Our purpose is to estimate the density $f$.
To an estimator of , we associate its risk:

as well as its risk:

We will operate in the following setting. Most of the material can be found in an extended form in the papers [4, 17]. Note that, depending on the results we are going to establish, some of the following conditions will be assumed, others will not.

2.1. Doubling and non-collapsing conditions

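The following conditions concern properties related to the 'dimensional' structure of $(\mathcal{M}, \rho, \mu)$.
C1. We assume that the metric space $(\mathcal{M}, \rho, \mu)$ satisfies the so-called doubling volume condition:

$$0 < \mu(B(x, 2r)) \le c_0\, \mu(B(x, r)) < \infty \quad \text{for all } x \in \mathcal{M} \text{ and } r > 0, \tag{2.1}$$

where $B(x, r) := \{y \in \mathcal{M} : \rho(x, y) < r\}$ and $c_0 > 1$ is a constant. The above implies that there exist constants $c_0' \ge 1$ and $d > 0$ such that

$$\mu(B(x, \lambda r)) \le c_0' \lambda^d\, \mu(B(x, r)) \quad \text{for all } x \in \mathcal{M},\ r > 0, \text{ and } \lambda > 1. \tag{2.2}$$

The least $d$ such that (2.2) holds is the so-called homogeneous dimension of $(\mathcal{M}, \rho, \mu)$.
From now on we will use the notation $|E| := \mu(E)$ for $E \subset \mathcal{M}$.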
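As a concrete illustration of the doubling condition, consider Lebesgue measure on Euclidean space, where the ratio of the volumes of the balls of radius $2r$ and $r$ is exactly $2^d$. The sketch below recovers the dimension as the base-2 logarithm of that ratio. The function names `ball_volume` and `doubling_ratio` are ours and are introduced only for illustration.

```python
import math


def ball_volume(d: int, r: float) -> float:
    """Lebesgue measure of a Euclidean ball of radius r in R^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * r ** d


def doubling_ratio(d: int, r: float) -> float:
    """Ratio |B(x, 2r)| / |B(x, r)|; for Lebesgue measure it does not depend on x."""
    return ball_volume(d, 2 * r) / ball_volume(d, r)


for d in (1, 2, 3):
    c0 = doubling_ratio(d, r=0.7)
    # log2 of the doubling constant recovers the dimension d
    print(d, c0, math.log2(c0))
```

In this simple case the doubling constant is $2^d$ for every radius, so the exponent in the volume growth coincides with the usual dimension.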

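In developing adaptive density estimators in Section 5 we will additionally assume that $\mathcal{M}$ is a compact measure space with $\mu(\mathcal{M}) < \infty$ satisfying the following condition:

C1A. Ahlfors regular volume condition: There exist constants $c_1, c_2 > 0$ and $d > 0$ such that

$$c_1 r^d \le |B(x, r)| \le c_2 r^d \quad \text{for all } x \in \mathcal{M} \text{ and } 0 < r \le \operatorname{diam}(\mathcal{M}). \tag{2.3}$$

Clearly, condition C1A implies condition C1 and the following condition C2 as well, with $d$ from (2.3) being the homogeneous dimension of $(\mathcal{M}, \rho, \mu)$.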

These doubling conditions have been introduced in Harmonic Analysis in the 70’s by R. Coifman and G. Weiss [3].

It is interesting to notice already that $d$ will indeed play the role of a dimension in the statistical results as well. Condition C1A is obviously true for $\mathbb{R}^d$ with the Lebesgue measure.

Also, the doubling condition is precisely related to the metric entropy through the following lemma, whose elementary proof can be found for instance in [2, Proposition 1]. For $\epsilon > 0$, we define, as usual, the covering number $N(\epsilon, \mathcal{M})$ as the smallest number of balls of radius $\epsilon$ covering $\mathcal{M}$.

Lemma 2.1.

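Under the condition C1A and if $\mathcal{M}$ is compact, there exist constants $c' > 0$, $c'' > 0$ and $\epsilon_0 > 0$ such that

$$c' \epsilon^{-d} \le N(\epsilon, \mathcal{M}) \le c'' \epsilon^{-d}$$

for all $0 < \epsilon \le \epsilon_0$.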

C2. Non-collapsing condition: There exists a constant $c_3 > 0$ such that

$$\inf_{x \in \mathcal{M}} |B(x, 1)| \ge c_3.$$

This condition is not necessarily very restrictive. For instance, it is satisfied if $\mathcal{M}$ is compact. It is satisfied for $\mathbb{R}^d$ if $\mu$ is the Lebesgue measure, but untrue for $\mathbb{R}^d$ if $\mu$ is a Gaussian measure.

2.2. Smooth operator

Here comes an important assumption which may seem strange to the reader at first glance. Before entering into the specificity of the set of assumptions described below, let us explain some motivations.

One rather standard method in density estimation is the kernel estimation method, i.e. considering a family of functions $K_\delta : \mathcal{M} \times \mathcal{M} \to \mathbb{R}$ indexed by $\delta > 0$: an associated kernel density estimator is defined by

$$\hat{K}_\delta(x) := \frac{1}{n} \sum_{i=1}^n K_\delta(X_i, x), \quad x \in \mathcal{M}.$$

In $\mathbb{R}^d$, an important family is the family of translation kernels $K_\delta(x, y) = \delta^{-d} G\big(\frac{x - y}{\delta}\big)$, where $G$ is a function $\mathbb{R}^d \to \mathbb{R}$. When $\mathcal{M}$ is a more involved set such as a manifold or a set of graphs or matrices, the simple operations of translation and dilation may not be meaningful. Hence, even finding a family of kernels to start with might be a difficulty. It will be shown in Section 4 that the following assumptions provide quite 'naturally' a family of kernels.
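To make the idea of a kernel estimator on a non-Euclidean domain concrete, here is a minimal sketch on the circle, using the heat kernel of the operator $-d^2/dx^2$ as the kernel family. Several choices are assumptions made only for illustration and are not the construction of Section 4: the circle is parametrized by $[0, 2\pi)$, the bandwidth $\delta$ is tied to the heat-kernel time by $t = \delta^2$, the Fourier series is truncated at `kmax`, and the sample is synthetic.

```python
import math
import random


def circle_heat_kernel(x: float, y: float, t: float, kmax: int = 30) -> float:
    """Heat kernel of -d^2/dx^2 on the circle [0, 2*pi) with respect to
    Lebesgue measure, computed from its Fourier series truncated at kmax."""
    s = 1.0
    for k in range(1, kmax + 1):
        s += 2.0 * math.exp(-k * k * t) * math.cos(k * (x - y))
    return s / (2.0 * math.pi)


def kde(sample, x: float, delta: float) -> float:
    """Kernel density estimate at x: the average of the heat kernel at time
    t = delta**2 centered at each observation (illustrative bandwidth choice)."""
    t = delta * delta
    return sum(circle_heat_kernel(xi, x, t) for xi in sample) / len(sample)


random.seed(0)
# Synthetic data (hypothetical): angles concentrated around pi, wrapped onto the circle.
sample = [random.gauss(math.pi, 0.5) % (2.0 * math.pi) for _ in range(100)]

n_grid = 200
grid = [2.0 * math.pi * j / n_grid for j in range(n_grid)]
values = [kde(sample, x, delta=0.3) for x in grid]

# Rectangle rule on an equispaced periodic grid; the estimate should carry total mass 1.
mass = sum(values) * (2.0 * math.pi / n_grid)
print(f"total mass of the estimate: {mass:.6f}")
```

Because each heat kernel integrates to one over the circle, the estimate is itself a function of total mass one, whatever the sample.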

When dealing with a kernel estimation method, it is standard to consider two quantities :

The analysis of the second term (stochastic term) , can be reduced via Rosenthal inequalities to proper bounds on norms of and (see Lemmas 4.7 and 4.8, where in particular the assumptions of the previous subsection are also important).

The analysis of the first term is linked to the approximation properties of the family . One can stop at this level and precisely express the performance of an estimator in terms of . This is the purpose of oracle inequalities (see Theorem 4.3).

However, it might seem more convincing if one can relate the rate of approximation of the family to regularity properties of the function . It is standardly proved (see e.g. [14]), that in if is a translation family with mild properties on , then polynomial rates of approximation are obtained for functions with Besov regularity.

Hence, an important issue becomes finding spaces of regularity associated to a possibly complex set . On a compact metric space one can always define the scale of -Lipschitz spaces defined by the following norm


In Euclidean spaces a function can be much more regular than Lipschitz, for instance differentiable at different orders, or belong to some Sobolev or Besov spaces.

When $\mathcal{M}$ is a set where there is no obvious notion of differentiability, one can make the observation that in $\mathbb{R}^d$ or on Riemannian manifolds, regularity properties can also be expressed via the associated Laplacian. The Laplacian itself is an operator of order 2, but its square root is of order 1 and can be interpreted as a substitute for differentiation.

We will use this analogy to introduce an operator playing the role of a Laplacian. However, conditions are needed to ensure that this analogy makes sense and can lead to a scale of spaces with suitable properties (which, for instance, for small regularities correspond to Lipschitz spaces). This is why we adopt the setting introduced in [4, 17]. This setting is rich enough to develop a Littlewood-Paley theory in almost complete analogy with the classical case on $\mathbb{R}^d$, see [4, 17]. In particular, it allows one to develop Besov spaces with all sets of indices. At the same time this framework is sufficiently general to cover a number of interesting cases as will be shown in what follows.

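Our main assumption is that the space $(\mathcal{M}, \rho, \mu)$ is complemented by an essentially self-adjoint non-negative operator $L$ on $L^2(\mathcal{M}, d\mu)$, mapping real-valued to real-valued functions, such that the associated semigroup $P_t = e^{-tL}$ consists of integral operators with the (heat) kernel $p_t(x, y)$ obeying the following conditions:

C3. Gaussian localization: There exist constants $C^\star, c^\star > 0$ such that

$$|p_t(x, y)| \le \frac{C^\star \exp\big(-c^\star \rho^2(x, y)/t\big)}{\sqrt{|B(x, \sqrt{t})|\, |B(y, \sqrt{t})|}} \quad \text{for } x, y \in \mathcal{M},\ t > 0.$$

C4. Hölder continuity: There exists a constant $\alpha > 0$ such that

$$|p_t(x, y) - p_t(x, y')| \le C^\star \Big(\frac{\rho(y, y')}{\sqrt{t}}\Big)^{\alpha} \frac{\exp\big(-c^\star \rho^2(x, y)/t\big)}{\sqrt{|B(x, \sqrt{t})|\, |B(y, \sqrt{t})|}}$$

for $x, y, y' \in \mathcal{M}$ and $t > 0$, whenever $\rho(y, y') \le \sqrt{t}$.

C5. Markov property:

$$\int_{\mathcal{M}} p_t(x, y)\, d\mu(y) = 1 \quad \text{for } x \in \mathcal{M} \text{ and } t > 0.$$

Above $C^\star, c^\star > 0$ are structural constants. These technical assumptions express the fact that the heat kernel associated with the operator $L$ 'behaves' as the standard heat kernel of $\mathbb{R}^d$.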

2.3. Typical examples

Here we present some examples of setups that are covered by the setting described above. We will use these examples in what follows to illustrate our theory. More involved examples will be given in Section 6.

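2.3.1. Classical case on $\mathbb{R}^d$

Here $d\mu = dx$ is the Lebesgue measure and $\rho$ is the Euclidean distance on $\mathbb{R}^d$. In this case we consider the operator

$$-Lf(x) = \sum_{j=1}^d \partial_j^2 f(x) = \Delta f(x),$$

defined on the space of $C^\infty$ functions with compact support. As is well known, the operator $L$ is positive essentially self-adjoint and has a unique extension to a positive self-adjoint operator. The associated semigroup $e^{t\Delta}$ is given by the operator with the Gaussian kernel: $p_t(x, y) = (4\pi t)^{-d/2} \exp\big(-\frac{|x - y|^2}{4t}\big)$.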
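The Markov property and the semigroup property of the Gaussian kernel can be checked numerically. The sketch below does this in dimension one, assuming the standard normalization $p_t(x, y) = (4\pi t)^{-1/2} \exp(-|x - y|^2/(4t))$; the helper `integrate` and the truncation of the real line to $[-20, 20]$ are our choices for illustration.

```python
import math


def heat_kernel(x: float, y: float, t: float) -> float:
    """Gaussian heat kernel of the Laplacian on R (dimension d = 1)."""
    return math.exp(-(x - y) ** 2 / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)


def integrate(f, a: float, b: float, n: int = 4000) -> float:
    """Composite trapezoidal rule on [a, b] with n subintervals."""
    h = (b - a) / n
    inner = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))


t, s, x, y = 0.5, 0.3, 0.2, -0.4

# Markov property: the kernel integrates to one in its second variable.
mass = integrate(lambda z: heat_kernel(x, z, t), -20.0, 20.0)

# Semigroup property: composing the kernels at times t and s gives the kernel at time t + s.
conv = integrate(lambda z: heat_kernel(x, z, t) * heat_kernel(z, y, s), -20.0, 20.0)

print(mass, conv, heat_kernel(x, y, t + s))
```

The Gaussian localization condition holds here with equality up to constants, since $|B(x, \sqrt{t})|$ is proportional to $\sqrt{t}$ in dimension one.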

2.3.2. Periodic case on

Here is the Lebesgue measure and is the Euclidean distance on the circle. The operator is

defined on the set of infinitely differentiable periodic functions. It has eigenvalues


and eigenspaces

2.3.3. Non-periodic case on with Jacobi weight

(This example is further developed in Subsection 6.1.) Note that this example can arise when dealing with data issued from a density which itself has received a folding treatment such as in the Wicksell problem ([15, 18]). Now, the measure is

the distance is the Euclidean distance, and is the Jacobi operator

Conditions C1-C5 are satisfied, but not the Ahlfors condition C1A, unless the Jacobi parameters satisfy $\alpha = \beta = -1/2$. The discrete spectral decomposition of $L$ is given by one dimensional spectral spaces:

where is the th degree Jacobi polynomial and .

2.3.4. Riemannian manifold without boundary

If is a Riemannian manifold, then the Laplace operator is well defined on (see [13]) and we consider

If is compact, then conditions C1-C5 are verified, including the Ahlfors condition C1A. Furthermore, there exists an associated discrete spectral decomposition with finite dimensional spectral eigenspaces of :

2.3.5. Unit sphere in

This is the most famous Riemannian manifold with the induced structure from . Here is the Lebesgue measure on , is the geodesic distance on :

and with being the Laplace-Beltrami operator on . The spectral decomposition of the operator can be described as follows:

Here is the restriction to of harmonic homogeneous polynomials of degree (spherical harmonics), see [28]. We have

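2.3.6. Lie group of matrices: $SU(2)$

This example is interesting in astrophysical problems, especially in the measurements associated with the CMB, where instead of only measuring the intensity of the radiation we also measure spins. By definition

$$SU(2) := \Big\{ \begin{pmatrix} a & b \\ -\bar{b} & \bar{a} \end{pmatrix} : a, b \in \mathbb{C},\ |a|^2 + |b|^2 = 1 \Big\}.$$

This is a compact group which topologically is the sphere $\mathbb{S}^3$. So, if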

with , then


and for any

The eigenvalues of are and the dimension of the respective eigenspaces is

Remark 2.2.

Looking at some of these examples an important question already arises: how to choose in a given problem the distance $\rho$ as well as the dominating measure $\mu$ before even choosing the operator $L$ and a class of regularity? In $\mathbb{R}^d$, most often the Euclidean distance and the Lebesgue measure seem more or less unavoidable. In some other cases it might not be so obvious.

Let us take for instance the simple case of $\mathcal{M}$ being an interval. The cases of the ball, the simplex (see Section 6) and more generally sets with boundaries give rise in fact to identical discussions; therefore we will focus on the case of the interval.

So, in the case of the interval, a possible choice (and probably the most standard one in statistical examples) could be taking $\rho$ as the Euclidean distance and $\mu$ as the Lebesgue measure. Then the usual translation kernels are available as well as the standard wavelet bases. However “something” (which is often swept under the carpet or not really detailed) has to be “done” about the boundary points. Often special regularity conditions are assumed about these boundary points, such as periodicity (Subsection 2.3.2), which de facto lead to different methods for representing the functions to be estimated.

Let us now look at the choices (again for the interval, Subsection 2.3.3) that are made in the “Jacobi” case. The distance used there suggests a one-to-one correspondence with the semicircle. The measure suggests that the points in the middle of the interval (where the measure behaves as the Lebesgue measure) will not be weighted in the same way as the points near the boundary. In some cases this makes perfect sense: for instance if one needs to give a heavy weight to these points because they require special attention, or on the contrary a small one.

Apart from these considerations, there are in fact two measures in the family , that are undeniable in the case equipped with the distance . The first one is the Lebesgue measure (because Lebesgue is always undeniable), corresponding to . The second one is , because in that case there is a one-to-one identification between and the semi circle equipped with the euclidean distance and Lebesgue measure.

If we look more precisely into these two choices, we see that for the last case, all the required conditions including the Ahlfors one are satisfied, and the dimension , which is intuitively expected. Let us now observe that the case of the Lebesgue measure would lead to a larger dimension .
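The difference between the two measures discussed in this remark can be seen numerically. The sketch below assumes, as our reading of the Jacobi example, that the interval is $[-1, 1]$ with the distance $\rho(x, y) = |\arccos x - \arccos y|$, and compares the Lebesgue measure with the weight $(1 - y^2)^{-1/2}\,dy$, which is arc length on the semicircle. It computes the local growth exponent $\log_2\big(|B(x, 2r)| / |B(x, r)|\big)$ at the midpoint and at an endpoint; all function names are ours.

```python
import math


def ball_angles(x: float, r: float):
    """Angular endpoints of B(x, r) = {y in [-1, 1] : |arccos x - arccos y| < r}."""
    theta = math.acos(x)
    return max(theta - r, 0.0), min(theta + r, math.pi)


def lebesgue_ball(x: float, r: float) -> float:
    """Lebesgue measure of B(x, r): the length of the image of the angular interval under cos."""
    lo, hi = ball_angles(x, r)
    return math.cos(lo) - math.cos(hi)


def chebyshev_ball(x: float, r: float) -> float:
    """Measure of B(x, r) for the weight (1 - y^2)^(-1/2) dy, i.e. arc length on the semicircle."""
    lo, hi = ball_angles(x, r)
    return hi - lo


def local_exponent(ball, x: float, r: float) -> float:
    """log2 of the doubling ratio |B(x, 2r)| / |B(x, r)| at the point x."""
    return math.log2(ball(x, 2.0 * r) / ball(x, r))


for x in (0.0, 1.0):
    print(x,
          local_exponent(lebesgue_ball, x, 1e-3),
          local_exponent(chebyshev_ball, x, 1e-3))
```

With the arc-length weight the exponent is 1 at both points, while with the Lebesgue measure it is close to 1 in the middle of the interval and close to 2 at the endpoint. This is consistent with the remark that the Lebesgue measure leads to a larger dimension under this distance.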

3. Background

In this section we collect some basic technical facts and results related to the setting described in Section 2 that will be needed for the development of density estimators. Most of them can be found in [4, 11, 17].

3.1. Functional calculus

A key trait of our setting is that it allows one to develop a smooth functional calculus. If we recall that the operator $L$ has been introduced as a substitute for the Laplacian, we also have to recall that in $\mathbb{R}^d$, regularity properties of functions are most often expressed in terms of Fourier transforms, which correspond to spectral decompositions of the Laplacian. Hence it is no surprise that we will consider the spectral decomposition of $L$ and define an associated functional calculus.

Let $E_\lambda$, $\lambda \ge 0$, be the spectral resolution associated with the operator $L$ in our setting. As $L$ is non-negative, essentially self-adjoint and maps real-valued to real-valued functions, for any real-valued, measurable, and bounded function $g$ on $\mathbb{R}_+$ the operator

$$g(L) := \int_0^\infty g(\lambda)\, dE_\lambda \tag{3.1}$$

is well defined on $L^2(\mathcal{M})$. The operator $g(L)$, called spectral multiplier, is bounded on $L^2(\mathcal{M})$, self-adjoint, and maps real-valued to real-valued functions [30]. We will be interested in integral spectral multiplier operators $g(L)$. If $g(L)(x, y)$ is the kernel of such an operator, it is real-valued and symmetric. From condition C4 of our setting we know that $e^{-tL}$ is an integral operator whose (heat) kernel $p_t(x, y)$ is symmetric and real-valued: $p_t(y, x) = p_t(x, y) \in \mathbb{R}$.
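When the spectrum is discrete, the kernel of a spectral multiplier is a sum over eigenvalues of the multiplier value times the kernel of the corresponding projector. The following sketch illustrates this on the circle, assuming the parametrization $[0, 2\pi)$ and the operator $-d^2/dx^2$ with eigenvalues $k^2$; the truncation level `kmax` and the two multipliers are arbitrary choices for illustration. It checks numerically that composing the kernels of two multipliers gives the kernel of the multiplier associated with their product.

```python
import math


def spectral_kernel(g, x: float, y: float, kmax: int = 40) -> float:
    """Kernel of g(L) for L = -d^2/dx^2 on the circle [0, 2*pi): sum over the
    eigenvalues k^2 (k <= kmax) of g(k^2) times the projection kernel."""
    total = g(0.0) / (2.0 * math.pi)
    for k in range(1, kmax + 1):
        total += g(float(k * k)) * math.cos(k * (x - y)) / math.pi
    return total


def compose(g, h, x: float, y: float, n: int = 256) -> float:
    """Integral over z of g(L)(x, z) * h(L)(z, y), by the rectangle rule on n
    equispaced nodes (exact here, since the integrand is a trigonometric
    polynomial of degree below n)."""
    step = 2.0 * math.pi / n
    return step * sum(
        spectral_kernel(g, x, j * step) * spectral_kernel(h, j * step, y)
        for j in range(n)
    )


def g(lam: float) -> float:
    """Heat semigroup multiplier at time t = 0.2."""
    return math.exp(-0.2 * lam)


def h(lam: float) -> float:
    """A bounded resolvent-type multiplier."""
    return 1.0 / (1.0 + lam)


x, y = 0.3, 1.7
lhs = compose(g, h, x, y)
rhs = spectral_kernel(lambda lam: g(lam) * h(lam), x, y)
print(lhs, rhs)
```

The kernel depends on $x$ and $y$ only through $\cos(k(x - y))$, which makes its symmetry and real-valuedness evident in this example.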

3.1.1. Examples

Let us revisit some of the examples given in Subsection 2.3:

(a) Let be in the periodic case (Subsection 2.3.2). It is readily seen that the projection operators are:

Hence, formally,

(b) If is a Riemannian manifold (Subsection 2.3.4), then is a kernel operator with kernel

with , where is an orthonormal basis of

(c) In the case of the sphere (Subsection 2.3.5), the orthogonal projector operator is a kernel operator with kernel of the form

Here is the Gegenbauer polynomial of degree . Usually, the polynomials are defined by the generating function

Hence, formally

(d) In the case of (Subsection 2.3.6), the orthogonal projector operator is the operator with kernel

where . Hence, formally

Our further development will heavily depend on the following result from the smooth functional calculus induced by the heat kernel, developed in [17, Theorem 3.4]. It asserts the localization properties of general spectral multipliers of the form (corresponding to functions of the form in (3.1)). Again the appearance of the square root is by analogy with the Laplacian, which is an operator of degree 2. It is also interesting to remark that (3.2) is valid in when is replaced by , where is a bounded compactly supported function for instance. This result is a building block for the properties of the kernel estimators defined in the sequel.

Theorem 3.1.

Let , , be even, real-valued, and , . Then , , is an integral operator with kernel satisfying


where is a constant depending on , , , and the constants from our setting.

Furthermore, for any and


3.2. Geometric properties

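Conditions C1 and C2 yield

$$|B(x, r)| \ge c\, r^d \quad \text{for } x \in \mathcal{M},\ 0 < r \le 1,$$

with a constant $c > 0$.

To compare the volumes of balls with different centers and the same radius we will use the inequality

$$|B(x, r)| \le c \Big(1 + \frac{\rho(x, y)}{r}\Big)^d |B(y, r)| \quad \text{for } x, y \in \mathcal{M},\ r > 0,$$

where $c \ge 1$ is the constant from (2.2).

As $B(x, r) \subset B(y, \rho(x, y) + r)$, the above inequality is immediate from (2.2).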

We will also need the following simple inequality (see [4, Lemma 2.3]): If , then for any


where .

3.3. Spectral spaces

We recall the definition of the spectral spaces , , from [4]. Denote by the set of all even real-valued compactly supported functions. We define

We will need the following proposition (Nikolski type inequality):

Proposition 3.2.

Let . If , , then and


where the constant is independent of and .

This proposition was established in [4, Proposition 3.12] (see also [17, Proposition 3.11]). We present its proof in the appendix because we need to control the constant .

3.4. Wavelets

In the setting of this article, wavelet type frames for Besov and Triebel-Lizorkin spaces are developed in [17]. Here, we review the construction of the frames from [17] and their basic properties. Indeed, in this setting the ’wavelets’ do not form an orthonormal basis but a frame. In this case, the construction of a ’dual wavelet system’ is necessary to get a representation of type (3.14).

This construction is inspired by the Littlewood-Paley construction of the standard wavelets introduced in [7], [8], [9].

The construction of frames involves a “dilation” constant whose role is played by in the wavelet theory on .

The construction starts with the selection of a function with the properties: for , , and . Denote and set , . From this it readily follows that


For we let be a maximal net on with . It is easy to see that for any there exists a disjoint partition of consisting of measurable sets such that

Here is a sufficiently small constant (see [17]).

Lemma 3.3.

If is compact, then there exists a constant such that


Assume is compact and let be a maximal -net on , . Then

Therefore, using (2.2) we get


Since is compact we have and for , where is the diameter of , which is finite (see [4]). Using again (2.2) we get for . Hence

which implies (3.9). ∎

The th level frame elements are defined by


We will also use the more compact notation for .

Let , where equal points from different sets will be regarded as distinct elements of , so can be used as an index set. Then is Frame .

The construction of a dual frame is much more involved; we refer the reader to §4.3 in [17] for the details.

By construction, the two frames satisfy


A basic result from [17] asserts that for any , ,


and the same holds in if is uniformly continuous and bounded (UCB) on . As a consequence, for any , , () we have


Furthermore, frame decomposition results are established in [17] for Besov and Triebel-Lizorkin spaces with full range of indices.

Properties of frames in the Ahlfors regularity case. We next present some properties of the frame elements in the case when condition C1A is stipulated (see [17]).

1. Localization: For every , there exists a constant such that


2. Norm estimation: For


3. For


with the usual modification when . Above the constant depends only on , , , and the structural constants of the setting.

3.5. Besov spaces

We will deal with probability density functions (pdf’s) in Besov spaces associated to the operator $L$ in our setting. These spaces are developed in [4, 17]. Definition 3.4 coincides in $\mathbb{R}^d$ with one of the definitions of the usual Besov spaces, with $L$ replaced by the Laplacian ($-\Delta$ in fact, to get a positive operator).

Here we present some basic facts about Besov spaces that will be needed later on.

Let be real-valued functions satisfying the conditions:


Set .

Definition 3.4.

Let , , and . The Besov space is defined as the set of all functions such that


where the -norm is replaced by the sup-norm if .

Note that as shown in [17] the above definition of the Besov spaces is independent of the particular choice of satisfying (3.18)-(3.19). For example with from the definition of the frame elements in §3.4 we have


with the usual modification when . The following useful inequality follows readily from above


As in $\mathbb{R}^d$, we will need some embedding results involving Besov spaces. Recall the definition of embeddings: Let $X$ and $Y$ be two (quasi-)normed spaces. We say that $X$ is continuously embedded in $Y$ and write $X \hookrightarrow Y$ if $X \subset Y$ and for each $f \in X$