 # Nonparametric Estimation of Surface Integrals on Density Level Sets

The estimation of surface integrals on density level sets is important (such as for confidence regions and bandwidth selection) in the study of nonparametric level set estimation. We consider a plug-in estimator based on kernel density estimation. By establishing a diffeomorphism between the true and estimated density level sets, we obtain the asymptotic normality of our estimator of the surface integrals when the integrand is known. We also consider the convergence rate of a plug-in bandwidth selector for density level set estimation, which involves the density derivatives as unknown integrands.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

The -level set of a density function on is defined as

 Mc={x∈Rd:f(x)=c}.

The set

is called the upper level set. Level set estimation finds its application in many areas such as clustering (Cadre et al. 2009), classification (Mammen and Tsybakov, 1999), and anomaly detection (Steinwart et al. 2005). It has received extensive study in the literature. See, e.g., Hartigan (1987), Polonik (1995), Tsybakov (1997), Walter (1997), Cadre (2006), Rigollet and Vert (2009), Singh et al. (2009), Rinaldo and Wasserman (2010), Steinwart (2015), and Chen et al. (2017).

We consider in this paper. For simplicity of notation, the subscript is often omitted for and . When has no flat part at the level , is a -dimensional submanifold of . Given an i.i.d. sample from the density , we investigate the estimation of the surface integral

 λ(g)=∫Mg(x)dH(x),

for some integrable function (which can be known or unknown), where is the -dimensional normalized Hausdorff measure.

The surface integral is an important quantity that is involved in asymptotic theory for level set estimation. For example, it appears in Cadre (2006) as the convergence limit of the G-measure of the symmetric difference between and its plug-in-type estimator under mild conditions. Using a similar measure as the risk criterion, Qiao (2018) shows that the optimal bandwidth for nonparametric level set estimation is determined by a ratio of two surface integrals in the form of . In fact one of the motivations for this paper is to find out the convergence rate of the plug-in bandwidth selector in Qiao (2018). In a more general setting, the surface integral on a manifold (not necessarily a level set) plays an important role in the asymptotic distribution of the suprema of rescaled locally stationary Gaussian fields indexed on this manifold (see Qiao and Polonik, 2018a). An application of this extreme value theory result leads to large sample confidence regions for and , for which the surface area of (a special form of when ) is the only unknown quantity that needs to be estimated (Qiao and Polonik, 2018b). A similar application to the asymptotic distribution for ridge estimation in Qiao and Polonik (2016) requires the estimation of surface integrals on ridges, which are low-dimensional manifolds. The quantity

is also a key component in the concept of vertical density representation (Troutt et al. 2004). Surface integrals on (regression) level sets also appears in optimal tuning parameter selection for nearest neighbour classifiers (Hall and Kang 2005, Samworth 2012, Cannings et al. 2017). To our knowledge, the only article considering the asymptotic theory for the estimation of the surface integrals on

is Cuevas et al. (2012), where the authors obtain a consistency result for the estimation of the surface area of .

The estimation of surface integrals on submanifolds embedded in is a challenging question in statistics. In the literature there exists some recent work on the estimation of the surface area of the boundary of an unknown body where

is a bounded set. The sampling scheme assumes a uniform distribution on

and the binary labels for and are observed with the sample. The surface area is defined as the Minkowski content, which coincides with the normalized Hausdorff measure in regular cases. In this setting the convergence rates of the estimators for surface area of are derived under different shape assumptions (Cuevas et al. 2007, Pateiro-López and Rodríguez-Casal 2008, 2009, Armendáriz et al. 2009). Trillo et al. (2017) consider the same setting but use a more general notion of surface area. The consistency of the surface integral on has been considered by Jiménez and Yukich (2011). Assuming the i.i.d. sampling scheme only on , Arias-Castro and Rodríguez-Casal (2017) estimate the perimeter of using the alpha-shape for and derive the convergence rate.

Surface area has been used in the definition of “contour index”, which is the ratio between perimeter and square root of the area, and has appeared in medical imaging and remote sensing (Canzonieri and Carbone, 1998, Garcia-Dorado et al., 1992, Salas et al., 2003). The estimation of surface area also has extensive applications in stereology (Baddeley et al, 1986, Baddeley and Jensen, 2005, Gokhale, 1990). The surface area of a level set corresponds to surface integral for the simplest example when . Other examples of arise where can be observable temperature, humidity, or the density of some non-homogeneous material, and one is interested in the surface integrals of these quantities on an unknown level set, as explained by Jiménez and Yukich (2011) in a similar setting.

Our main contributions are summarized as follows.

(1) We derive the asymptotic normality of the plug-in kernel estimator of when is known. Asymptotic distributional results are few in both the fields of surface area/integral estimation and level set estimation. Armendáriz et al. (2009) obtain an asymptotic normality result for the surface area of , but in the typical framework of uniform distribution sampling as described above. Note that our asymptotic normality is technically different from the result in this framework, the assumptions of which eliminate the requirement for estimation of the density and even the boundary to estimate the surface area. Mason and Polonik (2009) prove an asymptotic normality result for the estimation of upper level set , but the involved measure for integrals is not low-dimensional as what we study in this paper.

(2) We derive the convergence rate of the direct plug-in bandwidth selector for density level set estimation, which involves for some unknown functions. Bandwidth selection is always critical in kernel-type estimators. Qiao (2018) obtains the asymptotic optimal bandwidth for kernel estimator of density level set, which needs the estimation of surface integrals on for practical plug-in bandwidth selector. Our asymptotic results complements the study of this plug-in bandwidth selector.

### 1.1 Set-up and structure

Let us formally provide the setting and define our estimators in the paper. Let be an i.i.d. sample generated from an unknown -dimensional density function . The kernel density estimator of is given by

 ˆf(x)=1nhdn∑i=1K(x−Xih), (1.1)

where is the bandwidth and is a -dimensional kernel function. The plug-in estimators of and are given by and , respectively. For the convenience of exposition, we adopt a product kernel with

 K(x−Xih)=d∏j=1˜K(xj−Xijh), (1.2)

where is a univariate kernel function, and are the elements of and , respectively. The kernel function is called a th () order kernel if and

 ∫Rul˜K(u)du=⎧⎨⎩1,if% l=0,0,if l=1,⋯,ν−1,κν>0,if l=ν.

The estimator using a high-order kernel () usually takes negative values in the tails, and therefore loses its interpretability for practitioners (see, e.g., Silverman, 1986, page 69, Hall and Murison, 1993). However, this is not a problem for our estimator, since we are only interested in the level set of at a positive level, and our estimator does not include the negative values of .

When is known, our estimator for is

 ˆλ(g):=∫ˆMg(x)dH(x). (1.3)

In Section 2, we show the convergence rate and asymptotic normality of under mild assumptions, as well as some useful ancillary results. Our proof relies on the construction of a diffeomorphism between and , in order to overcome the challenge of different integral domains in and . The relevant geometric concepts and the assumptions that guarantee the existence of such a diffeomorphism are given in the following subsections.

It is well-known that bandwidth selection is very important for kernel-type estimators, such as and . It turns out that the bandwidth selection for level set estimation also requires estimation of the surface integral . Let , which is the symmetric difference between and . Integrals defined on are usually used to measure the dissimilarity between and (or between and ). See, e.g., Baíllo et al. (2000), Baíllo (2003), Cadre (2006), Cuevas et al. (2006), and Mason and Polonik (2009). Using as the risk function, Qiao (2018) showed that the asymptotic optimal bandwidth for depends on , where and are some algebraic functions of the first two derivatives of . Let and be the plug-in estimators of and . The direct plug-in bandwidth selector is then . In Section 3, we show the convergence rate of . The proofs of all the results in Sections 2 and 3 are left to Section 4. The appendix contains discussions on one of the technical assumptions.

### 1.2 Notation and geometric concepts

For a function with th derivatives (), and integers , denote . If , denote . If in addition has th derivatives, we denote . Let and be the gradient and Hessian matrix of , respectively.

For any and , let . The Hausdorff distance between any two sets is

The ball with center and radius is denoted by For any set and , we denote . Let the normal projection of onto be . Note that may not be a single point.

We will also use the concept of reach of a manifold. For a set , let Up be the set of points such that is unique. The reach of is defined as

 ρ(S)=sup{δ>0:S⊕δ⊂Up(S)}.

See Federer (1959). A positive reach is related to the concepts of “-convexity” and “rolling condition”. See Cuevas et al. (2012). These are common regularity conditions for the estimation of surface area. For two manifolds and , if the normal projections and are homeomorphisms, then and are called normal compatible (see Chazal et al. (2007)).

### 1.3 Assumptions and their discussion

We introduce the assumptions that will be used in this paper. For , denote .

Assumptions:
(K1) is a two times continuously differentiable, symmetric product kernel function of th order for , with bounded support.

(F1) The density function is uniformly continuous on and has bounded and continuous partial derivatives up to order on . There exist and such that for all .

(F2) We assume that consists of connected components, i.e., , where , , are -dimensional connected manifolds such that . We further assume that for , is parameterized by on which is a bounded Lebesgue measurable subset of , such that the -dimensional Lebesgue measure of is zero, and is injective on , which is the interior of . Denoting the Jacobian matrix of as and , we require that is continuous and bounded such that

 0

(H1) The bandwidth depends on such that as .

Discussion of the assumptions:

1. The assumption for in (F1) implies that the Lebesgue measure of on is zero. This is a typical assumption in the literature of level set estimation (see, e.g. Tysbakov (1997), Cadre (2006), Cuevas et al. (2006), Mammen and Polonik (2013)), and guarantees that has no flat parts and is a compact -dimensional manifold (see Theorem 2 in Walther (1997)), which makes well-defined.

2. Assumption (F2) is used in the derivation of the asymptotic normality of the surface integral estimator. An assumption similar to (F2) appears in Mason and Polonik (2009), where for they take

 Ωj=Ω:={[0,2π),d=2,d−2×[0,2π),d>2. (1.5)

It is known (e.g. see Lang, 1997) that the unit -sphere can be parameterized with . Mason and Polonik (2009) assume the existence of diffeomorphisms between and for the construction of the parameterization , for .

The diffeomorphisms between and can be constructed under mild conditions using Morse theory (Milnor, 1963). For , let , where are the connected components of . When is within a small neighborhood of a non-degenerate local maximum (or minimum), is diffeomorphic to (Lemma 2.25 in Nicolaescu, 2011). If no critical points exist between and , then they are differomorphic via , where is a gradient flow satisfying

 dγx(t)dt=∇f(γx(t))∥∇f(γx(t))∥2,γx(0)=x.

See Theorem 2.6 in Nicolaescu (2011). This simply means is diffeomorphic to and the parameterization on can be constructed accordingly. If there exist one or more critical points between and , further discussion can be found in the appendix based on Morse theory.

## 2 Surface integral estimation with a known integrand

We first consider the case that the integrand is a known function. The following theorem includes the convergence rate and asymptotic normality for .

###### Theorem 2.1

Let be a function with bounded continuous second order derivatives. Under assumptions (K1), (F1), (F2) and (H1), we have
(i)

 ˆλ(g)−λ(g)=Op(lognnhd+2+hν+1√nh3); (2.1)

(ii) if in addition and , we have

 √nh3[ˆλ(g)−λ(g)]→DN(√γμ,σ2), (2.2)

where for , is given in (4.64) and is given in (4.55); and for , is given in (4.65) and is given in (4.66).

• The proof of this theorem is built upon the ancillary results below in this section. One of the main challenges is that the domains of integrals in and

are not the same, which makes the comparison difficult. Briefly speaking, our strategy is to establish a diffeomorphism between the two domains and utilize the area formula in differential geometry to convert the surface integrals into those with the same integral domains. The heuristic behind the normalizing factor

is that, after the conversion, the source of difference arises from the integrands, which are Jacobian matrices (first derivatives), and have a standard rate of . Due to the integrals on a -dimensional manifold, we gain powers of , which results in the normalizing factor .

• For part (ii), the constraints , , as well as in assumption (H1), require that the integer .

The following lemma shows results related to the geometric concepts introduced in Subsection 1.2 for level sets. It is similar to Lemma 1 in Chen et al. (2017), but our result is under slightly different assumptions and holds uniformly for a collection of density level sets. Also see Theorems 1 and 2 in Walther (1997) for relevant results.

###### Lemma 2.1

Under assumptions (K1), (F1), and (H1), the following results hold.
(i) There exists , such that the reach for all .
(ii) When is sufficiently large, and are normal compatible for all

with probability one.

For with , and , let

 ζx(t)=x+∇f(x)∥∇f(x)∥t.

Note that is orthogonal to the tangent space of at . Furthermore, let

 tn(x)=argmint{|t|:ζx(t)∈ˆMτ},

and

 Pn(x)=ζx(tn(x)). (2.3)

Under the assumptions in Lemma 2.1, and are normal compatible for large . Hence is uniquely defined, and is a homeomorphism between and . In fact, . We write .

The following lemma gives asymptotic results for . Recall .

###### Lemma 2.2

Suppose that assumptions (K1), (F1) and (H1) hold. For any point , we have

 tn(x) =f(x)−ˆf(x)∥∇f(x)∥+δ1n(x), (2.4) ∇tn(x) =∇f(x)−∇ˆf(x)∥∇f(x)∥−∇∥∇f(x)∥∥∇f(x)∥2tn(x)+δ2n(x), (2.5)

where

 supx∈I(δ0)|δ1n(x)| =Op{(√lognnhd+hν)(√lognnhd+2+hν)}, (2.6) supx∈I(δ0)∥δ2n(x)∥ =Op{(√lognnhd+hν)(√lognnhd+4+hν)}. (2.7)

• Note that . Then the above results link the local horizontal variation with the local vertical variation as well as their derivatives. Here on the right-hand side of (2.4) and (2.5) can be understood as a directional derivative of in the gradient direction, i.e., , and it reflects the asymptotic rate of change between the local vertical and horizontal variations, as is parallel to the direction of .

Let be one of the pairs , defined in assumption (F2). The following proposition establishes the asymptotic normality theory corresponding to description in the remark after Theorem 2.1. In particular, the rate in (2.8) has been heuristically explained and the rate in (2.9) follows a similar justification.

###### Proposition 2.1

Suppose that assumptions (K1), (F1) and (F2) hold. Let

be a vector function with bounded continuous first derivatives. If

, then

 √nh3∫Ωw(θ)T[∇ˆf(ψ(θ))−E∇ˆf(ψ(θ))]dθ→DN(0,σ21). (2.8)

where is given in (4.22).

Similarly, let be a bounded continuous function. If , then

 (2.9)

where is given in (4.24).

So far we assume the integrand is known. In the next section we will consider some cases when is unknown and has to be estimated. In particular our study is in the context of convergence rate of the bandwidth selection for density level set estimation, for which is a functional of the derivatives of .

## 3 Bandwidth selection for nonparametric level set estimation

Bandwidth selection for nonparametric estimation of density level sets is considered by Qiao (2018), where he uses the following excess risk as the criterion of bandwidth selection and shows that under regularity conditions,

 E∫LΔˆL|f(x)−c|dx=12E∫M|ˆf(x)−f(x)|2∥∇f(x)∥dH(x){1+o(1)}. (3.1)

Individual bandwidth for each dimension is used in Qiao (2018). For expositional simplicity we use a single bandwidth for all the dimensions. However, the rates obtained in this section also carry through for the case of bandwidth vector. Following standard computation from (3.1), we have

 E∫LΔˆL|f(x)−c|dx=12˜m(h){1+o(1)},

where with

 b(f) =λ(∥∇f∥−1), (3.2) akl(f) =λ[f(k∗ν)f(l∗ν)∥∇f∥−1],1≤k,l≤d. (3.3)

The asymptotic optimal bandwidth which minimizes is

 ˜hopt=(dcb(f)∥K∥22(ν!)22nνκ2ν∑1≤k,l≤dakl(f))1d+2ν.

The plug-in bandwidth selector is

 ˆhopt=(dcˆb(f)∥K∥22(ν!)22nνκ2ν∑1≤k,l≤dˆakl(f))1d+2ν,

where

 ˆb(f) =ˆλ(∥∇ˆf∥−1), (3.4) ˆakl(f) =ˆλ(ˆf(k∗ν)ˆf(l∗ν)∥∇ˆf∥−1),1≤k,l≤d. (3.5)

Note that minimizes the estimated risk function . Three different pilot bandwidths , and are allowed for the estimators , and , respectively. It is suggested in Qiao (2018) that the direct plugin bandwidths for the kernel density and its first two derivatives are used as the pilot bandwidths, which are of the orders of , and , respectively (see, e.g., Wand and Jones, 1995). In the following theorem we compare the plug-in optimal bandwidth with the asymptotic optimal bandwidth . For sequences , we denote if .

###### Theorem 3.1

Suppose assumptions (K1), (F1), (F2) and (H1) hold. In addition, assume that is three times continuously differentiable on and is times continuously differentiable on . If , and , then for ,

 ˆhopt=˜hopt{1+Op(n−ν/(d+4+2ν))}, (3.6)

and

 ˆm(ˆhopt)=˜m(˜h%opt){1+Op(n−ν/(d+4+2ν))}. (3.7)

The remaining part of this section is denoted to some ancillary results used to prove the above theorem. When comparing and , we need to study the asymptotic performance of and , , which further depend on some specific forms of the following quantities with a generic function and random function , in addition to which is studied in Theorem 2.1. Define

 D(0)gn,n:=ˆλ(gn)−λ(gn), (3.8) D(1)g,n:=λ(gˆf2(k))−λ(gf2(k)), (3.9) D(2)g,n:=λ(gˆf(k,k)ˆf(l,l))−λ(gf(k,k)f(l,l)), (3.10)

for integers and such that . Note that we omit the indices and for and for simplicity.

The following theorem gives an asymptotic result for in (3.8).

###### Proposition 3.1

Suppose that assumptions (K1), (F1), (F2), and (H1) hold. For a sequence of functions , suppose that there exist sequences and as , such that and . Then

 D(0)gn,n=Op{αn(√logn√nhd+2+hν)+βn(√logn√nhd+hν)}. (3.11)

In the next proposition we consider in (3.9) and in (3.10).

###### Proposition 3.2

Suppose that assumptions (K1), (F1), and (H1) hold. Let be a bounded continuously differentiable function. If and , then

 D(1)g,n=O(1nhd+2+hν)+Op(√1nh3+1n3h2d+5+1n2hd+5), (3.12)

and

 D(2)g,n=O(1nhd+4+hν)+Op(√1nh5+1n3h2d+9+1n2hd+9). (3.13)

## 4 Proofs

Proof of Lemma 2.1

• We first show the proof of assertion (i). Let . For and , let

 ∥f∥∗r,max=max{∥f∥∗0,max,max[supx∈I(2δ0)|f(i1,⋯,ip)(x)|:max(i1,⋯,ip)≤d,1≤p≤r]}.

Let , that is, the closest distance between the set and the boundary of the set . Given assumption (F1), we have . This implies that for any with , and any , we have . the assertion that has a positive reach for immediately follows from Lemma 1 in Chen et al. (2017) and a positive lower bound of the reach is given by

 r0=min(b02,ϵ0∥f∥∗2,max).

Since is independent of , we have

 r0≤inf|τ−c|≤δ0/2ρ(Mτ). (4.1)

Next we prove assertion (ii). By Theorem 2 in Cuevas et al. (2006), we have

 dH(ˆMτ,Mτ)=O(∥^f−f∥∗0,max),a.s.

A closer examination of the proof given in Cuevas et al. (2006) reveals that the big term above is uniform in , i.e.

 supτ∈[c−δ0/2,c+δ0/2]dH(ˆMτ,Mτ)=O(∥^f−f∥∗0,max),a.s. (4.2)

Using Lemma 3 in Arias-Castro et al. (2016), it follows by assumptions (K1), (F1) and (H1) that for ,and ,

 ∥ˆf−Eˆf∥∗r,max=O(√lognnhd+2r),a.s. (4.3)

Also we follow a regular derivation procedure for kernel densities by applying a change of variables and Taylor expansion, and obtain from assumptions (K1), (F1) and (H1) that and 2,

 ∥Eˆf(x)−f(x)∥∗r,max=O(hν), (4.4)

where is the order of the kernel function. Since , we get from (4.2) that almost surely

 supτ∈[c−δ0/2,c+δ0/2]dH(ˆMτ,Mτ)=O(√lognnhd)+O(hν). (4.5)

Using (4.3) and (4.4), and following a similar argument for deriving (4.1), we can find with almost surely such that

 ˆr0≤infτ∈[c−δ0/2,c+δ0/2]ρ(ˆMτ). (4.6)

By (4.1), (4.5) and (4.6), for large enough with probability one we have

 supτ∈[c−δ0/2,c+δ0/2]dH(ˆMτ,Mτ)≤(2−√2)infτ∈[c−δ0/2,c+δ0/2]min(ρ(ˆMτ),ρ(Mτ)),

which by Theorem 1 in Chazal et al. (2007) implies that and are normal compatible for all .

Proof of Lemma 2.2

• We first focus on . The fact that and (4.5) imply

 supx∈I(δ0)|tn(x)|≤supτ∈[c−δ0/2,c+δ0/2]dH(ˆMτ,Mτ)=op(1). (4.7)

Since , using Taylor expansion for we obtain

 0=ˆf(x+∇f(x)∥∇f(x)∥tn(x))−f(x)=ˆf(x)−f(x)+∇f(x)T∇ˆf(x)∥∇f(x)∥tn(x)+en(x), (4.8)

where

 en(x)=121∥∇f(x)∥2∇f(x)T∇2ˆf(x+s1∇f(x)∥∇f(x)∥tn(x))∇f(x)tn(x)2,

for some . Plugging into (4.8), we have

 tn(x)=1∥∇f(x)∥[f(x)−ˆf(x)]+δ1n(x), (4.9)

where

 δ1n(x)=1∥∇f(x)∥2∇f(x)T[∇f