# Aggregated kernel based tests for signal detection in a regression model

Considering a regression model, we address the question of testing the nullity of the regression function. The testing procedure is available when the variance of the observations is unknown and does not depend on any prior information on the alternative. We first propose a single testing procedure based on a general symmetric kernel and an estimation of the variance of the observations. The corresponding critical values are constructed to obtain non-asymptotic level-$\alpha$ tests. We then introduce an aggregation procedure to avoid the difficult choice of the kernel and of the parameters of the kernel. The multiple tests satisfy non-asymptotic properties and are adaptive in the minimax sense over several classes of regular alternatives.


## 1 Introduction

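We observe $(X_i, Y_i)_{1 \le i \le n}$ that obey the regression model

$$Y_i = f(X_i) + \sigma \epsilon_i, \quad i = 1, \dots, n. \tag{1}$$

We assume that $X_1, \dots, X_n$ are i.i.d. real random variables with values in a measurable set $E \subset \mathbb{R}$, with bounded density $\nu$ with respect to the Lebesgue measure on $E$, and that $\epsilon_1, \dots, \epsilon_n$ are i.i.d. standard Gaussian variables, independent of $(X_1, \dots, X_n)$. Throughout the paper, $f$ is assumed to be in $L^2(E, d\nu)$. We also assume that $\|f\|_\infty < +\infty$. In order to estimate $\sigma^2$, we assume that we also observe $(Y'_i)_{1 \le i \le n}$ that obey the model

$$Y'_i = f\left(\frac{i}{n}\right) + \sigma \epsilon'_i, \quad i = 1, \dots, n, \tag{2}$$

where $(\epsilon'_i)_{1 \le i \le n}$ is independent of $(X_i, \epsilon_i)_{1 \le i \le n}$.
Given the observation of $(X_i, Y_i, Y'_i)_{1 \le i \le n}$, we want to test the null hypothesis

$$(H_0):\ f = 0,$$

against the alternative

$$(H_1):\ f \neq 0.$$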

Hypothesis testing in nonparametric regression has been considered in the papers by King (1988), Härdle and Marron (1990), Hall and Hart (1990), King et al. (1991) and Delgado (1992). Tests for no effect in nonparametric regression are investigated in Eubank and LaRiccia (1993). Spokoiny et al. (1996) considered the particular case where $\sigma$ is assumed to be known. They propose tests that achieve the minimax rates of testing (up to an unavoidable factor) for a wide range of Besov classes. Baraud et al. (2003) propose a test, based on model selection methods, for testing in a fixed design regression model that $f$ belongs to a linear subspace of $\mathbb{R}^n$ against a nonparametric alternative. They obtain rates of testing that are optimal, up to a possible logarithmic factor, over various classes of alternatives simultaneously. More recently, in a Poisson process framework, Fromont et al. (2012, 2013) consider two independent Poisson processes and address the question of testing equality of their respective intensities. They introduce tests based on a single kernel function and aggregate several kernel based tests to obtain adaptive minimax testing procedures over alternatives based on Besov or Sobolev balls.

In this work, we propose to construct aggregated kernel based testing procedures of $(H_0)$ versus $(H_1)$ in a regression model. Our test statistics are based on a single kernel function, which can be chosen either as a projection or a Gaussian kernel, and we propose an estimator of the unknown variance $\sigma^2$. Our tests are exactly (and not only asymptotically) of level $\alpha$. We obtain the optimal non-asymptotic conditions on the alternative which guarantee that the probability of second kind error is at most equal to a prescribed level $\beta$. However, the testing procedures that we introduce hereafter are also intended to overcome the question of calibrating the choice of the kernel and/or the parameters of the kernel. They are based on an aggregation approach that is well known in adaptive testing (Baraud et al. (2003) and Fromont et al. (2013)). This paper is strongly inspired by the paper of Fromont et al. (2013). Instead of considering a particular single kernel, we consider a collection of kernels and the corresponding collection of tests, each with an adapted level of significance. We then reject the null hypothesis when at least one of the tests in the collection rejects it. The aggregated testing procedures are constructed to be of level $\alpha$, and the loss in second kind error due to the aggregation, when unavoidable, is as small as possible. Then we prove that these multiple tests satisfy adaptive minimax properties over several classes of alternatives. Finally, we compare our tests with the tests investigated in Eubank and LaRiccia (1993) from a practical point of view.

The paper is organized as follows. In Section 2, we describe the single tests based on a single kernel function, with the corresponding critical values approximated by a Monte Carlo method. In Section 3, we specify the performances of the single tests for two particular examples of kernels and explain why we need to aggregate tests based on a collection of kernel functions; these aggregated tests are presented in Section 4. We present the simulation study in Section 5, and the major proofs are given in the Appendix.

## 2 Single tests based on a single kernel.

### 2.1 Definition of the testing procedure.

We assume that we observe $(X_i, Y_i)_{1 \le i \le n}$ that obey model (1). In order to estimate the unknown variance $\sigma^2$, we assume that we observe another sample $(Y'_i)_{1 \le i \le n}$ from model (2). We are interested in testing the null hypothesis $(H_0)$ against $(H_1)$. Let $K$ be a symmetric kernel function $K: E \times E \to \mathbb{R}$ satisfying:

###### Assumption 1.

$$\int_{E^2} K^2(x,y)\, f(x) f(y)\, d\nu(x)\, d\nu(y) < +\infty.$$

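We introduce the test statistic $V_K$ defined as follows,

$$V_K = \frac{T_K}{\hat\sigma_n^2}, \tag{3}$$

where

$$T_K = \frac{1}{n(n-1)} \sum_{i \neq j = 1}^{n} K(X_i, X_j)\, Y_i Y_j \tag{4}$$

and

$$\hat\sigma_n^2 = \frac{1}{n} \sum_{i=1}^{n/2} \left(Y'_{2i-1} - Y'_{2i}\right)^2, \tag{5}$$

where, for the sake of simplicity, we assume that $n$ is even. Let us now introduce some notation. We set $X = (X_1, \dots, X_n)$ and $K_{ij} = K(X_i, X_j)$, and $C(\cdot)$ denotes a constant depending only on the quantities listed in its argument, which will be used throughout the paper and may vary from line to line.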
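The definitions (3), (4) and (5) can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names, the Gaussian kernel with bandwidth `h=0.2`, the choice $f(x)=\sin(2\pi x)$, $\sigma = 0.5$ and the uniform design are assumptions made for the example, not choices taken from the paper.

```python
import math
import random


def gaussian_kernel(x, y, h=0.2):
    """K(x, y) = (1/h) k((x - y)/h) with k the standard normal density (illustrative choice)."""
    u = (x - y) / h
    return math.exp(-0.5 * u * u) / (h * math.sqrt(2.0 * math.pi))


def t_stat(kernel, xs, ys):
    """U-statistic T_K of (4): average of K(X_i, X_j) Y_i Y_j over ordered pairs i != j."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += kernel(xs[i], xs[j]) * ys[i] * ys[j]
    return total / (n * (n - 1))


def sigma2_hat(y_prime):
    """Variance estimator of (5), built from squared differences of consecutive pairs."""
    n = len(y_prime)
    if n % 2 != 0:
        raise ValueError("n must be even")
    # 0-based pair (y[2i], y[2i+1]) is the 1-based pair (Y'_{2i+1}, Y'_{2i+2}).
    pairs = sum((y_prime[2 * i] - y_prime[2 * i + 1]) ** 2 for i in range(n // 2))
    return pairs / n


def v_stat(kernel, xs, ys, y_prime):
    """Test statistic V_K = T_K / sigma_hat^2 of (3)."""
    return t_stat(kernel, xs, ys) / sigma2_hat(y_prime)


# Toy data (assumed for illustration): f(x) = sin(2 pi x), sigma = 0.5, uniform design.
rng = random.Random(0)
n, sigma = 50, 0.5
f = lambda x: math.sin(2.0 * math.pi * x)
xs = [rng.random() for _ in range(n)]
ys = [f(x) + sigma * rng.gauss(0.0, 1.0) for x in xs]
y_prime = [f(i / n) + sigma * rng.gauss(0.0, 1.0) for i in range(1, n + 1)]
print(v_stat(gaussian_kernel, xs, ys, y_prime))
```

The double loop costs $O(n^2)$ kernel evaluations, which is the natural cost of a second-order U-statistic; the kernel is passed as an argument so that any symmetric kernel can be plugged in.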
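The expectation of $T_K$ is equal to

$$\begin{aligned}
\mathbb{E}[T_K] &= \mathbb{E}\left[\mathbb{E}\left[\frac{1}{n(n-1)} \sum_{i \neq j = 1}^{n} K_{ij}\,\big(f(X_i) + \sigma\epsilon_i\big)\big(f(X_j) + \sigma\epsilon_j\big) \,\Big|\, X\right]\right] \\
&= \mathbb{E}\left[\frac{1}{n(n-1)} \sum_{i \neq j = 1}^{n} K_{ij}\, f(X_i) f(X_j)\right] \\
&= \int_{E^2} K(x,y)\, f(x) f(y)\, d\nu(x)\, d\nu(y).
\end{aligned}$$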

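In the following, we denote for all $x \in E$,

$$K[f](x) = \int_E K(x,y)\, f(y)\, d\nu(y),$$

and for all $f, g \in L^2(E, d\nu)$,

$$\langle f, g \rangle = \int_E f(x) g(x)\, d\nu(x) \quad\text{and}\quad \|f\|^2 = \langle f, f \rangle.$$

With this notation,

$$\mathbb{E}(T_K) = \langle K[f], f \rangle, \tag{6}$$

whose existence is ensured by Assumption 1. We now compute the expectation of $\hat\sigma_n^2$.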

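$$\begin{aligned}
\mathbb{E}[\hat\sigma_n^2] &= \mathbb{E}\left[\frac{1}{n} \sum_{i=1}^{n/2} \left[\left(f\left(\tfrac{2i-1}{n}\right) + \sigma\epsilon'_{2i-1}\right) - \left(f\left(\tfrac{2i}{n}\right) + \sigma\epsilon'_{2i}\right)\right]^2\right] \\
&= \frac{1}{n} \sum_{i=1}^{n/2} \left[f\left(\tfrac{2i-1}{n}\right) - f\left(\tfrac{2i}{n}\right)\right]^2 + \frac{1}{n} \sum_{i=1}^{n/2} \sigma^2\, \mathbb{E}\left(\epsilon'_{2i-1} - \epsilon'_{2i}\right)^2 \\
&= a^2 + \sigma^2,
\end{aligned}$$

with $a^2 = \frac{1}{n} \sum_{i=1}^{n/2} \left[f\left(\tfrac{2i-1}{n}\right) - f\left(\tfrac{2i}{n}\right)\right]^2$.
Thus $\hat\sigma_n^2$ is a biased estimator of $\sigma^2$ with bias $a^2$. If $f$ is a regular function, this bias will be small.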
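To make "small" concrete: if $f$ is Lipschitz with constant $L$, each squared difference is at most $L^2/n^2$ and there are $n/2$ of them, so $a^2 \le L^2/(2n^2)$. The sketch below evaluates $a^2$ numerically for an illustrative function; the name `bias_a2` and the choice $f(x) = \sin(2\pi x)$ are assumptions made for the example.

```python
import math


def bias_a2(f, n):
    """a^2 = (1/n) * sum_{i=1}^{n/2} [f((2i-1)/n) - f(2i/n)]^2, for even n."""
    if n % 2 != 0:
        raise ValueError("n must be even")
    return sum((f((2 * i - 1) / n) - f(2 * i / n)) ** 2
               for i in range(1, n // 2 + 1)) / n


# Illustrative smooth function, Lipschitz with constant L = 2 * pi.
smooth = lambda x: math.sin(2.0 * math.pi * x)
for n in (10, 100, 1000):
    print(n, bias_a2(smooth, n), (2.0 * math.pi) ** 2 / (2.0 * n ** 2))
```

Each printed line shows $n$, the bias $a^2$, and the Lipschitz bound $L^2/(2n^2)$, so the decay in $n^{-2}$ can be read off directly.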
In this paper, we consider and study two possible examples of kernel functions. For each example, we give a simpler expression of $\mathbb{E}(T_K)$.

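Example 1. When $E = [0,1]$, our first choice for $K$ is a symmetric kernel function based on a finite orthonormal family $\{\phi_\lambda, \lambda \in \Lambda\}$ with respect to the scalar product $\langle \cdot, \cdot \rangle$,

$$K(x,y) = \sum_{\lambda \in \Lambda} \phi_\lambda(x) \phi_\lambda(y). \tag{7}$$

For all $f$ in $L^2([0,1], d\nu)$ we get

$$K[f](x) = \int_0^1 \Big(\sum_{\lambda \in \Lambda} \phi_\lambda(x) \phi_\lambda(y)\Big) f(y)\, d\nu(y) = \sum_{\lambda \in \Lambda} \Big(\int_0^1 \phi_\lambda(y) f(y)\, d\nu(y)\Big) \phi_\lambda(x) = \Pi_S(f)(x),$$

where $S$ is the subspace of $L^2([0,1], d\nu)$ generated by the functions $\{\phi_\lambda, \lambda \in \Lambda\}$ and $\Pi_S$ denotes the orthogonal projection onto $S$ for $\langle \cdot, \cdot \rangle$. Thus

$$\mathbb{E}(T_K) = \langle \Pi_S(f), f \rangle.$$

Hence, when $\{\phi_\lambda, \lambda \in \Lambda\}$ is well chosen, $T_K$ can also be viewed as a relevant estimator of $\|f\|^2$.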
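A short worked identity may help explain why $T_K$ targets $\|f\|^2$ in this example. It uses only the fact that $\Pi_S$ is an orthogonal projection, so that $f - \Pi_S(f)$ is orthogonal to $\Pi_S(f)$:

$$\langle \Pi_S(f), f \rangle = \langle \Pi_S(f), \Pi_S(f) \rangle + \langle \Pi_S(f), f - \Pi_S(f) \rangle = \|\Pi_S(f)\|^2 = \|f\|^2 - \|f - \Pi_S(f)\|^2.$$

In other words, $\mathbb{E}(T_K)$ falls short of $\|f\|^2$ by exactly the squared approximation error of $f$ by the subspace $S$, which is small when the family is well adapted to $f$.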

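Example 2. When $E = \mathbb{R}$ and $\nu$ is a density function with respect to the Lebesgue measure on $\mathbb{R}$, our second choice for $K$ is a Gaussian kernel defined by

$$K(x,y) = \frac{1}{h}\, k\left(\frac{x-y}{h}\right), \quad \text{for all } (x,y) \in \mathbb{R}^2, \tag{8}$$

where $k(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right)$ and $h$ is a positive bandwidth. Then, for all $x \in \mathbb{R}$ we have

$$K[f](x) = \int_{-\infty}^{\infty} \frac{1}{h}\, k\left(\frac{x-y}{h}\right) f(y)\, d\nu(y) = k_h * f(x),$$

where $*$ is the convolution product with respect to the measure $\nu$ and $k_h = \frac{1}{h} k\left(\frac{\cdot}{h}\right)$. Thus in this case

$$\mathbb{E}(T_K) = \langle k_h * f, f \rangle.$$

Hence, when the bandwidth $h$ is well chosen, $T_K$ can also be viewed as a relevant estimator of $\|f\|^2$.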

From the two examples above, we have seen that $T_K$ can be viewed as a relevant estimator of $\|f\|^2$. Thus, it seems reasonable to consider a test which rejects $(H_0)$ when $V_K$ is "large enough". Now, we define the critical values used in our tests.

We define

$$V_K^{(0)} = \frac{\frac{1}{n(n-1)} \sum_{i \neq j = 1}^{n} K(X_i, X_j)\, \epsilon_i \epsilon_j}{\frac{1}{n} \sum_{i=1}^{n/2} \left(\epsilon'_{2i-1} - \epsilon'_{2i}\right)^2}. \tag{9}$$

Note that, under $(H_0)$, conditionally on $X$, $V_K$ and $V_K^{(0)}$ have exactly the same distribution. We now choose the $(1-\alpha)$ quantile of the conditional distribution of $V_K^{(0)}$ given $X$ as the critical value for our test. This quantity can easily be estimated by simulations.

More precisely, for $\alpha$ in $(0,1)$, if $q^{(X)}_{K,1-\alpha}$ denotes the $(1-\alpha)$ quantile of the distribution of $V_K^{(0)}$ conditionally on $X$, we consider the test that rejects $(H_0)$ when $V_K > q^{(X)}_{K,1-\alpha}$. The corresponding test function is defined by

$$\Phi_{K,\alpha} = \mathbb{1}_{\left\{V_K > q^{(X)}_{K,1-\alpha}\right\}}. \tag{10}$$

Notice that in practice, the true quantile $q^{(X)}_{K,1-\alpha}$ is not available, but it can be approximated by a Monte Carlo procedure.

### 2.2 Probabilities of first and second kind errors of the test.

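Since under $(H_0)$, $V_K$ and $V_K^{(0)}$ have the same distribution conditionally on $X$, for any $\alpha \in (0,1)$, we have

$$P_{(H_0)}\left(V_K > q^{(X)}_{K,1-\alpha} \,\Big|\, X\right) \le \alpha.$$

By taking the expectation over $X$, we obtain

$$P_{(H_0)}\left(\Phi_{K,\alpha} = 1\right) \le \alpha.$$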

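Let us now consider an alternative hypothesis, corresponding to a non-zero regression function $f$. Given $\beta$ in $(0,1)$, we now aim to determine a non-asymptotic condition on the regression function $f$ which guarantees that $P_f(\Phi_{K,\alpha} = 0) \le \beta$. Denoting by $q^{\alpha}_{K,1-\beta/2}$ the $(1-\beta/2)$ quantile of the conditional quantile $q^{(X)}_{K,1-\alpha}$,

$$\begin{aligned}
P_f(\Phi_{K,\alpha} = 0) &\le P_f\left(V_K \le q^{\alpha}_{K,1-\beta/2}\right) + P_f\left(V_K \le q^{(X)}_{K,1-\alpha},\ q^{(X)}_{K,1-\alpha} > q^{\alpha}_{K,1-\beta/2}\right) \\
&\le P_f\left(V_K \le q^{\alpha}_{K,1-\beta/2}\right) + \beta/2.
\end{aligned}$$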
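The bound above can be unpacked by splitting on whether the conditional quantile exceeds $q^{\alpha}_{K,1-\beta/2}$. The following sketch of the intermediate steps uses only the definition of $q^{\alpha}_{K,1-\beta/2}$ as the $(1-\beta/2)$ quantile of the random variable $q^{(X)}_{K,1-\alpha}$:

$$\begin{aligned}
P_f\left(V_K \le q^{(X)}_{K,1-\alpha}\right)
&= P_f\left(V_K \le q^{(X)}_{K,1-\alpha},\ q^{(X)}_{K,1-\alpha} \le q^{\alpha}_{K,1-\beta/2}\right) + P_f\left(V_K \le q^{(X)}_{K,1-\alpha},\ q^{(X)}_{K,1-\alpha} > q^{\alpha}_{K,1-\beta/2}\right) \\
&\le P_f\left(V_K \le q^{\alpha}_{K,1-\beta/2}\right) + P_f\left(q^{(X)}_{K,1-\alpha} > q^{\alpha}_{K,1-\beta/2}\right) \\
&\le P_f\left(V_K \le q^{\alpha}_{K,1-\beta/2}\right) + \beta/2.
\end{aligned}$$

The point of the split is that $q^{\alpha}_{K,1-\beta/2}$ is deterministic, so the remaining probability only involves the fluctuations of $V_K$.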

Thus, a condition which guarantees that $P_f\left(V_K \le q^{\alpha}_{K,1-\beta/2}\right) \le \beta/2$ will ensure that $P_f(\Phi_{K,\alpha} = 0) \le \beta$. The following proposition gives such a condition.

###### Proposition 2.1.

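Let $\alpha, \beta$ be fixed levels in $(0,1)$. We have that

$$P_f\left(V_K \le q^{\alpha}_{K,1-\beta/2}\right) \le \beta/2,$$

as soon as

$$\langle K[f], f \rangle \ge \sqrt{\frac{16 A_K + 8 B_K}{\beta}} + D_{n,\beta}\, q^{\alpha}_{K,1-\beta/2}, \tag{11}$$

with

$$\begin{aligned}
A_K &= \frac{n-2}{n(n-1)} \int_E \big(K[f](x)\big)^2 \left[f^2(x) + \sigma^2\right] d\nu(x), \\
B_K &= \frac{1}{n(n-1)} \int_{E^2} K^2(x,y) \left[f^2(x) + \sigma^2\right]\left[f^2(y) + \sigma^2\right] d\nu(x)\, d\nu(y), \\
D_{n,\beta} &= \sigma^2 + a^2 + \frac{4\sigma^2}{n} \sqrt{\left(\frac{n}{2} + \frac{n a^2}{\sigma^2}\right) \ln\left(\frac{2}{\beta}\right)} + \frac{4\sigma^2}{n} \ln\left(\frac{2}{\beta}\right).
\end{aligned}$$

Thus we have, under (11),

$$P_f(\Phi_{K,\alpha} = 0) \le \beta.$$

Moreover, there exists some constant $\kappa > 0$ such that

$$q^{\alpha}_{K,1-\beta/2} \le \frac{2\kappa}{\sqrt{n(n-1)}} \ln\left(\frac{2}{\alpha}\right) \sqrt{\frac{2 \int_{E^2} K^2(x,y)\, d\nu(x)\, d\nu(y)}{\beta}}. \tag{12}$$

To prove the first part of this result, we simply use Markov's inequality for the term $T_K$ and an exponential inequality for non-central chi-square variables due to Birgé (2001) for the term $\hat\sigma_n^2$. The control of $q^{\alpha}_{K,1-\beta/2}$ derives from a property of Gaussian chaoses combined with an exponential inequality (due to De la Peña and Giné (2012) and Hušková and Janssen (1993)). The detailed proof is given in the Appendix.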
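As a sketch of where the quantity $D_{n,\beta}$ may come from, assume the following standard form of the exponential inequality for a non-central chi-square variable $Z$ with $d$ degrees of freedom and non-centrality parameter $b$ (the exact statement used in the Appendix may differ): for all $x > 0$,

$$P\left(Z \ge d + b + 2\sqrt{(d + 2b)\,x} + 2x\right) \le e^{-x}.$$

Assuming further that the $\epsilon'_i$ are i.i.d. standard Gaussian, the differences $Y'_{2i-1} - Y'_{2i}$ are independent Gaussian variables with variance $2\sigma^2$, so $Z = n\hat\sigma_n^2 / (2\sigma^2)$ is non-central chi-square with $d = n/2$ and $b = n a^2 / (2\sigma^2)$. Taking $x = \ln(2/\beta)$ gives, with probability at least $1 - \beta/2$,

$$\hat\sigma_n^2 \le \frac{2\sigma^2}{n}\left[\frac{n}{2} + \frac{n a^2}{2\sigma^2} + 2\sqrt{\left(\frac{n}{2} + \frac{n a^2}{\sigma^2}\right)\ln\left(\frac{2}{\beta}\right)} + 2\ln\left(\frac{2}{\beta}\right)\right] = \sigma^2 + a^2 + \frac{4\sigma^2}{n}\sqrt{\left(\frac{n}{2} + \frac{n a^2}{\sigma^2}\right)\ln\left(\frac{2}{\beta}\right)} + \frac{4\sigma^2}{n}\ln\left(\frac{2}{\beta}\right),$$

which has the same form as $D_{n,\beta}$ in Proposition 2.1.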

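The following theorem gives a condition on $\|f\|^2$ for the test to be powerful.

###### Theorem 2.2.

Let $\alpha, \beta$ be fixed levels in $(0,1)$, $\kappa$ be a positive constant, $K$ be a symmetric kernel function, and $\Phi_{K,\alpha}$ be the test defined by (10). Let $C_K$ be an upper bound for $\int_{E^2} K^2(x,y)\, d\nu(x)\, d\nu(y)$. Then we have $P_f(\Phi_{K,\alpha} = 0) \le \beta$ as soon as

$$\|f\|^2 \ge \|f - K[f]\|^2 + \frac{16\left(\|f\|_\infty^2 + \sigma^2\right)}{n\beta} + \frac{4}{\sqrt{n(n-1)\beta}} \left(\kappa D_{n,\beta} \ln\left(\frac{2}{\alpha}\right) + \sqrt{2}\left(\|f\|_\infty^2 + \sigma^2\right)\right) \sqrt{C_K}. \tag{13}$$

The right-hand side of the above inequality corresponds to a bias-variance trade-off. For particular choices of the kernel function $K$, these terms will be upper bounded in Section 3.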

### 2.3 Performance of the Monte Carlo approximation.

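In this section, we introduce a Monte Carlo method used to approximate the conditional quantiles $q^{(X)}_{K,1-\alpha}$ by $\hat q^{(X)}_{K,1-\alpha}$ as follows. We consider the set of independent sequences of i.i.d. standard Gaussian variables

$$\{\epsilon^b,\ 1 \le b \le B\} \quad\text{and}\quad \{\epsilon'^b,\ 1 \le b \le B\},$$

where $\epsilon^b = (\epsilon^b_i)_{1 \le i \le n}$ and $\epsilon'^b = (\epsilon'^b_i)_{1 \le i \le n}$ for $1 \le b \le B$.
We define

$$V_K^{(\epsilon^b, \epsilon'^b)} = \frac{\frac{1}{n(n-1)} \sum_{i \neq j = 1}^{n} K(X_i, X_j)\, \epsilon^b_i \epsilon^b_j}{\frac{1}{n} \sum_{i=1}^{n/2} \left(\epsilon'^b_{2i-1} - \epsilon'^b_{2i}\right)^2},$$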

where the $X_i$ are the observed design points of model (1).
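Under $(H_0)$, conditionally on $X$, the variables $V_K^{(\epsilon^b, \epsilon'^b)}$ have the same distribution function as $V_K$ and as $V_K^{(0)}$. We denote by $F_{K,B}$ the empirical distribution function of the sample $\left(V_K^{(\epsilon^b, \epsilon'^b)}\right)_{1 \le b \le B}$, conditionally on $X$:

$$\forall x \in \mathbb{R}, \quad F_{K,B}(x) = \frac{1}{B} \sum_{b=1}^{B} \mathbb{1}_{\left\{V_K^{(\epsilon^b, \epsilon'^b)} \le x\right\}}.$$

Then the Monte Carlo approximation of $q^{(X)}_{K,1-\alpha}$ is defined by

$$\hat q^{(X)}_{K,1-\alpha} = F_{K,B}^{-1}(1-\alpha) = \inf\left\{t \in \mathbb{R},\ F_{K,B}(t) \ge 1 - \alpha\right\}.$$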

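We recall that the test function defined in (10) rejects $(H_0)$ when $V_K > q^{(X)}_{K,1-\alpha}$, with $q^{(X)}_{K,1-\alpha}$ the $(1-\alpha)$ quantile of $V_K^{(0)}$ defined by (9), conditionally on $X$. Now, by using the estimated quantile $\hat q^{(X)}_{K,1-\alpha}$, we consider the test given by

$$\hat\Phi_{K,\alpha} = \mathbb{1}_{\left\{V_K > \hat q^{(X)}_{K,1-\alpha}\right\}}. \tag{14}$$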
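The Monte Carlo test (14) can be sketched as follows in plain Python. The function names, the default values `alpha=0.05` and `n_rep=200` (the number $B$ of replicates), the seeds and the toy data are assumptions made for illustration; the constant kernel used at the end is simply the projection kernel onto constant functions.

```python
import math
import random


def kernel_matrix(kernel, xs):
    """Precompute K(X_i, X_j) with a zero diagonal, so sums skip the terms i == j."""
    n = len(xs)
    return [[kernel(xs[i], xs[j]) if i != j else 0.0 for j in range(n)]
            for i in range(n)]


def ratio_stat(kmat, u, v):
    """[1/(n(n-1)) sum_{i!=j} K_ij u_i u_j] / [(1/n) sum over pairs of squared differences of v]."""
    n = len(u)
    num = sum(kmat[i][j] * u[i] * u[j] for i in range(n) for j in range(n))
    den = sum((v[2 * i] - v[2 * i + 1]) ** 2 for i in range(n // 2))
    return (num / (n * (n - 1))) / (den / n)


def mc_quantile(kmat, alpha, n_rep, rng):
    """Empirical (1 - alpha) quantile: inf{t : F_{K,B}(t) >= 1 - alpha}."""
    n = len(kmat)
    sims = []
    for _ in range(n_rep):
        eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
        eps_prime = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sims.append(ratio_stat(kmat, eps, eps_prime))
    sims.sort()
    # Smallest k with k / B >= 1 - alpha; the 1e-9 guards against float round-off.
    k = max(math.ceil((1.0 - alpha) * n_rep - 1e-9), 1)
    return sims[k - 1]


def mc_test(kernel, xs, ys, y_prime, alpha=0.05, n_rep=200, seed=0):
    """Return 1 if V_K exceeds the Monte Carlo critical value, else 0."""
    kmat = kernel_matrix(kernel, xs)
    v_obs = ratio_stat(kmat, ys, y_prime)
    return int(v_obs > mc_quantile(kmat, alpha, n_rep, random.Random(seed)))


# Toy data (assumed): constant signal f = 3, sigma = 1, uniform design on [0, 1].
rng = random.Random(1)
n = 30
xs = [rng.random() for _ in range(n)]
ys = [3.0 + rng.gauss(0.0, 1.0) for _ in xs]
y_prime = [3.0 + rng.gauss(0.0, 1.0) for _ in range(n)]
const_kernel = lambda x, y: 1.0
print(mc_test(const_kernel, xs, ys, y_prime))
```

Because $\sigma$ cancels in the ratio under $(H_0)$, the simulated statistics only need standard Gaussian noise and the observed design, which is what makes the critical value computable without knowing $\sigma$.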

For the test defined in (14), the probabilities of first and second kind errors can be upper bounded. This is the purpose of the two following propositions, whose proofs are given in Fromont et al. (2013).

###### Proposition 2.3.

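Let $\alpha$ be some fixed level in $(0,1)$, and $\hat\Phi_{K,\alpha}$ be the test defined by (14). Then,

$$P_{(H_0)}\left(\hat\Phi_{K,\alpha} = 1 \,\Big|\, X\right) \le \frac{B\alpha + 1}{B + 1}.$$

###### Proposition 2.4.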

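Let $\alpha$ and $\beta$ be fixed levels in $(0,1)$ such that and . Let $\hat\Phi_{K,\alpha}$ be the test given in (14). Let $A_K$ and $B_K$ be as in Proposition 2.1, and let $q^{\alpha_B}_{K,1-\beta_B/2}$ be the $(1-\beta_B/2)$ quantile of $q^{(X)}_{K,1-\alpha_B}$. If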

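$$\langle K[f], f \rangle > \sqrt{\frac{16 A_K + 8 B_K}{\beta}} + D_{n,\beta_B}\, q^{\alpha_B}_{K,1-\beta_B/2}, \tag{15}$$

then $P_f(\hat\Phi_{K,\alpha} = 0) \le \beta$. Moreover,

$$q^{\alpha_B}_{K,1-\beta_B/2} \le \frac{2\kappa}{\sqrt{n(n-1)}} \ln\left(\frac{2}{\alpha_B}\right) \sqrt{\frac{2 \int_{E^2} K^2(x,y)\, d\nu(x)\, d\nu(y)}{\beta_B}}. \tag{16}$$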

Comments. When comparing (15) and (16) with (11) and (12) in Proposition 2.1, we notice that they asymptotically coincide when $B \to +\infty$. Moreover, if and , the multiplicative factor of is of order in (16) compared with (12).

## 3 Two particular examples of kernel function.

In this section, we specify the performances of the above test for two examples of kernels: projection kernels and Gaussian kernels.

### 3.1 Projection kernels.

We assume $E = [0,1]$. We consider the projection kernel defined in (7) and aim to give a more explicit formulation of the result of Theorem 2.2 for this choice of kernel. We also evaluate the uniform separation rates over Besov bodies.

###### Corollary 3.1.

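Let $\alpha, \beta \in (0,1)$ and $\kappa$ be a constant. Let $\Phi_{K,\alpha}$ be defined in (10), where $K$ is the projection kernel defined by (7). We denote by $S$ the linear subspace of $L^2([0,1], d\nu)$ generated by the functions $\{\phi_\lambda, \lambda \in \Lambda\}$, and we assume that the dimension of $S$ is equal to $D$. Then if

$$\|f\|^2 \ge \|f - \Pi_S(f)\|^2 + \frac{16\left(\|f\|_\infty^2 + \sigma^2\right)}{n\beta} + \frac{4\sqrt{D}}{\sqrt{n(n-1)\beta}} \left(\kappa D_{n,\beta} \ln\left(\frac{2}{\alpha}\right) + \sqrt{2}\left(\|f\|_\infty^2 + \sigma^2\right)\right),$$

then

$$P_f(\Phi_{K,\alpha} = 0) \le \beta.$$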

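Let us consider the particular case where the kernel is the projection kernel onto the space generated by functions of the Haar basis, defined as follows.
Let $\{\phi_0, \phi_{j,k},\ j \in \mathbb{N},\ k \in \{0, \dots, 2^j - 1\}\}$ be the Haar basis of $L^2([0,1])$ with

$$\phi_0(x) = \mathbb{1}_{[0,1]}(x) \quad\text{and}\quad \phi_{j,k}(x) = 2^{j/2}\, \psi(2^j x - k), \tag{17}$$

where $\psi = \mathbb{1}_{[0,1/2)} - \mathbb{1}_{[1/2,1)}$. The linear subspace $S$ is generated by a subset of the Haar basis. More precisely, we denote by $S_0$ the subspace of $L^2([0,1], d\nu)$ generated by $\phi_0$, and we define

$$K_0(x, x') = \phi_0(x)\, \phi_0(x'). \tag{18}$$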

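We also consider, for $J \ge 1$, the subspace $S_J$ generated by $\{\phi_\lambda, \lambda \in \{0\} \cup \Lambda_J\}$ with $\Lambda_J = \{(j,k),\ j \in \{0, \dots, J-1\},\ k \in \{0, \dots, 2^j - 1\}\}$, and

$$K_J(x, x') = \sum_{\lambda \in \{0\} \cup \Lambda_J} \phi_\lambda(x)\, \phi_\lambda(x'). \tag{19}$$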
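A minimal sketch of the Haar projection kernel (19) is given below. It assumes that $\Lambda_J$ collects the levels $j = 0, \dots, J-1$ with $k = 0, \dots, 2^j - 1$, and that $\psi$ is the usual Haar mother wavelet; both are assumptions made for this illustration, and the function names are hypothetical. Under these assumptions $S_J$ is the space of functions that are constant on dyadic cells of width $2^{-J}$, so $K_J$ coincides on $[0,1)^2$ with a histogram kernel, which the code checks numerically.

```python
def haar_psi(x):
    """Haar mother wavelet: 1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0


def haar_phi(j, k, x):
    """phi_{j,k}(x) = 2^{j/2} psi(2^j x - k), as in (17)."""
    return 2.0 ** (j / 2.0) * haar_psi(2.0 ** j * x - k)


def haar_kernel(J, x, y):
    """K_J(x, y): phi_0(x) phi_0(y) plus the sum over levels j < J (assumed Lambda_J)."""
    inside = 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0
    total = 1.0 if inside else 0.0  # phi_0 is the indicator of [0, 1]
    for j in range(J):
        for k in range(2 ** j):
            total += haar_phi(j, k, x) * haar_phi(j, k, y)
    return total


def histogram_kernel(J, x, y):
    """Closed form on [0, 1): 2^J if x and y share a dyadic cell of width 2^-J, else 0."""
    same_cell = int(x * 2 ** J) == int(y * 2 ** J)
    return float(2 ** J) if same_cell else 0.0


# Compare the two expressions on a few points of [0, 1).
points = [0.05, 0.3, 0.55, 0.8, 0.95]
for J in range(4):
    gap = max(abs(haar_kernel(J, x, y) - histogram_kernel(J, x, y))
              for x in points for y in points)
    print(J, gap)
```

The closed form is much cheaper to evaluate than the wavelet sum, which matters when the kernel matrix has to be recomputed for every level $J$ of a collection.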

We set and for every , .
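We now introduce the Besov body $B^{\delta}_{2,\infty}(R)$ defined for $\delta > 0$ and $R > 0$ by

$$B^{\delta}_{2,\infty}(R) = \left\{ f \in L^2([0,1], d\nu),\ f = \alpha_0 \phi_0 + \sum_{j \in \mathbb{N}} \sum_{k=0}^{2^j - 1} \alpha_{j,k}\, \phi_{j,k} \ \Big/\ \alpha_0^2 \le R^2,\ \forall j \in \mathbb{N},\ \sum_{k=0}^{2^j - 1} \alpha_{j,k}^2 \le R^2\, 2^{-2j\delta} \right\}.$$

For all $J \in \mathbb{N}$, we consider the kernel function $K_J$ defined by (18), (19) and the associated test function $\Phi_{K_J,\alpha}$ defined in (10) with $K = K_J$. For an optimal choice of $J$, realizing a good compromise between the bias term and the variance term appearing in Theorem 2.2, we give a condition on $\|f\|^2$ for $f \in B^{\delta}_{2,\infty}(R)$ which ensures that the power of our test is larger than $1 - \beta$.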

###### Proposition 3.2.

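Let $\alpha, \beta \in (0,1)$. For all $J \in \mathbb{N}$, let $K_J$ be defined by (18), (19) and consider the test function $\Phi_{K_{J^*},\alpha}$ where

$$J^* = \left[\log_2\left(n^{2/(1+4\delta)}\right)\right]. \tag{20}$$

For all $f \in B^{\delta}_{2,\infty}(R)$ such that

$$\|f\|^2 \ge C(\alpha, \beta, \sigma, R, \|f\|_\infty)\, n^{-4\delta/(1+4\delta)}, \tag{21}$$

we have $P_f(\Phi_{K_{J^*},\alpha} = 0) \le \beta$.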

1. Non-asymptotic lower bounds for the rates of testing in signal detection over Besov bodies are given in Baraud et al. (2002). These lower bounds coincide with the bound given in (21), hence our result is sharp.

2. In (20), $J^*$ depends on $\delta$, the regularity parameter of the Besov body, which raises the natural question of the choice of this parameter. In order to propose a procedure that is adaptive with respect to the regularity of the unknown regression function $f$, we introduce aggregated tests in Section 4.
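A heuristic sketch of why $J^*$ takes the form (20) is the following. It assumes that $S_J$ has dimension $2^J$ and that the Haar functions are orthonormal for $\langle \cdot, \cdot \rangle$; both are assumptions made here for illustration, and constants are ignored. For $f$ in the Besov body, the bias term satisfies

$$\|f - \Pi_{S_J}(f)\|^2 = \sum_{j \ge J} \sum_{k=0}^{2^j - 1} \alpha_{j,k}^2 \le R^2 \sum_{j \ge J} 2^{-2j\delta} = \frac{R^2\, 2^{-2J\delta}}{1 - 2^{-2\delta}},$$

while the variance term of Corollary 3.1 with $D = 2^J$ is of order $2^{J/2}/n$. Balancing the two orders gives

$$2^{-2J\delta} \asymp \frac{2^{J/2}}{n} \iff 2^{J} \asymp n^{2/(1+4\delta)},$$

and for this choice both terms are of order $n^{-4\delta/(1+4\delta)}$, which is the rate appearing in (21). The remaining term of order $1/n$ is negligible in comparison.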

### 3.2 Gaussian kernels.

For this second example, we assume that $E = \mathbb{R}$. We consider the Gaussian kernel defined in (8) and rewrite the result of Theorem 2.2 for this choice of kernel. We also evaluate the uniform separation rates over Sobolev balls for this test.

###### Corollary 3.3.

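Let $\alpha, \beta \in (0,1)$, $\kappa$ be a constant and $\Phi_{K,\alpha}$ be the test function defined in (10), where $K$ is defined in (8). For $h > 0$, if

$$\|f\|^2 \ge \|f - k_h * f\|^2 + \frac{16\left(\|f\|_\infty^2 + \sigma^2\right)}{n\beta} + \frac{4 \|\nu\|_\infty}{(2\pi)^{1/4} \sqrt{n(n-1)\beta h}} \left(\kappa D_{n,\beta} \ln\left(\frac{2}{\alpha}\right) + \sqrt{2}\left(\|f\|_\infty^2 + \sigma^2\right)\right), \tag{22}$$

then

$$P_f(\Phi_{K,\alpha} = 0) \le \beta.$$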

Let and . For in and , for all , we consider

$$K_l(x,y) = \frac{1}{2^{-l}}\, k\left(\frac{x-y}{2^{-l}}\right), \tag{23}$$

with

$$k(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right).$$

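Let us introduce for $\delta > 0$ and $R > 0$ the Sobolev ball $S^{\delta}(R)$ defined by

$$S^{\delta}(R) = \left\{ s: \mathbb{R} \to \mathbb{R} \ \Big/\ s \in L^1(\mathbb{R}) \cap L^2(\mathbb{R}),\ \int_{\mathbb{R}} |u|^{2\delta}\, |\hat s(u)|^2\, du \le 2\pi R^2 \right\},$$

where $\hat s$ denotes the Fourier transform of $s$: $\hat s(u) = \int_{\mathbb{R}} s(x)\, e^{i x u}\, dx$.
For all $l \in \mathbb{N}$, we consider the kernel function $K_l$ defined by (23) and the associated test function $\Phi_{K_l,\alpha}$ defined in (10) with $K = K_l$. For an optimal choice of $l$, realizing a good compromise between the bias term and the variance term appearing in Corollary 3.3, we give a condition on $\|f\|^2$ for $f \in S^{\delta}(R)$ which ensures that the power of our test is larger than $1 - \beta$.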

###### Proposition 3.4.

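Let $\alpha, \beta \in (0,1)$. For all $l \in \mathbb{N}$, let $K_l$ be defined by (23) and $\Phi_{K_l,\alpha}$ be the associated test function. We set

$$l^* = \left[\log_2\left(n^{2/(1+4\delta)}\right)\right]. \tag{24}$$

For all $f \in S^{\delta}(R)$ such that

$$\|f\|^2 \ge C(\alpha, \beta, \sigma, R, \|f\|_\infty)\, n^{-4\delta/(1+4\delta)}, \tag{25}$$

we have $P_f(\Phi_{K_{l^*},\alpha} = 0) \le \beta$.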

1. As in Proposition 3.2, the right-hand side of (25) is a classical bound for the separation rates of testing over regular classes of alternatives such as Hölderian balls (see Ingster (1993) for nonparametric minimax rates of testing in various setups).

2. Non-asymptotic lower bounds for the rates of testing in signal detection over Sobolev balls are given in Fromont and Lévy-Leduc (2006). These bounds coincide with the bound given in (25).

3. In (24), as previously, $l^*$ depends on $\delta$, the regularity parameter of the Sobolev ball, which raises the natural question of the choice of this parameter; it is answered through the aggregated tests of Section 4.
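The dependence of the level $l^*$ (and of the corresponding bandwidth $2^{-l^*}$) on the unknown regularity $\delta$ can be made concrete with a few lines of Python. The sketch reads $[\cdot]$ in (24) as the integer part, which is an assumption, and the function names are hypothetical; the same formula applies to $J^*$ in (20).

```python
import math


def optimal_level(n, delta):
    """l* = [log2(n^(2/(1+4*delta)))], reading [.] as the integer part (assumed)."""
    return math.floor(2.0 / (1.0 + 4.0 * delta) * math.log2(n))


def separation_rate(n, delta):
    """Order n^(-4*delta/(1+4*delta)) of the separation rate in (21) and (25)."""
    return n ** (-4.0 * delta / (1.0 + 4.0 * delta))


for delta in (0.5, 1.0, 2.0):
    l_star = optimal_level(1000, delta)
    print(delta, l_star, 2.0 ** (-l_star), separation_rate(1000, delta))
```

Each line prints $\delta$, the level $l^*$, the bandwidth $2^{-l^*}$ and the rate: smoother alternatives call for a coarser level, so no single choice is suitable for all $\delta$, which is the motivation for aggregating over a collection of levels.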

## 4 Multiple or aggregated tests based on collections of kernel functions.

In the previous section, we have considered testing procedures based on a single kernel function $K$. However, the following question is natural: how can we choose the kernel and its parameters, for instance the orthonormal family for the projection kernel of Section 3.1, or the bandwidth for the Gaussian kernel of Section 3.2? Baraud et al. (2003) proposed adaptive testing procedures based on the aggregation of a collection of tests. This idea has been developed in a series of papers, among which Fromont et al. (2013), who proposed an aggregation procedure. Following this idea, we consider in this section a collection of kernel functions instead of a single one. We then define a multiple testing procedure by aggregating the corresponding single tests, with an adapted choice of the critical values.

### 4.1 The aggregated testing procedure.

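Let us describe the aggregated testing procedure by introducing a finite collection of symmetric kernel functions $\{K_m, m \in \mathcal{M}\}$. For $m \in \mathcal{M}$, we replace $K$ in (3) and (9) by $K_m$ to define $V_{K_m}$ and $V^{(0)}_{K_m}$, and we let $\{w_m, m \in \mathcal{M}\}$ be a collection of positive numbers such that $\sum_{m \in \mathcal{M}} e^{-w_m} \le 1$. Conditionally on $X$, for $u \in (0,1)$, we denote by $q^{(X)}_{m,1-u}$ the $(1-u)$ quantile of $V^{(0)}_{K_m}$. Given $\alpha$ in $(0,1)$, we consider the test which rejects $(H_0)$ when there exists at least one $m$ in $\mathcal{M}$ such that

$$V_{K_m} > q^{(X)}_{m,\,1 - u^{(X)}_{\alpha} e^{-w_m}},$$

where $u^{(X)}_{\alpha}$ is defined by

$$u^{(X)}_{\alpha} = \sup\left\{u > 0,\ P\left(\sup_{m \in \mathcal{M}} \left(V_{K_m} - q^{(X)}_{m,\,1 - u e^{-w_m}}\right) > 0 \,\Big|\, X\right) \le \alpha\right\}. \tag{26}$$

We consider the test function $\Phi_\alpha$ defined by

$$\Phi_\alpha = \mathbb{1}_{\left\{\sup_{m \in \mathcal{M}} \left(V_{K_m} - q^{(X)}_{m,\,1 - u^{(X)}_{\alpha} e^{-w_m}}\right) > 0\right\}}. \tag{27}$$

Using the Monte Carlo method, we can estimate $u^{(X)}_{\alpha}$ and the quantiles $q^{(X)}_{m,\,1 - u^{(X)}_{\alpha} e^{-w_m}}$ for all $m \in \mathcal{M}$. The following theorem provides a control of the first and second kind errors for the test $\Phi_\alpha$. The detailed proof is given in the Appendix.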

###### Theorem 4.1.

Let $\alpha, \beta$ be fixed levels in $(0,1)$ and $\Phi_\alpha$ be the test defined by (27). We have

$$P_{(H_0)}(\Phi_\alpha = 1) \le \alpha. \tag{28}$$

Moreover, for every regression function $f$, we have

$$P_f(\Phi_\alpha = 0) \le \beta, \tag{29}$$

as soon as there exists $m$ in $\mathcal{M}$ such that

$$P_f\left(V_{K_m} \le q^{(X)}_{m,\,1 - \alpha e^{-w_m}}\right) \le \beta.$$

Comments. This theorem shows that the aggregated test is of level $\alpha$, for all $n$. Moreover, as soon as the second kind error is controlled by $\beta$ for at least one test in the collection, the same holds for the aggregated procedure, at the price that the level $\alpha$ is replaced by $\alpha e^{-w_m}$ to guarantee that the aggregated procedure is of level $\alpha$.
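The aggregated procedure (26) and (27) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the implementation used in the simulation study: the null distribution is simulated with standard Gaussian noise (valid because $V_{K_m}$ and $V^{(0)}_{K_m}$ share the same conditional law under $(H_0)$), the same `n_rep` replicates are reused both for the quantiles and for the probability in (26), $u^{(X)}_\alpha$ is approximated by bisection on $[\alpha, 1]$, and all names, seeds and toy data are hypothetical.

```python
import math
import random


def kernel_matrix(kernel, xs):
    """K(X_i, X_j) with a zero diagonal, so that sums skip the terms i == j."""
    n = len(xs)
    return [[kernel(xs[i], xs[j]) if i != j else 0.0 for j in range(n)]
            for i in range(n)]


def ratio_stat(kmat, u, v):
    """U-statistic in u divided by the paired-difference variance estimate in v."""
    n = len(u)
    num = sum(kmat[i][j] * u[i] * u[j] for i in range(n) for j in range(n))
    den = sum((v[2 * i] - v[2 * i + 1]) ** 2 for i in range(n // 2))
    return (num / (n * (n - 1))) / (den / n)


def upper_quantile(sorted_vals, p):
    """inf{t : F(t) >= 1 - p} for the empirical distribution F of sorted_vals."""
    k = math.ceil((1.0 - p) * len(sorted_vals) - 1e-9)
    return sorted_vals[k - 1] if k >= 1 else -math.inf


def aggregated_test(kernels, weights, xs, ys, y_prime, alpha=0.05, n_rep=200, seed=0):
    """Return (decision, u_hat). The weights should satisfy sum(exp(-w_m)) <= 1."""
    rng = random.Random(seed)
    n, n_ker = len(xs), len(kernels)
    kmats = [kernel_matrix(k, xs) for k in kernels]
    sims = [[] for _ in kernels]
    for _ in range(n_rep):  # each null replicate shares its noise across kernels
        eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
        eps_prime = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for m, kmat in enumerate(kmats):
            sims[m].append(ratio_stat(kmat, eps, eps_prime))
    sorted_sims = [sorted(s) for s in sims]

    def critical_values(u):
        return [upper_quantile(sorted_sims[m], u * math.exp(-weights[m]))
                for m in range(n_ker)]

    def null_rejection_rate(u):
        crit = critical_values(u)
        hits = sum(any(sims[m][b] > crit[m] for m in range(n_ker))
                   for b in range(n_rep))
        return hits / n_rep

    # Bisection for sup{u : rate(u) <= alpha}; the rate is nondecreasing in u.
    lo, hi = alpha, 1.0
    if null_rejection_rate(hi) <= alpha:
        lo = hi
    else:
        for _ in range(30):
            mid = 0.5 * (lo + hi)
            if null_rejection_rate(mid) <= alpha:
                lo = mid
            else:
                hi = mid
    crit = critical_values(lo)
    observed = [ratio_stat(kmat, ys, y_prime) for kmat in kmats]
    return int(any(v > c for v, c in zip(observed, crit))), lo


# Toy data (assumed): constant signal f = 3, sigma = 1, two histogram-type kernels.
rng = random.Random(2)
n = 30
xs = [rng.random() for _ in range(n)]
ys = [3.0 + rng.gauss(0.0, 1.0) for _ in xs]
y_prime = [3.0 + rng.gauss(0.0, 1.0) for _ in range(n)]
kernels = [lambda x, y: 1.0,
           lambda x, y: 2.0 if int(2 * x) == int(2 * y) else 0.0]
weights = [math.log(2.0), math.log(2.0)]  # exp(-w_m) sums to 1
print(aggregated_test(kernels, weights, xs, ys, y_prime))
```

Starting the bisection at $u = \alpha$ reflects the union bound behind the comment above: with $\sum_m e^{-w_m} \le 1$, the individual levels $\alpha e^{-w_m}$ already give an aggregated level at most $\alpha$, and the search for a larger $u$ only recovers power lost to that bound.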

### 4.2 The aggregation of projection kernels.

Let us specify the performance of the aggregated test for a collection of projection kernels.

###### Corollary 4.2.

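Let $\alpha, \beta$ be fixed levels in $(0,1)$. Let $\{S_m, m \in \mathcal{M}\}$ be a finite collection of linear subspaces of $L^2([0,1], d\nu)$, where $S_m$ is generated by the functions $\{\phi_\lambda, \lambda \in \Lambda_m\}$, and we assume that the dimension of $S_m$ is equal to $D_m$. We set, for all $m \in \mathcal{M}$, $K_m(x,y) = \sum_{\lambda \in \Lambda_m} \phi_\lambda(x)\, \phi_\lambda(y)$. Let $\Phi_\alpha$ be defined by (27) with the collection of kernels $\{K_m, m \in \mathcal{M}\}$ and the collection $\{w_m, m \in \mathcal{M}\}$ of positive numbers such that $\sum_{m \in \mathcal{M}} e^{-w_m} \le 1$.

Then $\Phi_\alpha$ is a level $\alpha$ test. Moreover,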