# Multi-level Bayes and MAP monotonicity testing

In this paper, we develop Bayes and maximum a posteriori probability (MAP) approaches to monotonicity testing. To simplify this problem, we consider a simple white Gaussian noise model, and with the help of the Haar transform we reduce it to the equivalent problem of testing positivity of the Haar coefficients. This approach permits, in particular, understanding the links between monotonicity testing and sparse vector detection, constructing new tests, and proving their optimality without supplementary assumptions. The main idea in our construction of multi-level tests is based on certain invariance properties of specific probability distributions. Along with the Bayes and MAP tests, we also construct adaptive multi-level tests that are free from prior information about the sizes of the non-monotonicity segments of the function.


## 1 Introduction

The literature on non-parametric monotonicity testing usually deals with the model

 Y = f(X) + \xi,

where Y is a scalar dependent random variable, X is a scalar independent random variable, f is an unknown function, and \xi is an unobserved scalar random variable with E(\xi \mid X) = 0. We are interested in testing the null hypothesis that f is increasing against the alternative that there are x_1 and x_2 such that x_1 < x_2 and f(x_1) > f(x_2). The decision is to be made based on an i.i.d. sample from the distribution of (X, Y). Typical applications of monotonicity testing are related to econometric models, see, e.g., Chetverikov [4].

Usual approaches to this problem have at their core simple heuristic ideas and assumptions. Thus, the tests proposed in Gijbels et al. [9] and Ghosal, Sen, and van der Vaart [8] are based on the signs of differences of the observations. Hall and Heckman [10] developed a test based on the slopes of local linear estimates of f. Along with these papers we can cite Schlee [15], Bowman, Jones, and Gijbels [2], Dümbgen and Spokoiny [6], Durot [7], Baraud, Huet, and Laurent [1], Wang and Meyer [17], and Chetverikov [4]. As to typical hypotheses about f, it is often assumed that f is a Lipschitz function, i.e.,

 |f(y) − f(x)| \le L|y − x|,

where the constant L may be known or unknown.

In this paper, we look at the problem of monotonicity testing from a slightly different and less intuitive viewpoint. As we will see below, our approach permits, in particular, understanding the links between this problem and sparse vector detection and constructing new powerful tests. In order to simplify technical details and to get rid of supplementary assumptions, we begin with monotonicity testing of an unknown function f(t), t \in [0,1], in the so-called white noise model similar to the one considered in [6]. So, it is assumed that we have at our disposal the noisy data

 Y(t) = f(t) + \sigma n(t), \quad t \in [0,1], (1)

where n(t) is a standard white Gaussian noise and \sigma is a known noise level. With the help of these observations we want to test

 the null hypothesis H_0: f'(t) \ge 0 for all t \in [0,1] vs. the alternative H_1: f'(t) < 0 for some t \in [0,1].

Our approach to this problem is based on estimating the following linear functionals:

 \theta_{h,t}(f) \stackrel{def}{=} \frac{1}{h}\int_t^{t+h} f(u)\,du − \frac{1}{h}\int_{t−h}^{t} f(u)\,du

for all h, t that are admissible, i.e., such that [t − h, t + h] \subseteq [0,1]. It is clear that \theta_{h,t}(f)/h may be interpreted as an approximation of the derivative, since

 \lim_{h\to 0} \frac{\theta_{h,t}(f)}{h} = f'(t)

for any given t \in (0,1).

With the help of (1), the functionals are estimated as follows:

 \hat\theta_{h,t}(Y) = \frac{1}{h}\int_t^{t+h} Y(u)\,du − \frac{1}{h}\int_{t−h}^{t} Y(u)\,du,

and these estimates admit the obvious representation

 \hat\theta_{h,t}(Y) = \theta_{h,t}(f) + \sigma_h \xi_{h,t}, (2)

where \xi_{h,t} is a standard Gaussian random variable and \sigma_h = \sigma\sqrt{2/h}.

Notice that if H_0 is true, then \theta_{h,t}(f) \ge 0 for all admissible h, t; otherwise (H_1 is true) there exist admissible h, t such that \theta_{h,t}(f) < 0. That is why in what follows we will focus on testing

 the null hypothesis H_0: \theta_{h,t}(f) \ge 0 for all admissible h, t vs. the alternative H_1: \theta_{h,t}(f) < 0 for some admissible h, t, (3)

based on the observations (2).

Let us denote for brevity

 \theta_{h,t} = \theta_{h,t}(f), \quad \hat\theta_{h,t} = \hat\theta_{h,t}(Y).

In order to explain our approach to the problem (3), we begin with the simple case assuming that h, t are given. So, we have to test two composite hypotheses

 H^{h,t}_0: \theta_{h,t} \ge 0 \quad vs. \quad H^{h,t}_1: \theta_{h,t} < 0.

Intuitively, the most powerful test with type I error probability \alpha rejects H^{h,t}_0 if

 \hat\theta_{h,t} \le −\sigma_h t_\alpha, (4)

where t_\alpha is the \alpha-value of the standard Gaussian distribution, i.e., a solution to

 \Phi(t_\alpha) = 1 − \alpha,

where

 \Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{−\infty}^{x} \exp\Big(−\frac{u^2}{2}\Big)\,du.

Of course, there exist many motivations for this test. In this paper, we make use of the so-called improper Bayes approach, assuming that \theta_{h,t} in (2) is a random variable uniformly distributed on the interval [0, A] if H^{h,t}_0 is true, and on [−A, 0] if H^{h,t}_1 is true. So, we observe a random variable with the probability density

 p^A_0(x \mid H^{h,t}_0 \text{ is true}) = \frac{1}{A}\int_0^A \exp\Big[−\frac{(x−\theta)^2}{2\sigma_h^2}\Big]\,d\theta

and

 p^A_1(x \mid H^{h,t}_1 \text{ is true}) = \frac{1}{A}\int_{−A}^0 \exp\Big[−\frac{(x−\theta)^2}{2\sigma_h^2}\Big]\,d\theta.

Thus, we deal with simple hypothesis testing, and by the Neyman-Pearson lemma, the most powerful test at significance level \alpha rejects H^{h,t}_0 when

 \frac{p^A_1(\hat\theta_{h,t})}{p^A_0(\hat\theta_{h,t})} \ge t^A_\alpha.

Taking the limit in this equation as A \to \infty, we arrive at the improper Bayes test that rejects H^{h,t}_0 if

 S\Big(\frac{\hat\theta_{h,t}}{\sigma_h}\Big) \ge t'_\alpha, (5)

where

 S(x) = \frac{\int_{−\infty}^{0}\exp[−(x−\theta)^2/2]\,d\theta}{\int_{0}^{\infty}\exp[−(x−\theta)^2/2]\,d\theta} = \frac{1}{\Phi(x)} − 1. (6)

Since S(x) is decreasing in x, the tests (4) and (5) are obviously equivalent.

In what follows, we will make use of the following asymptotic result:

 S(x) = \Big[1 + O\Big(\frac{1}{x^2}\Big)\Big]\sqrt{2\pi}\,(1 − x)\exp\Big(\frac{x^2}{2}\Big), \quad \text{as } x \to −\infty. (7)
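The closed form in (6) is easy to check numerically. The sketch below (plain Python, with \Phi computed via the error function) compares the defining ratio of Gaussian integrals with 1/\Phi(x) − 1; the integration grid and truncation range are implementation choices, not part of the paper.

```python
import math

# Standard normal CDF via the error function (no external dependencies).
def Phi(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# The improper-Bayes statistic S from (6): it collapses to 1/Phi(x) - 1.
def S(x: float) -> float:
    return 1.0 / Phi(x) - 1.0

# Check of the closed form against the defining ratio of half-line integrals,
# using a crude midpoint rule on a wide truncated range.
def S_by_integration(x: float, lo: float = -40.0, hi: float = 40.0, n: int = 400_000) -> float:
    step = (hi - lo) / n
    num = den = 0.0
    for i in range(n):
        theta = lo + (i + 0.5) * step
        w = math.exp(-0.5 * (x - theta) ** 2)
        if theta < 0.0:
            num += w
        else:
            den += w
    return num / den

x = -1.3
print(S(x), S_by_integration(x))  # the two values agree to several digits
```

Since S is a strictly decreasing function, thresholding S(\hat\theta_{h,t}/\sigma_h) as in (5) is indeed the same as thresholding \hat\theta_{h,t} itself as in (4).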

Along with this method, one can apply the maximum likelihood (ML) or minimax approaches. Eventually, all these methods result in (4), but their initial forms are different. For instance, the ML test rejects H^{h,t}_0 when

 (8)

We emphasize that from the viewpoint of testing H^{h,t}_0 vs. H^{h,t}_1 there is no difference between (8) and (5), but the aggregation of these methods for testing H_0 vs. H_1 from (3) results in different tests. In this paper, we make use of the tests defined by (5), since their aggregation is simple.

In order to aggregate the statistical tests, we will make use of the so-called multi-resolution approach, assuming that

1. h belongs to the following set of dyadic bandwidths:

 \mathcal{H} \stackrel{def}{=} \Big\{\frac{1}{2}, \frac{1}{4}, \ldots, \frac{1}{2^k}, \ldots\Big\};

2. t belongs to the family of dyadic grids G_h, h \in \mathcal{H}, defined by

 G_h \stackrel{def}{=} \{h, 3h, \ldots, 1 − h\}, \quad h \in \mathcal{H}.

There are simple arguments motivating these assumptions:

• the random variables \xi_{h,t} and \xi_{h,t'} in (2) are independent for t \ne t', t, t' \in G_h, since the corresponding averaging intervals do not overlap. This fact simplifies significantly the statistical analysis of the tests;

• \hat\theta_{h,t} are the Haar coefficients, admitting fast computation in the discrete version of (1).
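In a discretized version of model (1) with n = 2^J equispaced samples, all estimates \hat\theta_{h,t} over the dyadic family can be obtained from a single pass of cumulative sums. The sketch below is an illustration only: the sampling scheme and the mapping of t \in G_h to array windows are our assumptions, not a construction from the paper.

```python
import numpy as np

def dyadic_theta_hat(y: np.ndarray) -> dict:
    """Compute the estimates theta_hat_{h,t} for all dyadic bandwidths
    h in {1/2, 1/4, ...} and all grid points t in G_h = {h, 3h, ..., 1-h},
    given samples y[i] ~ Y(i/n) on an equispaced grid with n = 2^J points."""
    n = len(y)
    assert n & (n - 1) == 0, "n must be a power of two"
    cs = np.concatenate([[0.0], np.cumsum(y)])  # cs[k] = sum of y[:k]
    out = {}
    m = n // 2                                  # samples per window at h = 1/2
    while m >= 1:
        h = m / n
        # window means over [t-h, t] and [t, t+h] for t = h, 3h, ..., 1-h
        left = (cs[m::2 * m][: n // (2 * m)] - cs[0::2 * m][: n // (2 * m)]) / m
        right = (cs[2 * m::2 * m] - cs[m::2 * m][: n // (2 * m)]) / m
        out[h] = right - left                   # one value per t in G_h
        m //= 2
    return out

# Example: a strictly increasing f gives positive theta_hat at every level.
t = np.arange(256) / 256
theta = dyadic_theta_hat(2.0 * t)              # noiseless data, f(u) = 2u
print(all((v > 0).all() for v in theta.values()))  # True
```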

## 2 Testing at a given resolution level

Let us fix some bandwidth h \in \mathcal{H} and denote for brevity n_h = \#G_h. In this section, we focus on testing

 the null hypothesis H^h_0: \theta_{h,t} \ge 0 for all t \in G_h vs. the alternative H^h_1: \theta_{h,t} < 0 for some t \in G_h.

In order to construct Bayes and MAP tests, we assume that for a given h:

• the set \{\theta_{h,t},\ t \in G_h\} contains only one negative entry \theta_{h,\tau};

• \tau is an unobservable random variable uniformly distributed on G_h.

### 2.1 A Bayes test

With the arguments used in deriving (5), we get the following Bayes test: H^h_0 is rejected if

 \frac{1}{n_h}\sum_{t \in G_h} S\Big(\frac{\hat\theta_{h,t}}{\sigma_h}\Big) \ge t^B_\alpha,

where S(\cdot) is defined by (6). The critical level t^B_\alpha is defined in a conservative way, i.e., as a solution to

 \max_{\Theta \ge 0} P_\Theta\Big\{\frac{1}{n_h}\sum_{t \in G_h} S\Big(\frac{\hat\theta_{h,t}}{\sigma_h}\Big) > t^B_\alpha\Big\} = \alpha,

where P_\Theta stands for the measure generated by the observations defined by (2) for a given \Theta = \{\theta_{h,t},\ t \in G_h\}.

It follows from Mudholkar's theorem [12], see also Theorem 6.2.1 in [16], that for any \Theta with nonnegative entries

 P_\Theta\Big\{\frac{1}{n_h}\sum_{t \in G_h} S\Big(\frac{\hat\theta_{h,t}}{\sigma_h}\Big) \ge t^B_\alpha\Big\} \le P\Big\{\frac{1}{n_h}\sum_{t \in G_h} S(\xi_{h,t}) \ge t^B_\alpha\Big\} (9)

and, thus, t^B_\alpha may be computed as a solution to

 P\Big\{\frac{1}{n_h}\sum_{t \in G_h} S(\xi_{h,t}) \ge t^B_\alpha\Big\} = \alpha. (10)

Therefore, our next step is to study the following random variable:

 B_h(\xi) \stackrel{def}{=} \frac{1}{n_h}\sum_{t \in G_h} S(\xi_{h,t}).

#### 2.1.1 A weak approximation of Bh(ξ)

We begin with computing a weak limit of B_h(\xi) as h \to 0. Recall some standard definitions (see, e.g., [13]).

###### Definition.

Let X_1 and X_2 be independent copies of a random variable X. Then X is said to be stable if for any constants a > 0 and b > 0 the random variable aX_1 + bX_2 has the same distribution as cX + d for some constants c > 0 and d.

In the class of stable distributions there is an interesting sub-class of the so-called stable distributions with the index of stability 1. For brevity, we will call them 1-stable distributions. The formal definition of this class is as follows:

###### Definition.

A random variable X is called 1-stable if its characteristic function can be written as

 E\exp(itX) = \exp\Big(i\mu t − |ct| − i\frac{2\beta|c|}{\pi}\,t\log(|t|)\Big). (11)

The next theorem shows that the weak limit of B_h(\xi) − \log(n_h) + \gamma is a 1-stable distribution.

###### Theorem 1.
 \lim_{h\to 0} E\exp\{it[B_h(\xi) − \log(n_h) + \gamma]\} = \exp\Big\{it\log\frac{1}{|t|} − \frac{\pi|t|}{2}\Big\},

where \gamma is Euler's constant.

In other words, this theorem states that

 \lim_{h\to 0}[B_h(\xi) − \log(n_h) + \gamma] \stackrel{D}{=} \zeta,

where \zeta is a 1-stable random variable (see (11)) with

 \mu = 0, \quad c = \frac{\pi}{2}, \quad \beta = 1. (12)

Apparently, \zeta appeared first in [5]. We emphasize also that this random variable usually arises in Bayes hypothesis testing related to sparse vectors, see, e.g., [3], [11].

The probability distribution of \zeta has the following invariance property, which plays an important role in the aggregation of Bayes tests.

###### Proposition 1.

Let \zeta_1, \zeta_2, \ldots be i.i.d. copies of \zeta and \bar\pi = (\bar\pi_1, \bar\pi_2, \ldots) be a probability distribution on \mathbb{Z}_+ with a bounded entropy. Then

 \sum_{k=1}^{\infty} \bar\pi_k\Big(\zeta_k − \log\frac{1}{\bar\pi_k}\Big) \stackrel{D}{=} \zeta. (13)

The proof of (13) follows immediately from (11) and (12).

#### 2.1.2 A strong approximation of Bh(ξ)

Theorem 1 is not very informative about the tail behavior of the distribution of B_h(\xi). However, for obtaining a good approximation of t^B_\alpha in (10), this behavior may play a crucial role: in some applications \alpha may be very small, and in this case the Monte-Carlo method and Theorem 1 may not work well.

Therefore, our goal is to find an approximation of B_h(\xi) that controls well the tail of its distribution. Fortunately, this can be easily done. It is clear that

 \Phi(\xi_k) \stackrel{D}{=} U_k,

where U_k are i.i.d. random variables uniformly distributed on [0,1]. Hence

 B_h(\xi) \stackrel{D}{=} \frac{1}{n_h}\sum_{k=1}^{n_h}\Big[\frac{1}{U_k} − 1\Big] = \frac{1}{n_h}\sum_{k=1}^{n_h}\frac{1}{U_{(k)}} − 1,

where U_{(k)} is a non-decreasing permutation of \{U_k\}. The distribution of U_{(k)} can be easily obtained with the help of the Pyke theorem [14]:

 U_{(k)} \stackrel{D}{=} \frac{E_k}{E_{n_h+1}}, (14)

where

 E_k = \sum_{l=1}^{k} \varkappa_l

is the cumulative sum of i.i.d. standard exponentially distributed random variables \varkappa_l:

 P\{\varkappa_l \ge y\} = \exp(−y).

In other words, E_k has a Gamma(k, 1) distribution. With this in mind, we obtain

 B_h(\xi) \stackrel{D}{=} \frac{E_{n_h+1}}{n_h}\sum_{k=1}^{n_h}\frac{1}{E_k} − 1. (15)
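The Pyke representation (14) is straightforward to verify by simulation: order statistics of uniforms and normalized cumulative sums of exponentials have the same distribution. A small sanity check (the sample size and replication count below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 32, 20_000

# Direct construction: order statistics of n i.i.d. Uniform(0,1) variables.
u_sorted = np.sort(rng.uniform(size=(reps, n)), axis=1)

# Pyke construction: U_(k) =d E_k / E_{n+1}, with E_k a cumulative sum
# of i.i.d. standard exponentials.
e = np.cumsum(rng.exponential(size=(reps, n + 1)), axis=1)
u_pyke = e[:, :n] / e[:, [n]]

# The two constructions match in distribution; for instance, both have
# E U_(k) = k / (n + 1).
print(np.max(np.abs(u_sorted.mean(axis=0) - u_pyke.mean(axis=0))))
```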

Next, we make use of the following simple equations:

 \Big\{E\Big[\sum_{k=n_h+1}^{\infty}\Big(\frac{1}{E_k} − \frac{1}{k}\Big)\Big]^{2m}\Big\}^{1/(2m)} \le O\Big(\frac{1}{\sqrt{n_h}}\Big)

and

 \sum_{k=1}^{n_h}\frac{1}{k} = \log(n_h) + \gamma + O\Big(\frac{1}{n_h}\Big).

So, substituting them in (15), we arrive at the following theorem.

###### Theorem 2.

Let

 \zeta^\circ = \sum_{k=1}^{\infty}\Big(\frac{1}{E_k} − \frac{1}{k}\Big). (16)

Then

 B_h(\xi) − \log(n_h) + \gamma \stackrel{D}{=} [1 + O(\varepsilon_h)]\zeta^\circ + 2\gamma − 1 + O(\varepsilon_h)\log(n_h), (17)

where \varepsilon_h is such that

 [E(\varepsilon_h^{2m})]^{1/(2m)} \le \frac{C_m}{\sqrt{n_h}}. (18)

###### Remark.

The random variable \zeta in Theorem 1 admits the following representation:

 \zeta \stackrel{D}{=} \zeta^\circ + 2\gamma − 1.

Notice also that it follows immediately from (17) that the convergence rate in Theorem 1 is O(\log(n_h)/\sqrt{n_h}), i.e., as h \to 0,

 E\exp\{it[B_h(\xi) − \log(n_h) + \gamma]\} = \exp\Big\{it\log\frac{1}{|t|} − \frac{\pi|t|}{2} + O\Big(\frac{\log(n_h)}{\sqrt{n_h}}\Big)\Big\}.

Figure 1 illustrates Theorem 2 and the above remark numerically, showing the log-tail approximation error

 \Delta(x; n_h) = \log\big[P\{B_h(\xi) − \log(n_h) + \gamma \ge x\}\big] − \log\big[P\{\zeta^\circ + 2\gamma − 1 \ge x\}\big],

computed with the help of the Monte-Carlo method. This figure shows that even for small n_h the approximation (17) works very well.
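A comparison in the spirit of Figure 1 can be reproduced with a short simulation: the left-hand side of (17) is sampled via S(\xi) =^D 1/U − 1, and the right-hand side via the truncated series (16). The sample sizes, truncation point K, and chunking below are our choices and unrelated to the paper's experiment.

```python
import numpy as np

rng = np.random.default_rng(1)
gamma = 0.5772156649015329          # Euler's constant
n_h, reps, K = 256, 10_000, 5_000

# Left-hand side of (17): B_h(xi) - log(n_h) + gamma, using S(xi) =d 1/U - 1.
u = rng.uniform(size=(reps, n_h))
lhs = (1.0 / u - 1.0).mean(axis=1) - np.log(n_h) + gamma

# Right-hand side: zeta0 + 2*gamma - 1, with the series (16) truncated at K
# terms (computed in chunks to keep memory bounded).
def sample_zeta0(reps, K, rng, chunk=1_000):
    out, ks = np.empty(reps), np.arange(1, K + 1)
    for i in range(0, reps, chunk):
        m = min(chunk, reps - i)
        e = np.cumsum(rng.exponential(size=(m, K)), axis=1)
        out[i:i + m] = (1.0 / e - 1.0 / ks).sum(axis=1)
    return out

rhs = sample_zeta0(reps, K, rng) + 2.0 * gamma - 1.0

# The empirical upper tails of the two samples should be close.
for x in (2.0, 5.0, 20.0):
    print(x, (lhs >= x).mean(), (rhs >= x).mean())
```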

### 2.2 A MAP test

Similarly to the Bayes test, we can construct the MAP test that rejects H^h_0 if

 \max_{t \in G_h}\frac{1}{n_h} S\Big(\frac{\hat\theta_{h,t}}{\sigma_h}\Big) \ge t^M_\alpha,

where t^M_\alpha is defined as a solution to

 \max_{\Theta \ge 0} P_\Theta\Big\{\max_{t \in G_h}\frac{1}{n_h} S\Big(\frac{\hat\theta_{h,t}}{\sigma_h}\Big) > t^M_\alpha\Big\} = \alpha.

Similarly to (9), t^M_\alpha may be obtained from

 P\Big\{\max_{t \in G_h}\frac{S(\xi_{h,t})}{n_h} > t^M_\alpha\Big\} = \alpha.

As to the limit distribution of this statistic as h \to 0, it follows immediately from (14) that

 \max_{t \in G_h}\frac{S(\xi_{h,t})}{n_h} \stackrel{D}{=} \frac{1 + \varepsilon_h}{\varkappa}, \quad \text{as } h \to 0, (19)

where \varkappa is a standard exponential random variable and \varepsilon_h satisfies (18).
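The limit (19) can be checked by simulating \max_t S(\xi_{h,t})/n_h through the uniform representation S(\xi) =^D 1/U − 1, so that the maximum is driven by the sample minimum U_{(1)}; the limit variable 1/\varkappa has distribution function exp(−1/x). Sample sizes below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n_h, reps = 512, 40_000

# max_t S(xi_{h,t}) / n_h with S(x) = 1/Phi(x) - 1 =d 1/U - 1:
# the maximum over t is (1 - U_(1)) / U_(1), with U_(1) the sample minimum.
u_min = rng.uniform(size=(reps, n_h)).min(axis=1)
stat = (1.0 / u_min - 1.0) / n_h

# Limit law (19): 1/kappa with kappa standard exponential, so
# P{stat <= x} -> exp(-1/x).
for x in (0.5, 1.0, 3.0):
    print(x, (stat <= x).mean(), np.exp(-1.0 / x))
```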

## 3 Multi-level testing

### 3.1 MAP multi-level tests

A heuristic idea behind our construction of multi-level MAP tests for (3) is related to (19), and consists in computing a positive deterministic function U_h bounding from above the random process \log(1/\varkappa_h), h \in \mathcal{H}, where \varkappa_h are independent standard exponential random variables. In other words, we are looking for U_h such that

 \zeta_U = \sup_{h \in \mathcal{H}}\Big[\log\frac{1}{\varkappa_h} − U_h\Big]

would be a non-degenerate random variable.

Let q^U_\alpha be the \alpha-value of \zeta_U, i.e., a solution to

 P\{\zeta_U \ge q^U_\alpha\} = \alpha.

Therefore, with (19), upper bounding the random process by U_h, we arrive at the test that rejects H_0 if

 \sup_{h \in \mathcal{H}}\Big\{\max_{t \in G_h}\log\Big[\frac{1}{n_h} S\Big(\frac{\hat\theta_{h,t}}{\sigma_h}\Big)\Big] − U_h\Big\} \ge q^U_\alpha. (20)

Computing q^U_\alpha is based on the following simple fact. Assume that

 K_U = \log\Big[\sum_{h \in \mathcal{H}} e^{−U_h}\Big] < \infty.

Then

 \sup_{h \in \mathcal{H}}\Big[\log\frac{1}{\varkappa_h} − U_h\Big] − K_U \stackrel{D}{=} \log\frac{1}{\varkappa}. (21)

The proof of this identity is very simple. Indeed,

 P\Big\{\sup_{h \in \mathcal{H}}\Big[\log\frac{1}{\varkappa_h} − U_h\Big] − K_U > x\Big\} = 1 − \prod_{h \in \mathcal{H}} P\Big\{\log\frac{1}{\varkappa_h} \le U_h + x + K_U\Big\} = 1 − \exp\Big\{−\sum_{h \in \mathcal{H}}\exp[−x − U_h − K_U]\Big\} = 1 − \exp[−\exp(−x)].

If we denote

 \bar\pi_h = e^{−U_h}\Big/\sum_{h' \in \mathcal{H}} e^{−U_{h'}},

then (21) can be rewritten in the following form:

###### Proposition 2.

Let \bar\pi = \{\bar\pi_h,\ h \in \mathcal{H}\} be a probability distribution on \mathcal{H}. Then

 \sup_{h \in \mathcal{H}}\Big[\log\frac{1}{\varkappa_h} + \log(\bar\pi_h)\Big] \stackrel{D}{=} \log\frac{1}{\varkappa}. (22)

Therefore, with the help of (21), we can compute the critical level q^U_\alpha in (20):

 q^U_\alpha = q^\varkappa_\alpha + K_U,

where

 q^\varkappa_\alpha = −\log\Big(\log\frac{1}{1−\alpha}\Big)

is the \alpha-value of \log(1/\varkappa).

Summarizing (see (20)), the MAP multi-level test rejects H_0 if

 \sup_{h \in \mathcal{H}}\{Z^M_h + \log(\bar\pi_h)\} \ge q^\varkappa_\alpha, (23)

where

 Z^M_h = \max_{t \in G_h}\log\Big[\frac{1}{n_h} S\Big(\frac{\hat\theta_{h,t}}{\sigma_h}\Big)\Big]

and \bar\pi is a probability distribution on \mathcal{H}.
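A direct implementation of the multi-level MAP test (23) is short. In the sketch below the noise scaling \sigma_h = \sigma\sqrt{2/h} and the data layout (a dict mapping h to the vector of \hat\theta_{h,t}) are our assumptions; the critical value q^\varkappa_\alpha = −\log(\log(1/(1−\alpha))) is the one derived above. Under pure noise, the rejection rate should be close to \alpha, which the usage example checks by simulation.

```python
import math
import numpy as np

def Phi(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def q_kappa(alpha: float) -> float:
    """alpha-value of log(1/kappa): P{log(1/kappa) >= q} = alpha."""
    return -math.log(math.log(1.0 / (1.0 - alpha)))

def map_multilevel_test(theta_hat: dict, sigma: float, pi_bar: dict, alpha: float) -> bool:
    """MAP multi-level test (23): reject H0 iff
    sup_h { Z^M_h + log(pi_bar[h]) } >= q_kappa(alpha).
    The scaling sigma_h = sigma * sqrt(2/h) is an assumption matching (2)."""
    sup_stat = -math.inf
    for h, vals in theta_hat.items():
        sigma_h = sigma * math.sqrt(2.0 / h)
        n_h = len(vals)
        z_m = -math.inf
        for v in vals:
            p = Phi(v / sigma_h)
            s = (1.0 - p) / max(p, 1e-300)       # S(x) = 1/Phi(x) - 1, guarded
            z_m = max(z_m, math.log(max(s, 1e-300) / n_h))
        sup_stat = max(sup_stat, z_m + math.log(pi_bar[h]))
    return sup_stat >= q_kappa(alpha)

# Usage: under pure noise (all theta_{h,t} = 0) the rejection rate is close to alpha.
rng = np.random.default_rng(3)
H = [2.0 ** (-k) for k in range(1, 6)]
pi_bar = {h: 1.0 / len(H) for h in H}
rejections = sum(
    map_multilevel_test(
        {h: rng.normal(scale=math.sqrt(2.0 / h), size=int(1.0 / (2.0 * h))) for h in H},
        1.0, pi_bar, 0.05)
    for _ in range(2000)
)
print(rejections / 2000)  # close to 0.05
```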

In order to study the performance of this method, we analyze the type II error probability. For given \rho \in \mathcal{H} and \tau \in G_\rho, define

 \Theta_{\rho,\tau}(A) = \{\theta_{h,t}:\ \theta_{\rho,\tau} = −A;\ \theta_{h,t} \ge 0,\ (h,t) \ne (\rho,\tau)\}. (24)

In other words, we consider the situation where all shifts \theta_{h,t} in (2) are nonnegative except only one. The position (\rho, \tau) of the negative entry is unknown, but it is assumed that \rho and \tau are random variables with the distribution defined by

• P\{\rho = h\} = \bar\pi_h, h \in \mathcal{H},

• P\{\tau = t \mid \rho = h\} = 1/n_h, t \in G_h,

where \bar\pi is a probability distribution on \mathcal{H} with a bounded entropy

 H_{\bar\pi} = \sum_{h \in \mathcal{H}} \bar\pi_h \log\frac{1}{\bar\pi_h}.

In what follows, we will deal with priors with large uncertainties, assuming that H_{\bar\pi} is large; more precisely, we consider the limit H_{\bar\pi} \to \infty (written below as \bar\pi \to 0), assuming that

 \lim_{\bar\pi \to 0}\frac{1}{\log[H_{\bar\pi}]}\sum_{h \in \mathcal{H}} \bar\pi_h\Big|H_{\bar\pi} − \log\frac{1}{\bar\pi_h}\Big| = 0. (25)

In particular, we will consider the following class of prior distributions:

 (26)

This class is characterized by the bandwidth \omega and the probability density \nu(\cdot), which is assumed to be continuous, bounded, and such that

 H_\nu = \int_0^\infty \nu(x)\log\frac{1}{\nu(x)}\,dx < \infty, \quad \int_0^\infty \nu(x)\log(x+1)\,dx < \infty. (27)

A typical example of such a distribution is the uniform one.

It is clear that H_{\bar\pi^{\omega,\nu}} \to \infty as \omega \to \infty and that Condition (25) holds.

Let us begin with the case where the prior distribution \bar\pi is known; the case of unknown \bar\pi will be considered later in Section 4.

The type II error probability over \Theta_{\rho,\tau}(A) of the MAP test (23) is defined as follows:

 \beta^M_{\rho,\tau}(A) = \sup_{\Theta \in \Theta_{\rho,\tau}(A)} P_\Theta\Big\{\max_{h \in \mathcal{H}}[Z^M_h + \log(\bar\pi_h)] \le q^\varkappa_\alpha\Big\}.

Our goal is to study the average type II error probability

 \bar\beta^M_{\bar\pi}(A) = \sum_{h \in \mathcal{H}}\frac{\bar\pi_h}{n_h}\sum_{t \in G_h}\beta^M_{h,t}(A_h),

where here and below A = \{A_h,\ h \in \mathcal{H}\} denotes the family of amplitudes of the negative entry at the corresponding levels.

Denote for brevity

 R_h(q, H) = 2[q + \log(n_h) + H] − \log[4\pi(q + \log(n_h) + H)]

and

 \log^*(x) = \log[\log(x)], \quad H^*_{\bar\pi} = \log(H_{\bar\pi}).

The next theorem shows that \sqrt{R_h(q^\varkappa_\alpha, H_{\bar\pi})} is a critical signal/noise ratio. Roughly speaking, this means that if

 \frac{A_h}{\sigma_h} \le \sqrt{R_h(q^\varkappa_\alpha, H_{\bar\pi})} + x

for any given x, then the MAP multi-level test cannot discriminate between H_0 and H_1. Otherwise, if

 \frac{A_h}{\sigma_h} \ge \sqrt{R_h(q^\varkappa_\alpha, H_{\bar\pi})} + \sqrt{\epsilon H^*_{\bar\pi}}

for some \epsilon > 0, then reliable testing is possible.

In the next theorem, E_{\bar\pi} stands for the expectation with respect to \bar\pi.

###### Theorem 3.

Suppose (25) holds. If for some x and \epsilon > 0

 \lim_{\bar\pi \to 0}\frac{1}{H^*_{\bar\pi}} E_{\bar\pi}\Big[\Big(\frac{A_h}{\sigma_h} − x\Big)^2 + \epsilon H^*_{\bar\pi} − R_h(q^\varkappa_\alpha, H_{\bar\pi})\Big]_+ = 0, (28)

then

 \lim_{\bar\pi \to 0}\bar\beta^M_{\bar\pi}(A) \ge (1 − \alpha)[1 − \Phi(x)]. (29)

If for some \epsilon > 0

 \lim_{\bar\pi \to 0}\frac{1}{H^*_{\bar\pi}} E_{\bar\pi}\Big[R_h(q^\varkappa_\alpha, H_{\bar\pi}) + 2\sqrt{\epsilon H^*_{\bar\pi}}\,\frac{A_h}{\sigma_h} − \frac{A_h^2}{\sigma_h^2}\Big]_+ = 0, (30)

then

 \lim_{\bar\pi \to 0}\bar\beta^M_{\bar\pi}(A) = 0. (31)

### 3.2 Multi-level Bayes tests

To construct these tests, let us consider the following statistics:

 Z^B_h = \frac{1}{n_h}\sum_{t \in G_h} S\Big(\frac{\hat\theta_{h,t}}{\sigma_h}\Big) − \log(n_h) − \gamma + 1, \quad h \in \mathcal{H}.

When all \theta_{h,t} = 0, in view of Theorem 2, these random variables are approximated by a family of independent and identically distributed random variables \zeta^\circ_h, h \in \mathcal{H}, defined by (16). An important property of this family is provided by (13), which is used in our construction of multi-level Bayes tests. More precisely, the multi-level Bayes test rejects H_0 if

 \sum_{h \in \mathcal{H}} \bar\pi_h\Big(Z^B_h − \log\frac{1}{\bar\pi_h}\Big) \ge q^\circ_\alpha,

where q^\circ_\alpha is the \alpha-value of \zeta^\circ.

The type II error probability over \Theta_{\rho,\tau}(A) (see (24)) is defined by

 \beta^B_{\rho,\tau}(A) = \sup_{\Theta \in \Theta_{\rho,\tau}(A)} P_\Theta\Big\{\sum_{h \in \mathcal{H}} \bar\pi_h\Big(Z^B_h − \log\frac{1}{\bar\pi_h}\Big) \le q^\circ_\alpha\Big\}

and our goal is to analyze the average type II error probability

 \bar\beta^B_{\bar\pi}(A) = \sum_{h \in \mathcal{H}}\frac{\bar\pi_h}{n_h}\sum_{\tau \in G_h}\beta^B_{h,\tau}(A_h).
###### Theorem 4.

Suppose (25) holds. If for some x and \epsilon > 0

 \lim_{\bar\pi \to 0}\frac{1}{H^*_{\bar\pi}} E_{\bar\pi}\Big[\Big(\frac{A_h}{\sigma_h} − x\Big)^2 + \epsilon H^*_{\bar\pi} − R_h[\log(q^\circ_\alpha), H_{\bar\pi}]\Big]_+ = 0, (32)

then

 \lim_{\bar\pi \to 0}\bar\beta^B_{\bar\pi}(A) \ge (1 − \alpha)[1 − \Phi(x)]. (33)

If for some \epsilon > 0

 \lim_{\bar\pi \to 0}\frac{1}{H^*_{\bar\pi}} E_{\bar\pi}\Big[R_h[\log(q^\circ_\alpha), H_{\bar\pi}] + 2\sqrt{\epsilon H^*_{\bar\pi}}\,\frac{A_h}{\sigma_h} − \frac{A_h^2}{\sigma_h^2}\Big]_+ = 0, (34)

then

 \lim_{\bar\pi \to 0}\bar\beta^B_{\bar\pi}(A) = 0. (35)
###### Remark.

Notice that as \alpha \to 0,

 \log(q^\circ_\alpha) = (1 + o(1))q^\varkappa_\alpha = (1 + o(1))\log\frac{1}{\alpha}.

Therefore, conditions (28) and (32), along with (30) and (34), are almost equivalent. This means that in the considered statistical problem there is no substantial difference between the MAP and Bayes tests.

The main drawback of the MAP and Bayes tests is related to their dependence on the prior distribution \bar\pi, which is hardly ever known in practice. Therefore, our next goal is to construct a test that, on the one hand, does not depend on \bar\pi but, on the other hand, has a nearly optimal critical signal/noise ratio.

In order to simplify our presentation, we will deal with the class of prior distributions \bar\pi^{\omega,\nu} defined by (26). The entropy of \bar\pi^{\omega,\nu} obviously satisfies

 H_{\bar\pi^{\omega,\nu}} = \log(\omega) + H_\nu + o(1), \quad \omega \to \infty, (36)

and therefore we denote for brevity

 \tilde R_h(q, \omega) = 2[q + \log(n_h) + \log(\omega)] − \log[4\pi(q + \log(n_h) + \log(\omega))]. (37)

With (36), Condition (25) is checked easily, and the next result follows immediately from Theorem 3.

###### Corollary 1.

If for some x and \epsilon > 0

 \lim_{\omega \to \infty}\frac{1}{\log^*(\omega)} E_{\bar\pi^{\omega,\nu}}\Big[\Big(\frac{A_h}{\sigma_h} − x\Big)^2 + \epsilon\log^*(\omega) − \tilde R_h(q^\varkappa_\alpha, \omega)\Big]_+ = 0,

then

 \lim_{\omega \to \infty}\bar\beta^M_{\bar\pi^{\omega,\nu}}(A) \ge (1 − \alpha)[1 − \Phi(x)].

If for some \epsilon > 0

 \lim_{\omega \to \infty}\frac{1}{\log^*(\omega)} E_{\bar\pi^{\omega,\nu}}\Big[\tilde R_h(q^\varkappa_\alpha, \omega) + 2\sqrt{\epsilon\log^*(\omega)}\,\frac{A_h}{\sigma_h} − \frac{A_h^2}{\sigma_h^2}\Big]_+ = 0,

then

 \lim_{\omega \to \infty}\bar\beta^M_{\bar\pi^{\omega,\nu}}(A) = 0.

In order to construct an adaptive test, let us compute a nearly minimal function U_h in (21). We begin with

 \psi_0(x) = 1 + \log(x), \quad x \in \mathbb{R}_+,

and then iterate this function m times:

 \psi_l(x) = \psi_0[\psi_{l−1}(x)], \quad l = 1, \ldots, m.

Finally, for given \varepsilon \in (0,1), define

 L_{m,\varepsilon}(k) = −\log\Big\{\frac{1}{\varepsilon}[\psi_m(k)]^{−\varepsilon} − \frac{1}{\varepsilon}[\psi_m(k+1)]^{−\varepsilon}\Big\}, \quad k \in \mathbb{Z}_+. (38)

Since \exp[−L_{m,\varepsilon}(k)] = \frac{1}{\varepsilon}[\psi_m(k)]^{−\varepsilon} − \frac{1}{\varepsilon}[\psi_m(k+1)]^{−\varepsilon} telescopes, it is clear that

 \sum_{k} \exp[−L_{m,\varepsilon}(k)] < \infty.
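The construction of (38) can be sketched numerically. We read the exponent in (38) as −ε, so that exp(−L_{m,ε}(k)) telescopes and the series converges; that reading, and starting the index at k = 1 where ψ_m(1) = 1, are our assumptions:

```python
import math

def psi(m: int, x: float) -> float:
    """m-fold iterate of psi_0(x) = 1 + log(x)."""
    for _ in range(m):
        x = 1.0 + math.log(x)
    return x

def L(m: int, eps: float, k: int) -> float:
    """Penalty (38), with the exponent read as -eps so that the argument of
    the logarithm is positive (psi_m is increasing in k)."""
    return -math.log(psi(m, k) ** (-eps) / eps - psi(m, k + 1) ** (-eps) / eps)

m, eps, K = 2, 0.5, 10_000
partial = sum(math.exp(-L(m, eps, k)) for k in range(1, K + 1))

# The sum telescopes: sum_{k=1}^{K} exp(-L) equals
# ([psi_m(1)]^-eps - [psi_m(K+1)]^-eps) / eps, and hence the full series
# converges (very slowly, through iterated logarithms) to 1/eps.
print(partial, (psi(m, 1) ** (-eps) - psi(m, K + 1) ** (-eps)) / eps)
```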