Maxiset point of view for signal detection in inverse problems

03/15/2018, by Florent Autin et al.

This paper extends the successful maxiset paradigm from function estimation to signal detection in inverse problems. In this context, the maxisets do not have the same shape as in the classical estimation framework. Nevertheless, we introduce a robustified version of these maxisets, which allows us to exhibit tail conditions on the signals of interest. Under this novel paradigm we are able to compare direct and indirect testing procedures.


1 Introduction

Over the last 20 years, the assessment of the performance of nonparametric function estimation methods has mainly relied on the asymptotic minimax and oracle approaches. More marginally used, the maxiset paradigm has proved very useful for accurately describing the behaviour of some estimation procedures. In some cases, it makes it possible to distinguish methods having comparable minimax performance. The question of adapting the maxiset concepts to the signal detection framework has often been raised. The aim of this paper is to rigorously extend this point of view to the signal detection framework and to discuss new related outcomes.

To this end, we will deal all along the paper with the Gaussian sequence space model

(1)   $y_j = b_j \theta_j + \varepsilon \xi_j, \qquad j \in J,$

where $y = (y_j)_{j \in J}$ denotes the observations, $J$ is a subset of $\mathbb{N}$, $\theta = (\theta_j)_{j \in J}$ a non-negative unknown sequence of interest, $b = (b_j)_{j \in J}$ a given sequence of non-negative real numbers, $\varepsilon$ a noise level in $(0,1)$ and $(\xi_j)_{j \in J}$ a sequence of i.i.d. standard Gaussian random variables. The model (1) makes it possible to describe several situations, e.g. nonparametric regression or the estimation of a function blurred by white noise. For more details on these models and their connection with the Gaussian sequence space model, we refer the interested reader to Tsybakov (2009). For the sake of convenience, we will consider hereafter that . We also stress that the model (1) allows us to deal with so-called inverse problem models as described in Cavalier (2011). In such a setting, one is interested in doing inference on a function $f$ belonging to some Hilbert space $H$ from an indirect and blurred observation of the form

(2)   $Y = Tf + \varepsilon \dot{W},$

where $T$ denotes a compact operator acting from $H$ to another Hilbert space $K$, $\varepsilon$ a noise level and $\dot{W}$ a Gaussian white noise. In particular, the sequence $b$ can be identified as the sequence of eigenvalues of the operator $T$, and the sequence $\theta$ as the sequence of coefficients of $f$ in the singular value decomposition (SVD) basis associated to the operator $T$.
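To fix ideas, the following minimal simulation sketch generates data from the sequence space model (1) in such an inverse-problem setting. The polynomial decay chosen for the sequence $b$, the test signal and the noise level are illustrative assumptions of ours, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

J = 1000      # number of observed coefficients (finite truncation, for illustration)
eps = 0.05    # noise level
t = 1.0       # assumed polynomial decay of the eigenvalues (moderately ill-posed case)

j = np.arange(1, J + 1)
b = j ** (-t)                  # eigenvalue sequence of the operator
theta = np.exp(-0.05 * j)      # an arbitrary square-summable signal, for illustration
xi = rng.standard_normal(J)    # i.i.d. standard Gaussian noise

# Observations from model (1): y_j = b_j * theta_j + eps * xi_j
y = b * theta + eps * xi
```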

For inverse problem models, the minimax paradigm has been widely used in order to assess the performance of estimation procedures. Roughly speaking, given a structural constraint on the vector $\theta$ of interest, typically of the form $\theta \in \mathcal{F}$ for some set $\mathcal{F}$, one measures the performance of a given estimator $\hat\theta$ through its maximal risk $\sup_{\theta \in \mathcal{F}} \mathbb{E}_\theta[\ell(\hat\theta, \theta)]$, where $\ell$ denotes a given loss function. This paradigm has been widely used and discussed over the years. In several situations, a precise bound can be obtained on this maximal risk, which makes it possible to characterize how it decreases with respect to the noise level $\varepsilon$.

More precisely, one can often exhibit a non-decreasing positive sequence $(r_\varepsilon)_\varepsilon$, the so-called rate of convergence associated to the estimation procedure with noise level $\varepsilon$, such that $\sup_{\theta \in \mathcal{F}} \mathbb{E}_\theta[\ell(\hat\theta, \theta)] \leq C\, r_\varepsilon$ for some positive constant $C$. If this rate appears to be the smallest possible one, namely if there exists a positive constant $c$ such that $\inf_{\tilde\theta} \sup_{\theta \in \mathcal{F}} \mathbb{E}_\theta[\ell(\tilde\theta, \theta)] \geq c\, r_\varepsilon$, the sequence $(r_\varepsilon)_\varepsilon$ is called the minimax rate of convergence over $\mathcal{F}$. In the previous inequality, the infimum is taken over all possible estimators $\tilde\theta$ of $\theta$. We refer, e.g., to Tsybakov (2009) and Johannes et al. (2011) for a non-exhaustive reference list.

Under the minimax estimation paradigm, the performance of two given procedures can be compared through their respective rates of convergence over a chosen functional set $\mathcal{F}$. However, this does not allow for a comparison when both procedures are ’minimax-optimal’. In addition, the criterion used is quite pessimistic: the risk is measured at the slowest possible estimation precision over the set $\mathcal{F}$. Hence, it does not provide a fair comparison. To tackle these issues, an alternative point of view has been proposed in the seminal paper of Kerkyacharian and Picard (2002). The main idea can be stated as follows: given an estimation procedure and a sequence of rates, can we determine the set of sequences that are estimated by this procedure at this rate? If yes, this set is called the maxiset associated to the procedure for the given rate. Under this paradigm, the best performing procedure, i.e., the ’maxiset-optimal’ procedure, is the one whose associated maxiset strictly includes the maxisets of the others. Note that a very usual criticism concerns the situation where estimation methods have non-nested maxisets. Autin et al. (2012) discuss this important aspect of the maxiset approach, explaining that, first, it is somehow normal to find that some estimation methods are better at estimating some specific functions; in such a case, examining the ’form’ of the maxiset brings interesting information. Second, it may be possible to combine these procedures so that the maxiset of the combined procedure contains the union of the individual maxisets. The maxiset point of view has been generalized to various settings, see, e.g., Autin (2006), Rivoirard and Tribouley (2008) or Hohage and Weidling (2017).

In the framework of signal detection, the minimax point of view has been widely investigated and very fruitfully applied. We refer to Ingster and Suslina (2003), Baraud (2002), Ingster et al. (2012) or Laurent et al. (2012), among others. Nevertheless, as in the estimation case, the minimax paradigm does not allow for a fully satisfying comparison between different testing procedures. The extension of the maxiset theory to this setting is a doorway to a novel, informative and rigorous mathematical study of these procedures. A flavor of the maxiset approach in the signal detection framework has been discussed in Autin et al. (2014). Nevertheless, the proper adaptation of this approach to the signal detection framework is a challenging problem that we tackle in this paper. We further discuss some new issues related to this theory. In particular, in this work, we aim at

  • highlighting the link between the set and the sequence of detection rates that both appear in the alternative hypothesis of the testing problem, for different procedures based on quadratic test statistics,

  • comparing inverse and direct approaches in the light of the maxiset point of view.

In Section 2, we recall the minimax paradigm and then present the maxiset point of view for signal detection problems. Thereafter, in Section 3, we state maxiset results for both the inverse approach and the direct one (see Theorems 3.1 and 3.2). A crucial and perhaps surprising aspect of signal detection in inverse problems has been raised in Laurent et al. (2011), where direct and indirect testing procedures are compared from the minimax point of view. In Section 4, we succeed in comparing the inverse and direct approaches in the light of the maxiset approach in many cases (see Proposition 4.1), as for instance in the moderately ill-posed inverse problem (see Proposition 4.2).

Following a brief conclusion on the novelty of our results in Section 5, we gather all the related proofs in Section 6.

2 Signal detection in inverse problems

2.1 The minimax paradigm for inverse problems in signal detection

We consider the sequence space model (1).

The signal detection problem aims at determining whether or not the observations contain some signal. This question can be formalized as the following hypothesis testing problem:

(3)   $H_0: \theta = 0 \quad \text{against} \quad H_1: \theta \in \mathcal{F}, \ \|\theta\|_2^2 \geq \rho_\varepsilon,$

for some non-decreasing positive sequence $(\rho_\varepsilon)_\varepsilon$ depending on $\varepsilon$.
In the alternative hypothesis $H_1$, the set $\mathcal{F}$ denotes a subset of $\ell_2$. The requirement $\theta \in \mathcal{F}$ can be thought of either as a structural constraint on the signal or as a regularity condition on the underlying function $f$ in model (2). At the same time, the constraint $\|\theta\|_2^2 \geq \rho_\varepsilon$ corresponds to an energy condition that quantifies the amount of signal available in the observations. Another problem closely related to signal detection is pattern recognition. In this case, one aims at testing the adequacy between the observations and a given reference signal $\theta^0$. Since, up to the change of variable $\theta \mapsto \theta - \theta^0$, these two problems are equivalent, we shall focus in the sequel only on the signal detection problem.
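To make this reduction concrete, the short helper below (a hypothetical name of ours, not from the paper) recentres the observations around a reference signal, so that testing adequacy to that signal becomes a plain signal detection problem.

```python
import numpy as np

def recenter(y, b, theta_ref):
    """Reduce pattern recognition (is theta equal to theta_ref?) to signal detection:
    under model (1), y_j - b_j * theta_ref_j follows the same model with unknown
    sequence theta - theta_ref, so testing theta = theta_ref amounts to detecting
    a signal in the recentred observations."""
    return np.asarray(y) - np.asarray(b) * np.asarray(theta_ref)
```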

In the sequel we denote by $\Phi$ a testing procedure. It is a measurable function of the observations, taking values in $\{0,1\}$, with the convention that we reject $H_0$ if $\Phi = 1$ - which corresponds to accepting $H_1$ - and do not reject the null hypothesis otherwise. Given a level $\alpha \in (0,1)$, $\Phi$ is called a level-$\alpha$ test if and only if $\mathbb{P}_{H_0}(\Phi = 1) \leq \alpha$.

The risk under the alternative hypothesis is often measured through the maximal Type-II error over the set of alternatives, namely $\sup_{\theta \in \mathcal{F},\, \|\theta\|_2^2 \geq \rho_\varepsilon} \mathbb{P}_\theta(\Phi = 0)$. In particular, given a level $\beta \in (0,1)$, the level-$\alpha$ test $\Phi$ is said to be powerful if its maximal Type-II error can be bounded by $\beta$. In this context, the minimax paradigm has been at the core of several investigations over the last decades. Given two fixed levels $\alpha$ and $\beta$ and a given set $\mathcal{F}$, the separation rate associated to a given level-$\alpha$ test $\Phi$ is defined as

$\rho(\Phi, \mathcal{F}, \beta) = \inf\Big\{ \rho > 0 : \sup_{\theta \in \mathcal{F},\, \|\theta\|_2^2 \geq \rho} \mathbb{P}_\theta(\Phi = 0) \leq \beta \Big\}.$

Typically, we expect that $\rho(\Phi, \mathcal{F}, \beta) \to 0$ as $\varepsilon \to 0$, although its behaviour strongly depends on the considered setting. The minimax separation rate associated to the testing problem (3) for a given set $\mathcal{F}$ is then defined as $\rho^\star(\mathcal{F}, \alpha, \beta) = \inf_{\Phi_\alpha} \rho(\Phi_\alpha, \mathcal{F}, \beta)$, where the infimum is taken over all possible level-$\alpha$ tests $\Phi_\alpha$. We refer to Ingster and Suslina (2003) or Baraud (2002) for exhaustive discussions on these definitions. Determining the minimax separation rate for a given problem is quite informative. For a substantial account on the subject, see, e.g., Ingster and Suslina (2003), Baraud (2002), Butucea (2007), Laurent et al. (2012) or Lacour and Pham Ngoc (2014) among others.

2.2 Inverse and direct approaches

Several testing procedures have been proposed in order to deal with the testing problem (3). In this section we focus on two quadratic test statistics that have been proved to perform well in some minimax settings.


The amount of signal contained in the observations can be measured through the quantity $\sum_{j \in J} \theta_j^2$. A natural way to provide a decision rule is then to construct an estimator of this quantity and to reject $H_0$ when this estimator is too large.

Inverse approach (IP): According to the sequence model (1), each coefficient $\theta_j$ can be estimated by $y_j / b_j$, provided that the sequence $b$ is known. For a given non-decreasing sequence of non-negative integers $D = D_\varepsilon$, this leads to the testing procedure

(4)   $\Phi^{IP}_{D,\varepsilon} = \mathbf{1}\Big\{ \sum_{j \leq D} \frac{y_j^2 - \varepsilon^2}{b_j^2} > t^{IP}_{D,\varepsilon}(\alpha) \Big\},$

where $\sum_{j \leq D} (y_j^2 - \varepsilon^2)/b_j^2$ is an unbiased estimator of $\sum_{j \leq D} \theta_j^2$ and $t^{IP}_{D,\varepsilon}(\alpha)$ is a threshold value that makes it possible to control the Type-I error of the test. The integer $D$ plays a role similar to that of a regularization parameter in estimation. According to the choice of the set $\mathcal{F}$, specific choices for $D$ are available. We refer, e.g., to Laurent et al. (2012) for more details. We stress that a weighted variant of this procedure has been proposed and investigated in several papers, e.g. Ingster et al. (2012), which allows one to obtain sharp asymptotic results.
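As an illustration of the inverse approach, here is a sketch of a test of this type: the statistic estimates the truncated energy $\sum_{j \le D} \theta_j^2$ as in (4), but the threshold is calibrated by Monte Carlo simulation under $H_0$ rather than through the explicit constants used later in (10); the function and variable names are our own.

```python
import numpy as np

def inverse_test(y, b, eps, D, alpha=0.05, n_mc=20_000, rng=None):
    """Inverse-approach (IP) test in the spirit of (4): estimate the truncated energy
    sum_{j<=D} theta_j^2 by sum_{j<=D} (y_j^2 - eps^2) / b_j^2 and reject H_0 when it
    exceeds a threshold calibrated by Monte Carlo under H_0 (illustrative calibration,
    not the explicit constants of (10))."""
    rng = rng or np.random.default_rng()
    stat = np.sum((y[:D] ** 2 - eps ** 2) / b[:D] ** 2)

    # Under H_0, y_j = eps * xi_j, so the statistic is a weighted centred chi-square sum.
    null = eps ** 2 * np.sum(
        (rng.standard_normal((n_mc, D)) ** 2 - 1.0) / b[:D] ** 2, axis=1
    )
    threshold = np.quantile(null, 1.0 - alpha)
    return stat > threshold, stat, threshold
```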

Direct approach (DP): Since the sequence model (1) is derived from the model (2), we remark that the test (4) is essentially based on an inversion of the operator at hand. Such an inversion appears quite natural in an estimation context, in which we provide a reconstruction of the unknown function $f$. In a signal detection framework, this is no longer required. Indeed, setting $\nu_j = b_j \theta_j$ for all $j$, we can remark that the assertions $\theta = 0$ and $\nu = 0$ are equivalent. In other words, the testing problem, for some non-decreasing positive sequence $(\tilde\rho_\varepsilon)_\varepsilon$:

(5)   $H_0: \nu = 0 \quad \text{against} \quad H_1: \nu \in \mathcal{G}, \ \|\nu\|_2^2 \geq \tilde\rho_\varepsilon,$

for some set $\mathcal{G}$, only differs from (3) by its alternative. In some sense, (5) does not take into account the fact that the data are distorted by a compact operator: we treat the data as a ‘direct’ problem and deal with a model of the form

(6)   $y_j = \nu_j + \varepsilon \xi_j, \qquad j \in J,$

where $\nu_j = b_j \theta_j$. Consequently, for a given non-decreasing sequence of non-negative integers $D = D_\varepsilon$, we can introduce the test $\Phi^{DP}_{D,\varepsilon}$ as

(7)   $\Phi^{DP}_{D,\varepsilon} = \mathbf{1}\Big\{ \sum_{j \leq D} \big( y_j^2 - \varepsilon^2 \big) > t^{DP}_{D,\varepsilon}(\alpha) \Big\},$

where $\sum_{j \leq D} (y_j^2 - \varepsilon^2)$ corresponds to an estimator of $\sum_{j \leq D} \nu_j^2$ and $t^{DP}_{D,\varepsilon}(\alpha)$ denotes an appropriate threshold, allowing a control of the Type-I error. This test provides interesting performance when dealing with (5). Yet, surprisingly, this is also the case for the testing problem (3) in some specific situations. We refer to Laurent et al. (2011) for an extensive discussion on the subject from a minimax point of view. One of the aims of this paper is to complete this discussion using a maxiset point of view. This notion is extended to the signal detection context in the next section.
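For comparison, here is the analogous sketch for the direct approach: the same quadratic statistic is used but without inverting the eigenvalues, in the spirit of (7); again, the threshold is a Monte Carlo quantile under $H_0$ and all names are illustrative.

```python
import numpy as np

def direct_test(y, eps, D, alpha=0.05, n_mc=20_000, rng=None):
    """Direct-approach (DP) test in the spirit of (7): work on nu_j = b_j * theta_j
    directly, estimate sum_{j<=D} nu_j^2 by sum_{j<=D} (y_j^2 - eps^2) and reject H_0
    when it exceeds a Monte Carlo threshold; no inversion of the operator is needed."""
    rng = rng or np.random.default_rng()
    stat = np.sum(y[:D] ** 2 - eps ** 2)

    # Under H_0, y_j = eps * xi_j: the statistic is a centred chi-square sum.
    null = eps ** 2 * np.sum(rng.standard_normal((n_mc, D)) ** 2 - 1.0, axis=1)
    threshold = np.quantile(null, 1.0 - alpha)
    return stat > threshold, stat, threshold
```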

2.3 The maxiset point of view in signal detection problem

In nonparametric function estimation, an estimator, possibly indexed by a regularization (smoothing) parameter and by the noise level $\varepsilon$, is a sequence built from the observations. In the minimax setting, given a functional set $\mathcal{F}$, we determine a sequence of rates $(r_\varepsilon)_\varepsilon$ such that, for any $\theta \in \mathcal{F}$,

(8)   $\mathbb{E}_\theta\big[\ell(\hat\theta, \theta)\big] \leq C\, r_\varepsilon,$

for some constant $C > 0$. Under the maxiset paradigm, we are instead given a sequence of rates $(r_\varepsilon)_\varepsilon$, and we exhibit the largest functional set for which (8) holds.

We will now adapt the maxiset point of view to the signal detection framework. Given a sequence of separation rates, we determine the largest set of signals over which the maximal Type-II error of our testing problem can be controlled at a prescribed level. This is formalized in the following definition.

Definition 2.1

For a fixed $\beta \in (0,1)$, let $(\Phi_\varepsilon)_\varepsilon$ be a sequence of testing procedures and $(\rho_\varepsilon)_\varepsilon$ a decreasing sequence of non-negative real numbers. The maxiset of $(\Phi_\varepsilon)_\varepsilon$ associated to the separation rate $(\rho_\varepsilon)_\varepsilon$ is the largest sequence space in $\ell_2$ such that, for all $\varepsilon$ small enough, the maximal Type-II error of $\Phi_\varepsilon$ over the signals of this space with energy at least $\rho_\varepsilon$ is bounded by $\beta$.

This definition can be generalized in a straightforward way to the testing problem (5). In the following, we denote the maxiset as . Note that it clearly corresponds to the following set:

(9)

In Section 3, we shall derive an explicit expression of the maxisets for the tests based on these quadratic statistics (see Theorems 3.1 and 3.2 below).
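The membership criterion behind Definition 2.1 can also be explored numerically: for a candidate signal $\theta$ and a given noise level, estimate the Type-II error of a test by Monte Carlo and check whether it stays below $\beta$ whenever the energy of $\theta$ exceeds the prescribed rate. The sketch below does this for the direct quadratic statistic with a user-supplied threshold; it only illustrates the criterion and is not the paper's exact formulation.

```python
import numpy as np

def type2_error(theta, b, eps, D, threshold, n_rep=2000, rng=None):
    """Monte Carlo estimate of the Type-II error P_theta(test accepts H_0) for the
    direct statistic sum_{j<=D} (y_j^2 - eps^2); illustrative helper, not from the paper."""
    rng = rng or np.random.default_rng(1)
    J = len(theta)
    accepts = 0
    for _ in range(n_rep):
        y = b * theta + eps * rng.standard_normal(J)
        accepts += np.sum(y[:D] ** 2 - eps ** 2) <= threshold
    return accepts / n_rep

# Informally, theta belongs to the maxiset at rate (rho_eps) if, for every small eps
# such that the energy of theta is at least rho_eps, this Type-II error stays below beta.
```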

Following Definition 2.1, the reader will easily be convinced that, for a given sequence of testing procedures, there is an embedding result between its maxisets associated with different choices of detection rates.

Proposition 2.1

Let and two sequences of detection rates such that as . Consider a sequence of testing procedures . Then, for some

The previous embedding entails that the set of detectable signals increases as soon as we relax the constraint on the minimal energy required.

3 Testing procedures and their maxiset performance

In this section, we first provide a description of the maxisets associated respectively to the procedures defined in (4) and (7). From now on, we fix the two thresholds $t^{IP}_{D,\varepsilon}(\alpha)$ and $t^{DP}_{D,\varepsilon}(\alpha)$ involved in the definition of these testing procedures as

(10)

where the explicit constants involved guarantee that the considered procedures have a Type-I error controlled by $\alpha$ for all $\varepsilon$. Undoubtedly, the smaller $\alpha$ is, the bigger these thresholds are. For the sake of simplicity, we do not use the quantiles of the respective test statistics. Such a change would not modify the spirit of the results displayed in this paper, but would induce more technical details.

Below we characterize the maxisets associated to the considered procedures for general separation rates. In such a setting, these sets are poorly informative by themselves, but they reveal valuable information on the problem provided that we impose some structural constraints.

3.1 A general characterization of the maxisets

We start our investigations with a general description of the maxisets associated to the procedures (4) and (7) for any chosen rate of detection. To this end, we introduce two sets which will be of primary importance in the sequel.

Definition 3.1

Let be an increasing sequence of non negative integers. Let be a decreasing sequence of non negative real numbers . For any we set

The following result emphasizes that these sets provide a characterization of the maxisets associated to the procedures (4) and (7).

Theorem 3.1

Consider . Let and satisfying (10). Consider the two sequences of testing procedures and defined respectively in (4) and (7). We have the two following maxiset results for any choice of detection rates and :

  1. There exist two positive constants and depending on and such that:

    rewritten as: .

  2. There exist two positive constants and depending on and such that:

    rewritten as: .

Remark 3.1

In Section 6, we provide explicit values of the constants involved in the first statement (see (18) and (20)). The remaining constants are obtained from these values by an appropriate substitution.

Surprisingly, the maxisets in the testing case have a completely different form from those obtained in the estimation case. Indeed, according to Kerkyacharian and Picard (2002), in the estimation problem the constraint that a given procedure attains a prescribed rate induces a tail condition on the signal of interest. This is no longer the case in the signal detection problem. Theorem 3.1 above indicates that the procedures (4) and (7) are able to detect only those signals satisfying, for $\varepsilon$ small enough, conditions of the form

(11)
(12)

In particular, the constraint (12) indicates that there should be enough signal on the frequencies investigated by the test statistics. Nothing is said regarding the high frequencies, i.e. the coefficients beyond the rank $D$. The maxiset results of Theorem 3.1 contrast with the usual ones, since the maxisets are not described in terms of smoothness spaces. Moreover, this constraint has already been highlighted in, e.g., Laurent et al. (2012). Hence, the maxiset paradigm is poorly informative in such a context. In the following section, we prove that an additional structural assumption on the maxiset provides valuable information on the signals that can be detected by the procedures we are interested in.
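The fact that these constraints only involve the frequencies explored by the test statistic can be checked directly: two signals that coincide on their first $D$ coefficients are treated identically by the truncated quadratic statistic, whatever their tails. The toy check below uses illustrative parameters of our own.

```python
import numpy as np

rng = np.random.default_rng(2)
J, D, eps = 500, 50, 0.05
j = np.arange(1, J + 1)
b = j ** (-1.0)                 # illustrative eigenvalue sequence

theta_a = np.exp(-0.1 * j)      # fast-decaying tail
theta_b = theta_a.copy()
theta_b[D:] = 0.5               # same first D coefficients, heavy tail afterwards

xi = rng.standard_normal(J)     # common noise draw for the comparison
stat = lambda theta: np.sum((b * theta + eps * xi)[:D] ** 2 - eps ** 2)

assert np.isclose(stat(theta_a), stat(theta_b))   # the statistic ignores the tail
```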

3.2 A robust version of maxisets for tests

The main message of the previous section is that the functions that can be detected by the procedures (4) and (7) must have enough energy at low frequencies. This means in particular that our testing procedures are very sensitive to the trend of the signal. In what follows, we shall require some robustness of our procedures with respect to this low-frequency part of the signal, provided we have enough information. Indeed, in many practical situations the signal is preprocessed or filtered, and we want theoretical guarantees on signal detection that remain valid in this context.

This structural constraint on the maxiset of interest can be reformulated more formally as follows:

Definition 3.2

A set satisfies the decimation constraint if

where for any and , .

We stress that such a condition is for instance satisfied by all the sets of the form

for some positive sequence and positive constants . Such a set describes some smoothness conditions through the decay of the coefficients of the function of interest.
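As a rough illustration only, the sketch below assumes that decimating a signal at level $N$ means zeroing out its first $N$ (low-frequency) coefficients, in line with the robustness-to-trend discussion above, and that the smoothness-type sets mentioned here take the form of a decay condition $|\theta_j| \le C a_j$. Both readings are assumptions of ours; the paper's exact definitions and constants are not reproduced here.

```python
import numpy as np

def decimate(theta, N):
    """Hypothetical decimation operator: zero out the first N (low-frequency)
    coefficients of theta. This is an assumed reading of Definition 3.2,
    used for illustration only."""
    out = np.asarray(theta, dtype=float).copy()
    out[:N] = 0.0
    return out

def satisfies_decay(theta, a, C):
    """Check a smoothness-type condition |theta_j| <= C * a_j (illustrative form
    of the sets mentioned above)."""
    return bool(np.all(np.abs(theta) <= C * np.asarray(a)))

# Decay conditions of this type are stable under the (assumed) decimation:
j = np.arange(1, 201)
theta = 0.5 * j ** (-1.5)
assert satisfies_decay(theta, a=j ** (-1.5), C=1.0)
assert satisfies_decay(decimate(theta, 20), a=j ** (-1.5), C=1.0)
```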

Hereafter we define two sequence spaces that are the restrictions of the previous sets to sequences satisfying the decimation constraint, and that depend on the chosen detection rates appearing in the definition of the maxiset given in (9).

Definition 3.3

Let be a decreasing sequence of non negative real numbers and be an increasing sequence of non negative integers. For any we set

If we search for the largest set satisfying both the requirement of Definition 2.1 and robustness with respect to decimation, we obtain the so-called robust maxisets, and we retrieve exactly the sets introduced in the previous definition, up to some constants.

Undoubtedly, depending on the sequences involved in their definition, the sequence spaces above may be empty. In the sequel, we especially focus on the cases where these sequence spaces are not empty.

Definition 3.4

For any chosen sequences , and , we say that is -admissible (respectively -admissible) if and only if (respectively ) is not the empty space.

Remark 3.2

Following Definition 3.4, note that both admissibility properties hold for rates of detection that do not converge to zero too fast as $\varepsilon$ tends to zero.

Theorem 3.2

Consider . Let and satisfying (10). Consider the two sequences of testing procedures defined respectively in (4) and (7), together with their respective robust maxisets associated with the chosen rates of detection. We have the following maxiset results:

  1. If is -admissible, then:

    (13)

    rewritten as:

  2. If is -admissible, then:

    (14)

    rewritten as:

Remark 3.3

The constants stated in Theorem 3.2 are similar to those in Theorem 3.1.

We observe that, as in the estimation case, the maxiset with a decimation constraint depends on the tail of the sequence of interest. Note that in the framework of signal detection the situation is much more intricate than in estimation, since one has several parameters to deal with: the rate of convergence, the nature of the operator possibly involved in the inverse signal detection problem, and the Type-II error that has to be controlled.

Depending on the relative growth of the possible energy levels of the signal and of the sums of powers of the eigenvalues of the operator involved in the signal detection problem, the nature of the maxiset related to the sequence of testing procedures may differ. Consider the case of the sequence of testing procedures defined in (4). There are two extreme situations:

  • First case: the possible energy levels are negligible with respect to the sums of powers of the eigenvalues as $\varepsilon \to 0$. In this case, Theorem 3.2 implies that the robust maxiset is empty: under the considered noise level, whatever signal we consider, our procedure is never robust under decimation.

  • Second case: the possible energy levels dominate the sums of powers of the eigenvalues as $\varepsilon \to 0$. In this case, the robust maxiset is non-empty and does not depend on the operator. In particular, provided that there is enough energy in the signal, the performance of our detection procedure does not depend on the underlying inverse problem we are considering.

The transition case corresponds to the situation where the two sequences are equally balanced. Here, the robust maxiset can be explicitly embedded into a set that provides a control on the tail of the sequence of interest by the considered rate. In particular, the faster the rate converges toward $0$, the smoother the detectable functions.

Note that since the corresponding sequence space may or may not be empty, depending on the values of the parameters, one cannot conclude in full generality that the robust maxiset is non-empty. Similar comments remain valid if we consider the sequence of testing procedures defined in (7) and compare the behavior of the corresponding two sequences as $\varepsilon \to 0$.

4 Comparison of direct and inverse approaches

In this section, we take advantage of the tools developed in the previous section in order to compare the direct and inverse approaches in a signal detection framework.

Indeed, we have seen that problems (3) and (5) only differ by their alternatives. The tests (4) and (7) have been specifically designed to address each of these problems separately. Now, a challenging question is to compare the alternatives and to check whether the inverse (resp. direct) approach is pertinent for the problem (5) (resp. (3)). This comparison will be carried out under the maxiset paradigm, using the robust version introduced in Section 3.2 above. To improve the readability of our results, we use a lightened notation for the robust maxisets in the remainder of this section.

In order to provide a fair comparison between both testing procedures, we have to specify a dependency between the rates involved in (3) and (5). Indeed, the two procedures are not expressed in the same space. For instance, in the minimax paradigm, the rates are often faster for the ’direct’ alternative than for the inverse one. Below, we fix this dependency according to a calibration previously investigated in the minimax paradigm (see, e.g., Laurent et al. (2011)). Concerning the regularization parameter $D$, we keep the same value for both testing procedures: the idea is to work with the same number of coefficients (the same amount of information).

Proposition 4.1

Fix . Choose , and such that, for any , . Then, provided that

(15)

we get

(16)

This proposition indicates that all the functions that can be detected by the inverse procedure can also be detected by the direct one. In other words, the direct test appears to be more efficient in the sense that its associated maxiset is larger. One may then ask whether the inclusion is strict. In order to provide an answer, we consider a specific setting and prove in particular that the inverse testing procedure may miss some functions that can be detected by the direct one.

We now consider the classical setting of a moderately ill-posed inverse problem, namely we assume that $b_j = j^{-t}$ for some $t > 0$ and any $j \geq 1$. We also assume that the calibration between the rates is the minimax one. In this case the two relevant terms are equally balanced, so that we are in the transition case described in Section 3.2.

Proposition 4.2

Let . Consider the case where for any , . Assume that and with . Then, there exist functions such that

Remark 4.1

With the choice of operator given in Proposition 4.2, (15) is clearly satisfied and therefore (16) holds.
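To get a feel for the comparison of Propositions 4.1 and 4.2 in the moderately ill-posed case, one can compare by simulation the empirical power of the direct and inverse quadratic tests. The sketch below uses Monte Carlo null quantiles and illustrative parameters (decay exponent, signal, noise level) chosen by us; it mirrors the spirit of the comparison rather than the exact calibration (15).

```python
import numpy as np

def empirical_power(theta, b, eps, D, alpha=0.05, n_rep=2000, n_mc=20_000, seed=0):
    """Empirical power of the direct and inverse quadratic tests, with thresholds set
    to Monte Carlo (1 - alpha)-quantiles under H_0. Illustration only; this is not
    the calibration used in the paper."""
    rng = np.random.default_rng(seed)
    J = len(theta)
    w = 1.0 / b[:D] ** 2

    # Null quantiles of both statistics (y_j = eps * xi_j under H_0).
    chi = rng.standard_normal((n_mc, D)) ** 2 - 1.0
    thr_dir = np.quantile(eps ** 2 * chi.sum(axis=1), 1 - alpha)
    thr_inv = np.quantile(eps ** 2 * (chi * w).sum(axis=1), 1 - alpha)

    rej_dir = rej_inv = 0
    for _ in range(n_rep):
        y = b * theta + eps * rng.standard_normal(J)
        rej_dir += np.sum(y[:D] ** 2 - eps ** 2) > thr_dir
        rej_inv += np.sum((y[:D] ** 2 - eps ** 2) * w) > thr_inv
    return rej_dir / n_rep, rej_inv / n_rep

# Moderately ill-posed example: b_j = j^{-1}, a signal concentrated on mid frequencies.
j = np.arange(1, 501)
b = j ** (-1.0)
theta = np.zeros(500)
theta[40:60] = 1.0
print(empirical_power(theta, b, eps=0.01, D=100))
```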

5 Conclusion

In this paper, we adapt the maxiset approach to signal detection in inverse problems. This novel tool for assessing the performance of testing procedures has been applied to several classical settings. In particular, it allows us to compare the so-called direct and inverse approaches. We have established that direct methods are associated with strictly larger maxisets in many cases, which makes such testing procedures more appealing for practical purposes.

This contribution provides a novel way for researchers to assess the performance of their testing procedures. At the core of our future investigations will be the adaptation of our methods to the operator setting, leading to the new concept of maxi-class.

To conclude this discussion, we mention that we are aware of a recent paper by Ermakov (2018), published in a similar setting while we were finalizing this article. Although that paper also provides a definition of maxisets in the testing context, we stress that it uses different constraints on the set of interest. Moreover, it does not consider the inverse problem setting, whereas we provide a comparison between the direct and inverse approaches. In our opinion, both contributions are complementary and reveal different aspects of the same problem.

6 Proofs

6.1 Technical results

In this section, we recall and slightly extend some results that will be useful in the following. More details regarding these results, e.g., context and extended discussions, can be found in Baraud (2002) and Laurent et al. (2012).

Proposition 6.1

Fix . There exists and such that, for all and

Proof: We start with the proof of item (i). A more precise proof is provided in Laurent et al. (2012). In particular, the authors take advantage of available results on weighted statistics, which allows a better dependency of the constant with respect to the prescribed levels. For the sake of completeness, we reproduce a simpler version of the proof, based on the Markov inequality. Recall that we defined, for any , as and for some constant . Then

Let be such that:

(17)

Provided that , by using the Bienayme-Chebyshev inequality, one gets

The last inequality is obtained if is large enough. More precisely, we can choose which depends on both and as:

(18)
Remark 6.1

We let the reader check that the smaller the prescribed levels are, the larger the chosen constants.

We now prove item (ii) of Proposition 6.1. Proving this statement is equivalent to proving that

(19)

To show this inequality, we apply Lemma 2 of Laurent et al. (2012) with and . Setting , we then get that (19) holds provided that

Observe that