Minimax detection of localized signals in statistical inverse problems

12/10/2021
by   Markus Pohlmann, et al.

We investigate minimax testing for detecting local signals or linear combinations of such signals when only indirect data is available. Naturally, in the presence of noise, signals that are too small cannot be reliably detected. In a Gaussian white noise model, we discuss upper and lower bounds for the minimal size of the signal such that testing with small error probabilities is possible. In certain situations we are able to characterize the asymptotic minimax detection boundary. Our results are applied to inverse problems such as numerical differentiation, deconvolution and the inversion of the Radon transform.

1 Introduction

In many practical applications one aims to infer on properties of a quantity which is not directly observable. As a guiding example, consider computerized tomography (CT), where the interior (more precisely the tissue density) of the human body is imaged via the absorption of X-rays along straight lines. Mathematically, the relation between the available measurements (absorption along lines, the so-called sinogram) and the unknown quantity of interest (the tissue density) is described by the Radon transform, which is an integral operator to be described in more detail later (cf. Figure 1 for illustration). Potential further applications include astronomical image processing, magnetic resonance imaging, non-destructive testing and super-resolution microscopy, to mention a few. Typically, the measurements are either of random nature themselves (as e.g. in positron emission tomography (PET, see [28]), magnetic resonance imaging (MRI, see [18]) or super-resolution microscopy (see [25])) and/or additionally corrupted by measurement noise. This motivates us to consider the inverse Gaussian white noise model

(1.1)

with a (known) bounded linear operator mapping between (real or complex) Hilbert spaces and , noise level and a Gaussian white noise on (details will be given in section 2).
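As a rough numerical illustration of model (1.1), the following sketch simulates a discretized version, assuming the model has the form "data = operator applied to signal + noise level times Gaussian white noise" suggested by the description above. The names `T`, `f`, `sigma` and `xi`, the grid size, the choice of an integration operator and the bump signal are all assumptions made for illustration, and replacing white noise by i.i.d. standard normal variables on a grid is itself a simplification.

```python
import numpy as np

def simulate_indirect_data(T, f, sigma, rng):
    """Return Y = T f + sigma * xi, with xi a vector of i.i.d. standard normals.

    This is a finite-dimensional stand-in for the white noise model; all
    symbol names here are assumptions, not taken from the paper.
    """
    xi = rng.standard_normal(T.shape[0])
    return T @ f + sigma * xi

n = 200
# Discretized integration operator on [0, 1] (lower-triangular Riemann sum).
T = np.tril(np.ones((n, n))) / n
x = (np.arange(n) + 0.5) / n
# A localized bump playing the role of a hypothetical anomaly.
f = np.where((x > 0.4) & (x < 0.5), 1.0, 0.0)

rng = np.random.default_rng(0)
Y = simulate_indirect_data(T, f, sigma=0.05, rng=rng)
```

Only `Y` would be available to a test; the bump `f` is seen through the smoothing operator `T` and the added noise.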

A major effort of research is devoted to the development and analysis of estimation and recovery methods of the signal from the measurements (see Section 1.2 for some references). However, when is expected to be very close to some reference , by which we mean that either or deviates from by only a few localized components (anomalies), then instead of full recovery of , one might be more interested in testing whether or not. This is especially relevant, since, when the signal-to-noise level is too small for full recovery, then testing may still be informative as it is well-known to be a simpler task (see e.g. [27] and the references therein). Although of practical importance, testing in model (1.1) is a much less investigated endeavor than estimation and a full theoretical understanding has not been achieved yet. Hence, in this paper, we are interested in analyzing such testing methodology for inferring on based on the available data . Note that, due to the linearity of the model (1.1), we can w.l.o.g. assume that . Thus, we suppose that either (no anomaly is present) or (an anomaly given by is present), where for some (finite) class of non-zero functions, that are – in some sense – normalized, and the constant factor describes its orientation, and – more importantly – how “large” or “pronounced” the signal is. We consider the family of testing problems

(1.2)

where is a family of non-negative real numbers. This can be viewed as the problem of detecting an anomaly from the set .

We suppose that the family of classes is chosen in advance. This choice is crucial for the analysis of the problem and it depends solely on the specific application: For CT we might think of small inclusions such as tumors, cf. Figure 1, where certain wavelets are used as mathematical representation. If no a priori knowledge about potential anomalies is available, it is natural to start by considering dictionaries with good expressibility in , e.g. frames or wavelets, and set for subsets of . The particular choices that we analyze in this paper will be built from such dictionaries, see also [7] and [12] for recent references in the context of estimation.

(a) A reference image
(b) Sinogram of the reference image
(c) Noisy sinogram
(d) Distorted Image
(e) Sinogram of the distorted image
(f) Noisy sinogram
Figure 1: Illustration of structured hypothesis testing in the CT example. To infer whether the unknown signal deviates from a reference image, we use a test based on the noisy sinogram. In the above example, when the distortion is assumed to be a linear combination of certain wavelets (cf. Sections 3.2.3 and 4), then the results of Theorem 3.9 imply the existence of a test which is able to distinguish the distorted (1(d)) from the undistorted image (1(a)) with type I and type II error both at most , based on the measurements 1(f).

1.1 Aim of the paper

Given a family of classes , our main objective will be to assess to what extent powerful tests for the testing problem (1.2) exist. The answer will usually depend on the size of : If is large enough, then powerful tests exist, and if is too small, then no test has high power. Hence, we aim to find a minimal family of thresholds , such that powerful detection at a controlled error rate is still possible. Vice versa, such a minimal family would determine which signals cannot be detected reliably, even when they are present.

To this end, we extend the existing theory on minimax signal detection in inverse problems, focusing on localized signals and linear combinations of localized signals, which are common in practice. This has, to the best of our knowledge, not been investigated yet. We present upper bounds, lower bounds and asymptotics for the minimal values of such that powerful tests for testing problems given by (1.2) exist. They depend on the difficulty of the inverse problem induced by the forward operator , the cardinality of (denoted by ) and the inner products between the images , , of the potential anomalies. We stress that our results can be applied to a variety of dictionaries , such as wavelets, whereas previous results were restricted to dictionaries based on the SVD of the operator .

Figure 1 serves as an illustrative example. If it is known a priori that the anomaly which distorts the reference image is a linear combination of a certain collection of wavelets (see the discussion in Sections 3.2.3 and 4 for details), then our results suggest that the anomaly that is present in 1(d) is large enough, such that there is a test which is able to distinguish the distorted (1(d)) from the undistorted image (1(a)) with type I and type II error both at most , based on the measurements 1(f) (see Theorem 3.9). Note that our results are not restricted to wavelets. In fact, most of our results are applicable under very mild conditions on the dictionary .

We stress that this paper does not constitute an exhaustive study of the subject. Rather, we aim to provide some first analysis and discuss some illustrative examples.

1.2 Connection to existing literature

As the literature on estimating in model (1.1) is vast (see e.g. [8], [11], [4], [6], [1], [2], [30], [7]), we confine ourselves to briefly reviewing the literature on (minimax) testing theory, the topic of the present paper.

First of all, there is extensive literature on minimax signal detection for the direct problem, i.e. when and is the identity. We only mention the seminal works [14] and [16]. Usually, the hypothesis “” is tested against alternatives of the form “”, where is a certain class of functions, for example defined by certain smoothness conditions. The indirect case where is allowed to differ from the identity has e.g. been treated in [19], [15], [13], [24], [3].

Note that our testing problem (1.2) has an alternative which is substantially different from testing against a smoothness condition with sufficiently large norm. Our approach is different, as instead of e.g. smoothness constraints, expressed through , we consider the alternative that is an element of a very specific set of candidate functions. We refer to [10] and [9], where systems of scaled and translated rectangle functions (bumps) in a direct setting were considered.

Finally, we want to highlight [20] explicitly as they consider alternatives consisting of linear combinations of anomalies given in terms of the SVD of the operator , which served as a point of reference and inspiration to parts of this study.

1.3 Outline

We start by giving a detailed description of our model and some basic facts about testing and minimax signal detection in section 2. Section 3 contains the main results: In section 3.1 we assume that is a collection of frame elements, and in section 3.2 we assume that contains functions in the linear span of a collection of frame elements. Both sections also include discussions about conditions that frames need to satisfy for our results to be applicable. We present illustrative simulation studies in section 4. All proofs are postponed to section 6.

2 Preliminaries

2.1 Detailed model assumptions

The model (1.1) has to be understood in a weak sense, i.e.

(2.1)

The error is a Gaussian white noise on :

  • If and are real Hilbert spaces, we suppose that , for some probability space , is a linear mapping satisfying and for all .

  • If and are complex Hilbert spaces, instead we suppose that and . Here means that is distributed according to the standard complex normal distribution, i.e. , where .

We will use the notation for convenience.

2.2 Notation

For a complex number , we denote its real and imaginary part by and , respectively.
For two families , of non-negative real numbers we write if there exists such that whenever . Analogously, we write if there exists such that whenever . If , we write , and if , we write .

2.3 Testing and distinguishability

In the above testing problem (1.2), we wish to test the hypothesis against the alternative , which means making an educated guess (based on the data) about the correctness of the hypothesis when compared to the alternative, while keeping the error of wrongly deciding against under control. Tests are based on test statistics, i.e. measurable functions of the data . We suppose that any test statistic can be expressed in terms of the Gaussian sequence given by

(2.2)

where is a basis of the Hilbert space , and, consequently, (in the real case) or (in the complex case) for . In the following, we use the notation interchangeably for either the random process given by (2.1) or the random sequence given by (2.2), since they are equivalent in terms of the data they provide.

A test for the testing problem (1.2) can now be viewed as a measurable function of the sequence given by

where is either or . The test can be understood as a decision rule in the following sense: If , the hypothesis is accepted. If , the hypothesis is rejected in favor of the alternative.

If is true, i.e. , but , we call this a type I error (the hypothesis is rejected although it is true). The probability to make a type I error is

where denotes the distribution of given that is true. Likewise, the alternative might be true, but . We call this a type II error (the hypothesis is accepted although the alternative is true). Let us, for simplicity, introduce the notation . The type II error probability, given that a specific is the true signal, is denoted as

where denotes the distribution of given that is the true underlying signal. Since the alternative is – in general – composite, i.e. does not consist of only one element, the type II error probability will in general depend on the element . For such composite alternatives we consider the worst case error given by the maximum type II error probability over for our analysis.

We say that the hypothesis is asymptotically distinguishable (in the minimax sense) from the family of alternatives when there exist tests for the testing problems “ against ”, , that have both small type I and small maximum type II error probabilities. We define

where is the set of all tests for the testing problem “ against ”. In terms of we say that and are distinguishable if , as . If , we say that they are indistinguishable. We refer to [15] for an in-depth treatment.
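To make the notion of a minimal sum of error probabilities concrete, the following sketch treats the simplest possible case, which is not the composite problem studied here: a single observation that is standard normal under the hypothesis and normal with mean `mu` and unit variance under one fixed alternative. In that case the smallest achievable sum of type I and type II errors is known in closed form, namely twice the standard normal distribution function evaluated at `-mu / 2`. The function name `min_error_sum` is an assumption made for illustration.

```python
from statistics import NormalDist

def min_error_sum(mu):
    """Smallest (type I + type II) error for testing N(0, 1) against N(mu, 1).

    The optimal test rejects when the observation exceeds mu / 2, so both
    error probabilities equal Phi(-|mu| / 2).
    """
    return 2.0 * NormalDist().cdf(-abs(mu) / 2.0)

for mu in (0.0, 1.0, 3.0, 6.0):
    print(f"mu = {mu:3.1f}  ->  minimal error sum = {min_error_sum(mu):.4f}")
```

The sum equals one when `mu` is zero (the two distributions coincide, so no test does better than guessing) and decreases to zero as `mu` grows, which mirrors the distinction between indistinguishable and distinguishable families.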

For prescribed families , we are interested in determining the smallest possible values , such that and are still asymptotically distinguishable, if possible. Suppose that there exist two families and , that satisfy

as . If, additionally, , then we call a family that satisfies the (asymptotic) minimax detection boundary. We may say that separates detectable and undetectable signals.

It is, however, not always possible to find such a sharp threshold. If the family only satisfies the weaker conditions

we call it the separation rate of the family of testing problems “ against ”.

Remark:

Although we are mostly interested in the asymptotics of the problem, we will also state non-asymptotic results, which we deem interesting.

3 Results

Throughout the rest of the paper, we will assume that is a countable collection of functions in , and is a family of finite subsets of .

3.1 Alternatives given by finite collections of functions

We first suppose that consists of the appropriately normalized functions , , i.e. . As above, we write , so that testing problem (1.2) can be written as

(3.1)

3.1.1 An upper bound for the detection boundary

Any family of tests for the family of testing problems (3.1) yields an upper bound for . It seems natural to choose maximum likelihood type tests as candidates, which are given by

(3.2)

for a given significance level , and for appropriately chosen thresholds (which depend on whether the spaces and are real or complex Hilbert spaces).
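A test of this maximum type can be sketched numerically as follows, in a finite-dimensional, real-valued setting. This is a minimal sketch under stated assumptions: the statistic is taken to be the largest absolute correlation between the data and the normalized images of the candidate anomalies, and the threshold is a Bonferroni-corrected Gaussian quantile, which keeps the type I error at most `alpha` but is not claimed to coincide with the thresholds used in (3.2). The names `max_type_test`, `images`, `sigma` and `alpha` are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def max_type_test(Y, images, sigma, alpha):
    """Return 1 (reject) if the largest normalized correlation is too large.

    images: array of shape (N, n); row k is the image of the k-th candidate
    anomaly under the forward operator.
    Threshold: Bonferroni-corrected two-sided Gaussian quantile (an
    assumption, chosen so that the type I error is at most alpha).
    """
    norms = np.linalg.norm(images, axis=1)
    stats = np.abs(images @ Y) / (sigma * norms)
    N = images.shape[0]
    threshold = NormalDist().inv_cdf(1.0 - alpha / (2.0 * N))
    return int(stats.max() > threshold)

# Usage: 50 orthonormal candidate images, one of them present with amplitude 10 * sigma.
rng = np.random.default_rng(1)
images = np.eye(50)
sigma = 0.1
Y = 10.0 * sigma * images[3] + sigma * rng.standard_normal(50)
decision = max_type_test(Y, images, sigma, alpha=0.05)
```

Under the hypothesis each normalized correlation is standard normal, so the union bound over the `N` candidates gives the stated level regardless of how the images are correlated.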

Theorem 3.1.

Let and assume that , as . In addition, assume that

where and as . Then and thus, .

The bound given in Theorem 3.1 does not depend on and it depends on the set of anomalies and the family of candidate indices only through the cardinality . Thus, Theorem 3.1 has the advantage that it is (almost) always applicable, but it might not be very well suited for specific applications. We will see examples where the bound is essentially sharp, and an example where it is basically useless.
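One way to build intuition for a bound that involves only the cardinality is the standard fact that the maximum of `N` independent standard normal variables grows like the square root of `2 log N`. The sketch below checks this by simulation. It illustrates that general fact only and does not reproduce the precise condition of Theorem 3.1; the function name and the simulation sizes are assumptions.

```python
import numpy as np

def mean_max_ratio(N, reps, rng):
    """Average of max(xi_1, ..., xi_N) / sqrt(2 log N) over `reps` repetitions."""
    samples = rng.standard_normal((reps, N))
    return samples.max(axis=1).mean() / np.sqrt(2.0 * np.log(N))

rng = np.random.default_rng(2)
for N in (10**2, 10**3, 10**4):
    print(N, round(mean_max_ratio(N, reps=200, rng=rng), 3))
```

The ratio approaches one only slowly as `N` grows, which is one reason why non-asymptotic statements are of separate interest.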

3.1.2 A lower bound for

Theorem 3.2.

Let

and assume that . In addition, assume that

(3.3)

where is a family of positive real numbers such that and as . Then and thus, .

This theorem can be proven by using Proposition 4.10 and Lemma 7.2 of [16]. However, we will provide a self-contained and simple proof employing a weak law of large numbers for dependent random variables in section 6.

Theorem 3.2 implies that the number of negatively correlated image elements is the relevant quantity which determines the difficulty of testing (1.2). The actual cardinality plays no role in the lower bound (3.3).

3.1.3 The detection boundary

As a consequence, we are now in position to describe the asymptotic detection boundary precisely in several situations. First, a combination of the previous theorems yields the following:

Corollary 3.3.

Assume that , and let

and assume that for a family that satisfies and as . Then .

In particular, Corollary 3.3 yields the asymptotic detection boundary, when is orthogonal. Note that the assumptions of Corollary 3.3 are satisfied when is constant as . This has several applications, as we will see e.g. in Section 3.1.5.

Assume now that the operator is compact and has a singular value decomposition given by orthonormal systems and in and , respectively, and singular values .

Corollary 3.4.

Let and and for , and let be any family of finite subsets of , such that , as . Then .

Remark:

The detection thresholds for the SVD are clearly very easy to find, and could be deduced from other known results (see [15] for example). We include them here since, as far as we know, they have not been stated explicitly before.

3.1.4 Frame decompositions

We have seen that sharp detection thresholds for the SVD can easily be found, but this does (usually) not cover the situation when we are interested in local anomalies. We will thus focus on other options for anomaly systems, particularly frames, for which we briefly introduce the most important notation. Let be a separable Hilbert space, and let be a countable index set. A sequence is called a frame of if there exist constants , such that for any

Since frames do not have to be orthonormal, they provide great flexibility. Theorems 3.2 and 3.1 clearly apply to testing (1.2) with , however, the fact that constitutes a frame is, on its own, not enough to guarantee that we obtain a sharp detection boundary from Corollary 3.3.
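In finite dimensions the frame condition can be checked directly. The standard definition requires two positive constants such that the sum of squared inner products of any vector with the frame elements lies between the lower constant times the squared norm and the upper constant times the squared norm. The best such constants are the smallest and largest eigenvalues of the frame operator. The sketch below computes them for three unit vectors in the plane at angles of 120 degrees, a redundant system that is not a basis; the function name `frame_bounds` and the example are assumptions made for illustration.

```python
import numpy as np

def frame_bounds(vectors):
    """Optimal frame bounds of a finite family in R^d.

    They are the extreme eigenvalues of the frame operator
    S = sum_i e_i e_i^T; the family is a frame iff the smallest one is > 0.
    """
    E = np.asarray(vectors, dtype=float)   # one frame element per row
    S = E.T @ E
    eigenvalues = np.linalg.eigvalsh(S)    # sorted in ascending order
    return eigenvalues[0], eigenvalues[-1]

# Three unit vectors in R^2 at 120 degree angles: redundant, not orthogonal.
three_vectors = [[0.0, 1.0],
                 [-np.sqrt(3) / 2, -0.5],
                 [np.sqrt(3) / 2, -0.5]]
lower, upper = frame_bounds(three_vectors)
```

Here both bounds coincide, so the system behaves like an orthonormal basis up to a constant factor even though its elements are linearly dependent.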

In the following we show how frames can be constructed, for which Corollary 3.3 can be applied. The idea is as follows: Since the bounds for the detection threshold mostly depend on properties of the images in , we will simply start by defining a frame in that will guarantee that the needed properties are satisfied, and then construct the corresponding frame in , such that the pair , is a decomposition of the operator , and such that the assumptions of Corollary 3.3 are satisfied for any family of subsets .

Assumption 3.5.
  (i) There is a dense subspace with inner product and norm , and constants , such that

    (3.4)

    for all .

  (ii) There is a frame of and a sequence of real numbers with , and constants , such that

    for all .

Assumption 3.5 implies that as an operator from to is invertible. Now let be a frame of as in (ii). We apply the Gram-Schmidt procedure with respect to the inner product to . This results in a sequence , which is a frame in and which is orthogonal with respect to . Now we define

for . The system clearly yields sharp detection thresholds, as for any subset it holds that by construction. Furthermore, it is a frame in , since for

and

As a consequence we obtain the following.

Theorem 3.6.

Suppose that Assumption 3.5 is satisfied. Then for any frame of , constructed as above, and for any family of subsets of indices with as , we have .
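The Gram-Schmidt step used in the construction above can be sketched in finite dimensions, where an inner product other than the Euclidean one is represented by a symmetric positive definite matrix. This is a minimal sketch: the matrix `A` is a hypothetical stand-in for the inner product on the dense subspace in Assumption 3.5, the input vectors are assumed to be linearly independent, and the output is orthogonal (not normalized) with respect to `A`.

```python
import numpy as np

def gram_schmidt(vectors, A):
    """Orthogonalize the given vectors w.r.t. <x, y>_A = x^T A y.

    A must be symmetric positive definite and the vectors linearly
    independent (both are assumptions of this sketch).
    """
    ortho = []
    for v in np.asarray(vectors, dtype=float):
        w = v.copy()
        for q in ortho:
            # Remove the component of w along q, measured in the A-inner product.
            w -= (q @ A @ w) / (q @ A @ q) * q
        ortho.append(w)
    return np.array(ortho)

A = np.diag([1.0, 2.0, 3.0])
vectors = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
Q = gram_schmidt(vectors, A)
gram = Q @ A @ Q.T   # diagonal up to rounding: the rows of Q are A-orthogonal
```

Orthogonality of the resulting images is what makes the correlation structure in Corollary 3.3 trivial, at the price that the constructed system is tied to the operator through the chosen inner product.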

3.1.5 Examples

We discuss several commonly used operators and present a few typical examples of collections , for which the above theorems may or may not apply.

Integration

Let and let be the linear Fredholm integral operator given by

for . Suppose that is a (mother) wavelet in , that satisfies , and for which the collection given by

forms an orthogonal frame of . For an in-depth treatment of wavelet theory, we refer to [23] or [5].

Let us suppose that the system of possible anomalies is given by this wavelet system, i.e. we consider with . Assume further that is compactly supported with support size , which implies that for any pair of indices the number of indices , such that is at most .

Since, in practical applications, we would not expect to be able to obtain observations on the whole plane , we suppose that an anomaly, if one exists, must lie within some compact subset of , e.g. the unit interval . For some family of integers that satisfies as we define the family of “candidate” indices by

(3.5)

Note that . Since , it follows that for any , the number of indices such that is bounded by . Thus, the number of indices such that is also bounded by . This means that and . Consequently, the conditions of Corollary 3.3 are satisfied, and it follows that, in this case, .
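The support argument can be illustrated numerically with the Haar wavelet, which has vanishing mean and support of length one. The sketch below assumes that the candidate indices are all translates at a single fixed scale inside the unit interval (an assumption, since the definition (3.5) is not reproduced here) and that the operator is integration from zero. Because each wavelet integrates to zero, its image vanishes outside the wavelet's support, so images of wavelets with disjoint supports are orthogonal. All names and the grid size are hypothetical.

```python
import numpy as np

def haar(j, k, x):
    """Haar wavelet 2^(j/2) * psi(2^j x - k) evaluated at the points x."""
    y = 2.0**j * x - k
    positive = ((0.0 <= y) & (y < 0.5)).astype(float)
    negative = ((0.5 <= y) & (y < 1.0)).astype(float)
    return 2.0**(j / 2) * (positive - negative)

n, j = 1024, 3
x = (np.arange(n) + 0.5) / n                      # midpoint grid on [0, 1]
wavelets = np.array([haar(j, k, x) for k in range(2**j)])
images = np.cumsum(wavelets, axis=1) / n          # discretized integral from 0 to x
gram = images @ images.T / n                      # approximate L2 inner products
off_diagonal = gram - np.diag(np.diag(gram))
```

The off-diagonal entries of `gram` vanish up to rounding. For wavelets with larger support, or across different scales, some inner products are non-zero, which is why the argument above counts overlapping supports rather than asserting full orthogonality.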

Periodic convolution

Let be a -periodic and continuously differentiable function, and let be the integral operator given by

The system , where , is a Hilbert basis of , which consists of singular functions of , since . Thus, Corollary 3.4 yields the detection threshold for the detection of anomalies given by .
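The statement that complex exponentials are singular functions of a periodic convolution has a direct discrete analogue: a circulant matrix is diagonalized by the discrete Fourier transform. The sketch below verifies this for one frequency. The kernel `h`, the grid size and the Riemann-sum normalization are assumptions made for illustration.

```python
import numpy as np

n = 64
t = np.arange(n) / n
h = np.exp(np.cos(2 * np.pi * t))        # a smooth 1-periodic kernel (hypothetical)

# Discretized periodic convolution: (C f)[i] = (1/n) * sum_j h[(i - j) mod n] * f[j].
index = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
C = h[index] / n

k = 3
e_k = np.exp(2j * np.pi * k * t)         # discrete complex exponential
eigenvalue = np.fft.fft(h)[k] / n        # k-th discrete Fourier coefficient of h

residual = np.max(np.abs(C @ e_k - eigenvalue * e_k))
```

The residual is at the level of rounding error, so `e_k` is an eigenvector of `C` and the modulus of the Fourier coefficient plays the role of the corresponding singular value.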

Let us now try to come up with another system of possible anomalies. Motivated by the previous example, let be a system of compactly supported wavelets with one vanishing moment (i.e. ) forming an orthonormal frame of . We define the periodic wavelets for . The system given by for then forms an orthonormal frame of . Let be a family of integers that satisfies as and set

Lemma 3.7.

In the above setting, with defined as above, if as , then .

This means that in this case, Theorem 3.2 cannot be applied and the upper bound from Theorem 3.1 is basically useless. We see that in this case the number of anomalies has no direct influence on the detection limit.

Radon transform

Let us finally discuss the example of computerized tomography already mentioned in the introduction. Here, we restrict ourselves to spatial dimension , in order to ease readability. We stress, however, that all subsequent results can be extended to any dimension. Mathematically, this is modelled by the integral operator , where and , given by

known as the Radon transform. The singular system of is analytically known (see [26]). Let . We define functions , by

where are the Jacobi polynomials uniquely determined by the equations . The system is an orthonormal basis of and, together with the appropriate basis and constants forms the SVD of the Radon transform . Thus, Corollary 3.4 yields the detection thresholds for the system .

However, the discussion in Section 3.1.4 gives rise to another option to choose systems of anomalies that attain the same detection boundaries. For we define the usual Sobolev space

where , and set (in the notation of [26])

In addition, let

where

The Radon transform is an operator from to that satisfies (see Theorem 5.1 of [26])

for any . Thus, Theorem 3.6 can be applied. The range of in is . Thus, any orthonormal frame of gives rise to a frame of with sharp detection boundaries given by Theorem 3.6.

3.2 Alternatives given by the linear span of collections of anomalies

Assume now that possible anomalies might be linear combinations of the , . For the upcoming analysis it is necessary to assume that the satisfy the following.

Assumption 3.8.

There is a collection of functions in , and a sequence of non-zero complex numbers, such that for any it holds that

Assumption 3.8 guarantees that we can present our results in terms of the . Clearly, it is satisfied, when for all . In addition, if we were to assume that the collections and have some kind of useful structure (we may for example assume that they constitute frames of and , respectively, as we did in Subsection 3.1.4), then the sequence from Assumption 3.8 takes the role of what might be called quasi-singular values.

In this section, we suppose that consists of functions in the linear span of the functions , , namely . Thus, testing problem (1.2) becomes

(3.6)

where

for some family of positive real numbers (we use the notation instead of to avoid confusion with the results from the previous section).

3.2.1 Nonasymptotic results

For a subset , we define the matrix by , , and the matrix by , , where

for . We denote the Frobenius norm of a matrix by .

The next theorem (the nonasymptotic upper bound for the detection threshold) cannot be given in terms of the minimax sum of errors . Instead we define

where is the set of all level tests for the testing problem . In other words, we consider the minimax sum of errors when only level tests are allowed.

Theorem 3.9.

Suppose that Assumption 3.8 holds. Assume that the family of subsets is such that the matrices are positive definite for all . Then, for any and , we have if

where , and is given by if and are real Hilbert spaces and if and are complex Hilbert spaces.

It is now obvious why it is necessary to allow only tests at a prescribed level . Making arbitrarily small would require the detection threshold to become arbitrarily large in order to keep the type II error small.

Contrary to the upper bound, the nonasymptotic lower bound for the detection threshold can be stated in terms of .

Theorem 3.10.

Suppose that Assumption 3.8 holds, and assume that the family of subsets is such that the matrices are positive definite for all . Then, for any , we have if

where .

Remark 1:

The assumption that and , respectively, are positive definite (and consequently invertible, since they are Hermitian) is a technical necessity. However, it is also intuitively justified, because it prevents certain “unreasonable” choices of (for example any subset such that is linearly dependent).

Remark 2:

Note that it can be easily seen that, if we redefine to

then we would obtain the same bounds as above with replaced by the matrix , which is given by , and replaced by the matrix given by . It follows that our results are compatible with the results obtained in [20], where the above testing problem was considered when the system is given by the SVD of .

3.2.2 Asymptotic results

The asymptotic results for this section can now be easily deduced from the previous theorems.

Corollary 3.11.

Suppose that the assumptions of Theorems 3.9 and 3.10 hold.

  • and are asymptotically distinguishable if