In many practical applications one aims to infer properties of a quantity which is not directly observable. As a guiding example, consider computerized tomography (CT), where the interior (more precisely the tissue density) of the human body is imaged via the absorption of X-rays along straight lines. Mathematically, the relation between the available measurements (absorption along lines, the so-called sinogram) and the unknown quantity of interest (the tissue density) is described by the Radon transform, which is an integral operator to be described in more detail later (cf. Figure 1 for illustration). Potential further applications include astronomical image processing, magnetic resonance imaging, non-destructive testing and super-resolution microscopy, to mention a few. Typically, the measurements are of random nature themselves (as e.g. in positron emission tomography (PET, see), magnetic resonance imaging (MRI, see ) or super-resolution microscopy (see )) and/or are additionally corrupted by measurement noise. This motivates us to consider the inverse Gaussian white noise model
with a (known) bounded linear operator mapping between (real or complex) Hilbert spaces and , noise level and a Gaussian white noise on (details will be given in section 2).
A major research effort is devoted to the development and analysis of estimation and recovery methods of the signal from the measurements (see Section 1.2 for some references). However, when is expected to be very close to some reference , by which we mean that either or deviates from by only a few localized components (anomalies), then instead of full recovery of , one might be more interested in testing whether or not. This is especially relevant, since, when the signal-to-noise level is too small for full recovery, then testing may still be informative as it is well known to be a simpler task (see e.g.  and the references therein). Although of practical importance, testing in model (1.1) is a much less investigated endeavor than estimation and a full theoretical understanding has not been achieved yet. Hence, in this paper, we are interested in analyzing such testing methodology for inferring on based on the available data . Note that, due to the linearity of the model (1.1), we can w.l.o.g. assume that . Thus, we suppose that either (no anomaly is present) or (an anomaly given by is present), where for some (finite) class of non-zero functions, that are – in some sense – normalized, and the constant factor describes its orientation, and – more importantly – how “large” or “pronounced” the signal is. We consider the family of testing problems
where is a family of non-negative real numbers. This can be viewed as the problem of detecting an anomaly from the set .
We suppose that the family of classes is chosen in advance. This choice is crucial for the analysis of the problem and it depends solely on the specific application: For CT we might think of small inclusions such as tumors, cf. Figure 1, where certain wavelets are used as mathematical representation. If no a priori knowledge about potential anomalies is available, it is natural to start by considering dictionaries with good expressibility in , e.g. frames or wavelets, and set for subsets of . The particular choices that we analyze in this paper will be built from such dictionaries, see also  and  for recent references in the context of estimation.
1.1 Aim of the paper
Given a family of classes , our main objective will be to assess to what extent powerful tests for the testing problem (1.2) exist. The answer will usually depend on the size of : If is large enough, then powerful tests exist, and if is too small, then no test has high power. Hence, we aim to find a minimal family of thresholds , such that powerful detection at a controlled error rate is still possible. Vice versa, such a minimal family would determine which signals cannot be detected reliably, even when they are present.
To this end, we extend the existing theory on minimax signal detection in inverse problems, focusing on localized signals and linear combinations of localized signals, which are common in practice. This has, to the best of our knowledge, not been investigated yet. We present upper bounds, lower bounds and asymptotics for the minimal values of such that powerful tests for testing problems given by (1.2) exist. They depend on the difficulty of the inverse problem induced by the forward operator , the cardinality of (denoted by ) and the inner products between the images , , of the potential anomalies. We stress that our results can be applied to a variety of dictionaries , such as wavelets, whereas previous results were restricted to dictionaries based on the SVD of the operator .
Figure 1 serves as an illustrative example. If it is known a priori that the anomaly which distorts the reference image is a linear combination of a certain collection of wavelets (see the discussion in Sections 3.2.3 and 4 for details), then our results suggest that the anomaly present in Figure 1(d) is large enough that there is a test which is able to distinguish the distorted image (Figure 1(d)) from the undistorted image (Figure 1(a)) with type I and type II error both at most , based on the measurements in Figure 1(f) (see Theorem 3.9). Note that our results are not restricted to wavelets. In fact, most of our results are applicable under very mild conditions on the dictionary .
We stress that this paper does not constitute an exhaustive study of the subject. Rather, we aim to provide some first analysis and discuss some illustrative examples.
1.2 Connection to existing literature
As the literature on estimating in model (1.1) is vast (see e.g. , , , , , , , ), we confine ourselves to briefly reviewing the literature on (minimax) testing theory, the topic of the present paper.
First of all, there is extensive literature on minimax signal detection for the direct problem, i.e. when and is the identity. We only mention the seminal works  and . Usually, the hypothesis “” is tested against alternatives of the form “”, where is a certain class of functions, for example defined by certain smoothness conditions. The indirect case where is allowed to differ from the identity has e.g. been treated in , , , , .
Note that our testing problem (1.2) has an alternative which is substantially different from testing against a smoothness condition with sufficiently large norm. Our approach is different, as instead of e.g. smoothness constraints, expressed through , we consider the alternative that is an element of a very specific set of candidate functions. We refer to  and , where systems of scaled and translated rectangle functions (bumps) in a direct setting were considered.
Finally, we want to highlight  explicitly as they consider alternatives consisting of linear combinations of anomalies given in terms of the SVD of the operator , which served as a point of reference and inspiration to parts of this study.
We start by giving a detailed description of our model and some basic facts about testing and minimax signal detection in section 2. Section 3 contains the main results: In section 3.1 we assume that is a collection of frame elements, and in section 3.2 we assume that contains functions in the linear span of a collection of frame elements. Both sections also include discussions about conditions that frames need to satisfy for our results to be applicable. We present illustrative simulation studies in section 4. All proofs are postponed to section 6.
2.1 Detailed model assumptions
The model (1.1) has to be understood in a weak sense, i.e.
The error is a Gaussian white noise on :
If and are real Hilbert spaces, we suppose that , for some probability space , is a linear mapping satisfying and for all .
If and are complex Hilbert spaces, instead we suppose that and . Here means that
is distributed according to the standard complex normal distribution, i.e., where .
We will use the notation for convenience.
For a complex number , we denote its real and imaginary part by and , respectively.
For two families , of non-negative real numbers we write if there exists such that whenever . Analogously, we write if there exists such that whenever . If , we write , and if , we write .
2.3 Testing and distinguishability
In the above testing problem (1.2), we wish to test the hypothesis against the alternative , which means making an educated guess (based on the data) about the correctness of the hypothesis when compared to the alternative, while keeping the error of wrongly deciding against under control. Tests are based on test statistics, i.e. measurable functions of the data
. We suppose that any test statistic can be expressed in terms of the Gaussian sequence given by
where is a basis of the Hilbert space , and, consequently, (in the real case) or (in the complex case) for . In the following, we use the notation interchangeably for either the random process given by (2.1) or the random sequence given by (2.2), since they are equivalent in terms of the data they provide.
A test for the testing problem (1.2) can now be viewed as a measurable function of the sequence given by
where is either or . The test can be understood as a decision rule in the following sense: If , the hypothesis is accepted. If , the hypothesis is rejected in favor of the alternative.
If is true, i.e. , but , we call this a type I error
(the hypothesis is rejected although it is true). The probability to make a type I error is
where denotes the distribution of given that is true. Likewise, the alternative might be true, but . We call this a type II error (the hypothesis is accepted although the alternative is true). Let us, for simplicity, introduce the notation . The type II error probability, given that a specific is the true signal, is denoted as
where denotes the distribution of given that is the true underlying signal. Since the alternative is – in general – composite, i.e. does not consist of only one element, the type II error probability will in general depend on the element . For such composite alternatives we consider the worst case error given by the maximum type II error probability over for our analysis.
We say that the hypothesis is asymptotically distinguishable (in the minimax sense) from the family of alternatives when there exist tests for the testing problems “ against ”, , that have both small type I and small maximum type II error probabilities. We define
where is the set of all tests for the testing problem “ against ”. In terms of we say that and are distinguishable if , as . If , we say that they are indistinguishable. We refer to  for an in-depth treatment.
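To make the notion of distinguishability concrete, the following minimal sketch treats a toy special case that is not the general model of this paper: a single real observation that is standard normal under the hypothesis and normal with mean `mu` and unit variance under a simple alternative. In this case the smallest achievable sum of type I and type II error probabilities is attained by the likelihood ratio test, which rejects when the observation exceeds `mu / 2`. The function name `minimal_error_sum` and the chosen values of `mu` are assumptions made for illustration only.

```python
from statistics import NormalDist


def minimal_error_sum(mu: float) -> float:
    """Smallest sum of type I and type II error for N(0, 1) versus N(mu, 1).

    Toy assumption: one real observation and a simple alternative.
    The likelihood ratio test rejects when the observation exceeds mu / 2,
    so both error probabilities equal Phi(-|mu| / 2).
    """
    return 2.0 * NormalDist().cdf(-abs(mu) / 2.0)


# Small mu: error sum close to 1 (indistinguishable regime).
# Large mu: error sum close to 0 (distinguishable regime).
for mu in (0.1, 1.0, 3.0, 6.0):
    print(f"mu = {mu:4.1f}   minimal error sum = {minimal_error_sum(mu):.4f}")
```

In this toy setting the error sum tends to 1 as `mu` tends to 0 and to 0 as `mu` grows, which mirrors the two regimes described above.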
For prescribed families , we are interested in determining the smallest possible values , such that and are still asymptotically distinguishable, if possible. Suppose that there exist two families and , that satisfy
as . If, additionally, , then we call a family that satisfies the (asymptotic) minimax detection boundary. We may say that separates detectable and undetectable signals.
It is, however, not always possible to find such a sharp threshold. If the family only satisfies the weaker conditions
we call it the separation rate of the family of testing problems “ against ”.
Although we are mostly interested in the asymptotics of the problem, we will also state non-asymptotic results, which we deem interesting.
Throughout the rest of the paper, we will assume that is a countable collection of functions in , and is a family of finite subsets of .
3.1 Alternatives given by finite collections of functions
We first suppose that consists of the appropriately normalized functions , , i.e. . As above, we write , so that testing problem (1.2) can be written as
3.1.1 An upper bound for the detection boundary
Any family of tests for the family of testing problems (3.1) yields an upper bound for . It seems natural to choose maximum likelihood type tests as candidates, which are given by
for a given significance level , and for appropriately chosen thresholds (which depend on whether the spaces and are real or complex Hilbert spaces).
Let and assume that , as . In addition, assume that
where and as . Then and thus, .
The bound given in Theorem 3.1 does not depend on and it depends on the set of anomalies and the family of candidate indices only through the cardinality . Thus, Theorem 3.1 has the advantage that it is (almost) always applicable, but it might not be very well suited for specific applications. We will see examples where the bound is essentially sharp, and an example where it is basically useless.
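As a numerical illustration of a maximum-type test, the following sketch simulates a simplified situation under assumptions that are ours and not taken from the theorem: the real case, `n` normalized test statistics that are independent standard normal under the hypothesis, and a threshold obtained from a union (Bonferroni) bound, which need not coincide with the thresholds used above. The names `bonferroni_threshold` and `rejection_rate` are hypothetical helpers.

```python
import math
import random
from statistics import NormalDist


def bonferroni_threshold(n: int, alpha: float) -> float:
    """Two-sided union-bound threshold: P(max_k |Z_k| > q) <= alpha under H0."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * n))


def rejection_rate(n: int, alpha: float, mu: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo rejection frequency when one coordinate has mean mu.

    Assumption: the n coordinates are independent with unit variance.
    mu = 0 estimates the type I error; mu != 0 estimates the power.
    """
    rng = random.Random(seed)
    threshold = bonferroni_threshold(n, alpha)
    rejections = 0
    for _ in range(trials):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z[0] += mu  # a single anomaly in the first coordinate
        rejections += max(abs(v) for v in z) > threshold
    return rejections / trials


for n in (10, 100, 1000):
    q = bonferroni_threshold(n, 0.05)
    print(f"n = {n:5d}   threshold = {q:.3f}   sqrt(2 log n) = {math.sqrt(2 * math.log(n)):.3f}")

print("estimated type I error:", rejection_rate(50, 0.05, 0.0, 2000))
print("estimated power (mu = 6):", rejection_rate(50, 0.05, 6.0, 2000))
```

The threshold grows only slowly with `n`, which is consistent with the remark that the bound depends on the set of candidate anomalies only through its cardinality.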
3.1.2 A lower bound for
and assume that . In addition, assume that
where is a family of positive real numbers such that and as . Then and thus, .
This theorem can be proven by using Proposition 4.10 and Lemma 7.2 of 6.
3.1.3 The detection boundary
As a consequence, we are now in position to describe the asymptotic detection boundary precisely in several situations. First, a combination of the previous theorems yields the following:
Assume that , and let
and assume that for a family that satisfies and as . Then .
In particular, Corollary 3.3 yields the asymptotic detection boundary, when is orthogonal. Note that the assumptions of Corollary 3.3 are satisfied when is constant as . This has several applications, as we will see e.g. in Section 3.1.5.
Assume now that the operator
is compact and has a singular value decomposition given by orthonormal systems and in and
, respectively, and singular values.
Let and and for , and let be any family of finite subsets of , such that , as . Then .
The detection thresholds for the SVD are clearly very easy to find, and could be deduced from other known results (see  for example). We include the result here since, as far as we know, it has not been stated explicitly before.
3.1.4 Frame decompositions
We have seen that sharp detection thresholds for the SVD can easily be found, but this does (usually) not cover the situation when we are interested in local anomalies. We will thus focus on other options for anomaly systems, particularly frames, for which we briefly introduce the most important notation. Let be a separable Hilbert space, and let be a countable index set. A sequence is called a frame of if there exist constants , such that for any
Since frames do not have to be orthonormal, they provide great flexibility. Theorems 3.1 and 3.2 clearly apply to testing (1.2) with . However, the fact that constitutes a frame is, on its own, not enough to guarantee that we obtain a sharp detection boundary from Corollary 3.3.
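For readers less familiar with frames, the following sketch illustrates the frame inequality in the simplest possible setting, a finite collection of vectors in the real plane. This toy example is our own and is not one of the dictionaries analyzed in the paper. In finite dimensions the optimal frame bounds are the smallest and largest eigenvalues of the frame operator, the sum of the outer products of the frame vectors. The names `frame_operator_2d` and `frame_bounds_2d` are hypothetical.

```python
import math


def frame_operator_2d(vectors):
    """Entries (a, b, d) of the symmetric 2x2 frame operator S = sum_k v_k v_k^T."""
    a = sum(x * x for x, _ in vectors)
    b = sum(x * y for x, y in vectors)
    d = sum(y * y for _, y in vectors)
    return a, b, d


def frame_bounds_2d(vectors):
    """Optimal frame bounds = smallest and largest eigenvalue of S (closed form for 2x2)."""
    a, b, d = frame_operator_2d(vectors)
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius


# Three unit vectors at angles of 120 degrees: redundant, not orthogonal, yet a tight frame.
three_vectors = [(0.0, 1.0), (-math.sqrt(3) / 2, -0.5), (math.sqrt(3) / 2, -0.5)]
print("frame bounds:", frame_bounds_2d(three_vectors))

# Two collinear vectors do not span the plane: the lower bound vanishes, so this is no frame.
collinear = [(1.0, 0.0), (2.0, 0.0)]
print("frame bounds:", frame_bounds_2d(collinear))
```

The first collection shows that a frame may be redundant and non-orthogonal while still having equal lower and upper bounds; the second shows how the lower bound fails when the collection does not span the space.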
In the following we show how frames can be constructed, for which Corollary 3.3 can be applied. The idea is as follows: Since the bounds for the detection threshold mostly depend on properties of the images in , we will simply start by defining a frame in that will guarantee that the needed properties are satisfied, and then construct the corresponding frame in , such that the pair , is a decomposition of the operator , and such that the assumptions of Corollary 3.3 are satisfied for any family of subsets .
There is a dense subspace with inner product and norm , and constants , such that
for all .
There is a frame of and a sequence of real numbers with , and constants , such that
for all .
Assumption 3.5 implies that as an operator from to is invertible. Now let be a frame of as in (ii). We apply the Gram-Schmidt procedure with respect to the inner product to . This results in a sequence , which is a frame in and which is orthogonal with respect to . Now we define
for . The system clearly yields sharp detection thresholds, as for any subset it holds that by construction. Furthermore, it is a frame in , since for
As a consequence we obtain the following.
Suppose that Assumption 3.5 is satisfied. Then for any frame of , constructed as above, and for any family of subsets of indices with as , we have .
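The orthogonalization step in the construction above can be sketched in finite dimensions as follows. This is only an illustration under our own assumptions: the inner product is represented by a hypothetical symmetric positive definite weight matrix `w`, the vectors are real, the output is additionally normalized, and numerically dependent vectors are discarded, which is a design choice of this sketch and not part of the construction in the text.

```python
def inner(x, y, w):
    """Weighted inner product <x, y>_w = x^T w y for symmetric positive definite w."""
    n = len(x)
    return sum(x[i] * w[i][j] * y[j] for i in range(n) for j in range(n))


def gram_schmidt(vectors, w, tol=1e-12):
    """Modified Gram-Schmidt with respect to <., .>_w.

    Returns vectors that are orthonormal in the weighted inner product.
    Vectors whose residual norm is below tol are dropped (assumption of this sketch).
    """
    basis = []
    for v in vectors:
        r = list(v)
        for b in basis:
            coeff = inner(r, b, w)
            r = [ri - coeff * bi for ri, bi in zip(r, b)]
        norm = inner(r, r, w) ** 0.5
        if norm > tol:
            basis.append([ri / norm for ri in r])
    return basis


# Hypothetical weight matrix and a redundant input system in the plane.
w = [[2.0, 1.0], [1.0, 2.0]]
system = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
basis = gram_schmidt(system, w)
print("number of vectors kept:", len(basis))
print("weighted inner product of the two outputs:", inner(basis[0], basis[1], w))
```

The point of orthogonalizing in the image-side inner product is that the resulting images have vanishing mutual inner products, which is the property the detection bounds depend on.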
We discuss several commonly used operators and present a few typical examples of collections , for which the above theorems may or may not apply.
Let and let be the linear Fredholm integral operator given by
for . Suppose that is a (mother) wavelet in , that satisfies , and for which the collection given by
Let us suppose that the system of possible anomalies is given by this wavelet system, i.e. we consider with . Assume further that is compactly supported with support size , which implies that for any pair of indices the number of indices , such that is at most .
Since, in practical applications, we would not expect to be able to obtain observations on the whole plane , we suppose that an anomaly, if one exists, must lie within some compact subset of , e.g. the unit interval . For some family of integers that satisfies as we define the family of “candidate” indices by
Note that . Since , it follows that for any , the number of indices such that is bounded by . Thus, the number of indices such that is also bounded by . This means that and . Consequently, the conditions of Corollary 3.3 are satisfied, and it follows that, in this case, .
Let be a -periodic and continuously differentiable function, and let be the integral operator given by
The system , where , is a Hilbert basis of , which consists of singular functions of , since . Thus, Corollary 3.4 yields the detection threshold for the detection of anomalies given by .
Let us now try to come up with another system of possible anomalies. Motivated by the previous example, let
be a system of compactly supported wavelets with one vanishing moment (i.e.) forming an orthonormal frame of . We define the periodic wavelets for . The system given by for then forms an orthonormal frame of . Let be a family of integers that satisfies as and set
In the above setting, with defined as above, if as , then .
Let us finally discuss the example of computerized tomography already mentioned in the introduction. Here, we restrict ourselves to spatial dimension , in order to ease readability. We stress, however, that all subsequent results can be extended to any dimension. Mathematically, this is modelled by the integral operator , where and , given by
known as the Radon transform. The singular system of is analytically known (see ). Let . We define functions , by
where are the Jacobi polynomials uniquely determined by the equations . The system is an orthonormal basis of and, together with the appropriate basis and constants forms the SVD of the Radon transform . Thus, Corollary 3.4 yields the detection thresholds for the system .
However, the discussion in Section 3.1.4 gives rise to another option to choose systems of anomalies that attain the same detection boundaries. For we define the usual Sobolev space
where , and set (in the notation of )
In addition, let
The Radon transform is an operator from to that satisfies (see Theorem 5.1 of )
3.2 Alternatives given by the linear span of collections of anomalies
Assume now that possible anomalies might be linear combinations of the , . For the upcoming analysis it is necessary to assume that the satisfy the following.
There is a collection of functions in , and a sequence of non-zero complex numbers, such that for any it holds that
Assumption 3.8 guarantees that we can present our results in terms of the . Clearly, it is satisfied, when for all . In addition, if we were to assume that the collections and have some kind of useful structure (we may for example assume that they constitute frames of and , respectively, as we did in Subsection 3.1.4), then the sequence from Assumption 3.8 takes the role of what might be called quasi-singular values.
In this section, we suppose that consists of functions in the linear span of the functions , , namely . Thus, testing problem (1.2) becomes
for some family of positive real numbers (we use the notation instead of to avoid confusion with the results from the previous section).
3.2.1 Nonasymptotic results
For a subset , we define the matrix by , , and the matrix by , , where
for . We denote the Frobenius norm of a matrix by .
The next theorem (the nonasymptotic upper bound for the detection threshold) cannot be given in terms of the minimax sum of errors . Instead we define
where is the set of all level tests for the testing problem . In other words, we consider the minimax sum of errors when only level tests are allowed.
Suppose that Assumption 3.8 holds. Assume that the family of subsets is such that the matrices are positive definite for all . Then, for any and , we have if
where , and is given by if and are real Hilbert spaces and if and are complex Hilbert spaces.
It is now obvious why it is necessary to allow only tests at a prescribed level . Making arbitrarily small would require the detection threshold to become arbitrarily large in order to keep the type II error small.
Contrary to the upper bound, the nonasymptotic lower bound for the detection threshold can be stated in terms of .
Suppose that Assumption 3.8 holds, and assume that the family of subsets is such that the matrices are positive definite for all . Then, for any , we have if
The assumption that and , respectively, are positive definite (and consequently invertible, since they are Hermitian) is a technical necessity. However, it is also intuitively justified, because it prevents certain “unreasonable” choices of (for example any subset such that is linearly dependent).
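The role of the positive definiteness assumption can be illustrated with a small numerical check. The sketch below uses the plain Gram matrix of a few real vectors as a stand-in for the matrices defined above; their exact definitions are not reproduced here, so this stand-in is an assumption. It tests positive definiteness by attempting a Cholesky factorization and also evaluates the Frobenius norm. The helper names are hypothetical, and only the real symmetric case is covered.

```python
import math


def gram_matrix(vectors):
    """Gram matrix G[i][j] = <v_i, v_j> for real vectors (Euclidean inner product)."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]


def is_positive_definite(m, tol=1e-12):
    """Attempt a Cholesky factorization; it succeeds iff m is positive definite."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= tol:  # non-positive pivot: not positive definite
                    return False
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True


def frobenius_norm(m):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in m for x in row))


independent = [(1.0, 0.0), (1.0, 1.0)]
dependent = [(1.0, 0.0), (2.0, 0.0)]  # linearly dependent: an "unreasonable" choice

for name, vectors in (("independent", independent), ("dependent", dependent)):
    g = gram_matrix(vectors)
    print(name, "positive definite:", is_positive_definite(g),
          "Frobenius norm:", round(frobenius_norm(g), 4))
```

A linearly dependent collection produces a singular Gram matrix, so the check fails exactly for the kind of subset that the assumption is meant to exclude.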
Note that it can be easily seen that, if we redefine to
then we would obtain the same bounds as above with replaced by the matrix , which is given by , and replaced by the matrix given by . It follows that our results are compatible with the results obtained in , where the above testing problem was considered when the system is given by the SVD of .
3.2.2 Asymptotic results
The asymptotic results for this section can now be easily deduced from the previous theorems.