Let be a linear injective operator between the infinite-dimensional Hilbert spaces and with the inner products and , respectively. Let be the space of functions between a Polish space and a real separable Hilbert space . Here we study the linear ill-posed operator problems governed by the operator equation
We observe noisy values of
at some points, and the foremost objective is to estimate the true solution. The problem of interest can be described as follows: Given data under the model
where is the observational noise, and denotes the sample size, determine (approximately) the underlying element with being the regression function.
For classical inverse problems, the observational noise is assumed to be deterministic. Here we assume that the random observations
are independent and follow some unknown probability distribution, defined on the sample space , and hence we are in the context of statistical inverse problems.
The reconstruction of the unknown true solution will be based on spectral regularization schemes. Various schemes can be used to stably estimate the true solution. Tikhonov regularization is widely considered in the literature. This scheme consists of an error term measuring the fit to the data and a penalty term controlling the complexity of the reconstruction. In this study we enforce smoothness of the approximate solution by introducing an unbounded, linear, self-adjoint, strictly positive operator with a dense domain of definition , and then we define the Tikhonov regularization scheme in Hilbert scales as follows:
where is a positive regularization parameter and the operator influences the properties of the approximated solution. Standard Tikhonov regularization corresponds to , the identity mapping. In many practical problems, the operator is chosen to be a differential operator in some appropriate function spaces, e.g., -spaces.
Also, the Tikhonov minimization problem would reduce to the standard one
albeit for a different operator . Accordingly, the error bounds relate as
Therefore, error bounds for in the weak norm, in , yield bounds for . The latter bounds are not known from previous studies. We are also interested in the oversmoothing case, when , and for this reason we provide a detailed error analysis here. However, the above relation will implicitly be utilized in the subsequent proofs.
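The reduction described above can be illustrated in a toy diagonal (sequence-space) model. The following is a minimal sketch, not the construction of this paper: it assumes that the penalty is the squared norm of the smoothness-promoting operator applied to the candidate solution, that both operators are diagonal in the same basis, and that the "different operator" is the forward operator composed with the inverse of the smoothness-promoting operator. All names (`s`, `l`, `lam`, `tikhonov_hilbert_scale`, `tikhonov_standard`) are hypothetical.

```python
# Toy diagonal model; every name and every spectrum below is an illustrative assumption.

def tikhonov_hilbert_scale(s, l, y, lam):
    """Coordinate-wise minimizer of sum_j (s_j f_j - y_j)^2 + lam * (l_j f_j)^2."""
    return [sj * yj / (sj ** 2 + lam * lj ** 2) for sj, lj, yj in zip(s, l, y)]

def tikhonov_standard(b, y, lam):
    """Standard Tikhonov (identity penalty) for a diagonal operator with entries b_j."""
    return [bj * yj / (bj ** 2 + lam) for bj, yj in zip(b, y)]

n = 50
s = [j ** -1.0 for j in range(1, n + 1)]       # assumed singular values of the forward operator
l = [j ** 0.5 for j in range(1, n + 1)]        # assumed growing eigenvalues of the unbounded operator
f_true = [j ** -2.0 for j in range(1, n + 1)]  # assumed true solution
y = [sj * fj for sj, fj in zip(s, f_true)]     # noise-free data, for simplicity
lam = 1e-3

# Tikhonov regularization in the Hilbert scale, solved directly.
f_scale = tikhonov_hilbert_scale(s, l, y, lam)

# Substitution u = L f: standard Tikhonov for the operator A L^{-1}, then map back f = L^{-1} u.
b = [sj / lj for sj, lj in zip(s, l)]
u = tikhonov_standard(b, y, lam)
f_via_u = [uj / lj for uj, lj in zip(u, l)]

# Both routes should give the same reconstruction up to rounding.
max_gap = max(abs(p - q) for p, q in zip(f_scale, f_via_u))
```

In this sketch the two reconstructions coincide up to floating-point rounding, which is the sense in which the Hilbert-scale problem reduces to a standard one for a modified operator.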
We review the literature related to the considered problem. Regularization schemes in Hilbert scales are widely considered in classical inverse problems (with deterministic noise), starting from F. Natterer , and continued in [9, 18, 20, 21, 23, 24, 25, 27, 31]. G. Blanchard and N. Mücke considered general regularization schemes for linear inverse problems in statistical learning and provided (upper and lower) rates of convergence under Hölder type source conditions. Here we consider general (spectral) regularization schemes in Hilbert scales for statistical inverse problems. We discuss rates of convergence for general regularization under certain noise conditions, approximate source conditions, and a specific link condition between the operator , governing the equation (1), and the smoothness promoting operator as used e.g. in (3). We study error estimates by using the concept of reproducing kernel Hilbert spaces. The concept of the effective dimension plays an important role in the convergence analysis.
The key points of our results can be described as follows:
We consider general regularization schemes in Hilbert scales. It is well known that Tikhonov regularization suffers from the saturation effect. In contrast, this saturation is delayed for Tikhonov regularization in Hilbert scales.
The analysis uses the concept of link conditions, see Assumption 4, required to transfer information in terms of properties of the operator to the covariance operator.
We analyze the regular case, i.e., when the true solution belongs to the domain of operator .
We also focus on the oversmoothing case, when the true solution does not belong to the domain of operator .
The paper is organized as follows. The basic definitions, assumptions, and notation required in our analysis are presented in Section 2. In Section 3 we discuss the bounds of the reconstruction error in the direct learning setting and the inverse problem setting by means of distance functions. This section comprises two main results: the first is devoted to convergence rates in the oversmoothing case, while the second focuses on the regular case. When specifying smoothness in terms of source conditions we can bound the distance functions, and this gives rise to convergence rates in terms of the sample size . This program is carried out in Section 4. When both the smoothness and the link condition are of power type, we establish the optimality of the obtained error bounds in the regular case in Section 5. In the Appendix, we present probabilistic estimates which provide the tools to obtain the error bounds.
2. Notation and Assumptions
In this section, we introduce some basic concepts, definitions, notation, and assumptions required in our analysis.
We assume that is a Polish space, therefore the probability distribution allows for a disintegration as
is the conditional probability distribution of given , and is the marginal probability distribution. We consider random observations which follow the model with centered noise . We assume throughout the paper that the operator is injective.
Assumption 1 (The true solution).
The conditional expectation w.r.t. of given exists (a.s.), and there exists such that
The element is the true solution which we aim at estimating.
Assumption 2 (Noise condition).
There exist some constants such that for almost all ,
This assumption is usually referred to as a Bernstein-type assumption.
We return to the unbounded operator . By spectral theory, the operator is well-defined for , and the spaces equipped with the inner product are Hilbert spaces. For , the space is defined as completion of under the norm . The space is called the Hilbert scale induced by . The following interpolation inequality is an important tool for the analysis:
which holds for any [11, Chapt. 8].
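As a numerical illustration of the interpolation inequality, the following sketch checks the textbook form of the inequality in a toy diagonal model. It is written under assumptions that are not taken from this paper: the operator generating the scale is diagonal with eigenvalues `l_j = j`, the scale norm of index `s` is the weighted sequence norm with weights `l_j^s`, and the inequality reads `||f||_s <= ||f||_r^theta * ||f||_t^(1 - theta)` with `theta = (t - s) / (t - r)` for `r < s < t`. The names `scale_norm` and `interpolation_gap` are hypothetical.

```python
import math

def scale_norm(f, l, s):
    """Norm of index s in the scale generated by a diagonal operator with eigenvalues l_j."""
    return math.sqrt(sum((lj ** s * fj) ** 2 for lj, fj in zip(l, f)))

def interpolation_gap(f, l, r, s, t):
    """Right-hand side minus left-hand side of the assumed interpolation inequality (r < s < t)."""
    theta = (t - s) / (t - r)
    bound = scale_norm(f, l, r) ** theta * scale_norm(f, l, t) ** (1.0 - theta)
    return bound - scale_norm(f, l, s)

n = 200
l = [float(j) for j in range(1, n + 1)]               # assumed eigenvalues of the generating operator
f = [(-1) ** j * j ** -3.0 for j in range(1, n + 1)]  # assumed test element

# Index triples (r, s, t) chosen for illustration only.
triples = [(-1.0, 0.0, 1.0), (0.0, 0.5, 2.0), (-2.0, -0.5, 1.5)]
gaps = [interpolation_gap(f, l, r, s, t) for (r, s, t) in triples]
```

A nonnegative gap for each triple is what the inequality predicts; in this diagonal setting it is an instance of Hölder's inequality.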
2.1. Reproducing Kernel Hilbert space and related operators
We start with the concept of reproducing kernel Hilbert spaces. Such a space is a subspace of (the space of square-integrable functions from to with respect to the probability distribution ) which can be characterized by a symmetric, positive semi-definite kernel, and each of its functions satisfies the reproducing property. Here we discuss vector-valued reproducing kernel Hilbert spaces, following , which generalize real-valued reproducing kernel Hilbert spaces .
Definition 2.1 (Vector-valued reproducing kernel Hilbert space).
For a non-empty set and a real separable Hilbert space , a Hilbert space of functions from to is said to be a vector-valued reproducing kernel Hilbert space, if the linear functional , defined by
is continuous for every and .
Definition 2.2 (Operator-valued positive semi-definite kernel).
Suppose is the Banach space of bounded linear operators. A function is said to be an operator-valued positive semi-definite kernel if
For a given operator-valued positive semi-definite kernel , we can construct a unique vector-valued reproducing kernel Hilbert space of functions from to as follows:
We define the linear function
where for and .
The span of the set is dense in .
in other words .
Moreover, there is a one-to-one correspondence between operator-valued positive semi-definite kernels and vector-valued reproducing kernel Hilbert spaces, see .
We make the following assumption (Assumption 3) concerning the Hilbert space :
The space is assumed to be a vector-valued reproducing kernel Hilbert space of functions corresponding to the kernel such that
is a Hilbert-Schmidt operator for with
For , the real-valued function is measurable.
If the set is a bounded subset of , then the reproducing kernel Hilbert space becomes a real-valued reproducing kernel Hilbert space. The corresponding kernel is then symmetric and positive semi-definite, with the reproducing property . Also, in this case Assumption 3 simplifies to the condition that the kernel is measurable and .
Now we introduce some relevant operators used in the convergence analysis. We introduce the notation for the vectors , , . The product Hilbert space is equipped with the inner product and the corresponding norm . We define the sampling operator , then the adjoint is given by
Let denote the canonical injection map . Then we observe that, under Assumption 3, both the operators and are bounded by , since
We denote the population operators , , , and their empirical versions , , . The operators , , , are positive, self-adjoint and depend on the kernel. Under Assumption 3, the operators , are bounded by and the operators , are bounded by for , i.e., , , and .
2.2. Link condition
In the subsequent analysis, we shall derive convergence rates by using approximate source conditions, which are related to a certain benchmark smoothness. This benchmark smoothness is determined by the user. In order to have handy arguments to derive the convergence rates, we shall fix an (integer) power . We shall use a link condition to transfer smoothness in terms of the operator L to the covariance operator . This link condition will involve an index function.
Definition 2.4 (Index function).
A function is said to be an index function if it is continuous and strictly increasing with .
An index function is called sub-linear whenever the mapping is nondecreasing. Further, we require this index function to belong to the following class of functions.
The representation is not unique, therefore can be assumed to be a Lipschitz function with Lipschitz constant . We now state an important result needed in our analysis [28, Corollary 1.2.2]:
The polynomial function , and the logarithm function are examples of functions in the class .
Assumption 4 (link condition).
There exist a power and an index function , for which the function is sub-linear. There are constants such that
The function belongs to the class .
As shown in , Assumption 4 implies the range identity . In the context of a comparison of operators we mention the well-known Heinz Inequality, see [11, Prop. 8.21], which asserts that a comparison , for non-negative self-adjoint operators yields for every exponent that . Applying this to the above link condition we obtain the following:
Proposition 2.6.
Under Assumption 4 we have
Moreover, we have that
Proof. The first assertions are a consequence of the Heinz inequality. For the last one, we argue as follows. Since is assumed to be sub-linear, we find that
which completes the proof. ∎
Link conditions as in Assumption 4 imply decay rates for the singular numbers of the operators, by Weyl’s Monotonicity Theorem [4, Cor. III.2.3]. In our case, this yields that . For classical spaces, such as Sobolev spaces, when , then (one spatial dimension). For the above index function this means that .
Example 2.8 (Finitely smoothing).
If the function , and hence its inverse, is of power type, then this implies a power-type decay of the singular numbers of . In this case, the operator is called finitely smoothing.
Example 2.9 (Infinitely smoothing).
If, on the other hand, the function is logarithmic, as e.g., , then . In this case, the operator is called infinitely smoothing.
2.3. Effective dimension
Now we introduce the concept of the effective dimension, which is an important ingredient in deriving rates of convergence under Hölder source conditions [7, 10, 12] and general source conditions [16, 29]. The effective dimension for the trace-class operator is defined as
The integral operator is a trace class operator, hence the effective dimension is finite, and we have that
In the subsequent analysis, we shall need a relationship between the effective dimensions and . For this, the link condition (Assumption 4) is crucial. The arguments will be based on operator monotonicity and concavity. Below, for an operator we assign the singular numbers of the operator .
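To make the notion of effective dimension concrete, the following sketch evaluates it for a toy spectrum. It assumes the usual definition as the trace of `(T + lam I)^{-1} T`, that is, the sum of `t_j / (t_j + lam)` over the eigenvalues `t_j` of a trace-class operator; the eigenvalue sequence `t_j = j^{-2}`, the truncation at 2000 terms, and the name `effective_dimension` are illustrative assumptions, not quantities from this paper.

```python
def effective_dimension(eigs, lam):
    """Assumed definition: trace of (T + lam I)^{-1} T for a diagonal operator T."""
    return sum(t / (t + lam) for t in eigs)

# Assumed polynomially decaying spectrum, truncated for the computation.
eigs = [j ** -2.0 for j in range(1, 2001)]
trace = sum(eigs)

# Effective dimension for a decreasing sequence of regularization parameters.
lams = [1e-1, 1e-2, 1e-3]
dims = [effective_dimension(eigs, lam) for lam in lams]

# Elementary bound: each summand t / (t + lam) is at most t / lam.
trace_bounds = [trace / lam for lam in lams]
```

In this sketch the effective dimension grows as the regularization parameter decreases, and it always stays below the trace divided by the parameter, since each summand `t / (t + lam)` is at most `t / lam`.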
The following assumption was introduced in . There, it was shown that it is satisfied for both moderately ill-posed and severely ill-posed operators.
There exists a constant such that for we have
The relation between the effective dimensions is established in the following proposition, whose proof is given in Appendix A.
For a power type function the above concavity assumptions hold true whenever and . In particular the number is uniquely determined.
2.4. Regularization Schemes
General regularization schemes were introduced and discussed in ill-posed inverse problems and learning theory (see [17, Section 2.2] and [2, Section 3.1] for a brief discussion). By using the notation from § 2.1, the Tikhonov regularization scheme from (3) can be re-expressed as follows:
and its minimizer is given by
We consider the following definition.
Definition 2.12 (General regularization).
We say that a family of functions , , is a regularization scheme if there exists such that
For some constant (independent of ), the maximal satisfying the condition:
is said to be the qualification of the regularization scheme .
The qualification covers the index function if the function is nondecreasing.
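The notion of qualification can be probed numerically for Tikhonov regularization. The sketch below relies on standard forms that are assumptions here, since they are not spelled out in the text above: the Tikhonov filter `g_lam(t) = 1 / (t + lam)`, the residual `r_lam(t) = 1 - t * g_lam(t)`, a spectrum contained in `[0, 1]`, and the qualification condition `sup_t |r_lam(t)| * t^nu <= gamma_nu * lam^nu`. The names `g_tikhonov`, `residual` and `qualification_ratio` are hypothetical.

```python
def g_tikhonov(t, lam):
    """Assumed Tikhonov filter function."""
    return 1.0 / (t + lam)

def residual(t, lam):
    """Assumed residual function r_lam(t) = 1 - t * g_lam(t)."""
    return 1.0 - t * g_tikhonov(t, lam)

def qualification_ratio(nu, lam, grid):
    """sup_t |r_lam(t)| * t^nu divided by lam^nu, evaluated on a finite grid."""
    return max(abs(residual(t, lam)) * t ** nu for t in grid) / lam ** nu

grid = [k / 1000.0 for k in range(0, 1001)]  # spectrum assumed to lie in [0, 1]
lams = [1e-1, 1e-2, 1e-3]

# nu = 1: the ratio stays bounded as lam decreases (qualification at least 1).
ratios_nu1 = [qualification_ratio(1.0, lam, grid) for lam in lams]

# nu = 2: the ratio grows as lam decreases, so the condition fails for nu = 2.
ratios_nu2 = [qualification_ratio(2.0, lam, grid) for lam in lams]
```

The bounded ratios for `nu = 1` and the growing ratios for `nu = 2` are consistent with the well-known fact that standard Tikhonov regularization has qualification one.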
We mention the following result.
Suppose is a nondecreasing index function and the qualification, say , of the regularization covers . Then
Also, we have that
The first assertion is a restatement of [19, Proposition 3]. For the second assertion, we stress that , which follows from convexity. This yields
which implies the second assertion and completes the proof. ∎
Essentially all the linear regularization schemes (Tikhonov regularization, Landweber iteration or spectral cut-off) satisfy the properties of general regularization. Inspired by the representation for the minimizer of the Tikhonov functional we consider a general regularized solution in Hilbert scales corresponding to the above regularization in the form
3. Convergence analysis
Here we study the convergence for general regularization schemes in the Hilbert scale of the linear statistical inverse problem based on the prior assumptions and the link condition.
The analysis will distinguish between two cases, the ‘regular’ one, when , and the ‘low smoothness’ case, when . In either case, we shall first utilize the concept of distance functions. This will later allow us to establish convergence rates in a more classical style.
For the asymptotic analysis, we shall require the standard assumption relating the sample size and the parameter such that
It will be seen that, asymptotically, the condition (8) is always satisfied for the parameter which is optimally chosen under known smoothness.
The fact that is a decreasing function of and implies that . Hence, from condition (8), we obtain
Several probabilistic quantities will be used to express the error bounds. Precisely, for an index function we let
3.1. The oversmoothing case
As mentioned before, we shall use distance functions, which are sometimes called ‘approximate source conditions’ because they measure the violation of a benchmark smoothness. Here the benchmark will be .
Definition 3.1 (Approximate source condition).
We define the distance function by
We denote the element which realizes the above minimization problem.
Notice the following: If then for some the minimizer of the distance function will obey .
Let be i.i.d. samples drawn according to the probability measure . Suppose that Assumptions 1–5 hold true. Suppose that the qualification of the regularization covers the function (for from Assumption 4) and that , are concave, or operator concave functions for some , respectively. Then for all , and for satisfying the condition (8), the following upper bound holds for the regularized solution (7) with confidence :
where depends on , , , , , , .
For the minimizer of the distance function defined in (14), the error can be expressed as follows:
By using Proposition 2.6 the error for the regularized solution can be bounded as
We shall bound each summand on the right in (15).
For using the fact that is an increasing function and , for small enough, we get
This together with Proposition 2.10 implies that
where depends on .
By construction of we have that . Using the fact that covers we bound
First, if then there is such that , and
where depends on , , , , , , .
Otherwise, in the low smoothness case, , we introduce the following function