1. Introduction
Let be a linear injective operator between the infinite-dimensional Hilbert spaces and with the inner products and , respectively. Let be the space of functions between a Polish space and a real separable Hilbert space . Here we study linear ill-posed operator problems governed by the operator equation
(1) 
We observe noisy values of
at some points, and the foremost objective is to estimate the true solution
. The problem of interest can be described as follows: Given data under the model
(2) 
where is the observational noise, and denotes the sample size, determine (approximately) the underlying element with being the regression function.
For classical inverse problems, the observational noise is assumed to be deterministic. Here we assume that the random observations
are independent and follow some unknown probability distribution
, defined on the sample space , and hence we are in the context of statistical inverse problems.
The reconstruction of the unknown true solution will be based on spectral regularization schemes. Various schemes can be used to stably estimate the true solution. Tikhonov regularization is widely considered in the literature. This scheme consists of an error term measuring the fit to the data and a penalty term controlling the complexity of the reconstruction. In this study we enforce smoothness of the approximated solution by introducing an unbounded, linear, self-adjoint, strictly positive operator with a dense domain of definition , and then we define the Tikhonov regularization scheme in Hilbert scales as follows:
(3) 
where is a positive regularization parameter and the operator influences the properties of the approximated solution. Standard Tikhonov regularization corresponds to , the identity mapping. In many practical problems, the operator is chosen to be a differential operator in some appropriate function spaces, e.g., spaces.
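To make the role of the penalty operator concrete, here is a minimal finite-dimensional sketch; it is not the authors' implementation. It assumes that the forward operator is discretized as a matrix `A` with one row per sample point, that the smoothness-promoting operator is a diagonal matrix `L`, and that the functional in (3) is the averaged squared residual plus the regularization parameter times the squared norm of `L f`. All names and the synthetic data are hypothetical.

```python
import numpy as np

def tikhonov_hilbert_scale(A, L, y, lam):
    """Minimize (1/m) * ||A f - y||^2 + lam * ||L f||^2 over f in R^n (assumed discretization)."""
    m = A.shape[0]
    # Normal equations: (A^T A / m + lam * L^T L) f = A^T y / m
    lhs = A.T @ A / m + lam * (L.T @ L)
    rhs = A.T @ y / m
    return np.linalg.solve(lhs, rhs)

rng = np.random.default_rng(0)
m, n = 50, 20
A = rng.standard_normal((m, n)) / np.arange(1, n + 1)   # decaying columns mimic a smoothing operator
L = np.diag(np.arange(1, n + 1, dtype=float))           # strictly positive, growing eigenvalues
f_true = 1.0 / np.arange(1, n + 1) ** 2                 # hypothetical smooth solution
y = A @ f_true + 0.01 * rng.standard_normal(m)          # noisy observations

lam = 1e-4
f_scale = tikhonov_hilbert_scale(A, L, y, lam)          # penalty ||L f||^2
f_std = tikhonov_hilbert_scale(A, np.eye(n), y, lam)    # L = identity: standard Tikhonov
```

Choosing `L` as the identity recovers standard Tikhonov regularization, so both estimators come from the same routine.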
Notice from (3) that the reconstruction belongs to , such that formally we may introduce . In the regular case, when , then we let . With this notation we can rewrite (1) as
Also, the Tikhonov minimization problem would reduce to the standard one
albeit for a different operator . Accordingly, the error bounds relate as
Therefore, error bounds for in the weak norm, in , yield bounds for . The latter bounds are not known from previous studies. We are also interested in the oversmoothing case, when , and for this case we provide a detailed error analysis here. The above relation will nevertheless be used implicitly in the subsequent proofs.
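The reduction to a standard Tikhonov problem can be checked numerically in finite dimensions. The sketch below assumes the substitution `u = L f` and the transformed operator `A L^{-1}`; the matrices and data are hypothetical, and the functional is taken to be the averaged squared residual plus the penalty.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, lam = 40, 15, 1e-3
A = rng.standard_normal((m, n))                      # hypothetical discretized forward operator
L = np.diag(np.arange(1, n + 1, dtype=float))        # hypothetical invertible penalty operator
y = rng.standard_normal(m)

# Route 1: penalized problem in f, min (1/m)||A f - y||^2 + lam ||L f||^2
f_hat = np.linalg.solve(A.T @ A / m + lam * (L.T @ L), A.T @ y / m)

# Route 2: standard Tikhonov in u = L f for the operator A L^{-1}
A_tilde = A @ np.linalg.inv(L)
u_hat = np.linalg.solve(A_tilde.T @ A_tilde / m + lam * np.eye(n), A_tilde.T @ y / m)

# The two minimizers are related by u_hat = L f_hat
gap = np.linalg.norm(L @ f_hat - u_hat)
```

Because `u_hat` equals `L @ f_hat`, an error bound for `u_hat` in the plain norm is an error bound for `f_hat` in the norm weighted by `L`.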
We briefly review the literature related to the considered problem. Regularization schemes in Hilbert scales are widely considered in classical inverse problems (with deterministic noise), starting from F. Natterer [26], and continued in [9, 18, 20, 21, 23, 24, 25, 27, 31]. G. Blanchard and N. Mücke [7] considered general regularization schemes for linear inverse problems in statistical learning and provided (upper and lower) rates of convergence under Hölder-type source conditions. Here we consider general (spectral) regularization schemes in Hilbert scales for statistical inverse problems. We discuss rates of convergence for general regularization under certain noise conditions, approximate source conditions, and a specific link condition between the operators , governing the equation (1), and the smoothness promoting operator as used e.g. in (3). We study error estimates by using the concept of reproducing kernel Hilbert spaces. The concept of the effective dimension plays an important role in the convergence analysis.
The key points of our results can be described as follows:

We consider general regularization schemes in Hilbert scales. It is well known that Tikhonov regularization suffers from the saturation effect. In contrast, this saturation is delayed for Tikhonov regularization in Hilbert scales.

The analysis uses the concept of link conditions, see Assumption 4, required to transfer information in terms of properties of the operator to the covariance operator.

We analyze the regular case, i.e., when the true solution belongs to the domain of the operator .

We also focus on the oversmoothing case, when the true solution does not belong to the domain of the operator .
The paper is organized as follows. The basic definitions, assumptions, and notation required in our analysis are presented in Section 2. In Section 3 we discuss the bounds of the reconstruction error in the direct learning setting and inverse problem setting by means of distance functions. This section comprises two main results: the first result is devoted to convergence rates in the oversmoothing case, while the second result focuses on the regular case. When specifying smoothness in terms of source conditions we can bound the distance functions, and this gives rise to convergence rates in terms of the sample size . This program is performed in Section 4. When both the smoothness and the link condition are of power type, we establish the optimality of the obtained error bounds in the regular case in Section 5. In the Appendix, we present probabilistic estimates which provide the tools to obtain the error bounds.
2. Notation and Assumptions
In this section, we introduce some basic concepts, definitions, notation, and assumptions required in our analysis.
We assume that is a Polish space, therefore the probability distribution allows for a disintegration as
where
is the conditional probability distribution of
given , and is the marginal probability distribution. We consider random observations which follow the model with centered noise . We assume throughout the paper that the operator is injective.
Assumption 1 (The true solution).
The conditional expectation w.r.t. of given exists (a.s.), and there exists such that
The element is the true solution which we aim at estimating.
Assumption 2 (Noise condition).
There exist some constants such that for almost all ,
This assumption is usually referred to as a Bernstein-type assumption.
We return to the unbounded operator . By spectral theory, the operator is well-defined for , and the spaces equipped with the inner product are Hilbert spaces. For , the space is defined as the completion of under the norm . The space is called the Hilbert scale induced by
. The following interpolation inequality is an important tool for the analysis:
(4) 
which holds for any [11, Chapt. 8].
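As a numerical illustration, the sketch below checks the interpolation inequality in its commonly stated form for Hilbert scales, namely that the norm of `L^r f` is bounded by a geometric mean of the norms of `L^s f` and `L^t f` for `s < r < t`. That (4) has exactly this form is an assumption here, as are the diagonal operator and the test element.

```python
import numpy as np

def scale_norm(eigs, coeffs, s):
    """Norm of L^s f for a diagonal L with eigenvalues `eigs` and coefficients `coeffs` of f."""
    return np.sqrt(np.sum((eigs ** s * coeffs) ** 2))

def interpolation_gap(eigs, coeffs, s, r, t):
    """Right-hand side minus left-hand side of the interpolation inequality, for s < r < t."""
    theta = (t - r) / (t - s)          # weight so that r = theta * s + (1 - theta) * t
    lhs = scale_norm(eigs, coeffs, r)
    rhs = scale_norm(eigs, coeffs, s) ** theta * scale_norm(eigs, coeffs, t) ** (1 - theta)
    return rhs - lhs

rng = np.random.default_rng(2)
eigs = np.arange(1, 201, dtype=float)              # hypothetical spectrum of L
coeffs = rng.standard_normal(200) / eigs ** 3      # hypothetical element with decaying coefficients
gap = interpolation_gap(eigs, coeffs, s=-1.0, r=0.5, t=2.0)   # nonnegative by Hölder's inequality
```

For an eigenvector of `L` the gap vanishes, so the inequality cannot be improved in general.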
2.1. Reproducing Kernel Hilbert space and related operators
We start with the concept of reproducing kernel Hilbert spaces. It is a subspace of (the space of square-integrable functions from to with respect to the probability distribution
) which can be characterized by a symmetric, positive semidefinite kernel and each of its functions satisfies the reproducing property. Here we discuss the vector-valued reproducing kernel Hilbert spaces, following
[22], which are the generalization of real-valued reproducing kernel Hilbert spaces [1].
Definition 2.1 (Vector-valued reproducing kernel Hilbert space).
For a nonempty set and a real separable Hilbert space , a Hilbert space of functions from to is said to be the vector-valued reproducing kernel Hilbert space, if the linear functional , defined by
is continuous for every and .
Definition 2.2 (Operator-valued positive semidefinite kernel).
Suppose is the Banach space of bounded linear operators. A function is said to be an operator-valued positive semidefinite kernel if
For a given operator-valued positive semidefinite kernel , we can construct a unique vector-valued reproducing kernel Hilbert space of functions from to as follows:

We define the linear function
where for and .

The span of the set is dense in .

Reproducing property:
in other words .
Moreover, there is a one-to-one correspondence between operator-valued positive semidefinite kernels and vector-valued reproducing kernel Hilbert spaces, see [22].
We make the following assumption concerning the Hilbert space :
Assumption 3.
The space is assumed to be a vector-valued reproducing kernel Hilbert space of functions corresponding to the kernel such that

is a Hilbert-Schmidt operator for with

For , the real-valued function is measurable.
Example 2.3.
If the set is a bounded subset of , then the reproducing kernel Hilbert space becomes a real-valued reproducing kernel Hilbert space. The corresponding kernel is then symmetric and positive semidefinite, with the reproducing property . Also, in this case Assumption 3 simplifies to the condition that the kernel is measurable and .
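The real-valued case can be illustrated with a short numerical sketch. It assumes a Gaussian kernel on the unit interval (a hypothetical choice, not taken from the paper) and checks positive semidefiniteness of the Gram matrix, boundedness of the kernel on the diagonal, and the pointwise bound that follows from the reproducing property and the Cauchy-Schwarz inequality.

```python
import numpy as np

def gaussian_kernel(x, xp, width=0.5):
    """Hypothetical real-valued kernel K(x, x') = exp(-(x - x')^2 / (2 width^2))."""
    return np.exp(-np.subtract.outer(x, xp) ** 2 / (2 * width ** 2))

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, size=30)
K = gaussian_kernel(x, x)                      # Gram matrix, symmetric

min_eig = np.linalg.eigvalsh(K).min()          # nonnegative up to rounding: K is positive semidefinite
kappa_sq = K.diagonal().max()                  # bound on K(x, x); equals 1 for this kernel

# f = sum_j c_j K(., x_j). By the reproducing property, f(x_i) = <f, K_{x_i}> = (K c)_i
c = rng.standard_normal(30)
f_at_x = K @ c
rkhs_norm_sq = c @ K @ c                       # squared RKHS norm of f

# Cauchy-Schwarz with the reproducing property: |f(x)| <= sqrt(K(x, x)) * ||f||
pointwise_bound = np.sqrt(kappa_sq * rkhs_norm_sq)
```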
Now we introduce some relevant operators used in the convergence analysis. We introduce the notation for the vectors , , . The product Hilbert space is equipped with the inner product and the corresponding norm . We define the sampling operator , then the adjoint is given by
Let denote the canonical injection map . Then we observe that, under Assumption 3, both operators and are bounded by , since
and
We denote the population operators , , , and their empirical versions , , . The operators , , , are positive, self-adjoint and depend on the kernel. Under Assumption 3, the operators , are bounded by and the operators , are bounded by for , i.e., , , and .
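The relation between the sampling operator and its adjoint can be checked in a finite-dimensional feature model. The sketch below assumes real-valued outputs, a hypothetical cosine feature map, and the empirical inner product on the sample space normalized by the sample size; none of these choices is taken from the paper.

```python
import numpy as np

def features(x, n_feat=8):
    """Hypothetical feature map; a function is represented as f(x) = <w, features(x)>."""
    k = np.arange(1, n_feat + 1)
    return np.cos(np.outer(x, k)) / k

def sampling(w, x):
    """Sampling operator: evaluate the function with coefficients w at the points x."""
    return features(x, w.size) @ w

def sampling_adjoint(c, x, n_feat=8):
    """Adjoint with respect to the inner product (1/m) * sum_i c_i d_i on R^m."""
    return features(x, n_feat).T @ c / x.size

rng = np.random.default_rng(4)
x = rng.uniform(0, np.pi, size=25)
w, c = rng.standard_normal(8), rng.standard_normal(25)

lhs = np.mean(sampling(w, x) * c)              # <S w, c> in the normalized inner product on R^m
rhs = w @ sampling_adjoint(c, x)               # <w, S^* c> in the coefficient space
T_x = features(x).T @ features(x) / x.size     # S^* S: empirical covariance-type operator
```

The composition `T_x` is symmetric and positive semidefinite by construction, which mirrors the positivity and self-adjointness of the empirical operators.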
2.2. Link condition
In the subsequent analysis, we shall derive convergence rates by using approximate source conditions, which are related to a certain benchmark smoothness. This benchmark smoothness is determined by the user. In order to have handy arguments to derive the convergence rates, we shall fix an (integer) power . We shall use a link condition to transfer smoothness in terms of the operator L to the covariance operator . This link condition will involve an index function.
Definition 2.4 (Index function).
A function is said to be an index function if it is continuous and strictly increasing with .
An index function is called sublinear whenever the mapping is nondecreasing. Further, we require this index function to belong to the following class of functions.
(5)  
The representation is not unique, therefore can be assumed to be a Lipschitz function with Lipschitz constant . Now we phrase an important result, needed in our analysis [28, Corollary 1.2.2]:
Example 2.5.
The polynomial function , and the logarithm function are examples of functions in the class .
Assumption 4 (Link condition).
There exist a power and an index function , for which the function is sublinear. There are constants such that
The function belongs to the class .
As shown in [9], Assumption 4 implies the range identity . In the context of a comparison of operators we mention the well-known Heinz inequality, see [11, Prop. 8.21], which asserts that a comparison , for nonnegative self-adjoint operators yields for every exponent that . Applying this to the above link condition we obtain the following:
Proposition 2.6.
Proof.
The first assertions are a consequence of the Heinz inequality. For the last one, we argue as follows. Since is assumed to be sublinear, we find that
which completes the proof. ∎
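The Heinz inequality used in the proof can be illustrated numerically for matrices. The sketch below assumes the formulation via the Loewner order: if `G^2` is dominated by `H^2`, then the same holds for the powers with exponent `theta` between 0 and 1. The random matrices are hypothetical.

```python
import numpy as np

def spd_power(M, theta):
    """Fractional power of a symmetric positive definite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs * vals ** theta) @ vecs.T

rng = np.random.default_rng(5)
n = 6
X = rng.standard_normal((n, n))
Y = rng.standard_normal((n, n))
G2 = X @ X.T + np.eye(n)      # plays the role of G^2
H2 = G2 + Y @ Y.T             # H^2 - G^2 is positive semidefinite, i.e. ||G x|| <= ||H x|| for all x

theta = 0.5
diff = spd_power(H2, theta) - spd_power(G2, theta)    # H^(2 theta) - G^(2 theta)
min_eig = np.linalg.eigvalsh(diff).min()              # nonnegative: ||G^theta x|| <= ||H^theta x||
```

The restriction to exponents in the unit interval matters, since the map `t -> t^theta` is operator monotone only in that range.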
Remark 2.7.
From the assertion, it is heuristically clear that the function
cannot increase faster than linearly, because the operator has in it. More details will be given in Section 5.
Link conditions as in Assumption 4 imply decay rates for the singular numbers of the operators, by Weyl’s Monotonicity Theorem [4, Cor. III.2.3]. In our case, this yields that . For classical spaces, as e.g. Sobolev spaces, when , then (one spatial dimension). For the above index function this means that .
Example 2.8 (Finitely smoothing).
If the function , and hence its inverse, is of power type, then this implies a power type decay of the singular numbers of . In this case, the operator is called finitely smoothing.
Example 2.9 (Infinitely smoothing).
If, on the other hand, the function is logarithmic, as e.g., , then . In this case, the operator is called infinitely smoothing.
2.3. Effective dimension
Now we introduce the concept of the effective dimension, which is an important ingredient to derive the rates of convergence under Hölder-type source conditions [7, 10, 12] and general source conditions [16, 29]. The effective dimension for the trace-class operator is defined as
It is known that the function is continuous and decreasing from to zero for for an infinite-dimensional operator (for details see [5, 8, 15, 16, 32]).
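A short computation shows how the effective dimension behaves. The sketch assumes the commonly used definition, the trace of `(T + lam I)^{-1} T`, evaluated through the eigenvalues of `T`, and a hypothetical spectrum with power-type decay truncated to finitely many terms.

```python
import numpy as np

def effective_dimension(eigs, lam):
    """Trace of (T + lam I)^{-1} T for an operator T with eigenvalues `eigs` (assumed definition)."""
    return np.sum(eigs / (eigs + lam))

eigs = 1.0 / np.arange(1, 10001) ** 2          # hypothetical power-type decay t_j = j^(-2)
lams = np.logspace(-6, 0, 7)                   # regularization parameters from 1e-6 to 1
dims = np.array([effective_dimension(eigs, lam) for lam in lams])

# Elementary bound: each term t_j / (t_j + lam) is at most t_j / lam
trace_bound = eigs.sum() / lams
```

The values in `dims` decrease as the parameter grows and stay below `trace_bound`, which is finite because the spectrum is summable.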
The integral operator is a trace class operator, hence the effective dimension is finite, and we have that
In the subsequent analysis, we shall need a relationship between the effective dimensions and . For this, the link condition (Assumption 4) is crucial. The arguments will be based on operator monotonicity and concavity. Below, for an operator we assign the singular numbers of the operator .
The following assumption was introduced in [15]. There, it was shown that it is satisfied for both moderately ill-posed and severely ill-posed operators.
Assumption 5.
There exists a constant such that for we have
The relation between the effective dimensions is established in the following proposition, whose proof is given in Appendix A.
Proposition 2.10.
Remark 2.11.
For a power type function the above concavity assumptions hold true whenever and . In particular the number is uniquely determined.
2.4. Regularization Schemes
General regularization schemes were introduced and discussed in ill-posed inverse problems and learning theory (see [17, Section 2.2] and [2, Section 3.1] for a brief discussion). By using the notation from § 2.1, the Tikhonov regularization scheme from (3) can be re-expressed as follows:
and its minimizer is given by
We consider the following definition.
Definition 2.12 (General regularization).
We say that a family of functions , , is a regularization scheme if there exists such that

.

.

.

For some constant (independent of ), the maximal satisfying the condition:
is said to be the qualification of the regularization scheme .
Definition 2.13.
The qualification covers the index function if the function is nondecreasing.
We mention the following result.
Proposition 2.14.
Suppose is a nondecreasing index function and the qualification, say , of the regularization covers . Then
Also, we have that
Proof.
The first assertion is a restatement of [19, Proposition 3]. For the second assertion, we stress that , which follows from convexity. This yields
which implies the second assertion and completes the proof. ∎
Essentially all the linear regularization schemes (Tikhonov regularization, Landweber iteration or spectral cutoff) satisfy the properties of general regularization. Inspired by the representation for the minimizer of the Tikhonov functional we consider a general regularized solution in Hilbert scales corresponding to the above regularization in the form
(7) 
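The properties of a regularization scheme and the notion of qualification can be explored numerically. The sketch below assumes the usual filter-function formulation: `g_lam` acts on the spectrum, the residual is `1 - sigma * g_lam(sigma)`, and the qualification is the largest power `p` for which the residual times `sigma^p` is of order `lam^p`. The two filters and the grid are illustrative choices.

```python
import numpy as np

def g_tikhonov(sigma, lam):
    """Tikhonov filter."""
    return 1.0 / (sigma + lam)

def g_cutoff(sigma, lam):
    """Spectral cutoff filter: invert above the threshold lam, discard below it."""
    return np.where(sigma >= lam, 1.0 / np.maximum(sigma, lam), 0.0)

def residual(g, sigma, lam):
    """Residual function r_lam(sigma) = 1 - sigma * g_lam(sigma)."""
    return 1.0 - sigma * g(sigma, lam)

def qualification_ratio(g, p, lam, sigma):
    """sup over the grid of |r_lam(sigma)| * sigma^p, divided by lam^p."""
    return np.max(np.abs(residual(g, sigma, lam)) * sigma ** p) / lam ** p

sigma = np.linspace(0.0, 1.0, 2001)            # hypothetical spectral grid
lam = 1e-3
tik_p1 = qualification_ratio(g_tikhonov, 1, lam, sigma)   # bounded: within the qualification
tik_p2 = qualification_ratio(g_tikhonov, 2, lam, sigma)   # large: beyond the qualification
cut_p5 = qualification_ratio(g_cutoff, 5, lam, sigma)     # bounded for any power
```

For the Tikhonov filter the ratio stays at most one for `p = 1` but grows like `1 / lam` for `p = 2`, which is the saturation effect; spectral cutoff shows no such limit on this grid.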
3. Convergence analysis
Here we study the convergence for general regularization schemes in the Hilbert scale of the linear statistical inverse problem based on the prior assumptions and the link condition.
The analysis will distinguish between two cases, the ‘regular’ one, when , and the ‘low smoothness’ case, when . In either case, we shall first utilize the concept of distance functions. This will later allow us to establish convergence rates in a more classical style.
For the asymptotic analysis, we shall require the standard assumption relating the sample size
and the parameter such that
(8) 
It will be seen that asymptotically the condition (8) is always satisfied for the parameter which is optimally chosen under known smoothness.
The fact that is a decreasing function of and implies that . Hence from condition (8) we obtain
(9) 
Several probabilistic quantities will be used to express the error bounds. Precisely, for an index function we let
(10)  
(11)  
(12)  
and  
(13) 
In case that we abbreviate by and by , not to be confused with the power. High probability bounds for these quantities are known, and these will be given correspondingly in Propositions B.1 and B.2.
3.1. The oversmoothing case
As mentioned before, we shall use distance functions, which are sometimes called ‘approximate source conditions’ because they measure the violation of a benchmark smoothness. Here the benchmark will be .
Definition 3.1 (Approximate source condition).
We define the distance function by
(14) 
We denote the element which realizes the above minimization problem.
Notice the following: If then for some the minimizer of the distance function will obey .
Remark 3.2.
Theorem 3.3.
Let be i.i.d. samples drawn according to the probability measure . Suppose that Assumptions 1–5 hold true. Suppose that the qualification of the regularization covers the function (for from Assumption 4) and that , are concave, or operator concave functions for some , respectively. Then for all , and for satisfying the condition (8) the following upper bound holds for the regularized solution (7) with confidence :
where depends on , , , , , , .
Proof.
For the minimizer of the distance function defined in (14), the error can be expressed as follows:
By using Proposition 2.6 the error for the regularized solution can be bounded as
(15)  
We shall bound each summand on the right in (15).
 :

From the estimates of Propositions B.1, B.2 we get with confidence that
(16) 
For , using the fact that is an increasing function and , for small enough, we get
This together with Proposition 2.10 implies that
(17) 
Under the condition (8) from the estimates (9), (16), (17) we get with confidence :
(18) 
where depends on .
 :

By construction of we have that . Using the fact that covers we bound
(19) 
 :
Summarizing, using the estimates of Propositions B.1, B.2, and (18)–(20), we get with confidence :
(21) 
For any parameter choice satisfying the condition (8) using the inequality (9) we get that
and
This implies
(22) 
provided that . Inserting the bound from inequality (22) into the estimate (21) completes the proof. ∎
The bound from Theorem 3.3 is valid for all , and we shall now optimize the bound from Theorem 3.3 with respect to the choice of .
First, if then there is such that , and
where depends on , , , , , , .
Otherwise, in the low smoothness case, , we introduce the following function