We consider the nonlinear ill-posed operator equation
\[ A(f) = g, \]
with a nonlinear forward operator $A\colon \mathcal D(A)\subseteq \mathcal H_1\to \mathcal H_2$ between the infinite-dimensional Hilbert spaces $\mathcal H_1$ and $\mathcal H_2$. Moreover, $\mathcal H_2$ is a space of functions $f\colon X\to Y$ for a Polish space $X$ (the input space) and a real separable Hilbert space $Y$ (the output space). Ill-posed inverse problems have important applications in science and technology (see, e.g., [13, 15, 29, 31]).
In the classical inverse problem setting, we observe an approximation $g^\delta$ of the function $g$ with $\|g - g^\delta\|\le\delta$ for some known noise level $\delta$, and we then reconstruct an estimator of the quantity $f$ through regularization schemes. Here we consider the problem in the statistical learning setting, in which we observe the random noisy image of $A(f)$ at the points $x_1,\dots,x_m$. The problem can be described as follows:
\[ y_i = A(f)(x_i) + \varepsilon_i, \qquad i=1,\dots,m, \tag{1} \]
where $\varepsilon_i$ is the random observational noise with $\mathbb E[\varepsilon_i\mid x_i]=0$, and $m$ is called the sample size.
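To make the sampling model (1) concrete, the following minimal Python sketch simulates data from it in a toy one-dimensional setting; the particular forward map (a pointwise cubic nonlinearity applied to a smooth $f$) and the Gaussian noise are illustrative assumptions only and are not part of the model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true parameter f (a smooth function on [0, 1]) and a toy
# nonlinear forward operator A(f)(x) = (f(x))**3; both are illustrative only.
f_true = lambda t: np.sin(2 * np.pi * t)
def A(f):
    return lambda t: f(t) ** 3

m = 200                                  # sample size
x = rng.uniform(0.0, 1.0, size=m)        # random design points x_i ~ nu
eps = 0.05 * rng.standard_normal(m)      # centered observational noise
y = A(f_true)(x) + eps                   # observations y_i = A(f)(x_i) + eps_i
```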
The model (1) covers nonparametric regression under random design (which we also call the direct problem, i.e., $A$ the identity map) as well as the linear statistical inverse learning problem. Thus, introducing a general nonlinear operator $A$ gives a unified approach to these different learning problems.
Suppose the random observations $\{(x_i,y_i)\}_{i=1}^m$ are drawn independently and identically according to the joint probability measure $\rho$ on the sample space $Z = X\times Y$, and that the probability measure $\rho$ can be split as follows:
\[ \rho(x,y) = \rho(y\mid x)\,\nu(x), \]
where $\rho(y\mid x)$ is the conditional probability distribution of $y$ given $x$ and $\nu$ is the marginal probability distribution on $X$.
For the statistical inverse problem (1), the goodness of an estimator $f$ can be measured through the expected risk:
\[ \mathcal E(f) = \int_Z \|A(f)(x) - y\|_Y^2 \, d\rho(x,y). \]
Further, we assume that $\int_Y \|y\|_Y^2\, d\rho(y\mid x) < \infty$ for any $x\in X$. Then for the function
\[ f_\rho(x) := \int_Y y \, d\rho(y\mid x), \qquad x\in X, \]
the expected risk can be expressed as follows:
\[ \mathcal E(f) = \int_X \|A(f)(x) - f_\rho(x)\|_Y^2\, d\nu(x) + \int_Z \|f_\rho(x) - y\|_Y^2\, d\rho(x,y). \tag{3} \]
Hence we observe that finding the minimizer of the expected risk is equivalent to obtaining the minimizer of the quantity $\|A(f) - f_\rho\|_{L^2(X,\nu;Y)}$.
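For completeness, the decomposition (3) follows from a one-line expansion of the square (a standard computation in learning theory, stated here in the notation introduced above):
\[
\mathcal E(f) = \int_Z \|A(f)(x)-f_\rho(x)\|_Y^2\,d\rho(x,y) + 2\int_Z\big\langle A(f)(x)-f_\rho(x),\, f_\rho(x)-y\big\rangle_Y\,d\rho(x,y) + \int_Z\|f_\rho(x)-y\|_Y^2\,d\rho(x,y),
\]
where the middle term vanishes: integrating first with respect to $\rho(y\mid x)$ replaces $y$ by $f_\rho(x)$, so the inner product is zero for $\nu$-almost every $x$. Since the last term does not depend on $f$, minimizing $\mathcal E(f)$ is indeed equivalent to minimizing $\|A(f)-f_\rho\|_{L^2(X,\nu;Y)}$.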
Since the probability measure $\rho$ is unknown, the only information about it is available through the sample. Therefore we use regularization methods to stably reconstruct an estimator of $f$. Tikhonov regularization is widely considered in both classical inverse problems and statistical learning theory. We consider Tikhonov regularization in Hilbert scales, which consists of an error term measuring the fit to the data and an oversmoothing penalty. We introduce an unbounded, closed, linear, self-adjoint, strictly positive operator $L\colon \mathcal D(L)\subset \mathcal H_1\to\mathcal H_1$ with a dense domain of definition to treat the oversmoothing penalty in terms of a Hilbert scale. For some $\beta>0$, the operator satisfies:
\[ \|Lf\|_{\mathcal H_1} \ge \beta\,\|f\|_{\mathcal H_1}, \qquad f\in\mathcal D(L). \]
For a given sample $\mathbf z = \{(x_i,y_i)\}_{i=1}^m$, we define the Tikhonov regularization scheme in Hilbert scales:
\[ \hat f := \operatorname*{arg\,min}_{f\in\mathcal D(A)} \left\{ \frac1m\sum_{i=1}^m \|A(f)(x_i) - y_i\|_Y^2 + \lambda\,\|L(f-\bar f)\|_{\mathcal H_1}^2 \right\}. \tag{5} \]
Here $\bar f$ denotes some initial guess of the true solution, which offers the possibility to incorporate a priori information, and $\lambda>0$ is a positive regularization parameter which controls the trade-off between the error term and the complexity of the solution.
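As an illustration of the scheme (5), the following Python sketch minimizes a discretized version of the Tikhonov functional; the finite-dimensional parametrization of $f$ by its grid values, the toy forward map, and the choice of $L$ as a first-order difference operator are assumptions made purely for this example. Since the problem is non-convex, the optimizer returns only a local minimizer found from the initial guess.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Discretize f by its values on a grid; toy nonlinear forward map A(f)(x) = f(x)**3.
grid = np.linspace(0.0, 1.0, 50)
def A_eval(f_vals, x):
    # evaluate A(f) at arbitrary points by linear interpolation of f, then cube
    return np.interp(x, grid, f_vals) ** 3

# data from the sampling model (1)
m = 200
x = rng.uniform(0.0, 1.0, m)
y = np.sin(2 * np.pi * x) ** 3 + 0.05 * rng.standard_normal(m)

# L: first-order difference operator (a crude stand-in for a differential operator)
n = grid.size
Lmat = (np.eye(n) - np.eye(n, k=1))[:-1] * (n - 1)

lam = 1e-3
f_bar = np.zeros(n)                       # initial guess \bar{f}

def tikhonov_functional(f_vals):
    data_fit = np.mean((A_eval(f_vals, x) - y) ** 2)
    penalty = lam * np.sum((Lmat @ (f_vals - f_bar)) ** 2)
    return data_fit + penalty

res = minimize(tikhonov_functional, x0=f_bar, method="L-BFGS-B")
f_hat = res.x                             # discretized Tikhonov estimator
```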
In many practical problems, the operator $L$, which influences the properties of the regularized approximation, is chosen to be a differential operator in some appropriate function space, e.g., the space of square-integrable functions $L^2$. It is well known that standard Tikhonov regularization suffers from a saturation effect. The finite qualification of Tikhonov regularization can be overcome using Hilbert scales. The problem (5) is non-convex, therefore a minimizer may not exist in general. For a continuous and weakly sequentially closed operator $A$ (i.e., if a sequence $(f_n)\subset\mathcal D(A)$ converges weakly to some $f$ and if the sequence $(A(f_n))$ converges weakly to some $g\in\mathcal H_2$, then $f\in\mathcal D(A)$ and $A(f)=g$), there exists a global minimizer of the functional in (5). It is, however, not necessarily unique since $A$ is nonlinear (see [29, Section 4.1.1]).
Generally, in the classical inverse problem literature (see [4, 13, 17, 29] and the references therein), 2-step approaches are considered: first, an estimator of the function $g = A(f)$ is constructed from the observations; then the quantity $f$ is estimated stably using various regularization schemes. Here we estimate the quantity $f$ with a 1-step method, using the Tikhonov regularization scheme (5) in the statistical learning setting.
Now we review the work in the literature related to the considered problem. Regularization schemes in Hilbert scales are widely considered in classical inverse problems (with deterministic noise) [12, 16, 22, 24, 25, 26, 30]. By contrast, inverse problems with random observations are not as well studied. Linear statistical inverse problems have been studied under the assumption that the marginal probability measure $\nu$ is known, which is an unrealistic assumption since the only information available is through the input points $x_i$. The problem has also been discussed for a general random design with an unknown marginal probability measure.
In the nonlinear setup, error estimates for generalized Tikhonov regularization for (1) have been established using a linearization technique in a random design setting. Other work considers a 2-step approach, however again under the assumption that the relevant norm is known. Further references, among them [17, 32], consider a Gauss-Newton algorithm and Tikhonov regularization, respectively, for certain nonlinear inverse problems, but in the idealized setting of Hilbertian white or colored noise with known covariance, which can only cover sampling effects when $\nu$ is known. Loubes et al. discussed the problem (1) under a fixed design and concentrated on the problem of model selection. Finally, recent work discussed rates of convergence for Tikhonov regularization of the nonlinear inverse problem.
In comparison with the existing literature, the main features of our approach are the following:
We do not restrict ourselves to Hilbertian white or colored noise.
We consider a 1-step approach rather than the existing 2-step approaches for nonlinear inverse problems.
The considered approach does not suffer from the saturation effect of standard Tikhonov regularization.
Following the work in [1, 7], we develop an error analysis for the Tikhonov regularization scheme for nonlinear inverse problems in Hilbert scales in the statistical learning setting. We establish error bounds for the statistical inverse problem in the reproducing kernel approach, and we discuss rates of convergence for Tikhonov regularization under certain assumptions on the nonlinear forward operator and suitable prior assumptions.
Some structural assumptions on the nonlinear mapping $A$ are required to establish the convergence analysis. We consider conditions widely assumed in the literature on classical inverse problems and presented in detail, e.g., in the monograph [13]. We assume that the operator $A$ is Fréchet differentiable at the true solution, and that the Fréchet derivative is Lipschitz continuous and satisfies a link condition (for the precise statement see Assumption 4).
The goal is to analyze the theoretical properties of the Tikhonov estimator $\hat f$; in particular, the asymptotic performance of the regularization scheme is evaluated through error estimates for the Tikhonov estimator in the reproducing kernel approach. Precisely, we develop a non-asymptotic analysis of the Tikhonov regularization (5) for the nonlinear statistical inverse problem, based on tools that have been developed for the modern mathematical study of reproducing kernel methods. The challenges specific to the studied problem are that the considered model is an inverse problem (rather than a pure prediction problem) and that it is nonlinear. The rate of convergence of the Tikhonov estimator $\hat f$ to the true solution $f^\dagger$ is described in a probabilistic sense by exponential tail inequalities. For sample size $m$ and confidence level $0<\eta<1$, we establish bounds of the form
\[ \|\hat f - f^\dagger\|_{\mathcal H_1} \;\le\; \varepsilon(m)\,\log\Big(\frac{1}{\eta}\Big) \qquad \text{with probability at least } 1-\eta. \]
Here $\varepsilon(m)$ is a positive decreasing function of $m$ and describes the rate of convergence as $m\to\infty$.
The paper is organized as follows. In Section 2, we discuss the basic definitions and assumptions required in our analysis. In Section 3, we discuss bounds on the reconstruction error under certain assumptions on the (unknown) joint probability measure $\rho$ and the (nonlinear) mapping $A$. In the Appendix, we present the probabilistic estimates and preliminary results which provide the tools to obtain the error bounds in the reproducing kernel approach.
2. Notation and assumptions
In this section, we introduce some basic concepts, definitions, and notations required in our analysis.
2.1. Reproducing Kernel Hilbert space and related operators
We start with the concept of reproducing kernel Hilbert spaces. A reproducing kernel Hilbert space is a subspace of $L^2(X,\nu;Y)$ (the space of square-integrable functions from $X$ to $Y$ with respect to the probability distribution $\nu$) which can be characterized by a symmetric, positive semidefinite kernel, and each of its functions satisfies the reproducing property. Here we discuss vector-valued reproducing kernel Hilbert spaces, which are a generalization of real-valued reproducing kernel Hilbert spaces.
Definition 2.1 (Vector-valued reproducing kernel Hilbert space).
For a non-empty set $X$ and a real separable Hilbert space $Y$, a Hilbert space $\mathcal H$ of functions from $X$ to $Y$ is said to be a vector-valued reproducing kernel Hilbert space if the linear functional $F_{x,y}\colon\mathcal H\to\mathbb R$, defined by
\[ F_{x,y}(f) = \langle y, f(x)\rangle_Y, \qquad f\in\mathcal H, \]
is continuous for every $x\in X$ and $y\in Y$.
Throughout the paper, $B^*$ denotes the adjoint of an operator $B$.
Definition 2.2 (Operator-valued positive semi-definite kernel).
Suppose $\mathcal L(Y)$ is the Banach space of bounded linear operators on $Y$. A function $K\colon X\times X\to\mathcal L(Y)$ is said to be an operator-valued positive semi-definite kernel if
(i) $K(x,x')^* = K(x',x)$ for all $x,x'\in X$;
(ii) for every $n\in\mathbb N$, $\{x_i\}_{i=1}^n\subset X$ and $\{y_i\}_{i=1}^n\subset Y$, $\sum_{i,j=1}^n \langle y_i, K(x_i,x_j)y_j\rangle_Y \ge 0$.
For a given operator-valued positive semi-definite kernel $K$, we can construct a unique vector-valued reproducing kernel Hilbert space $\mathcal H$ of functions from $X$ to $Y$ as follows:
(i) We define the linear function $K_x\colon Y\to\mathcal H$, $y\mapsto K_x y$, where $(K_x y)(x') = K(x',x)\,y$ for $x'\in X$ and $y\in Y$.
(ii) The span of the set $\{K_x y : x\in X,\ y\in Y\}$ is dense in $\mathcal H$.
(iii) For all $f\in\mathcal H$, $x\in X$ and $y\in Y$, $\langle f(x), y\rangle_Y = \langle f, K_x y\rangle_{\mathcal H}$ (the reproducing property); in other words, $f(x) = K_x^* f$.
Moreover, there is a one-to-one correspondence between operator-valued positive semi-definite kernels and vector-valued reproducing kernel Hilbert spaces. The reproducing kernel Hilbert space becomes a real-valued reproducing kernel Hilbert space in the case that $Y$ is a subset of $\mathbb R$, and the corresponding kernel becomes a symmetric, positive semi-definite function with the reproducing property $f(x) = \langle f, K_x\rangle_{\mathcal H}$.
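As a concrete (and commonly used) instance, a separable operator-valued kernel takes the form $K(x,x') = k(x,x')\,\mathrm{Id}_Y$ for a scalar kernel $k$. The short Python sketch below assembles the corresponding block Gram matrix for a finite output dimension; the Gaussian choice of $k$ and the output dimension are assumptions made for illustration only.

```python
import numpy as np

def k_scalar(x, xp, sigma=0.3):
    """Scalar Gaussian kernel on the real line (illustrative choice)."""
    return np.exp(-(x - xp) ** 2 / (2 * sigma ** 2))

def K_operator(x, xp, dim_Y=2, sigma=0.3):
    """Separable operator-valued kernel K(x, x') = k(x, x') * Id_Y."""
    return k_scalar(x, xp, sigma) * np.eye(dim_Y)

# Block Gram matrix [K(x_i, x_j)]_{i,j} for a few points; it is symmetric and
# positive semi-definite, as required by Definition 2.2.
pts = np.array([0.1, 0.4, 0.7])
G = np.block([[K_operator(a, b) for b in pts] for a in pts])
assert np.allclose(G, G.T) and np.all(np.linalg.eigvalsh(G) >= -1e-10)
```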
We make the following assumption concerning the Hilbert space $\mathcal H_2$:
Assumption 1.
The space $\mathcal H_2$ is assumed to be a vector-valued reproducing kernel Hilbert space of functions $f\colon X\to Y$ corresponding to a kernel $K\colon X\times X\to\mathcal L(Y)$ such that
(i) $K_x\colon Y\to\mathcal H_2$ is a Hilbert--Schmidt operator for every $x\in X$, with
\[ \kappa^2 := \sup_{x\in X}\|K_x\|_{HS}^2 = \sup_{x\in X}\operatorname{Tr}(K_x^* K_x) < \infty; \]
(ii) for all $y,y'\in Y$, the real-valued function $(x,x')\mapsto \langle K_x y, K_{x'} y'\rangle_{\mathcal H_2}$ is measurable.
Note that in the case of real-valued functions ($Y=\mathbb R$), Assumption 1 simplifies to the condition that the kernel $k$ is measurable and $\kappa^2 = \sup_{x\in X} k(x,x) < \infty$.
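For instance (an illustrative example, not required by the analysis): the Gaussian kernel $k(x,x')=\exp\big(-\|x-x'\|^2/(2\sigma^2)\big)$ on $X\subseteq\mathbb R^d$ is continuous, hence measurable, and satisfies $\sup_{x\in X}k(x,x)=1$, so the real-valued version of Assumption 1 holds with $\kappa=1$.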
Now we introduce some relevant operators used in the convergence analysis. We introduce the notation $\mathbf x = (x_1,\dots,x_m)$, $\mathbf y = (y_1,\dots,y_m)$, $\mathbf z = ((x_1,y_1),\dots,(x_m,y_m))$ for the discrete ordered sets. The product Hilbert space $Y^m$ is equipped with the inner product $\langle \mathbf y,\mathbf y'\rangle_{Y^m} = \frac1m\sum_{i=1}^m\langle y_i,y_i'\rangle_Y$ and the corresponding norm. We define the sampling operator
\[ S_{\mathbf x}\colon \mathcal H_2 \to Y^m, \qquad S_{\mathbf x} f = \big(f(x_1),\dots,f(x_m)\big); \]
then the adjoint $S_{\mathbf x}^*\colon Y^m\to\mathcal H_2$ is given by
\[ S_{\mathbf x}^*\,\mathbf y = \frac1m\sum_{i=1}^m K_{x_i} y_i. \]
Let $I_\nu\colon \mathcal H_2\to L^2(X,\nu;Y)$ be the canonical injection map. Then we observe that both the canonical injection map $I_\nu$ and the sampling operator $S_{\mathbf x}$ are bounded by $\kappa$ under Assumption 1, since
\[ \|I_\nu f\|_{L^2(X,\nu;Y)}^2 = \int_X \|f(x)\|_Y^2\,d\nu(x) \le \kappa^2\|f\|_{\mathcal H_2}^2
\quad\text{and}\quad
\|S_{\mathbf x} f\|_{Y^m}^2 = \frac1m\sum_{i=1}^m\|f(x_i)\|_Y^2 \le \kappa^2\|f\|_{\mathcal H_2}^2. \]
We denote by $T := I_\nu^* I_\nu\colon\mathcal H_2\to\mathcal H_2$ the population version, the corresponding covariance operator. The operator $T$ is positive, self-adjoint and depends on both the kernel and the marginal probability measure $\nu$. We also introduce the sampling version $T_{\mathbf x} := S_{\mathbf x}^* S_{\mathbf x}$, which is positive, self-adjoint and depends on both the kernel and the inputs $\mathbf x$.
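In the scalar case ($Y=\mathbb R$), the non-zero eigenvalues of the empirical covariance operator $T_{\mathbf x}=S_{\mathbf x}^*S_{\mathbf x}$ coincide with those of the normalized Gram matrix $\frac1m[k(x_i,x_j)]_{i,j}$. The following Python snippet, with an assumed Gaussian kernel, makes this computation explicit as a sketch.

```python
import numpy as np

rng = np.random.default_rng(2)

def k(x, xp, sigma=0.3):
    # assumed scalar Gaussian kernel, for illustration only
    return np.exp(-(x[:, None] - xp[None, :]) ** 2 / (2 * sigma ** 2))

m = 100
x = rng.uniform(0.0, 1.0, m)          # design points x_i
G = k(x, x)                           # Gram matrix [k(x_i, x_j)]_{i,j}

# Non-zero eigenvalues of the empirical covariance T_x = S_x^* S_x coincide
# with the eigenvalues of the normalized Gram matrix G / m.
eigvals = np.linalg.eigvalsh(G / m)[::-1]
print(eigvals[:5])                    # a few leading empirical eigenvalues
```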
By spectral theory, the operator $L^s$ is well-defined for $s\in\mathbb R$, and the spaces $\mathcal H_s := \mathcal D(L^s)$, $s\ge 0$, equipped with the inner product $\langle f,g\rangle_{\mathcal H_s} = \langle L^s f, L^s g\rangle_{\mathcal H_1}$, are Hilbert spaces. For $s<0$, the space $\mathcal H_s$ is defined as the completion of $\mathcal H_1$ under the norm $\|f\|_{\mathcal H_s} = \|L^s f\|_{\mathcal H_1}$. The family $(\mathcal H_s)_{s\in\mathbb R}$ is called the Hilbert scale induced by $L$. We notice that the space $\mathcal H_0$ is $\mathcal H_1$
according to the above notation. The interpolation inequality is an important tool for the analysis:
\[ \|f\|_{\mathcal H_r} \le \|f\|_{\mathcal H_s}^{\frac{t-r}{t-s}}\,\|f\|_{\mathcal H_t}^{\frac{r-s}{t-s}}, \]
which holds for any $f\in\mathcal H_t$ and $s\le r\le t$.
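The inequality can be checked numerically on a finite-dimensional surrogate: the sketch below takes a diagonal positive definite matrix in place of $L$ and verifies the interpolation bound for random vectors and random exponents $s\le r\le t$ (an illustration only, not part of the analysis).

```python
import numpy as np

rng = np.random.default_rng(3)

d = 20
ell = rng.uniform(0.5, 5.0, d)   # eigenvalues of a diagonal positive surrogate for L

def scale_norm(f, s):
    """Hilbert-scale norm ||f||_s = ||L^s f|| for the diagonal surrogate."""
    return np.linalg.norm(ell ** s * f)

for _ in range(1000):
    f = rng.standard_normal(d)
    s, r, t = np.sort(rng.uniform(-2.0, 2.0, 3))
    if t - s < 1e-6:
        continue
    theta = (t - r) / (t - s)
    lhs = scale_norm(f, r)
    rhs = scale_norm(f, s) ** theta * scale_norm(f, t) ** (1.0 - theta)
    assert lhs <= rhs * (1.0 + 1e-9)
```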
2.2. The true solution, noise condition, and nonlinearity structure
We consider that the random observations $\{(x_i,y_i)\}_{i=1}^m$ follow the model (1) with centered noise, i.e., $\mathbb E[\varepsilon_i\mid x_i]=0$.
We assume throughout the paper that the operator $A$ is injective.
Assumption 2 (The true solution).
The conditional expectation w.r.t. $\rho(y\mid x)$ of $y$ given $x$ exists (a.s.), and there exists $f^\dagger\in\mathcal D(A)$ such that
\[ \int_Y y\, d\rho(y\mid x) = A(f^\dagger)(x), \qquad \text{for $\nu$-almost all } x\in X. \]
From (3) we observe that $f^\dagger$ is the minimizer of the expected risk. The element $f^\dagger$ is the true solution which we aim at estimating.
Assumption 3 (Noise condition).
There exist some constants $M,\Sigma>0$ such that for almost all $x\in X$,
\[ \mathbb E\big(\|\varepsilon\|_Y^{\,l}\,\big|\,x\big) \;\le\; \frac12\, l!\,\Sigma^2 M^{l-2}, \qquad \text{for all } l\ge 2. \]
This assumption is usually referred to as a Bernstein-type assumption. The distribution of the observational noise is reflected in the parameters $M$ and $\Sigma$. For the convergence analysis, the output space $Y$ need not be bounded as long as the noise condition for the output variable is fulfilled.
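As a simple illustration (using the constants as reconstructed above): if the noise is almost surely bounded, $\|\varepsilon\|_Y\le B$, then for every $l\ge2$
\[ \mathbb E\big(\|\varepsilon\|_Y^{\,l}\,\big|\,x\big) \;\le\; B^{l} \;\le\; \tfrac12\, l!\, B^{2}\, B^{l-2}, \]
so the noise condition holds with $\Sigma=M=B$. Centered Gaussian noise with uniformly bounded variance is another standard example satisfying a condition of this type.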
We need assumptions on the nonlinearity structure of the operator $A$ to establish the rates of convergence. Following the work of Engl et al. [13, Chapt. 10] on ‘classical’ nonlinear inverse problems, we consider the following assumption:
Assumption 4 (Nonlinearity structure).
(i) $\mathcal D(A)$ is convex, $A$ is weakly sequentially closed and $A$ is Fréchet differentiable with derivative $A'(f)$ at $f\in\mathcal D(A)$.
(ii) The Fréchet derivative $A'(f)$ is bounded in a ball of sufficiently large radius $d$, i.e., there exists $C_{A'}>0$ such that
\[ \|A'(f)\|_{\mathcal H_1\to\mathcal H_2} \le C_{A'} \qquad \text{for all } f\in\mathcal D(A) \text{ with } \|f-f^\dagger\|_{\mathcal H_1}\le d. \]
(iii) (Link condition) There exist constants $a>0$ and $0<b_1\le b_2$ such that for all $f\in\mathcal H_1$,
\[ b_1\,\|L^{-a} f\|_{\mathcal H_1} \;\le\; \|A'(f^\dagger) f\|_{\mathcal H_2} \;\le\; b_2\,\|L^{-a} f\|_{\mathcal H_1}. \]
(iv) (Lipschitz continuity of $A'$) There exists a constant $\gamma>0$ such that for all $f,g\in\mathcal D(A)$,
\[ \|A'(f) - A'(g)\|_{\mathcal H_1\to\mathcal H_2} \;\le\; \gamma\,\|f-g\|_{\mathcal H_1}. \]
A sufficient condition for weak sequential closedness is that $\mathcal D(A)$ is weakly closed (e.g., closed and convex) and $A$ is weakly continuous. The link condition (Assumption 4 (iii)) describes the interplay between the operator $L$ and the Fréchet derivative of the operator $A$ at the true solution. This link condition is known as finitely smoothing. The condition is satisfied in various types of problems (for examples, see [9, Example 10.2], [32, Examples 4, 5]).
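As a minimal illustration of the link condition in the form reconstructed in Assumption 4 (iii): if $\mathcal H_2=\mathcal H_1$ and $A$ is the bounded linear operator $A=L^{-a}$ for some $a>0$, then $A'(f^\dagger)=L^{-a}$ and
\[ \|A'(f^\dagger)f\|_{\mathcal H_2}=\|L^{-a}f\|_{\mathcal H_1} \qquad \text{for all } f\in\mathcal H_1, \]
so the two-sided inequality holds trivially with $b_1=b_2=1$. Genuinely nonlinear examples are given in the references cited above.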
2.3. Effective dimension
Using the singular value decomposition $T = \sum_{k\ge1} t_k\,\langle\cdot,e_k\rangle_{\mathcal H_2}\,e_k$ for an orthonormal sequence $(e_k)_{k\ge1}$ of eigenvectors of $T$ with corresponding eigenvalues $(t_k)_{k\ge1}$ such that $t_1\ge t_2\ge\dots\ge 0$, we get
\[ \mathcal N(\lambda) := \operatorname{Tr}\big((T+\lambda I)^{-1} T\big) = \sum_{k\ge1}\frac{t_k}{t_k+\lambda}, \qquad \lambda>0. \]
Since the integral operator $T$ is a trace class operator, the effective dimension is finite and we have that
\[ \mathcal N(\lambda) \le \frac{\operatorname{Tr}(T)}{\lambda} \le \frac{\kappa^2}{\lambda}, \qquad \lambda>0. \]
Assumption 5 (Polynomial decay condition).
Assume that there exists some positive constant $c>0$ such that
\[ \mathcal N(\lambda) \le c\,\lambda^{-b} \qquad \text{for some } 0<b\le 1 \text{ and all } \lambda>0. \]
Assumption 6 (Logarithmic decay condition).
Assume that there exists some positive constant $c>0$ such that
\[ \mathcal N(\lambda) \le c\,\log\Big(\frac{1}{\lambda}\Big) \qquad \text{for all } 0<\lambda<1. \]
Lu et al. showed that different kernels combined with different probability measures lead to different behaviors of the effective dimension. For the Gaussian kernel with uniform sampling on $[0,1]$, the effective dimension exhibits the log-type behavior (Assumption 6); on the other hand, other kernels (e.g., kernels with polynomially decaying eigenvalues) exhibit the power-type behavior (Assumption 5).
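The following Python sketch computes the effective dimension $\mathcal N(\lambda)=\sum_k t_k/(t_k+\lambda)$ for two assumed eigenvalue profiles, a polynomial decay $t_k=k^{-2}$ and a geometric decay $t_k=2^{-k}$, which respectively produce the power-type and the log-type growth described above; the profiles are chosen purely for illustration.

```python
import numpy as np

def effective_dimension(eigvals, lam):
    """N(lambda) = sum_k t_k / (t_k + lambda) for a given eigenvalue sequence."""
    return np.sum(eigvals / (eigvals + lam))

k = np.arange(1, 10_000 + 1)
poly_eigs = k ** -2.0        # polynomial decay: power-type effective dimension
geom_eigs = 2.0 ** -k        # geometric decay: log-type effective dimension

for lam in [1e-2, 1e-4, 1e-6]:
    print(f"lambda = {lam:.0e}: "
          f"N_poly = {effective_dimension(poly_eigs, lam):8.2f}, "
          f"N_geom = {effective_dimension(geom_eigs, lam):6.2f}")
```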
3. Convergence analysis
Here we establish error bounds for Tikhonov regularization for the nonlinear statistical inverse problem in the $\mathcal H_1$-norm, in a probabilistic sense. The explicit expression of $\hat f$ is not known; therefore we use the definition (5) of the Tikhonov estimator to derive the error estimates. A linearization technique is used for the nonlinear operator $A$ in a neighborhood of the true solution $f^\dagger$. The rates of convergence are established by exploiting the nonlinearity structure of the operator $A$ (see Assumption 4). We discuss the rates of convergence for the Tikhonov estimator by measuring the effect of random sampling, which is governed by the noise condition (Assumption 3). The bounds on the reconstruction error depend on the effective dimension, the smoothness parameter of the true solution and the parameter $a$ from the link condition.
It is convenient to introduce shorthand notation for some "standardized" key quantities used in our analysis.
The error bound discussed in the following theorem holds non-asymptotically, provided the regularization parameter $\lambda$ and the sample size $m$ are chosen so that condition (8) holds.
The condition (8) says that as the regularization parameter decreases, the sample size must increase.
By the definition of $\hat f$ as the solution of the minimization problem in (5), we have
\[ \frac1m\sum_{i=1}^m\|A(\hat f)(x_i)-y_i\|_Y^2 + \lambda\,\|L(\hat f-\bar f)\|_{\mathcal H_1}^2 \;\le\; \frac1m\sum_{i=1}^m\|A(f^\dagger)(x_i)-y_i\|_Y^2 + \lambda\,\|L(f^\dagger-\bar f)\|_{\mathcal H_1}^2. \tag{9} \]
By linearizing the nonlinear operator $A$ at $f^\dagger$ we write $A(\hat f) = A(f^\dagger) + A'(f^\dagger)(\hat f - f^\dagger) + R$,
where $R := A(\hat f) - A(f^\dagger) - A'(f^\dagger)(\hat f - f^\dagger)$ is the error term of the linearization of $A$ at the true solution $f^\dagger$. Using this, we re-express the inequality (9) as follows,
Then we have,
which can be re-expressed as
In the analysis, we will make repeated use of Young's inequality
\[ ab \;\le\; \frac{a^p}{p} + \frac{b^q}{q}, \tag{14} \]
which holds for $a,b\ge 0$ and $p,q>1$ with $\frac1p+\frac1q=1$.
We apply this inequality to the estimate (13) with suitable choices of $a$, $b$ and $p$. With the first choice we obtain
With the second choice we get
Replacing the corresponding term on the right-hand side of (12) and using the above inequality, we obtain
Applying (14) repeatedly with suitable choices of $a$, $b$ and $p$, we obtain
Under the condition (8) the spectral decomposition of the operator gives
From (8) we get
Hence we get,
By balancing the error terms in (16), we arrive at the corresponding choice of the regularization parameter $\lambda$. With probability at least $1-\eta$, we then have