We consider the efficient numerical solution of linear inverse problems with operator valued data modeled by abstract operator equations
We assume that , representing the possibly perturbed measurements, is a linear operator of Hilbert-Schmidt class between Hilbert spaces and , the dual of . We further assume that the forward operator is linear and compact, and admits a factorization of the form
Problems of this kind arise, for instance, as mathematical models for tomographic applications [1, 25] or inverse scattering problems [6, 13], and as linearizations of related nonlinear problems, see e.g., [8, 20, 29] or  and the references given there. In such applications, typically models the propagation of excitation fields generated by the sources, describes the interaction with the medium to be probed, and models the emitted fields which can be recorded by the detectors. In the following, we briefly outline our basic approach towards the numerical solution of Eq. 1–Eq. 2 and report about related work in the literature.
1.1 Regularized inversion
Due to the special functional analytic setting, the inverse problem Eq. 1–Eq. 2 amounts to an ill-posed linear operator equation in Hilbert spaces and standard regularization theory can be applied for its stable solution [2, 9]. Following the usual arguments, we assume that is a perturbed version of the exact data and that a bound on the measurement noise
is available. We further denote by the minimum norm solution of Eq. 1 with replaced by . A stable approximation for the solution can then be obtained by the regularized inversion of Eq. 1, e.g., via spectral regularization methods
For the actual computation of the regularized solution , a sufficiently accurate finite dimensional approximation of the operator is required, which is usually obtained by some discretization procedure; in the language of model order reduction, this is called the truth or high-fidelity approximation [3, 27]. In the following discussion, we will not distinguish between infinite dimensional operators and their truth approximations. We thus assume that and , with dimensions typically very large, which is required to guarantee that the truth approximation is sufficiently accurate. We may then identify
with a vector in, with a matrix in , and with a -tensor in or a matrix in .
1.2 Related work
The high dimensionality of the problem poses severe challenges for the numerical solution of the inverse problem Eq. 1–Eq. 2 and different model reduction approaches have been proposed to reduce the computational complexity. These typically rely on the construction of certain low-rank approximations for the forward operator or its adjoint , e.g., by truncated singular value decomposition. For problems with regular geometries and constant coefficients, fast analytic singular value decompositions of linear operators have been used in  based on Fourier techniques. In general, the full assembly and decomposition of is, however, computationally prohibitive for many applications. Krylov subspace methods [16, 30] and randomized algorithms [14, 23] then provide alternatives that allow to construct approximate singular value decompositions using only a moderate number of evaluations of and its adjoint . By combining randomized singular value decompositions for subproblems associated to a single frequency in a recursive manner, approximate singular value decompositions have been constructed in  in the context of inverse medium problems. A particular strategy towards dimension reduction consists in reducing synthetically the number of sources. Such simultaneous or encoded sources have been used in various applications with a large number of excitations and detectors, e.g., geophysics [15, 19] and tomography ; see  for further references.
In a recent work , motivated by , the forward operator is assumed to be the Khatri-Rao product of the matrices corresponding to the adjoint operators and in our setting; this induces a similar structure to Eq. 2 if amounts to a diagonal matrix with on its diagonal. The Khatri-Rao product structure allows the efficient evaluation of , required for the solution of the inverse problem, using pre-computed low-rank approximations for and ; see also  for a survey on tensor decompositions. The computational cost of the proposed reconstruction algorithms in  is still rather high and may be prohibitive for problems with distributed parameters. For a survey of model reduction techniques that aim to reduce the dimension of the parameter space and to accelerate the solution of the computational models let us refer to [3, 27].
The approach developed in this paper aims at systematically constructing approximations for the operator with a quasi-optimal low rank comparable to that of the truncated singular value decomposition while at the same time allowing for a more efficient construction and also guaranteeing provable approximation error bounds. After model reduction, even very high dimensional inverse problem can be solved in parts of a second. In the following, we outline our approach in more detail.
1.3 Model reduction
A possible and rather general strategy towards dimension reduction, which we also use in this paper, amounts to projection in data space
where is chosen as some orthogonal projection with finite rank , which is the dimension of the range of . Since we assume that is compact, we can approximate it by finite rank operators, i.e., we can choose sufficiently large such that
Note that typically , where , are the dimensions of the truth approximation used for the computations. Let us recall that the approximation of minimal rank , satisfying Eq. 6, is obtained by truncated singular value decomposition of the operator , which will serve as the benchmark in the following discussion.
As shown in , the low-rank approximation defined in Eq. 7 has essentially the same quality as the infinite dimensional approximation , as long as the perturbation bound Eq. 6 can be guaranteed.
In the sequel, we therefore focus on the numerical realization of Eq. 7, which can be roughly divided into the following two stages:
Setup of the approximations , , and . This compute intensive part can be done in an offline stage and the constructed approximations can be used for repeated solution of the inverse problem Eq. 1 for multiple data.
For the complexity and memory estimates above, we assumed thatis the truth approximation obtained after discretization. Let us note that the analysis step is completely independent of the large system dimension of the truth approximation and therefore the compression and synthesis step are the compute intensive parts in the online stage. If , which is the typical situation [5, 21], then the data compression turns out to be the most compute and memory expensive step.
1.4 Tensor product compression
To further reduce the memory cost of the data compression, we may exploit the particular structure Eq. 2 of the forward operator reflected in the tensor product structure of the measurement space . We define a tensor product projection operator
via two separate projections , of rank in the spaces and of sources and detectors. After defining and , which are again operators of rank , we obtain a tensor product approximation
of the forward operator whose rank is . In the spirit of [15, 19], the columns of the operators and could be interpreted as optimal sources and detectors; their choice and construction is also strongly related to optimal experiment design .
With similar arguments as before, we may choose sufficiently large such that
which yields a corresponding low-dimensional approximation for the regularized solution of Eq. 1 with still optimal approximation properties . Further note that the tensor product structure of allows to compute the projected data
in two steps and that the first projection can be applied already during recording of the data. Simultaneous access to the full data is therefore never required and the memory cost of data recording and compression is thereby reduced to . If only is required in Eq. 10, then this is substantially smaller than the memory cost for computing with a generic projection which does not take advantage of the underlying tensor product structure. Similar projections and of and were also used in  to speed-up the data compression step.
One major disadvantage coming with the tensor product projection is its still relatively high rank , which is typically much larger than the optimal rank achievable by truncated singular value decomposition. To overcome this, we employ another compression of , giving rise to a projection
with rank that can be proven to be virtually the same as the optimal rank of the truncated singular value decomposition. In this way, we can combine the advantages of an almost optimal rank approximation and the tensor product pre-compression of the data. It turns out that this two-step construction is also beneficial for the computation of the projections , , and and the operators and in the offline phase. Our analysis reveals that actually only a hyperbolic cross approximation  for the tensor product approximation is required for the recompression, which substantially improves the computational complexity.
1.6 Main contributions and outline of the paper
We will present a complete analysis of the proposed model reduction approach in an infinite-dimensional functional analytic setting. Our results thus become independent of the underlying truth approximation, which is only used for the actual computations, and we obtain a certified reduced order model with guaranteed error bounds. In addition, we demonstrate that these models can be constructed at substantially lower cost than typical low-rank approximations obtained by approximate singular value decompositions.
The remainder of the manuscript is organized as follows: In Section 2, we discuss in detail the construction of for problems of the form Eq. 2. Under mild assumptions on the mapping properties of the operators , , and , we show how to define appropriate projections , , and in order to rigorously establish the approximation property Eq. 6. In particular, we show how to construct the second projection in a post processing step that only requires access to the tensor product approximation or its hyperbolic cross approximation , but not to the full operator . To illustrate the applicability of our theoretical results, we discuss in Section 3 a particular example stemming from fluorescence diffuse optical tomography. An appropriate choice of function spaces allows us to verify all conditions required for the analysis of our approach. In Section 4, we report in detail about numerical tests, in which we demonstrate the computational efficiency of the model reduction approach and the resulting numerical solution of the inverse problems.
2 Analysis of the model reduction approach
We will start with introducing our basic notation and then provide a complete analysis of the data compression and model reduction approach outlined in the introduction.
Runction spaces will be denoted by and assumed to be separable Hilbert spaces with scalar product and norm . By we denote the dual of , i.e., the space of bounded linear functionals on , and by the corresponding duality product. Furthermore, denotes the Banach space of linear operators with norm . We write for the range of the operator and define . By and we denote the dual and the adjoint of a bounded linear operator defined, respectively, for all , , and by
The two operators and are directly related by Riesz-isomorphisms. Let us further recall that any compact linear operator has a singular value decomposition, i.e., a countable system such that
with singular values and and denoting orthonormal basis for and , respectively. Also note that and . Moreover, by the Courant-Fisher min-max principle , the st singular value can be characterized by
where the denote -dimensional subspaces of . Hence every linear compact operator can be approximated by truncated singular value decompositions
with error . Conversely, any linear bounded operator that can be approximated in norm by finite-rank operators is necessarily compact.
We further denote by the Hilbert-Schmidt class of compact linear operators whose singular values are square summable. Note that is a Hilbert space equipped with the scalar product , where is an orthonormal basis of . Moreover, the scalar product and the associated norm are independent of the choice of this basis. Let us mention the following elementary results, which will be used several times later on.
(a) Let . Then there exists a sequence of linear operators of rank , such that .
(b) Let , be two linear bounded operators and at least one of them Hilbert-Schmidt. Then the composition is Hilbert-Schmidt and
Here and below, we use to express with some constant that is irrelevant in the context, and we write when and .
For convenience of the reader, we provide a short proof of these assertions.
The assumption implies that is compact with square summable singular values, and hence . The truncated singular value decomposition then satisfies which yields (a). After choosing an orthonormal basis , we can write
which implies the first inequality of assertion (b). The second inequality follows from the same arguments applied to the adjoint and noting that the respective norms of an operator and its adjoint are the same.
2.2 Preliminaries and basic assumptions
We now introduce in more detail the functional analytic setting for the inverse problem Eq. 1 used for our considerations. We assume that the operators , , appearing in definition Eq. 2 satisfy Let , , and . Following our convention, all function spaces appearing in these conditions, except the space , are separable Hilbert spaces. We can now prove the following assertions.
Let Section 2.2 be valid. Then defines a bounded linear operator and, additionally, is compact.
Linearity of is clear by construction and the linearity of , , and . Now let denote an orthonormal basis of and let be arbitrary. Then
where we used Lemma 1 in the second step, and the boundedness of the operators in the third. Since , we obtain
for all , which shows that is bounded. Using Lemma 1(a), we can further approximate and by operators , of rank , such that
and we can define an operator by , which defines an approximation of of rank . From Lemma 1(b), we infer that
2.3 Tensor product approximation
As an immediate consequence of the arguments used in the previous theorem, we obtain the following approximation result.
Let Section 2.2 hold. Then for any there exists with and rank approximations and such that
Here and are orthogonal projections on and , respectively. Furthermore, the operator defined by has rank and satisfies
If the singular values of and satisfy for some , then the assertions hold with , and consequently .
The operators and can be obtained by truncated singular value decomposition of and , and and then are the projections onto the spaces spanned by the first right singular vectors of and , respectively. The assertions of the lemma further imply in particular that the singular values of decay at least like with ; the latter follows from the fact the and are Hilbert-Schmidt, and thus their singular values are square summable.
Hyperbolic cross approximation
Any operator for with will be called a -approximation for in the following. Note that is a -approximation of rank , while the -approximation of minimal rank is obtained by truncated singular value decomposition Eq. 15. In particular, this implies that . We will illustrate now, that the converse statement is in general not true, i.e., the tensor product approximation may have substantially higher rank than required for the -approximation property.
Let and for some and . Then we have , and for any , we can find with and an approximation of rank , such that
Let denote the singular systems for and , respectively. We now show that the hyperbolic cross approximation 
with the choice , , and has the required properties. By counting, one can verify that , since by construction is summable. Furthermore, we can bound
By observing that , , and and by using the decay properties of the singular values, we obtain
In the last step, we used the fact that and , which follows immediately from the construction.
Comparing the results of Lemmas 4 and 3, we expect to obtain a tensor product approximation of rank while the hyperbolic cross approximation and consequently also the truncated singular value decomposition of the same accuracy only have rank , which may be substantially smaller for . Hence the rank of the tensor product approximation will, in general, not be of optimal order.
2.4 Quasi-optimal low-rank approximation via recompression
We will now show that a further compression of the tensor product approximation allows to obtain a low-rank approximation with quasi-optimal rank.
Let and denote a -approximation for according to Lemma 3. Further assume that the singular values of decay like
Then there exists an orthogonal projection on with rank and
Moreover, can be constructed using only knowledge of the approximation .
For ease of notation, we use to abbreviate the corresponding operator norm. Now let denote the truncated singular value decomposition of with rank . Using assumption Eq. 20 we obtain that
Furthermore, let be the truncated singular value decomposition of with the same rank as above. Then by the triangle inequality
The first term can be bounded using the -approximation property of . From the min-max characterization of the singular values Eq. 14, we know that the truncated singular value decomposition yields the best-approximation in the set of bounded linear operators with rank ; also known as the Eckart-Young-Mirsky theorem. Hence the second term can be further estimated by
Here we used that and the -approximation property of . The result then follows by combination of the two estimates derived above.
In the previous lemma, we could use instead of also any other -approximation of the operator , e.g., the hyperbolic cross approximation constructed in Lemma 4; the proof carries over verbatim. In fact, the lemma relies on a well-known result from perturbation theory , viz., the singular values of the -approximation are in a -neighborhood of the singular values of .
Let us briefly summarize the main observations and results of this section. We constructed a certified reduced order model , i.e., -approximation, for the operator with quasi-optimal rank comparable to that of truncated singular value decomposition. The given construction is based on certified low-rank approximations , for the operators and which can be computed more efficiently than a low-rank approximation for the full operator . The resulting tensor product approximation can then be further compressed by truncated singular value decomposition yielding the quasi-optimal low-rank approximation .
As can be seen from the proof of Lemma 4 and Remark 3, the tensor product approximation is not really needed but can be replaced by its hyperbolic cross approximation when computing the final approximation . This allows to substantially improve the computational complexity of the offline phase and is a key ingredient for the efficient realization of our model reduction approach.
The analysis in this section is done in abstract spaces and applies verbatim to infinite-dimensional operators as well as to their finite-dimensional truth approximations obtained after discretization. As a consequence, the computational results, e.g., the rank and of the approximations, can be expected to be essentially independent of the actual truth approximation used for computations.
3 Fluorescence optical tomography
In order to illustrate the viability of the theoretical results derived in the previous section, we now consider in some detail a typical application arising in medical imaging.
3.1 Model equations
Fluorescence optical tomography aims at retrieving information about the concentration of a fluorophore inside an object by illuminating this object from outside with near infrared light and measuring the light reemitted by the fluorophores at a different wavelength. The distribution of the light intensity inside the object generated by a source at the boundary, is described by
We assume that , , is a bounded domain with smooth boundary enclosing the object under consideration. The light intensity emitted by the fluorophores is described by a similar equation
The model parameters , , and , , characterize the optical properties of the medium at excitation and emission wavelength; we assume these parameters to be known, e.g., determined by independent measurements . As shown in , the above linear model, which can be interpreted as a Born approximation or linearization, is a valid approximation for moderate fluorophore concentrations.
3.2 Forward operator
The forward problem in fluorescence optical tomography models an experiment in which the emitted light resulting from excitation with a known source and after interaction with a given fluorophore concentration is measured at the boundary. The measurable quantity is the outward photon flux, which is proportional to ; see  for details. The potential data for a single excitation with source measured by a detector with characteristic can be described by
where and are determined by the boundary value problems Eq. 22–Eq. 25. The inverse problem finally consists of determining the concentration of the fluorophore marker from measurements for multiple excitations and detectors .
We now illustrate that fluorescence optical tomography perfectly fits into the abstract setting of Section 2. Let us begin with defining the excitation operator
In dimension , the product of two functions and , lies in and can thus be interpreted as a bounded linear functional on ; this shows that is a bounded linear operator. We further introduce the linear operator
which maps to the weak solution of the adjoint emission problem
As function spaces we choose , , and .
In order to apply the results of Section 2, it remains to verify Section 2.2. We already showed that is a bounded linear operator. The following assertion states that also the remaining conditions on and hold true.
The Hilbert-Schmidt property follows immediately from the decay behavior of the singular values. To show the latter, let be the space of piecewise linear finite elements on a quasi-uniform triangulation of with meshsize . Let be the -orthogonal projection onto and arbitrary. Then standard approximation error estimates, see e.g., , yield
A-priori estimates for elliptic PDEs yield , and hence can be continuously extended to an operator on ; see e.g. . This yields
where is the dimension of the space . From the min-max characterization of the singular values Eq. 14, we may therefore conclude that as required. The result for follows in the same way.
If prior knowledge on the support of the fluorophore concentration is available, which is frequently encountered in practice, elliptic regularity  implies exponential decay of the singular values and . In such a situation, the rank and in Lemmas 5 and 3 will depend only logarithmically on the noise level , and an accurate approximation of very low rank can be found.
4 Numerical illustration
We will now discuss in detail the implementation of the model reduction approach presented in Section 2 for the fluorescence optical tomography problem and demonstrate its viability by some numerical tests.
4.1 Truth approximation
Let denote a quasi-uniform conforming triangulation of the domain with denoting the mesh size. For the discretization of Eq. 22–Eq. 23 and Eq. 30–Eq. 31, we use a standard finite element method with continuous piecewise linear polynomials; the corresponding spaces then have dimension each. We choose the same finite element space also for the approximation of the concentration . The sources for the forward and the adjoint problem are approximated by piecewise linear functions on the boundary of the same mesh ; hence , have dimension
. All approximation spaces are equipped with the topologies induced by their infinite dimensional counterparts. Standard error estimates allow to quantify the discretization errors in the resulting truth approximation of the forward operator and to establish the-approximation property for small enough. The error introduced by the discretization can therefore be assumed to be negligible.
Let us briefly discuss in a bit more detail the algebraic structure of the resulting problems arising in the truth approximation. Choosing standard nodal bases, the finite element approximation of problem Eq. 22–Eq. 23 leads to the linear system
Here are the stiffness and mass matrices with coefficients , , and the matrices , stem from the discretization of the boundary conditions. The columns of regular represent the individual independent sources in the basis of . Any excitation generated by a source in can thus be expressed as a linear combination of columns of the excitation matrix , which serves as a discrete counterpart of the operator . In a similar manner, the discretization of the adjoint problem Eq. 30–Eq. 31 leads to
whose solution matrix can be interpreted as the discrete counterpart of the operator . The system matrices , , , and have a similar meaning as above, and the columns of represent the individual detector characteristics. The algebraic form of the truth approximation finally reads