Many mathematical and computational models depend on parameters. These may be quantities which have to be optimised during a design, or controlled in a real-time setting, or these parameters may be uncertain and represent uncertainty present in the model. Such parameter dependent models are usually specified in such a way that an input to the model, e.g. a process or a field, depends on these parameters. In an analogous fashion, the output or the “state” of the model will depend on those parameters. Any of these entities may be called a parametric model. To make things a bit more specific, we look at an example: Consider the parametric entities in the following equation:
Here is a possibly nonlinear operator from the Hilbert space into itself, dependent on
, a vector in another Hilbert space used to specify the system; is the state of the system described by , whereas is the excitation resp. action on the system. The parameters are elements of some admissible parameter set . Here and are two examples of such parametric entities; and as the whole equation depends on , we assume that for each the system eq:prim-ex will be well-posed and allow for the state to also be a unique function of the parameters, which is another example of a parametric entity.
When one has to do computations with a system such as eq:prim-ex, one needs computational representations of the parametric entities such as the “inputs” , and also of the state which is to be determined, the “output”. Let us denote any such generic entity as ; then one seeks a computational expression to compute for any given parameter . The first question which has to be addressed is how to choose “good co-ordinates” on the parameter set . By this we mean scalar functions , so that the collection and specification of all will on the one hand specify the particular as regards the system eq:prim-ex, and on the other hand be a computational handle for the parametric entities , which now can be expressed as . Often the parameter set is already given as , so that are directly given co-ordinates, and the co-ordinate functions may directly serve as co-ordinates. But often, especially (though not only) when is a large number, it may be advisable to choose other co-ordinates , which should be free of possible constraints and be as “independent” as possible. This is usually part of finding a good computational representation for , and will be addressed as part of our analysis. One may term this a re-parametrisation of the problem.
The second question to be addressed is the actual number of degrees-of-freedom needed to describe the behaviour of the system eq:prim-ex through some finite-dimensional approximation or discretisation. Often the initial process of discretisation produces a first approximation with a large number of degrees-of-freedom; this initial computational model is often referred to as a full-scale or high-fidelity model. For many computational purposes it is necessary to reduce the number of degrees-of-freedom in the computational model in order to be able to carry out the computations involved in an optimisation or uncertainty quantification in an acceptable amount of time; such computational models are then termed reduced order models (ROMs). If the high-fidelity model is a parametric model, the same is required from the ROM.
The question of how to produce ROMs for specific kinds of systems like eq:prim-ex is an important one, and is the subject of extensive current research. For the general subject of model order reduction there is an excellent collection of recent work in  and survey in , as well as an introductory text in ; see also [20, 34] for important contributions. Besides these general considerations, in the present case parametrised ROMs are of particular interest. The general survey  covers the literature up to 2015 very well, as does the later one , which is concerned mainly with uncertainty quantification. Excellent collections on the topic of parametrised ROMs are contained in  and . A recent systematic monograph is , and important recent contributions are e.g. [9, 45, 46]. Machine learning and so-called data-driven procedures have also been used in this context, see the recent contributions in [28, 42, 41, 43, 44], but this development is at the very beginning.
Here a particular point of view will be taken for the analysis, one not to be found in the recent literature just surveyed, namely the identification of a parametric entity with a linear mapping defined on the dual space, which is introduced in Section 2. This idea has been around for a long time, and has surfaced mostly when the “strong” notion of a concept has to be replaced by a “weaker” one. In this sense one may see the present point of view as a generalisation of the view of distributions or generalised functions as linear mappings [21, 23]. They were used to define weak notions of random quantities , and some of the present ideas are also contained in . In some sense these ideas are already contained in  (see also the English translation ), and may most probably be found even earlier. The reason for approaching the subject in this way is that for linear operators there is a host of methods which can be used for their analysis, and it puts all such parametric entities under one “roof”.
Here we want to explain the basic abstract framework and how it applies to ROMs. This present work is a continuation of  and [38, 37, 35]. The general theory was shown in , and here the purpose is primarily to give an introduction to this kind of analysis, which draws strongly on the spectral analysis of self-adjoint operators (e.g. [22, 24, 16]), and an overview of how to use it in the analysis of ROMs. This is the topic of Section 3. Coupled systems and their ROMs are the focus of , and  is a short note on how this is used for random fields and processes. In the section on examples, some examples of such refinements of the basic concept are given.
As will be seen, it is very natural to deal with tensor products in this topic of parametrised entities. In the form of the proper generalised decomposition (PGD) this idea has been explained and used in [13, 2, 17, 12, 11]. The topic of tensor approximations  turns out to be particularly relevant here, and recently new connections between such approximations and machine learning with deep networks have been exposed [14, 32]. In the concluding section we recapitulate the main ideas.
2 Parametric models and linear maps
This is a gentle introduction and short recap of the developments in [38, 37, 35], where the interested reader may find more detail. To start, and to take a simple motivating example, one could think of a scalar function , defined on some set , which depends on some parameters in a set — in other words a parametric function. In what follows, this function will be viewed as a mapping
so that for each value of the function is a scalar function defined on the set .
To simplify further and make everything finite-dimensional, assume that we are only interested in four positions in , namely , or, alternatively and even simpler, that has only four elements, and finally for the sake of simplicity, that the parameter set has only three elements . Then one can arrange all the possible values of with the abbreviation in the following matrix:
It is obvious that knowing the function is equivalent to knowing the matrix . The matrix obviously corresponds to a linear mapping from to , and one has for any that
where — a weighted average of — is a scalar function in the linear space of scalar functions on the parameter set . If we denote the function of eq:def-r-map in this case by , which for every is an element , then the weighted average in eq:def-simpl-ex obviously satisfies , so that
Obviously, knowing is the same as knowing for every — actually a basis in would suffice — which in turn is the same as knowing for every .
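The finite-dimensional example above can be sketched numerically. This is only an illustrative sketch: the function `r`, the positions, the parameter values and the weight vector `w` are hypothetical choices, not taken from the text. Each row of the matrix holds the values of the parametric function for one parameter value, and applying the matrix to a weight vector yields a scalar function on the parameter set.

```python
import numpy as np

# Hypothetical parametric function r(p)(x); the formula is illustrative only.
def r(p, x):
    return np.exp(-p * x) + 0.1 * p * x**2

positions = np.array([0.0, 0.5, 1.0, 1.5])   # four positions
params = np.array([1.0, 2.0, 3.0])           # three parameter values

# Row j holds the values of r(p_j) at the four positions (shape 3 x 4),
# so the matrix maps R^4 into R^3.
R = np.array([[r(p, x) for x in positions] for p in params])

# A weight vector on the positions defines a scalar function on the
# parameter set: phi[j] = <r(p_j), w>.
w = np.array([0.25, 0.25, 0.25, 0.25])
phi = R @ w

# The same numbers, computed one parameter value at a time.
phi_direct = np.array([np.dot(R[j], w) for j in range(len(params))])

# Knowing the map on a basis of R^4 recovers every entry of the matrix.
recovered = np.column_stack([R @ e for e in np.eye(4)])
```

Using the canonical basis vectors as weights simply reads off the columns of the matrix, which is the sense in which a basis suffices.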
The point to take away from this simple example is that the parametric function , where for each parameter value one has in some linear space — of functions on in this case — is equivalent to a linear map
into a space of scalar functions on the parameter set .
It is now easy to see how to generalise this further to cases where the set or or both have infinitely many values, and even further to a case where the vector space of functions just has an inner product, say given by some integral, so that for one has
with some measure on . Then for each parameter one has , a function on , or in other words an element of the linear space . In this case one defines the linear map
which is a linear map from onto a linear space of scalar functions on the parameter set .
This then is almost the general situation, where one views as a map from the parameters , where may be some arbitrary set, into a topological vector space . One then defines a linear map
from the dual space onto a space of scalar functions on by
where is the duality pairing between and its dual space . For the following exposition of the main ideas we shall take a slightly less general situation by assuming for the sake of simplicity that the linear space is in fact a separable Hilbert space with an inner product , and use this in the usual manner to identify it with its dual.
Associated linear map:
So with a vector-valued map , one defines the corresponding associated linear map as
Obviously only the Hilbert subspace actually reached by the map is interesting, whereas is not. Hence from now on we shall only look at , and additionally assume that , or in other words, that the vectors form a total set in . The map is thus formally redefined as
Again, in the linear space of all scalar functions on , only the part is interesting.
Allow here a little digression, to point out similarities and analogies to other connected concepts. First, on the parameter set , where up to now no additional mathematical structure was used, we now have the linear space . This can be viewed as a first step to introduce some kind of “co-ordinates” on the set , and is in line with many other constructs where potentially complicated sets are characterised by algebraic constructs, such as groups or vector spaces for e.g. homology or cohomology. Even if from the outset the parameter set is given as some subset of some and therefore has already coordinates, these may not be good ones, and as we shall see, it may be worthwhile to contemplate re-parametrisations, i.e. choosing some as “co-ordinates”. These real valued functions are in general of course not “real co-ordinates”, as they only distinguish what is being felt by the parametric object .
Reproducing kernel Hilbert space:
The second concept to touch on comes from the idea to use the function space in place of : As is easy to see, the map in eq:U-1 is injective, hence invertible on its image , and this may be used to define an inner product on as
and to denote the completion of with this inner product by . One immediately obtains that is a bijective isometry between and , hence extends to a unitary map between and , and the same holds for , the extension being denoted by .
Given the maps and , one may define the reproducing kernel [7, 29] given by . It is straightforward to verify that , and , as well as the reproducing property for all . Another way of stating this reproducing property is to say that the linear map for all is the identity on . An abstract way of putting this using the adjoint of the unitary map would be to note that that map is in fact .
With the reproducing kernel Hilbert space (RKHS) one can build a first representation and thus obtain a relevant “co-ordinate system” for . As is separable, it has a Hilbert basis or complete orthonormal system (CONS) . As is unitary, the set is a CONS in .
With this, the unitary operator , its adjoint or inverse , and the parametric element become 
Observe that the relations eq:VII0 and eq:VII0-1 exhibit the tensorial nature of the representation mapping. One sees that model reductions may be achieved by choosing only subspaces of , i.e. spanned by a—typically finite—subset of the CONS . Furthermore, the representation of in eq:VII0-1 is linear in the new “parameters” .
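For a finite parameter set the reproducing kernel becomes a Gram matrix, and the reproducing property can be checked directly. The following is a minimal sketch under stated assumptions: the data are random, the names `K` and `rkhs_inner` are hypothetical, and the rows of `R` are assumed linearly independent so that the Gram matrix is invertible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Rows are the vectors r(p_j) for three parameter values (illustrative data).
R = rng.standard_normal((3, 4))

# Kernel matrix: K[i, j] = <r(p_i), r(p_j)>.
K = R @ R.T
eigvals = np.linalg.eigvalsh(K)   # all positive if the rows are independent

# A function on the parameter set lying in the image of the map: phi = R u.
u = rng.standard_normal(4)
phi = R @ u

# Inner product pulled back from the span of the r(p_j):
# <a, b>_R = a^T K^{-1} b (valid because K is invertible here).
def rkhs_inner(a, b):
    return a @ np.linalg.solve(K, b)

# Reproducing property: <K[:, j], phi>_R = phi[j] for every j.
reproduced = np.array([rkhs_inner(K[:, j], phi) for j in range(3)])

# Isometry: the RKHS norm of phi equals the norm of the component of u
# that lies in the span of the r(p_j).
u_proj = R.T @ np.linalg.solve(K, phi)
norm_sq_rkhs = rkhs_inner(phi, phi)
norm_sq_u = u_proj @ u_proj
```

The component of `u` orthogonal to the span of the rows does not influence `phi`, which mirrors the restriction to the subspace actually reached by the parametric map.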
The third concept one should mention in this context is the one of coherent states, e.g. see [1, 3]. In this development from quantum theory, these quantum states were initially introduced as eigenstates of certain operators, and the name refers originally to their high coherence, minimum uncertainty, and quasi classical behaviour. What is important here is that the idea has been abstracted, and represents overcomplete sets of vectors or frames in a Hilbert space , which depend on a parameter from a locally compact measure space. This space often has more structure, e.g. a Lie group, and the coherent states are connected with group representations in the unitary group of , i.e. if is a unitary representation, the coherent states may be defined by for some . There are usually further requirements like weak continuity for the map , and that these coherent states form a resolution of the identity, in that one has (weakly)
where is a measure on , naturally defined on some -algebra of subsets of , a detail which needs no further mention here. We shall leave this topic here, and come back to similar representations later, but note in passing the tensor product structure under the integral. The above requirement of the resolution of the identity may sometimes be too strong, and one often falls back to the case of RKHS discussed above.
Assume now that is an approximate or reduced order model (ROM) of . One possibility of producing such a ROM was already mentioned above by letting the sum in eq:VII0-1 run over fewer terms. The ROM thus has an associated linear map . As the associated linear maps carry all the relevant information, the analysis of both the original parametric object , and the comparison and analysis of accuracy of the approximation can be carried out in terms of the associated linear maps and . In the present setting is unitary, so can be judged by how well it approximates that unitary mapping. In the next section, where a second inner product will be introduced on the space of scalar functions of , this will be even more pronounced, as it will offer the possibility of deciding which CONS or other complete sets in subspaces of are advantageous for ROMs.
3 Correlation and Representation
What was detailed up to now in the previous section with regard to the RKHS was that the structure of the Hilbert space was carried over to, i.e. reproduced on, the subspace of the full function space. In the remarks about coherent states one could already see an additional structure, namely a measure on . This measure structure can be used to define the subspace of measurable functions, as well as its Hilbert subspace of square-integrable functions with associated inner product
We shall simply assume here that there is a Hilbert space of functions with inner product , which may or may not come from an underlying measure space. The associated linear map , essentially defined in eq:V-1 with range the RKHS , will now be seen as a map into the Hilbert space , i.e. with a different range with different inner product from the RKHS inner product on . One may view this inner product as a way to tell what is important in the parameter set : functions with large -norm are considered more important than those where this norm is small. The map is thus generally not unitary any more, but for the sake of simplicity, we shall assume that it is a densely defined closed operator, see e.g. . As it may be only densely defined, it is sometimes a good idea to define through a densely defined bilinear form in :
The map — observe that now the adjoint is w.r.t. the -inner product — may be called the “correlation” operator, and is by construction self-adjoint and positive, and if is bounded resp. continuous, so is .
In the above case that the -inner product comes from a measure, one has from eq:IX
This is reminiscent of what was required for coherent states. But it also shows that if were a probability measure — i.e. — with the usual expectation operator
then the above would be really the familiar correlation operator [33, 35] of the -valued random variable (RV) , therefore from now on we shall simply refer to as the correlation operator, even in the general case not based on a probability measure.
The fact that the correlation operator is self-adjoint and positive implies that its spectrum is real and non-negative. This will be used when analysing it with any of the versions of the spectral theorem for self-adjoint operators (e.g. ). The easiest and best known version of this is for finite dimensional maps.
Finite dimensional beginnings:
So let us return to the simple example at the beginning of Section 2 where the associated linear map can be represented by a matrix . If we remember that each row is the value for the vector for one particular , we see that the matrix can be written as
and that the rows are just “snapshots” for different values . What is commonly done now is the so-called method of proper orthogonal decomposition (POD) to produce a ROM.
The matrix — to generalise a bit, assume it of size — can be decomposed according to its singular value decomposition (SVD)
where the matrices and are orthogonal with unit length orthogonal columns — right and left singular vectors — resp. , and is diagonal with non-negative diagonal elements , the singular values. For clarity, we arrange the singular values in a decreasing sequence,
. It is well known that this decomposition is connected with the eigenvalue or spectral decomposition of the correlation
with eigenvalues, and its companion
with the same eigenvalues, but eigenvectors . The representation is based on , and its accompanying POD or Karhunen-Loève decomposition:
where , and .
The second expression in eq:KLE-fdim is a representation for , and that is the purpose of the whole exercise. Similar expressions may be used as approximations. It clearly exhibits the tensorial nature of the representation, which is also evident in the expressions eq:SVD-R-fdim, eq:spec-C-fdim, and eq:spec-CQ-fdim. One sees here that this is just the -th column of , so that with the canonical basis in , with the Kronecker-, that expression becomes just
by taking other vectors in
to give weighted averages or interpolations.
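The connection between the SVD of the snapshot matrix, the eigenvalues of the correlation and its companion, and the resulting representation of the snapshots can be sketched as follows. The data are random and the names `Phi`, `sigma`, `Vt`, `C` and `C_Q` are assumptions made for this illustration; the correlation is taken here as the matrix product of the transposed snapshot matrix with itself, and the companion as the product in the other order.

```python
import numpy as np

rng = np.random.default_rng(1)

# Rows are "snapshots" r(p_j) for three parameter values (illustrative data).
R = rng.standard_normal((3, 4))

# Thin SVD: R = Phi @ diag(sigma) @ Vt, singular values in decreasing order.
Phi, sigma, Vt = np.linalg.svd(R, full_matrices=False)

# Correlation (4 x 4) and its companion (3 x 3).
C = R.T @ R
C_Q = R @ R.T

# Their eigenvalues, sorted in decreasing order.
eig_C = np.sort(np.linalg.eigvalsh(C))[::-1]
eig_CQ = np.sort(np.linalg.eigvalsh(C_Q))[::-1]

# Karhunen-Loeve type representation of one snapshot:
# r(p_j) = sum_m sigma[m] * Phi[j, m] * Vt[m].
def snapshot(j):
    return sum(sigma[m] * Phi[j, m] * Vt[m] for m in range(len(sigma)))

rebuilt = np.array([snapshot(j) for j in range(R.shape[0])])
```

The representation is a sum of products of a function of the parameter index (`Phi[j, m]`) and a fixed vector (`Vt[m]`), which is the tensorial structure referred to in the text.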
The general picture which emerges is that the matrix is a kind of “square root” — or more precisely factorisation — of the correlation , and that the left part of this factorisation is used for reconstruction resp. representation. In any other factorisation like
where maps into some other space , the map will necessarily have essentially the same singular values and right singular vectors as , and can now be used to have a representation or reconstruction of on via
A popular choice is to use the Choleski-factorisation of the correlation into two triangular matrices, and then take for the reconstruction.
As we have introduced the correlation’s spectral factorisation in eq:spec-C-fdim, some other factorisations come to mind, although they may be mostly of theoretical value:
where then the reconstruction map is or . Obviously, in the second case the reconstruction map is symmetric , and is actually the true square root of the correlation .
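The alternative factorisations can be compared numerically. The sketch below assumes a positive definite correlation (more parameter values than positions, random data), so that the Cholesky factorisation exists; the names `B`, `W` and `C_half` are hypothetical. It checks that the Cholesky factor has the same singular values as the snapshot matrix, that the two factors differ only by a map with orthonormal columns, and that the spectral decomposition yields a symmetric square root.

```python
import numpy as np

rng = np.random.default_rng(2)

# Six parameter values, four positions, so that C = R^T R is positive definite.
R = rng.standard_normal((6, 4))
C = R.T @ R

# Cholesky factorisation: C = L @ L.T with L lower triangular.
L = np.linalg.cholesky(C)
B = L.T                      # upper triangular factor, C = B.T @ B

# Both factors of C share the same singular values.
sv_R = np.linalg.svd(R, compute_uv=False)
sv_B = np.linalg.svd(B, compute_uv=False)

# The map linking the factors, R = W @ B, i.e. W = R @ inv(B),
# computed by solving with B.T instead of forming the inverse.
W = np.linalg.solve(B.T, R.T).T

# Symmetric square root from the spectral decomposition of C.
lam, V = np.linalg.eigh(C)
C_half = V @ np.diag(np.sqrt(lam)) @ V.T
```

Here `W` has orthonormal columns, so it acts as an isometry between the two representation spaces; this is the finite-dimensional counterpart of different factorisations carrying the same information.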
Other factorisations can come from looking at the companion in eq:spec-CQ-fdim. Any factorisation or approximate factorisation of
is naturally a factorisation or approximate factorisation of the correlation
where and are the left and right singular vectors — see eq:SVD-R-fdim — of the associated map resp. the eigenvectors of the correlation in eq:spec-C-fdim and its companion in eq:spec-CQ-fdim. A new ROM representation can now be found for via
One last observation here is important: the expressions for resp. one of its ROMs are linear in the newly introduced parameters or “co-ordinates” in eq:KLE-fdim, resp. in eq:KLE-fdim-appr, resp. in eq:fact-fdim-h-appr and eq:ROM-fdim, as well as in eq:ROM-Q-fdim; which is an important requirement in many numerical methods.
Reduced order models — ROMs:
As has become clear now, and was mentioned before, approximations or ROMs to the full model produce associated maps , which are approximate factorisations of the correlation:
This introduces different ways of judging how good an approximation is. If one looks at the difference between the full model and its approximation as a residual, and computes weighted versions of it
then this is just the difference linear map applied to the weighting vector . In eq:KLE-fdim it was shown that is a representation. As usual, one may now approximate such an expression by leaving out terms with small or vanishing singular values, say using only , getting an approximation of rank ; this also means that the associated linear map in eq:KLE-fdim has rank . As is well known , this is the best -term approximation in the norms of and . But from eq:r-ra-diff one may gather that the error can also be described through the difference . As error measure one may take the norm of that difference, and, depending on which norm one chooses, the error is then, in this example approximation, the largest omitted singular value in the operator norm, the sum of the omitted singular values in the trace resp. nuclear norm, or the square root of the sum of their squares in the Frobenius resp. Hilbert-Schmidt norm.
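The three error measures for a truncated SVD can be verified on a small example. This is a sketch with random data; the matrix size, the retained rank `k` and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative snapshot matrix: eight parameter values, six positions.
R = rng.standard_normal((8, 6))
Phi, s, Vt = np.linalg.svd(R, full_matrices=False)

# Keep only the k largest singular values to obtain a rank-k approximation.
k = 3
R_k = (Phi[:, :k] * s[:k]) @ Vt[:k]

# Difference between the full map and its reduced approximation.
E = R - R_k

err_op = np.linalg.norm(E, 2)        # operator (spectral) norm
err_nuc = np.linalg.norm(E, 'nuc')   # trace resp. nuclear norm
err_fro = np.linalg.norm(E, 'fro')   # Frobenius resp. Hilbert-Schmidt norm

# Expected values in terms of the omitted singular values s[k:].
expected_op = s[k]
expected_nuc = s[k:].sum()
expected_fro = np.sqrt((s[k:] ** 2).sum())
```

The operator norm only sees the largest omitted singular value, whereas the nuclear and Frobenius norms accumulate all omitted ones, so the choice of norm matters most when the singular values decay slowly.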
On the other hand, different approximations or ROMs can now be obtained by starting with an approximate factorisation
and introducing a ROM via
Such a representing linear map may, e.g. via its SVD, be written as a sum of tensor products, and approximations are often lower rank expressions, directly reflected in a reduced sum for the tensor products. As will become clearer at the end of this section, the bilinear forms eq:IX-a resp. eq:IX can sometimes split into multi-linear forms, thus enabling the further approximation of through hierarchical tensor products .
Infinite dimensional continuation — discrete spectrum:
For the cases where both and are infinite dimensional, the operators and live on infinite dimensional spaces, and the spectral theory gets a bit more complicated. We shall distinguish some simple cases. After finite dimensional resp. finite rank operators just treated in matrix form, the next simplest case is certainly the case when the associated linear map and the correlation operator have a discrete spectrum, e.g. if is compact, or a function of a compact operator, like for example its inverse. In this case the spectrum is discrete (e.g. ), and in the case of a compact operator the non-negative eigenvalues of may be arranged as a decreasing sequence with the origin as the only possible accumulation point. It is not uncommon when dealing with random fields that is a nuclear or trace-class operator, i.e. an operator which satisfies the stronger requirement . The spectral theorem for an operator with purely discrete spectrum takes the form
where the eigenvectors form a CONS in . Defining a new corresponding CONS in via , one obtains the singular value decomposition of and with singular values :
It is not necessary to repeat in this setting of compact maps all the different factorisations considered in the preceding paragraphs, and especially their approximations, which will be usually finite dimensional as they are made to be used for actual computations, e.g. the approximations will usually involve only finite portions of the infinite series in eq:XIII and eq:XIV, which means that the induced linear maps have finite rank and essentially become finite dimensional, so that the preceding paragraphs apply practically verbatim.
But one consideration is worth following up further. In infinite dimensional Hilbert spaces, self-adjoint operators may have a continuous spectrum, e.g. ; this is what is usually the case when homogeneous random fields or stationary stochastic processes have to be represented. This means that the expressions developed for purely discrete spectra in eq:XIII and eq:XIV are not general enough. These expressions are really generalisations of the last equalities in eq:spec-C-fdim and eq:SVD-R-fdim; but it is possible to give meaning to the matrix equalities in those equations, which simultaneously cover the case of a continuous spectrum.
In infinite dimensions — non-discrete spectrum:
To this end we introduce the so called multiplication operator: Let be the usual Hilbert space on some locally compact measure space , and let be an essentially bounded function. Then the map
for is a bounded operator on . Such a multiplication operator is the direct analogue of a diagonal matrix in finite dimensions.
Using such a multiplication operator, one may introduce a formulation of the spectral decomposition different from eq:XIII which does not require to be compact , resp. do not even have to be continuous resp. bounded:
where is unitary between some on a measure space and . In case is continuous resp. bounded, one has . As is positive, the function is non-negative ( a.e. for ). This covers the previous case of operators with purely discrete spectrum if the function is a step function and takes only a discrete (countable) set of values — the eigenvalues. This theorem is actually quite well known in the special case that is the correlation operator of a stationary stochastic process — an integral operator where the kernel is the correlation function; in this case is the Fourier transform, and is known as the power spectrum.
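A discrete, periodic analogue of the stationary case can be sketched as follows: a circulant correlation matrix is diagonalised by the unitary discrete Fourier transform, and the diagonal multiplier is the discrete power spectrum. The grid size, the exponential correlation function and its length scale are hypothetical choices for this illustration only.

```python
import numpy as np

n = 64
lags = np.arange(n)

# Hypothetical stationary correlation function on a periodic grid.
dist = np.minimum(lags, n - lags)          # periodic distance
c = np.exp(-dist / 5.0)

# Circulant correlation matrix: C[i, j] = c[(i - j) mod n].
idx = (lags[:, None] - lags[None, :]) % n
C = c[idx]

# Discrete power spectrum: the DFT of the correlation function
# (real, because c is symmetric under k -> n - k).
power = np.fft.fft(c).real

# Unitary DFT matrix F; the multiplication operator is diag(power).
F = np.fft.fft(np.eye(n), axis=0) / np.sqrt(n)

# Spectral decomposition C = F^H diag(power) F.
C_rebuilt = (F.conj().T * power) @ F
```

In this discrete setting the multiplier takes only finitely many values, so the spectrum is of course discrete; the sketch is meant only to show the structure "unitary map, multiplication, inverse unitary map" that carries over to the continuous spectrum.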
To investigate the analogues of further factorisations of , , and its companion , we need the SVD of and . They derive generally in the same manner as for the finite dimensional case from the spectral factorisations of in eq:ev-mult and a corresponding one for its companion
with a unitary between some on a measure space and . Here in eq:ev-cq-mult, and in eq:ev-mult, the multiplication operator plays the role of the diagonal matrix in eq:spec-C-fdim and eq:spec-CQ-fdim. For the SVD of one needs its square root, and as is non-negative, this is simply given by , i.e. multiplication by . Hence the SVD of and is given by
These are all examples of a general factorisation , where is a map to a Hilbert space with all the properties demanded from —see the beginning of this section. It can be shown  that any two such factorisations and with are unitarily equivalent in that there is a unitary map such that . Equivalently, each such factorisation is unitarily equivalent to , i.e. there is a unitary such that .
Analogues of the factorisations considered in eq:fact2-fdim are
where again is the square root of .
And just as in the case of the factorisations of considered in eq:fact-Q-fdim and the resulting factorisation of in eq:fact-CQ-fdim, it is also here possible to consider factorisations of in eq:ev-cq-mult, such as
with some Hilbert space , which lead again to factorisations of
and representation on the space ; with the representing linear maps given by resp. .
Coming back to the situation where has a purely discrete spectrum and a CONS of eigenvectors in , the map from the decomposition can be used to define a CONS in : , which is an eigenvector CONS of the operator , with , see . From this follows a SVD of and in a manner analogous to eq:XIV. The main result is  that in the case of a nuclear with necessarily purely discrete spectrum every factorisation leads to a separated representation in terms of a series, and vice versa. In case is not nuclear, the representation of a “parametric object” via a linear map is actually more general [38, 35]
and also allows the rigorous and uniform treatment of “idealised” objects, like for example Gaussian white noise on a Hilbert space.
In this instance of a discrete spectrum and a nuclear and hence nuclear , the abstract equation can be written in a more familiar form in the case when the inner product on is given by a measure on . It becomes for all :
This shows that is really a Fredholm integral operator, and its spectral decomposition is nothing but the familiar theorem of Mercer  for the kernel
where the “factors” are measurable functions on the measure space