Model reduction of controlled dynamical systems has been a long standing, and as yet, unsettled challenge in control theory. The benefits are clear: a low dimensional approximation of a high dimensional system can be manipulated with a simpler controller, and can be simulated at lower computational cost. A complex, high dimensional system may even be replaced by a simpler model all together leading to significant cost savings, as in circuit design, while the “important variables” of a system might shed light on underlying physical or biological processes. Reduction of linear dynamical systems has been treated with some success to date. As we describe in more detail below, model reduction in the linear case proceeds by reducing the dimension of the system with an eye towards preserving its essential input-output behavior, a notion directly related to “balancing” observability and controllability of the system. The nonlinear picture, however, is considerably more involved.
In this paper we propose a scheme for balanced model-order reduction of general, nonlinear control systems. A key, and to our knowledge, novel point of departure from the literature on nonlinear model reduction is that our approach marries approximation and dimensionality reduction methods known to the machine learning and statistics communities with existing ideas in linear and nonlinear control. In particular, we apply a method similar to kernel PCA as well as function learning in Reproducing Kernel Hilbert Spaces (RKHS) to the problem of balanced model reduction. Working in RKHS provides a convenient, general functional-analytical framework for theoretical understanding as well as a ready source of existing results and error estimates. The approach presented here is also strongly empirical, in that observability and controllability, and in some cases the dynamics of the nonlinear system are estimated from simulated or measured trajectories. This emphasis on the empirical makes our approach broadly applicable, as the method can be applied without having to tailor anything to the particular form of the dynamics.
The approach we propose begins by constructing empirical estimates of the observability and controllability Gramians in a high (or possibly infinite) dimensional feature space. The Gramians are simultaneously diagonalized in order to identify directions which, in the feature space, are both the most observable and the most controllable. The assumption that a nonlinear system behaves linearly when lifted to a feature space is far more reasonable than assuming linearity in the original space, and then carrying out the linear theory hoping for the best. Working in the high dimensional feature space allows one to perform linear operations on a representation of the system’s state and output which can capture strong nonlinearities. Therefore a system which is not model reducible using existing methods, may become reducible when mapped into such a nonlinear feature space. This situation closely parallels the problem of linear separability in data classification: A dataset which is not linearly separable might be easily separated when mapped into a nonlinear feature space. The decision boundary is linear in this feature space, but is nonlinear in the original data space.
Nonlinear reduction of the state space already opens the door to the design of simpler controllers, but is only half of the picture. One would also like to be able to write a closed, reduced dynamical system whose input-output behavior closely captures that of the original system. This problem is the focus of the second half of our paper, where we again exploit helpful properties of RKHS in order to provide such a closed system.
The paper is organized as follows. In the next section we provide the relevant background for model reduction and balancing. We then adapt and extend balancing techniques described in the background to the current RKHS setting in Section III. Section IV then proposes a method for determining a closed, reduced nonlinear control system in light of the reduction map described in Section III. Finally, Section V provides experiments illustrating an application of the proposed methods to a specific nonlinear system.
Several approaches have been proposed for the reduction of linear control systems in view of control, but few exist for finite or infinite-dimensional controlled nonlinear dynamical systems. For linear systems the pioneering “Input- Output balancing” approach proposed by B.C. Moore observes that the important states are the ones that are both easy to reach and that generate a lot of energy at the output. If a large amount of energy is required to reach a certain state but the same state yields a small output energy, the state is unimportant for the input-output behavior of the system. The goal is then to find the states that are both the most controllable and the most observable. One way to determine such states is to find a change of coordinates where the controllability and observability Gramians (which can be viewed as a measure of the controllability and the observability of the system) are equal and diagonal. States that are difficult to reach and that don’t significantly affect the output are then ignored or truncated. A system expressed in the coordinates where each state is equally controllable and observable is called its balanced realization.
A proposal for generalizing this approach to nonlinear control systems was advanced by J. Scherpen , where suitably defined controllability and observability energy functions reduce to Gramians in the linear case. In general, to find the balanced realization of a system one needs to solve a set of Hamilton-Jacobi and Lyapunov equations (as we will discuss below). Moore 
proposed an alternative, data-based approach for balancing in the linear case. This method uses samples of the impulse response of a linear system to construct empirical controllability and observability Gramians which are then balanced and truncated using Principal Components Analysis (PCA, or POD). This data-driven strategy was then extended to nonlinear control systems with a stable linear approximation by Lall et al., by effectively applying Moore’s method to a nonlinear system by way of the Galerkin projection. Despite the fact that the balancing theory underpinning their approach assumes a linear system, Lall and colleagues were able to effectively reduce some nonlinear systems.
Phillips  et al. has also studied reduction of nonlinear circuit models in the case of linear but unbalanced coordinate transformations and found that approximation using a polynomial RKHS could afford computational advantages. Gray and Verriest mention in  that studying algebraically defined Gramian operators in RKHS may provide advantageous approximation properties, though the idea is not further explored. Finally, Coifman et al. 
discuss reduction of an uncontrolled stochastic Langevin system. There, eigenfunctions of a combinatorial Laplacian, built from samples of trajectories, provide a set of reduction coordinates but does not provide a reduced system. This method is related to kernel principal components (KPCA) using a Gaussian kernel, however reduction in this study is carried out on a simplified linear system outside the context of control.
Ii-a Balancing of Linear Systems
Consider a linear control system
where is controllable, is observable and is Hurwitz. We define the controllability and the observability Gramians as, respectively,
These two matrices can be viewed as a measure of the controllability and the observability of the system . For instance, consider the past energy , , defined as the minimal energy required to reach from in infinite time
and the future energy , , defined as the output energy generated by releasing the system from its initial state , and zero input for , i.e.
for and . In the linear case, it can be shown that and The columns of span the controllable subspace while the nullspace of coincides with the unobservable subspace. As such, and (or their estimates) are the key ingredients in many model reduction techniques. It is also well known that and satisfy the Lyapunov equations 
The idea behind balancing is to find a representation where the system’s observable and controllable subspaces are aligned so that reduction, if possible, consists of eliminating uncontrollable states which are also the least observable. More formally, we would like to find a new coordinate system such that
where . If is controllable and is observable, then there exists a transformation such that the state space expressed in the transformed coordinates is balanced and
. Typically one looks for a gap in the singular valuesfor guidance as to where truncation should occur. If we see that there is a such that , then the states most responsible for governing the input-output relationship of the system are while are assumed to make negligible contributions.
Ii-B Balancing of Nonlinear Systems
In the nonlinear case, the energy functions and in (II-A) and (II-A) are obtained by solving both a Lyapunov and a Hamilton-Jacobi equation. Here we follow the development of Scherpen . Consider the nonlinear system
with , , , , for , and . Moreover, assume the following Hypothesis.
Hypothesis H: The linearization of (II-B) around the origin is controllable, observable and is asymptotically stable.
 If the origin is an asymptotically stable equilibrium of on a neighborhood of the origin, then for all , is the unique smooth solution of
under the assumption that (II.1) has a smooth solution on . Furthermore for all , is the unique smooth solution of
under the assumption that (II.1) has a smooth solution on and that the origin is an asymptotically stable equilibrium of on .
With the controllability and the observability functions on hand, the input-normal/output-diagonal realization of system (II-B) can be computed by way of a coordinate transformation. More precisely,
Analogous to the linear case, the system’s states can be sorted in order of importance by sorting the singular value functions, and reduction proceeds by removing the least important states.
In the above framework for balancing of nonlinear systems, one needs to solve (or numerically evaluate) the PDEs (II.1), (II.1) and compute the coordinate change , however there are no systematic methods or tools for solving these problems. Various approximate solutions based on Taylor series expansions have been proposed [9, 8, 5]. Newman  introduces a statistical approximation based on exciting the system with white Gaussian noise and then computing the balancing transformation using an algorithm from differential topology. As mentioned earlier, an essentially linear empirical approach was proposed in . In this paper, we combine aspects of both data-driven approaches and analytic approaches by carrying out balancing in a suitable RKHS.
Iii Empirical Balancing of Nonlinear Systems in RKHS
We consider a general nonlinear system of the form
with , , , , and . Let be the reachable set from the initial condition . We assume that the system is zero-state observable, and that the linearization of (III) around the origin is controllable. We also assume that the origin of is asymptotically stable.
We treat the problem of estimating the observability and controllability Gramians as one of estimating an integral operator from data in a reproducing kernel Hilbert space (RKHS) . Our approach hinges on the key modeling assumption that the nonlinear dynamical system is linear in an appropriate high (or possibly infinite) dimensional lifted feature space. Covariance operators in this feature space and their empirical estimates are the objects of primary importance and contain the information needed to perform model reduction. In particular, the (linear) observability and controllability Gramians are estimated and diagonalized in the feature space, but capture nonlinearities in the original state space. The reduction approach we propose adapts ideas from kernel PCA (KPCA)  and is driven by a set of simulated or sampled system trajectories, extending and generalizing the work of Moore  and Lall et al. .
In the development below we lift state vectors of the system into a reproducing kernel Hilbert space, , endowed with a symmetric positive definite kernel function which we assume here to be continuous and bounded by . In particular, we make use of the following important properties: For all , , where . This is the reproducing property. Second, to any RKHS we can associate a feature map satisfying . For example, we can take in which case – the “feature space” is the RKHS. We will further assume that is always separable.
Iii-B Empirical Gramians
Following , we estimate the controllability Gramian by exciting each coordinate of the input with impulses while setting . One can also further excite using rotations of impulses as suggested in , however for simplicity we consider only the original signals proposed in . Let be the -th excitation signal, and let be the corresponding response of the system. Form the matrix , so that is seen as a data matrix with column observations given by the respective responses . Then is given by
We can approximate this integral by sampling the matrix function within a finite time interval assuming the regular partition . This leads to the empirical controllability Gramian
As described in , the observability Gramian is estimated by fixing , setting for , and measuring the corresponding system output responses . As before, assemble the responses into a matrix . The observability Gramian and its empirical counterpart are given by
where . The matrix can be thought of as a data matrix with column observations
so that corresponds to the response at time of the single output coordinate to each of the (separate) initial conditions . This convention will lead to greater clarity in the steps that follows.
Iii-C Kernel PCA
Kernel PCA  generalizes linear PCA by carrying out PCA in a high dimensional feature space defined by a feature map . Taking the feature map and given the set of data , we can consider PCA in the feature space by simply working with the covariance of the mapped vectors,
denotes the tensor product between two vectors in. We will assume the data are centered in the feature space so that . If not, data may be centered according to the prescription in . The principal subspaces are computed by diagonalizing , however as is shown in , one can equivalently form the matrix of kernel products for , and solve the eigenproblem . If , then we have that where
, and the non-zero eigenvalues ofand
coincide. The eigenvectorsof are then normalized so that the eigenvectors of have unit norm in the feature space, leading to the condition . Assuming this normalization convention, sort the eigenvectors according to the magnitudes of the corresponding eigenvalues in descending order, and form the matrix . Similarly, form the matrix of sorted eigenvectors of . The first principal components of a vector in the feature space are then given by . It can be shown however (see ) that principal components in the feature space can be computed in the original space with kernels using the map , where .
Iii-D Model Order Reduction Map
The method we propose consists, in essence, of collecting samples and then performing a process similar to “simultaneous principal components analysis” on the controllability and observability Gramian estimates in the (same) RKHS. As mentioned above, given a choice of the kernel defining a RKHS , principal components in the feature space can be computed implicitly in the original input space using . Because we will find non-orthogonal coordinates in the feature space in which the Gramians become simultaneously diagonal, the process is not strictly speaking PCA, and the favorable properties associated with an orthonormal basis are no longer available. We will, however, continue to refer to the process of diagonalizing a covariance matrix as (K)PCA.
Turning to the controllability Gramian (the case of the observability Gramian is analogous), first note that can be viewed as the sample covariance of a collection of vectors, scaled by :
Thus we can form the controllability kernel matrix of kernel products for in order to carry out PCA in the feature space, where we have re-indexed the set of vectors to use a single linear index. Similarly, we can compute the observability kernel matrix consisting of the pairwise kernel products of the collection of data vectors described in (1). Ordinarily, and will be rank deficient.
We assume here for simplicity that the number of input excitation signals is equal to the dimension of the output so that the number of samples taken from the output and state trajectories can be the same, leading to kernel matrices and of the same size. If one adopts the set of input excitations as above, then an alternative although more restrictive assumption can be that the number of inputs to the system is equal to the number of outputs. Then the pair is simultaneously diagonalized by taking the (reduced) SVD of so that . Conjugation by diagonalizes and conjugation by diagonalizes , where denotes the pseudoinverse of . Finally, the order of the model is reduced by discarding small eigenvalues , and projecting onto the subspace associated with the first largest eigenvalues. This leads to the state-space reduction map given by
Iv Closed Dynamics of the Reduced System
Given the nonlinear state space reduction map , a remaining challenge is to construct a corresponding (reduced) dynamical system on the reduced state space which well approximates the input-output behavior of the original system on the original state space. Setting
and applying the chain rule,
However we are faced with the difficulty that the map is not in general injective (even if ), and moreover one cannot guarantee that an arbitrary point in the RKHS has a non-empty preimage under . We propose an approximation scheme to get around this difficulty: The dynamics will be approximated by an element of an RKHS defined on the reduced state space. When is assumed to be known explicitly it can be approximated to a high degree of accuracy. An approximate, least-squares notion of “
” will be given to first or second order via a Taylor series expansion, but only where it is strictly needed – and at the last possible moment – so that a first or second order approximation will not be as crude as one might suppose. We will also consider, as an alternative, a direct approximation ofwhich takes into account further properties of the reproducing kernel as well as the fact that the Jacobian is to be evaluated at in particular. In both cases, the important ability of the map to capture strong nonlinearities will not be significantly diminished.
Iv-a Representation of the dynamics in RKHS
The vector-valued map can be approximated by a composing a set of regression functions (one for each coordinate) in an RKHS, with the reduction map . It is reasonable to expect that this approximation will be better than directly computing using, for instance, a Taylor expansion notion of “”, which may ignore important nonlinearities at a stage where crude approximations must be avoided.
Let denote a reduced state variable, and concatenate the input examples so that , and is a set of input-output training pairs describing the -th coordinate of the map . The training examples should characterize “typical” behaviors of the system, and can even re-use those trajectories simulated in response to impulses for estimating the Gramians above. We will seek the function which minimizes
here is a regularization parameter. We have chosen the square loss, however other suitable loss functions may be used. It can be shown that in this case takes the form , where defines the RKHS (and is unrelated to used to estimate the Gramians). Note that although our notation takes the RKHS for each coordinate function to be the same, in general this need not be true: different kernels may be chosen for each function. Here the comprise a set of coefficients learned using the regularized least squares (RLS) algorithm. The kernel family and any hyper-parameters can be chosen by cross-validation. For notational convenience we will further define the vector-valued empirical feature map
for . In this notation where .
A broad class of systems seen in the literature  are also characterized by separable dynamics of the form . In this case one need only estimate the functions and from examples and .
Iv-B Approximation of the Jacobian Contribution
We turn to approximating the component appearing in Equation (5).
Iv-B1 Inverse-Taylor Expansion
A simple solution is to compute a low-order Taylor expansion of and then invert it using the Moore-Penrose pseudoinverse to obtain the approximation. For example, consider the first order expansion . Then we can approximate (in the first-order, least-norm sense) as
We may start with , but periodically update the expansion in different regions of the dynamics if desired. A good expansion point could be the estimated preimage of returned by the algorithm proposed in .
Iv-B2 Exploiting Kernel Properties
For certain choices of the kernel defining the Gramian feature space , one can exploit the fact that and its derivative bear a special relationship, and potentially improve the estimate for
. Perhaps the most commonly used off-the-shelf kernel families are the polynomial and Gaussian families. For any two kernels with hyperparametersand (respectively) in one of these classes, we have that . We’ll consider the polynomial kernel of degree , in particular; the Gaussian case can be derived using similar reasoning. For a polynomial kernel we have that
Recalling that and , if was invertible then we would have
The map is not injective however, and in addition the fibers of may be potentially empty, so we must settle for an approximation. It is reasonable then to define as the solution to the convex optimization problem
If a point has a pre-image in this definition is consistent with composing with the formal definition and noting that in this case . Furthermore, a trajectory of the closed dynamical system on the reduced statespace need not (and may not) have a counterpart in the original statespace by virtue of the way in which “” is used in our formulation of the reduction map and corresponding reduced dynamical system.
One will recognize that the solution to (6) is just the Moore-Penrose pseudoinverse . Inserting this solution into the feature map representation of a kernel gives the following definition for :
Substituting into the derivative for a polynomial kernel gives
which immediately gives an expression for . Note that this approximation is global in the sense that the matrix inverse need only be computed once; no updating is required during simulation of the closed system.
Iv-C Reduced System Dynamics
Given an estimate of in the RKHS and a notion of from above, we can write down a closed dynamical system on the reduced statespace. We have
where is a matrix with the vectors as its rows, and is the Jacobian of the empirical feature map defined in Equation (4). Here the expression should be interpreted as notation for either of the Jacobian approximations suggested in Section IV-B.
Equation (IV-C) is seen to give a closed nonlinear control system expressed solely in terms of the reduced variable :
where the map modeling the output function is estimated as described immediately below. Although the “true” reduced system does not actually exist due to non-injectivity of the feature map , in many situations one can expect that the above system will capture the essential input-output behavior of the original system. We leave a precise analysis of the error in the approximations appearing in (IV-C) to future work.
Iv-D Outputs of the Reduced System
Analogous to the case of the dynamics , we are faced with two possibilities for approximating . We can apply the Taylor approximation , or as in Section IV-A we can estimate a map from the reduced state space to the output space directly, using RKHS methods. Given samples , each coordinate function is given in the familiar form , where is the kernel chosen to define the RKHS, and may be different for each coordinate. It should be noted that just given the state space reduction map , one can immediately compare the output of the system defined by to the original system without defining a closed dynamics as above. In fact with and one can design a simpler controller which takes as input the reduced state variable , but controls the original system.
We demonstrate an application of our method to a 7-dimensional nonlinear system with one dimensional input and output appearing in  (Example 3.2, pg. 54):
Impulse and initial-condition responses of the system were simulated as described above, and 800 samples equally spaced in the time interval were sampled to build the kernel matrices and using the third degree polynomial kernel . Recall that these kernel matrices, are the inner product counterparts to the empirical Gramians. Examples of and for this system are shown in Figure 2. We imposed a small amount of regularization when computing the balancing transformation , taking the Cholesky decomposition of instead of . Figure 1 (right pane) shows the Hankel singular values for this problem on a log scale. One can see that perhaps the first two components ought to capture most of the system’s behavior. Thus the reduction map was defined by taking only the eigenvectors (scaled columns of ) corresponding to the largest two Hankel singular values, giving a reduced state space of dimension two.
Next, a map from the reduced variable to was estimated following Section IV-A. The control input was chosen to be a 10hz square wave, and 1000 samples from the simulated system in the interval were mapped down using and then used to solve the RLS regression problems, one for each state variable, again using a third degree polynomial kernel. All initial conditions were set to zero. The desired outputs (dependent variable examples) used to learn were taken to be the true evaluated at the samples from the simulated state trajectory. We also added a bias dimension of 1’s to the data to account for any offset, and used a fast leave-one-out cross-validation (LOOCV) computation  to select the optimal regularization parameter. Two remarks are in order. The above dynamics can in fact be represented explicitly and exactly in a 3rd degree polynomial RKHS; only monomials up to degree 3 appear in the dynamics. Second, the control input is decoupled from the state. Both of these facts can be used to obtain an improved reduced model, however we did not make use of these special properties and instead applied the simplest version of the techniques described above which assume no special structure.
We followed a similar process to learn the output function . Here we used a 10Hz square wave control input, zero initial conditions and 700 samples in the interval . For this function the Gaussian kernel was used to demonstrate that our method does not rely on any particular match between the form of the dynamics and the type of kernel. The scale hyperparameter was chosen to be the average distance between the training examples. We again used LOOCV to select the RLS regularization parameter.
Finally, the closed system was simulated as described above using and a control input different from those used to learn the dynamics and output functions: where denotes the square wave function. This input is shown at the top of Figure 1 (left panel). The Taylor series approximation for was done once about and was not updated further. The simulated outputs of the closed reduced system as well as the output of the original system are plotted at the bottom in Figure 1 (left panel). One can see that, even for a significantly different input, the two dimensional reduced system closely captures the original system. The main source of error is seen to be over- and under-shoot near the square wave transients. This error can be further reduced by simulating the system for different sorts of inputs (and/or frequencies) and including the collected samples in the training sets used to learn and . Indeed, we have had some success driving example systems with random uniform input in some cases.
In this paper we introduced a new model reduction method for nonlinear control systems. The method assumes that the nonlinear system is approximately linear in a high dimensional feature space, and carries out linear balanced truncation in that space. This leads to a nonlinear reduction map, which we suggest can be combined with representations of the dynamics and output functions by elements of an RKHS to give a closed reduced order dynamical system which captures the input-output characteristics of the original system. We then demonstrated an application of our technique to a 7-dimensional system and simulated the original and reduced models for comparison, showing that the approach proposed here can yield good low-order nonlinear reductions of strongly nonlinear control systems. We believe that techniques well known to the machine learning and statistics communities can offer much to control and dynamical systems research, and many further directions remain, including reduction of unstable systems and structure preserving systems.
-  Antoulas, A. C. (2005). Approximation of Large-Scale Dynamical Systems, SIAM Publications.
-  Aronszajn, N. (1950). Theory of Reproducing Kernels. Trans. Amer. Math. Soc., 68:337-404.
-  Coifman, R. R., I. G. Kevrekidis, S. Lafon, M. Maggioni and B. Nadler (2008). Diffusion Maps, Reduction Coordinates, and Low Dimensional Representation of Stochastic Systems, Multiscale Model. Simul., 7(2):842-864.
-  Dullerud, G. E., and F. Paganini (2000). A Course in Robust Control Theory: a Convex Approach, Springer.
-  Fujimoto, K. and D. Tsubakino (2008). Computation of nonlinear balanced realization and model reduction based on Taylor series expansion, Systems and Control Letters, 57, 4, pp. 283-289.
-  Gray, W. S. and E. I. Verriest (2006). Algebraically Defined Gramians for Nonlinear Systems, Proc. of the 45th IEEE CDC.
-  Krener, A. (2006). Model reduction for linear and nonlinear control systems. The Bode Lecture in the 45th IEEE CDC.
-  Krener, A. J. (2007). The Important State Coordinates of a Nonlinear System. In “Advances in control theory and applications”, C. Bonivento, A. Isidori, L. Marconi, C. Rossi, editors, pages 161-170. Springer.
-  Krener, A. J. (2008). Reduced order modeling of nonlinear control systems. In “Analysis and Design of Nonlinear Control Systems”, A. Astolfi and L. Marconi, editors, pages 41-62. Springer.
-  Kwok, J. T. and I.W. Tsang (2003). “The Pre-Image Problem in Kernel Methods”. In Proceedings of the Twentieth International Conference on Machine Learning (ICML).
-  Lall, S., J. Marsden, and S. Glavaski (2002). A subspace approach to balanced truncation for model reduction of nonlinear control systems, International Journal on Robust and Nonlinear Control, 12, 5, pp. 519-535.
-  Laub, A.J. (1980). On Computing “balancing” transformations, Proc. of the 1980 Joint Automatic Control Conference (ACC).
-  Li, J.-R. (2000). Model Reduction of Large Linear Systems via Low Rank System Grammians. Ph.D. thesis, Massachusetts Institute of Technology.
-  Mika, S., B. Schölkopf, A. Smola, K. R. Müller, M. Scholz, and G. Rätsch (1998). Kernel PCA and de-noising in feature spaces, In Proc. Advances in Neural Information Processing Systems (NIPS) 11, pp. 536–542, MIT Press.
-  Moore, B. (1981). Principal Component Analysis in Linear Systems: Controllability, Observability, and Model Reduction, IEEE Tran. Automat. Control, 26, 1, pp. 17-32.
-  Newman, A.J., and P. S. Krishnaprasad (2000). Computing balanced realizations for nonlinear systems, Proc. of the Math. Theory of Networks and Systems (MTNS).
-  Nilsson, O. (2009). On Modeling and Nonlinear Model Reduction in Automotive Systems, Ph.D. thesis, Lund University.
-  Phillips, J., J. Afonso, A. Oliveira and L. M. Silveira (2003). Analog Macromodeling using Kernel Methods. In Proceedings of the IEEE/ACM International Conference on Computer-aided Design.
-  Rifkin, R., and R.A. Lippert. Notes on Regularized Least-Squares, CBCL Paper 268/AI Technical Report 2007-019, Massachusetts Institute of Technology, Cambridge, MA, May, 2007.
-  Scherpen, J. (1994). Balancing for Nonlinear Systems, Ph.D. thesis, University of Twente.
-  Schölkopf, B., Smola, A., and Müller, K (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319.
-  Wahba, G. (1990). Spline Models for Observational Data, SIAM CBMS-NSF Regional Conference Series in Applied Mathematics 59.