An important and routine task in scientific research is to determine parameters in a given mathematical model. Whether the model are built from direct empirical observations, or constructed from an underlying dynamic that is much more comprehensive, model error is inevitable in general. Even in the context of first-principle approaches, e.g., quantum-mechanical descriptions [3, 4], they are rarely implemented in its full form. Instead, various reductions are introduced so that the models fit into practical computations, e.g., tight-binding methods  and pseudo-potentials , which necessarily introduce model error, and the parameters have to be determined in the presence of the model error.
We have in mind a class of stochastic models, whose parameters can be classified as follows. There is a subspace of parameters directly responsible to the equilibrium statistics. This subspace of the parameters can be determined by many standard statistical methods. Once this subspace is determined, the full model parameters can be estimated by fitting the quotient parameter space to the non-equilibrium properties of the dynamics. In this paper, we consider fitting the dynamical behavior through an impulse/response approach by ‘poking’ the system with a small external force. This approach is known as the linear response in statistical physics as the first step to study non-equilibrium phenomena. A hallmark in linear response theory is the Fluctuation-Dissipation Theory (FDT). Basically, it states that for a dynamical system at equilibrium, the full response to small external perturbations can be characterized by a first-order linear response through only the information of appropriate time correlation functions of theunperturbed dynamical system. Motivated by the accessibility of FDT, we have developed a parameter estimation approach  as well as an efficient numerical scheme  under the perfect model assumption. By inferring the estimates from the linear response statistics, we are able to reproduce both the equilibrium statistics and the response properties under small perturbations.
This paper focuses primarily on the response-based method in the presence of model error. In general, the model error may be a result of an empirical assumption, truncation, coarse-graining, multiscale expansion, discounting the memory effect, etc. In practice, the highly desirable parameter estimation methods are those that are not aware of these effects. This is particularly relevant when the only available information is a data set of some observables of interest, which poses several new challenges for the parameter estimation methods. First, the external forcing, when applied to the underlying dynamics and then carried over to the imperfect model, may depend on variables other than the observed ones. We clarify this issue by introducing a concept called admissible external forcing. Another emerging issue is that the linear response involves a variable conjugate to the external force  (also interpreted as a flux variable), which, in general, is only accessible when the statistics of the full model is known. We circumvent this difficulty by introducing the marginal linear response statistics that are practically computable given the available data and parametric/nonparametric modeling.
These ideas are best explained with concrete examples. Our first example is motivated by an important problem in computational chemistry, i.e., coarse-graining (CG) molecular dynamics (MD). The needs for CGMD is driven by the observation that full molecular models are limited by the small intrinsic time scale and it is difficult for direct simulations to reach the time scale of interest [9, 10, 11], where important biological processes occur . CGMD circumvents this problem by projecting the dynamics to the CG variables, a much smaller set, often defined by the local averages of the atomic coordinates and momentum [13, 14, 15]
. The interactions among the CG variables are represented via the free energy (a.k.a. the potential of mean force (PMF)), which in turn provides an explicit form of the probability density function. Statistical methods have played a central role in determining the parameters in the PMF. Within our discussions, these parameters will be called equilibrium parameters.
Meanwhile, there are parameters in the CG dynamics that do not appear in the PMF, a well known example of which is the damping coefficient in the Langevin equation, and more generally, the friction kernel in the generalized Langevin equations. In the latter case, parameters have to be determined by following the statistical properties of the trajectories of the CG variables. Various techniques have been considered, including the Bayesian approach , relative entropy 
, as well as the Padé-type of algorithms that match the time correlation properties[19, 20, 21]. The application of our response-based approach for CGMD models has been motivated by the steered MD simulations [22, 23], where forces are applied to proteins to facilitate the folding and unfolding processes. With a one-dimensional chain model as a simple test problem, we demonstrate how the response-based approach is formulated and implemented to determine the damping coefficient, which is assumed to be a more general band matrix.
Another important class of problems is PDEs with solutions that exhibit chaotic behavior. One well known type of examples is turbulence. These dynamics are often projected to Fourier modes to study the energy transfer among different modes. From a reduced-modeling viewpoint, an outstanding challenge is the model closure that retainis a finite number of modes. Part of the challenge stems from the fact that the equilibrium statistics of the full model is high-dimensional and unknown. We will consider the Kuramoto-Sivashinsky (KS) equation as such an example, and demonstrate how to circumvent this difficulty with a semi-parametric method using the kernel mean embedding (KME) method.
In principle, the linear response theory is a prediction tool by itself, and the imperfect models would seem to be of little value in this case. However, as we previously alluded to, the conjugate variables might not be computable. The semi-parametric approach addresses precisely this issue. In addition, we demonstrate with examples that our approach yields models that also predict the full response with good accuracy.
The paper is organized as follows: We start with a short review of the linear response theory and the parameter estimation approach based on linear response in Section 2. To motivate a general parameter estimation scheme in the presence of model error, a linear fast-slow model will be presented in Section 3 for which explicit calculations can be carried out. In Section 4, we elaborate the general difficulties and outline a parameter estimation scheme to address those issues. The numerical scheme will be followed by two examples: a 1-D chain model (Section 5) and the KS equation (Section 6). We will close by a short summary and discussions in Section 7.
2 The parameter estimation method via linear response statistics
The FDT is a mathematical framework for quantifying the linear response of a dynamical system subjected to small external forcing.  The linear response statistics, determined based on two-point equilibrium statistics of the unperturbed dynamics, provide estimates for the non-equilibrium properties. In statistical mechanics literature, FDT is known as the linear response approach, which is the foundation for defining transport coefficients, e.g., viscosity, diffusion constant, heat conductivity etc.
We begin by reviewing the linear response theory and the concept of the essential statistics, which is a core component of the response-based estimation method . This will be presented in the context of a stochastic dynamics, expressed in the general form of an -dimensional (time-homogeneous) stochastic differential equations (SDEs), also known as the Itô diffusion . The SDEs, together with its perturbed dynamics, are written as follows,
respectively, where and are standard Wiener processes. In the unperturbed system (1
), the vector fielddenotes the drift and
is the diffusion tensor; while in the perturbed system (2), an order- () external perturbation is introduced, in the form of . In both equations, denotes the model parameters with parameter domain .
Throughout this section, which concerns with the perfect model case, our goal is estimating the underlying true parameter value based on the time series of the unperturbed dynamics (1) at the equilibrium. In the case of imperfect model, which is the central problem concerned in this paper, there are no true parameter values since the parameters in the imperfect model are not consistent with the parameters in the true model. Before we introduce the concept of the response statistics, we pose the following assumptions for all in :
The system governed by Eq. (1) is ergodic with equilibrium density , that is, is the unique steady-state solution of the corresponding Fokker-Planck equation. In particular, we denote .
The statistical properties associated with Eq. (2) can be characterized by a perturbed density , which follows the corresponding Fokker-Planck equation under initial condition .
There is an abuse of notation since for certain SDEs, can depend on a subspace of the parameter. We use the notation to emphasis that the corresponding statistics are computed from the sample generated by the model at parameter value at equilibrium.
For any integrable observable , one can define the difference
as the full response statistics. The linear response theory allows us to estimate the order- term of (3) by a convolution integral, that is,
The FDT formulates the linear response operator, in (4), as the following two-point statistics:
where and denote the components of and , respectively. The variable will be called the conjugate variable to .
The significance of FDT is that the response operator is defined without involving the perturbed density . Rather, it can be evaluated at equilibrium of the unperturbed dynamics. To turn this into a practical tool, we first notice that for a given , the value of can be computed using a Monte-Carlo sum based on the time series of the unperturbed system (1) at . For example, let be the time series generated at with step length , then for , the Monte-Carlo approximation can be written as
In practice, the computation of (6) can be done more efficiently using the block averaging algorithm . Meanwhile, the same response statistics can be computed for any by solving (1) over the corresponding parameter value. In particular,
can still be approximated by a Monte-Carlo sum (6) using the time series generated at .
Motivated by the accessibility of as well as the finite-dimensional Padé approximation introduced in , we have proposed to infer the true parameter value from a discretization of , , named as essential statistics[1, 2]. The estimate solves the following nonlinear least-squares problem:
To ensure the solvability of (7), we assume that the total number of involved essential statistics is always strictly greater than the dimension of , that is, .
The accessibility of also allows us to use its time derivatives as the essential statistics to infer . For example, under specific external forcing and observable , the first-order time derivative of at of a Langevin dynamics model provides direct estimates for the damping coefficients. We will return to this in Section 5. But it is worthwhile to mention that estimating higher order derivatives of requires time series with sufficient accuracy, which is not necessarily available, especially when the data are from experimental observations.
It is important to point out that, in general, the quantity in (5) depends on the unknown parameter , which provides difficulties in computing given only a time series of at . Qi and Majda addressed this issue by computing the ‘kicked response’ based on direct simulation of the underlying dynamics [27, 28, 29]. However, since the underlying dynamics is unknown in our scenario, we will consider both parametric and nonparametric estimators for a marginal invariant density defined on the resolved (and identifiable unresolved) variables, in place of the high-dimensional unknown equilibrium density , in Sections 5 and 6, respectively.
One practical challenge in solving (7) is that, except for very special cases, there is no explicit expressions for . Therefore an iterative method, e.g., the Gauss-Newton method, would necessarily require solving the model (1) repeatedly in a sequential manner, which can be rather computationally demanding. This issue has been mitigated by a polynomial based surrogate model, where the cost functions in (7) are approximated by linear combinations of orthogonal polynomials, denoted as , based on the precomputed values over a set of grid points , and such pre-computation can be proceed in parallel. Here, stands for the order of the polynomials involved. In the subsequently approximations, we will replace in (7) by , and formulate a new least-squares problem for the order- polynomial based surrogate model.
So far, we have reviewed a parameter estimation approach[1, 2] for Itô diffusions (1) as well as an efficient numerical scheme under the perfect model assumption. By inferring from the linear response statistics, our estimates are able to reproduce both the equilibrium statistics and the response under certain perturbation. A much more difficult problem, however, is when the available model is a truncated dynamics or only partially known. In this case, several interesting issues will emerge, which will be investigated in the remaining part of the paper. In the next section, we start with a simple two-scale system to demonstrate the importance of the response statistics in determining the model parameters.
3 A linear fast-slow model for insight
Consider the following 2-D linear fast-slow dynamics as the underlying model
where and are the slow and the fast variables, respectively. Here, and are two independent Wiener processes and characterizes the time-scale gap. Such multi-scale Ornstein-Uhlenbeck process has been used as a linear test model in a variety of situations . For the sake of non-degeneracy and ergodicity, we assume that and
which are enough to ensure the drift coefficients
to be negative definite as well as the existence of the averaged (effective) dynamics 
We will keep the form of (10) by posing
as our imperfect model of the fast-slow dynamics with as the unknown parameters to be determined. Applying the parameter estimation scheme reviewed in Section 2 to by taking observable and a constant external forcing , we find that (see the proof of Theorem 1 in our previous work for computational details)
is the equilibrium variance of. In contrast, is the variance that corresponds to the ‘true parameter’ value , when the underlying dynamics is exactly of the form (11) with . Fitting the essential statistics () leads to
In sharp contrast to the perfect model situation, in our case, the partial observations are from an underlying 2-D dynamics (8)-(9) while the essential statistics in (12) are derived based on an 1-D imperfect model (11). Thus, the left-hand side of (12), legitimately, is not available, and will be replaced by the two-point statistics of the slow variable in (8). Eq. (12) now becomes,
where the covariance matrix is determined by the stationary Lyapunov equation 
One can interpret Eq. (13) as the consistency between the two time auto-covariance functions of the slow variable and the variable in the imperfect model. Obviously, seeking an exact fit for (13) is impossible since its left-hand side has a total of degrees of freedom ( and ), which is much greater than that of the right-hand side. However, taking advantage of the multi-scale structure of the underlying dynamics, we can still find estimates for when such that (13) is fulfilled up to an error of certain order. To see this, we first simplify the matrix exponential in (13
) by computing the eigenvalues of,
where the is the same as in Eq. (10). Assuming that the two eigenvalues are real, via a direct asymptotic expansions with respect to , one can show that
Notice that due to our assumption, and we are able to truncate the term containing as long as is large enough. Thus, up to a negligible error (13) becomes
which can be achieved by taking
as our estimates. The same estimates have been obtained based on the Riccati equation.
Compared with the averaged dynamics (10), our estimates (15), based on matching the linear response statistics, exhaust the capability of the imperfect model (11) in the sense that the two-point statistics () of the slow variable can be reproduced up to any order of as . As a test, we set , , and in (8) and (9). Figure 1 shows the absolute error in reproducing the two-point statistics under two different choices of , in comparison with various estimation methods of the imperfect model. For small , the estimate in (15) (which we called order-) produces the optimal result. However, when is relatively large, we are no longer allowed to truncate the term in the simplification of (13), which in turn invalidates the estimate in (15) as shown in the second panel of Figure 1. However, the solution of (13) in the least-squares sense is accurate and this justifies the importance of solving the nonlinear least-squares problem in (7) for parameter estimation in the presence of model error.
To summarize, we have elucidated the importance of the proposed parameter estimation scheme in the presence of model error on an idealistic problem. In particular, we have showed that the linear response-based parameter estimation scheme produces accurate linear mean response statistics, outperforming the conventional method, the averaging theory, even in a regime when such theory is valid.
4 Parameter estimation in the presence of model error
Inspired by the result in Section 3, we consider extending the response-based parameter estimation scheme in the presence of model error arises from coarse-graining or missing dynamics. In current section, we provide a formal discussion on the key issues in such a difficult scenario and propose a strategy to address them.
To elaborate the general ideas, let us use the abstract notations
to denote the underlying full dynamics with partial observation of the variable of interest, . Here, denotes a projection operator to the observables that is not necessarily known. We assume that is unknown, instead, the available imperfect model is given as follows,
Here, we have included an additional variable as possible auxiliary variables and are the unknown model parameters. Our goal is to estimate the parameter in (17) using the time series of observed from (16) such that the resulting model in (17) can predict the full nonlinear response of the underlying dynamics (16) under admissible external forcing, as explained below. While the choice of model in (17) is central to the accuracy of the prediction, our focus is not on the specification of this model. Instead, we assume that the model in (17) is given and our focus, again, is to generalize the proposed parameter estimation scheme reviewed in Section 2 on this scenario.
In this case, the linear response operator (5) associated with the underlying dynamics (16) is no longer accessible since, in general, it is a two-point statistic that can be computed only if the full data set of is available. Thus, to infer the parameter in (17) from the linear response statistics of the underlying dynamics (16), we need to answer the following two questions:
How to find an alternative for the essential statistics in (7) which is computable given only the time series of ?
To address these issues, we introduce the concepts of (i) admissible external forcing, and (ii) marginal linear response (MLR). Using the notations in (16), similar to (1) and (2), the perturbed underlying dynamics can be written in the form
where denotes the Jacobian matrix of . Thus, the term can be interpreted as the external force on the dynamics of . In the case when the full dynamics (16) is a system of SDEs with an external force in the drift term, a similar observation can be made by using the Itô’s formula. To this end, we define:
An external forcing to the full dynamics in (18) is admissible with respect to the dynamics if can be written as a function of denoted as .
Basically, an admissible external forcing of the underlying dynamics introduces a perturbation to the imperfect model (17),
Intuitively, this condition rules out other external forces that depend on , which is not accessible in our scenario. To simplify the discussion, let us define as the components of the parameters that directly appear in the equilibrium density of the imperfect model (17); they usually can be estimated directly by matching equilibrium statistics. For example, in the linear problem in (11), while the model parameters consist of the damping coefficient and noise amplitude , the parameter that appears in the equilibrium density, , is , which can be directly obtained by matching the equilibrium variance.
where the expectation over will be realized through Monte-Carlo averages with respect to the solutions of the unperturbed imperfect model in (17) for parameter that satisfies the constrained of .
Let be the marginal invariant density of , which can be estimated from the given data of at the equilibrium. We define the MLR operator,
We call the marginal essential statistics. Furthermore, we define,
as the MLR statistics.
The two-point statistics in (21), defined with respect to the equilibrium distribution of the underlying dynamics (16), can be approximated via Monte-Carlo, like (6), based on the available time series of . We require the external forcing to be admissible; otherwise, in (21) would depend on and the marginal essential statistics are not accessible when the data of is not available. In practice, as we will show in the next two sections, the marginal density can be learned from the available time series of by using standard statistical methods. This way, the two-point statistics in (21) is computable as opposed to (6).
where is the marginal essential statistics (21) and is the linear response operator (20) of the imperfect model (17). This is a dynamic constrained nonlinear least-squares problem through the imperfect model in (17). We should point out the minimization is over a subspace of the parameter set, characterized by the constraint, , where is the estimate for the equilibrium parameters, , obtained from direct matching to the available equilibrium statistics.
We validate the estimate by comparing the nonlinear response of the imperfect model at the estimated parameters with that of the underlying model under admissible external forcing. These statistics are computed via Monte-Carlo averages over solutions of the perturbed underlying dynamics (18) and perturbed imperfect model (19), where the former is only possible when the full underlying dynamics is known. See Appendix A for the detailed computational strategy to compute these statistics for stochastic model.
While the choice of imperfect model in (17) depends on the specific applications, in Sections 5 and 6, we will consider Langevin dynamics, regardless of whether the underlying dynamics are Langevin or not. In the molecular dynamics example (Section 5), the imperfect Langevin model, as a result of coarse-graining, is a closed equation of , with no dependent on any auxilliary variables . In this case, the associated equilibrium density has a natural parametric form, , given by the Gibbs measure. Thus, we simply use the direct estimates together with maximum entropy principle to estimate .
In the PDE example (Section 6), choosing Langevin dynamic as the imperfect model is motivated largely by the separable structure of its equilibrium distribution, , where denotes the (possibly) non-Gaussian density of the variable of interest, while denotes the Gaussian auxiliary components. In this case, can be estimated directly from samples of and we will use a nonparametric kernel mean embedding (KME) method to recover from the data of .
For the molecular dynamics example (Section 5), we consider since is the natural equilibrium density obtained through coarse-graining. We should point out that choosing to depend only on in (21) is only sensible (in the linear response theory) if is independent to the orthogonal component, , at equilibrium state, in that . In this case, we have
which suggests that the MLR operator is identical to the linear response operator of the underlying dynamics, . In general, however, the distributions of and are not independent and fitting to the marginal statistics defined through may not be the optimal strategy. In the PDE example (Section 6), the variable of interest is the first Fourier mode, , obtained from a discrete Fourier transform of the PDE solution. In addition to
, obtained from a discrete Fourier transform of the PDE solution. In addition to, we are allowed to have access to the time series of the second Fourier mode , as well as the dynamical equation of the only variable . Given these information, we can extract the identifiable irrelevant component, , by regressing to the dynamical equation for . In this case, we will employ the nonparametric KME formulation to estimate . We will show that the resulting MLR statistics, computed via the formula in (22), which carried some information of through and , is an accurate estimator of the full nonlinear response statistics of the underlying dynamics under admissible external disturbances up to finite time.
5 A one-dimensional molecular model
We consider a molecular dynamics model consisting of a chain of atoms with mass . Let the equilibrium spacing between the particles be , and the displacement of the particles from its equilibrium position be . We take the Lennard-Jones (LJ) potential, expressed as
5.1 The underlying dynamics
Our underlying dynamics, as a finite-dimensional approximation of the LJ lattice model, contains a total of particles equipped with periodic boundary conditions. Let and be the displacement and the velocities of the particles, respectively. The potential energy of the finite system, as a function of the relative displacement , can be written as,
Driven by the potential energy , the frictional, and random forcing, obeys the Langevin equation of motion,
where the mass has been set to unity (), and is an N-dimensional Wiener process; and denote the friction constant and the temperature, respectively. The Langevin dynamics is a means to model canonical ensemble . Also known as the Gibbs measure of (26
), the probability distribution of the relative displacementand the velocity , is of the form
under the constraint induced by the periodic boundary condition.
In the numerical simulation, we set and in (26). A Verlet-type of integration algorithm  is applied with sub-sampled at the equilibrium state with sample size and step length . The temperature is set to be high enough, so that at the equilibrium, the marginal density of is non-Gaussian. Throughout the section, the underlying dynamics (26) will only be used for generating CG observations and computing the full response for verification of the parameter estimates. We do not assume the explicit formula of the Gibbs measure (27) is available.
5.2 The coarse-grained model
We first define CG variables by dividing the system equally to consecutive blocks, each of which contains atoms. The displacement and velocities of the CG particles can be defined as
respectively. Here, denotes the number of CG particles. Formally, the CG Langevin equation of the underlying dynamics (26) can be written as
where , to distinguish from in (26), is an -dimensional Wiener process, the PMF , as a function of the CG relatively displacement , is of the form
with unknown parameters . In other words, the CG particles are assumed to follow a nearest-neighbor interaction driven by a quartic potential.
On the other hand, rather than assuming a scalar damping coefficient like the underlying dynamics (26), we propose a symmetric positive definite (SPD) damping matrix in the reduced Langevin equation (29), while in the diffusion term of (29) satisfies due to the fluctuation-dissipation theorem. In practice, such can be determined by a Cholesky decomposition of . Motivated by the symmetry of the system, the damping matrix is assumed to have a symmetric circulant structure:
which can be fully determined by the first elements in the first row. To guarantee that is positive definite, we pose a diagonal dominant constraint:
There are many other approaches in the model reduction of Langevin dynamics, e.g. the partitioning technique , the Petrov-Galerkin projection , and the maximum likelihood estimator . Here, we consider the problem in the presence of model error, and use the one-dimensional system to demonstrate the parameter estimation procedure.
5.3 Parameter estimation and numerical results
So far, we have introduced our underlying dynamics (26) and the corresponding CG model (29) with unknown parameters (32). In the numerical experiment, the observation of the CG model (29) will be acquired from Eq. (28) based on the time series of generated by the underlying dynamics. For instance, the CG relatively displacement can be determined by
Here, Eq. (33), together with the second equation of (28) correspond to the function in our general setup (16) with the variable of interest . Thus, we are able to obtain the time series of , containing model error, for the imperfect model (29), which will be the only information used in estimating .
Parameters , ) appeared in the equilibrium statistics. Therefore they constitute the parameter set that we discussion in Section 4. To determine these parameters, consider the Gibbs measure of the imperfect model (29) (following the notation in (20)),
which suggests that the temperature can be directly estimated from the sample variance of the CG velocities. In practice, since are independent identical distributed at the equilibrium, we take
For the parameters in the potential (30), we will apply the maximum entropy method[37, 38]. The CG relative displacements, due to the translational symmetry, are assumed to be independent and identically distributed at the equilibrium. In particular, the marginal distribution of with respect to (34) is proposed to be of the form,
. By maximizing the Shannon’s entropy under the moment constraints, we formulate the following maximum entropy estimates for,
where is determined by (35) and the moments on the right hand side of (37) will be approximated using Monte-Carlo based on the available observations of . Thus, such constrained minimization problem (36) and (37) is equivalent to the reverse Monte Carlo method . Figure 2 shows the comparison between and the histogram of , and excellent agreement has been found.
Up to this point, we have estimated in the Gibbs measure (34) of the imperfect model (34). We are going to apply our parameter estimation approach to estimate in the damping matrix . These parameters are responsible for the non-equilibrium property. To satisfy the assumptions posed in Section 4, the external forcing of the underlying dynamics is set to be a constant external forcing, exerted equally on the first block of atoms, and the observable are the velocities of the first two CG particles. As a result, following (19), the corresponding external forcing for the imperfect model satisfies
where is the estimates of in (35). Recall that, as we have mentioned in Remark 4.5, we set in computing the marginal essential statistics (21). Interested readers are referred to  for the computational details. Droping in (39), we formulate the following constrained nonlinear least-squares problem:
Recall that both and will be approximated by Monte-Carlo of the form (6). The only difference is that is based on the available CG observations, while