1. Introduction
Nonlinear dynamic models are widely used for characterizing functional forms of processes that govern complex biological pathway systems. Of particular interest in this context are socalled canonical formats, which are very flexible in their possible responses, yet involve a very restricted domain of functional forms. Outside linear systems, the bestknown canonical formats are LotkaVolterra (LV) models (Lotka (1956), Volterra (1926), Peschel & Mende (1986), and May (2001)), which use binomial terms, and powerlaw systems within the framework of Biochemical Systems Theory (BST), which exclusively use products of power functions. BST was originally devised for the analysis of biochemical and gene regulatory systems, but has subsequently found much wider application in various biomedical and other areas (Savageau (1976) and Voit (2013)
). Whereas it is easy to set up an LV or BST model for a complex biological system in a symbolic format, the identification of optimal parameter values continues to be a significant challenge. As a consequence, estimating parameters of systems of ordinary differential equations (ODEs) remains to be an active research area that attracts contributions from a variety of scientific fields (e.g.,
Gennemark & Wedelin (2007), Chou & Voit (2009), Girolami & Calderhead (2011), McGoff et al. (2015), Ramsay & Hooker (2017), and Schittkowski (2002)). Indeed, numerous optimization methods for ODE models have been proposed in recent years, but none works exceptionally well throughout wide ranges of application.The main focus of this paper is not a new estimation method per se
. Instead, we are interested in a more general and highlevel point of view regarding parameter estimation. Specifically, our work addresses parameter estimation for dynamic models whose mathematical form contains significant linear features, that allow a natural separation of parameters and system states. A trivial example is a linear ODE where the vector field
is linear in the parameter , with denoting the derivative of with respect to . As a more interesting example, the ODE vector field may be partially linear in the parameters, as it is the case for socalled Ssystem models in BST.Example 1.
An Ssystem (see Voit (2000)) is defined as
(1) 
Here are positive rate constants, while are realvalued kinetic orders that reflect the strength and directionality of the effect that a variable has on a given influx or efflux. Informally, one can view this system as a regression equation, where the “covariates” are the variables
on the righthand side, whereas the “response” variables are the derivatives
on the lefthand side. Note that the vector field is linear in the rate constants , but nonlinear in the kinetic orders . ∎Estimation methods that exploit separability of parameters and system states in dynamic models have a long history; see, e.g., Himmelblau et al. (1967) for a special case. However, a rigorous statistical analysis of one such method has been achieved only recently (Dattner & Klaassen (2015)). In a classical paper on the inference for dynamic models, Varah (1982) mentions in passing that “one can use the idea of separability or variable projection (see Golub & Pereyra (1973) or Ruhe & Wedin (1980)), in which the linear parameters are implicitly solved for, the resulting (fully) nonlinear least squares problem is solved for the nonlinear parameters, and then the linear parameters are obtained using their representation in terms of the nonlinear parameters. Since this reduces the size of the nonlinear least squares problem to be solved, it is worthwhile.” Somewhat surprisingly, given that parameter estimation for ODEs is commonly thought as a computational bottleneck in modeling dynamic processes, Varah’s suggestion has not been widely followed in practice. In fact, in the vast literature dedicated to parameter fitting techniques for dynamic models, we are aware of only two relevant references: Dattner et al. (2017), using the direct integral approach, applied the separable nonlinear leastsquares to the inference of parameters in a predator–prey system acting in a heterogeneous environment, while Wu et al. (2019) used separability to estimate parameters of highdimensional linear ODE systems. Moreover, Varah’s idea of exploiting separability for estimation of ODE parameters has been implemented only recently in a publicly available software package (Yaari & Dattner (2019)).
The analysis in this paper is hoped to convince the reader that Varah’s idea is indeed worth pursuing. Specifically, we explore and compare two general data fitting approaches for dynamic models: the traditional nonlinear leastsquares method (NLS), and a proposed separable nonlinear leastsquares method (SLS). Through extensive MonteCarlo simulations of representative complex models, arising in different scientific fields, we identify and quantify significant statistical and computational gains when using this separation method. We will ultimately come to the conclusion that model separability can be very beneficial and that the SLS approach should be considered for complex dynamic systems with significant linear features.
The paper is organized as follows. In Section 2 we present details of the SLS methodology in the context of dynamic models. Section 3 describes the simulation setup, quantifies the statistical measures we use in order to compare the performance of SLS and NLS, and presents numerical results, while conclusions are provided in Section 4.
2. Separable nonlinear leastsquares (SLS) and Varah’s idea
2.1. Generalities
Following Varah’s original idea within the context of inference in dynamic models, the main advantages of exploiting separability for parameter estimation are the following (Golub & Pereyra (2003)):

fewer initial guesses are required for optimization,

the optimization problem is better conditioned, and

convergence is faster.
These advantages have been convincingly demonstrated in several publications. For example, see Mullen (2008) for an implementation and applications in physics and chemistry; Chung & Nagy (2010) for a highdimensional case, where the number of parameters is substantially larger than the number of observations; Gan et al. (2018), who compared the performance of several algorithms for SLS problems; and Erichson et al. (2018)
, who studied sparse principal component analysis via variable projection. Separable models are of broad practical applicability, and as such form a subject of active theoretical and applied research. For instance, when analyzing the “reduced” nonlinear optimization problem of a separable structure, simplified conditions are required for establishing a variety of theoretical results concerning numerical and statistical properties of the resulting estimators, compared to the original NLS problem (e.g.,
Basu & Bresler (2000) and Dattner & Klaassen (2015)).In the following we focus on complex dynamic models and, specifically, consider a system of ordinary differential equations given by
(2) 
where takes values in and Let
(3) 
where , and the symbol stands for the matrix transpose. Here , a vector of size , stands for the “nonlinear” parameters that are not separable from the state variables , while , a vector of size , contains the “linear” parameters; note that . As the vector field in (3) is separable in the linear parameter vector , we refer to the corresponding ODE system as linear in the parameter
(cf. the case of a linear regression model), although the
solution to the system might be highly nonlinear in , or even implicit.2.2. Solution strategy
Let be the solution of the initial value problem (2). We assume that noisy measurements on the system are collected at time points , where
(4) 
Here the random variables
are unobservable, independent measurement errors (not necessarily Gaussian) with zero mean and finite variance.
Varah’s approach to parameter estimation in ODE models works as follows. Let stand for a smoother of the data, obtained, e.g., using splines or local polynomials (see, e.g., Fan & Gijbels (1996), Green & Silverman (1994), and Wasserman (2006) for a treatment of various smoothing methods and an extensive bibliography). This smoother approximates the solution to the ODE (2). Varah suggests to insert the smoother into equation (2), which will now be satisfied only approximately, and to minimize the resulting discrepancy over the parameters and A minimizer is then an estimator of This idea was put on a solid statistical foundation in Brunel (2008) and Gugushvili & Klaassen (2012). Varah’s original approach requires the use of the derivative as an estimator of which is a disadvantage, as it is wellknown that estimating derivatives from noisy and sparse data may be rather inaccurate: see, e.g., Vilela et al. (2007) and Chou & Voit (2009), or more generally Fan & Gijbels (1996). Recent research (Dattner & Klaassen (2015), Dattner (2015), Vujačić et al. (2015), Chen et al. (2017), Vissing Mikkelsen & Hansen (2017), Dattner et al. (2017), Yaari et al. (2018), Dattner & Gugushvili (2018), Yaari & Dattner (2018), Dattner & Huppert (2018)) has shown that it is more fruitful to transplant Varah’s idea to the solution level of equation (2). The corresponding approach is referred to as integral estimation and proceeds as follows. Define the integral criterion function
(5) 
where is the Euclidean norm. A minimizer of (5) over gives a parameter estimator. In practice, the integral has to be discretized and replaced by a sum, so that minimization can be performed using the nonlinear leastsquares method,
(6) 
The NLS solution does not take into account the specific linear form of the ODEs in (3), but uses the general form in (2).
It is at this stage that Varah suggested to utilize separability, without actually investigating such an approach. However, details are easy to work out. Denote
Then, with kept fixed, a minimizer of (5) is given by
where denotes the identity matrix. The notation and emphasizes the dependence of the solution on the nonlinear parameters . This solution is plugged back into (5), yielding the reduced integral criterion function
(7) 
Once is minimized over and a solution
is obtained, estimators for and follow immediately, and are given (with some abuse of the matrix transpose notation) by
(8)  
respectively. Note that the nonlinear optimization is applied only for estimating the nonlinear parameters , which, in comparison to the NLS approach, can substantially reduce the dimension of the nonlinear optimization problem.
From the above derivation it is clear that SLS problems are a special class of NLS problems, with linear and nonlinear objective functions for different sets of variables. While the idea of using separability for improving parameter estimation was presented already in Lawton & Sylvestre (1971), it seems that much of the subsequent literature is based on the variable projection method proposed by Golub & Pereyra (1973). Golub & Pereyra (2003) reviewed 30 years of research into this problem.
3. Simulation framework and results
In order to investigate the relative performance of SLS and NLS, we designed and performed a large MonteCarlo simulation, whose results are presented in this section.
All the computations were done on an Amazon EC2 m5a.4xlarge instance and were carried out in R using the simode package of Yaari & Dattner (2019), which is designed specifically for using separability properties of ODEs. We note in passing that in the context of nonlinear regression, the variable projection method of Golub & Pereyra (1973) is implemented in R in the nls command; see Venables & Ripley (2002), pp. 218–220 for an example of its application; see also the TIMP package of Mullen & van Stokkum (2007). We used default smoothing and optimization settings in simode, and in that respect both SLS and NLS received equal treatment; in particular, simode uses crossvalidation (see, e.g., Wasserman (2006)) to determine the optimal amount of smoothing. The code to reproduce our numerical results can be accessed on GitHub.^{1}^{1}1See https://github.com/haroldship/complexity2019code/tree/master/Final Code First Submission For plotting, we relied on the ggplot2 package in R, see Wickham (2009).
3.1. MonteCarlo study design
We chose several representative and challenging ODE models arising in a variety of scientific disciplines. These were:

SIR for simulating an infection process;

LotkaVolterra population model with sinusoidal seasonal adjustment;

Generalised Mass Action (GMA) system within BST, e.g., for metabolic pathway systems;

FitzHughNagumo system of axon potentials.
Further mathematical details on these systems and the specific experimental setups we used are given below.
In each case, we generated observations by numerically integrating the system and next adding independent Gaussian noise to the time courses. We considered various parameter setups, sample sizes, and noise levels, as specified below. The ODE parameters were estimated via both NLS and SLS, defined in equations (6) and (8), respectively.
As performance criteria, the time required to perform optimization and accuracy of the resulting parameter estimates were used. While comparing computation times is trivial, numerous options are available for comparing accuracy. We focused on the main difference between the two optimization schemes, namely the way they deal with the estimation of linear parameters. SLS does not require initial guesses for these parameters. By contrast, NLS does require a good initial guess for each linear parameter, or otherwise it might diverge or get stuck in a local minimum: finding “good” solutions to nonlinear optimization problems requires “good” initial guesses in the parameter space. Thus some “prior information” regarding these parameters is of crucial importance for optimization purposes. The key insight is that this prior information is encapsulated in the mathematical form of the ODEs themselves, such as (3). Importantly, while NLS does not take into account the special mathematical features of the ODEs and treats all the parameters in a uniform manner, this is not the case for SLS. Thus, one might a priori expect SLS to be more efficient and possibly more accurate than NLS, when prior information regarding the linear parameters is of low quality. On the other hand, when one has high quality prior information regarding the linear parameters, we expect that SLS and NLS to perform similarly. One might note that the nonlinear parameters in almost all GMA and Ssystems are very tightly bounded, usually between 1 and +2, and that their sign is often known, whereas the linear parameters are unbounded in GMA systems and nonnegative in Ssystems, and nothing is known about their magnitudes (see Chapter 5 of Voit (2000)). Thus, not needing prior information on the linear parameters in SLS can be a tremendous advantage.
For the MonteCarlo study, we varied the prior information by using high, medium and low quality initial guesses for the parameter values. Here higher quality means that the initial guesses were closer to the truth. To be more specific, the initial guesses for the linear parameters used by NLS were Gaussian randoms variables centered on the true parameter values and having standard deviations equal to the true parameter multiplied by a prior information value (thus the prior information value can also be understood as the coefficient of variation of the “prior distribution”). The specific quantification of “high”, “medium” and “low” is admittedly somewhat subjective, and varied across the different ODE models, as specified below. For the sake of better and faster convergence of the optimization algorithms (especially NLS), the nonlinear parameters were constrained to a given range, and this range was the same no matter how we varied the prior information on linear parameters. Further, in each MonteCarlo iteration we used exactly the same (pseudorandom) initial guess for nonlinear parameters for both NLS and SLS. Thus, as far as the information on nonlinear parameters is concerned, this was kept invariant for each benchmark model, irrespective of the prior on linear parameters. Consequently, both algorithms received the same prior information regarding nonlinear parameters, and hence none was treated preferentially.
The noise level (signaltonoise ratio, SNR) we used is defined as follows. For a given solution of an ODE equation, we calculate the standard deviation . Then SNR of, say, and is given by , and , respectively, where is the standard deviation of a Gaussian measurement error as defined in equation (4). We will refer to these SNRs as “low noise” and “high noise”, respectively (cf. Johnstone & Silverman (2005), albeit in a different context). We then compared the mean square errors (MSE) of the resulting parameter estimates, which leads to a valid comparison in statistically identifiable ODE models (see, e.g., Dattner & Klaassen (2015) for relevant definitions and results). As another accuracy measure we used the criteria (5) and (7) evaluated at optimal parameter values. The two criteria we propose, though reasonable, are different. Hence, they are not expected to be in agreement in every experimental setup. However, the global conclusions reached with them in Section 4 are coherent and are in favor of SLS.
We now provide the mathematical details on the models and the experimental setups.
3.1.1. Agegroup SIR
The system is an epidemiological model of an SIRtype (Susceptible – Infected – Recovered), and includes age group and seasonal components. The epidemic in each age group and each season is described using two equations for the proportion of susceptible () and infected () individuals in the population (the proportion of recovered individuals is given by ):
(9) 
The parameters of the model are the transmission matrix , the recovery rate , and , which signify the relative infectivity of, e.g., influenza virus strains circulating in seasons compared to season ( is used as a reference and is fixed at ). As shown in Yaari et al. (2018), taking into account separability characteristics of this model is advantageous for data fitting purposes. Specifically, (9) is nonlinear in the initial values , which are typically unknown and have to be estimated. For our purposes it suffices to consider a model with one age group and one season. The following parameter setup was used: . We considered two sample sizes, and , and two noise levels, and . The prior information used was , corresponding to high, medium and low quality, respectively. The size of the MonteCarlo study was simulations.
3.1.2. LotkaVolterra with seasonal forcing
As another benchmark we considered an extension of a classical predatorprey model, namely the LotkaVolterra model including the seasonal forcing of the predation rate, using two additional parameters that control the amplitude () and phase () of the forcing:
The nonlinear parameters are and . We considered the dynamics within the time interval . The parameter setup is given by
and initial values are . Four experimental scenarios where studied, corresponding to sample sizes of and , and SNRs of and . The prior information values were , corresponding to high, medium and low quality, respectively. The size of the MonteCarlo study was simulations.
3.1.3. GMA system
The GMA system we analyzed consists of three differential equations in three variables (Voit (2000), pp. 84–85). They are:
Here the linear parameters are the rate constants , while the nonlinear ones are the kinetic orders . Note that the parameters are allowed to become negative and their sign might not be known too. We considered the dynamics of the system within the time interval . The parameter setup is the one presented in Voit (2000), namely
and initial values are . Four experimental scenarios were studied: sample sizes of and , and SNRs of and . The prior information values were , corresponding to high, medium and low quality, respectively. The size of the MonteCarlo study was simulations. Parameter estimation for GMA systems is considered to be a challenging numerical task (Voit (2000)).
3.1.4. FitzHughNagumo system
The FitzHughNagumo system (FitzHugh (1961), Nagumo et al. (1962), FitzHugh (1969), FitzHugh (1969)
) models spike potential activity in a neuron. It is given by
(10) 
This system with two state variables was proposed as a simplification of the model presented in Hodgkin & Huxley (1952) for studying and simulating the animal nerve axon. The model is used in neurophysiology as an approximation of the observed spike potential.
The system (10) is linear in parameters and , but nonlinear in . We considered two sample sizes, and , and two SNRs of and . The parameters were set to . The initial values were . The true solutions were obtained over the time interval . The prior information used here was , corresponding to high, medium and low quality, respectively.^{2}^{2}2The initial guesses for parameters were assured to be positive. The size of the MonteCarlo study was simulations. Many researchers studied the problem of parameter estimation for the FitzHughNagumo model. In particular, Ramsay et al. (2007), Campbell & Steele (2012) and Ramsay & Hooker (2017) pointed out several difficulties in estimating the parameters for this ODE system.
3.2. Results of the MonteCarlo analysis
Our findings are presented through charts and tables. The primary summaries are Tables 1 and 2, where we report the ratios of the mean square errors (square errors averaged over MonteCarlo simulations) for estimates of linear parameters (for nonlinear parameters, see the discussion at the end of this section). Several conclusions can be gleaned from the tables.

Given highquality prior information, the accuracy of NLS and SLS is comparable, and neither method has a clear lead throughout the variety of experimental setups.^{3}^{3}3At least some of the differences that one sees from the raw numbers in the tables are plausibly attributable to the MonteCarlo simulation error and as such appear to be insignificant.

When the quality of prior information degrades to medium or low, the performance of SLS becomes in most cases significantly better than that of NLS (with an extent depending on the specific experimental setup).

For a fixed noise level, as the sample size increases, the advantage of SLS becomes more pronounced.

For a fixed sample size, as the noise level increases, the SLS is still better than NLS, but to a lesser extent.
Low noise  High noise  

Prior  sir  ltk  gma  ftz  sir  ltk  gma  ftz 
low  8.3  5.8  4.0  2.3  2.8  2.2  2.2  1.3 
medium  3.9  1.8  3.1  1.7  1.4  1.2  1.8  1.2 
high  0.9  1.3  0.9  2.0  0.4  1.0  0.6  1.1 
Low noise  High noise  

Prior  sir  ltk  gma  ftz  sir  ltk  gma  ftz 
low  12.0  9.6  3.9  3.9  3.8  3.2  2.2  2.2 
medium  4.1  4.3  2.8  1.9  1.6  1.6  1.6  1.3 
high  1.0  2.2  0.7  2.3  0.7  1.2  0.5  1.5 
These points can also be visualized through a combination of simple statistical charts. Thus, Figure 1 displays the line graphs that compare MSEs of the two methods under several experimental setups. Whereas the numbers in Tables 1 and 2 are ratios of MSEs, in the figures we present the absolute MSE values. From the graphs, an advantage of SLS over NLS is apparent for less than ideal prior information. Note that in this specific setting SLS performed worse than NLS for high quality prior information. A plausible explanation lies in the fact that while under our experimental setup the amount of information used by SLS via (3) is fixed throughout simulations, NLS can in principle receive arbitrarily precise initial guesses on linear parameters. One may therefore envision existence of a point, from where on using the latter kind of information outweighs the benefits of using the structural relationship (3). However, a precise quantification of the phenomenon is hardly possible beyond an observation that it appears to manifest itself in scenarios with excellent knowledge on likely parameter values. In reality, ideal prior information is rare.
The bottom panel of Figure 1 further suggests that in the specific scenarios that we report there, SLS improves when the noise level diminishes; this is unlike NLS in that figure.
Figure 2 is a scatterplot of NLS and SLS losses (5) and (7) (on a log scale) evaluated at optimal parameter estimates. The figure highlights in yet another way the importance of prior information for NLS: it is evident that the performance of the latter is strongly affected by the quality of initial parameter guesses. Again, NLS and SLS perform similarly when the prior information is of high quality. However, when the quality of prior information is less than ideal, as it is in most applications, NLS becomes substantially worse than SLS. The scatterplot also gives a quick impression of the variability of estimation results.
The conclusions that we drew from Figure 2 are confirmed by the left panel of Figure 3, which presents boxplots of NLS and SLS losses (on a log scale) measured according to criteria (5) and (7). The pattern is clear: SLS is better than NLS, and the inferiority for NLS becomes more dramatic with degrading prior information.
The right panel of Figure 3 summarizes computation times. SLS is in general much faster. The execution time of NLS is affected by the quality of prior information, and interestingly, increases with this quality.
Low noise  High noise  

Prior  sir  ltk  gma  ftz  sir  ltk  gma  ftz 
low  23.0  1.7  1.1  1.1  6.6  1.1  1.0  1.0 
medium  8.7  1.2  1.0  1.0  2.6  0.9  1.0  1.0 
high  1.0  1.0  0.9  1.0  0.4  0.9  0.9  1.0 
Low noise  High noise  

Prior  sir  ltk  gma  ftz  sir  ltk  gma  ftz 
low  29.0  4.1  1.1  1.6  6.5  1.3  1.0  1.3 
medium  8.2  2.1  0.9  1.0  2.2  1.1  0.9  1.0 
high  0.9  1.2  0.8  1.0  0.7  0.9  0.8  1.0 
Finally, in Tables 3 and 4 we provide information regarding the nonlinear parameters, where in the case of NLS one can observe how the prior knowledge on linear parameters propagates itself into estimation accuracy for nonlinear ones. In particular, for less than ideal prior information on the linear parameters, SLS holds a pronounced edge over NLS also in the case of nonlinear parameters.
4. Conclusions and outlook
In this work, we designed an extensive simulation study to explore the relative statistical and computational performance of two optimization schemes for inference in dynamic systems: the typical nonlinear leastsquares (NLS) method and a new separable nonlinear leastsquares (SLS) approach. As benchmarks, we considered several widely used ODE models arising in a variety of scientific fields. We measured statistical performance of the two methods by the mean square error (MSE) of the estimates. As another performance criterion, we employed the loss function values at the optimal parameter estimates. Computational performance of the methods was also compared by the execution times required to complete each optimization.
A pattern that emerged from our study is that SLS in general performs at least as well as, and frequently better than NLS, especially if the prior information on optimal parameters is not ideal, which is typically the case in practice. This statement is uniformly true over all models tested.
Our recommendation therefore is that parameter estimation problems for complex dynamic systems should be addressed, whenever the system contains an appreciable number of linear parameters, with the separable nonlinear leastsquares method, rather than the more commonly used nonlinear leastsquares method.
While the above message is simple and unambiguous, one must stress that data fitting in complex dynamical systems remains a challenging problem that cannot be treated in a cavalier fashion, even if one takes advantage of separability. For instance, in order to uncover the patterns in Section 3 of this work, we had to carefully design the experimental study, because otherwise simulations might not have converged, or might have converged to poor solutions. This was true for both NLS and SLS, but whenever they were observed, convergence issues were much more severe for NLS (especially sensitive was the case of the FitzHughNagumo system). This result highlights the crucial role of prior information regarding the parameters, or expressed differently, the quality of the initial parameter guesses used for the optimization. We focused here primarily on the effects of the prior information on the linear parameters. However, it also became clear that prior information on the nonlinear parameters has an equally crucial role for optimization purposes, this being true for both NLS and SLS (data not shown).
As a result of our exploratory work, we envision the following promising research directions for the future.

Numerical implementation of SLS for dynamic systems. All the computations in our paper were done in R using the simode package of Yaari & Dattner (2019). However, the idea of using separability properties of ODEs is independent of a particular programming language and can be implemented within other software packages quite as well. Indeed, much work has been done in the context of the variable projection method since it was first introduced in Golub & Pereyra (1973). We are aware of the TIMP package of Mullen & van Stokkum (2007), which implements the variable projection method. Thus, a next step could be to combine the strengths of both packages, simode and TIMP, in order to develop advanced software for variable projection in the context of dynamic systems.

Customized algorithms for specific classes of complex dynamical systems. It is wellknown that the performance of an optimization scheme depends crucially on the underlying mathematical model used for description of the data. Thus, it appears that different classes of dynamic models require specific algorithms tailored to their peculiarities. For instance, parameter estimation for GMA systems has different challenges than those encountered when working with SIR (see Section 3). We expect that there is much to gain from focusing future research on specific classes of models and developing stable algorithms for their parameter estimation.

Theoretical properties of SLS in the context of dynamic systems. Gugushvili & Klaassen (2012) studied the statistical properties of NLS in the general context of smoothing, while Dattner & Klaassen (2015) specifically addressed ODE systems that are linear in (functions of) the parameters. One might expect that some assumptions used in Gugushvili & Klaassen (2012) can be relaxed when the problem is closer to the one considered in Dattner & Klaassen (2015).

Extensions to partially observed, highdimensional, and misspecified dynamic systems. Recent work dealing with inference in highdimensional ODE models suggests that exploiting linearity in parameters is crucial for developing a successful estimation methodology (see, e.g., Chen et al. (2017) and Wu et al. (2019)). More generally, it would be interesting to study with the variable projection method the cases of partially observed, highdimensional, and possibly misspecified dynamic systems. This might additionally require the use of highdimensional regularization techniques (e.g., Chen et al. (2017)) for balancing data and model, and specifically taking into account a potential model misspecification (see Ramsay et al. (2007)).
References
 Basu & Bresler (2000) Basu, S., & Bresler, Y. (2000). The stability of nonlinear least squares problems and the CramérRao bound. IEEE Transactions on Signal Processing, 48(12), 3426–3436.
 Brunel (2008) Brunel, N. J. B. (2008). Parameter estimation of ODE’s via nonparametric estimators. Electron. J. Stat., 2, 1242–1267.
 Campbell & Steele (2012) Campbell, D., & Steele, R. J. (2012). Smooth functional tempering for nonlinear differential equation models. Stat. Comput., 22(2), 429–443.
 Chen et al. (2017) Chen, S., Shojaie, A., & Witten, D. M. (2017). Network reconstruction from highdimensional ordinary differential equations. Journal of the American Statistical Association, 112(520), 1697–1707.
 Chou & Voit (2009) Chou, I.C., & Voit, E. O. (2009). Recent developments in parameter estimation and structure identification of biochemical and genomic systems. Math. Biosci., 219(2), 57–83.
 Chung & Nagy (2010) Chung, J., & Nagy, J. G. (2010). An efficient iterative approach for largescale separable nonlinear inverse problems. SIAM J. Sci. Comput., 31(6), 4654–4674.
 Dattner (2015) Dattner, I. (2015). A modelbased initial guess for estimating parameters in systems of ordinary differential equations. Biometrics, 71(4), 1176–1184.
 Dattner & Gugushvili (2018) Dattner, I., & Gugushvili, S. (2018). Application of onestep method to parameter estimation in ODE models. Stat. Neerl., 72(2), 126–156.
 Dattner & Huppert (2018) Dattner, I., & Huppert, A. (2018). Modern statistical tools for inference and prediction of infectious diseases using mathematical models. Stat. Methods Med. Res., 27(7), 1927–1929.
 Dattner & Klaassen (2015) Dattner, I., & Klaassen, C. A. J. (2015). Optimal rate of direct estimators in systems of ordinary differential equations linear in functions of the parameters. Electron. J. Statist., 9(2), 1939–1973.
 Dattner et al. (2017) Dattner, I., Miller, E., Petrenko, M., Kadouri, D. E., Jurkevitch, E., & Huppert, A. (2017). Modelling and parameter inference of predator–prey dynamics in heterogeneous environments using the direct integral approach. Journal of The Royal Society Interface, 14(126), 20160525.

Erichson et al. (2018)
Erichson, N. B., Zheng, P., Manohar, K., Brunton, S. L., Kutz, J. N.,
& Aravkin, A. Y. (2018).
Sparse principal component analysis via variable projection.
arXiv eprints.
URL https://arxiv.org/abs/1804.00341 
Fan & Gijbels (1996)
Fan, J., & Gijbels, I. (1996).
Local polynomial modelling and its applications, vol. 66 of
Monographs on Statistics and Applied Probability
. London: Chapman & Hall.  FitzHugh (1961) FitzHugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysical Journal, 1(6), 445–466.
 FitzHugh (1969) FitzHugh, R. (1969). Mathematical models of excitation and propagation in nerve. In H. P. Schwan (Ed.) Biological engineering, vol. 9 of Interuniversity electronics series, chap. 1, (pp. 1–85). New York, NY: McGrawHill.
 Gan et al. (2018) Gan, M., Chen, C. L. P., Chen, G., & Chen, L. (2018). On some separated algorithms for separable nonlinear least squares problems. IEEE Transactions on Cybernetics, 48(10), 2866–2874.
 Gennemark & Wedelin (2007) Gennemark, P., & Wedelin, D. (2007). Efficient algorithms for ordinary differential equation model identification of biological systems. IET Systems Biology, 1(2), 120–129.
 Girolami & Calderhead (2011) Girolami, M., & Calderhead, B. (2011). Riemann manifold Langevin and Hamiltonian Monte Carlo methods. With discussion and authors’ reply. J. R. Stat. Soc., Ser. B, Stat. Methodol., 73(2), 123–214.
 Golub & Pereyra (2003) Golub, G., & Pereyra, V. (2003). Separable nonlinear least squares: The variable projection method and its applications. Inverse Probl., 19(2), 1–26.
 Golub & Pereyra (1973) Golub, G. H., & Pereyra, V. (1973). The differentiation of pseudoinverses and nonlinear least squares problems whose variables separate. SIAM J. Numer. Anal., 10(2), 413–432.
 Green & Silverman (1994) Green, P. J., & Silverman, B. W. (1994). Nonparametric regression and generalized linear models: a roughness penalty approach, vol. 58 of Monographs on Statistics and Applied Probability. London: Chapman & Hall.
 Gugushvili & Klaassen (2012) Gugushvili, S., & Klaassen, C. A. J. (2012). consistent parameter estimation for systems of ordinary differential equations: bypassing numerical integration via smoothing. Bernoulli, 18(3), 1061–1098.
 Himmelblau et al. (1967) Himmelblau, D., Jones, C., & Bischoff, K. (1967). Determination of rate constants for complex kinetics models. Industrial & Engineering Chemistry Fundamentals, 6(4), 539–543.
 Hodgkin & Huxley (1952) Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. The Journal of Physiology, 117(4), 500–544.
 Johnstone & Silverman (2005) Johnstone, I. M., & Silverman, B. W. (2005). Empirical Bayes selection of wavelet thresholds. Ann. Stat., 33(4), 1700–1752.
 Lawton & Sylvestre (1971) Lawton, W. H., & Sylvestre, E. A. (1971). Elimination of linear parameters in nonlinear regression. Technometrics, 13(3), 461–467.
 Lotka (1956) Lotka, A. J. (1956). Elements of mathematical biology. New York: Dover Publications, Inc. Unabridged republication of the first edition published under the title: Elements of physical biology.
 May (2001) May, R. M. (2001). Stability and complexity in model ecosystems. With a new introduction by the author. Princeton, NJ: Princeton University Press, 2nd ed.
 McGoff et al. (2015) McGoff, K., Mukherjee, S., & Pillai, N. (2015). Statistical inference for dynamical systems: A review. Statist. Surv., 9, 209–252.
 Mullen & van Stokkum (2007) Mullen, K., & van Stokkum, I. (2007). TIMP: An R package for modeling multiway spectroscopic measurements. Journal of Statistical Software, 18(3), 1–46.
 Mullen (2008) Mullen, K. M. (2008). Separable nonlinear models: theory, implementation and applications in physics and chemistry. Phd thesis, Vrije Universiteit.
 Nagumo et al. (1962) Nagumo, J., Arimoto, S., & Yoshizawa, S. (1962). An active pulse transmission line simulating nerve axon. Proceedings of the IRE, 50(10), 2061–2070.
 Peschel & Mende (1986) Peschel, M., & Mende, W. (1986). The predatorprey model: do we live in a Volterra world?. Springer.
 Ramsay & Hooker (2017) Ramsay, J., & Hooker, G. (2017). Dynamic data analysis. Modeling data with differential equations. Springer Series in Statistics. New York, NY: Springer.
 Ramsay et al. (2007) Ramsay, J. O., Hooker, G., Campbell, D., & Cao, J. (2007). Parameter estimation for differential equations: a generalized smoothing approach. J. R. Stat. Soc., Ser. B, Stat. Methodol., 69(5), 741–796.
 Ruhe & Wedin (1980) Ruhe, A., & Wedin, P. Å. (1980). Algorithms for separable nonlinear least squares problems. SIAM Review, 22(3), 318–337.
 Savageau (1976) Savageau, M. A. (1976). Biochemical systems analysis. A study of function and design in molecular biology, vol. 6739 of Advanced Book Program. London etc.: AddisonWesley Publishing Company.
 Schittkowski (2002) Schittkowski, K. (2002). Numerical data fitting in dynamical systems. A practical introduction with applications and software. Dordrecht: Kluwer Academic Publishers.
 Tufte (2001) Tufte, E. R. (2001). The visual display of quantitative information. Cheshire, CT: Graphics Press, 2nd ed.
 Varah (1982) Varah, J. (1982). A spline least squares method for numerical parameter estimation in differential equations. SIAM Journal on Scientific and Statistical Computing, 3(1), 28–46.
 Venables & Ripley (2002) Venables, W. N., & Ripley, B. D. (2002). Modern applied statistics with S. Statistics and Computing. New York, NY: Springer, 4th ed.
 Vilela et al. (2007) Vilela, M., Borges, C. C. H., Vinga, S., Vasconcelos, A. T. R., Santos, H., Voit, E. O., & Almeida, J. S. (2007). Automated smoother for the numerical decoupling of dynamics models. BMC Bioinformatics, 8(1), 305.

Vissing Mikkelsen & Hansen (2017)
Vissing Mikkelsen, F., & Hansen, N. R. (2017).
Learning large scale ordinary differential equation systems.
arXiv eprints.
URL https://arxiv.org/abs/1710.09308  Voit (2000) Voit, E. O. (2000). Computational analysis of biochemical systems: A practical guide for biochemists and molecular biologists. Cambridge University Press.
 Voit (2013) Voit, E. O. (2013). Biochemical systems theory: a review. ISRN Biomathematics, 2013, Article ID 897658.
 Volterra (1926) Volterra, V. (1926). Fluctuations in the abundance of a species considered mathematically. Nature, 118, 558–560.
 Vujačić et al. (2015) Vujačić, I., Dattner, I., González, J., & Wit, E. (2015). Timecourse window estimator for ordinary differential equations linear in the parameters. Stat. Comput., 25(6), 1057–1070.
 Wasserman (2006) Wasserman, L. (2006). All of nonparametric statistics. Springer Texts in Statistics. New York, NY: Springer.
 Wickham (2009) Wickham, H. (2009). ggplot2. Elegant graphics for data analysis. Use R! New York, NY: Springer.
 Wu et al. (2019) Wu, L., Qiu, X., Yuan, Y.x., & Wu, H. (2019). Parameter estimation and variable selection for big systems of linear ordinary differential equations: A matrixbased approach. J. Amer. Statist. Assoc., 114(526), 657–667.

Yaari & Dattner (2018)
Yaari, R., & Dattner, I. (2018).
simode: R Package for statistical inference of ordinary differential
equations using separable integralmatching.
arXiv eprints.
URL https://arxiv.org/abs/1807.04202 
Yaari & Dattner (2019)
Yaari, R., & Dattner, I. (2019).
simode: Statistical inference for systems of ordinary
differential equations using separable integralmatching.
R package version 1.1.4.
URL https://CRAN.Rproject.org/package=simode  Yaari et al. (2018) Yaari, R., Dattner, I., & Huppert, A. (2018). A twostage approach for estimating the parameters of an agegroup epidemic model from incidence data. Stat. Methods Med. Res., 27(7), 1999–2014.
Comments
There are no comments yet.