1 Introduction
Ordinary differential equations (ODEs) are an extensively studied mathematical problem. Dynamical systems are described by ODEs, and the application of these extends to many different fields of science and engineering. Given the initial value proplem (IVP) on the time interval
(1) 
with globally Lipschitz continuous, we want to compute a numerical approximation of the solution using a probabilistic approach. Solving ODEs is one of the major tasks of numerical mathematics.
Classical numerical ODE solvers (e.g. Runge Kutta methods) construct the solution by iteratively evaluating the vector field at discrete times at numerical approximations and collecting information on the derivative .
As pointed out in the past [12, 3, 10], to incorporate uncertainties that arise due to the inaccurate observation of and consequently accumulated numerical error, we can interpret the observed evaluations of as (potentially noisy) data for statistical inference.
For quantifying the uncertainty on we model the analytic solution as a GaussMarkov stochastic process defined on a probability space . The iterative algorithm then outputs the law of , i.e. the probability measure that the process induces on the collection of functions from to the state space . This probability measure contains the belief over .
From this viewpoint, unknown numerical quantities are modelled as random variables and numerical algorithms return probability distributions instead of points estimates. This is a probabilistic approach which carries out numerical computations in a statistically interpretable way by combining probability theory and numerics, and permits to quantify the uncertainties for the computations. This idea became a dynamic research area which includes optimization, linear algebra and differential equations and is called
Probabilstic Numerics (PN) [6].A PN solver for ODEs was introduced by [5] who recast the inference of the solution as a Gaussian process (GP) regression. However, their original solver did not satisfy the polynomial convergence rates of classical solvers. In parallel development, [1] proposed a similar solver of similar structure, which uses a Monte Carlo, instead of a closedform Gaussian, updating scheme. This solver produced solutions of linear order, but still not of polynomial order. In [15], the polynomial convergence rate was established by choosing an Integrated Wiener Process (IWP) prior. This solver was further amended by adding calibrated uncertainty quantification by Bayesian quadrature [9] and using the Markov property of the IWP to recast it as a filter, thereby drastically speeding up the probabilistic computations [9, 16]. While the resulting filter is the fastest probabilistic ODE solver available, it outputs solely Gaussian probability measures. Another novel method introduced in [2] provides a more flexible nonparametric uncertainty quantification by defining a generative process to sample random numerical solutions of the ODE, obtained by adding Gaussian noise to the solution of a classical solver. The computational cost of this method per sample, however, is equal to the cost of the underlying solver, which makes it impractical for costsensitive applications.
In this paper we recall the solver in [15, 9] which employs a times integrated Wiener process prior on the solution and introduce instead an Integrated Ornstein Uhlenbeck (IOU) process prior. This new model improves the algorithm for ODEs whose trajectories have a higher derivative which is expected to gravitate back to zero. In the following, we will call such systems ’ODEs with bounded derivatives’.
2 Gaussian filtering for ODEs
2.1 Bayesian filtering problem
Assume we have a signal process which is not directly observable, solution of the stochastic differential equation (SDE)
(2) 
where denotes a Wiener process. Instead, we can only see an observation process that is a transformation of . Both processes are defined on some probability space . The goal of the filtering problem is to compute, given the observations , the optimal estimate of for . Accordingly, has to be measurable w.r.t. the natural filtration and minimize the meansquare distance between and the candidates in :
(3) 
The best mean square estimate of given is known to be the conditional expectation [8, 11]. In a Bayesian spirit, we seek to compute the posterior distribution of given the current and previous noisy observations . This can be done by using Bayes’ rule
(4) 
The distribution is the measurement model which describes the noisy relationship between the true signal and the observations, and is the prior containing preliminary information on before any observation are taken into account. A natural question is then: which prior should we put on the signal process
? Due to the computationally advantageous properties of Gaussian distributions which allow for fast inference by matrix computations we assume
to be a Gaussian process, i.e. to have finite joint Gaussian distribution. Moreover we want to iteratively update the solution from to without taking into account the previous observations at time steps . This is possible when is also a Markov process, since otherwise the computational cost per step would exponentially increase in the number of previous steps. Sample continuous GaussMarkov processes can be written as the solution of the linear timeinvariant stochastic differential equation (LTISDE) of the form(5) 
with Gaussian initial condition . The matrix is the so called drift matrix and is the diffusion matrix of the process. The solution of Eq. 5 depends on the form of and . With this conditions the distribution of the estimate
can be computed by Kalman filtering
[14], an iterative algorithmic implementation for the linear filtering problem with Gaussian model assumptions. Since it is essentially an application of Bayes’ rule, it is determined by the choice of the prior and the measurement model . Previous work [9, 15, 16] employed an IWP prior, discussed in Section 2.3, whereas in this paper we will analyse Bayesian filtering for ODEs with an IOUP prior. Due to the properties of Gaussian distributions, together with Gaussian measurement model assumptions, we can analytically compute the matrices for Gaussian filtering for LTI SDE like Eq. 5 (see [13]).2.2 Filtering for ODEs
We use Bayesian filtering for computing a numerical estimation of the solution of the IVP Eq. 1. We treat the imprecise measurements as noisy observations. Since is a numerical approximation of the true solution at time the evaluations of are indeed noisy. Therefore, we model the solution and the first derivatives as a draw of a Gauss Markov stochastic process . Hence, the SDE in Eq. 5, which describes the dynamics of , has the following form
(6) 
The drift matrix has nonzero entries only on the firstoff diagonal and on the last row. This form of guarantees that the th component is the derivative of the component in the state space. The parameters determine the process employed on the the component of the state space : they are all equal to zero for an integrated Wiener process, equal to zero apart from for the integrated OUP with some , whereas for the Matern process [4] we have . Inference in Eq. 6 with linear Gaussian measurements models gives rise to the Kalman Filter equations implemented in Algorithm 1. The algorithm outputs the filtering distribution , where .
The transition matrices and in lines 67 are given by ([13])
(7) 
(8) 
where and come from Eq. 6. The matrix is the measurement model matrix (in probabilistic terms ), in our case the th unit vector . In line 1 a prior model is selected. Lines 67 compute the prediction distribution , which depends from the form of the matrices and and is therefore strictly related to the choice of the prior. Lines 913 are known as update step and compute the updated distribution . In the experiments we will assume for simplicity , i.e. the numerical approximation is exact.
2.3 Prior selection: Integrated Wiener process
The standard Brownian motion is a Gauss Markov process with mean
and variance
. Samples of the process are depicted in Fig. 1 (b). Previous work [9, 15, 16] worked with an IWP prior for probabilistic ODE solvers. In this case and are given by [15](9) 
(10) 
This work showed the useful uncertainty quantification for the solution with only a small computational overhead compared to classical numerical methods. In particular, Schober et al.[15] constructed a class of probabilistic solvers whose posterior mean have the same analytic structure of the one of the classic numerical RungeKutta methods (for first, second and third order), thus acquiring their good properties.
3 Ornstein Uhlenbeck process
Can we use other Gauss Markov priors over the solution which preserve good convergence rates? In this paper we model the solution function and the first derivatives as a draw from an integrated Ornstein Uhlenbeck process (IOUP) to perform inference in ODEs. We show that this model is competitive with the IWP solver discussed in the previous section. The Ornstein Uhlenbeck process (OUP) is a GaussMarkov process whose dynamic is governed by the following SDE
(11) 
where and denotes a standard Wiener process.
The OUP converges to its stationary distribution in the sense that , for .
The difference between the Wiener process and the Ornstein Uhlenbeck is the drift term , which is constant for the Wiener process and dependent on the parameter for OUP.
If the current value of the process is smaller than the value of the mean , the drift is positive and the process tends to drift back to . The same (symmetric) effect occurs if the current value of is greater than . In other words, the samples evolve around the mean which acts as an equilibrium level. Some samples of the process are shown in Fig. 1 (a).
This property is called ”mean reverting” and the magnitude of determines the strength of this drift. For small values of the effect vanishes, whereas, for large , stays close to the mean with high probability. This behaviour justifies the fact that the variance of the OUP is bounded, in fact bounded by , whereas the Wiener process variance is unbounded () and the samples don’t tend to remain around the mean.
3.1 Bayesian Filtering with Integrated Ornstein Uhlenbeck process
We analyse the probabilistic interpretation of ODEs by filtering investigated in Section 2.2 in the case of a times integrated Ornstein Uhlenbeck process.
To this end, we a priori assume that the solution of the ODE and the first derivatives follow a times integrated Ornstein Uhlenbeck process with initial conditions . Accordingly, the drift matrix in Eq. 6 has entries , , on the last row. Using this prior model the matrices and can be computed analytically (see appendix) from Eq. 7 and Eq. 8 and have the following form:
(12) 
(13) 
A probabilistic solver with IOUP prior has local convergence rate for the predicted posterior mean of the same order as the solver with an IWMP prior:
Proposition 1
The order of the prediction in the first step
(14) 
for some constant independent of .
Proof
(15)  
(16)  
(17)  
(18) 
By writing the Taylor expansion of and subtracting it from Eq. 18, we obtain the desired order for the truncation error.
As mentioned above, we want to exploit of the ’mean reverting property’ of OUP (and absent in WP), or equivalently of the fact that OUP has bounded variance, for solving ODEs whose trajectories have bounded derivatives. Since we use a times IOUP prior over the solution, we want that also the integral of the process is maintained the same behaviour, i.e. tends to stay closer to the mean than the WP. This is indeed the case, as we proved that (see appendix).
4 Experiments
In this section we analyse applications of the probabilistic ODE solver with IOUP prior discussed in Section 3.1. In particular we compare this solver with the one considered in Section 2.3, which uses an IWP prior over the solution , by applying the two probabilistic ODE solvers to the same ODEs and examining the plots. The experiments aim to test the intuition that the selection of the prior depends on the structure of the ODE’s solution: when the solution looks ’meanreverting’, in the sense that it has bounded derivatives, we expect better results from the IOUP prior, whereas we should choose an IWP prior when solutions to ODEs have unbounded derivatives or look divergent. According to these considerations Table 1 shows our prior belief on which solver (IOUP or IWP) performs better for which of the following ODEs.
For the experiments the IOUP and IWP filters are initialized with mean and variance with nonzero entries only in the minor matrix where ord is the order of the ODE and
is the identity matrix.
ODE  IWPsolver  IOUPsolver 

1. a) Exponential function  
1. b) Negative exponential function  
2. Oscillators  
3. Moderate systems 
4.1 Linear ODEs
The solution of a), the exponential function, has a divergent trajectory with unbounded derivatives and as we expected the plot in Fig. 2 (a) shows that the IWPprior solver approximates the solution better than the IOUPprior.
On the contrary, the negative exponential function, solution of b), decays to zero. This resembles the behaviour of the meanreverting samples of the IOU process, which should therefore work as a good prior over . The plot in Fig. 2 (b) confirms this intuition: the IOUPsolver approximates better the solution than the IWPsolver. Both methods use a model of step size .
The plot in Fig. 2 (b) clearly shows the behaviour of the two solvers: IWP oscillates much more than IOUP around the solution when this converges to zero, because the IWP doesn’t tend to ’drift back’ to the mean, whereas the IOUP solver remains closer to the solution due to the process drifting property and smaller variance.
4.2 Oscillators
We used an IOUPprior to compute approximate oscillating solutions of ODEs in Problem 2. a) and b). The original intuition was that an IOUPsolver would be a good prior for oscillators, because their trajectory resembles the ’meanreverting’ property and looks similar to the samples of an OUP. However, we do not always see better results of the IOUPapproximate solution. Fig. 3 depicts two different oscillators: in the plot (a) the IWP solver distinctly approximates better the solution in Problem 2 a). On the other hand the Van der Pol oscillator (Problem 2 b)) illustrated in Fig. 3 (b) is better approximated by an IOUP prior. To motivate these results we argue that the similarity to the OUP samples is a global property of the solution’s structure. However, the filter computes the approximate solution along the time axis ’locally’, proceeding with small time steps. In particular the IOUP filter predicts with high probability the th derivative smaller at every time step, whereas the derivative of an oscillator is periodically changing and doesn’t decay to zero or converges to a finite number.
4.3 Moderate systems
Problem 3. is a system of ODEs (of order 1 and dimension 10) which describes a radioactive decay chain. The ten solutions of the ODE problem are shown in Fig. 4 : the trajectories decay to zero for (apart from which converges to one). Again, as discussed for ODE 1 , the solution functions look similar to the mean reverting samples of the OUP. In Fig. 5 we plotted two orbits of the ODE problem 3, with a step size of and . While the IWP oscillates around the orbit, the IOUP solvers remains near to it. Moreover the more stiff is the orbit, the more we expect the OrnsteinUhlenbeck to perform well. Indeed, the plots confirm that this intuition might be useful.
5 Discussion
The experiments presented in Section 4 provide good material to discuss the differences between the IOUP and the IWP solver. An IOUP prior appears to approximate better ’ODEs whith bounded derivatives’, due to the drifting property which keeps the process near to the mean zero with high probability (in our model the th derivative of the solution). Fig. 2 (b) and Fig. 5 suggest this argument is reasonable: it distinctly shows how the approximate solution with an IWP prior oscillates unstably around the analytical solution due to the fact that it is less conservative than the OUP (in the sense that it tends to deviate away from its mean) whereas the IOUP solver remains closer to it. In this perspective, an IWP works as a better prior for divergent ODEs, like e.g the exponential function (Fig. 2 a)). Our intuition was not validated for oscillators (Fig. 3). This is probably due to the fact that the trajectory of oscillators— although it resembles the drifting property— periodically changes the value of the derivative, whereas the IOUPsolver predicts a smaller derivative at each time step.
6 Summary
In the framework for Bayesian filtering we examined theory of probabilistic ODE solvers which return Gaussian process posterior distributions instead of point estimates and provide uncertainty measures over the solutions. We stressed the Bayesian nature of these solvers and discussed the question of prior selection. Hence we proposed a novel solver which employs an IOUP prior over the solution of ODEs, and provided experiments which show a competitive behaviour of this solver with the already wellestablished IWPsolver.
References
 [1] O. Chkrebtii, D. Campbell, M. Girolami, and B. Calderhead, Bayesian Uncertainty Quantification for Differential Equations, arXiv prePrint 1306.2365, (2013).
 [2] P. Conrad, M. Girolami, S. Särkkä, A. Stuart, and K. Zygalakis, Statistical analysis of differential equations: introducing probability measures on numerical solutions, Statistics and Computing, (2016), https://doi.org/10.1007/s1122201696710.
 [3] P. Diaconis, Bayesian numerical analysis, in Statistical decision theory and related topics, IV, Vol. 1 (West Lafayette, Ind., 1986), Springer, New York, 1988, pp. 163–175.

[4]
J. Hartikainen and S. Särkkä, Kalman filtering and
smoothing solutions to temporal Gaussian process regression models
, in IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2010, 2010, pp. 379–384.

[5]
P. Hennig and S. Hauberg, Probabilistic Solutions to Differential
Equations and their Application to Riemannian Statistics
, in Proc. of the 17th int. Conf. on Artificial Intelligence and Statistics (AISTATS), vol. 33, JMLR, W&CP, 2014.
 [6] P. Hennig, M. Osborne, and M. Girolami, Probabilistic numerics and uncertainty in computations, Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 471 (2015).
 [7] T. Hull, W. Enright, B. Fellen, and A. Sedgwick, Comparing numerical methods for ordinary differential equations, SIAM Journal on Numerical Analysis, 9 (1972), pp. 603–637.
 [8] I. Karatzas and S. E. Shreve, Brownian motion and stochastic calculus, vol. 113 of Graduate Texts in Mathematics, SpringerVerlag, New York, second ed., 1991, https://doi.org/10.1007/9781461209492, http://dx.doi.org/10.1007/9781461209492.
 [9] H. Kersting and P. Hennig, Active Uncertainty Calibration in Bayesian ODE Solvers, Uncertainty in Artificial Intelligence (UAI), (2016).
 [10] A. O’Hagan, Some Bayesian numerical analysis, in Bayesian statistics, 4 (Peñí scola, 1991), Oxford Univ. Press, New York, 1992, pp. 345–363.
 [11] B. Øksendal, Stochastic Differential Equations: An Introduction with Applications, Universitext, SpringerVerlag, Berlin, sixth ed., 2003, https://doi.org/10.1007/9783642143946, http://dx.doi.org/10.1007/9783642143946.
 [12] H. Poincaré, Calcul des probabilités, GauthierVillars, Paris, 1896.

[13]
S. Särkkä,
Recursive Bayesian Inference on Stochastic Differential Equations
, PhD thesis, Helsinki University of Technology, 2006.  [14] S. Särkkä, Bayesian filtering and smoothing, Cambridge University Press, 2013.
 [15] M. Schober, D. Duvenaud, and P. Hennig, Probabilistic ODE Solvers with RungeKutta Means, Advances in Neural Information Processing Systems (NIPS), (2014).
 [16] M. Schober, S. Särkkä, and P. Hennig, A probabilistic model for the numerical solution of initial value problems, ArXiv eprints, (2016), https://arxiv.org/abs/1610.05261.
7 Appendix
7.1 Matrices and or IOUP
It is easy to see that
(19) 
Let’s calculate given by [13]
(20) 
Where
with . We assume . We have
so
and
Then
Writing
we get
(21) 
7.2 IOUP local convergence rate for the posterior mean
If we employ a times integrated OUP over the solution , where , the prediction step for the mean with initial value is where
(22) 
7.3 Variances of IOUP and IWP
Let denote an Ornstein Uhlenbeck process with and a standard Wiener process. In this section we will write with for simplicity in the calculations. We want to show :
We calculate a) and b):

So


So we can see that
Comments
There are no comments yet.