Lagrangian uncertainty quantification and information inequalities for stochastic flows

05/21/2019 ∙ by Michal Branicki, et al.

We develop a systematic information-theoretic framework for quantification and mitigation of error in probabilistic Lagrangian (i.e., trajectory-based) predictions which are obtained from (Eulerian) vector fields generating the underlying dynamical system in a way which naturally applies in both deterministic and stochastic settings. This work is motivated by the desire to improve Lagrangian predictions in complex, multi-scale systems based on simplified, data-driven models. Here, discrepancies between probability measures μ and ν associated with the true dynamics and its approximation are quantified via so-called φ-divergencies, D_φ(μ‖ν), which are premetrics defined by a class of strictly convex functions φ. We derive general information bounds on the uncertainty in estimates, E^ν[f], of `true' observables E^μ[f] in terms of φ-divergencies; we then derive two distinct bounds on D_φ(μ‖ν) itself. First, an analytically tractable bound on D_φ(μ‖ν) is derived from differences between vector fields generating the true dynamics and its approximations. The second bound on D_φ(μ‖ν) is based on a difference of so-called finite-time divergence rate (FTDR) fields and it can be exploited within a computational framework to mitigate the error in Lagrangian predictions by tuning the fields of expansion rates obtained from simplified models. This new framework provides a systematic link between Eulerian (field-based) model error and the resulting uncertainty in Lagrangian (trajectory-based) predictions.




1. Introduction

Consider a probability space and a dynamical system on a manifold generated by a map , . Given the paths, , labeled by their initial conditions and contained in , we refer to estimation of path-based observables, e.g., , and of the underlying law of as Lagrangian predictions; in contrast, we refer to the issue of estimating the map itself as Eulerian predictions. This terminology follows from studies of transport in dynamical systems (e.g., [57, 65, 72, 76, 1]). Many dynamical systems encountered in applications generate highly complex dynamics and involve a very large number of degrees of freedom with nonlinear couplings across a wide range of spatio-temporal scales; examples range from the dynamics of fluid flows, to neural networks, systems biology and molecular dynamics, to financial mathematics. Various approximations (systematic or ad hoc) of the true dynamics, which are necessary in such cases, result in a loss of information in the simplified dynamics, thus making the subsequent estimates of the observables uncertain and often unreliable.

In this work we focus on developing a framework for Lagrangian uncertainty quantification (LUQ), which is concerned with deriving bounds on the error associated with estimation of observables based on approximate/reduced models of the original dynamics leading to ; here , and is a projection onto . We confine ourselves to situations where the dynamical system generating is induced by an ODE/SDE


where the (Eulerian) fields generate continuous solutions with the initial condition distributed according to the probability measure , and are independent one-dimensional Wiener processes (we postpone details to §2 and §4).

The core challenge in assessing robustness and accuracy of Lagrangian predictions lies in the nonlinear and nonlocal-in-time nature of the extraction of Lagrangian information from the dynamics generated by the Eulerian fields . Despite superficial similarities, Lagrangian uncertainty quantification and error mitigation are distinctly different from uncertainty quantification in the Eulerian case; a simple example is sketched in Figure 1. In particular, minimising the lack of information between the fields and their approximations is analogous to the framework developed in the Eulerian context in [58, 60, 19, 59, 18, 20, 17]. On the other hand, approximations of the observables are inherently non-local in time, and they are sensitive to small perturbations in the (Eulerian) fields generating (1.1). Importantly, Eulerian accuracy does not generally imply Lagrangian accuracy (see Figures 1 and 2). For example, even if the fields generating (1.1) are well-approximated in the sense that , in a suitable norm, this does not imply that , since the trajectory structure and the associated pathspace probability measures and their time-marginals , can be very different; this fact is well-known in the theory of deterministic dynamical systems in the context of bifurcation theory and nearly-integrable chaotic dynamics (e.g., [76], KAM theorem, etc.); see a sketch in Figure 2 and the simple numerical illustration below.
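As a minimal numerical caricature of this point (a sketch with assumed toy vector fields, not an example taken from the paper), two planar fields that are uniformly close are integrated from the same initial ensemble; the resulting endpoint (Lagrangian) discrepancy is orders of magnitude larger than the Eulerian one:

```python
# Sketch: Eulerian closeness does not imply Lagrangian closeness.
# Two planar fields differing by a uniformly small perturbation are integrated
# from the same initial ensemble; the endpoint (time-marginal) statistics diverge.
import numpy as np

def b_true(X):                       # hyperbolic (saddle-type) toy field
    return np.column_stack([X[:, 0], -X[:, 1]])

def b_approx(X, eps=1e-2):           # |b - b_hat| <= eps*sqrt(2) everywhere
    return b_true(X) + eps*np.column_stack([np.sin(X[:, 1]), np.cos(X[:, 0])])

def propagate(b, X, T=8.0, dt=1e-2):
    X = np.array(X, float)
    for _ in range(int(T/dt)):       # forward Euler, for illustration only
        X = X + dt*b(X)
    return X

rng = np.random.default_rng(0)
X0 = rng.normal([0.0, 1.0], 0.05, size=(2000, 2))      # initial ensemble ~ mu_0
end_true, end_appr = propagate(b_true, X0), propagate(b_approx, X0)

eulerian_err = 1e-2*np.sqrt(2)                          # uniform field discrepancy
lagrangian_err = np.linalg.norm(end_true - end_appr, axis=1).mean()
print(f"Eulerian error <= {eulerian_err:.3e},  mean endpoint error = {lagrangian_err:.3e}")
```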

Figure 1. Illustration of some important differences between Eulerian and Lagrangian predictions (autonomous deterministic setting for simplicity of exposition). The discrepancy between two probability measures on the initial conditions propagated from the same initial measure at under dynamics induced by vector fields and can be large even if . Blue and green curves denote integral lines of the two vector fields (with a single hyperbolic fixed point), and the red-shaded contours indicate the supports of the time-marginal probability measures (at and respectively).

Quantification and mitigation of error in Lagrangian predictions due to the potentially uncertain Eulerian input is amenable to analysis in a probabilistic framework; in such a setting information-theoretic tools can be used to understand how to bound and optimise the lack of information in Lagrangian predictions obtained from imperfect Eulerian fields. To achieve this goal, which builds on recent work [21], one needs to develop a framework which allows one to ‘tune’ imperfect model dynamics so that their trajectory structure remains ‘close’ to that of the original system in an appropriate metric. Three major steps are needed to achieve our objective:


  • Determination of an appropriate probabilistic measure of discrepancy between two Lagrangian (trajectory-based) predictions.

  • Identification of the most important Lagrangian structures which need to be tuned in order to maximise the skill of reduced-order Lagrangian predictions.

  • Derivation of bounds on the error in the estimation of observables and the underlying probability measures from reduced models of the true dynamics.

The above challenges are addressed within a general information-theoretic framework, in which discrepancies between probability measures are defined via a class of premetrics referred to as divergencies. Following [21], a unified approach to this problem is based on utilising so-called φ-divergencies, D_φ(μ‖ν), between probability measures μ and ν associated with the true dynamics and its approximation. Here, we develop approaches that provide uncertainty and sensitivity bounds for observables of interest over a finite-time horizon for non-autonomous stochastic models. The bounds are expressed in terms of new φ-information inequalities (of Csiszár–Kullback–Pinsker type) which have the form

where , , , and iff or if is constant. Moreover, we derive two distinct bounds on D_φ(μ‖ν), which is involved in these inequalities:


  • (a) Bounds in terms of a certain auxiliary vector field involving differences between the vector fields and their approximations.

  • (b) Bounds in terms of probabilistic measures of expansion rates, the so-called φ-FTDR fields [21], associated with the truth and its approximation.

The results derived in (a) provide an analytically tractable connection between the Eulerian (field-based) model error and the uncertainty in Lagrangian (trajectory-based) predictions. The approach developed in (b) is based on a norm of the difference between so-called finite-time divergence rate (φ-FTDR) fields [21], which utilise a recently developed probabilistic framework for quantifying expansion rates in stochastic flows. The φ-FTDR bound has important connections to other probabilistic and geometric measures used in the past to study finite-time mixing and transport in stochastic flows. Importantly, this bound can be exploited within a computational framework to mitigate the error in Lagrangian predictions by tuning the fields of expansion rates in simplified models in order to optimally reproduce the original expansion rate fields.
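To give a computational flavour of what a field of expansion rates looks like, the sketch below estimates a standard finite-time Lyapunov exponent (FTLE) field for the classical double-gyre test flow by finite-differencing the flow map. This is only a rough, well-known proxy: the φ-FTDR fields of [21] are probabilistic, divergence-based objects, and the flow, parameters and discretisation used here are illustrative assumptions rather than anything taken from the paper.

```python
# Sketch: a finite-time Lyapunov exponent (FTLE) field as a rough, standard proxy
# for expansion-rate fields (the phi-FTDR fields of [21] are different objects).
# Flow, parameters and scheme below are illustrative assumptions.
import numpy as np

def velocity(t, x, A=0.1, eps=0.25, om=2*np.pi/10):
    # classical double-gyre benchmark flow on [0,2]x[0,1]
    a = eps*np.sin(om*t)
    b = 1.0 - 2.0*a
    f  = a*x[..., 0]**2 + b*x[..., 0]
    df = 2.0*a*x[..., 0] + b
    u = -np.pi*A*np.sin(np.pi*f)*np.cos(np.pi*x[..., 1])
    v =  np.pi*A*np.cos(np.pi*f)*np.sin(np.pi*x[..., 1])*df
    return np.stack([u, v], axis=-1)

def flow_map(x0, t0=0.0, T=10.0, dt=0.01):
    x, t = np.array(x0, float), t0
    for _ in range(int(T/dt)):            # explicit Euler, for illustration only
        x = x + dt*velocity(t, x)
        t += dt
    return x

nx, ny, h, T = 60, 30, 1e-4, 10.0
X, Y = np.meshgrid(np.linspace(0, 2, nx), np.linspace(0, 1, ny))
grid = np.stack([X, Y], axis=-1)                                   # (ny, nx, 2)
pert = np.stack([grid + [h, 0], grid - [h, 0],
                 grid + [0, h], grid - [0, h]], axis=2)            # (ny, nx, 4, 2)
ends = flow_map(pert.reshape(-1, 2), T=T).reshape(ny, nx, 4, 2)

# columns of the flow-map Jacobian by central finite differences
J = np.stack([(ends[:, :, 0] - ends[:, :, 1])/(2*h),
              (ends[:, :, 2] - ends[:, :, 3])/(2*h)], axis=-1)     # (ny, nx, 2, 2)
sig_max = np.linalg.svd(J, compute_uv=False)[..., 0]
ftle = np.log(sig_max)/T
print("FTLE field range:", float(ftle.min()), float(ftle.max()))
```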

The long-term goal is to use this framework for tuning imperfect Eulerian models generating the vector fields ( in the case of (1.1)) based on available empirical data in order to minimise the loss of relevant information in the subsequent Lagrangian predictions (see Figure 3 for a sketch of the framework).

Figure 2. Illustration of important differences between Eulerian and Lagrangian predictions (non-autonomous, time-periodic, deterministic setting). The discrepancy between two probability measures on the initial conditions propagated under the (Hamiltonian) dynamics induced by vector fields and can be large even if . Here, the red-shaded patches denote supports of invariant measures on the respective Poincaré sections. Different values of parameters in the -small term lead to different invariant measures (in this example ); all measures evolve from the same initial measure (supported on the green-shaded contour).

The contents of this article are as follows. Section 2 is devoted to the general formulation of the problem, where we also recall some relevant concepts and notations. In section 3, we introduce information-theoretic premetrics, termed φ-divergencies, and outline their main properties following [21]. Then, we derive generalised information inequalities which involve the φ-divergencies between probability measures associated with the true and approximate dynamics. These information inequalities provide a unified framework for quantifying errors in probability measures generated by approximate models, as well as errors in the corresponding observables. In section 4 we recall some relevant definitions and results concerned with stochastic flows, which are then used in section 5 to characterise information bounds for stochastic flows in terms of bounds on the φ-divergence between the true and approximate probability measures; these bounds are obtained via a certain reconstruction of vector fields in section 5.1, and in section 5.2 in terms of scalar fields of trajectory-based divergence rates for stochastic flows derived in [21]. The analysis carried out for time-marginal measures in §5.1 is extended to measures on pathspace in section 5.3. Section 6 illustrates the application of our results to a toy example of a slow-fast SDE. We close with some remarks on future work in section 7.

Figure 3. A conceptual sketch of a framework for tuning imperfect Eulerian models generating the vector fields ( in the case of (1.1)) based on available empirical data in order to minimise the loss of relevant information in the subsequent Lagrangian predictions. For simplicity, the dimension (in the spatial domain) of the true dynamics and its approximation is assumed to be the same, and the true dynamics is assumed to be deterministic; these restrictions are not necessary and the technical details are outlined in §2 and §5.

2. Setup and notations

Our starting point is to formulate the main notions and concepts needed in the construction of an information-theoretic framework for a probabilistic comparison of trajectory structure in stochastic flows and the associated uncertainty quantification and mitigation of model error in Lagrangian (i.e., trajectory-based) predictions.

2.1. Problem setup

Throughout this paper we assume that the original dynamics is defined either on or on the flat torus . We are concerned with characterising the evolution of functionals or ‘observables’, , defined on solutions of continuous-time dynamical systems generated by stochastic differential equations1 [1: We start from the Stratonovich form of the SDE rather than the Itô form, since the former is consistent with the physical limit which leads to stochastic perturbations in the deterministic dynamics (e.g., [43]).] on


and observables based on the approximation of (2.1) on , ,


Here with , and are bounded measurable vector fields and is -dimensional Brownian motion on a complete probability space ; analogous notation holds for (2.2) with instead of , instead of , etc. Note that the labelling of the coefficients indicates that the initial values are distributed according to the respective measures and and not that the coefficients depend on the measures. We assume that the respective ‘drift’ and ‘diffusion’ terms , satisfy the standard conditions (e.g., [75, 51]) for existence and uniqueness of solutions, , , of (2.1) and (2.2), respectively.
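For orientation, a minimal sketch of propagating an ensemble of initial conditions for a scalar Stratonovich SDE of this type with the stochastic Heun (predictor–corrector) scheme, which is consistent with the Stratonovich interpretation; the drift and diffusion below are placeholders, not the coefficients of (2.1) or (2.2):

```python
# Sketch: ensemble propagation for a Stratonovich SDE of the type (2.1)/(2.2)
# using the stochastic Heun (predictor-corrector) scheme, which is consistent
# with the Stratonovich interpretation.  Drift/diffusion are placeholders.
import numpy as np

def drift(t, x):                      # placeholder scalar drift b(t, x)
    return -x + np.sin(t)

def diffusion(t, x):                  # placeholder scalar diffusion sigma(t, x)
    return 0.5*np.cos(x)

def heun_ensemble(x0, t0=0.0, T=1.0, dt=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    x, t = np.array(x0, float), t0
    for _ in range(int(T/dt)):
        dW = rng.normal(0.0, np.sqrt(dt), size=x.shape)
        xp = x + drift(t, x)*dt + diffusion(t, x)*dW            # Euler predictor
        x = x + 0.5*(drift(t, x) + drift(t + dt, xp))*dt \
              + 0.5*(diffusion(t, x) + diffusion(t + dt, xp))*dW
        t += dt
    return x

x0 = np.random.default_rng(1).normal(1.0, 0.2, size=10_000)     # x0 ~ mu_0
XT = heun_ensemble(x0)
print("Monte Carlo estimate of a Lagrangian observable E[X_T^2]:", float(np.mean(XT**2)))
```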

Consider the set of all probability measures on , and let , , , , be, respectively, time-marginals of the laws of and w.r.t. that satisfy (in the weak sense) the forward Kolmogorov equations (e.g., [51, 14, 75])


where is the dual of the corresponding differential operator (aka generator) given by


where analogous notation is assumed for the generator .

We are interested in a measure-based quantification of the discrepancy between the trajectory structure of the dynamics induced by (2.1) and (2.2); in applications the dynamics in (2.1) can be considered the ‘truth’ and (2.2) its approximation. Note that this setting is very different from the previous considerations in [58, 60, 19, 59, 18, 20, 17] which focussed on uncertainty quantification in the vector fields generating the dynamics (2.1) and (2.2). An important issue when considering trajectory-based (Lagrangian) uncertainty quantification in applications is to find an appropriate measure of discrepancy between time-marginal probability measures and associated with the underlying dynamics2 [2: Probability measures on pathspaces are considered in §5.3.] One natural way to measure this discrepancy or ‘error’ is to consider a family of φ-divergences defined by (see §3.1)


where is the Radon-Nikodym derivative of 3 [3: In all considerations involving φ-divergencies, Radon-Nikodym derivatives or absolute continuity, we will denote by the time-marginal on defined on in order to simplify notation.] with respect to (restricted to ), and is a strictly convex function. does not, in general, define a metric on the space of probability measures, but it nevertheless allows one to construct a very useful framework for a probabilistic quantification of modelling error in applications. Importantly, any φ-divergence satisfies information monotonicity [31, 32, 33], which is naturally imposed by physical constraints when simplifying/coarse-graining the underlying dynamics4 [4: Information monotonicity of a divergence implies that , where and for all and for any measurable partition of .] Information-monotone divergencies are jointly convex in their arguments, and they uniquely determine (cf. [27]) a special Riemannian geometry with desirable invariance properties on the manifold of probability measures, in which a Pythagorean-like decomposition and a geodesic projection theorem play a crucial role for applications to statistical estimation (e.g., [5, 2, 4, 3]). Note, in particular, that setting φ(x) = x log x in (2.5) yields the KL-divergence [49], which is widely used in information theory and for uncertainty quantification in statistical inference (e.g., [30, 23, 56, 11, 12]). However, the suitability of the geometry induced by a given φ-divergence for uncertainty quantification depends on the particular application and on the considered submanifold of probability measures (e.g., [5, 2, 3, 31, 32, 33]); thus, a general framework cannot be restricted to any single φ-divergence. For example, the KL-divergence is not the most suitable divergence when dealing with measures whose densities are not in the exponential family; the KL-divergence is also not suitable for sensitivity analysis of rare events in stochastic dynamical models (e.g., [10, 37]). It is worth stressing that a number of other divergences (or contrast functions), including the Chernoff [28], Rényi [69], and Bregman [22] divergencies, have been extensively investigated in various contexts (information theory, statistical inference, optimisation, image processing, neural networks; e.g., [23, 12, 3, 56, 4, 31, 33]). However, these divergencies are not, in general, information monotone and are not suitable for our purposes.
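As a concrete finite-state illustration of information monotonicity (a sketch on discrete probability vectors, not the SDE setting of this paper), coarse-graining over a partition can only decrease a φ-divergence; here with the KL choice φ(x) = x log x:

```python
# Numeric check of information monotonicity on discrete measures (illustration
# only; the measures in this paper are laws of SDE solutions, not finite vectors).
import numpy as np

def phi_divergence(mu, nu, phi):
    # D_phi(mu || nu) = sum_i nu_i * phi(mu_i / nu_i), assuming nu_i > 0
    return float(np.sum(nu*phi(mu/nu)))

phi_kl = lambda x: x*np.log(x)                 # KL generator, phi(1) = 0

rng = np.random.default_rng(0)
mu, nu = rng.dirichlet(np.ones(12)), rng.dirichlet(np.ones(12))

# coarse-graining: merge states according to a fixed partition of {0, ..., 11}
partition = [range(0, 4), range(4, 9), range(9, 12)]
mu_c = np.array([mu[list(p)].sum() for p in partition])
nu_c = np.array([nu[list(p)].sum() for p in partition])

print("D_KL(mu || nu), fine:   ", phi_divergence(mu, nu, phi_kl))
print("D_KL(mu || nu), merged: ", phi_divergence(mu_c, nu_c, phi_kl))  # never larger
```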

2.2. Notation

Here, we list further definitions and notation which recur throughout the paper.

Definition 2.1 (Wiener space).

We shall fix the probability space as the classical Wiener space, i.e., , , is a linear subspace of continuous functions that take the value zero at , endowed with the uniform norm

The sigma algebra is the Borel sigma algebra generated by open subsets of and is the Wiener measure, i.e., the law on induced by the -dimensional Wiener process.
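A minimal sketch (illustration only) of sampling discretised realisations of a d-dimensional Wiener process, whose law on path space is the Wiener measure referred to above:

```python
# Sketch: discretised samples of a d-dimensional Wiener process on [0, T];
# their empirical law on path space approximates the Wiener measure described above.
import numpy as np

def wiener_paths(n_paths=5, d=2, T=1.0, n_steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps, d))
    W = np.concatenate([np.zeros((n_paths, 1, d)), np.cumsum(dW, axis=1)], axis=1)
    return np.linspace(0.0, T, n_steps + 1), W    # paths start at zero

t, W = wiener_paths()
print(W.shape)          # (5, 1001, 2): 5 sample paths, 1001 time points, d = 2
```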

Definition 2.2.


  • For the following function spaces will be used.


    • set of bounded Borel measurable functions,

    • set of non-negative Borel measurable functions,

    • set of bounded continuous functions,

    • set of continuous non-negative functions with compact supports,

    • set of smooth functions with compact supports.

  • Given the Borel measure space , we denote as the set of real-valued Lebesgue integrable functions satisfying

    Moreover, for , we define .

  • Let and and let be the Fréchet space of functions which are such that their -th derivative is -Hölder continuous with seminorms

    Here, is a compact convex subset of , and

  • Let be the Banach space of functions with the norms

  • Let be a closed interval with , where is the Lebesgue measure on .


    • denotes the set of all jointly continuous vector fields such that and .

    • denotes the set of all jointly continuous vector fields such that and .

  • Let be a matrix-valued field. The Hilbert-Schmidt (or Frobenius) norm of is denoted by and is defined by

  • Let be a non-empty Borel set. One can restrict the sets of functions defined above to ; for example, means where

3. Information measures and inequalities

In this section, we first recall the notion of information-theoretic divergencies (§3.1), and we derive an extended version of the Csiszár–Pinsker–Kullback inequality in §3.2. The resulting bounds are subsequently utilised in the framework for Lagrangian uncertainty quantification. It is worth pointing out that, although we focus on Markovian flows generated by SDEs, the results discussed below can be easily extended to more general stochastic dynamical systems.

3.1. φ-divergencies

These generalised distances over a manifold of probability measures are given by premetrics constructed from a class of strictly convex functions φ satisfying the normality conditions


Let and be two probability measures on a measurable space . Then, the φ-divergence between and is defined by5 [5: The definition of in (3.2) is closely related to that of the f-divergence due to Csiszár [31, 32, 33]. However, depending on the publication and the author, the conditions (3.1) are often not imposed and such f-divergences might not even be premetrics. Here, we disambiguate the notation by requiring that the function φ used to generate necessarily satisfies the normality conditions (3.1), thus removing the symmetries present in the general f-divergencies. We also extend this definition in (3.3).]


where is the Radon-Nikodym derivative of with respect to . In practice, verification of the absolute continuity condition in the definition (3.2) proves to be a subtle and challenging task, in particular for probability measures generated by SDEs. However, one can bypass this condition by finding a suitable dominating measure for both and . In such a case, an alternative definition for is constructed as follows: Let be any reference positive measure on such that and the φ-divergence between and is defined by


Note that definition (3.3) is independent of the reference measure used due to the uniqueness of the Radon-Nikodym derivative; in fact, this property implies invariance of the above definition w.r.t. a diffeomorphic change of variables. In general, is not symmetric and it does not satisfy the triangle inequality. However, due to Jensen’s inequality and (3.1), is information monotone (e.g., [31, 32, 33]); i.e., for any Markov kernel , we have

where , for all . Information monotonicity is naturally imposed by physical constraints when coarse-graining (see footnote 4) the underlying dynamics, and it also implies that is a premetric; i.e., and iff almost everywhere. Importantly, φ-divergences belong to a class of convex integrals which admit the following duality representation (e.g., [6]): Let be a Polish space and . Then


where is the Legendre-Fenchel conjugate of ; i.e.,


It follows immediately from the above representation that is lower semicontinuous; i.e., if converges narrowly to then


and that it is jointly strictly convex in the arguments, i.e., for and


Various well-known divergencies used in information theory, probability theory and statistics are derived from (3.2) or (3.3) with an appropriate choice of the convex function φ. Examples of divergencies (some of them proper metrics) are listed below (cf. [56]):

Examples of φ-divergencies:
  • Hellinger distance,
  • Total variation,
  • χ²-divergence.
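For concreteness, the sketch below evaluates divergences of this type for a pair of discrete distributions via the generic formula D_φ(μ‖ν) = Σ_i ν_i φ(μ_i/ν_i), using standard textbook generators φ (the exact generators and normalisations adopted in the paper may differ):

```python
# The listed divergences via D_phi(mu || nu) = sum_i nu_i*phi(mu_i/nu_i) on a
# discrete space, with standard textbook generators phi (conventions may differ).
import numpy as np

phis = {
    "squared Hellinger": lambda x: (np.sqrt(x) - 1.0)**2,
    "total variation":   lambda x: 0.5*np.abs(x - 1.0),
    "chi-squared":       lambda x: (x - 1.0)**2,
    "KL":                lambda x: x*np.log(x),
}
mu = np.array([0.5, 0.3, 0.2])
nu = np.array([0.4, 0.4, 0.2])
for name, phi in phis.items():
    print(f"{name:18s} D_phi(mu||nu) = {np.sum(nu*phi(mu/nu)):.4f}")
```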

Information-monotone divergencies uniquely determine (cf. [27]) a special Riemannian geometry on the manifold of probability measures in which a Pythagorean-like decomposition and a geodesic projection theorem play a crucial role for applications of the information-geometric framework to statistical estimation ([5, 2, 4, 3]). The suitability of the geometry induced by a given φ-divergence for uncertainty quantification depends on the particular application and on the considered submanifold of probability measures (e.g., [5, 2, 3, 31, 32, 33]). A number of other divergences, including the Chernoff [28], Rényi [69], and Bregman [22] divergencies, have been extensively investigated in various contexts (e.g., [23, 12, 3, 56, 4, 31, 33]) but they are generally not information monotone. Given that we aim to exploit these geometric properties in future work on uncertainty quantification in reduced-order models, we consider the whole family of φ-divergencies in the framework developed in the subsequent sections.

In particular, the Kullback–Leibler divergence (KL-divergence), obtained by setting φ(x) = x log x in (3.2), is of key importance in information theory and statistical estimation. The variational representation [34, 35] of the KL-divergence is given by

D_KL(μ‖ν) = sup_g { E^μ[g] − log E^ν[e^g] },

with the supremum taken over bounded measurable test functions g.
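A small numerical sanity check of this variational representation on a discrete space (an illustration only): every test function g yields a lower bound, and g = log(dμ/dν) attains the supremum.

```python
# Sketch: the variational representation of the KL-divergence on a discrete space.
# Any test function g yields a lower bound; g = log(dmu/dnu) attains the supremum.
import numpy as np

rng = np.random.default_rng(0)
mu, nu = rng.dirichlet(np.ones(6)), rng.dirichlet(np.ones(6))
kl_exact = float(np.sum(mu*np.log(mu/nu)))                  # D_KL(mu || nu)

def dv_functional(g):
    return float(np.sum(mu*g) - np.log(np.sum(nu*np.exp(g))))

print("exact KL:        ", kl_exact)
print("random g bound:  ", dv_functional(rng.normal(size=6)))   # <= exact
print("optimal g bound: ", dv_functional(np.log(mu/nu)))        # == exact
```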


Finally, we define an Orlicz subspace 6 [6: The set of measurable functions is a subspace of a larger Orlicz space defined by


with the Orlicz norm .] The convex conjugate in (3.5) is locally bounded, and the normality conditions (3.1) ensure that is a Young function, i.e.,

  • is lower-semicontinuous, and is not identically zero, and

  • for some .

This implies that the Orlicz subspace is well-defined and nontrivial, in the sense that . The Orlicz subspace will contain the class of observables for which the information inequality in §3.2 will be formulated.
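For orientation, a standard worked example of such a conjugate pair (stated under the usual convention for the KL generator; the normalisation adopted in (3.1) and (3.5) may shift it by an affine term):

    φ(x) = x log x,    φ*(y) = sup_x { x·y − φ(x) } = e^(y−1),

so the associated Orlicz-type norm controls exponential moments of the observable (cf. Remark 3.3 below).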

3.2. Information inequalities

Here, we derive certain bounds termed information inequalities which provide both observable- and measure-based quantification of modelling error within a unified abstract framework. These inequalities provide sharp weak-error bounds tailored to a given observable by utilising the variational formulation of (cf. (3.4)), and they extend analogous bounds developed for the KL-divergence in [36, 29, 55]. We then derive a further generalisation of the Csiszár–Kullback–Pinsker inequality to the class of φ-divergencies and to a much larger class of observables than those admissible for the KL- or χ²-divergence; namely, the bound has the form (Theorem 3.2)


where , and iff or if is constant. As discussed later in Theorem 3.2, in all the above formulas the observables are in the Orlicz subspace defined in (3.9). Furthermore, we derive a representation formula for in Proposition 3.4 which allows one to rewrite (3.10) as


where , , are such that , as . In §5.1 we develop the bound (3.11) further in the context of SDEs in order to bound in terms of differences between the vector fields generating the dynamics in (2.1) and (2.2); this step is important for bounding the error in estimates of (Lagrangian) observables explicitly in terms of the (Eulerian) vector fields generating the true and approximate underlying dynamics.
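The classical Csiszár–Kullback–Pinsker chain is the most elementary instance of a bound of this type; the sketch below checks it numerically on a discrete space (the φ-generalisation of Theorem 3.2 is sharper and admits unbounded observables, which this elementary chain does not):

```python
# Numerical check of the classical Csiszar-Kullback-Pinsker chain on a discrete
# space (the simplest special case of a bound of the type (3.10)-(3.11)):
#   |E^nu[f] - E^mu[f]|  <=  2*||f||_inf*d_TV(mu, nu)  <=  ||f||_inf*sqrt(2*D_KL(mu||nu))
import numpy as np

rng = np.random.default_rng(0)
mu, nu = rng.dirichlet(np.ones(8)), rng.dirichlet(np.ones(8))
f = rng.uniform(-1.0, 1.0, size=8)                  # a bounded observable

gap  = abs(np.sum(nu*f) - np.sum(mu*f))
tv   = 0.5*np.sum(np.abs(mu - nu))
kl   = np.sum(mu*np.log(mu/nu))
fmax = np.max(np.abs(f))
print(f"{gap:.4f} <= {2*fmax*tv:.4f} <= {fmax*np.sqrt(2*kl):.4f}")
```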

Remark 3.1.


  • An information-theoretic measure of finite-time average expansion rates of trajectories of non-autonomous SDEs/ODEs, known as the KL-divergence rate, was shown in [21] to be linked to the commonly used finite-time Lyapunov functionals , , used to measure the growth of perturbations in flows of ODEs; namely


    Here, is a solution of the forward Kolmogorov equation for the derivative flow, starting from of the initial perturbation. One can recover the inequality (3.12) from the information inequality (3.11) by taking . Extensions to stochastic flows were discussed in [21]. We shall return to a more general form of this idea in §5.2.

  • The information inequality (3.11) is related to the inequality proved long ago by Csiszár in [31] in terms of the total variation distance but it applies to a larger class of observables , namely . Csiszár’s result concerns the existence of depending on and with as such that


    On the other hand, the variational representation of is given by


    Combining the Csiszár inequality (3.13) with (3.14) yields


    It is relatively straightforward to verify that using (3.5) and (3.1), which implies that the information inequality (3.11) is valid for a larger class of observables. This is particularly useful when dealing with trajectory-based uncertainty quantification in pathspace, where it may not be simple to verify the information inequality (3.15) for the relevant class of . A typical example is the case of the χ²-distance, obtained for φ(x) = (x−1)², which leads to the Chapman-Robbins bound (e.g., [52])

    which is more useful when with , and need not be in .
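As a numerical illustration of the χ²-type bound just mentioned, stated here in its elementary Cauchy–Schwarz form on a discrete space (the pathspace version discussed above requires more care):

```python
# Sketch of a Chapman-Robbins-type bound on a discrete space:
#   |E^nu[f] - E^mu[f]| <= sqrt( Var_mu(f) * chi2(nu || mu) ),
# which follows from Cauchy-Schwarz and needs only a finite mu-variance of f.
import numpy as np

rng = np.random.default_rng(1)
mu, nu = rng.dirichlet(np.ones(10)), rng.dirichlet(np.ones(10))
f = rng.normal(size=10)                              # f need not be bounded

gap    = abs(np.sum(nu*f) - np.sum(mu*f))
var_mu = float(np.sum(mu*f**2) - np.sum(mu*f)**2)
chi2   = float(np.sum(mu*(nu/mu - 1.0)**2))          # phi(x) = (x-1)^2, of nu w.r.t. mu
print(f"{gap:.4f} <= {np.sqrt(var_mu*chi2):.4f}")
```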

Theorem 3.2.

Let be a pair of probability measures on a Polish space with , where is strictly convex, satisfying (3.1), and is twice continuously differentiable. Then, for any there exist such that






Proof: See Appendix A.1.

Remark 3.3.

The above result generalises the ‘goal-oriented information inequality’ for the KL-divergence, developed in [29, 36, 55], to the class of all information-monotone divergences. Note that, when φ generates the KL-divergence, the Orlicz subspace is simply the class of all cumulant generating functions (aka logarithmic moment generating functions). The results in [29, 36, 55] are based on the regularity of cumulant generating functions. Our generalisation relies on a convex-analytic approach under the normality conditions (3.1) imposed on or .

Proposition 3.4 (Representation formula for ).

Given the bound (3.16) and the assumptions of Theorem 3.2, consider the convex function defined by


  • Let . If there exists such that for all , then

    where is given by , and is the Fenchel-Legendre conjugate of defined by

    Similarly, admits the representation