# A Geometric Framework for Stochastic Shape Analysis

We introduce a stochastic model of diffeomorphisms, whose action on a variety of data types descends to stochastic models of shapes, images and landmarks. The stochasticity is introduced in the vector field which transports the data in the Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework for shape analysis and image registration. The stochasticity thereby models errors or uncertainties of the flow in following the prescribed deformation velocity. The approach is illustrated in the example of finite dimensional landmark manifolds, whose stochastic evolution is studied both via the Fokker-Planck equation and by numerical simulations. We derive two approaches for inferring parameters of the stochastic model from landmark configurations observed at discrete time points. The first of the two approaches matches moments of the Fokker-Planck equation to sample moments of the data, while the second approach employs an Expectation-Maximisation based algorithm using a Monte Carlo bridge sampling scheme to optimise the data likelihood. We derive and numerically test the ability of the two approaches to infer the spatial correlation length of the underlying noise.

## Authors

• 13 publications
• 5 publications
• 33 publications
12/16/2016

### A Stochastic Large Deformation Model for Computational Anatomy

In the study of shapes of human organs using computational anatomy, vari...
02/03/2020

### Diffusion bridges for stochastic Hamiltonian systems with applications to shape analysis

Stochastically evolving geometric systems are studied in geometric mecha...
05/15/2018

### String Methods for Stochastic Image and Shape Matching

Matching of images and analysis of shape differences is traditionally pu...
05/31/2017

### Bridge Simulation and Metric Estimation on Landmark Manifolds

We present an inference algorithm and connected Monte Carlo based estima...
12/13/2018

### Stochastic Image Deformation in Frequency Domain and Parameter Estimation using Moment Evolutions

Modelling deformation of anatomical objects observed in medical images c...
09/19/2018

### Nonisometric Surface Registration via Conformal Laplace-Beltrami Basis Pursuit

Surface registration is one of the most fundamental problems in geometry...
11/20/2017

### Stochastic metamorphosis with template uncertainties

In this paper, we investigate two stochastic perturbations of the metamo...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1. Introduction

In this work, we aim at modelling variability of shapes using a theory of stochastic perturbations consistent with the action of the diffeomorphism group underlying the Large Deformation Diffeomorphic Metric Mapping framework (LDDMM, see [younes_shapes_2010]). In applications, such variability arise and can be observed, for example, when human organs are influenced by disease processes, as analysed in computational anatomy [younes_evolutions_2009]

. Spatially independent white noise contains insufficient information to describe these large-scale variabilities of shapes. In addition, the coupling of the spatial correlations of the noise must be adapted to a variety of transformation properties of the shape spaces. The theory developed here addresses this problem by introducing spatially correlated transport noise which respects the geometric structure of the data. This method provides a new way of characterising stochastic variability of shapes using spatially correlated noise in the context of the standard LDDMM framework.

We will show that this specific type of noise can be used for all the data structures to which the LDDMM framework applies. The LDDMM theory was initiated by [trouve_infinite_1995, christensen_deformable_1996, dupuis_variational_1998, MiTrYo2002, beg2005computing] based on the pattern theory of [grenander_general_1994]

. LDDMM models the dynamics of shapes by the action of diffeomorphisms (smooth invertible transformations) on shape spaces. It gives a unified approach to shape modelling and shape analysis that is valid for a range of structures such as landmarks, curves, surfaces, images, densities or even tensor-valued images. For any such data structure, the optimal shape deformations are described via the Euler-Poincaré equation of the diffeomorphism group, usually referred to as the EPDiff equation

[holm1998euler, HoMa2004, younes_evolutions_2009]. In this work, we will show how to obtain a stochastic EPDiff equation valid for any data structure, and in particular for the finite dimensional spaces of landmarks. For this, we will follow the LDDMM derivation in [bruveris2011momentum] based on geometric mechanics [marsden_introduction_1999, holm_geometric_2011]. This view is based on the existence of momentum maps, which are characterized by the transformation properties of the data structures for images and shapes. These momentum maps persist in the process of introducing noise into the EPDiff equation, and they thereby preserve most of the technology developed for shape analysis in the deterministic context and in computational anatomy.

This work is not the first to consider stochastic evolutions in LDDMM. Indeed, [TrVi2012, vialard2013extension] and more recently [marsland2016langevin] have already investigated the possibility of stochastic perturbations of landmark dynamics. In these works, the noise is introduced into the momentum equation, as though it was an external random force acting on each landmark independently. In [marsland2016langevin], an extra dissipative force was added to balance the energy input from the noise and to make the dynamics correspond to a certain type of heat bath used in statistical physics. [staneva_learning_2017, sommer_bridge_2017]

considered evolutions on the landmark manifold with stochastic parts being Brownian motion with respect to a Riemannian metric and estimated parameters of the models from observed data. Here, we will introduce Eulerian noise directly into the reconstruction relation used to find the deformation flows from the velocity fields, which are solutions of the EPDiff equation

[HoMa2004, younes_shapes_2010]. As we will see, this derivation of stochastic models is compatible with variational principles, preserves the momentum map structure and yields a stochastic EPDiff equation with a novel type of multiplicative noise, depending on the gradient of the solution, as well as its magnitude. This model is based on the previous works [holm2015variational, arnaudon2016noise], where, respectively, stochastic perturbations of infinite and finite dimensional mechanical systems were considered. The Eulerian nature of the noise discussed here implies that the noise correlation depends on the image position and not, as for example in [TrVi2012, marsland2016langevin], on the landmarks themselves. Consequently, the present method for the introduction of noise is compatible with any data structure, for any choice of its spatial correlation. We also mention the conference paper [arnaudon2016stochastic2] in which the basic theory underlying the present work was applied to shape transformations of the corpus callosum. We discuss possibilities for including Lagrangian noise advected with the flow in contrast to the present Eulerian case, and possibilities for including non-stationary correlation statistics that responds to the evolution of advected quantities, in the conclusion of the paper.

To illustrate this framework and give an immediate demonstration of stochastic landmark dynamics, we display in Figure 1 three experiments which compare the proposed model with a stochastic forcing model, of the type studied in [TrVi2012]. The proposed model introduces the following stochastic Hamiltonian system for the positions of the landmarks, , and their canonically conjugate momenta, ,

 (1.1)

In (1.1), the are prescribed functions of space which represent the spatial correlations of the noise. In Figure 1, the

fields are Gaussians whose variance is equal to twice their separation distance and locations are indicated by black dots. We compare this model with the system,

 dqαi (1.2)

where is a constant. In this case, the noise corresponds to a stochastic force acting on the landmarks, whose corresponding Brownian motion is different for each landmark. We show on the first panel of Figure 1 that for a small number of landmarks and a large range of spatial correlations of the noise, both types of stochastic deformations in (1.1) and (1.2) visually coincide. This is shown for a simple experiment in translating a circle (from the black circle to the black dashed circle). By doubling the number of landmarks (middle panel), the dynamics of (1.2) results in small-scale noise correlation (magenta), whereas the proposed model (blue) remains equivalent to the first experiment. This figure illustrates shape evolution when the noise is Eulerian and independent of the data structure. Indeed, the limit of a large number of landmarks corresponds to a certain continuum limit, in this case corresponding to curve dynamics. Finally, in the right-most panel, we reduce the range of the spatial correlation of the noise by adding more noise fields. This arrangement allows us to qualitatively reproduce the dynamics of the equation (1.2) with the same number of landmarks as the amount of noise and its spatial correlation is similar in both cases. Indeed, the spatial correlations are dictated by the Eulerian functions defined in fixed space for our model, and by the density of landmarks in the stochastically forced landmark model.

Modelling large-scale shape variability with noise is of interest for applications in computational anatomy, in which sources of variability include natural ageing, the influence of diseases such as Alzheimer’s disease, and intra-subject population scale variations. In the LDDMM context, these effects are sometimes modelled using the random orbit model [miller_statistical_1997]. The random orbit approach models variability in the observed data by using an ensemble of initial velocities in matching a template to a set of observations via geodesic flows, see [vaillant_statistics_2004]. The randomness is confined to the initial velocity as opposed to the evolving stochastic processes used in the present work. A prior can be defined by assuming a distribution of the initial velocities, and Bayesian approaches can then be used for inference of the template shape as well as additional unknown parameters [allassonniere_towards_2007, ma_bayesian_2008, zhang_bayesian_2013]. The stochastic model developed here can also be applied to model random warps and to generate distributions used in Bayesian shape modelling, and for coupling warps and functional variations such as those in [raket2014nonlinear, kuhnel_most_2017]. Indeed, because the proposed probabilistic approach assigns a likelihood to random deformations, the model can be used for general likelihood-based inference tasks.

In the present model, the observed shape variability indicates the required spatial correlation of the noise, which must be specified or inferred for each application. As this correlation is generally unknown, estimating the parameters of the correlation structure becomes an important part of the framework. We will address the problem of inferring the noise parameters by considering two different methods in the context of the representation of shapes by landmarks: The first method is based on estimating the time evolution of the probability distribution of each landmark. We will derive a set of differential equations approximating the time evolution of the complete distribution via its first moments. We can then solve the inverse problem of estimating the noise correlation from known initial and final distribution of landmarks by minimization of a certain cost function, solved using a genetic algorithm. The second method is based on an Expectation-Maximisation (EM) algorithm which can infer unknown parameters for a parametric statistical model from observed data. In this context, since only initial and final landmarks positions are observed, the full stochastic trajectories are regarded as missing information. For this algorithm, we need to estimate the likelihood of stochastic paths connecting sets of observed landmarks. We achieve this by adapting the theory of diffusion bridges to the stochastic landmark equation. As discussed in the concluding remarks, inference methods for other data structures, in particular for infinite dimensional shape representations, are not treated in this paper and left as outstanding problems for future work.

Finally, we wish to mention that multiple additional approaches for shapes analysis exist outside the LDDMM context, particularly exemplified by the Kendall shape spaces [kendall1984shape], see also [dryden2016statistical]. We focus this paper on the LDDMM framework leaving possibilities for extending the presented methods to include stochastic dynamics and noise inference in other shape analysis approaches to future work.

### Plan of this work

We begin by developing a general theory of stochastic perturbations for inexact matching in section 2. We then focus on exact landmark matching in section 3, which is the simplest example of this theory. In particular, we derive the Fokker-Planck equation in section 3.2 and diffusion bridge simulation in section 3.3. In section 4, we describe the two methods we use for estimating parameters of the noise from observations. The Fokker-Planck based method is discussed in section 4.2 and the Expectation-Maximisation algorithm is treated in section 4.3. We end the paper with numerical examples in section 5, in which we investigate the effect of the noise on landmark dynamics and compare the two methods for estimating the noise amplitude.

## 2. Stochastic Large Deformation Matching

In this section, we will first review the geometrical framework of LDDMM, following [bruveris2011momentum], and then introduce noise following [holm2015variational] to preserve the geometrical structure of LDDMM. The key ingredient for both topics is the momentum map, which we will use as the main tool for reducing the infinite dimensional equation on the diffeomorphism group to equations on shape spaces.

### 2.1. The Deterministic LDDMM Model

Here, we will briefly review the theory of reduction by symmetry, as applied to the LDDMM context, following the presentation of [bruveris2011momentum]. We detail the proof of the formulas below in the next section when we include noise. Define an energy functional by

 E(ut)=∫10l(ut)dt+12λ2∥g1.I0−I1∥2, (2.1)

where are shapes represented in a vector space on which the diffeomorphism group acts, is a time-dependent vector field, and is a weight, or tolerance, which allows the matching to be inexact. The flow corresponding to is found by solving the reconstruction relation

 ∂tgt=utgt, (2.2)

and is matched against through the action of on . The vector field can be considered an element of the Lie algebra . In the case of being images , the action is by push-forward, , and when represents landmarks with positions , the action is by evaluation (see [bruveris2011momentum] for more details). The group elements can act on various additional shape structures such as tensor fields.

###### Remark 2.1 (Nonlinear shape structures).

This framework can be extended to structures that are not represented by a vector space , such as curves or surfaces. We leave this extension for future work.

Using the calculus of variations for the functional (2.1) results in the equation of motion for of the form

which is called the Euler-Poincaré equation. The operation is the coadjoint action of the Lie algebra of vector fields associated to the diffeomorphism group. The operation acts on the variations , which are 1-form densities, in the dual of the Lie algebra of vector fields, under the pairing. When is a norm, this equation is the geodesic equation for that norm, in the case that ; that is, with exact matching. We will focus on this case later in section 3 when discussing landmark dynamics. Here, the inexact matching term constrains the form of the momentum to depend on the geodesic path. Following the notation of [bruveris2011momentum], the momentum map is defined as

 m(t)=−1λ2J0t⋄(gt,1(J01−J11)♭), (2.4)

where is the solution of (2.2) at time with initial conditions at time , while and . The value corresponds to the initial shape, pushed forward to time , and is the target shape.

The operations and in the momentum map formula (2.4) are defined, as follows. The Lagrangian in (2.1) may be taken as kinetic energy, which defines a scalar product and norm on the space of vector fields . The quantity may then be regarded as the momentum conjugate to the velocity . Similarly, for the image data space , we define the dual space with the pairing , where and is the image domain . This identification defines the operator as . When an element of the diffeomorphism group acts on by push-forward, , the corresponding infinitesimal action of the velocity in the Lie algebra of vector fields is given by . In terms of this infinitesimal action, we can then define the operation as

 ⟨I⋄f,u⟩g×g∗:=⟨f,u.I⟩V×V∗. (2.5)

A detailed derivation of this formula for the momentum map can be found in [bruveris2011momentum].

###### Remark 2.2 (Solving this equation).

We will just add here the important remark that the relation (2.4) introduces nonlocality into the problem, as the momentum implicitly depends on the value of the group at later times. This is exactly what is needed in order to solve the boundary value problem coming from the matching of images and . The optimal vector field can be found with a shooting method or a gradient descent algorithm on the energy functional (2.1), see [beg2005computing]. For more information about the relation of the momentum map approach of [bruveris2011momentum] to the LDDMM approach of [beg2005computing], see [bruveris2015geometry].

### 2.2. Stochastic Reduction Theory

The aim here is to introduce noise in the Euler-Poincaré equation (2.3) while preserving the momentum map (2.4); so that the noise descends to the shape spaces. Following [holm2015variational], we introduce noise in the reconstruction relation (2.2) and proceed with the theory of reduction by symmetry. We will focus on a noise described by a set of real-valued independent Wiener processes together with associated vector fields on the data domain. We will later discuss particular forms of these fields and methods for estimating unknown parameters of the fields in the context of landmark matching.

###### Remark 2.3 (Dimension of the noise).

We proceed here with a finite number of associated vector fields and finite dimensional noise while leaving possible extension to infinite dimensional noise such as done by [vialard2013extension] for later works.

We replace the reconstruction relation (2.2) by the following stochastic process

 dgt=utgtdt+J∑l=1σlgt∘dWlt, (2.6)

where denotes Stratonovich integration. That is, the Lie group trajectory is now a stochastic process. With this noise construction, the previous derivations of (2.3) and (2.4) in [bruveris2011momentum] still apply and we obtain the following result for the stochastic vector field, .

###### Proposition 2.4.

Under stochastic perturbations of the form (2.6), the momentum map (2.4) persists, and the Euler-Poincaré equation takes the form

###### Proof.

We first show that the momentum map formula (2.4) persists in the presence of noise. The key step in its computation is to prove the formula in the lemma of [bruveris2011momentum] which is given by , where is the adjoint action on the diffeomorphism group on its Lie algebra. We first compute the variations of (2.6)

 δdgt=δugdt+uδgdt+J∑l=1σlδg∘dWlt,

and then prove this formula by a direct computation

This key formula is the same as in [beg2005computing] and [bruveris2011momentum] for the deterministic case. In particular, it does not explicitly depend on the Wiener processes . This ensures that the momentum map formula (2.4) remains the same as in the deterministic case. The last step of the proof is to derive the stochastic Euler-Poincaré equation (2.7). This is done by computing the stochastic evolution of the momentum, given by

The only time dependence is in the coadjoint action, and, by the standard formula

we obtain the result

where we have used the stochastic reconstruction relation (2.6) in the form

 dgg−1=udt+N∑l=1σl∘dWtl.

In summary, this stochastic perturbation of the LDDMM framework preserves the form of momentum map (2.4), although it does affect the reconstruction relation (2.6) and the Euler-Poincaré equation (2.7). As shown in [bruveris2011momentum], various data structures fit into this framework including landmarks, images, shapes, and tensor fields. In practice, for inexact matching, a gradient descent algorithm can be used to minimise the energy functional (2.1). The noise will only appear in the evaluation of the matching cost via the reconstruction relation. The algorithm of [beg2005computing] then directly applies, provided the stochastic reconstruction relation can be integrated with enough accuracy. We will not treat the full inexact matching problem here. Instead, we will study the simpler case of exact matching, where the energy functional consists only of the kinetic term.

The exact matching problem in computational anatomy possesses many parallels with the geometric approach to classical mechanics and ideal fluid dynamics. In particular, Poincaré’s fundamental paper in 1901, which started the field of geometric mechanics in finite dimensions, has recently been generalised to the stochastic case [cruzeiro2017momentum]. In addition, the fundamental analytical properties of Euler’s fluid equations have been shown to extend to the stochastic case in [holm2017euler].

We expect that these advances in the analysis of SPDEs occurring in fluid dynamics and other parallel fields will inform computational anatomy, and eventually will apply to infinite-dimensional representations of shape. One reason for our optimism is that the fundamental analytical properties of incompressible Euler fluid dynamics in three dimensions have already been found in [holm2017euler] to persist under the introduction of the present type of stochasticity. Namely, the properties of local-in-time existence and uniqueness, as well as the Beal-Kato-Majda criterion for blow-up for the deterministic 3D Euler fluid motion equations, all persist in detail for stochastic Euler fluid motion, under the introduction of the type of stochastic Lie transport by cylindrical Stratonovich noise that we have proposed here for stochastic shape analysis.

The persistence of deterministic analytical properties in passing to the SPDEs governing stochastic 3D incompressible continuum fluid dynamics is a type of infinite-dimensional result that has not yet been proven for the evolution of shapes. The corresponding results in the analysis of SPDEs for embeddings, immersions and curves representing data structures for shape evolution, for example, have not yet been discovered, and they remain now as outstanding open problems. However, we believe that the prospects for successfully performing the necessary analysis are hopeful because the type of noise we propose here preserves the fundamental properties of diffeomorphic flow for both continuum fluids and shapes. For example, the momentum maps for the deterministic and stochastic evolution of shapes of any data structure are identical. Thus, the only difference in the present approach from the deterministic case is that the diffeomorphic time evolution of the various shape momentum maps proceeds by the action of Lie derivative by a stochastic vector field, instead of a deterministic one. Since the stochastic part of the vector field is as smooth as we wish, we are hopeful that the analytical properties for the deterministic evolution of a large class of infinite-dimensional representations of shape (such as smooth embeddings) will also persist under the introduction of the type of stochastic transport proposed here. For the remainder of the paper, we restrict ourselves to the treatment of stochastic landmark dynamics.

## 3. Exact stochastic Landmark Matching

In this section, we apply the previous ideas of stochastic deformation of LDDMM to exact matching with landmark dynamics. This is the simplest data structure in the LDDMM framework, and it will serve to give interesting insights into the effect of the noise in this context. Since exact matching means that the energy functional contains only a kinetic energy, the optimal vector field is found from a boundary value problem with the Euler-Poincaré equation (2.3). For exact matching, the singular momentum map for landmarks takes the simple familiar form for the reduction of the EPDiff equation (see [CaHo1993, HoMa2004])

 m(x,t)=N∑i=0pi(t)δ(x−qi(t)), (3.1)

for landmarks with momenta and positions , with . A direct substitution of into the stochastic Euler-Poincaré equation (2.7) gives the stochastic landmark equations in (3.6). Here, is a given kernel corresponding to the Green’s function of the differential operator used to construct the Lagrangian. Below, we take a different approach and proceed from a variational principle in which the stochastic landmark dynamics is constrained. We refer the interested reader to for example [jacobs2014higher] for a detailed exposition of this derivation in the deterministic context.

### 3.1. Stochastic Landmarks dynamics

Recall that for landmarks in , the diffeomorphism group elements act on the landmarks by evaluation of their position , and the associated momentum map is (3.1). The original action functional (2.1) can be equivalently written as a constrained variational principle where the play the role of Lagrange multipliers enforcing the stochastic reconstruction relation (2.6). This procedure is based on the Clebsch action principle, which for landmark dynamics has been studied for one dimensional motion of landmarks on the real line in [HoTy2016]

 S(u,q,p)=∬l(u)dxdt+∑i∫pi⋅(∘dqi−u(qi)dt+∑lσl(qi)∘dWlt). (3.2)

Notice that only the Lagrangian depends on the spatial (Eulerian) variable on the image domain. We now use the singular momentum map (3.1) which provides us with the relation

 2l(u)=∫m(q,p)(x)⋅u(x)dx=∑ipi⋅u(qi).

This relation reduces the action functional (3.2) to the finite dimensional space of landmarks. We arrive at the action integral

 S(q,p) =∫h(q,p)dt+∑i∫pi⋅(∘dqi+∑lσl(qi)∘dWlt), (3.3)

where the Hamiltonian only depends on the landmark variables, as

 h(q,p)=12N∑i,j=1(pi⋅pj)K(qi−qj). (3.4)

The action integral in (3.3) involves the phase space Lagrangian (3.4) and the stochastic potential, given by

 ϕl(q,p):=∑ipi⋅σl(qi). (3.5)

Taking free variations of (3.3) gives the stochastic Hamilton equations in the form

 (3.6)

Explicitly, we have

 dqi=∑jpjK(qi−qj)dt+∑lσl(qi)∘dWlt,dpi=−∑jpi⋅pj∂∂qiK(qi−qj) dt−∑l∂∂qi(pi⋅σl(qi))∘dWlt. (3.7)

In coordinates, the stochastic equations (3.6) become

 dqαi=∂h∂pαidt+∑lσαl(qi)∘dWlt,dpαi=−∂h∂qαidt−∑l,β∂σβl(qi)∂qαipβi∘dWlt, (3.8)

where run through the domain directions, .

In order to have a unique strong solution of this stochastic differential equation, we need the drift and volatility to be Lipschitz functions with a linear growth condition after converting to Itô form, and for the volatility to be uniformly bounded, see [karatzas1991brownian]. This requirement is achieved when the functions are twice continuously differentiable and uniformly bounded in the position variable. The latter property will hold with these functions being kernel functions. The particular form of the stochastic potential in (3.5) arises from the Legendre transformation of (3.2). The solutions of (3.8) represent the singular solutions of the stochastic EPDiff equation, corresponding to a stochastic path in the diffeomorphism group. In previous works such as [TrVi2012, vialard2013extension, marsland2016langevin], noise has been introduced additively and only in the momentum equation, corresponding to a stochastic force. Also, the noise has typically been taken to be different for each landmark, and one can interpret it having been attached to each landmark. In the present case, the noise is not additive and the Wiener processes are not related to the landmarks, but to the domain of the image. Nearby landmarks will thus be affected by a similar noise, controlled by the spatial correlations of the noise. We refer to Figure 1 in the Introduction for a numerical experiment demonstrating this effect.

###### Remark 3.1 (Geometric noise).

The geometric origin of the Hamiltonian stochastic equations in (3.6) deserves a bit more explanation. In the position equation (3.6), the noise arises as the infinitesimal transformation by the action of the stochastic vector field in (2.6), namely , on the manifold of positions of the landmarks, which is generated by the stochastic potentials, . Since this stochastic Hamiltonian is linear in the canonical momenta, the noise perturbing the evolution of the landmark positions is independent of the landmark momenta. On the other hand, the noise in the momentum equations arises as the cotangent lift of the action of the stochastic vector field on the positions of the landmarks. This cotangent lift determines the action on the momentum fibres attached to the perturbed position of each of the landmarks in phase space. The cotangent lift transformation is given explicitly by the product of the momentum and the gradient of the spatial fields with respect to the position of the -th landmark. This difference increases the effect of the noise in regions where the fields have large spatial gradients, provided the landmarks are moving rapidly enough for their momenta to be non-negligible. We will see in the example that in certain cases this balance in the product of the momentum and the spatial gradient of the noise parameters can significantly affect the dynamics of the landmarks.

### 3.2. The Fokker-Planck Equation

In this section, we study the evolution of the probability density function of the stochastic landmarks by using the Fokker-Planck equation. This study is possible in the case of landmarks because the associated phase space is finite dimensional.

We will denote the probability density by , on the phase space at time . The Fokker-Planck equation can be computed using standard procedures and is given in the following proposition.

###### Proposition 3.2.

The Fokker-Planck equation associated to the stochastic process (3.6) for the probability distribution is given by

 ddtP(q,p,t)={h,P}can+12∑l{ϕl,{ϕl,P}can}can:=L∗P, (3.9)

where is the canonical Poisson bracket with and are the stochastic potentials. This formula also defines the forward Kolmogorov operator, .

###### Proof.

The proof follows the standard derivation of the Fokker-Planck equation, by taking into account the geometrical structure of the stochastic process (3.6). The time evolution of an arbitrary function can be written as

 df(p,q)={f,h}candt+∑l{f,ϕl}can∘dWlt,

as both drift and volatility have the same Hamiltonian form in the Stratonovich formulation. We then compute the Itô correction of this stochastic process, which is can be written as a double Poisson bracket form; namely, . The Itô correction is the quadratic variation of the Stratonovich term in the stochastic differential equation, which equals the non-stochastic part of one half of the time derivative of the volatility (where a square Brownian motion becomes ). We refer to [arnaudon2016noise, cruzeiro2017momentum] for a more detailed derivation of this formula in a general setting. Taking the expectation of the Itô process then removes the noise term and defines the forward Kolmogorov operator such that . By pairing this formula with the density function over the phase space by using the usual pairing, as

 ∫P(q,p,t)Lf(q,p)dqdp=∫L∗P(q,p,t)f(q,p)dqdp,

we obtain the Fokker-Planck equation , which is explicitly given by (3.9) as the double bracket term is self-adjoint and the advection term anti-self-adjoint. Notice that here we have used a special property of the Poisson bracket; namely, that the Poisson bracket is also a symplectic -form, which is exact and whose integral over the whole phase space vanishes, provided we choose suitable boundary conditions. We again refer to [arnaudon2016noise, cruzeiro2017momentum] for more details about this derivation. ∎

Of course, the direct study of this equation is not possible, even numerically, because of its high dimensionality. The main use here of the Fokker-Planck equation will be to understand the time evolution of uncertainties around each landmark. Indeed, for each landmark , the corresponding marginal distribution (integrating

over all the other variables) will represent the time evolution of the error on the mean trajectory of this landmark. We will show in the next section how to approximate the Fokker-Planck equation with a finite set of ordinary differential equations which describe the dynamics of the first moments of the distribution

. This will then be used to estimate parameters of the noise fields for given sets of initial and final landmarks.

###### Remark 3.3 (On ergodicity).

The question of ergodicity of the process (3.6) is not relevant here, as we will only consider this process for finite times, usually between and . The existence of stationary measures of the Fokker-Planck equation via Hörmander’s theorem is thus not needed. Nevertheless, we will rely on a notion of reachability in the landmark position in the next section, where we will show how to sample diffusion bridges for landmarks with fixed initial and final positions. This ensures that there exists a noise realisation which can bring any set of landmarks to any other set of landmarks. This property is weaker than the Hörmander condition and was introduced in [sussmann1973orbits].

### 3.3. Diffusion Bridges

The transition probability and solution to the Fokker-Planck equation can also be estimated by Monte Carlo sampling of diffusion bridges. This approach will, in particular, be natural for maximum likelihood estimation of parameters of landmark processes using the Expectation-Maximisation (EM) algorithm that will involve expectation over unobserved landmark trajectories, or for direct optimization of the data likelihood. The EM estimation approach will be used in section 4.3. Here, we develop a theory of conditioned bridge processes for landmark dynamics which we will employ in the estimation. The approach is based on the method of [delyon2006simulation] with two main modifications. The scheme and its modifications will be detailed after a short description of the general situation. Alternative methods for simulating conditioned diffusion bridges can be found in e.g. [papaspiliopoulos_importance_2012, bladt_simulation_2016, schauer_guided_2017].

In [delyon2006simulation], a Girsanov formula [girsanov_transforming_1960], generalized to account for unbounded drifts, is used to show that when the diffusion field of an -valued diffusion process

 (3.10)

is uniformly invertible, the corresponding process conditioned on hitting a point at time is absolutely continuous with respect to an explicitly constructed unconditioned process that will hit at time a.s.. The modified process is constructed by adding an additional drift term that forces the process towards the target . In [delyon2006simulation], this process is constructed as a modification of (3.10)

 d^x=b(^x,t)dt−^x−vT−tdt+Σ(^x,t)dW. (3.11)

Letting denote the law of conditioned on hitting with corresponding expectation , the Cameron-Martin-Girsanov theorem implies that is absolutely continuous with respect to , see for example [oksendal_stochastic_2003] and the discussion in [papaspiliopoulos_importance_2012]. An explicit expression for the Radon-Nikodym derivative can be computed, and this derivative is central for using simulations of the process to compute expectations over the conditioned process . In particular, as shown in [delyon2006simulation], the conditioned process and the modified process are related by

 Ex|v(f(x))=E^x(f(^x)φ(^x))E^x(φ(^x)), (3.12)

where is a correction factor applied to each stochastic bridge . Notice here that is a real-valued function of the stochastic path from to .

Returning to landmark evolutions in the phase space , the process (3.6) has two vector variables that typically will be conditioned on hitting a fixed set of landmark positions at time . The conditioning thus happens only in the variables by requiring . To construct bridges with an approach similar to the scheme of [delyon2006simulation], we need to find an appropriate extra drift term and handle the fact that the diffusion field may not be invertible in general. Recall first that the landmark process (3.6) has diffusion field

 Σ(q,p)=(Σq(q)Σp(q,p)):=(σ1(q),…,σJ(q)−∇q(p⋅σ1(q)),…,−∇q(p⋅σJ(q))), (3.13)

where denotes the vector . Notice that this matrix is not square and has dimension so that with a -vector corresponds to the stochastic term of (3.6). Though couples the and equation, when the number of noise fields is sufficiently large, the part will often be surjective as a linear map . In this situation, by letting denote the Moore-Penrose pseudo-inverse of , we can construct a guiding drift term as

 G(q,p):=−Σ(q,p)Σq(q)†(q−v)T−t. (3.14)

This term, when added to the process (3.6) and when measures are taken to control the unbounded drift of (3.6), will ensure that the modified process hits a.s. at time . The drift term (3.14) is a direct generalization of the term added in (3.11). If had been invertible then resulting in the guiding term of [delyon2006simulation] used in equation (3.11). In the current non-invertible case, uses the difference which only involves the landmark position but affects both the position and the momentum equations. We stress here the fact that the introduction of noise in the equation in (3.6) is essential in our present approach. When conditioning on the variable, a guided process could not directly be constructed in this way, if the noise had been introduced only in the equation, as in [TrVi2012, vialard2013extension, marsland2016langevin]. The fact that this term is weighted by is also important as it allows the guiding term to be more efficient in the noisy regions of the image, where there is more freedom to deviate from the deterministic path. The guiding term can be interpreted as originating from a time-rescaled gradient flow, and with the guiding term added, the diffusion process can be seen as a stochastically perturbed gradient flow, see [arnaudon2016stochastic2].

The guiding term (3.14) is, in practice, not always appropriate for landmarks. Because the correction is dependent only on the difference to the target in the position equation, a phenomenon of over-shooting is often observed. In such cases, the landmarks travel too fast initially due to a large momentum, strengthened by the guiding term that forces the landmarks towards . The high initial speed is only corrected when the time approaches and the guiding term brings the landmark back to their final position. This effect is illustrated in Figure 4 in section 5.2 and results in low values of the correction factor used to compute the expectation in (3.12). This effect tends to produce inefficient samples when approximating (3.12) by Monte Carlo sampling. As an alternative, upon letting denote the drift term of (3.6), we employ a guided diffusion process of the form

 (d^qd^p)=b(^q,^p)dt−Σ(^q,^p)Σq(^q)†(ϕt,T(^q,^p)−v)T−tdt+Σ(^q,^p)∘dW, (3.15)

for some appropriately chosen function that gives an estimate of the value of using the value of the modified stochastic process at time . The hat denotes the solution of the process (3.15), which is different from the original dynamics of the process (3.6) written without the hats. The choice recovers the guiding term (3.14). It would be natural to define . The resulting guiding term will only be driven by the expected amount needed at the endpoint, not from the value at time . A similar choice but easier to handle is to let be the solution at time of the original deterministic landmark dynamics (2.3), obtained from the initial conditions . We will use this latter choice with a modification to ensure its time derivative is bounded. The approach is visualised in Figure 4. To ensure convergence of for , a bounded approximation will be chosen to replace the original unbounded drift in (3.15). As it turns out, this choice has little influence in practice.

The matrix in (3.15) only accounts for the dynamics in the pseudo-inverse . When the momentum is high and the noise fields have high gradients, this fact can again lead to improbable sample paths. In such cases, the scheme can be further generalised by using a guiding term of the form

 1T−tΣ(^q,^p)(Dh(ϕt,T(Σ(^q,^p)h))|h=0)†(ϕt,T(^q,^p)−v). (3.16)

The matrix is a linear approximation of the expected endpoint dynamics as a function of the noise vector . Again, with , the original guiding term (3.14) is recovered, and the term is close to the guiding term of (3.15) when the momentum or gradients of are low. We use this term for the experiments in section 5.2 involving high momentum dynamics, e.g. Figure 6.

The following result is an extension of [delyon2006simulation, Theorem 5] and [marchand_conditioning_2011, Theorem 3] to the modified guided SDE (3.15). It is the basis for the EM approach for estimating the parameters of the landmark processes developed in section 4.3. Please note that the Girsanov theorem [delyon2006simulation, Thm. 1] which relates the modified and original process, does not assume that is invertible. The main analytic consequence of the non-invertibility is that the process is semi-elliptic and the transition density, therefore, cannot be bounded by Aronson’s estimation [aronson_bounds_1967]. Instead, we here assume continuity and boundedness of the density of in small intervals of in the sense of the assumption below. We write for the transition density at time of a solution to (3.6) started at . Similarly, when conditioning only on , we write .

###### Assumption 1.

For any and , the process has a density and the map is continuous in and and bounded on sets for , sufficiently small , and any integrable function .

The interpretation of Assumption 1 is that, given any distribution of initial conditions with density , the resulting -transition density of the process is continuous and bounded in and . As shown in Lemma A.2, Assumption 1 can be slightly weakened if Theorem 3.4 is only used to approximate the transition density at time as opposed to expectations for arbitrary measurable functions .

We let denote the Wiener space of continuous paths .

###### Theorem 3.4.

Assume is surjective for all with bounded, and that is , bounded, and with bounded derivatives. Let be a bounded approximation of the -part of the drift , and set . Let be a point with positive, and let be the law of . Let be solution to (3.15), with a map with bounded on . Then, for positive measurable ,

 E(q,p)|v[f(q,p)]=limt→TE(^q,^p)[f(^q,^p)φ(^q,^p,t)]E(^q,^p)[φ(^q,^p,t)], (3.17)

with

 logφ(q,p,t)=−∫t0(q−v)TA(q)~b(q,p)dsT−s −∫t0(q−v)T(dA(q))(q−v)2(T−s)−∑i,j∫t0d[Aij(q),(q−v)i(q−v)j)]T−s +∫t0(bq(q,p)−~bq(q,p))TΣq(q)†,TdW−12∫t0∥Σq(q)†(bq(q,p)−~bq(q,p))∥2ds +∫t0(φt,T(q,p)−q)TΣq(q)†,TdWT−t−12∫t0∥∥∥Σq(q)†(φt,T(q,p)−q)T−t∥∥∥2ds,

 (3.18)

In the Theorem, denotes the quadratic variation of semimartingales. As mentioned above, a bounded approximation must be used to replace the original drift term in (3.15). The last integrals in the expression for are results of this approximation, and the use of the map .

The result is proved in Appendix A. If had been invertible and if the guidance scheme (3.11) was used, the result of [delyon2006simulation] would imply that the right hand side limit of (3.17) would equal

Extending the convergence argument to the present non-invertible case is non-trivial, and we postpone investigating this to future work. For numerical computations, can be approximated by finite differences. As described later in the paper, we do this using a framework that allows symbolic evaluation of gradients and thus subsequent optimization for parameters of the processes.

## 4. Estimating the Spatial Correlation of the Noise

We now assume a set of observed landmark configurations at time , i.e. the observations are considered realisations of the stochastic process at some positive time . From this data, we aim at inferring parameters of the model. This can be both parameters of the noise fields and parameters for the initial configuration . The initial configuration can be deterministic with fixed known or unknown parameters, or it can be randomly chosen from a distribution with known or unknown parameters. We develop two different strategies for performing the inference. The first inference method in section 4.2 is a shooting method based on solving the evolution of the first moments of the probability distribution of the landmark positions while the second method in section 4.3 is based on the Expectation-Maximisation (EM) algorithm. The discussion is here in the context of landmarks, although these ideas may also apply in the more general context of section 2.

### 4.1. The Noise Fields

We start by discussing the form of the unknown noise fields . To estimate them from a finite amount of observed data, we are forced to require the fields to be specified by a finite number of parameters. A possible choice for a family of noise fields is to select linearly independent elements from a dense subset of . We here use a kernel with length-scale and a noise amplitude , that is

 σαl(qi)=λαlkrl(∥qi−δl∥), (4.1)

where denotes the kernel positions. Possible choices for the kernel include Gaussians , or cubic B-splines . The Gaussian kernel has the advantage of simplifying calculations of the moment equations whereas the B-spline representation is compactly supported and gives a partition of unity when used in a regular grid. Other interesting choices may include a cosine or a polynomial basis of the image domain.

In principle, the methods below allow all parameters of the noise fields to be estimated given sufficient amount of data. However, for simplicity, we will fix the length-scale and the position of the kernels. The unknown parameters for the noise can then be specified in a single vector variable . The aim of the next sections will be to estimate this vector, possibly in addition to the initial configuration , from data using the method of moments in 4.2 and EM in 4.3, respectively.

###### Remark 4.1.

For the bridge simulation scheme, we required to be surjective as a linear map . This assumption can be satisfied when the number of landmarks is low relative to the number of noise fields having non-zero support in the area where the landmarks reside. On the other hand, if the number of landmarks is increased while the number of noise fields is fixed, the assumption eventually cannot be satisfied. Intuitively, in such cases, the extra drift added to the bridge SDE must guide through a nonlinear submanifold of the phase space to ensure the landmarks will hit the target point exactly. This limitation can be handled in three ways: (1) The method of moments as described below avoids matching individual point configurations, and it can, therefore, be used in situations where the surjectivity condition is not satisfied. (2) As discussed in remark 2.3, the noise can be made infinite dimensional. This can be done while keeping correlation structure similarly to the case with finite . See also [arnaudon2016stochastic2] for a discussion of noise in the form of a Gaussian process. (3) The bridge matching can be made inexact mimicking the inexact matching pursued in deterministic LDDMM. This could potentially relax the requirements on the extra drift term to only ensure convergence towards a given distance of the target. Inexact observations of stochastic processes are for example treated in [van_der_meulen_continuous-discrete_2017].

### 4.2. Method of moments

We describe here our first method for estimating the parameters by solving a shooting problem on the space of first and second order moments. Given an estimate of the endpoint distributions , we will solve the inverse problem which consists in using the Fokker-Planck equation (3.9) to find the values of such that we can reproduce the observed final distribution. Solving the Fokker-Planck equation directly is infeasible due to its high dimensionality. Instead, we will derive a set of finite dimensional equations approximating the solution of the Fokker-Planck equation (3.9) for the probability distribution in terms of its first moments. This approach has been developed in the field of plasma physics for the Liouville equation, which is similar to the Fokker-Planck equation (3.9).

###### Remark 4.2 (Geometric moment equation).

As the Fokker-Planck (3.9) is written in term of the canonical bracket, we could expect to be able to apply a geometrical version of the method of moments such as the one developed by [holm2007geometric]. Although this method seems to fit the present geometric derivation of the stochastic equations, we will not use it as it is not in our case practically useful. Indeed, it requires the expansions of the Hamiltonian functions in term of the moments, but we cannot obtain here a valid expansion with a finite number of terms. This is due to the fact that the LDDMM kernel and the noise kernels cannot generally be globally approximated by finite polynomials with bounded approximation error for large distances. This would, in turn, produce spurious strong interactions between distant landmarks.

The method for approximating the Fokker-Planck that we will use here is the following. We first define the moments

 :=∫qαiPθ(q,p,t)dqdp (4.2) :=∫qαipβjPθ(q,p,t)dqdp, (4.3)

where we have written only two possible moments, although any combinations of and at any order are possible. In this work we will only consider moments up to the second order, that is the moments and . Notice that the first moment are -tensors, and the second moments are -tensors, although we will only use index notation here.

We illustrate this method with the first moment , which represents the mean position of the landmarks. We compute its time derivative and use the property of the Kolmogorov operator defined in (3.9) to obtain

 (4.4)

We thus first need to apply the Kolmogorov operator to to obtain

 Lqαi=−{h,qαi}can+12∑l{ϕl,{ϕl,qαi}can}can=∂h∂pαi+12∂σαl(qi)∂qβiσβl(qi), (4.5)

which corresponds to the part of the drift of the stochastic process with Itô correction. Similarly, for the momentum evolution, we obtain

 Lpαi=−{h,pαi}can+12∑l{ϕl,{ϕl,pαi}can}can=−∂h∂qαi+12pγi∂σγl(qi)∂qβi∂σβl(qi)∂qαi−12pβi∂2σβl(qi)∂qαi∂qγiσγl(qi). (4.6)

Most of the terms on the right hand side of (4.5) and (4.6) are nonlinear; so their expected value cannot be written in terms of only the first moments. This is the usual closure problem of moment equations, such as the BBGKY problem arising in many-body problems in quantum mechanics. The solution to this problem is to truncate the hierarchy of moments for a given order and consider the system of ODEs as an approximation of the complete Fokker-Planck solution. Here we will apply the so-called cluster expansion method described in [kira2011semiconductor]. We refer to the Appendix B.1 for more details about this method.

Apart from the first approximation , the next order of approximation is to keep track of the correlations

 (4.7)

This quantity is also called a centred second moment as for it corresponds to the covariance of the probability distribution for the landmark . In general, it corresponds to the correlation between the positions of two landmarks. The dynamical equation for this correlation is found from the equation of the second moment, which gives

 =∑l\Braketσαl(qi)σβl(qj)+\Braketqαi∂h∂pβj−\Braketqαi\Braket∂h∂pβj +12∑l⎛⎝\Braketqαiσγl(qj)∂σβl(qj)∂qγj−\Braketqαi\Braketσγl(qj)∂σβl(qj)∂qγ