Log In Sign Up

Computational Anatomy in Theano

by   Line Kuhnel, et al.
Københavns Uni

To model deformation of anatomical shapes, non-linear statistics are required to take into account the non-linear structure of the data space. Computer implementations of non-linear statistics and differential geometry algorithms often lead to long and complex code sequences. The aim of the paper is to show how the Theano framework can be used for simple and concise implementation of complex differential geometry algorithms while being able to handle complex and high-dimensional data structures. We show how the Theano framework meets both of these requirements. The framework provides a symbolic language that allows mathematical equations to be directly translated into Theano code, and it is able to perform both fast CPU and GPU computations on high-dimensional data. We show how different concepts from non-linear statistics and differential geometry can be implemented in Theano, and give examples of the implemented theory visualized on landmark representations of Corpus Callosum shapes.


page 1

page 2

page 3

page 4


A slice tour for finding hollowness in high-dimensional data

Taking projections of high-dimensional data is a common analytical and v...

Differential geometry and stochastic dynamics with deep learning numerics

In this paper, we demonstrate how deterministic and stochastic dynamics ...

Robust Non-linear Wiener-Granger Causality For Large High-dimensional Data

Wiener-Granger causality is a widely used framework of causal analysis f...

General non-linear Bellman equations

We consider a general class of non-linear Bellman equations. These open ...

Stochastic Development Regression on Non-Linear Manifolds

We introduce a regression model for data on non-linear manifolds. The mo...

Intrinsic Universal Measurements of Non-linear Embeddings

A basic problem in machine learning is to find a mapping f from a low di...

Sundials/ML: Connecting OCaml to the Sundials Numeric Solvers

This paper describes the design and implementation of a comprehensive OC...

1 Introduction

Euclidean statistical methods can generally not be used to analyse anatomical shapes because of the non-linearity of shape data spaces. Taking into account non-linearity and curvature of the data space in statistical analysis often requires implementation of concepts from differential geometry.

Numerical implementation of even simple concepts in differential geometry is often a complex task requiring manual implementation of long and complicated expressions involving high-order derivatives. We propose to use the Theano framework in Python to make implementation of differential geometry and non-linear statistics algorithms a simpler task. One of the main advantages of Theano is that it can perform symbolic calculations and take symbolic derivatives of even complex constructs such as symbolic integrators. As a consequence, mathematical equations can almost directly be translated into Theano code. For more information on the Theano framework, see [10].

Even though Theano make use of symbolic calculations, it is still able to perform fast computations on high-dimensional data. A main reason why Theano can handle complicated data is the opportunity to use both CPU and GPU for calculations. As an example, Fig. 1 shows matching of landmarks on two different ellipsoids performed on a -dimensional landmark manifold. The matching code was implemented symbolically using no explicit GPU code.

The paper will discuss multiple concepts in differential geometry and non-linear statistics relevant to computational anatomy and provide corresponding examples of Theano implementations. We start by considering simple theoretical concepts and then move to more complex constructions from sub-Riemannian geometry on fiber bundles. Examples of the implemented theory will be shown for landmark representations of Corpus Callosum shapes using the Riemannian manifold structure on the landmark space defined in the Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework.

Figure 1: Matching of landmarks on two ellipsoids. Only the matching curve for landmarks have been plotted to make the plot interpretable. The GPU computation is automatic in Theano and no explicit GPU code is used for the implementation.

The presented Theano code is available in the Theano Geometry repository that includes Theano implementations of additional differential geometry, Lie group, and non-linear statistics algorithms. The described implementations are not specific to the LDDMM landmark manifold used for examples here. The code is completely general and can be directly applied to analysis of data modelled in spaces with different non-linear structures. For more examples of Theano implementation of algorithms directly targeting landmark dynamics, see [1, 2].

The paper is structured as follows. Section 1.1 gives a short introduction to the LDDMM manifold. Section 2 concerns Theano implementation of geodesics as solution to Hamilton’s equations. In Section 3

, we use Christoffel symbols to define and implement parallel transport of tangent vectors. In Section


, the Frechét mean algorithm is considered, while stochastics, Brownian motions, and normal distributions are described in Section

5. Section 6

gives an example of calculating sample mean and covariance by estimating the Frechét mean on the frame bundle. The paper ends with concluding remarks.

1.1 Background

The implemented theory is applied to data on a landmark manifold defined in the LDDMM framework [11]. More specifically, we will exemplify the theoretical concepts with landmark representations of Corpus Callosum (CC) shapes.

Consider a landmark manifold, , with elements as illustrated in Fig. 2. In the LDDMM framework, deformation of shapes are modelled as a flow of diffeomorphisms. Let denote a Reproducing Kernel Hilbert Space (RKHS) of vector fields and let be the reproducing kernel, i.e. a vector field satisfies for all with . Deformation of shapes in are then modelled by flows

of diffeomorphisms acting on the landmarks. The flow solves the ordinary differential equation

, for . With suitable conditions on , the norm on defines a right-invariant metric on the diffeomorphism group that descends to a Riemannian structure on . The induced cometric takes the form


where for . The coordinate matrix of the cometric is which results in the metric having coordinates .

In the examples, we use landmarks representing the CC shape outlines, and the kernel used is a Gaussian kernel defined by

with variance parameter

set to the average distance between landmarks in the CC data. Samples of CC outlines are shown in the right plot of Fig. 2.

Figure 2: (left) An example of a point in . (right) A subset of the data considered in the examples of this paper. The black curve represents the mean CC of the data.

2 Geodesics

Geodesics on can be obtained as the solution to Hamilton’s equations used in Hamiltonian mechanics to describe the change in position and momentum of a particle in a physical system. Let be a chart on and assume is a Riemannian manifold. The Hamiltonian describes the total amount of energy in the physical system. From the cometric , the Hamiltonian can be defined as , where is the component matrix of at . Hamilton’s equations are given as the system of ordinary differential equations

Using the symbolic derivative feature of Theano, the system of ODE’s can be represented and discretely integrated with the following code snippet:

Hamiltonian function and equations
# Hamiltonian function:
H = lambda q,p: 0.5*,,p))
# Hamiltonian equations:
dq = lambda q,p: T.grad(H(q,p),p)
dp = lambda q,p: -T.grad(H(q,p),q)
def ode_Ham@(t,x):
    dqt = dq(x[0],x[1])
    dpt = dp(x[0],x[1])
    return T.stack((dqt,dpt))
# Geodesic:
Exp = lambda q,v: integrate(ode_Ham,T.stack((q,gMflat(v))))

where gMflat is the map turning tangent vectors in to elements in . integrate denotes a function that integrates the ODE by finite time discretization. For the examples considered here, we use a simple Euler integration method. Higher-order integrators are available in the implemented repository mentioned in Section 1. A great advantage of Theano is that such integrators can be implemented symbolically as done below using a symbolic for-loop specified with theano.scan. The actual numerical scheme is only available after asking Theano to compile the function.

Numerical Integration Method
def integrator@(ode_f):
    def euler@(*y):
        t = y[-2]
        x = y[-1]
        return (t+dt,x+dt*ode_f(*y))
    return euler
def integrate@(ode,x):
    (cout, updates) = theano.scan(fn=integrator(ode),
            outputs_info=[x],sequences=[*y], n_steps=n_steps)
    return cout

In the above, integrator specifies the chosen integration method, in this example the Euler method. As the integrate function is a symbolic Theano function, symbolic derivatives can be obtained for the integrator, allowing e.g. gradient based optimization for the initial conditions of the ODE.

An example of a geodesic found as the solution to Hamilton’s equations is visualized in the right plot of Fig. 3. The initial point was set to the average CC for the data shown in Fig. 2 and the initial tangent vector was given as the tangent vector plotted in Fig. 3.

Figure 3: (left) The initial point and tangent vector for the geodesic. (right) A geodesic obtained as solution to Hamilton’s equations.

The exponential map, , is defined as , where , is a geodesic with . The inverse of the exponential map is called the logarithm map, denoted . Given two points , the logarithm map retuns the tangent vector that results in the minimal geodesic from to , i.e. satisfies . The logarithm map can be implemented using derivative based optimization by taking a symbolic derivative of the exponential map, Exp, implemented above:

Logarithm map
# Loss function for landmarks:
loss = lambda v,q1,q2: 1./d*T.sum(T.sqr(Exp(q1,v)-q2))
dloss = lambda v,q1,q2: T.grad(loss(v,q1,q2),v)
# Logarithm map: (v0 initial guess)
Log = minimize(loss, v0, jac=dloss, args=(q1,q2))

The use of the derivative features provided in Theano to take symbolic derivatives of a discrete integrator makes the implementation of the logarithm map extremely simple. The actual compiled code internally in Theano corresponds to a discrete backwards integration of the adjoint of the Hamiltonian system. An example of matching shapes by the logarithm map was shown in Fig. 1. Here two ellipsoids of landmarks were matched by applying the above Log function.

3 Christoffel Symbols

We here describe how Christoffel symbols can be computed and used in the Theano framework. A connection defines links between tangent spaces on and describes how tangent vectors for different tangent spaces relate. Let denote a coordinate chart on with basis coordinates , . The connection is uniquely described by its Christoffel symbols, , defined as .

An example of a frequently used connection is the Levi-Civita connection for Riemannian manifolds. Based on the metric on , the Levi-Civita Christoffel symbols are found by


The Theano implementation below of the Christoffel symbols directly translates into code:

Christoffel Symbols
## Cometric:
gsharp = lambda q: T.nlinalg.matrix_inverse(g(q))
## Derivative of metric:
Dg = lambda q: T.jacobian(g(q).flatten(),q).reshape((d,d,d))
## Christoffel symbols:
Gamma_g = lambda q: 0.5*(T.tensordot(gsharp(q),Dg(q),axes = [1,0])\
     + T.tensordot(gsharp(q),Dg(q),axes = [1,0]).dimshuffle(0,2,1)\
     - T.tensordot(gsharp(q),Dg(q),axes = [1,2]))

The connection, , and Christoffel symbols, , can be used to define parallel transport of tangent vectors on . Let be a curve and let . A vector field is said to be parallel along if the covariant derivative of along is zero, i.e. . For a tangent vector , there exists a unique parallel vector field along s.t. . Assume , then the vector field is parallel along if the coordinates follows the differential equation,


with initial values . In Theano code the ODE can be written as,

Parallel transport
def ode_partrans@(gamma,dgamma,t,x):
    dpt = - T.tensordot(T.tensordot(dgamma, Gamma_gM(gamma),
                                    axes = [0,1]),x, axes = [1,0])
    return dpt
pt = lambda v,gamma,dgamma: integrate(ode_partrans,v,gamma,dgamma)

Let be the mean CC plotted in Fig. 3 and consider s.t. is the vector consisting of copies (one for each landmark) of and , the vector of copies of . The tangent vector is shown in Fig. 3. Define as the geodesic calculated in Section 2 with initial values . The parallel transport of along is visualized in Fig 4. To make the plot easier to interpret, the parallel transported vectors are only shown for five landmarks.

Figure 4: Example of parallel transport of basis vectors , along the geodesic with initial values , . The parallel transported vectors are only plotted for landmarks.

4 Frechét Mean

The Fréchet mean [4] is a generalization of the Euclidean mean value to manifolds. Let be a distance map on . The Frechét mean set is defined as . For a sample of data points , the empirical Fréchet mean is


Letting be the Riemannian distance function determined by the metric , the distance can be formulated in terms of the logarithm map, defined in Section 2, as . In Theano, the Fréchet mean can be obtained by optimizing the function implemented below, again using symbolic derivatives.

Frechet Mean
def Frechet_mean@(q,y):
    (cout,updates) = theano.scan(fn=loss, non_sequences=[v0,q],
                sequences=[y], n_steps=n_samples)
    return 1./n_samples*T.sum(cout)
dFrechet_mean = lambda q,y: T.grad(Frechet_mean(q,y),q)

The variable v0 denotes the optimal tangent vector found with the Log function in each iteration of the optimization procedure.

Consider a sample of observations of the CC data shown in the right plot of Fig. 5. To calculate the empirical Fréchet mean on , the initial point was set to one of the CC observations plotted in the left plot of Fig. 5. The result of the optimization is shown in Fig. 5 (bold outline).

Figure 5: (left) The estimated empirical Frechét mean (black), the initial value (blue) and the Euclidean mean of the samples (red). (right) Plot of the samples of CC, with the Frechét mean shown as the black curve.

So far we have shown how Theano can be used to implement simple and frequently used concepts in differential geometry. In the following sections, we will exemplify how Theano can be used for stochastic dynamics and for implementation of more complex concepts from sub-Riemannian geometry on the frame bundle of .

5 Normal Distributions and Stochastic Development

We here consider Brownian motion and normal distributions on manifolds. Brownian motion on Riemannian manifolds can be constructed in several ways. Here, we consider two definitions based on stochastic development and coordinate representation of Brownian motion as an Itô SDE. The first approach [3] allows anisotropic generalizations of the Brownian motion [9, 7] as we will use later.

Stochastic processes on can be defined by transport of processes from , to by the stochastic development map. In order to describe stochastic development of processes onto , the frame bundle has to be considered.

The frame bundle, , is the space of points s.t. and is a frame for the tangent space . The tangent space of , , can be split into a vertical subspace, , and a horizontal subspace, , i.e. . The vertical space, , describes changes in the frame , while defines changes in the point when the frame is fixed in the sense of having zero acceleration measured by the connection. The frame bundle can be equipped with a sub-Riemannian structure by considering the distribution and a corresponding degenerate cometric . Let denote a chart on with coordinates and coordinate frame for . Let denote the basis vectors of the frame . Then have coordinates where and defines the inverse coordinates of . The coordinate representation of the sub-Riemannian cometric is then given as


where is the matrix with components and for with denoting the Christoffel symbols for the connection, . The sub-Riemannian structure restricts infinitesimal movements to be only along horizontal tangent vectors. Let be the lift of a tangent vector in to its horizontal part and let be given. A horizontal vector at can be defined as the horizontal lift of the tangent vector , i.e. . A basis for the horizontal subspace at is then defined as , where denote the canonical basis of .

Let denote a stochastic process on , . A stochastic process on can be obtained by the solution to the stratonovich stochastic differential equation, , with initial point . A stochastic process on can then be defined as the natural projection of to . In Theano, the stochastic development map is implemented as

Stochastic Development
def sde_SD@(dWt,t,q,nu):
    return T.tensordot(Hori(q,nu), dWt, axes = [1,0])
stoc_dev = lambda q,u,dWt: integrate_sde(sde_SD,

Here, integrate_sde is a function performing stochastic integration of the SDE. The integrate_sde is defined in a similar manner as integrate described in Section 2. In Fig. 6 is given an example of stochastic development of a stochastic process in to the landmark manifold. Notice that for , only the first basis vectors of the basis is used in the stochastic development.

Figure 6: (left) Stochastic process on . (right) The stochastic development of on . The blue points represents the initial point chosen as the mean CC. The red points visualize the endpoint of the stochastic development.

Given the stochastic development map, Brownian motions on can be defined as the projection of the stochastic development of Brownian motions in . Defining Brownian motions by stochastic development makes it possible to consider Brownian motions with anisotropic covariance by choosing the initial frame as not being orthonormal.

However, if one is only interested in isotropic Brownian motions, a different definition can be applied. In [8], the coordinates of a Brownian motion is defined as solution to the Itô integral,


This stochastic differential equation is implemented in Theano by the following code.

Brownian Motion in Coordinates
def sde_Brownian_coords@(dW,t,q):
    gMsharpq = gMsharp(q)
    X = 


    det = T.tensordot(gMsharpq,Gamma_gM(q),((0,1),(0,1)))
    sto = T.tensordot(X,dW,(1,0))
    return (det,sto,X)
Brownian_coords = lambda x,dWt: integrate_sde(sde_Brownian_coords,

An example of an isotropic Brownian motion found by the solution of is shown in Fig. 7.

Figure 7: (left) Brownian motion on . (right) Samples drawn from an isotropic normal distribution defined as the transition distribution of a Brownian motion obtained as a solution to .

In Euclidean statistical theory, a normal distribution can be considered as the transition distribution of a Brownian motion. A similar definition was described in [9]. Here a normal distribution on is defined as the transition distribution of a Brownian motion on . In Fig. 7 is shown samples drawn from a normal distribution on with mean set to the average CC shown in Fig. 2 and isotropic covariance. The Brownian motions are in this example defined in terms of .

6 Frechét Mean on Frame Bundle

A common task in statistical analysis is to estimate the distribution of data samples. If the observations are assumed to be normally distributed, the goal is to estimate the mean vector and covariance matrix. In [9], it was proposed to estimate the mean and covariance of a normal distribution on by the Frechét mean on the frame bundle.

Consider Brownian motions on defined as the projected stochastic development of Brownian motions on . A normal distribution on is given as the transition distribution of a Brownian motion on . The initial point for the stochastic development, , corresponds to the mean and covariance, i.e. denotes the mean shape and the covariance of the normal distribution. As a consequence, normal distributions with anisotropic covariance can be obtained by letting be a non-orthonormal frame.

In Section 4, the Frechét mean on was defined as the point , minimizing the average geodesic distance to the observations. However, as only a sub-Riemannian structure is defined on , the logarithm map does not exist and hence the geodesic distance cannot be used to define the Frechét mean on

. Instead, the distance function will be defined based on the most probable paths (MPP) defined in 

[7]. In this section, a slightly different algorithm for estimating the mean and covariance for a normal distribution is proposed compared to the one defined in [9].

Let be given such that is the mean and covariance of a normal distribution on . Assume that observations have been observed and let . The Frechét mean on can then be obtained by optimizing,

where denotes the sharp map on changing a momentum vector in to the corresponding tangent vector in . Notice that the optimization is performed over , which is the Frechét mean, but also over the momentum vectors . When making optimization over these momentum vectors, the geodesics, , becomes MPPs. The Frechét mean on is implemented in Theano and numpy as,

Frechet Mean on FM
detg = lambda q,nu: T.nlinalg.Det()(T.tensordot(nu.T,
lossf = lambda q1,q2: 1./d.eval()*np.sum((q1-q2)**2)
def Frechet_meanFM@(u,p,y0):
     q = u[0:d.eval()]
     nu = u[d.eval():].reshape((d.eval(),rank.eval()))
     for i in range(n_samples):
         distv[i,:] = lossf(Expfmf(u,p[i,:])[0:d.eval()],y0)
         normp[i,:] = 2*Hfm(u,p[i,:]) # Hamiltonian on FM
     return 1./n_samples*np.sum(normp)

7 Conclusion

In the paper, it has been shown how different concepts in differential geometry and non-linear statistics can be implemented using the Theano framework. Integration of geodesics, computation of Christoffel symbols and parallel transport, stochastic development and Frećhet mean estimation were considered and demonstrated on landmark shape manifolds. In addition, we showed how the Frećhet mean on the frame bundle can be computed for estimating the mean and covariance of an anisotropic normal distribution on .

Theano has, for the cases shown in this paper, been a very efficient framework for implementation of differential geometry concepts and for non-linear statistics in a simple and concise way yet allowing efficient and fast computations. We emphasize that Theano is able to perform calculations on high-dimensional manifolds using either CPU or GPU computations. In the future, we plan to extend the presented ideas to derive Theano implementations of differential geometry concepts in more general fiber bundle and Lie group geometries.

Acknowledgements. This work was supported by Centre for Stochastic Geometry and Advanced Bioimaging (CSGB) funded by a grant from the Villum foundation.