PDE-constrained optimization in medical image analysis

02/28/2018 ∙ by Andreas Mang, et al. ∙ Association for Computing Machinery University of Pennsylvania University of Houston 0

PDE-constrained optimization problems find many applications in medical image analysis, for example, neuroimaging, cardiovascular imaging, and oncological imaging. We review related literature and give examples on the formulation, discretization, and numerical solution of PDE-constrained optimization problems for medical imaging. We discuss three examples. The first one is image registration. The second one is data assimilation for brain tumor patients, and the third one data assimilation in cardiovascular imaging. The image registration problem is a classical task in medical image analysis and seeks to find pointwise correspondences between two or more images. The data assimilation problems use a PDE-constrained formulation to link a biophysical model to patient-specific data obtained from medical images. The associated optimality systems turn out to be sets of nonlinear, multicomponent PDEs that are challenging to solve in an efficient way. The ultimate goal of our work is the design of inversion methods that integrate complementary data, and rigorously follow mathematical and physical principles, in an attempt to support clinical decision making. This requires reliable, high-fidelity algorithms with a short time-to-solution. This task is complicated by model and data uncertainties, and by the fact that PDE-constrained optimization problems are ill-posed in nature, and in general yield high-dimensional, severely ill-conditioned systems after discretization. These features make regularization, effective preconditioners, and iterative solvers that, in many cases, have to be implemented on distributed-memory architectures to be practical, a prerequisite. We showcase state-of-the-art techniques in scientific computing to tackle these challenges.



There are no comments yet.


page 6

page 7

page 9

page 16

page 17

page 19

page 20

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

We review methods that integrate medical image analysis algorithms with biophysical models. There are several benefits in integrating biophysical modeling with medical imaging. It can aid prognosis, diagnosis, and the design of new treatment protocols. It can aid our understanding of how biophysics relate to imaging information. Examples of medical image analysis algorithms include image segmentation, image registration, and parameter estimation for biomarkers.

The literature on the integration of biophysical (or biophysically inspired) models with image analysis algorithms is quite extensive. We will consider three problems: The first one is diffeomorphic image registration (see §2). This problem is paramount to many applications in medical imaging [63, 186]. It is about establishing a pointwise spatial correspondence , , between two images and of the same object so that the transformed template image becomes similar to the reference image , i.e,. for all , where denotes the function composition [63, 145]. We require that the map is a diffeomorphism, i.e., is a bijection, continuously differentiable, and has a continuously differentiable inverse. In our formulation, we do not directly invert for the map but introduce a pseudo-time variable and invert for the velocity of . In the context of Lagrangian methods, the map at a particular location represents the end point of the characteristic at defined by the velocity (see [132]). In the present work, we use an Eulerian formulation instead. Here, the PDE constraint is, in its simplest form, given by a hyperbolic transport equation for the image intensities subjected to the velocity . The regularization model is a Sobolev norm that stipulates smoothness requirements on . If is sufficiently smooth it is guaranteed that gives rise to a diffeomorphism  [20, 39, 209]. This formulation can be augmented by biophysics operators (hard constraints111In general, constraints can either be hard or soft. Hard constraints are a set of conditions that the variables are required to satisfy. Soft constraints are penalties for variables that appear in the objective functional; they penalize a deviation of the variables from a condition.

) to incorporate additional prior knowledge on the expected deformation map. One motivation for adding such constraints is that often, especially in longitudinal studies of the same patient, there is a real deformation by the realization of an actual physical phenomenon. A simple constraint, which nonetheless poses significant numerical challenges, is the incompressibility of tissue 

[128, 129, 131]. Examples for more complex biophysical constraints are brain tumor growth models [99, 73, 174, 210, 211, 212] or cardiac motion models [188].

We will discuss all of these constraints either in the context of diffeomorphic image registration or in the more generic context of data assimilation in medical imaging. This brings us to the second problem we are addressing in this article: Data assimilation in brain tumor imaging [70, 100, 114, 135, 138] (see §3). The PDE constraint is, in its simplest form, a nonlinear parabolic differential equation. The inversion variables are, e.g., the initial condition, the growth rate of the tumor, or a diffusion coefficient that controls the net migration of cancerous cells within brain parenchyma. The regularization model is in our case an -penalty. The third problem is cardiac motion estimation (see §4).

1.1 Contributions

  • We provide a comprehensive review of existing work on three topics for PDE-constrained optimization with application to medical image analysis:

    1. diffeomorphic image registration (see §2),

    2. data assimilation in brain tumor imaging (see §3), and

    3. data assimilation in cardiac imaging (see §4).

  • We discuss the implementation of a distributed-memory Newton–Krylov solver for two of these problems (see §5).

In this paper, we summarize our recent contributions in the field of diffeomorphic image registration [128, 129, 131, 132, 130] and data assimilation in brain tumor imaging [70, 71, 135, 127]. In particular, we will present the formulations described in [128, 129] and [70], the Newton–Krylov methods presented in [128, 130] and [70], and review the distributed-memory implementations described in [129, 71]. We have also added some new results, which have not been presented elsewhere (see §6.1).

1.2 Mathematical Setting

The problems we consider in this paper can be viewed as an inverse problem of estimating a parameter function (inversion field) based on observations , which we assume to be solutions of a system of PDEs with , defined on a domain , , . We, e.g., consider linear hyperbolic and nonlinear parabolic PDE constraints . We assume that the data is a nonlinear function of the parameters , i.e., the parameter-to-observation map is given by , where is measurement noise. Here, is an observation operator, i.e., a projection of the state variable onto locations in (or ) to which the observation (or desired state) is associated. The problem of recovering from is often ill-posed due to the fact that the measured data is finite and perturbed by noise. A common approach to tackle this challenge is to solve for a smooth, locally unique solution of a nearby problem by imposing prior knowledge based on an adequate regularization scheme [59], for example a Tikhonov functional . Overall, we arrive at the following formulation


The problem in (1) balances data fidelity (for simplicity, we only consider an -distance) with regularity (controlled by the (regularization) parameter ). Designing efficient algorithms for PDE-constrained optimization problems remains a significant challenge [2, 27, 34, 95, 97, 120]. These problems typically involve an infinite number of unknowns, which, upon discretization, leads to high-dimensional systems (in our case, e.g., unknowns for clinically relevant problems). Usually, the solver has to be tailored to the structure of the operators. We focus on derivative/adjoint based algorithms for PDE-constrained optimization. The evaluation of the objective functional and its derivatives involves expensive computations, which, in our examples, are the solution of PDEs. We tackle these challenges by designing computational methods that rigorously follow mathematical and physical principles, and are based on efficient, robust, high-fidelity algorithms with a sound theoretical basis.


Typically, the model parameters are not

the final quantity of interest. More often, we need to predict some other future quantity (e.g., tumor infiltration to healthy tissue or the probability of tumor recurrence). Using a deterministic formulation, like the one in (

1), only provides point estimates. Due to uncertainties in the data (denoted above by ), the model parameters , the inversion algorithm, and the mathematical model

, we require confidence intervals for the quantities of interest, not just point estimates. That is, we are interested in uncertainty propagation from the input to the quantity of interest. This can be achieved by using a Bayesian framework 

[107, 192, 187]. Formulating the problem as a Bayesian inverse problem provides the posterior joint probability density of the parameters . Related quantities of interest can be computed by computing expectations with respect to the posterior. The solution of the deterministic inverse problem corresponds to the maximum a posteriori probability estimate. This connection between deterministic inversion and statistical inference can be used to devise efficient sampling strategies that exploit local problem structure [67, 68, 136, 162]. In what follows, we will focus on deterministic inversion.

1.3 Limitations and Open Issues

Despite the success of biomechanical models, in many cases their clinical application is limited to qualitative analysis. Biomechanical simulators are hampered by imprecise and complex constitutive laws, and the uncertainties in boundary and initial conditions. Moreover, even assuming we had a complete knowledge of the models and their parameters, solving the underlying PDEs is computationally challenging; accounting for the uncertainty amplifies the computational difficulties.

The key challenges and limitations of biomechanically informed algorithms for medical image analysis can be summarized as follows:

  • Models are often phenomenological; many parameters in the equations do not have direct biological counterparts and can therefore not be directly obtained from data.

  • Boundary and initial conditions have significant uncertainties.

  • In a clinical setting, in-vivo medical imaging data has typically poor resolution and may contain significant amount of noise, which makes the calibration of complicated (multi-parameter) models quite challenging.222We note that high-resolution imaging technologies have emerged; an example is CLARITY imaging [44, 194, 115]; we revisit this ex-situ imaging technique briefly in §2. High-resolution imaging techniques are not yet available in a clinical setting. Resolution levels for routinely collected imaging is between and along each spatial direction (depending on the imaging modality).

  • The imaging operator is extremely complicated. The question on how observations in imaging data map to model outputs is a difficult one and depends on the scanner. Oftentimes, pseudo-correspondences are used.

  • Complicated models result in ill-conditioned, multi-physics operators that are challenging to solve.

  • We obtain complex optimization problems that are highly nonlinear and time-dependent.

We limit this review to problems we have considered in our past work. We note that PDE-constrained optimization appears in other contexts in medical imaging. For example, many image reconstruction algorithms involve physical models and the reconstruction phase could be formulated as a PDE-constrained optimization problem. However, this usually leads to overly complicated algorithms. Typically approximation and spectral analysis of the underlying operators are preferable and lead to direct reconstruction algorithms. Here, we briefly give some references for problems in which PDE-constrained optimization has been considered in practice. In elastography, the goal is to estimate elastic material properties using, e.g., ultrasound. The main PDE is wave propagation. Examples can be found in [72, 153, 155]. In optical diffusion tomography, the goal is to image optical properties of tissue using diffuse approximations of light scattering. The main PDEs are diffusion, radiation-diffusion, or highly-damped Helmholtz. Examples include [7, 8, 105, 168, 173].

1.4 Organization and Notation

We denote the standard inner product by and the norm by , both defined on the spatial domain , . We denote discretized quantities by upright boldface letters. That is, the discrete representation of is denoted by . Likewise, a superscript is added to the operators whenever we refer to discretized quantities.

We review the literature and present the formulation for the diffeomorphic registration problem in §2 and for the data assimilation problem in brain tumor imaging in §3. We review relevant literature for cardiac motion estimation in §4. We present the implementation of our solver and its deployment in high-performance computing platforms in §5. We showcase exemplary numerical results in §6. We conclude with §7. Additional details about our solver can be found in the appendix (§B and §C).

2 Diffeomorphic Registration

Let us start by defining more precisely what we mean by diffeomorphic image registration. In diffeomorphic image registration we require that the spatial transformation that maps the template image to the reference image is a diffeomorphism. A diffeomorphism is an invertible map, which is continuously differentiable (in particular a -function) and maps onto itself. Formally, we require that does not change sign or vanish for to be locally diffeomorphic.

2.1 Literature Review

Providing a comprehensive review on image registration is impossible due to the large body of literature. We refer to [63, 144, 145, 186] for a general introduction to the problem, recent algorithmic developments, and applications. We consider the problem of diffeomorphic image registration as a hyperbolic optimal control problem [33, 39, 90, 128, 129, 130, 131]; i.e., problems with a transport or continuity equation as PDE constraint (we refer to §2.2 for details on the problem formulation).333Similar formulations can be found in other applications, e.g., geophysical sciences [65]. These types of formulations are based on conceptual ideas from computational fluid dynamics and flow control [78]. They share characteristics with traditional optical flow formulations [102, 108, 163, 172] and formulations for optimal mass transport [6, 22, 25, 85, 184, 197].

Viscous fluid formulations for diffeomorphic image registration are pioneered in [42, 43]. Here, the basic idea is to introduce a pseudo-time variable and parametrize the deformation map by its velocity . This approach has been embedded into a variational framework in [20, 55, 140, 142, 195]; the problem formulation takes the form

subject to for and for , where is equivalent to , and . In the formulation above, we seek to find a time dependent, smooth, and compactly supported velocity field , where is a Sobolev space of certain regularity. The Sobolev norm in the problem formulation above guarantees that belongs to a space

of regular vector fields, which in turn guarantees (the regularity requirements are discussed in

[55, 195, 20]) that the solution to is a diffeomorphism. This approach embeds the diffeomorphisms in a Riemannian space; the time integral of the squared Sobolev norm of is a geodesic distance (a Riemannian metric) between the identity map at and the diffeomorphism at . The optimal defines the shortest path on the manifold of diffeomorphisms that connects with .

As we will see below, we will use a PDE-constrained (Eulerian) formulation instead. A related formulation that includes additional hard and soft constraints on and can be found in [118, 119]. This is different to the PDE-constrained formulations in [33, 38, 39, 90, 128, 129, 130, 131]. Here, a linear advection equation is used to model the transport of the intensities of . The latter approach does not operate on the space of diffeomorphisms; does not appear explicitly, only does. This formulation has been augmented by hard or soft constraints on the divergence of  [33, 39, 128, 129, 131, 172], rendering the deformation map (nearly) incompressible [79, p. 77ff.].444An interesting direction of research is to augment these PDE constraints by more complex biophysics operators [99, 73, 188, 210, 211, 212]. This results in additional parameters that need to be calibrated. In both approaches has to be sufficiently smooth to guarantee that the associated deformation map is a diffeomorphism. These smoothness requirements are typically stipulated by the regularization operator—a Sobolev norm, the order of which depends on the smoothness of the images, the particular form of the transport equation, and the existence of additional constraints (such as, e.g., a divergence-free velocity) [18, 20, 38, 39, 49, 55, 195] (see §2.2).

The differences between our formulation and PDE-constrained formulations for optimal mass transport [22, 25, 85, 184] are the constraint (a continuity equation to preserve mass555Applications for mass-preserving registration can be found in [36, 132, 205]. versus a transport equation) and the regularization model (-norm of scaled by the transported density versus Sobolev-norm of ). The conceptual differences between our formulation and traditional optical flow formulations [102, 108, 163, 172] are that we do not consider a time series of images and the transport equation is a hard constraint; it does not directly enter the objective functional. PDE-constrained formulations for optical flow, which are equivalent to our formulation, are described in [5, 18, 33, 39].

There has only been a limited amount of work devoted to the design of effective algorithms for velocity-based diffeomorphic image registration that exploit state-of-the-art technology in scientific computing. There are several challenges for the design of efficient solvers. First and foremost, image registration is an ill-posed, highly nonlinear inverse problem. It is established that gradient descent schemes for numerical optimization (which are still predominantly used [13, 14, 15, 20, 39, 81, 90, 201]) only converge slowly when applied to nonlinear, ill-posed problems. Recently, several groups have developed Newton-type algorithms to address this issue [11, 25, 128, 129, 130, 131, 132, 184, 96]. In this context, the design of an effective preconditioner for the Hessian is essential for these methods to be competitive in terms of the time-to-solution; different approaches have been discussed in [25, 128, 130, 132, 184, 96] (see [23] for an introduction to preconditioning of saddle point problems). Another issue is the size of the search space, in particular when considering 3D problems (in space).666The number of unknowns is the number of discretization points in space times the number of discretization points in time for the velocity times the dimensionality of the ambient space (we invert for a time-dependent vector field). The size of the search space can be reduced by exploiting the fact that the momentum of the variational problem is constant in time (in Lagrangian coordinates). That is, the flow of the diffeomorphism can completely be encoded by the momentum at  [11, 141, 201, 208, 209]. Another approach is to invert for a stationary velocity field  [9, 10, 94, 129, 131, 132].777A stationary velocity field is a velocity field that is constant in time, as opposed to a nonstationary—i.e., time-dependent or transient—velocity field. We note that stationary velocity fields do not cover the entire space of diffeomorphisms and do not provide a Riemannian metric on this space, something that may be desirable in certain applications [20, 140, 213]; this requires time-dependent velocities. The work in [128] uses a Galerkin method to control the number of unknowns in time. It is shown experimentally that stationary and nonstationary velocities yield an equivalent registration quality in terms of data mismatch. In addition to that, we have to deal with an expensive parameter-to-observation map, which, in our formulation, is a hyperbolic transport equation. Several strategies to efficiently solve these equations have been considered; they range from conditionally stable total variation diminishing [33], second order Runge-Kutta [128, 129], and high-order essentially nonoscillatory [90] schemes, to unconditionally stable implicit Lax-Friedrich [25, 184], semi-Lagrangian [20, 39, 130, 131], and Lagrangian [132] schemes.

We note, that there exists a large body of literature that discusses effective parallel implementations of solvers for PDE-constrained optimization problems; examples include [1, 2, 28, 29, 30, 27, 26, 183]. There has been only little work on distributed-memory algorithms for diffeomorphic image registration. Real-time implementations for low dimensional (rigid) registration are, e.g., described in [180, 171]. Popular software packages for (in many cases) diffeomorphic image registration are described in [10, 15, 109, 143, 145, 156, 199, 200]. None of these uses a PDE-constrained formulation like the one discussed here. Most implementations use open multi-processing [15, 109, 199, 200] or graphic processing units (GPUs) for acceleration [77, 82, 143, 197, 185, 151, 77, 58, 112, 198, 126]. A survey of registration algorithms that exploit GPUs for acceleration can be found in [64]. A literature survey on high performance computing in image registration can be found in [181, 57, 178]. Works on deploying solvers for diffeomorphic image registration to supercomputing platforms can be found in [82, 81, 80, 131, 71]. The work that is most closely related to ours [131, 71] is [82, 80] and [198]. In [82, 81, 80] a multi-GPU accelerated implementation of the diffeomorphic image registration approach described in [106] is presented. The work in [198] discusses a GPU accelerated implementation of the diffeomorphic algorithm described in [10].

Recent advances in medical imaging [44, 194, 115] result in data sizes that pose the demand for deploying scalable, distributed-memory implementations.888The resolution of these datasets can reach resulting in of data (if stored with half precision) [194]. Overall, this results in 2.4 trillion discretization points in space. To the best of our knowledge, only our solver [131, 71] has been shown to scale up to similar problem sizes. Moreover, we are the first group that proposed an implementation of a globalized (Gauss–)Newton–Krylov solver for a PDE-constrained approach to diffeomorphic image registration that scales on supercomputing platforms [131, 71].

Figure 1: Diffeomorphic image registration problem. The inputs to this inverse problem are scalar intensity values of two images—the reference image and the template image (image to be registered)—of the same object (left); data taken from [4, 145]. The inherent assumption is that there exists a geometric transformation (e.g., a rigid body transformation (middle) or a deformable transformation (middle right and right)) that relates points in to points in (arrows). These spatial correspondences are, in our formulation, solely based on intensity values; points with similar intensities map to one another. In general, this can result in mappings that are not meaningful (orange arrow). In our formulation, we consider a mapping meaningful or plausible if it is a diffeomorphism, i.e., it is smooth, onto, one-to-one, and it has a smooth inverse.

2.2 Formulation

We assume that the reference image and the template image are continuously differentiable functions, and compactly supported on , . In diffeomorphic image registration we seek to find a diffeomorphism such that the transformed template image becomes similar to the reference image , i.e.,  [144, 145]. In our formulation, we do not directly invert for the deformation map . We introduce a pseudo-time variable and invert for a smooth velocity field (control variable) , , , where is a Sobolev space of order (see below); the velocity parameterizes the deformation map . We use an Eulerian formulation; in its simplest form, the PDE constraint in (1), i.e., the forward operator, is given by the transport equation [128, 90] equationparentequation


and periodic boundary conditions on . In our formulation, the map does not appear explicitly. This is different from map-based formulations (i.e., formulations that directly invert for [36, 63, 133, 145] and Lagrangian formulations for velocity-based diffeomorphic image registration [15, 20, 132]. In our formulation, we model the deformed template image as the solution of the transport equation (2) at and denote it by .

In the inverse problem we seek to find a velocity such that the transported template image is equal to for all . As can be seen in (1), we use a squared -distance to measure the discrepancy between and , where corresponds to and projects the state to the terminal time frame at , i.e., .999Other distance measures, such as mutual information, normalized gradient fields, or cross correlation can be used (see [144, 145] for an overview). In our formulation, changing the distance measure will affect the first term in the objective functional and the terminal condition of the adjoint equation (8) in §5.1.1. Notice, that the reference image as well as the initial condition of the forward problem in (2) are measured function, and, hence, may contain noise.

This formulation has been augmented by constraints on the divergence of  [39, 128, 130, 131]. We obtain an incompressible diffeomorphism if is divergence-free [79, p. 77ff.], i.e., . In this case, is the space of divergence-free velocities with Sobolev regularity of order . We can relax the incompressibility assumption by introducing a mass-source , i.e.,  [129]. This is equivalent to adding a penalty on the divergence of to the variational problem [33]. In [132] it has been suggested to replace (2a) with the continuity equation, e.g., used in optimal mass transport formulations [22, 25, 85, 184] to model the preservation of mass of .

A critical ingredient of the variational problem formulation is the regularization operator ; choosing an adequate Sobolev norm guarantees that the deformation map associated with is a diffeomorphism [20, 55, 140, 142, 195]. In general format, is given by


Here, is a positive definite, self-adjoint differential operator. In [20] it is shown that we require (slightly more than) -regularity for if we assume that and are -functions. Accordingly, with , , has become a popular choice [20, 94, 213]. If we model the images as functions of bounded variation instead, we require that is an -function [39]. This requirement can be relaxed to -integrability if we add a diffusion operator to (2a[18]. A detailed study of (near) incompressible flows can be found in [38, 39, 49]. The work in [33] considers an additional regularization term for . In [129, 131] is modeled as a stationary field; the integral in time in (3) can be dropped. Extensions to nonquadratic -regularization models are discussed in [129, 163]. In particular, the work in [129] introduces a nonlinear regularization operator that enables a control (promote or penalize) of shear (the first variation yields a control equation with a viscosity that depends on the strain rate).

3 Data Assimilation in Brain Tumor Imaging

3.1 Literature Review

We view biophysics simulations as a powerful tool to augment morphological and functional medical imaging data. There is a long tradition in the design of mathematical models of cancer progression [21, 170, 202]. In many cases, the models are rather simplistic.101010More sophisticated multi-species models that, e.g., account for hypoxia, necrosis and angiogenesis can be found in [92, 76, 166]. This is due to the fact that medical imaging only provides scarce information to drive the model (see Fig. 2); we have to limit the size of the parameter space. Most of the approaches used for data assimilation in medical imaging exclusively rely on observations derived from morphologic imaging data and consider continuum models; cancerous cells are not tracked individually but modeled as a density on , . In their simplest form, these models assume that cancer progression is dominated by two phenomena: cell division/mitosis and cell migration, resulting in reaction-diffusion type equations [150, 189, 190, 104].111111Models that account for the mechanical interaction of the tumor with its surroundings have been described in [40, 45, 98, 146, 203, 206, 207]. Despite their simplicity and phenomenological character, these types of models have been successfully applied to capture the appearance of tumors in medical images [45, 113, 125, 127, 135, 167]. They have been used to (i) study tumor growth patterns in individual patients [45, 127, 135, 167, 189], (ii) extrapolate the physiological boundary of tumors [113, 148], (iii) or study the effects of clinical intervention [116, 117, 139, 164, 169, 191, 203] .

Figure 2: Parameter identification/data assimilation in brain tumor imaging. The input to our problem are magnetic resonance images of an individual patient (left; from left to right: fluid attenuation inversion recovery; T2-weighted; and T1-weighted with contrast enhancement). Probability maps for tissue types extracted from these data serve as an input to our inversion; middle; tumor (yellow); edema (purple); white matter (white); gray matter (gray); cerebrospinal fluid (red). We perform the simulations in a standardized atlas space (due to the absence of an estimate of the patient’s image without tumor; the data is taken from [46]). This dataset provides probability maps for anatomical regions for a normal brain image (e.g., white and gray matter denoted by and , respectively). The goal is to estimate parameters so that the predicted state at matches the observed state (parameter identification problem). The images are modified from [73].

Most of the work mentioned above is concerned with brain tumor imaging; exceptions are, e.g., [203] (breast cancer) and [125] (pancreatic cancer). A key challenge is the design of methods that have the potential to generate patient-specific predictions. In general, this requires the solution of an inverse parameter identification problem. Several groups have addressed this problem. There has only been little work on model-based image analysis in brain tumor imaging based on PDE-constrained optimization [47, 70, 99, 111, 110, 125, 135, 127, 165];121212We note that there exists a large body of literature on optimal control problems with similar PDEs as constraints in other areas. Examples can, e.g., be found in [19, 50, 61, 62, 159]. the works in [47, 70, 99, 111, 110, 125, 165] consider adjoint based approaches for numerical optimization. Others use derivative-free optimization [40, 110, 114, 133, 127, 139, 206, 207], finite-difference approximations to the gradient [101], or tackle the parameter estimation problem within a Bayesian framework [116, 117, 138, 91, 154, 123, 48, 122] (see Rem. 1.2). These approaches have in common that they only require the evaluation of an objective functional (i.e., essentially a repeated solution of the forward problem for perturbed parameters). As such, they are easy to implement. However, it is established that they are in general not as efficient as gradient based optimization strategies (at least for high-dimensional problems); they display slow convergence resulting in an excessive number of iterations/evaluations of the forward model. Another strategy that has been considered, is to exploit the asymptotic behavior of reaction-diffusion equations [191, 89, 114]. The basic idea is to consider the boundary of the abnormality visible in medical imaging as a traveling wave solution and convert it into parameter estimates for the growth/migration rate (see [89]). The work in [99] is to the best of our knowledge the first to consider adjoint equations in the context of data assimilation in brain tumor imaging. The adjoint equations are derived for the one-dimensional case; they fall back to a derivative-free optimization strategy when solving the three-dimensional problem. The work in [47] is based on a multi-species model. They derive adjoint equations for inverting for the vascularization. The work in [125] extends [99]. They derive first-order optimality conditions for an advection-reaction-diffusion PDE-constrained optimization problem; there is no information on the numerical treatment of the problem.

In [165] a coupled reaction-diffusion system for the density of normal tissue, cancerous tissue, and ion concentration is considered. They derive adjoint equations for a scalar parameter controlling the excess ion concentration. The work in [111] models the tumor as a radially symmetric spheroid. The only work we are aware of (in the context of model based brain tumor image analysis) that derives Hessian information for numerical optimization is [70]. We describe this model in more detail in §3.2. Aside from numerical optimization, there has only been little work on the design of effective solvers (in terms of time-to-solution and rate of convergence). Conditionally stable explicit time integrators are a common choice due to their simplicity [47, 203]. Using methods that require a large number of time steps can become computationally prohibitive in a three-dimensional setting not only in terms of the time-to-solution but also due to memory requirements.131313A naive implementation may require storage of the time history of the state and adjoint fields; one remedy is the implementation of check-pointing/domain decomposition schemes [1, 75, 93]. The works in [99, 70, 133, 127] consider unconditionally stable schemes.

An open problem in data assimilation in brain tumor imaging is how to establish a physiologically meaningful correspondence between the model output (predicted state; typically a cell density) and the data (observed state; typically morphological imaging information, i.e., an abnormality in the images; see Fig. 2

for an example). As a consequence one has to rely on heuristics and pseudo-correspondences. One such example is to relate (manual) segmentations of imaging abnormalities to a detection thresholds for the computed population density of cancerous cells 

[70, 89, 116, 117, 127, 133, 191]. Another approach is to derive cell density estimates from physiological imaging data [12, 101, 203]. Either of these techniques only provides scarce information to drive the parameter estimation, which in turn limits the size of the search space, and as such the complexity of the models.

Considering parallel, distributed-memory implementations, we are not aware of any work in this area other than the one presented by our group [71].141414We note that there certainly exist parallel implementations of solvers for similar PDE-constrained optimization problems. We refer to [1, 2, 28, 29, 30, 27, 26] for examples.

3.2 Formulation

The PDE constraint in (1), i.e., the forward operator, is given by equationparentequation


with Neumann boundary conditions on . The parameter

represents the growth rate; it controls the proliferation of cancerous cells modeled by a logistic function. The diffusion tensor

, , controls the net migration of cancerous cells . A common assumption in many models is that the rate of migration depends on the tissue composition; typically, it is modeled to be faster in white than in gray matter [116, 117, 189]. If we denote the probability maps for these tissue types by and , respectively, the associated model for is given by [127]


with . We can augment this model by integrating the white matter tract architecture151515The white matter fibre architecture can be estimated from data measured by so-called diffusion tensor imaging, a magnetic resonance imaging technique that measures the anisotropy of diffusion in the human brain. The result of this measurement is a tensor field that can directly be inserted into (4).

(i.e., information about neuronal pathways; see Fig. 

3 for an illustration). This has, e.g., been considered in [45, 70, 113, 127, 135, 148].

The input to our inversion are (i) estimates for and , (ii) a tumor free patient image obtained from medical imaging data [73, 137], and (iii) an estimate for the patient’s tumor given in terms of a probability map . Given the forward model (4) our parameter vector essentially consists of , (i.e., and in (5)) and . We assume that we have experimental data that provides estimates for and ; we only invert for . We parameterize the initial condition in (4b) using an dimensional space spanned by a Gaussian basis

we obtain . The regularization model for our problem formulation (see (1)) is the -norm of .

Figure 3: Forward simulation. The migration model controls the shape of the tumor by connecting the simulation to virtual brain anatomy (the data is taken from [46]). We display an estimate for white matter fibre tracts of the human brain obtained from diffusion tensor imaging (data taken from [147]; the color coding encodes the orientation of the fibres). This data allows us to model a preferential migration of cancerous cells along white matter tracts (see, e.g., [45, 70, 113]). We show two simulation results for different model parameters (middle: isotropic migration model; right: anisotropic migration model). The orange surface of the first simulation shows the core of the tumor and the green surface the outer boundary. The images are modified from [127].

4 Cardiac Motion Estimation

Medical imaging can help in the diagnosis of cardiac masses, cardiomyopathy, myocardial infarction, and valvular disease. One example is cine-MRI, which is used routinely for diagnosing a variety of cardiovascular disorders [37, 179]. Given the cine-MRI data we seek to reconstruct the motion of the cardiac wall, which in turn is used to evaluate clinical indices like ejection fraction and localized abnormalities of the heart function. Segmentation of the ventricles and the myocardium is the first step toward motion reconstruction for quantitative functional analysis of cine-MRI data. Manual segmentation is time consuming [16] and inter- and intra-observer variability can be high [32]. For this reason automatic segmentation algorithms are preferable. However, even today most of them are not robust enough for a fully automatic segmentation. Here we discuss methods that incorporate biophysical constraints and often result in PDE-constrained optimization algorithms.

One of the main thrusts in recent research has been the 4D motion estimation using biomechanical models. In [103], a piecewise linear composite biomechanical model was used to determine active forces and the strains in the heart based on tagged MRI information. In [157, 158] echocardiography and MR images were combined with biomechanical models for cardiac motion estimation. Interactive segmentation was combined with a Bayesian estimation framework that was regularized by an anisotropic, passive, linearly elastic, myocardium model. The authors recognized the importance of neglecting active contraction of the left ventricle. In [176, 177], the need for realistic simulations and the need for inversion and data assimilation was outlined. In [31], an incompressible deformable registration was used for global indicators. In [160], a regular grid with B-splines was used with a mutual information-based similarity measure;161616Mutual information is a statistical distance measure that originates from information theory. As opposed to the squared distance used in the present work (see §2

), which is used for registering images acquired with the same modality, mutual information is used for the registration of images that are acquired with different modalities (e.g., the registration of computed tomography and magnetic resonance images). Mutual information assess the statistical dependence between two random variables (in our case image intensities; see 

[144, 145, 186] for more details). however, the focus has been on inter-subject registration [161] (without consideration of mechanics). In  [182], a 4D deformable registration was proposed with a diffusive space-time regularization that worked well for global motion measures. In [188], a biomechanically-constrained motion estimation algorithm that explicitly couples raw image information with a biomechanical model of the heart is described. The main features of that scheme are (i) a patient-specific image-based inversion formulation for the active forces; (ii) a multigrid-accelerated, octree-based, adaptive finite-element forward solver that incorporates anatomically-accurate fiber information; (iii) an adjoint/Hessian-based inversion algorithm; and (iv) a 4D coupled inversion for all images . The method requires segmentation of the blood pool and myocardium at the initial frame to assign material properties and a deformable registration to map fiber-orientations to the patient. More complex models are considered in [52]. The authors use an adjoint method for cardiac-motion estimation and contractility combining catheterized electrophysiology data and cine-MRI sequences. They consider a gradient descent method that uses adjoint equations for the elastic deformation of the heart. A similar approach for cardiac motion estimation is described in [196]; the adjoints are derived using an automatic differentiation framework.

5 Numerics

Here, we summarize the numerical strategies for solving the problems in §2 and §3, respectively. We discuss the discretization in space and time, the solvers, the computational kernels, and the parallel implementation. Overall, we use a globalized, preconditioned, inexact, reduced space Gauss–Newton–Krylov method for both problems. We will see that the optimality systems of the considered problems yield large, ill-conditioned operators that are challenging to solve in an efficient way. The associated PDE operators will determine the choices we make for the numerics.

5.1 Optimality Conditions

We use the method of Lagrange multipliers [124] to turn (1) into an unconstrained problem. From thereon we can consider two alternative approaches to tackle the optimization problem: optimize-then-discretize and discretize-then-optimize (see [78, 204, p. 57ff.] for a discussion; see also Rem. 5.1). We use the former strategy; we first compute variations of the Lagrangian functional with respect to the state, adjoint, and control variables, and then discretize them. We will assume that all variables fulfill the necessary regularity requirements to be able to carry out these computations; we refer the interested reader to, e.g., [18, 38, 39, 118, 119], for a rigorous treatment.


An optimize-then-discretize approach does not guarantee that the discretization of the gradient is consistent with the discretization of the objective functional. Further, it is not guaranteed that the discretization of the forward and adjoint operators are transposes of one another; similarly, it is not guaranteed that discretization of the Hessian is symmetric. These inconsistencies can have a negative effect on the convergence of the solver. For a discretize-then-optimize approach one can (by construction) guarantee that these operators are consistent, transposes, and symmetric. However, one may, e.g., obtain a different numerical accuracy for the forward and adjoint operators (see, e.g., [87, 54] for examples). For the optimize-then-discretize approach we can choose arbitrarily accurate solvers for the forward and adjoint problems. We refer to [78, 34] for more remarks on the discretization of optimization and control problems. For the PDE-constrained optimization problems considered in this study, we have experimentally observed that the discretization errors are below the tolerances relevant in practical application (e.g., a relative change of the gradient of ). If we are interested in solving the inverse problems more accurately (e.g., up to ) our solver may fail to converge. If more accurate solutions are required, we will have to consider PDE solvers that preserve the properties of the continuous operators. For the diffeomorphic registration problem, we present a discretize-then-optimize implementation in [132]; works of other authors that consider a discretize-then-optimize strategy can be found in [11, 25, 96].

5.1.1 Diffeomorphic Registration

We assume that is stationary and divergence-free. We introduce the Lagrange multipliers , , , to obtain the Lagrangian functional171717We neglect the periodic boundary conditions for simplicity.


We obtain the optimality system by taking variations with respect to the adjoint, state, and control variables, and applying integration by parts. We use a reduced space formulation. That is, we assume that the state and adjoint equation are fulfilled exactly (see below) and only iterate on the reduced space of the velocity . Accordingly, we require vanishing variations of the Lagrangian with respect to the control variable , i.e.,

This is the strong form of the control or decision equation (i.e., the reduced gradient) of our problem; is an integro-differential operator of the form


The operator originates from the regularization operator in (3); depending on its order we arrive at an elliptic [129], biharmonic [128] or triharmonic [39] integro-differential control equation. The pseudo-differential operator is a projection onto the space of divergence-free velocities. It originates from the elimination of the constraint and the adjoint variable from our optimality system (see [128, 129]). The operator is given by (Leray projection). If we neglect the incompressibility constraint in our formulation becomes an identity operator. If we introduce a mass-source map into our formulation this operator becomes more involved [129]. Evaluating the gradient (7) for a given trial requires us to find and . We can compute by solving the forward problem (2) forward in time. Once we have found the final state we can compute by solving the adjoint equation equationparentequation


with periodic boundary conditions on backward in time; (8) is a final value problem and models the transport of the residual difference between the final state and the reference image backward in time.181818Instead of solving an advection or continuity equation as done in the Eulerian formulation we present here, it has been suggested to solve for the state and adjoint variables using the diffeomorphism

(which involves interpolation operations instead of the solution of a transport equation) 


We can use the expression for the reduced gradient in (7) within a gradient descent scheme for numerical optimization (see, e.g., [128]). We use a Newton–Krylov method instead. We provide additional details in §B and §C.

5.2 Data Assimilation

The Lagrangian for our problem is given by191919We neglect the Neumann boundary conditions for simplicity.


with Langrange multiplier , and , , and . The unknowns of our problem are the diffusion coefficients and , the growth rate , and the initial condition (and its location). We can derive the adjoint equations for either or all of these variables. For simplicity of presentation, we will derive the optimality conditions for the parameterization of in (4b). A more complete picture can be found in [70]. The strong form of the control or decision equation (i.e., the reduced gradient of our problem) is given by the vanishing first variation of the Lagrangian functional in , where


To obtain , we have to solve the adjoint equation equationparentequation


with Neumann boundary conditions on , backward in time. Notice that the final condition in (11b) depends on the final state at , i.e., requires the solution of the forward problem (4).

5.3 Discretization

We use the trapezoidal rule for numerical quadrature with respect to space and with respect to time. We assume that the functions in our formulation (including images) are periodic and continuously differentiable.202020The data assimilation problem in §3 requires Neumann boundary conditions on , with . We use a penalty approach to approximate these boundary conditions. We apply periodic boundary conditions on and set the reaction and diffusion coefficients in (4a) to sufficiently small penalty parameters and outside of ; see, e.g., [70, 100, 127, 133]. We apply appropriate filtering operations and periodically extend or mollify the discrete data to meet these requirements. We use regular grids to discretize , , . The spatial grid consists of , , grid points , , , . We consider a nodal discretization in time, which results in discretization points. We use a spectral projection scheme for all spatial operations. That is, we approximate spatial functions as , where , , , is the grid index and are the spectral coefficients of . The mappings between the spatial and spectral coefficients are done using FFTs. We approximate spatial derivative operators by applying the appropriate weights in the spectral domain. This scheme allows us to efficiently, stably, and accurately apply differential operators and their inverses.212121Note that the - condition for pressure spaces arising in finite element discretizations of Stokes problems [35, p. 200ff.] is not an issue with our scheme (incompressibility constraint in diffeomorphic registration problem).

5.4 Numerical Time Integration

The iterative solution of the optimality conditions of our problems requires a repeated solution of PDE operators of mixed type. Here we summarize the time integration schemes for the parabolic or hyperbolic PDEs that appear in our optimality systems. In general we opt for unconditionally stable schemes to reduce the number of timesteps. This will not only reduce the computational work load but also reduce the memory requirements; evaluating the Hessian operators that appear in our scheme requires us to store at least one space-time field (the state variable).222222A possible remedy is to employ check-pointing or domain decomposition strategies [1, 75, 93].

5.4.1 Parabolic PDEs

We use an unconditionally stable, second-order Strang-splitting method to solve the parabolic equations (see also [70, 99]). We explain this for the forward problem (4) in Alg. 1. Let denote the tumor discretized distribution at time , . We apply an implicit Crank–Nicolson method for diffusion (lines 4 and 6 in Alg. 1) and solve the reaction part analytically (line 5 in Alg. 1). A similar splitting strategy is used in [92]. In other related work, conditionally [47, 203] and unconditionally stable [127, 133] explicit schemes were considered.

1:  input: , ,
2:  ;
3:  for  do
4:      solve in
5:      solve in analytically with initial condition
6:      solve in
7:  end for
Algorithm 1 Strang-splitting for the solution of the forward problem in (4). We denote the discretized diffusion operator by .

We use a preconditioned conjugate gradient (PCG) method with a fixed tolerance of to solve the diffusion steps. We use a preconditioner based on a constant coefficient approximation of given by , where is the average diffusion coefficient. The inversion and construction of has vanishing computational cost due to our spectral scheme; it only requires a pointwise multiplication in the Fourier domain and a single forward and backward FFT.

5.4.2 Hyperbolic PDEs

We employ a semi-Lagrangian scheme [60] to solve the hyperbolic transport equations of the form . In the context of diffeomorphic image registration, semi-Lagrangian schemes have, e.g., been used in [20, 39, 130, 131]. They are unconditionally stable. This allows us to keep the number of timesteps small and by that significantly reduces the memory requirements. We illustrate this scheme for the forward problem (homogeneous case) in Alg. 2; additional details can be found in [130]. The algorithm consists of two steps: In a first step we have to compute the characteristic by solving in with final condition at backward in time. The second step is to solve an ODE for the transported quantity along ; i.e., we have to solve . Both of these steps require the evaluation of scalar or vector fields along the characteristic , i.e., at locations that do not coincide with grid points ; this involves interpolation (see lines 3 and 6 in Alg. 2 for an example). We use an explicit, second-order accurate Runge–Kutta scheme to integrate these ODEs in time, and a cubic interpolation model in space.

1:  input: , , ,
5:  for  do
7:  end for
Algorithm 2 Semi-Lagrangian scheme for the solution of the forward problem in (2). Note, that we have for the forward problem.

5.5 Numerical Optimization

We have derived the expressions for the optimality conditions in §5.1. Here, we revisit the generic formulation used in (1). The specific operators for the two problems can be found in §5.1 and §C.

In general, we require that the gradient with respect to the control variable vanishes; i.e., for an admissible solution to (1). We use a globalized, inexact [53, 56], preconditioned, matrix-free, reduced space232323By reduced space we mean that we iterate only on the reduced space of the control variable of our problem as opposed to so called full-space or all-at-once methods (see, e.g., [24, 29, 30, 83, 95] for more details). Gauss–Newton–Krylov method for numerical optimization. The Newton step is (in general format) given by


where .242424The order of the optimality system depends on the problem. For the diffeomorphic registration case the control variable is given by the velocity field , i.e., ; for the tumor case is given by , i.e., . We compute the search direction by solving the reduced space Karush–Kuhn–Tucker (KKT) system iteratively using a PCG method. Here, , , , is a discrete representation of the Gauss–Newton approximation (see Rem. 5.5) to the reduced Hessian at Newton (or outer) iteration index . We globalize our scheme based on a backtracking line search subject to the Armijo condition (we use default parameters; see [152, algorithm 3.1, p. 37]).252525Our implementation also features a trust region method.

Since we use a matrix-free Krylov-subspace method to iteratively solve for the search direction , we only require an expression for the application of the Hessian to a vector (Hessian matvec). In Newton-type methods, most work is spent on the (iterative) solution of the KKT system; it is the computational bottleneck of our solver. For both of our problems, each Hessian matvec involves the solution of two PDEs: the incremental state and adjoint equations; this is costly. Therefore, it is essential to design an effective preconditioner to reduce the number of Hessian matvecs we have to perform. We summarize the overall algorithm in Alg. 3 (Newton step) and the steps necessary to solve (12) in Alg. 4 (see §B). We provide expressions for the Hessian matvec for the two individual problems in §C, respectively.

The preconditioner for the reduced space Hessian of the tumor problem uses, likewise to the preconditioner for the forward problem described §5.4.1, a constant coefficient approximation for the diffusion operators (see [70]). For the registration problem we use the inverse of the regularization operator as a preconditioner (a common choice in PDE-constrained optimization [3, 128, 131, 132]). The preconditioned Hessian is a compact perturbation of identity; for smooth data and sufficient regularity of (i.e., we can resolve the problem), we observe a rate of convergence that is independent of the mesh size. In recent work, we have presented a 2-level (multigrid inspired) preconditioner. The convergence for both preconditioners is not -independent. Details for the 2D case can be found in [130]. The results presented in this study are in 3D and computed using as a preconditioner.


We use a Gauss–Newton approximation to the Hessian to ensure that the operator is positive definite far away from the optimum. The Gauss–Newton approximation is obtained by dropping all expressions in the evaluation of the Hessian that involve the adjoint variable . (We provide a precise definition of the individual operators in §C.) We expect that the rate of convergence of our scheme drops from quadratic to superlinear using this approximation. We recover local quadratic convergence as (i.e., the mismatch goes to zero, and we have solved the problem). We observe this for synthetic problems. For problems involving real data, the data will be perturbed by noise. In fact, we may not only have to deal with noise perturbations, but also with intensity drifts (which we can correct for in a preprocessing step). Based on these perturbations, we may not recover local quadratic convergence.

5.6 Implementation Details

Generally speaking, (interesting) images are functions of bounded variation. Our scheme cannot handle this type of discontinuities in the data. We assume that all fields that appear in our formulations are smooth, compactly supported functions. We ensure this numerically by presmoothing the data, and applying appropriate extensions to the data, if necessary. For the registration problem we estimate the regularization parameter based on a parameter continuation scheme [84, 86, 128]. For the tumor problem we use a similar approach—the L-curve method [88] (see [70] for an example). We terminate the inversion if the relative change of the gradient of our problem reaches a user defined tolerance. We typically choose values between and for real world applications.


From a numerical optimization point of view, we want to drive the gradient to zero. In diffeomorphic image registration problems on real data (mainly multi-subject neuroimaging applications) we have observed that, while the solution may still change, the mismatch seems to be quite stable after reducing the gradient by roughly one order of magnitude. Reducing the gradient further does, in our experience, not dramatically alter the mismatch (which is what practitioners mostly care about).

We have developed and presented prototype implementations of our nonlinear solver in Matlab. These solvers are described in detail in [128, 130, 70]. We summarize the Newton–Krylov scheme in §B. Our parallel code [129, 71] is implemented in C++, and uses the message passing interface (MPI) for parallelism. We use a 2D pencil decomposition for 3D FFTs [74, 51] to distribute the data among processors. We use PETSc for linear algebra operations [17], PETSc’s TAO package for the nonlinear optimization [149]

, and AccFFT for Fourier transforms 

[69] (a library for parallel FFTs developed by our group). Our code implements the operations for evaluating the objective functional, the gradient, and the Hessian matvec, and the interfaces to set tolerances and control the stopping criteria. The main computational kernels of our scheme are the FFT used for spatial differentiation and the interpolations in the semi-Lagrangian scheme. These kernels, and their parallel implementation, are described in detail in [69, 131, 71].

6 Numerical Experiments

We showcase some numerical experiments to illustrate the behavior of our solvers next.

6.1 Diffeomorphic Image Registration

A study for the performance of our Gauss–Newton–Krylov scheme and a comparison to (Sobolev) gradient descent type approaches can be found in [128]. We could demonstrate that our Gauss–Newton–Krylov scheme yields a significantly better rate of convergence. A study on registration quality as a function of different regularization operators can be found in [129]. In this work, we showed that we can generate high-fidelity registration whilst maintaining well-behaved , by constraining local volume change. A study of the performance of our semi-Lagrangian scheme in combination with a two-level preconditioner be found in [130]. Here, we could achieve a speedup of up to 20x compared to our original solver [128]. All this work is limited to the 2D case and implemented in Matlab. In this section, we present results for the implementation of a memory-distributed solver for the 3D case. For a study on the scalability of this solver on supercomputing platforms we refer to [131, 71]. In [71] we achieved a 8x speedup compared to our original implementation [131]. With this work we pave the way for solving diffeomorphic image registration problems on real clinical data sets for clinically relevant problem sizes in real-time—with a time to solution of under two seconds for 50 million unknowns (a grid size of ) using 512 cores on TACC’s Lonestar 5 system (2-socket Intel Xeon E5-2690 v3 (Haswell) with 12 cores/socket, memory per node). We were also able to solve an inverse problem of unprecedented scale with 200 million unknowns (a problem size, that is 64x bigger than in [131]) using 8192 cores on HLRS’s Hazel Hen system (2-socket Intel Xeon E5-2690 v3 (Haswell) with 12 cores/socket, memory per node; we summarize these results in Tab. 1).

problem size cores runtime efficiency 2 100.00 8 87.61 32 81.37 128 79.54 512 60.50 10243 128 100.0 20483 1024 93.6 40963 8192 82.9
Table 1: Scaling results for diffeomorphic image registration. Top block (left): Strong scaling on TACC’s Lonestar 5 system for 3D brain imaging data. Bottom block (left): Weak scaling results on HLRS’s HazelHen system. We report (from left to right) the number of discretization points in space, the number of cores (MPI tasks), the runtime (in seconds for the entire inversion), and the strong and weak scaling efficiency (in %). On the right we plot the strong scaling performance. We also show the ideal performance and the time spent on computing FFTs and evaluating the interpolation operator. Results originally reported in [71].

In what follows, we will showcase some new results for our algorithm for diffeomorphic registration [128, 129, 130, 131]. We will start by examining the capabilities of our solver of modeling large diffeomorphic deformations in 3D. We will then consider synthetic (smooth) and real brain imaging data. We will report results that showcase the performance of our solver on moderate computing platforms.

6.1.1 Default Parameters

We normalize the images to an intensity range of

prior to registration. If we consider data with sharp edges or real images, we apply a Gaussian smoothing operator with standard deviation of one voxel per spatial direction. We perform the registration in double precision. The number of time steps for the PDE solver is fixed to

. We invert for a stationary velocity field . We use a PCG method to solve for the search direction with a superlinear forcing sequence. We use the inverse of the regularization operator as a preconditioner. The results presented below are obtained on a single node of CACDS’s Opuntia cluster (2-socket Intel Xeon E5-2680v2 at with 10 cores/socket and memory). We do not perform any grid, parameter, or scale continuation. All results reported in this study are computed on the original resolution of the images.

6.1.2 Sphere-To-Bowl

We showcase results for a benchmark problem for large deformation diffeomorphic image registration algorithms.

Setup: We consider the synthetic registration problem proposed in [43]—a standard benchmark problem for diffeomorphic image registration. We illustrate the template and reference image in Fig. 4. The task is to register a sphere to a bowl. The images are discretized using a grid of size ( unknowns). We use an -regularization model and invert for a stationary velocity field .

Results: We illustrate the results in Fig. 5. We show three orthogonal planes through the center of the 3D volumes in Fig. 4 (top to bottom: coronal, axial and sagittal views). Each column in Fig. 5 shows (from left to right) (i) the reference image, (ii) the template image, (iii) the mismatch between the reference image and the template image before registration, (iv) the mismatch between the template and the reference image after registration, (v) the determinant of the deformation gradient, i.e., the determinant of the Jacobian of the deformation map , and (vi) an illustration of the deformation pattern . The mismatch is large in black areas (corresponds to 100%) and small in white areas (corresponds to 0%). The color map for the determinant of the deformation gradient is illustrated in Fig. 5. Black corresponds to a , orange to and white to . In Fig. 6 we show the evolution of the deformation. That is, we solve the forward problem from to based on the found velocity field . We limit this visualization to the sagittal view. We display intermediate states for . In the top row, we show the mismatch between the state variable and the reference image . In the two bottom rows, we show the state variable itself and the state variable with a grid in overlay to illustrate the deformation. The configuration at is equivalent to the results visualized in Fig. 5.

Figure 4: Synthetic registration problem as originally proposed in [43]. We show two and three-dimensional (volume rendering) illustrations of the template image (left; sphere) and the reference image (right; bowl). The two-dimensional illustration is a plane cut through the center of the three-dimensional objects (sagittal cut; see Fig. 5).
Figure 5: Results for the registration problem illustrated in Fig. 4. We show (from left to right) the reference image , the template image , the residual differences before registration, the residual differences after registration, a pointwise map of the determinant of the deformation gradient, and the deformed template image with an illustration of the deformed grid in overlay. Each row corresponds to a different view of the 3D volume (from top to bottom: coronal, axial, and sagittal). The map is diffeomorphic as judged from the visualization of the deformed grid as well as the values for the determinant of the deformation gradient (in ; colormap illustrated on the top). Notice that the deformation map illustrated on the left is the pullback map (depicts, where points originate from), whereas the map for the determinant of the deformation gradient corresponds to the inverse of this deformation map.
Figure 6: Results for the registration problem illustrated in Fig. 4 (sagittal view only). We show (from left to right) the solution of the forward problem for the estimated velocity field at different time points. The bottom row shows the deformed template image as it evolves in time (bottom row: grid to illustrate the deformation pattern in overlay). The top row shows the mismatch between the deformed template image and the reference image (see Fig. 4 left column for an illustration of the reference image). White areas indicate a vanishing mismatch and black areas a large mismatch.

Observations: The results reported in Fig. 5 and Fig. 6 illustrate that we can recover large deformations. We reduce the mismatch between the transported template image and the reference image significantly. The mismatch is almost zero everywhere after registration. Only at the sharp corners of the bowl and along its boundary we can observe residual differences between the deformed template image and the reference image (dark areas). In [132] (in a 2D setting) we experimentally observed that using time-dependent velocities may help to further reduce this mismatch.

Careful visual inspection of the evolution of the deformation (see Fig. 6) suggests that we obtain a well behaved map . This is confirmed by the values for the determinant of the deformation gradient; the determinant of the deformation gradient does not change sign and is non-zero for all . The extremal values are (. We conclude that the deformation map is locally diffeomorphic (up to numerical accuracy) as judged by these values.

6.1.3 Smooth Synthetic Data

We report results for a smooth synthetic test problem to demonstrate the performance of our solver under idealized conditions (smooth data and smooth velocity field without the presence of noise).

Setup: The template image is given by . The velocity field is given by , , , . The reference image is computed by transporting the template image with . We solve the inverse problem on a grid size of ( unknowns). We set the tolerance for the relative reduction of the gradient to and the absolute tolerance for the gradient to (not reached). We use an regularization model with a regularization parameter of . We set the maximal number of Newton iterations to 50 and the maximal number of Krylov iterations to 100.

Results: We report the convergence behaviour of our solver in Fig. 7 (top row). We show (from left to right) the trend of the relative values (normalized with respect to the values obtained at the first iteration) for the mismatch, the -norm of the gradient, and objective values with respect to the Gauss–Newton iteration index. The last graph shows the residual of the PCG per Gauss–Newton iteration (from Gauss–Newton iteration 1 (light blue) to Gauss–Newton iteration 5 (dark blue)), respectively.

Observations: Our solver converges after six Gauss–Newton iterations. Overall, we require 73 Hessian matrix vector products and a total of 168 PDE solves. We require 1, 1, 4, 9, 23, and 35 PCG iterations per Gauss–Newton iteration, respectively (see also Fig. 7; right column at the top). We reduce the gradient by to an absolute value of (the initial norm of the gradient is ). The runtime of our solver is ( ) on one node with 20 cores. We reduce the mismatch from to . Considering the plots in Fig. 7, we can observe that we mismatch, the gradient, and the objective drop rapidly. We converged to a stationary value of the objective functional and the mismatch after only 4 iteration. The relative reduction of the gradient at iteration 4 is (). During the last three iterations the gradient is still significantly reduced, but the mismatch and the objective value only change marginally. Considering the residual for the PCG iterations, we can see that the residual not only drops fast, but also that at each new Gauss–Newton iteration, we start almost at the same residual we terminated the former iteration with. That is, by using a superlinear forcing sequence we do not seem to oversolve for the Gauss–Newton step for this particular test problem. This changes when we move to real data.

6.1.4 Real Brain Imaging Data

We report results for real brain imaging data in a challenging multi-subject registration problem (i.e., the registration of brain imaging data of different individuals).

Setup: The images are taken from the non-rigid image registration evaluation project [41]. This repository contains 16 rigidly aligned T1-weighted MRI brain data sets of different individuals with a grid size of ( unknowns).262626Additional information on the data sets, the imaging protocol, and the pre-processing can be found in [41] and at http://www.nirep.org/. We show one dataset of this repository in Fig. 7 (dataset na01 (reference image) and dataset na03 (template image)). We consider the datasets na01 through na05 for these experiments; na01 serves as the reference image for all experiments.

We consider an -regularization model in combination with a penalty on the divergence of the velocity field (-div regularization model). The regularization parameter for the Sobolev norm for the velocity is set to . The regularization parameter for the penalty on the divergence of the velocity is set to . These values were chosen by experimentation. We set the maximal number of Newton iterations to 50 and the maximal number of Krylov iterations to 100. We use a tolerance of for the relative change of the gradient and set the absolute tolerance for the gradient to (not reached).


We use tighter tolerances since we found by experimentation that decreasing the gradient further, significantly increases the runtime. The reduction in mismatch associated with this increase in runtime does, from our perspective, not justify the use of smaller tolerances for the reduction of the gradient.

Results: We report the convergence behaviour of our solver in Fig. 7 (top row). We show results for four registrations (na0, , to na01). We display (from left to right) the trend of the relative values (normalized with respect to the values obtained at the first iteration) for the mismatch, the -norm of the gradient, and objective values, with respect to the Gauss–Newton iteration index. The last graph shows the residual of the PCG per Gauss–Newton iteration (from Gauss–Newton iteration 1 (light blue) to Gauss–Newton iteration 5 (dark blue)), respectively. The last plot is only for the registration of the dataset na02 to the dataset na01. We provide additional measures of performance in Tab. 2. We illustrate results for an exemplary dataset (na03 to na01) in Fig. 8.

Figure 7: Convergence for a synthetic problem (top row) and MRI brain data (bottom row; the different colors for the first three plots from the left correspond to different datasets). From left to right: We report the trend of () the relative mismatch, () the relative gradient norm (the dashed red line indicates the tolerance for terminating the optimization), and () the objective functional per Gauss–Newton iteration, respectively. We also show (right column) the trend of the residual for the PCG iterations (the different shades of blue represent the trend of the PCG residual for each Gauss–Newton iteration; the iteration count increases from bright to dark).
dataset matvecs PDE solves mismatch runtime
na02 6 143 308 (00)
na03 9 255 563 (0)
na04 8 203 434 (0)
na05 6 137 296 (0)
Table 2: Registration performance for MRI brain data of size ( unknowns). The results correspond to those reported in Fig. 7. We report the number of Gauss–Newton iterations , the number of Hessian matvecs, the number of PDE solves, the relative mismatch after registration, the -norm of the gradient, the relative reduction of the -norm of the gradient, and the runtime. The results are obtained on a single node of CACDS’s Opuntia cluster (2-socket Intel Xeon E5-2680v2 at with 10 cores/socket and memory). We do not perform any grid, scale, or parameter continuation.
Figure 8: Results for a real-world registration problem. We show (from left to right) the reference image , the template image , the residual differences before registration, the residual differences after registration, a pointwise map of the determinant of the deformation gradient, and the deformed template image with an illustration of the deformed grid in overlay. Each row corresponds to a different view of the 3D volume (from top to bottom: coronal, axial, and sagittal). Notice that the deformation map illustrated on the left is the pullback map (depicts, where points originate from), whereas the map for the determinant of the deformation gradient corresponds to the inverse of this deformation map.

Observations: Our solver converges after 6 to 9 Gauss–Newton iterations for these datasets for a tolerance of for the relative reduction of the gradient. We require between 296 and 563 PDE solves (this includes the solves for the evaluation of the objective functional, the gradient, and the Hessian matvecs) and 137 up to 255 Hessian matvecs (see Tab. 2). The mismatch is reduced by a factor of up to . We can see that the mismatch drops quickly during the first few iterations. After about 6 iterations the reduction stagnates. The same is true for the objective functional. From these plots we can also see that further reducing the gradient will not lead to a significant reduction in the mismatch (which is what we care about from a practical perspective). For some of the runs we can also see that the reduction in the gradient starts to stagnate. If we use a smaller tolerance for the gradient, the runtime will increase significantly. Overall, we obtain a runtime of 15 minutes up to 30 minutes. This is not competitive with certain existing methods for diffeomorphic registration ([200] is a prominent example). We note, that these methods do, in general, not monitor the reduction of the gradient. They just perform a fixed number of iterations. Another key difference is, that these approaches perform a grid continuation, something we have not considered here. We expect that this will significantly reduce the runtime, given our solver has a complexity of , where is the number of unknowns.

In Fig. 7 we can also see that for the considered dataset the residual for the PCG iterations drops steadily, but slowly (these results are for the registration of the dataset na02 to dataset na01). We require 14, 17, 18, 21, 30, and 43 PCG iterations per Gauss–Newton iteration, respectively. Qualitatively, we can observe that the template and reference image are in good agreement (see Fig. 8). The deformation map is locally diffeomorphic as judged by the value for the determinant of the deformation gradient. The visualization of the deformed grid also illustrates a well behaved deformation map.

6.2 Data Assimilation

We study the performance of our formulation for synthetic test problems, for varying noise perturbations and detection thresholds.

Setup: We apply different levels of noise and consider partial observations of the computed states. We assume observations at two time points ( and ). We consider different thresholds for the partial observations since there is no consensus in the literature on how to model th