Atlas-ISTN: Joint Segmentation, Registration and Atlas Construction with Image-and-Spatial Transformer Networks

12/18/2020 ∙ by Matthew Sinclair, et al.

Deep learning models for semantic segmentation are able to learn powerful representations for pixel-wise predictions, but are sensitive to noise at test time and do not guarantee a plausible topology. Image registration models, on the other hand, are able to warp known topologies to target images as a means of segmentation, but typically require large amounts of training data, and have not widely been benchmarked against pixel-wise segmentation models. We propose Atlas-ISTN, a framework that jointly learns segmentation and registration on 2D and 3D image data, and constructs a population-derived atlas in the process. Atlas-ISTN learns to segment multiple structures of interest and to register the constructed, topologically consistent atlas labelmap to an intermediate pixel-wise segmentation. Additionally, Atlas-ISTN allows for test time refinement of the model's parameters to optimize the alignment of the atlas labelmap to an intermediate pixel-wise segmentation. This process both mitigates noise in the target image that can result in spurious pixel-wise predictions and improves upon the one-pass prediction of the model. Benefits of the Atlas-ISTN framework are demonstrated qualitatively and quantitatively on 2D synthetic data and 3D cardiac computed tomography and brain magnetic resonance image data, outperforming both segmentation and registration baseline models. Atlas-ISTN also provides inter-subject correspondence of the structures of interest, enabling population-level shape and motion analysis.







1 Introduction

Figure 1:

An overview of the Atlas-ISTN framework in training (top) and test time refinement (bottom) illustrated with CCTA image data. During training, both ITN and STN weights are optimized by leveraging ground-truth labelmaps for each subject, while the atlas is updated at the end of each epoch. A symmetric loss is used to register the atlas to the subject and vice versa. At test time, instance-specific optimization (indicated by *) leverages the ITN prediction as a target labelmap to update the STN weights, providing a refined transformation from the atlas to a given subject. Spurious segmentations in the ITN prediction can be circumvented via registration of the atlas to the subject. References to related sections and figures are provided in grey text. Red boxes: atlas labelmap (deformed and undeformed). Blue boxes: input image or corresponding GT labelmap (deformed and undeformed). Green boxes: ITN logits (deformed and undeformed).


Image segmentation and registration have long been important tools for biomedical image analysis (Maintz and Viergever, 1998; Pham et al., 2000). Deep learning models such as U-nets (Ronneberger et al., 2015) have emerged as the state of the art for segmentation, with their ability to learn rich feature representations for accurate pixel-wise segmentations on challenging image datasets when trained with large labelled sets of 2D and 3D images. One challenge with such segmentation models, however, is their sensitivity to image noise and artefacts, which can yield spurious and topologically implausible segmentations at test time. Furthermore, such undesirable predictions become more likely with fewer training examples. Numerous recent works have sought to tackle this with post-processing (Kamnitsas et al., 2017; Larrazabal et al., 2020), anatomical constraints in training (Oktay et al., 2018), and novel regularizers or loss terms (Xu and Niethammer, 2019; Clough et al., 2019), to name a few. While such approaches have improved the topological plausibility of segmentation model predictions, none guarantees a target topology at test time.

More recently, a growing body of research has explored the use of deep learning models for image registration for the purpose of image segmentation. Deep learning registration models typically learn to predict a dense deformation field to register a pair of images, which can be used to propagate a labelmap (of known topology) from a source to a target image for the purpose of segmentation. Most such methods rely on a single pass of a trained model at test time to predict a deformation field (Dalca et al., 2018; Lee et al., 2019; Balakrishnan et al., 2018; Dalca et al., 2019a, c; Dong et al., 2020; Mansilla et al., 2020). The accuracy of a warped segmentation to a target image has been shown to improve with test time optimization of the registration network’s parameters, particularly in settings of limited training data (Balakrishnan et al., 2018; Lee et al., 2019). Dalca et al. (2019a) proposed a framework to learn a conditional atlas image jointly with a model to perform registration of the atlas with target images. These recent image-registration driven approaches used for segmentation are however not commonly benchmarked against pixel-wise segmentation models, and have fallen short in relevant metrics when they have been (Lee et al., 2019; Xu and Niethammer, 2019).

Inspired by features of each of these approaches, we propose Atlas-ISTN, a framework that benefits from the detailed predictions of pixel-wise segmentation while circumventing the effects of noise and artifacts via registration of a learned, topologically consistent, population-derived multi-class atlas labelmap (Fig. 1, upper panel) to an initial pixel-wise segmentation. Additionally, Atlas-ISTN leverages test time refinement of model parameters to optimize the registration of the constructed atlas labelmap to a predicted segmentation labelmap, improving over one-pass test time performance (Fig. 1, lower panel). This framework simultaneously guarantees the topology of the structures-of-interest (SoI) and provides atlas-space correspondence between subjects for further population-level analysis, all while being contained in a single model framework with straightforward training and test time deployment.

Related work

Medical image segmentation models can be broadly categorized into pixel-wise prediction, shape fitting and registration-based methods. A rich body of literature exists for methods which use image registration as a means to segment a target image, building on two main approaches: multi-atlas segmentation (MAS) (Išgum et al., 2009; Kirişli et al., 2010; Iglesias and Sabuncu, 2015), and construction and registration of a statistical shape model (SSM) (Heimann and Meinzer, 2009; Young and Frangi, 2009). MAS has proven to be highly effective, and provides competitive performance with modern deep learning methods for segmenting large structures of the heart from 3D cardiac computed tomographic angiography (CCTA) and magnetic resonance imaging (MRI), albeit in a setting with limited training data (Heinrich and Oster, 2018; Zhuang et al., 2019). MAS has also been used effectively for segmentation of heart structures in CCTA in a large-scale multicenter/multivendor evaluation (Kirişli et al., 2010). A downside of MAS is the high computational overhead at test time, which can involve registration, selection and sophisticated fusion of multiple atlas labelmaps to a target image to achieve best performance (Išgum et al., 2009).

Methods using SSMs on the other hand typically construct a template/atlas in the form of a mean image, labelmap, or mesh, which at test time is registered to a target image, with the option of leveraging a segmentation of the target image in the registration process (Medrano-Gracia et al., 2014; Bai et al., 2015). Advantages of SSMs include providing correspondence to a common atlas space for population-level analyses of shape and motion, and an unbiased atlas tends to perform better for segmentation than warping a given training case. For cardiac data, given the significant variability in heart orientation, size and morphology observed in CCTA and cardiac MRI, a two-stage (i) affine and (ii) non-rigid registration approach is typically used, sometimes requiring the definition of anatomical landmarks to guide the first stage of registration (Bai et al., 2015). SSMs are also often parameterized with Principal Component Analysis (PCA), and the PCA modes can be optimized to fit a SSM to a target image or segmentation (Heimann and Meinzer, 2009). Limitations of PCA representations however include over-fitting of the model to limited training data, and thus an inability to accurately represent anatomies which lie outside of the training distribution, which can include both significant (large-scale) and subtle (small-scale) variations in target anatomies. Both MAS and SSM-based segmentation approaches often involve workflows with multiple processing steps. For example, separate tools are typically used to optionally (1) build a SSM, (2) produce an image segmentation or landmark coordinates from an unseen target image, (3) register the atlas to a target image, segmentation, and/or landmarks, and (4) select and/or fuse the best atlas labelmap(s). See (Iglesias and Sabuncu, 2015) for a review of MAS and (Heimann and Meinzer, 2009; Young and Frangi, 2009) for a review of SSM works.

Prior to the emergence of powerful convolutional neural networks (CNNs) for pixel-wise segmentation (Ronneberger et al., 2015; Long et al., 2015), MAS and SSM-based methods were the dominant approaches used for biomedical image segmentation of structures known to adhere to a particular atlas geometry. An advantage of these more traditional approaches is the preservation of topology, whereas pixel-wise deep learning segmentation models such as the U-net (Ronneberger et al., 2015) or fully convolutional network (Long et al., 2015) can suffer from spurious and anatomically implausible segmentations at test time. Deep learning segmentation models typically improve with more training data, but are still prone to errors due to domain shift and out-of-distribution test cases (e.g. originating from different scanners, sites and acquisition protocols, or caused by imaging artifacts). Among the many methods that have been proposed to tackle these challenges, we summarize a few that focus on reducing spurious segmentations and encouraging anatomically plausible predictions. Firstly, post-processing steps have been commonly used, such as fully-connected conditional random fields (CRF) and connected component analysis (Kamnitsas et al., 2017), as well as shape-aware denoising auto-encoders (Larrazabal et al., 2020). Other approaches include anatomically constrained neural networks (ACNNs), where shape-regularization is enforced with a latent representation of the underlying anatomy via an autoencoder (Oktay et al., 2018; Chen et al., 2019). One approach leveraged shape priors during training by using smooth 3D segmentation masks produced via atlas-registration to motion-corrupted 2D stacks of short-axis cardiac MR images (Duan et al., 2019), which improved topological accuracy of 3D FCN predictions. Another approach involved simultaneous training of parallel network branches for registration and segmentation, providing a form of regularization on each branch (Xu and Niethammer, 2019). Loss terms which explicitly penalize topologically undesirable predictions using persistent homology (Clough et al., 2019; Byrne et al., 2020) have also been proposed, improving topological accuracy over pixel-wise segmentation models. Incorporation of point cloud prediction as an intermediate representation in a 3D segmentation network has also demonstrated improvements in topological consistency and segmentation accuracy (Ye et al., 2020). Finally, prediction of signed distance fields defining segmentation boundaries has also been proposed to improve segmentation accuracy (Li et al., 2020; Tilborghs et al., 2020). While each of these approaches has its inherent advantages, none guarantees a target topology at test time.

Another class of deep learning models which has received growing attention utilises a PCA shape model of surface meshes of training cases, the weights of which can be predicted directly by a CNN for a target image (Milletari et al., 2017; Bhalodia et al., 2018; Tóthová et al., 2018, 2020; Adams et al., 2020). These models have been proposed to predict 3D surface meshes both from 3D images (Bhalodia et al., 2018; Adams et al., 2020) and in settings where only sparse 2D images are available (Milletari et al., 2017; Tóthová et al., 2018, 2020). A hierarchical variational auto-encoder has also been proposed for the latter setting (Cerrolaza et al., 2018), where shape parameters are implicitly encoded in the latent variables. While such models directly encode a topologically consistent structure, as with classic PCA shape models they potentially suffer from over-constraining shape descriptors to the training set and may not be sensitive to subtle anatomical variations or out-of-distribution examples at test time. While these models show promise, with the added benefit of uncertainty quantification (Tóthová et al., 2020; Adams et al., 2020), such models have not been benchmarked against state-of-the-art 3D semantic segmentation models in the setting where dense 3D images are available.

Recently, there has also been growing interest in the field of deep learning for image registration (Haskins et al., 2020), following the seminal work on spatial transformer networks (STN) (Jaderberg et al., 2015), with a number of models proposing to use registration to propagate labelmaps to target images as a means of segmentation (Lee et al., 2019; Balakrishnan et al., 2018; Dalca et al., 2019a; Dong et al., 2020; Mansilla et al., 2020).

A prominent method in the recent literature is the VoxelMorph framework for unsupervised and semi-supervised learning of image registration (Balakrishnan et al., 2018). This framework uses an encoder-decoder type CNN to predict a dense displacement field used to register a source image to a target image, which can also be used to propagate a source labelmap to the target image. The model can be trained in a fully unsupervised setting with loss terms including image similarity, and penalties on deformation field smoothness and magnitude. Auxiliary losses that consider labelmap overlap have also been used for semi-supervised learning of image registration (Balakrishnan et al., 2018; Lee et al., 2020), where a subset of the training data may have labelmap annotations. VoxelMorph proved highly effective for test time image registration when trained on a large set of brain MR images, with the observation that instance-specific optimization of the predicted displacements at test time produced improved performance, particularly when fewer training examples were available (Balakrishnan et al., 2018).

Mansilla et al. (2020) recently proposed AC-RegNet, an image registration model regularized with a shape-aware auto-encoder conditioned on labelmaps during training, which yielded more anatomically plausible predicted displacement fields at test time for 2D lung X-ray images compared to a pixel-wise baseline. They also demonstrated that AC-RegNet could be used for multi-atlas segmentation. Dong et al. (2020) proposed a deep learning framework with adversarial consistency for registration of a pre-defined population-derived atlas image and labelmap to a target image. Adversarial image and labelmap pairs were used to encourage the model to predict more accurate deformations, and both affine and non-rigid transformations were predicted by separately trained CNNs. The model was trained and evaluated using a limited set of 3D echocardiography images with annotations of the left ventricular myocardium. The performance of the proposed method improved over state-of-the-art voxel-wise segmentation methods, although only 25 cases were used for training with limited data augmentation (only rotations around a single axis), which provides a sub-optimal setting for training a voxel-wise segmentation model.

Dalca et al. (2019a) proposed a framework that directly parameterizes a template (or atlas) image volume to be jointly learned with a registration model that registers the template image to target images. The learnable template could also be conditioned on parameters of interest, such as age and sex, and the method was evaluated with a dataset of brain MR images. The template image can be likened to a statistical atlas image learned from the training set. The authors demonstrated that a template labelmap could be constructed after training by registering training images to the constructed atlas image, propagating their labelmaps and subsequently fusing them. Image segmentation at test time was then performed by propagating the constructed template labelmap to target images, producing promising results compared to VoxelMorph (Balakrishnan et al., 2018).

Finally, the image-and-spatial transformer network (ISTN) was proposed in (Lee et al., 2019), where an image transformer network (ITN) is trained jointly with a spatial transformer network (STN). This approach generates intermediate representations of SoI from an input source and target image pair with an ITN, which are passed downstream to the STN to register the inputs. Similarly to VoxelMorph, loss terms optionally included image similarity, deformation field penalties and terms which leveraged ground-truth labelmaps for SoI-guided registration. A demonstrated advantage of ISTN is that the STN parameters can be optimized on specific instances at test time to register the ITN predictions of a source and target image pair, which improves overlap of the SoI compared to both a learned registration of the images via their intermediate representations (i.e. a single pass of the STN at test time) and a model that registers images without intermediate representations, such as VoxelMorph (Balakrishnan et al., 2018).

Figure 2: Intensity projections of images (top) and labelmaps (bottom) of a CCTA training case (left) and the constructed atlas (right). Structures depicted include the left ventricle myocardium (red), right ventricle blood pool (blue), right atrial blood pool (yellow) and left atrial blood pool (green).


Atlas-ISTN extends the ISTN framework and other proposed works to jointly learn a population-derived atlas together with a model which performs segmentation and registration. The contributions of the Atlas-ISTN framework are:

  1. An all-in-one deep learning framework for atlas construction, registration and segmentation;

  2. A robust segmentation system which improves over baseline segmentation and registration models;

  3. A method for construction of an unbiased population-derived atlas within a deep learning framework;

  4. Topological guarantees on test time segmentations via registration of a constructed atlas labelmap;

  5. A deep learning framework that provides inter-subject correspondences of SoI via a mapping to atlas space.

2 Methods

For many segmentation tasks, the topology of the target SoI is known a priori. For example, in medical image analysis, the chambers of the heart typically conform to the same spatial arrangement from subject to subject, with variation in shape, size, orientation and wall thickness. Voxel-wise segmentation models do not take advantage of this a priori knowledge, and can produce spurious segmentations or topologically inconsistent predictions, particularly with limited training data. The proposed Atlas-ISTN framework seeks to address this issue by both learning a voxel-wise prediction and learning to fit a topologically consistent atlas labelmap to the SoI, reducing noisy predictions and ensuring topological plausibility.

The main components of the framework consist of the joint training of the ITN and STN (Sec. 2.2), the simultaneous construction of the atlas (Sec. 2.3), and test time refinement of the atlas (Sec. 2.4). These components are put into context below, and described in detail in the following sections.

Similarly to ISTN (Lee et al., 2019), the Atlas-ISTN architecture consists of two sequential blocks: an ITN and a STN. The purpose of the ITN is to learn an intermediate representation from an input image which is useful to a downstream task, for example registration of the SoI (Lee et al., 2019). One option for this intermediate representation is a semantic segmentation, which requires labelled data during training. Such an intermediate representation can also be leveraged at test time for further refinement of the STN parameters.

The STN in the Atlas-ISTN framework learns a spatial transformation to map the ITN prediction to an atlas labelmap and vice versa, in order to maximize the agreement between a ground-truth labelmap and the deformed atlas labelmap (and vice versa). The ITN and STN are described in Section 2.2, and the overall framework is shown in Fig. 1.

Instead of registering pairs of images or intermediate representations as done in many recent works (Lee et al., 2019; Balakrishnan et al., 2018; Dalca et al., 2018; Mansilla et al., 2020), we propose to register predicted labelmaps to an atlas labelmap which is constructed during training. The most closely related recent works include (Dalca et al., 2019a), where an atlas image of the brain is learned during training, optimizing alignment of this constructed atlas image with training images. In (Dong et al., 2020), group-wise registration was first used to generate an atlas image and labelmap that were then fixed during training. In this work, we introduce a method to construct a multi-structure atlas labelmap and image on-the-fly during training, generated in tandem with optimizing the parameters of the ITN and STN. This approach draws on ideas from classic atlas construction methods, where images are registered to a reference space and averaged in an iterative procedure (Guimond et al., 2000; Joshi et al., 2004). The proposed approach is described in Section 2.3.

Finally, the Atlas-ISTN framework enables test time refinement in a similar way to the ISTN framework (Lee et al., 2019), but instead of registering the segmentations predicted by the ITN for a pair of input images, the STN parameters can be fine-tuned to maximize the agreement between the ITN prediction of an input image and the deformed atlas labelmap. This helps overcome limitations of the first-pass of a learned registration model at test time, particularly for out-of-distribution cases which often result in poor alignment of SoI. With appropriate choice of test time regularization, refinement allows for improved registration of the atlas labelmap with the SoI predicted by the ITN, circumventing spurious ITN predictions and false negatives. This process is described further in Section 2.4.
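The idea of instance-specific refinement can be illustrated with a deliberately tiny toy problem (our own construction, not the paper's implementation): a single translation parameter plays the role of the STN weights, and it is optimized at test time to align a smooth 1D "atlas labelmap" with a noisy pixel-wise prediction. All names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 201)

def soft_labelmap(center, width=0.3):
    """A smooth 1D 'labelmap' bump centred at `center`."""
    return np.exp(-((x - center) / width) ** 2)

atlas = soft_labelmap(0.0)
# Noisy ITN-style prediction of a structure centred at 0.35.
target = soft_labelmap(0.35) + 0.05 * rng.standard_normal(x.size)

def loss(shift):
    """MSE between the shifted atlas and the noisy target prediction."""
    warped = soft_labelmap(shift)   # 'warping' the atlas = shifting its centre
    return np.mean((warped - target) ** 2)

# Instance-specific refinement: gradient descent on the shift parameter,
# using a central finite-difference gradient for simplicity.
shift, lr, eps = 0.0, 0.2, 1e-4
for _ in range(200):
    grad = (loss(shift + eps) - loss(shift - eps)) / (2 * eps)
    shift -= lr * grad
print(round(shift, 2))
```

Despite the additive noise, the optimized shift recovers the true offset of the structure, which is the essence of fitting a clean atlas to a noisy prediction rather than trusting the noisy prediction directly.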

In the following sections, we present the transformation model (Sec. 2.1), the neural network architecture (Sec. 2.2), the method by which the atlas is learned (Sec. 2.3), and the refinement procedure which can be followed at test time to optimize the STN to improve the fit of the constructed atlas to the ITN prediction (Sec. 2.4).

Figure 3: An overview of the (3D) ITN and STN architectures. Note, conv. layers (red and green arrows) are followed by a ReLU activation, except in the final layer of the STN. Outputs of the STN, $v$ and $A$, are passed to the Transformation Computation Module in Fig. 4.

2.1 Deformations

In this work we model both affine and non-rigid deformations, which are commonly used in two-stage atlas registration methods, particularly for cardiac image data (Lamata et al., 2014; Bai et al., 2015). We consider the following composition of an affine transformation:

$$A = T\,R\,S \quad (1)$$

where $T$, $R$ and $S$ are the translation, rotation and scaling matrices (we exclude shearing in this work). This affine transformation provides a coarse registration between a source and target image. Affine registration is often used as a pre-alignment step (Bai et al., 2015; Dong et al., 2020), providing a more optimal starting point for a subsequent non-rigid registration. This type of pre-alignment is also important for applications with brain images, so much so that it is built into standardized processing pipelines; thus most recent work exploring learned deformations for brain image data has only utilized a non-rigid component in their registration models (Dalca et al., 2018; Balakrishnan et al., 2018; Krebs et al., 2019). While typically used as a pre-alignment, an affine transformation can also be optimized jointly with a non-rigid transformation (Stergios et al., 2018), where the affine component would be expected to account for large-scale deformations and coarse alignment of source and target images. In addition, by explicitly modeling the global transformation, a rigid normalization of pose is not penalized by the regularity commonly imposed on the local deformations.
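As a concrete illustration (a minimal sketch with hypothetical parameter names, not the paper's implementation), the composition $T\,R\,S$ can be assembled in homogeneous coordinates; for brevity the rotation here is about a single axis only:

```python
import numpy as np

def affine_matrix(translation, rotation_z, scale):
    """Compose A = T @ R @ S in homogeneous coordinates (shear excluded)."""
    T = np.eye(4)
    T[:3, 3] = translation                  # translation matrix
    c, s = np.cos(rotation_z), np.sin(rotation_z)
    R = np.eye(4)
    R[:2, :2] = [[c, -s], [s, c]]           # rotation about one axis, for brevity
    S = np.diag([*scale, 1.0])              # anisotropic scaling matrix
    return T @ R @ S

A = affine_matrix([1.0, -2.0, 0.5], np.pi / 4, [1.1, 0.9, 1.0])
point = np.array([0.0, 0.0, 0.0, 1.0])      # homogeneous point at the origin
print(A @ point)                            # the origin is simply translated
```

Because scaling and rotation fix the origin, the transformed origin lands exactly at the translation vector, which is a quick sanity check on the composition order.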

As the non-rigid component, we choose a diffeomorphic transformation model parameterized by a stationary velocity field (SVF), in order to guarantee preservation of the topology of the constructed atlas after deformation, and to ensure invertibility (Arsigny et al., 2006; Ashburner, 2007). The differential equation describing the evolution of a deformation $\phi^{(t)}$ generated by a SVF, denoted by $v$, is given by

$$\frac{\partial \phi^{(t)}}{\partial t} = v(\phi^{(t)}), \qquad \phi^{(0)} = \mathrm{Id} \quad (2)$$

The deformation at $t=1$ ($\phi^{(1)}$) is obtained by integration of this ordinary differential equation (ODE) over unit time starting with the identity (Ashburner, 2007). Subsequently, we denote this deformation as $\phi$ to simplify notation. In the theory of Lie groups, solving this ODE is equivalent to computing the exponential map of the flow field (a member of the Lie algebra), i.e.,

$$\phi = \exp(v) \quad (3)$$

The inverse transformation can thus be obtained by the exponentiation of the negative SVF, $\phi^{-1} = \exp(-v)$. In practice, the exponential map is computed efficiently via scaling and squaring (Arsigny et al., 2006; Ashburner, 2007).
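To make the exponentiation concrete, the following is a minimal numpy sketch of scaling and squaring for a 1D stationary velocity field (our own illustrative construction; real implementations operate on 2D/3D displacement fields with multi-linear interpolation):

```python
import numpy as np

def exp_svf_1d(v, grid, num_steps=6):
    """Integrate a 1D SVF via scaling and squaring: phi = exp(v).

    v is the velocity sampled on `grid`; returns phi evaluated on `grid`
    (i.e. the displaced positions).
    """
    # Scale: start from a small deformation, phi_0 = Id + v / 2^K.
    phi = grid + v / (2 ** num_steps)
    # Square K times: phi <- phi o phi, resampling phi at the displaced
    # positions with linear interpolation.
    for _ in range(num_steps):
        phi = np.interp(phi, grid, phi)
    return phi

grid = np.linspace(0.0, 1.0, 101)
v = 0.2 * np.sin(np.pi * grid)          # smooth velocity field, zero at the ends
phi = exp_svf_1d(v, grid)
phi_inv = exp_svf_1d(-v, grid)          # inverse via the negated SVF
# Composing forward and inverse should approximately recover the identity:
print(np.max(np.abs(np.interp(phi, grid, phi_inv) - grid)))
```

The monotonicity of `phi` (no folding) and the near-identity composition with `phi_inv` illustrate the two properties motivating the SVF choice: topology preservation and invertibility.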

2.2 Atlas-ISTN Model

Image Transformer Network

Given input images and ground-truth labelmaps, $\{x_n, y_n\}$, the ITN learns the mapping

$$\hat{y}_n = h_{\theta_I}(x_n) \quad (4)$$

where $\hat{y}_n$ are the (multi-channel) logits of a labelmap prediction. Both $y_n$ and $\hat{y}_n$ contain a background channel and as many foreground channels as there are structures in the training data.

The ITN can be any suitable pixel-wise segmentation model. In this work 2D and 3D U-net models are used (Ronneberger et al., 2015; Çiçek et al., 2016), which consist of convolutional layers in an encoder-decoder format with skip connections between the corresponding spatial scales of the encoder and decoder (Fig. 3, left). Coarser spatial scales in the encoder are achieved via strided convolutions, and in the decoder bilinear up-sampling is used (improved performance was found compared to transposed convolutions).

Spatial Transformer Network

We propose an STN which learns the mapping

$$(v_n, A_n) = g_{\theta_S}(\hat{y}_n, y^a) \quad (5)$$

where the concatenation of (1) the ITN logits $\hat{y}_n$ and (2) the atlas labelmap $y^a$ makes up the input tensor. The STN produces two outputs: (1) a SVF, $v_n$, and (2) an affine transformation matrix, $A_n$, which are subsequently processed and composed into a final deformation field (see Fig. 4) by the function

$$(\Phi_n, \Phi_n^{-1}) = f(v_n, A_n) \quad (6)$$

Similarly to (Stergios et al., 2018), joint prediction and composition of affine and non-rigid deformations by the STN is performed. $A_n$ is represented as a $3 \times 4$ matrix, and $v_n$ has a size of $3 \times \frac{D}{2} \times \frac{D}{2} \times \frac{D}{2}$ (in the 3D setting), where $D$ denotes the input image size in each spatial dimension. Modeling both affine and non-rigid deformations provides greater flexibility (see the losses in the next section) to generate transformations which capture the inherent spatial differences between SoI in many types of patient scans where images are not traditionally pre-aligned for a target anatomical structure, such as with CCTA images.

Figure 4: The Transformation Computation Module. The predicted SVF ($v$) is integrated via scaling and squaring, before linear upsampling. The resulting transformation ($\phi$) is composed with the predicted affine transformation ($A$). Computation of the inverse and forward transformations are shown at the top and bottom, respectively. Compositions are defined in Eqs. (7) and (8).

Within the Transformation Computation Module, the forward and inverse transformations are the compositions:

$$\Phi_n = \phi_n \circ A_n \quad (7)$$

$$\Phi_n^{-1} = A_n^{-1} \circ \phi_n^{-1} \quad (8)$$

The deformations are applied to each labelmap channel, $y_c$, independently, such that a transformed labelmap is obtained by $y_c \circ \Phi$. A forward pass through the network via ITN (Eq. (4)), STN (Eq. (5)) and Transformation Computation Module (Eq. (6)) can be expressed concisely as:

$$(\Phi_n, \Phi_n^{-1}) = f(g_{\theta_S}(h_{\theta_I}(x_n), y^a)) \quad (9)$$

A mean squared error (MSE) loss is used for the supervised learning of the ITN weights ($\theta_I$), i.e.,

$$\mathcal{L}_{seg} = \frac{1}{N} \sum_{n=1}^{N} \lVert \hat{y}_n - y_n \rVert_2^2 \quad (10)$$

where $n$ is the image index and $N$ the total number of images. In experiments where an ITN was trained on its own with different losses (not reported in results), similar performance was obtained with a MSE loss compared to using cross-entropy. A MSE loss for $\mathcal{L}_{seg}$ was found to produce the best convergence and overall results for Atlas-ISTN (careful weighting of loss terms was required when training with cross-entropy for $\mathcal{L}_{seg}$ while using MSE for $\mathcal{L}_{a2s}$ and $\mathcal{L}_{s2a}$, which still under-performed compared to using an MSE loss for all three terms). Two additional MSE losses, the atlas-to-segmentation loss ($\mathcal{L}_{a2s}$) and the segmentation-to-atlas loss ($\mathcal{L}_{s2a}$), are used:

$$\mathcal{L}_{a2s} = \frac{1}{N} \sum_{n=1}^{N} \sum_{c} \lVert y^a_c \circ \Phi_n - y_{n,c} \rVert_2^2 \quad (11)$$

$$\mathcal{L}_{s2a} = \frac{1}{N} \sum_{n=1}^{N} \sum_{c} \lVert y_{n,c} \circ \Phi_n^{-1} - y^a_c \rVert_2^2 \quad (12)$$

where $c$ denotes the labelmap channel index ($c=0$ corresponds to the background channel), $n$ denotes the case index, and $N$ the number of training images. $\mathcal{L}_{a2s}$ encourages accurate transformation of the atlas labelmap ($y^a$) to the ground-truth labelmap ($y_n$), despite potential noise in the ITN prediction ($\hat{y}_n$) which is an input to the STN. $\mathcal{L}_{s2a}$ encourages accurate transformation of the ground-truth labelmap ($y_n$) to the atlas labelmap ($y^a$), which aids the atlas learning process (Section 2.3).

A regularization term is also used to encourage smoothness of the predicted non-rigid deformation fields, i.e.,

$$\mathcal{L}_{reg} = \frac{1}{N} \sum_{n=1}^{N} \lVert \nabla v_n \rVert_2^2 \quad (13)$$

The overall loss is

$$\mathcal{L} = \mathcal{L}_{seg} + \omega \left( \mathcal{L}_{a2s} + \mathcal{L}_{s2a} + \lambda \mathcal{L}_{reg} \right) \quad (14)$$

where $\lambda$ controls the effect of the regularization term, and $\omega$ controls the overall weighting of the deformation-related loss terms. The segmentation loss ($\mathcal{L}_{seg}$) provides gradients only for updating the ITN weights ($\theta_I$), while all other loss terms also contribute to gradients which update the STN weights ($\theta_S$). Note that since $\mathcal{L}_{reg}$ only penalizes the non-rigid component of the deformation ($v_n$), this encourages the affine component of the STN prediction ($A_n$) to learn, and normalize for, global pose and scale, in turn encouraging the SVF to focus on learning local deformations.
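The arithmetic of the combined loss can be sketched as follows (our own simplified numpy version with hypothetical names; warping is abstracted away, and the arrays stand in for single-case, single-channel labelmaps):

```python
import numpy as np

def mse(a, b):
    return np.mean((a - b) ** 2)

def smoothness(v):
    """Penalty on spatial gradients of a (here 1D, single-case) velocity field."""
    return np.mean(np.gradient(v) ** 2)

def total_loss(itn_logits, gt, warped_atlas, warped_gt, atlas, v,
               omega=1.0, lam=0.1):
    """L = L_seg + omega * (L_a2s + L_s2a + lam * L_reg), all MSE-based."""
    l_seg = mse(itn_logits, gt)            # supervises the ITN
    l_a2s = mse(warped_atlas, gt)          # atlas-to-segmentation term
    l_s2a = mse(warped_gt, atlas)          # segmentation-to-atlas term
    l_reg = smoothness(v)                  # encourages smooth non-rigid fields
    return l_seg + omega * (l_a2s + l_s2a + lam * l_reg)

# Toy 1D example: perfect alignment everywhere except for small ITN noise,
# so only the segmentation term contributes.
gt = np.zeros(64)
gt[20:40] = 1.0
loss_val = total_loss(gt + 0.01, gt, gt, gt, gt, v=np.zeros(64))
```

Note how the single weight on the bracketed group lets the deformation-related terms be scaled together relative to the segmentation term, while the inner weight trades off registration accuracy against deformation smoothness.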

Other Practicalities

The SVF is predicted at half the resolution of the input image, which was found to produce smoother deformation fields and better convergence during training, as well as reduced memory overhead. The full resolution deformation fields (as used in Eqs. (7) and (8)) are obtained by linear upsampling, shown schematically in Fig. 4.

Architecturally, the STN is similar to the ITN in terms of the number of spatial scales, the encoder-decoder structure and skip connections, as depicted in Fig. 3. In both components, batch normalization was found to perform worse and was therefore not used. Trilinear upsampling was used rather than transposed convolutions throughout the network, which removed checkerboard artifacts and improved overall performance.

In practice, the foreground channels of $\hat{y}_n$ and $y^a$ are concatenated and passed as inputs to the STN, as we found removing the background channel improved test time performance, improved training stability and reduced memory overhead. The number of input channels to the STN is therefore $2(C-1)$, where $C$ is the number of channels (including background) of the training labelmaps. Having multiple input channels for the STN (compared to just one for the ITN) led to the use of more filters in the first convolutional layer of the STN, which was found to improve convergence of the STN during training. To reduce the memory overhead associated with this increase in filters, the first convolutional layer of the STN has a stride of 2, immediately producing a spatially reduced feature map. This did not negatively impact model performance, likely because $v_n$ is predicted at half the resolution of the input images, and the SVF-derived deformation fields ($\phi_n$ and $\phi_n^{-1}$) are linearly upsampled to the input image resolution.
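For instance, a half-resolution displacement field can be brought back to the full-resolution grid by linear resampling; the sketch below (our own, in 1D for clarity, whereas the real fields are 3D and trilinearly upsampled) shows the idea:

```python
import numpy as np

def upsample_linear_1d(field_half, full_size):
    """Linearly resample a coarse 1D field onto a full-resolution grid."""
    src = np.linspace(0.0, 1.0, field_half.size)   # coarse sample positions
    dst = np.linspace(0.0, 1.0, full_size)         # fine sample positions
    return np.interp(dst, src, field_half)

half = np.array([0.0, 1.0, 0.0])          # coarse displacement samples
full = upsample_linear_1d(half, 5)        # new samples fall halfway between
```

Predicting at half resolution and upsampling like this both smooths the field (new samples are convex combinations of coarse ones) and cuts the memory footprint of the STN output.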

2.3 Atlas Construction

Unlike in (Dalca et al., 2019a), where a volumetric atlas of the training images is learned over the course of model training, in this work we propose a method to jointly construct volumetric atlases of the training images and labelmaps. This is achieved without explicitly parameterizing atlas voxels with learnable weights as in (Dalca et al., 2019a); rather, we draw inspiration from classic atlas construction work (Guimond et al., 2000; Joshi et al., 2004). The ITN and the STN are trained jointly, and the atlas labelmap and atlas image are updated at the end of each epoch by warping the training data to atlas space via a forward pass of Atlas-ISTN and averaging across all samples. The update procedure for the atlas labelmap and the atlas image is the same, so the following description applies to both. The atlas is initialized and updated according to Eq. (15), with indices running over the training cases, channels and epochs. The atlas is initialized (at epoch 0) as the mean of all the (undeformed) training cases. The rate at which the atlas is updated is determined by an update-rate hyperparameter, and each update moves the atlas towards the mean of the transformed training cases, given by Eq. (16).

At the end of each training epoch, a forward pass through the network (Eq. (9)) is used to warp each training case to atlas space, from which the channel-wise mean atlas is computed (Eq. (16)). The atlas labelmap updated via Eq. (15) at the end of each epoch is then used during the following epoch in the computation of the losses in Eqs. (11) and (12). The atlas labelmap and image are shown in Fig. 2 alongside the labelmap and image of a randomly selected training case from the CCTA dataset.
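One concrete form of this epoch-wise update is a moving average between the current atlas and the mean of the warped training cases. The notation below (case index $n$, channel $c$, epoch $e$, update rate $\omega$, per-case transformation $\Phi_n$) is our own illustrative reconstruction, not necessarily the paper's exact symbols:

```latex
% Epoch-wise atlas update as a moving average (illustrative notation):
a_c^{(e+1)} = (1 - \omega)\, a_c^{(e)} + \omega\, \bar{a}_c^{(e)},
\qquad
\bar{a}_c^{(e)} = \frac{1}{N} \sum_{n=1}^{N} \left( y_{n,c} \circ \Phi_n^{(e)} \right)
```

Here $y_{n,c} \circ \Phi_n^{(e)}$ denotes the $c$-th channel of training case $n$ warped to atlas space at epoch $e$; setting $\omega = 1$ would replace the atlas with the epoch mean outright.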

For the proposed Atlas-ISTN model, we focus on the use of the atlas labelmap to improve segmentation performance at test-time, described in the next section. The benefits of learning an atlas labelmap are that (1) an unbiased labelmap is produced through the training process, as opposed to using a fixed atlas, and (2) the final STN is optimized to warp this unbiased atlas labelmap to the ITN prediction, which positions the STN weights in a setting well-suited for test time refinement (as opposed to optimizing STN weights for test-time refinement from scratch). Creating an unbiased atlas via established methods such as group-wise registration (Joshi et al., 2004) as an alternative to (1) could also be done, but is not required since Atlas-ISTN is able to construct the atlas during training.

2.4 Test Time Refinement

At test time, an observation made in (Lee et al., 2019) was that the overlap of source and target image SoI could be further improved by test time optimization of the STN weights to register the source and target ITN predictions. In this work, the STN weights are optimized for the registration of the constructed atlas labelmap to the ITN prediction of a target image, referred to throughout the text as refinement. For a given target image, the losses of Eqs. (11) and (12) can be repurposed to optimize the alignment between the atlas labelmap and the ITN logits, instead of the training labels used during training,


where the overall refinement loss is given by:


with weightings corresponding to the atlas-to-segmentation, segmentation-to-atlas and regularization terms, respectively. During refinement, the ITN weights are fixed and only the STN weights are updated. In the presence of noise in the ITN prediction, larger values of the regularization weight can encourage a more rigid registration that retains more of the atlas shape, and in turn circumvent spurious segmentations or holes in the prediction. In practice, we find that refinement without the segmentation-to-atlas loss (i.e. setting its weighting to zero) slightly improves the accuracy of the final deformed atlas while reducing memory overhead. The deformed atlas after refinement is given by:


where, for a given input image, the transformation is obtained from a forward pass of Atlas-ISTN after refinement,


where the refined STN weights are obtained after a number of refinement iterations minimizing the loss in Eq. (19) via stochastic gradient descent. This process is illustrated in the lower panel of Fig. 1.
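The optimization pattern of refinement (frozen target, learnable transform parameters, gradient-based updates) can be illustrated with a deliberately simplified stand-in: instead of an STN, a single learnable 2D translation is optimized to align a fixed "atlas" blob to a shifted target, which stands in for the atlas-to-ITN-prediction registration. This is a toy sketch only, not the paper's STN or loss.

```python
import torch
import torch.nn.functional as F

def gaussian_blob(center, size=32, sigma=3.0):
    ys, xs = torch.meshgrid(torch.arange(size, dtype=torch.float32),
                            torch.arange(size, dtype=torch.float32),
                            indexing="ij")
    return torch.exp(-((ys - center[0])**2 + (xs - center[1])**2) / (2 * sigma**2))

atlas  = gaussian_blob((16.0, 16.0))[None, None]   # "atlas labelmap" stand-in
target = gaussian_blob((12.0, 20.0))[None, None]   # "ITN prediction" stand-in (fixed)

t = torch.zeros(2, requires_grad=True)             # learnable translation parameters
opt = torch.optim.Adam([t], lr=0.05)

def warp(img, t):
    # Build a 2x3 affine matrix with identity linear part and learnable shift.
    theta = torch.cat([torch.eye(2), t.view(2, 1)], dim=1)[None]
    grid = F.affine_grid(theta, img.shape, align_corners=False)
    return F.grid_sample(img, grid, align_corners=False)

initial_loss = F.mse_loss(warp(atlas, t), target).item()
for _ in range(200):                               # refinement iterations
    opt.zero_grad()
    loss = F.mse_loss(warp(atlas, t), target)      # atlas-to-segmentation term
    loss.backward()
    opt.step()
final_loss = loss.item()
```

The key property mirrored here is that only the transform parameters receive gradients while the target stays fixed, so the alignment loss decreases monotonically in practice.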


3 Experiments and Results

3.1 Letter B

To illustrate some of the key properties of Atlas-ISTN for image segmentation, let us initially consider a simple toy example based on a binary image of the letter B. This example acts as proof-of-concept and allows us to demonstrate the behaviour of Atlas-ISTN in a controlled setting. We show some qualitative and quantitative results in the following.

Figure 5: The left-most column illustrates the base letter B (top), the initial atlas labelmap at the start of training, defined by Eq. (16) at epoch 0 (middle), and the recovered letter B atlas labelmap after training (bottom). The other images are examples from the training set (top row) and the clean (middle row) and corrupt test sets (bottom row), generated randomly from the base letter B.

Data Description

Starting from a single binary 2D image of the letter B, we generate warped instances of this base image using random affine transformations composed with random B-spline deformations. For each warped binary image we generate a corresponding intensity image with additive Gaussian noise. We use 1,000 of these image and labelmap pairs for training an Atlas-ISTN. We further use a hold-out set of 100 pairs to test the Atlas-ISTN, once on a clean version of the test set (coming from the same distribution as the training data) and once on a corrupted version (with random clutter added to the intensity images). Visual examples of the training set and of the clean and corrupted test sets, together with the base letter B and the initial and final constructed atlases, are shown in Fig. 5. This synthetic 2D data is provided together with our source code such that the following results can be fully replicated.
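The generation recipe above (random affine plus smooth non-rigid warp, then Gaussian intensity noise) can be approximated as follows. This sketch substitutes a Gaussian-smoothed random displacement field for the B-spline deformations and a simple rectangle for the letter B, so it illustrates the recipe rather than reproducing the released data.

```python
import numpy as np
from scipy.ndimage import affine_transform, gaussian_filter, map_coordinates

def warp_instance(base, rng, max_rot=0.3, max_shift=3.0,
                  elastic_alpha=4.0, elastic_sigma=6.0, noise_std=0.1):
    """Generate one (noisy image, binary labelmap) pair from a base binary image."""
    h, w = base.shape
    # Random small rotation + translation about the image centre.
    a = rng.uniform(-max_rot, max_rot)
    mat = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    centre = np.array([h, w]) / 2.0
    offset = centre - mat @ centre + rng.uniform(-max_shift, max_shift, 2)
    warped = affine_transform(base.astype(float), mat, offset=offset, order=1)
    # Smooth random displacement field, standing in for B-spline deformations.
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    dy = gaussian_filter(rng.standard_normal((h, w)), elastic_sigma) * elastic_alpha
    dx = gaussian_filter(rng.standard_normal((h, w)), elastic_sigma) * elastic_alpha
    warped = map_coordinates(warped, [ys + dy, xs + dx], order=1)
    label = (warped > 0.5).astype(np.uint8)
    image = label.astype(float) + rng.normal(0.0, noise_std, (h, w))
    return image, label

rng = np.random.default_rng(0)
base = np.zeros((64, 64)); base[20:44, 24:40] = 1  # stand-in for the letter B
image, label = warp_instance(base, rng)
```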

Figure 6: Qualitative results for the 2D toy data with test data coming from the same distribution as the training data. Both the ITN and 1-pass Atlas-ISTN yield accurate segmentations. Test time refinement with increasing regularization weight affects the smoothness of the final transformation.
Figure 7: Qualitative results for the 2D toy data with corrupted, out-of-distribution test data. The ITN yields many false positives and false negatives and is topologically implausible. The 1-pass Atlas-ISTN yields a reasonable atlas alignment despite the corrupted input data. Test time refinement with increasing regularization weight can yield accurate and topologically plausible segmentations.

Qualitative Results

In the case of clean test data, we make the following observations. As expected, the ITN provides nearly perfect segmentations of the input images, and equally the 1-pass Atlas-ISTN predicts an accurate alignment of the constructed atlas to the ground truth. We run test time refinement with six different regularization weights and observe the effect of increased regularization on the final transformation. This example confirms that when sufficient training data is available, and the test data comes from the same distribution, the 1-pass Atlas-ISTN is on par with an ITN-based segmentation in terms of accuracy, with the added benefit that the resulting segmentations of the Atlas-ISTN come with correspondences to the constructed atlas space. Test time refinement further allows control over the degree of deformation through the regularization weight, but may be considered optional as it may not add significant improvements in segmentation accuracy.

In reality, however, test data often does not come from the exact same distribution as the training data and this may negatively affect the predictive performance of a trained network. Test images, for example, might exhibit variations due to artifacts or pathology which were not captured in the training data. This is simulated here by testing the above Atlas-ISTN on a corrupted version of the test set. We observe that the ITN now fails to accurately segment the structures of interest yielding both many false positives and false negatives. Still, the predicted segmentation may provide useful information for subsequent refinement in our Atlas-ISTN framework. The 1-pass prediction of the Atlas-ISTN is also affected by the noisy ITN output yielding a sub-optimal alignment of the atlas, yet providing a good initial atlas alignment. Here, the benefits of the test time refinement become clear. This test-specific optimization of the STN network weights results in plausible segmentations of the corrupted test images, removing both false positives and negatives and adhering to the topology of the constructed atlas. Again, we observe the effect of the regularization weight which provides control over how closely we wish to stay to the constructed atlas up to affine transformations.

Quantitative Results

We also present quantitative results over the 100 cases for both the clean and the corrupted test sets, summarized in Fig. 8. Metrics used to evaluate segmentation performance include the Dice similarity coefficient (DSC), average surface distance (ASD) in pixels and Hausdorff distance (HD) in pixels. We observe that for the clean data (blue bars), highly accurate segmentations are obtained with all approaches and across the range of regularization weights, indicated by high DSC and low surface distances. For the corrupted test data (orange bars), we can see the clear benefit of Atlas-ISTNs with test time refinement. We also observe that the results are not very sensitive to the regularization weight.
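The three metrics can be computed from binary masks with standard distance transforms. The following is a minimal 2D sketch of our own (assuming non-empty masks with non-empty boundaries), not the evaluation code used in the paper:

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dsc(a, b):
    """Dice similarity coefficient of two boolean masks."""
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def surface(mask):
    # Boundary pixels: the mask minus its erosion.
    return mask & ~binary_erosion(mask)

def surface_distances(a, b):
    # Distance from every surface point of `a` to the nearest surface point of `b`.
    return distance_transform_edt(~surface(b))[surface(a)]

def asd(a, b):
    """Symmetric average surface distance."""
    d_ab, d_ba = surface_distances(a, b), surface_distances(b, a)
    return (d_ab.sum() + d_ba.sum()) / (len(d_ab) + len(d_ba))

def hd(a, b):
    """Symmetric Hausdorff distance."""
    return max(surface_distances(a, b).max(), surface_distances(b, a).max())

a = np.zeros((32, 32), bool); a[8:24, 8:24] = True
b = np.zeros((32, 32), bool); b[9:25, 8:24] = True  # same square, shifted one row
print(dsc(a, b), asd(a, b), hd(a, b))
```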

Figure 8: Quantitative results for the 2D toy data with test data from the same distribution as the training data (blue bars), and corrupted, out-of-distribution test data (orange bars). The plots show the results using the segmentation metrics DSC (top), ASD (middle), and HD (bottom). On the within-distribution test data, all metrics indicate good segmentation accuracy for all approaches and across different regularization weights. For the out-of-distribution data, Atlas-ISTNs with test time refinement out-perform the ITN and the 1-pass prediction by a significant margin, with good robustness to the selection of the regularization weight.

3.2 3D Cardiac CCTA

Figure 9: An axial slice through the initial (top row) and final (bottom row) atlas image (first column) and the 6 channels of the atlas labelmap produced when training Atlas-ISTN on multi-label CCTA data.

The above experiments illustrate the behaviour and benefits of the Atlas-ISTN framework on 2D synthetic data. In the following, we test the framework on 3D medical imaging data. We focus on segmentation of the large structures of the heart from 3D CCTA, which is an important step both in the calculation of derived clinical indices such as strain (Nicol et al., 2019) and in providing a computational domain and boundary conditions for simulations of cardiac function and coronary flow (Taylor et al., 2013; Chabiniok et al., 2016). Analysis of Atlas-ISTN is performed with multi-channel labelmaps using a real-world dataset, where ablation studies are performed removing or modifying model components and loss terms from the framework, allowing comparison to baseline segmentation and registration models.

Data Description

1,109 3D CCTA images from multiple sites around the world were used, consisting of commercial cases received by HeartFlow, Inc. for analysis (Taylor et al., 2013). All cases were processed through the production pipeline, one output of which is the segmentation of the left ventricle myocardium (LVM). Segmentation of the LVM involves the manual inspection and correction of an automated segmentation produced by a Random Forest + shape fitting model. 109 high quality images were selected for further 3D annotation of the large structures of the heart using in-house tools. In addition to the LVM, these included the left ventricle blood pool (LV), the right ventricle blood pool (RV), the right atrial blood pool (RA) and the left atrial blood pool (LA).

Image intensities were clipped to [-1000, 1000] Hounsfield units and linearly rescaled to the range [-0.5, 0.5]. All images were isotropically resampled to ensure the through-plane (z-direction) resolution matched the already isotropic in-plane resolution, and images were subsequently downsampled by a factor of 4. While the in-plane dimensions are fixed, images were either padded (for the vast majority of cases) or cropped in the z-dimension (the SoI were used to define the crop region for these cases) to obtain volumes of a fixed size.

The 109 cases with large structure annotations (LSA) were randomly split into 80 training, 10 validation, and 19 testing cases. The remaining 1000 cases with just the LVM were used for additional testing to provide a more diverse test set on which to assess the performance of Atlas-ISTN. In all experiments, Atlas-ISTN is trained using the same 80 cases with all labelled structures, unless otherwise stated.

Model Settings

PyTorch (Paszke et al., 2019) was used for all model development. Models were trained for 800 epochs (or 1,200 epochs when using on-the-fly data augmentation), until convergence on validation data, using NVIDIA Tesla V100 SXM2 GPUs (with 32GB memory). The full Atlas-ISTN model was trained with a mini-batch size of 8, requiring 22GB of GPU memory. The Adam optimizer was used, with an exponential learning rate decay with a half-life of 500 epochs, as this improved model performance. Weighting variables for the training loss in Eq. (14) and the atlas update rate in Eq. (15) were fixed across experiments. It was found that introducing the affine component later in training stabilized the initial phase of atlas updates, so the affine component was introduced after 200 epochs. Parameter values for the refinement loss in Eq. (19) were likewise fixed, with 100 refinement iterations. Refinement often reached convergence within 30-50 iterations, but 100 iterations were used to ensure all cases had converged. Refinement, performed with a single image at a time, required 3GB of GPU memory and 20s of runtime for 100 iterations.

Statistical Analyses

Metrics used to evaluate model performance include the Dice similarity coefficient (DSC), average surface distance (ASD) in millimetres and Hausdorff distance (HD) in millimetres. Superiority is shown with a one-sided paired hypothesis test, where the test statistic is the mean difference of the metric of interest. We checked the rejection of the null hypothesis using the confidence interval of the test statistic, which was estimated with percentile bootstrap using 10,000 repetitions.
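The paired bootstrap procedure can be sketched as follows. The significance level below is a placeholder parameter, not necessarily the value used in the paper:

```python
import numpy as np

def paired_bootstrap_superior(metric_a, metric_b, n_boot=10_000, alpha=0.05, seed=0):
    """One-sided paired test: is model A's mean metric greater than model B's?

    Resamples the per-case paired differences with replacement and checks
    whether the lower bound of the percentile interval for the mean
    difference lies above zero.
    """
    rng = np.random.default_rng(seed)
    diffs = np.asarray(metric_a) - np.asarray(metric_b)
    idx = rng.integers(0, len(diffs), size=(n_boot, len(diffs)))
    boot_means = diffs[idx].mean(axis=1)      # bootstrap distribution of the mean
    lower = np.percentile(boot_means, 100 * alpha)
    return lower > 0, lower

# Example: model A is consistently ~0.02 DSC better on 19 paired cases.
rng = np.random.default_rng(1)
b = rng.uniform(0.85, 0.95, size=19)
a = b + 0.02 + rng.normal(0, 0.005, size=19)
significant, lower = paired_bootstrap_superior(a, b)
```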

Figure 10: Test cases in order of increasing difficulty (top to bottom), displaying an axial slice from a CCTA image. In columns 3, 5 and 7, manual and predicted contours of the LVM are shown in green and red, respectively, while white contours are used for the other predicted structures for clarity of the LVM comparison. Column descriptions are provided above. White arrows highlight false positive and false negative LVM predictions from the ITN. Orange arrows highlight errors between the 1-pass deformed LVM and the manual LVM contours. Rows 1-2 contain examples for which the ITN, 1-pass and refinement predictions all perform well, representative of a significant proportion of test cases. Rows 3-4 show cases where the ITN produces spurious segmentations, but refinement is able to circumvent this. In row 4, however, the 1-pass LVM contour is visibly offset, while refinement produces a better fit. Rows 5-6 show challenging cases, where row 5 contains an example in which the heart is particularly small in the field-of-view. On top of this, the ITN predicts a large spurious segmentation extending from the base of the LVM. The deformed contours after 1-pass do not fit the target structures accurately, but after refinement the LVM and other chambers are well positioned. Row 6 shows a case which required a significant rotation and translation of the atlas, so much so that in column 2 a different axial slice is shown to include the undeformed atlas. Additionally, the LVM wall is particularly thin, where the ITN predicts a hole near the apex and an over-segmentation in the basal septal region. The significant transformation proves challenging for a single pass of the model, resulting in poor alignment of the atlas to the target SoI. The deformed contours after refinement, however, align well with the target SoI and bridge the ITN's predicted hole in the LVM (albeit not perfectly fitting the manual LVM contour).
Notice that for cases which require larger global transformations (rows 3-6), the deformed grid after refinement contains a more noticeable affine transformation, and the non-rigid component appears to deform the grid less compared to the 1-pass.

Comparison with Baseline Segmentation Model

A U-net baseline model was trained by optimizing only the weights of the ITN, , with the segmentation loss . The Atlas-ISTN was trained by optimizing jointly the weights of the ITN, , as well as the STN, , with all losses in Eq. (14). A comparison is made between the U-net and (1) the prediction of the ITN (with identical architecture as the U-net) trained within the Atlas-ISTN framework (‘ITN’), (2) the warped atlas labelmap predicted from the first pass of the Atlas-ISTN (‘1-pass’), and (3) the warped atlas labelmap after test time refinement of the STN weights (‘Refine’).

Figure 9 shows the initial and final atlas image and multi-channel labelmap resulting from training Atlas-ISTN. The SoI in the final atlas image and labelmap are noticeably sharper, while the background structures in the atlas image remain fairly homogeneous. This is to be expected given that we are optimizing for the segmentation and alignment of the structures depicted in the labelmap. The final atlas image and labelmap are also shown in 3D in Fig. 2.

No augmentation
Label  Metric  U-net   U-net-LCC  ITN     1-pass  Refine
—      DSC     0.883   0.883      0.893   0.803   0.894
       ASD     0.202   0.190      0.169   0.401   0.165
       HD      16.401  6.260      6.842   7.775   5.255
—      DSC     0.936   0.936      0.941   0.896   0.943
       ASD     0.123   0.123      0.113   0.235   0.109
       HD      7.157   7.132      7.124   8.619   6.938
—      DSC     0.894   0.895      0.900   0.846   0.898
       ASD     0.344   0.287      0.309   0.483   0.284
       HD      29.082  10.773     38.093  12.224  10.683
—      DSC     0.862   0.862      0.857   0.825   0.860
       ASD     0.363   0.362      0.511   0.521   0.383
       HD      17.263  13.459     35.670  13.736  13.082
—      DSC     0.886   0.886      0.899   0.846   0.900
       ASD     0.344   0.338      0.304   0.494   0.286
       HD      15.149  12.647     23.031  13.396  11.725

With augmentation
Label  Metric  U-net   U-net-LCC  ITN     1-pass  Refine
—      DSC     0.911   0.911      0.909   0.896   0.911
       ASD     0.137   0.136      0.138   0.169   0.136
       HD      6.150   4.862      4.785   5.313   4.544
—      DSC     0.950   0.950      0.948   0.942   0.950
       ASD     0.091   0.091      0.091   0.108   0.089
       HD      6.283   6.283      5.981   6.527   5.973
—      DSC     0.903   0.903      0.906   0.897   0.906
       ASD     0.267   0.267      0.263   0.270   0.258
       HD      13.034  10.879     11.793  10.269  10.647
—      DSC     0.883   0.883      0.884   0.873   0.883
       ASD     0.292   0.291      0.288   0.313   0.288
       HD      14.593  12.243     12.862  12.468  12.187
—      DSC     0.911   0.911      0.917   0.892   0.913
       ASD     0.236   0.236      0.230   0.297   0.238
       HD      11.182  11.182     12.032  11.740  11.037
Table 1: Comparison with the U-net baseline, with and without spatial augmentation, on the high quality 19 case LSA test set, with one row group per labelled structure. 'U-net-LCC' denotes the U-net with largest-connected-component post-processing. Higher is better for DSC; lower is better for ASD and HD. Note that the U-net and U-net-LCC models with augmentation are identical for the LVM and LA labels.

Models were initially trained with no data augmentation, and a significant drop in performance was observed from the ITN to 1-pass DSC (Table 1, left). This highlighted the importance of incorporating spatial augmentations when training the Atlas-ISTN with a limited dataset. The ITN learns both global and local image features, and can primarily rely on local image features to make accurate voxel-wise predictions. The STN performance however depends more heavily on learning global image features, given that the predicted displacement field must operate across the entire image to transform SoI to significantly different orientations, scales and morphological configurations. The bottom two rows of Figure 10 show the outputs of Atlas-ISTN on particularly challenging cases, where significant global and local deformations are required to register the undeformed atlas to the target SoI. The ITN and 1-pass results are inadequate, while the refinement suitably fits the atlas labelmap to the target SoI.

Given the limited set of training data, the U-net and Atlas-ISTN models were also trained with on-the-fly spatial augmentations, which included translation (range: -8 to +8 voxels), rotation (range: -15 to +15 degrees about each of the three axes) and scaling (range: 0.9 to 1.1).
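The quoted augmentation ranges can be sampled and applied with a standard affine resampler. This is an illustrative sketch of our own (not the training code), composing rotations about the three axes with a global scale and a translation while keeping the volume centre fixed:

```python
import numpy as np
from scipy.ndimage import affine_transform

def random_spatial_augmentation(volume, rng):
    """Apply a random rotation (±15°), scale (0.9–1.1) and translation
    (±8 voxels) to a 3D volume, matching the ranges quoted in the text."""
    ax, ay, az = np.deg2rad(rng.uniform(-15, 15, size=3))
    rx = np.array([[1, 0, 0],
                   [0, np.cos(ax), -np.sin(ax)],
                   [0, np.sin(ax),  np.cos(ax)]])
    ry = np.array([[ np.cos(ay), 0, np.sin(ay)],
                   [0, 1, 0],
                   [-np.sin(ay), 0, np.cos(ay)]])
    rz = np.array([[np.cos(az), -np.sin(az), 0],
                   [np.sin(az),  np.cos(az), 0],
                   [0, 0, 1]])
    scale = rng.uniform(0.9, 1.1)
    mat = scale * (rx @ ry @ rz)
    shift = rng.uniform(-8, 8, size=3)
    centre = (np.array(volume.shape) - 1) / 2.0
    # Keep the volume centre fixed, then add the random translation.
    offset = centre - mat @ centre + shift
    return affine_transform(volume, mat, offset=offset, order=1)

rng = np.random.default_rng(0)
vol = np.zeros((32, 32, 32), np.float32); vol[8:24, 8:24, 8:24] = 1.0
aug = random_spatial_augmentation(vol, rng)
```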

False positives, or spurious segmentations, are commonly observed in the predictions of voxel-wise segmentation models (Kamnitsas et al., 2017; Oktay et al., 2018; Larrazabal et al., 2020). A simple and commonly used post-processing step of retaining only the largest connected component of the U-net prediction (denoted 'U-net-LCC') was used as an additional comparison.
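Largest-connected-component post-processing of a binary prediction can be implemented with SciPy. This is a per-channel sketch; the exact post-processing used in the paper may differ in details such as connectivity:

```python
import numpy as np
from scipy import ndimage

def largest_connected_component(mask):
    """Keep only the largest connected component of a binary mask."""
    labels, n = ndimage.label(mask)           # label connected components
    if n == 0:
        return mask                           # empty prediction: nothing to keep
    sizes = ndimage.sum(mask, labels, index=range(1, n + 1))
    return labels == (int(np.argmax(sizes)) + 1)

pred = np.zeros((20, 20), bool)
pred[2:10, 2:10] = True    # main structure (64 pixels)
pred[15:17, 15:17] = True  # spurious island (4 pixels)
clean = largest_connected_component(pred)
```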

Table 1 shows the results of the U-net and Atlas-ISTN using the high quality 19 case LSA test set, for which labels of all chambers were available. For models trained without data augmentation, we observe slight improvements of the ITN over the U-net for LVM, LV and RV DSC, but not for the other structures. The 1-pass performance of Atlas-ISTN falls short of the ITN across all metrics for the aforementioned reasons. Refinement produces the best results across almost all metrics; although DSC improves only moderately over the ITN, performance on the ASD and HD metrics improves significantly. U-net-LCC also improves over the U-net in terms of HD and, to a lesser extent, ASD, but is almost always out-performed by refinement. Fig. 11 shows examples where U-net-LCC is unable to correct certain false positive and false negative predictions, while refinement of Atlas-ISTN does.

When models are trained with on-the-fly data augmentation, there is a significant improvement across all metrics compared to models trained without augmentation. Most marked is the improvement in 1-pass performance: the 1-pass DSC improves across all channels in absolute terms, while the refinement DSC improves more modestly, leaving a smaller gap between the 1-pass and refine metrics. With data augmentation, the U-net, ITN and refine metrics all improve slightly and become more similar to one another. Given that the images in this test set were selected for their high quality, all models produce accurate results and it is perhaps unsurprising that we see only modest improvement with Atlas-ISTN.

To assess performance on a more diverse dataset originating from a wide range of scanners and sites, the same models are run on 1000 test cases, each containing a 3D annotation of just the LVM. The results are summarized in Table 2. All metrics are noticeably worse than the LVM metrics on the 19 high quality test cases in Table 1, reflecting the more diverse and challenging images in the 1000 case test set. For the models trained without data augmentation, the ITN shows an improvement over the U-net across all metrics, and refinement further improves on all metrics. Data augmentation during training improves all metrics significantly, most noticeably the 1-pass DSC, which increases considerably more in absolute terms than the refinement DSC.

No augmentation
Metric  U-net   U-net-LCC  ITN     1-pass  Refine
DSC     0.840   0.850      0.863   0.683   0.869
ASD     0.973   0.417      0.367   1.207   0.256
HD      38.046  10.566     22.948  11.763  6.120

With augmentation
Metric  U-net   U-net-LCC  ITN     1-pass  Refine
DSC     0.884   0.885      0.883   0.850   0.888
ASD     0.301   0.224      0.342   0.311   0.212
HD      9.854   6.440      13.046  6.579   5.644
Table 2: Comparison with the U-net baseline, with and without spatial augmentation, on the 1000 case LVM test set. 'U-net-LCC' denotes the U-net with largest-connected-component post-processing; '1-pass' and 'Refine' are the Atlas-ISTN outputs. Higher is better for DSC; lower is better for ASD and HD.
Figure 11: Axial (a) and sagittal (b) planes showing examples where refinement corrects for (a) holes (false negatives) and (b) spurious segmentations (false positives) in the LVM channel predicted by the ITN. Red contour: LVM label of deformed atlas labelmap after refinement. Yellow contour: ITN prediction of the LVM channel. Heatmap: ITN logits of LVM channel. In both (a) and (b), the ITN prediction contour consists of only a single connected component of the LVM.

We observe that U-net-LCC improves over the U-net both with and without augmentation, and also out-performs the ITN trained with spatial augmentations. Refinement still produces the best results in both augmentation settings, despite a lower ITN performance compared to the U-net and U-net-LCC with augmentation. Additionally, the result of refinement guarantees the topology and encourages smoothness of the target structures, while U-net-LCC does not (see examples in Fig. 11). Statistically significant improvement across all metrics is achieved with refinement compared to all other models.

Figure 12: Scatter plots showing HD (left), ASD (middle) and DSC (right) results of Atlas-ISTN on the 1000 case test set, comparing the LVM label from the ITN (x-axis) versus refinement (y-axis). The green/red gradients indicate increase/decrease in performance with refinement. HD and ASD of the ITN predictions almost always improve with refinement. Degradation observed for some cases in terms of DSC is always small, whereas improvements in DSC can be significant.

The improvements achieved with refinement over the ITN prediction are further illustrated in Fig. 12. Generally, we see that there is seldom any degradation in the metrics, while outliers are generally corrected, particularly for the HD and ASD metrics.

Comparison with Baseline Deformation Models

Most related approaches in the literature propose the use of a single pass of an STN at test time to make a registration prediction, which can be used to propagate a segmentation (Balakrishnan et al., 2018; Dalca et al., 2019a; Dong et al., 2020). These methods also use images as inputs to the STN, while Atlas-ISTN enforces an explicit intermediate representation (semantic segmentation) as input to the STN. We make comparisons to variants of the Atlas-ISTN framework, including:

VML: A VoxelMorph-like model - training only an STN (without an ITN), where (i) an intensity image and (ii) the atlas image are passed directly to the STN.

Atlas-ISTN-NS: an Atlas-ISTN trained without the segmentation loss, thus removing the constraint on the ITN to produce a semantic segmentation as input to the STN.

While Atlas-ISTN-NS has the same architecture as Atlas-ISTN, VML has slightly fewer than half the parameters (i.e. just the STN) and takes two single-channel images as input. The inputs to the VML model are a target intensity image (see top left panel in Fig. 2 for an example) and the atlas image (see top right panel in Fig. 2 for an example). The atlas labelmap and image are updated as usual (Section 2.3). VML is a VoxelMorph-like model similar to (Dalca et al., 2019a) in that it registers a pair of intensity images, where one is an atlas image. In this work the atlas image is constructed on-the-fly during training via registration of training images, while Dalca et al. (2019a) explicitly parameterize the atlas image with learnable weights. Also unlike (Dalca et al., 2019a), no image-based loss terms are used, and ground-truth labelmaps are provided with the training data for the computation of the loss (referred to as an 'auxiliary' loss in (Balakrishnan et al., 2018)). Additionally, both an atlas labelmap and an atlas image are used during training, where the image is used as an input to the model and the labelmap is used in the loss terms of Eqs. (11) and (12). Atlas-ISTN train and test hyper-parameters were kept the same for VML.

Atlas-ISTN-NS also uses the same hyper-parameters, although batch normalization after each convolutional layer in the ITN was required to prevent vanishing gradients during training (given the deep architecture trained without the intermediate segmentation loss). Inputs to the STN are the same as for Atlas-ISTN, namely the atlas labelmap and the intermediate representation produced by the ITN. In this case, however, the intermediate representation is not constrained to be a semantic segmentation of the SoI.

An MSE loss between source and target image intensities has been used in previous applications using brain MRI data with VoxelMorph (Balakrishnan et al., 2018) and conditional atlases (Dalca et al., 2019a). MSE losses between image pairs were experimented with in addition to the loss terms for the labelmaps (Eqs. (11) and (12)), but were found to significantly degrade performance of the models due to the high level of variability in image content, field-of-view, and voxel intensities of structures in the CCTA images, thus were not used in the reported experiments. The final constructed atlas image and labelmap for both of these models were similar in appearance to those produced by Atlas-ISTN.

Table 3 presents the 1-pass performance of the models trained with data augmentation, with metrics computed for the LVM label using the 1000 case test set. Note that the Atlas-ISTN 1-pass results reported in Table 3 are repeated from Table 2 for convenience. Atlas-ISTN outperforms both Atlas-ISTN-NS (the variant trained without the segmentation loss) and VML across all metrics, with clear gaps in DSC. Furthermore, Atlas-ISTN-NS and VML could not benefit from test time refinement of the STN weights by warping the atlas labelmap to the ITN logits as done by Atlas-ISTN, and we found that performing refinement by registering the atlas image to a target intensity image at test time with an MSE loss on image intensities worsened performance, with a notable drop in DSC, for the same reasons mentioned earlier. We leave the exploration of other image intensity-based losses for future work.

        VML     Atlas-ISTN-NS  Atlas-ISTN
Metric  1-pass  1-pass         1-pass
DSC     0.822   0.839          0.850
ASD     0.413   0.368          0.311
HD      7.471   7.302          6.579
Table 3: Comparison of 1-pass performance of models trained with data augmentation on the 1000 case LVM test set. Higher is better for DSC; lower is better for ASD and HD. Note that further improvement for Atlas-ISTN is achieved with refinement (Table 2).

Comparison of Framework Variants

The following variants of Atlas-ISTN were compared:

Independent: Independently trained ITN + refinement of randomly initialized STN weights

Fixed: Atlas-ISTN trained with a fixed atlas

SVF: Atlas-ISTN with SVF-only in the STN

               Independent                     Fixed            SVF              Proposed
Metric  ITN    Identity  Refine  Refine-200   1-pass  Refine   1-pass  Refine   1-pass  Refine
DSC     0.884  0.204     0.770   0.820        0.848   0.879    0.854   0.883    0.842   0.886
ASD     0.301  9.775     1.483   1.035        0.313   0.232    0.298   0.218    0.328   0.213
HD      9.854  32.552    11.312  9.405        6.980   6.196    6.549   5.539    6.567   5.506
Table 4: Comparison of 1-pass and refinement results of Atlas-ISTN variants with a U-net baseline on the 1000 case LVM test set, trained with data augmentation. All models use the independently trained U-net as the ITN at test time, where the STN is optimized for each case during refinement. 'Refine-200' denotes refinement with 200 iterations from a randomly initialized STN. Higher is better for DSC; lower is better for ASD and HD.

Firstly, ‘Independent’ uses an independently trained ITN, followed by test time refinement using an STN with randomly initialized weights and a randomly selected training case for the atlas labelmap (this case is shown on the left in Fig. 2). This approach may be useful in a setting where a pre-trained U-net is available, and registration of a training case labelmap to a U-net prediction can be performed by optimizing the STN weights. The ‘U-net’ trained with spatial augmentations (Table 2) is used for this experiment. The ‘Fixed’ model is Atlas-ISTN trained and tested with an atlas labelmap that is selected from the training data (the same one as for ‘Independent’), thus by-passing the atlas construction step during training. ‘SVF’ is Atlas-ISTN trained and tested with only the SVF (i.e. no affine component) predicted by the STN.

These variants allow us to investigate (i) the impact of training the STN before test time refinement, (ii) the impact of learning an unbiased atlas during training, and (iii) the impact of including an affine transformation model in the STN. It was observed that for the U-net and Atlas-ISTN model variants trained with spatial augmentations, ITN performance was generally very similar, and the ITN of a given model could be paired with the STN and atlas labelmap of a different model at test time for refinement without significant differences in 1-pass or refinement performance. In light of this, we substitute the independently trained U-net (with augmentations) for the ITN in all model variants to make a head-to-head comparison of the 1-pass and refinement results, once again using the same number of refinement iterations as before.

Table 4 shows the results of the baseline U-net as well as the 1-pass and refinement results of the ‘Independent’, ‘Fixed’, and ‘SVF’ models and the Atlas-ISTN. A first pass of the STN for the ‘Independent’ model is meaningless since the STN weights are randomly initialized (a single pass results in a near identity transform), so the column ‘Identity’ shows the metrics between the undeformed fixed atlas labelmap and the test data. Refinement of the STN weights from scratch for ‘Independent’ performs significantly worse than all other models: standard refinement (200 iterations) and even refinement with double the iterations produced considerably worse results, likely due to falling into bad local minima (e.g. registering to large spurious segmentations near the initial fixed atlas position), or not reaching convergence. The best 1-pass results are obtained with the ‘SVF’ model, followed by ‘Fixed’ and Atlas-ISTN. Refinement with the proposed Atlas-ISTN out-performs all other models, although it is closely followed by refinement with ‘SVF’, and both out-perform refinement with ‘Fixed’ across all metrics. This highlights the benefit of using an unbiased atlas labelmap as opposed to a fixed labelmap. Refinement with the Atlas-ISTN, ‘Fixed’, and ‘SVF’ models all improve over the U-net (ITN) in terms of ASD and HD, while Atlas-ISTN also marginally improves DSC.
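The instance-specific optimization behind these refinement results can be sketched as follows, using a toy 1D example in which a single translation parameter stands in for the STN's affine + SVF model and an MSE loss stands in for the refinement loss (all names and values here are illustrative, not the paper's implementation):

```python
import numpy as np

def warp_1d(signal, t):
    """Warp a 1D signal by a translation t using linear interpolation."""
    x = np.arange(len(signal)) - t
    x0 = np.clip(np.floor(x).astype(int), 0, len(signal) - 1)
    x1 = np.clip(x0 + 1, 0, len(signal) - 1)
    w = x - np.floor(x)
    return (1 - w) * signal[x0] + w * signal[x1]

def refine(atlas, target, steps=200, lr=20.0, eps=1e-3):
    """Test-time refinement: optimize the transform parameter (here a
    single shift t) to align the warped atlas with the target, using a
    finite-difference gradient of the MSE loss."""
    t = 0.0
    for _ in range(steps):
        loss_p = np.mean((warp_1d(atlas, t + eps) - target) ** 2)
        loss_m = np.mean((warp_1d(atlas, t - eps) - target) ** 2)
        t -= lr * (loss_p - loss_m) / (2 * eps)
    return t

x = np.arange(100, dtype=float)
atlas = np.exp(-((x - 30) ** 2) / 50)   # atlas structure centred at 30
target = np.exp(-((x - 38) ** 2) / 50)  # stand-in "ITN prediction" centred at 38
t_hat = refine(atlas, target)           # recovers a shift close to 8
```

In the full model the optimized parameters are the STN weights (predicting affine and SVF components) rather than a single shift, and the target is the ITN's soft prediction rather than a clean signal.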

We observe that the 1-pass of Atlas-ISTN degrades slightly (by 0.8% DSC) in this experiment compared to using a jointly trained ITN (Table 2), although the refinement result overall is quite similar. In practice, the output of refinement, not the 1-pass, would be used as the final segmentation.

Upper bound LVM model

To estimate an upper bound on the performance of Atlas-ISTN for the LVM label, an Atlas-ISTN model is trained using 2000 additional training cases which contain only the LVM label. Only the LVM label is used during training, and as a result the constructed atlas does not contain any other foreground labels. The original 80 LSA cases are included in the training process, and the atlas is still constructed from these original 80 cases at the end of each epoch for faster convergence of the atlas construction during training. At each epoch, the 80 LSA cases are passed to the network with on-the-fly spatial augmentation, while another 80 cases are sampled randomly without replacement from the 2000 case dataset without augmentation. The number of epochs was halved so that the models were trained for the same number of iterations as previous models. Hyper-parameters were kept the same as for previously trained models with augmentation, with epoch-dependent parameters adjusted appropriately. U-net and VML models were also trained in this way.
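The per-epoch sampling scheme can be sketched as follows (case identifiers and the helper function are hypothetical):

```python
import random

def epoch_cases(lsa_cases, lvm_only_cases, n_extra=80, rng=None):
    """Build one epoch's case list: all 80 LSA cases (spatially
    augmented on the fly in the paper) plus n_extra cases sampled
    without replacement from the 2000-case LVM-only set (no
    augmentation for these)."""
    rng = rng or random.Random(0)
    return list(lsa_cases) + rng.sample(list(lvm_only_cases), n_extra)

lsa = [f"lsa_{i}" for i in range(80)]
lvm = [f"lvm_{i}" for i in range(2000)]
cases = epoch_cases(lsa, lvm)   # 160 cases per epoch
```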

Metric U-net VML ITN 1-pass Refine
DSC 0.911 0.890 0.918 0.902 0.911
ASD 0.160 0.217 0.150 0.186 0.158
HD 5.464 6.143 5.380 5.836 5.093
Table 5: Results on the 1000 case LVM test set of U-net, VML, and Atlas-ISTN models trained with an additional 2000 cases with LVM label only. Bold numbers are the best and second best for a given metric, with the best also underlined. Statistically significant improvement of the best or second best model over a given model is indicated by superscripts.

Results on the 1000 case test set for these LVM-only models trained with an additional 2000 cases are presented in Table 5. Interestingly, the ITN out-performs the U-net across all metrics, with a 0.7% DSC increase. The ITN also performs better than the 1-pass and refinement for DSC and ASD. Refinement improves over the ITN, 1-pass and U-net in terms of HD, and performs similarly to the U-net for DSC and ASD. It should be noted that although the ITN out-performs refinement on DSC and ASD, at this level of DSC the differences between segmentations become practically insignificant. Inter-observer DSC for LVM segmentation from cardiac short-axis cine MRI, for example, is 0.88 (Bai et al., 2018), which one might expect to be slightly higher for segmentation from 3D CCTA images. Also, despite being trained on a significantly larger dataset, the ITN is still prone to predictions with holes, particularly for cases with thin LVM walls, which can be rectified by refinement (Fig. 13).

Figure 13: Axial (a) and sagittal (b) planes showing examples where refinement corrects for holes (false negatives) in the LVM channel predicted by the ITN, for the Atlas-ISTN trained with an additional 2000 cases. Red contour: LVM label of deformed atlas labelmap after refinement. Yellow contour: ITN prediction of the LVM channel. Heatmap: ITN logits of LVM channel. In both (a) and (b), the ITN prediction contour consists of only a single connected component of the LVM.

Compared to the Atlas-ISTN trained with spatial augmentation on the 80 cases with all structures (Table 2), a clear improvement in LVM DSC is observed, both for the 1-pass and for refinement. The gap between the 1-pass result and refinement is just under 1% DSC for the Atlas-ISTN with the extended training set, smaller than for the model trained on 80 cases, demonstrating the effect that a significantly larger training set can have on closing this gap. Finally, while the VML model also improves significantly compared to using limited training samples (Table 3), its performance still falls short of the 1-pass of Atlas-ISTN as before.

Inter-Subject Correspondence

In addition to segmentation, Atlas-ISTN provides correspondence of the SoI across subjects, which can be used, for example, to propagate the location of anatomical landmarks not originally in the training data, or to assess inter-subject variability. Registrations from subject to atlas and atlas to subject are jointly optimized via a symmetric loss, exploiting the invertibility of the chosen SVF and affine model. The use of this transformation model in conjunction with a mapping to atlas space provides an inherently inverse consistent registration between any two subjects, as described in (Joshi et al., 2004). Specifically, if we consider the registered atlas labelmap (after refinement) as the most accurate estimate of the SoI for each subject at test time, as demonstrated in the above results, the composition of the transformations to and from atlas space for two subjects provides an inherently inverse consistent mapping of the SoI between these two subjects. Extending Eq. (20), the mapping of subject $i$ to subject $j$ via atlas space is given by:

$$\phi_{i \to j} = \phi_j^{-1} \circ \phi_i \qquad (22)$$

and the mapping of subject $j$ to subject $i$ by:

$$\phi_{j \to i} = \phi_i^{-1} \circ \phi_j \qquad (23)$$

where $a$ denotes the atlas labelmap, $a_i$ the deformed atlas labelmap (representing the SoI) for subject $i$, and $\phi_i$ the transformation from subject $i$ to atlas space.
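The composition through atlas space can be illustrated with simple homogeneous affine matrices standing in for the subject-to-atlas transformations (matrix values are hypothetical; the actual model composes affine and SVF components):

```python
import numpy as np

# Hypothetical subject-to-atlas transforms as 2D homogeneous affines.
phi_i = np.array([[1.1, 0.0,  2.0],
                  [0.0, 0.9, -1.0],
                  [0.0, 0.0,  1.0]])   # subject i -> atlas
phi_j = np.array([[0.8, 0.0, -3.0],
                  [0.0, 1.2,  0.5],
                  [0.0, 0.0,  1.0]])   # subject j -> atlas

# Subject i -> subject j via atlas space (Eq. (22)).
phi_ij = np.linalg.inv(phi_j) @ phi_i
# Subject j -> subject i (Eq. (23)).
phi_ji = np.linalg.inv(phi_i) @ phi_j
# Composing both directions recovers the identity (inverse consistency).
round_trip = phi_ij @ phi_ji
```

For the full SVF + affine model, composition is performed on the dense deformation fields rather than on matrices, which is where the small interpolation errors discussed below arise.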

Theoretically, the properties of the diffeomorphic transformation model ensure that these mappings are inverse consistent, though error can arise from numerical precision, the integration of velocity fields, and discrete grid interpolation. Given that the SoI under consideration for a given subject is the deformed atlas labelmap, it follows that if the composed transformation from atlas to subject and back introduces minimal error, then the composition of transformations from one subject to another via atlas space (Eqs. (22) and (23)) will similarly have minimal error.

Two metrics are proposed to estimate the error associated with composing transformations to and from atlas space: masked inverse consistency error (MICE) and inverse consistency DSC (IC-DSC). For a given subject, an atlas labelmap deformed by both the forward and inverse transformations is first defined. A grid of the same size as the image, with voxel values corresponding to (x, y, z) voxel indices, is also defined, along with a twice-deformed version of this grid. MICE is the mean absolute displacement error, in voxels, between the original and twice-deformed grids, computed over the voxels masked by the SoI of the undeformed atlas labelmap (the SoI voxels are obtained by taking the argmax of the atlas labelmap and keeping the foreground labels). IC-DSC is computed between the atlas labelmap and the twice-deformed atlas labelmap. Table 6 shows that MICE is approximately 0.05 voxels, and IC-DSC is extremely close to 1. This indicates that the predicted SoI resulting from either the 1-pass or refinement of Atlas-ISTN for one subject can be mapped to atlas space and subsequently to another subject with minimal error.
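A minimal numpy sketch of the two metrics, using a pure integer translation for which the inverse is exact (so MICE is 0 and IC-DSC is 1; real SVF transforms incur the small interpolation errors reported in Table 6):

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

yy, xx = np.mgrid[0:64, 0:64]
soi = ((xx - 32) ** 2 + (yy - 32) ** 2) < 100          # toy atlas SoI mask

fwd = lambda m: np.roll(m, (3, -2), axis=(0, 1))       # atlas -> subject
inv = lambda m: np.roll(m, (-3, 2), axis=(0, 1))       # subject -> atlas

# Twice-deformed index grid and labelmap, as in the metric definitions.
grid = np.stack([yy, xx], axis=-1).astype(float)       # voxel indices
grid2 = inv(fwd(grid))
soi2 = inv(fwd(soi))

mice = np.abs(grid - grid2).sum(axis=-1)[soi].mean()   # masked error (voxels)
ic_dsc = dice(soi, soi2)
```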

Metric 1-pass Refine
IC-DSC 0.997 0.996
MICE 0.0440 0.0574
Table 6: Invertibility results on the 19 case test set. IC-DSC: inverse consistency Dice similarity coefficient, averaged over all labels, MICE: masked inverse consistency error (in terms of voxels).

3.3 3D Brain MRI

While cardiac CT is our main focus in this work, we also conducted experiments for the task of brain structure segmentation in T1-weighted 3D MRI scans. Brain MRI has been the predominant type of data in related work on learning-based image registration (Balakrishnan et al., 2018; Dalca et al., 2019a; Zhao et al., 2019; Hoffmann et al., 2020), including our own work on structure-guided image registration (Lee et al., 2019). Here, we use brain MRI to focus on the specific aspect of generalization across images from different sites and scanners. We train the Atlas-ISTN on data from one site, and compare the segmentation results with and without test time refinement when testing on data from several other sites.

Data Description

We utilize brain MRI data from three publicly available imaging studies: the UK Biobank imaging study (UKBB; UK Biobank Resource under Application Number 12579) (Sudlow et al., 2015; Miller et al., 2016; Alfaro-Almagro et al., 2018), the Cambridge Centre for Ageing and Neuroscience study (Cam-CAN) (Shafto et al., 2014; Taylor et al., 2017), and the IXI dataset. Both UKBB and Cam-CAN use a similar imaging protocol with Siemens 3T scanners. IXI contains subsets from three different clinical sites, namely Guy’s Hospital (IXI-Guys) using a Philips 1.5T system, Hammersmith Hospital (IXI-HH) using a Philips 3T scanner, and the Institute of Psychiatry (IXI-IoP) using a GE 1.5T system. While the UKBB data is provided with pre-processed images and segmentations, we apply the following pipeline to the Cam-CAN and IXI data in order to match these as closely as possible to UKBB: 1) skull stripping with ROBEX v1.2 (Iglesias et al., 2011); 2) bias field correction with N4ITK (Tustison et al., 2010); 3) sub-cortical brain structure segmentation using FSL FIRST (Patenaude et al., 2011).

A very similar pipeline had been employed for UKBB, with the same automatic segmentation for extracting brain structures. We resample all brain scans to an isotropic 2mm voxel size, and normalize the intensities within the brain masks to zero mean and unit variance, where voxels outside the mask are set to zero.
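The final intensity normalization step can be sketched as follows (the function name and toy mask are illustrative):

```python
import numpy as np

def normalize_in_mask(image, mask):
    """Zero-mean, unit-variance intensities inside the brain mask;
    voxels outside the mask are set to zero."""
    vals = image[mask]
    out = np.zeros_like(image, dtype=float)
    out[mask] = (vals - vals.mean()) / vals.std()
    return out

rng = np.random.default_rng(0)
img = rng.normal(5.0, 2.0, size=(32, 32))              # toy image
mask = (np.arange(32)[:, None] ** 2 +
        np.arange(32)[None, :] ** 2) < 400             # toy brain mask
norm = normalize_in_mask(img, mask)
```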

We merge the 15 individual brain structures from the FIRST algorithm into a single binary labelmap, similar to (Lee et al., 2019). In line with related work on learning-based registration for brain images, we pre-align all scans rigidly to MNI space using drop2, and hence the task of the Atlas-ISTN here is to recover the non-rigid deformation between the images and the to-be-learned brain atlas. We use 100 scans from UKBB for training, 20 for validation, and 200 scans each from UKBB and Cam-CAN and all 581 scans from IXI (Guys n=322, HH n=185, IoP n=74) for testing the segmentation performance of Atlas-ISTN and baselines.
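Merging the FIRST structures into a binary labelmap is a simple thresholding operation; a sketch, assuming the structures are encoded as positive integer labels with background 0:

```python
import numpy as np

def merge_to_binary(first_labels):
    """Merge the 15 FIRST sub-cortical structure labels (assumed to be
    positive integers, background 0) into a single binary labelmap."""
    return (first_labels > 0).astype(np.uint8)

binary = merge_to_binary(np.array([[0, 3], [15, 0]]))  # toy labelmap
```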

Model Settings

Here, an SVF-only transformation model is employed, as all scans are rigidly pre-aligned to MNI space. The Atlas-ISTN model for brain MRI is based on the same ITN and STN architectures as in the cardiac case, with the last scale removed due to the smaller size of the input images. The model is trained for 800 epochs with an exponential learning rate decay with a half-life of 400 epochs. As there is less variation between scans, we found fewer epochs are necessary compared to the cardiac data. Weighting variables for the training loss in Eq. (14) were kept as for the cardiac experiments, with changes to the weights on the deformation-related loss terms, which were set empirically for the brain data. While in the cardiac experiments a fixed weighting was used, for the brain experiments a ‘fade-in’ function of the epoch index was used to initially favour the segmentation loss, with the weighting on the deformation-related loss terms coming into full effect after about 200 epochs. This was found to slightly improve (as observed on the UKBB validation data) the performance of the ITN and in turn the refinement results for the brain experiments. This approach brought the performance of the ITN closer to that of the U-net, as it possibly reduced the effects of competing gradients from the deformation and segmentation loss terms early in training (competing gradients in multi-task models are studied in (Yu et al., 2020)). The atlas update rate was the same as for the cardiac experiments. Parameter values for the refinement loss in Eq. (19), along with the number of refinement iterations, were likewise set empirically.
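The exact form of the fade-in function is not reproduced in this excerpt; a simple linear ramp consistent with the description (full effect after about 200 epochs) might look like:

```python
def fade_in(epoch, full_at=200):
    """Hypothetical linear fade-in weight for the deformation-related
    loss terms: 0 at epoch 0, reaching 1 at epoch `full_at` and
    staying there for the remainder of training."""
    return min(epoch / full_at, 1.0)
```

During training, the deformation-related loss terms would be scaled by `fade_in(epoch)` while the segmentation loss keeps its full weight throughout.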


We evaluate the segmentation accuracy again in terms of DSC, ASD and HD and compare with two baselines, a U-net and a VoxelMorph-like approach (VML) described earlier, i.e. an Atlas-ISTN without the ITN, where an intensity image and the atlas image are passed directly to the STN, by-passing an intermediate representation altogether, and at test time predicting the final transformation in a single forward pass.

Experimental Results

The quantitative results are presented in Table 7, with a sensitivity analysis of the regularization weight in Fig. 15 and qualitative results in Fig. 14. Overall, the U-net baseline performed quite well, out-performing the VML baseline on all datasets and performing similarly to the ITN. Atlas-ISTN performed the best on almost all five test datasets across all metrics, with the exception of HD on the Cam-CAN dataset. Test time refinement almost always improved over the ITN and 1-pass performance across all metrics as well. In Fig. 14, the U-net and ITN predictions often include false positives which connect neighbouring structures, most noticeably in rows 2 (Cam-CAN), 3 (IXI-Guys) and 5 (IXI-IoP). These errors are generally corrected in the VML, 1-pass and refinement results.

The VML model generally under-performed compared to the U-net and Atlas-ISTN models on all datasets across all metrics. The VML model did not generalize as well to other datasets, with the performance gap between VML and the other models increasing for datasets less similar to the training dataset. For the least similar dataset, IXI-IoP, the VML model DSC was 0.813, compared to 0.846, 0.839 and 0.862 for the Atlas-ISTN’s ITN, 1-pass and refinement results, respectively. The Atlas-ISTN 1-pass results out-performed the VML model across almost all metrics for all datasets, which could be attributed to the intermediate representation provided to the STN. Test time refinement of Atlas-ISTN also produced greater improvements over the ITN and 1-pass results for less similar datasets: on the UKBB dataset, the increases in DSC over the ITN and 1-pass were just 0.4% and 1.3%, respectively, while for IXI-IoP they were 1.6% and 2.3%, respectively.

The sensitivity analysis in Fig. 15 shows robustness to the choice of the regularization weight. For DSC and ASD, Atlas-ISTN with test time refinement achieves the best performance for the entire range of values, while for HD it performs similarly to VML and Atlas-ISTN 1-pass. This also highlights that the improvement for Atlas-ISTN with test time refinement is obtained consistently and independent of the specific strength of regularization.

Metric U-net VML Id ITN 1-pass Refine
UKBB (n=200)
DSC 0.900 0.876 0.773 0.898 0.889 0.902
ASD 0.208 0.263 0.553 0.213 0.232 0.202
HD 5.868 5.851 7.399 5.976 5.743 5.651
Cam-CAN (n=200)
DSC 0.869 0.863 0.765 0.866 0.873 0.877
ASD 0.369 0.315 0.597 0.388 0.291 0.301
HD 9.131 5.862 7.266 8.715 6.311 6.926
IXI all (n=581)
DSC 0.880 0.851 0.741 0.880 0.874 0.890
ASD 0.261 0.343 0.683 0.264 0.279 0.235
HD 7.006 6.104 7.610 6.780 6.075 6.022
IXI-Guys (n=322)
DSC 0.897 0.874 0.769 0.898 0.890 0.906
ASD 0.217 0.278 0.574 0.215 0.233 0.197
HD 5.640 5.588 6.976 5.779 5.458 5.367
IXI-HH (n=185)
DSC 0.866 0.827 0.707 0.861 0.859 0.875
ASD 0.296 0.403 0.809 0.309 0.310 0.269
HD 7.945 6.418 8.206 7.896 6.530 6.656
IXI-IoP (n=74)
DSC 0.844 0.813 0.708 0.846 0.839 0.862
ASD 0.368 0.480 0.845 0.367 0.401 0.316
HD 10.599 7.568 8.882 8.343 7.618 7.290
Table 7: Quantitative results for the brain MRI experiments, where segmentation methods are trained on 100 cases from UKBB and tested on different datasets from three imaging studies, UKBB, Cam-CAN and IXI. The column ‘Id’ refers to the initial alignment (with identity transformation) of the constructed atlas, to put the resulting DSC, ASD and HD into context. Bold numbers are the best and second best per row, with the best also underlined. Statistically significant improvement of the best or second best model over a given model is indicated by superscripts.
Figure 14: Qualitative results for the brain structure segmentation for five cases from the five different datasets (one case per row), comparing U-net, VML (i.e. a VoxelMorph-like STN-only baseline), initial atlas alignment ‘Id’, and the three outputs ITN, ‘1-pass’ and ‘Refine’ of our Atlas-ISTN approach. The reference segmentation is displayed with green contours and the predicted segmentation boundaries with red contours. Note how the voxel-wise segmentation methods, U-net and ITN, tend to merge neighboring structures, while the VML baseline produces less accurate atlas alignment than ‘1-pass’ or ‘Refine’. The test time refinement of the Atlas-ISTN seems to produce the visually best results, followed by the ‘1-pass’ prediction, which acts as an initialization for the refinement.
Figure 15: Sensitivity analysis for the effect of the regularization weight for the application of 3D brain segmentation. The Atlas-ISTN with test time refinement achieves overall best performance for the metrics DSC (top) and ASD (middle) across the range of regularization weights, and performs similarly on HD (bottom) compared to the VML baseline and 1-pass prediction. The Atlas-ISTN outperforms the U-net baseline on all three metrics. The results shown here are the averages over the UKBB, Cam-CAN and IXI test datasets. U-net results are independent of the regularization weight and thus constant.

4 Discussion

Results from experiments with synthetic 2D data, real 3D CCTA and T1-weighted brain MRI scans illustrate the benefits of the Atlas-ISTN framework.

Firstly, experiments with synthetic 2D data demonstrated that improvements in terms of DSC, ASD and HD were achieved with test-time refinement on corrupted, out-of-distribution test images compared to both the ITN and the 1-pass of Atlas-ISTN, and that this improvement was insensitive to the choice of the weighting of the regularization term (Fig. 8). This weighting could also be adjusted for test-time refinement to control the rigidity of the deformation used to obtain a final registration of the atlas. Since only the non-rigid component is penalized by the regularization, higher values result in deformations that rely increasingly on the affine parameters. This flexibility allows for scenarios where one might want to restrict the non-rigid deformation from adhering too closely to potentially noisy predictions of the ITN, and retain more of the underlying shape of the atlas. On clean test data, the Atlas-ISTN performs on par with a baseline U-net, but with the added benefit of yielding atlas correspondences.

For the application to real 3D data, values of the regularization weight were selected empirically. Generally, values that were too high resulted in worse performance with test time refinement, while values that were too low could produce undesirably sharp gradients in the predicted non-rigid deformations and less robustness to spurious segmentations from the ITN. Adjustment of this weight between training and test time was not extensively explored on the real cardiac data, although the experiments with synthetic data demonstrate the potential benefits.

The experiments with both CCTA and T1-weighted brain MRI data demonstrate the improved performance of Atlas-ISTN over segmentation-only and registration-only baseline models. The 1-pass of Atlas-ISTN out-performed the VML model in both applications, with a larger gap in performance observed for data further from the training distribution (Table 7). The 1-pass of Atlas-ISTN also out-performed the 1-pass of an Atlas-ISTN model without an imposed intermediate representation (Table 3). This indicates that the use of semantic segmentations as intermediate representations in Atlas-ISTN is advantageous for the 1-pass registration, providing robustness to variability and noise in the input images. This reinforces the findings of (Lee et al., 2019), which proposed the use of intermediate representations in an ISTN for pairwise registration.

The experiments with synthetic and real data also consistently demonstrate that test time refinement improves performance over a 1-pass of Atlas-ISTN. This improvement was also shown to be larger for data further from the training distribution (Table 7). Test time refinement of Atlas-ISTN also consistently out-performed the ITN and baseline U-net in terms of DSC, ASD and HD, and demonstrated the ability to circumvent spurious segmentations and false negatives in the voxel-wise ITN prediction (as shown in Figs. 10 and 11). Particularly for data further from the training distribution, test-time refinement produced the greatest improvements over the U-net and ITN, as well as the 1-pass of Atlas-ISTN (Tables 2 and 7).

Obtaining and annotating large sets of 3D medical image data is a common challenge. Most of our experiments with real 3D data involved training with datasets of fewer than 100 samples. In experiments where models were trained on a significantly larger dataset of CCTA cases with LVM labels only, the performance gap between 1-pass and test time refinement of Atlas-ISTN narrowed significantly (Table 5), with ITN and test time refinement of Atlas-ISTN still out-performing the 1-pass result and the VML model. This suggests that learned (1-pass) registration models generally require more training data to reach the performance of the ITN and subsequent test time refinement of Atlas-ISTN, particularly for datasets which may have significant inherent spatial variability like CCTA data. This demonstrates the advantage of using test time refinement with Atlas-ISTN for models trained with a limited size annotated dataset. Furthermore, when auxiliary information in the form of labelmaps is available, it can be used not only in the loss but also to generate intermediate representations.

While hyper-parameters were tuned, an exhaustive search of architectures and parameters was not undertaken. Between the 3D cardiac and brain experiments, minimal changes were made to hyper-parameters, with modifications to the loss weightings in Eq. (14) providing some slight performance gains. A U-net was chosen as a baseline and as the ITN component, but any suitable segmentation model (or image-to-image architecture) can be used as the ITN. Improvements in the ITN performance are also likely to result in improvements in the performance of test time refinement, as shown throughout the experiments.

We do not assess the sensitivity of the constructed atlas to hyper-parameters. Additionally, the atlas construction process does not explicitly guarantee a consistent topology for the atlas labelmap, i.e. an atlas which conforms to the desired target topology, with structures that are smooth, contiguous, non-overlapping, and each containing only a single connected component. However, empirically these properties were observed for all explored settings, both for single label structures with 2D synthetic and 3D MRI data, and in the setting of multiple cardiac structures with 3D CCTA data. The general technique of averaging the labelmaps of multiple structures from a set of co-registered images has after all been used in the past for cardiac (Bai et al., 2015) and brain (Joshi et al., 2004; Cabezas et al., 2011) atlases. For more complex structures than those explored in this work, a more sophisticated fusion step during the training process or after training may help ensure topological consistency of the final atlas labelmap.
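The averaging technique referenced here can be sketched as one-hot averaging of co-registered labelmaps followed by a per-voxel argmax (a simple majority-vote fusion; the warping to atlas space that precedes the averaging in the paper is omitted):

```python
import numpy as np

def fuse_labelmaps(labelmaps, n_labels):
    """Average one-hot encodings of co-registered labelmaps and take
    the argmax per voxel to obtain a fused atlas labelmap."""
    onehot = np.stack([np.eye(n_labels)[lm] for lm in labelmaps])
    return onehot.mean(axis=0).argmax(axis=-1)

# Three toy co-registered 2x2 labelmaps with labels {0, 1, 2}.
a = np.array([[0, 1], [2, 2]])
b = np.array([[0, 1], [2, 0]])
c = np.array([[0, 1], [1, 2]])
fused = fuse_labelmaps([a, b, c], n_labels=3)   # majority label per voxel
```

In the Atlas-ISTN training loop, the inputs to such a fusion are the subjects' ground-truth labelmaps warped into atlas space; averaging soft (one-hot or probabilistic) maps before the argmax is what yields the smooth atlas labelmap described in the text.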

While we propose a method to generate an unbiased atlas by incorporating some ideas from classic atlas construction methods (Joshi et al., 2004) into a deep learning framework, we do not explore the use of constraints that can be imposed to ensure a mean shape (Joshi et al., 2004; Dalca et al., 2019a; Bône et al., 2020), or preserve high resolution detail in the atlas image (Guimond et al., 2000). Indeed, we do not extensively explore the use of image loss terms which are typically part of atlas construction frameworks, particularly for atlas images. An MSE loss was used in training Atlas-ISTN with the cardiac CCTA image data, which was found to be poorly suited to the modality and reduced segmentation performance (for reasons discussed in Sec. 3.2). In this work, improving segmentation accuracy over baseline methods was facilitated by the construction of the atlas labelmap. Additionally, the use of the constructed atlas labelmap in test time refinement was demonstrated to improve performance compared to the use of a fixed atlas labelmap (Table 4). We have not explored the option of constructing an atlas using traditional methods for subsequent use in a CNN framework as in (Dong et al., 2020), though the proposed framework arguably reduces the need for this additional step. We leave further exploration of atlas construction within the Atlas-ISTN framework to future work.

Further exploration of image similarity loss terms could open opportunities not just for atlas construction but for semi-supervised learning with CCTA data, as done previously for brain MRI (Balakrishnan et al., 2018; Dalca et al., 2019a, b; Xu and Niethammer, 2019), chest X-ray (Mansilla et al., 2020) and knee MRI (Xu and Niethammer, 2019). Cardiac CCTA presents several unique challenges, including large variability in the extent of visible anatomy, variability in shape of the field-of-view, tissue-level intensity variations due to differences in contrast timing, artifacts due to implants and differences between scanners and acquisition protocols. Accounting for these factors would be important when introducing image similarity losses into the network. As demonstrated with Atlas-ISTN, an intermediate representation of SoI segmentations helps to mitigate for such variations, with the proposed method out-performing the baseline U-net and registration models when assessed on 1000 test cases from a wide range of sites around the world.

Our findings on brain MRI may further be of interest to people working on learning-based image registration. The fact that the accuracy of the 1-pass predictions became significantly worse for data from a different site might suggest that the same may be true for STN-based registration approaches such as (Balakrishnan et al., 2018; Dalca et al., 2019b). In particular, this may be an issue when only limited amounts of training data are available, a point we made in our earlier work Lee et al. (2019).

The Atlas-ISTN framework could also be adapted for other potential applications. Firstly, new structures such as landmarks, or representations such as meshes, can be added directly into the atlas after training. Additional labels can also be learned by the ITN without contributing to the atlas construction (e.g. structures which may not have strong one-to-one mappings between cases, such as coronary trees). Inter-subject correspondence via atlas space also provides the opportunity for population shape and motion analysis.

5 Conclusions

Atlas-ISTN provides a framework to jointly learn image segmentation and registration, while simultaneously generating a population-derived atlas used in the model training process. Subsequent registration of the atlas labelmap via test time refinement provides a topologically consistent and accurate segmentation of the target structures. We have demonstrated quantitatively and qualitatively the improvement in segmentation performance of the proposed Atlas-ISTN model over baseline segmentation and registration models. Through ablation studies, we have also demonstrated the importance of different design choices, including the use of both affine and non-rigid components of the transformation model, the value of using intermediate representations of SoI for registration, and the advantage of using an unbiased atlas compared to a fixed atlas. Furthermore, Atlas-ISTN shows greater improvement over segmentation and registration baselines on test data further from the training distribution, particularly when trained with limited data. Atlas-ISTN may benefit segmentation applications where a known topology is expected, and where inter-subject correspondences may be of interest.


This research was funded by HeartFlow, Inc.


  • J. Adams, R. Bhalodia, and S. Elhabian (2020) Uncertain-DeepSSM: from images to probabilistic shape models. In International Workshop on Shape in Medical Imaging, M. Reuter, C. Wachinger, H. Lombaert, B. Paniagua, O. Goksel, and I. Rekik (Eds.), Vol. 12474 LNCS, Cham, pp. 57–72.
  • F. Alfaro-Almagro, M. Jenkinson, N. K. Bangerter, J. L. R. Andersson, L. Griffanti, G. Douaud, S. N. Sotiropoulos, S. Jbabdi, M. Hernandez-Fernandez, E. Vallee, D. Vidaurre, M. Webster, P. McCarthy, C. Rorden, A. Daducci, D. C. Alexander, H. Zhang, I. Dragonu, P. M. Matthews, K. L. Miller, and S. M. Smith (2018) Image processing and quality control for the first 10,000 brain imaging datasets from UK Biobank. NeuroImage 166, pp. 400–424.
  • V. Arsigny, O. Commowick, X. Pennec, and N. Ayache (2006) A log-Euclidean framework for statistics on diffeomorphisms. Lecture Notes in Computer Science, Vol. 4190 LNCS, pp. 924–931.
  • J. Ashburner (2007) A fast diffeomorphic image registration algorithm. NeuroImage 38 (1), pp. 95–113.
  • W. Bai, W. Shi, A. de Marvao, T. J. W. Dawes, D. P. O’Regan, S. A. Cook, and D. Rueckert (2015) A bi-ventricular cardiac atlas built from 1000+ high resolution MR images of healthy subjects and an analysis of shape and motion. Medical Image Analysis 26 (1), pp. 133–145.
  • W. Bai, M. Sinclair, G. Tarroni, O. Oktay, M. Rajchl, G. Vaillant, A. M. Lee, N. Aung, E. Lukaschuk, M. M. Sanghvi, F. Zemrak, K. Fung, J. M. Paiva, V. Carapella, Y. J. Kim, H. Suzuki, B. Kainz, P. M. Matthews, S. E. Petersen, S. K. Piechnik, S. Neubauer, B. Glocker, and D. Rueckert (2018) Automated cardiovascular magnetic resonance image analysis with fully convolutional networks. Journal of Cardiovascular Magnetic Resonance 20 (65), pp. 1–12.
  • G. Balakrishnan, A. Zhao, M. R. Sabuncu, J. Guttag, and A. V. Dalca (2018) VoxelMorph: a learning framework for deformable medical image registration. IEEE Transactions on Medical Imaging 38 (8), pp. 1788–1800.
  • R. Bhalodia, S. Y. Elhabian, L. Kavan, and R. T. Whitaker (2018) DeepSSM: a deep learning framework for statistical shape modeling from raw images. In Shape in Medical Imaging, M. Reuter, C. Wachinger, H. Lombaert, B. Paniagua, M. Lüthi, and B. Egger (Eds.), Cham, pp. 244–257.
  • A. Bône, P. Vernhet, O. Colliot, and S. Durrleman (2020) Learning joint shape and appearance representations with metamorphic auto-encoders. In Lecture Notes in Computer Science, Vol. 12261 LNCS, pp. 202–211.
  • N. Byrne, J. R. Clough, G. Montana, and A. P. King (2020)

    A persistent homology-based topological loss function for multi-class CNN segmentation of cardiac MRI

    In MICCAI STACOM, pp. 1–11. External Links: 2008.09585 Cited by: §1.
  • M. Cabezas, A. Oliver, X. Lladó, J. Freixenet, and M. Bach Cuadra (2011) A review of atlas-based segmentation for magnetic resonance brain images. Computer Methods and Programs in Biomedicine 104 (3), pp. 158–177. External Links: ISSN 0169-2607, Document Cited by: §4.
  • J. J. Cerrolaza, Y. Li, C. Biffi, A. Gomez, M. Sinclair, J. Matthew, C. Knight, B. Kainz, and D. Rueckert (2018) 3D fetal skull reconstruction from 2dus via deep conditional generative networks. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, J. A. Frangi, C. Davatzikos, C. Alberola-López, and G. Fichtinger (Eds.), Cham, pp. 383–391. External Links: ISBN 978-3-030-00928-1 Cited by: §1.
  • R. Chabiniok, V. Y. Wang, M. Hadjicharalambous, L. Asner, J. Lee, M. Sermesant, E. Kuhl, A. A. Young, P. Moireau, M. P. Nash, D. Chapelle, and D. A. Nordsletten (2016) Multiphysics and multiscale modelling, data–model fusion and integration of organ physiology in the clinic: Ventricular cardiac mechanics. Interface Focus 6 (2). External Links: Document, ISSN 20428901 Cited by: §3.2.
  • C. Chen, C. Biffi, G. Tarroni, S. Petersen, W. Bai, and D. Rueckert (2019) Learning Shape Priors for Robust Cardiac MR Segmentation from Multi-view Images. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 11765 LNCS, pp. 523–531. External Links: Document, 1907.09983, ISBN 9783030322441, ISSN 16113349 Cited by: §1.
  • Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger (2016) 3D u-net: learning dense volumetric segmentation from sparse annotation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016, S. Ourselin, L. Joskowicz, M. R. Sabuncu, G. Unal, and W. Wells (Eds.), Cham, pp. 424–432. External Links: ISBN 978-3-319-46723-8 Cited by: §2.2.
  • J. R. Clough, I. Oksuz, N. Byrne, J. A. Schnabel, and A. P. King (2019) Explicit Topological Priors for Deep-Learning Based Image Segmentation Using Persistent Homology. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 11492 LNCS (40119), pp. 16–28. External Links: Document, arXiv:1901.10244v1, ISSN 16113349 Cited by: §1, §1.
  • A. Dalca, M. Rakic, J. Guttag, and M. Sabuncu (2019a) Learning conditional deformable templates with convolutional networks. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, and R. Garnett (Eds.), pp. 806–818. Cited by: §1, §1, §2.3, §2, §3.2, §3.2, §3.2, §3.3, §4, §4.
  • A. V. Dalca, G. Balakrishnan, J. Guttag, and M. R. Sabuncu (2019b) Unsupervised learning of probabilistic diffeomorphic registration for images and surfaces. Medical Image Analysis 57, pp. 226 – 236. External Links: ISSN 1361-8415, Document Cited by: §4, §4.
  • A. V. Dalca, J. V. Guttag, and M. R. Sabuncu (2018) Anatomical priors in convolutional networks for unsupervised biomedical segmentation. In

    2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018

    pp. 9290–9299. External Links: Document Cited by: §1, §2.1, §2.
  • A. V. Dalca, E. Yu, P. Golland, B. Fischl, M. R. Sabuncu, and J. Eugenio Iglesias (2019c) Unsupervised deep learning for bayesian brain mri segmentation. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2019, D. Shen, T. Liu, T. M. Peters, L. H. Staib, C. Essert, S. Zhou, P. Yap, and A. Khan (Eds.), Cham, pp. 356–365. External Links: ISBN 978-3-030-32248-9 Cited by: §1.
  • S. Dong, G. Luo, C. Tam, W. Wang, K. Wang, S. Cao, B. Chen, H. Zhang, and S. Li (2020) Deep Atlas Network for Efficient 3D Left Ventricle Segmentation on Echocardiography. Medical Image Analysis 61, pp. 101638. External Links: Document, ISSN 13618423 Cited by: §1, §1, §2.1, §2, §3.2, §4.
  • J. Duan, G. Bello, J. Schlemper, W. Bai, T. J. W. Dawes, C. Biffi, A. de Marvao, G. Doumoud, D. P. O’Regan, and D. Rueckert (2019) Automatic 3D Bi-Ventricular Segmentation of Cardiac Images by a Shape-Refined Multi- Task Deep Learning Approach. IEEE Transactions on Medical Imaging 38 (9), pp. 2151–2164. External Links: Document, 1808.08578, ISSN 0278-0062 Cited by: §1.
  • A. Guimond, J. Meunier, and J. P. Thirion (2000) Average brain models: A convergence study. Computer Vision and Image Understanding 77 (2), pp. 192–210. External Links: Document, ISSN 10773142 Cited by: §2.3, §2, §4.
  • G. Haskins, U. Kruger, and P. Yan (2020) Deep learning in medical image registration: a survey. Machine Vision and Applications 31 (1), pp. 1–18. External Links: Document, ISSN 14321769 Cited by: §1.
  • T. Heimann and H. P. Meinzer (2009) Statistical shape models for 3D medical image segmentation: A review. Medical Image Analysis 13 (4), pp. 543–563. External Links: Document, ISSN 13618415 Cited by: §1, §1.
  • M. P. Heinrich and J. Oster (2018) MRI whole heart segmentation using discrete nonlinear registration and fast non-local fusion. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 10663 LNCS, pp. 233–241. External Links: Document, ISBN 9783319755403, ISSN 16113349 Cited by: §1.
  • M. Hoffmann, B. Billot, J. E. Iglesias, B. Fischl, and A. V. Dalca (2020) Learning image registration without images. External Links: 2004.10282 Cited by: §3.3.
  • J. E. Iglesias, C. Liu, P. M. Thompson, and Z. Tu (2011) Robust brain extraction across datasets and comparison with publicly available methods. IEEE Transactions on Medical Imaging 30 (9), pp. 1617–1634. Cited by: §3.3.
  • J. E. Iglesias and M. R. Sabuncu (2015) Multi-atlas segmentation of biomedical images: A survey. Medical Image Analysis 24 (1), pp. 205–219. External Links: Document, 1412.3421, ISSN 13618423 Cited by: §1, §1.
  • I. Išgum, M. Staring, A. Rutten, M. Prokop, M. A. Viergever, and B. Van Ginneken (2009) Multi-atlas-based segmentation with local decision fusion-application to cardiac and aortic segmentation in CT scans. IEEE Transactions on Medical Imaging 28 (7), pp. 1000–1010. External Links: Document, ISSN 02780062 Cited by: §1.
  • M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu (2015) Spatial Transformer Networks. In Neural Information Processing Symposium, Vol. 2, pp. 2017–2025. External Links: Document, ISBN 9781450341363 Cited by: §1.
  • S. Joshi, B. Davis, M. Jomier, and G. Gerig (2004) Unbiased diffeomorphic atlas construction for computational anatomy. NeuroImage 23, pp. S151–S160. Cited by: §2.3, §2.3, §2, §3.2, §4, §4.
  • K. Kamnitsas, C. Ledig, V. F.J. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon, D. Rueckert, and B. Glocker (2017) Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Medical Image Analysis 36, pp. 61–78. External Links: Document, 1603.05959, ISSN 13618423 Cited by: §1, §1, §3.2.
  • H. A. Kirişli, M. Schaap, S. Klein, S. L. Papadopoulou, M. Bonardi, C. H. Chen, A. C. Weustink, N. R. Mollet, E. J. Vonken, R. J. Van Der Geest, T. Van Walsum, and W. J. Niessen (2010) Evaluation of a multi-atlas based method for segmentation of cardiac CTA data: A large-scale, multicenter, and multivendor study. Medical Physics 37 (12), pp. 6279–6291. External Links: Document, ISSN 00942405 Cited by: §1.
  • J. Krebs, H. Delingette, B. Mailhe, N. Ayache, and T. Mansi (2019) Learning a Probabilistic Model for Diffeomorphic Registration. IEEE transactions on medical imaging 38 (9), pp. 2165–2176. External Links: Document, 1812.07460, ISSN 1558254X Cited by: §2.1.
  • P. Lamata, M. Sinclair, E. Kerfoot, A. Lee, A. Crozier, B. Blazevic, S. Land, A. J. Lewandowski, D. Barber, S. Niederer, and N. Smith (2014) An automatic service for the personalization of ventricular cardiac meshes. Journal of the Royal Society Interface 11 (91). External Links: Document, ISSN 17425662 Cited by: §2.1.
  • A. J. Larrazabal, C. Martínez, B. Glocker, and E. Ferrante (2020)

    Post-dae: anatomically plausible segmentation via post-processing with denoising autoencoders

    IEEE Transactions on Medical Imaging 39 (12), pp. 3813–3820. External Links: Document Cited by: §1, §1, §3.2.
  • H. W. Lee, M. R. Sabuncu, and A. V. Dalca (2020) Few labeled atlases are necessary for deep-learning-based segmentation. External Links: 1908.04466 Cited by: §1.
  • M. C. H. Lee, K. Petersen, N. Pawlowski, B. Glocker, and M. Schaap (2019) TETRIS: Template Transformer Networks for Image Segmentation with Shape Priors. IEEE Transactions on Medical Imaging PP, pp. 1–1. External Links: Document, ISSN 0278-0062 Cited by: §1, §1, §4.
  • M. Lee, O. Oktay, A. Schuh, M. Schaap, and B. Glocker (2019) Image-and-spatial transformer networks for structure-guided image registration. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 337–345. External Links: ISBN 978-3-030-32244-1, Document Cited by: §1, §1, §2.4, §2, §2, §2, §3.3, §3.3, §4.
  • S. Li, C. Zhang, and X. He (2020) Shape-aware semi-supervised 3d semantic segmentation for medical images. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2020, A. L. Martel, P. Abolmaesumi, D. Stoyanov, D. Mateus, M. A. Zuluaga, S. K. Zhou, D. Racoceanu, and L. Joskowicz (Eds.), Cham, pp. 552–561. External Links: ISBN 978-3-030-59710-8 Cited by: §1.
  • J. Long, E. Shelhamer, and T. Darrell (2015) Fully convolutional networks for semantic segmentation. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vol. , pp. 3431–3440. External Links: Document Cited by: §1.
  • J. B. Maintz and M. A. Viergever (1998) A survey of medical image registration. Medical Image Analysis 2 (1), pp. 1–36. External Links: Document, ISSN 13618415 Cited by: §1.
  • L. Mansilla, D. H. Milone, and E. Ferrante (2020) Learning deformable registration of medical images with anatomical constraints. Neural Networks 124, pp. 269 – 279. External Links: ISSN 0893-6080, Document, Link Cited by: §1, §1, §2, §4.
  • P. Medrano-Gracia, B. R. Cowan, B. Ambale-Venkatesh, D. A. Bluemke, J. Eng, J. P. Finn, C. G. Fonseca, J. A.C. Lima, A. Suinesiaputra, and A. A. Young (2014) Left ventricular shape variation in asymptomatic populations: The multi-ethnic study of atherosclerosis. Journal of Cardiovascular Magnetic Resonance 16 (1), pp. 1–10. External Links: Document Cited by: §1.
  • K. L. Miller, F. Alfaro-Almagro, N. K. Bangerter, D. L. Thomas, E. Yacoub, J. Xu, A. J. Bartsch, S. Jbabdi, S. N. Sotiropoulos, J. L. R. Andersson, L. Griffanti, G. Douaud, T. W. Okell, P. Weale, I. Dragonu, S. Garratt, S. Hudson, R. Collins, M. Jenkinson, P. M. Matthews, and S. M. Smith (2016) Multimodal population brain imaging in the UK Biobank prospective epidemiological study. Nature Neuroscience 19 (11), pp. 1523–1536. External Links: Document, ISSN 1097-6256 Cited by: §3.3.
  • F. Milletari, A. Rothberg, J. Jia, and M. Sofka (2017) Integrating statistical prior knowledge into convolutional neural networks. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 10433 LNCS, pp. 161–168. External Links: Document, ISBN 9783319661810, ISSN 16113349 Cited by: §1.
  • E. D. Nicol, B. L. Norgaard, P. Blanke, A. Ahmadi, J. Weir-McCall, P. M. Horvat, K. Han, J. J. Bax, and J. Leipsic (2019) The Future of Cardiovascular Computed Tomography: Advanced Analytics and Clinical Insights. JACC: Cardiovascular Imaging 12 (6), pp. 1058–1072. External Links: Document, ISSN 18767591 Cited by: §3.2.
  • O. Oktay, E. Ferrante, K. Kamnitsas, M. Heinrich, W. Bai, J. Caballero, S. A. Cook, A. De Marvao, T. Dawes, D. P. O’Regan, B. Kainz, B. Glocker, and D. Rueckert (2018) Anatomically Constrained Neural Networks (ACNNs): Application to Cardiac Image Enhancement and Segmentation. IEEE Transactions on Medical Imaging 37 (2), pp. 384–395. External Links: Document, ISSN 1558254X Cited by: §1, §1, §3.2.
  • A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala (2019) PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, and R. Garnett (Eds.), pp. 8024–8035. Cited by: §3.2.
  • B. Patenaude, S. M. Smith, D. N. Kennedy, and M. Jenkinson (2011) A bayesian model of shape and appearance for subcortical brain segmentation. Neuroimage 56 (3), pp. 907–922. Cited by: §3.3.
  • D. L. Pham, C. Xu, and J. L. Prince (2000) Current Methods in Medical Image Segmentation. Annual Review of Biomedical Engineering 2 (1), pp. 315–337. External Links: Document, ISSN 1523-9829 Cited by: §1.
  • O. Ronneberger, P. Fischer, and T. Brox (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 9351, pp. 12–20. External Links: Document, ISBN 9783319245737, ISSN 16113349 Cited by: §1, §1, §2.2.
  • M. A. Shafto, L. K. Tyler, M. Dixon, J. R. Taylor, J. B. Rowe, R. Cusack, A. J. Calder, W. D. Marslen-Wilson, J. Duncan, T. Dalgleish, et al. (2014) The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) study protocol: a cross-sectional, lifespan, multidisciplinary examination of healthy cognitive ageing. BMC Neurology 14 (1), pp. 204. Cited by: §3.3.
  • C. Stergios, S. Mihir, V. Maria, C. Guillaume, R. Marie-Pierre, M. Stavroula, and P. Nikos (2018) Linear and Deformable Image Registration with 3D Convolutional Neural Networks. In D. Stoyanov et al. (Eds.): RAMBO 2018/BIA 2018/TIA 2018, Vol. 11040 LNCS, pp. 13–22. External Links: Document, ISBN 978-3-030-00945-8, ISSN 13618423 Cited by: §2.1, §2.2.
  • C. Sudlow, J. Gallacher, N. Allen, V. Beral, P. Burton, J. Danesh, P. Downey, P. Elliott, J. Green, M. Landray, B. Liu, P. Matthews, G. Ong, J. Pell, A. Silman, A. Young, T. Sprosen, T. Peakman, and R. Collins (2015) UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Medicine 12 (3), pp. e1001779. External Links: Document, ISBN 1549-1676 (Electronic)$\$r1549-1277 (Linking), ISSN 15491676 Cited by: §3.3.
  • C. A. Taylor, T. A. Fonte, and J. K. Min (2013) Computational fluid dynamics applied to cardiac computed tomography for noninvasive quantification of fractional flow reserve: Scientific basis. Journal of the American College of Cardiology 61 (22), pp. 2233–2241. External Links: Document, ISSN 15583597 Cited by: §3.2, §3.2.
  • J. R. Taylor, N. Williams, R. Cusack, T. Auer, M. A. Shafto, M. Dixon, L. K. Tyler, R. N. Henson, et al. (2017) The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) data repository: structural and functional MRI, MEG, and cognitive data from a cross-sectional adult lifespan sample. NeuroImage 144, pp. 262–269. External Links: Document Cited by: §3.3.
  • S. Tilborghs, T. Dresselaers, P. Claus, J. Bogaert, and F. Maes (2020) Shape constrained cnn for cardiac mr segmentation with simultaneous prediction of shape and pose parameters. External Links: 2010.08952 Cited by: §1.
  • K. Tóthová, S. Parisot, M. C. H. Lee, E. Puyol-Antón, L. M. Koch, A. P. King, E. Konukoglu, and M. Pollefeys (2018) Uncertainty quantification in cnn-based surface prediction using shape priors. In International Workshop on Shape in Medical Imaging, M. Reuter, C. Wachinger, H. Lombaert, B. Paniagua, M. Lüthi, and B. Egger (Eds.), Cham, pp. 300–310. External Links: ISBN 978-3-030-04747-4 Cited by: §1.
  • K. Tóthová, S. Parisot, M. Lee, E. Puyol-Antón, A. King, M. Pollefeys, and E. Konukoglu (2020) Probabilistic 3d surface reconstruction from sparse mri information. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2020, A. L. Martel, P. Abolmaesumi, D. Stoyanov, D. Mateus, M. A. Zuluaga, S. K. Zhou, D. Racoceanu, and L. Joskowicz (Eds.), Cham, pp. 813–823. External Links: ISBN 978-3-030-59710-8 Cited by: §1.
  • N. J. Tustison, B. B. Avants, P. A. Cook, Y. Zheng, A. Egan, P. A. Yushkevich, and J. C. Gee (2010) N4ITK: improved N3 bias correction. IEEE Transactions on Medical Imaging 29 (6), pp. 1310–1320. Cited by: §3.3.
  • Z. Xu and M. Niethammer (2019) DeepAtlas: joint semi-supervised learning of image registration and segmentation. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2019, D. Shen, T. Liu, T. M. Peters, L. H. Staib, C. Essert, S. Zhou, P. Yap, and A. Khan (Eds.), Cham, pp. 420–429. External Links: ISBN 978-3-030-32245-8 Cited by: §1, §1, §1, §4.
  • M. Ye, Q. Huang, D. Yang, P. Wu, J. Yi, L. Axel, and D. Metaxas (2020) PC-u net: learning to jointly reconstruct and segment the cardiac walls in 3d from ct data. External Links: 2008.08194 Cited by: §1.
  • A. A. Young and A. F. Frangi (2009) Computational cardiac atlases: From patient to population and back. Experimental Physiology 94 (5), pp. 578–596. External Links: Document, ISSN 1469445X Cited by: §1, §1.
  • T. Yu, S. Kumar, A. Gupta, S. Levine, K. Hausman, and C. Finn (2020) Gradient surgery for multi-task learning. External Links: 2001.06782 Cited by: §3.3.
  • A. Zhao, G. Balakrishnan, F. Durand, J. V. Guttag, and A. V. Dalca (2019) Data augmentation using learned transformations for one-shot medical image segmentation. External Links: 1902.09383 Cited by: §3.3.
  • X. Zhuang, L. Li, C. Payer, D. Štern, M. Urschler, M. P. Heinrich, J. Oster, C. Wang, Ö. Smedby, C. Bian, X. Yang, P. A. Heng, A. Mortazi, U. Bagci, G. Yang, C. Sun, G. Galisot, J. Y. Ramel, T. Brouard, Q. Tong, W. Si, X. Liao, G. Zeng, Z. Shi, G. Zheng, C. Wang, T. MacGillivray, D. Newby, K. Rhode, S. Ourselin, R. Mohiaddin, J. Keegan, D. Firmin, and G. Yang (2019) Evaluation of algorithms for Multi-Modality Whole Heart Segmentation: An open-access grand challenge. Medical Image Analysis 58, pp. 101537. External Links: Document, 1902.07880, ISSN 13618423 Cited by: §1.