Deep Learning in Medical Image Registration: A Survey

by   Grant Haskins, et al.
Rensselaer Polytechnic Institute

The establishment of image correspondence through robust image registration is critical to many clinical tasks such as image fusion, organ atlas creation, and tumor growth monitoring, and is a very challenging problem. Since the beginning of the recent deep learning renaissance, the medical imaging research community has developed deep learning based approaches and achieved state-of-the-art performance in many applications, including image registration. The rapid adoption of deep learning for image registration applications over the past few years necessitates a comprehensive summary and outlook, which is the main scope of this survey. This requires placing a focus on the different research areas as well as highlighting challenges that practitioners face. This survey, therefore, outlines the evolution of deep learning based medical image registration in the context of both research challenges and relevant innovations in the past few years. Further, this survey highlights future research directions to show how this field may move forward to the next level.



1 Introduction

Image registration is the process of transforming different image datasets into one coordinate system with matched imaging contents, which has significant applications in medicine. Registration may be necessary when analyzing a pair of images that were acquired from different viewpoints, at different times, or using different sensors/modalities Hill et al. (2001); Zitova and Flusser (2003). Until recently, image registration was mostly performed manually by clinicians. However, many registration tasks can be quite challenging, and the quality of manual alignments is highly dependent upon the expertise of the user, which can be clinically disadvantageous. To address the potential shortcomings of manual registration, automatic registration has been developed. Although other methods for automatic image registration were extensively explored prior to (and during) the deep learning renaissance, deep learning has changed the landscape of image registration research Ambinder (2005). Ever since the success of AlexNet in the ImageNet challenge of 2012 Alom et al. (2018), deep learning has allowed for state-of-the-art performance in many computer vision tasks including, but not limited to: object detection Ren et al. (2015), feature extraction He et al. (2016), segmentation Ronneberger et al. (2015), image classification Alom et al. (2018), image denoising Yang et al. (2018), and image reconstruction Yao et al. (2018).

Initially, deep learning was successfully used to augment the performance of iterative, intensity based registration. Soon after this initial application, several groups investigated the intuitive application of reinforcement learning to registration. Further, demand for faster registration methods later motivated the development of deep learning based one-step transformation estimation techniques. The challenges associated with procuring/generating ground truth data have recently motivated many groups to develop unsupervised frameworks for one-step transformation estimation. One of the hurdles associated with this framework is the familiar challenge of image similarity quantification. Recent efforts that use information theory based similarity metrics, segmentations of anatomical structures, and generative adversarial network like frameworks to address this challenge have shown promising results. As the trends visualized in Figures 1 and 2 suggest, this field is moving very quickly to surmount the hurdles associated with deep learning based medical image registration, and several groups have already enjoyed significant successes in their applications.

Figure 1: An overview of deep learning based medical image registration broken down by approach type. The popular research directions are written in bold.

Therefore, the purpose of this article is to comprehensively survey the field of deep learning based medical image registration, highlight common challenges that practitioners face, and discuss future research directions that may address these challenges. Prior to surveying deep learning based medical image registration works, background information pertaining to deep learning is discussed in Section 2. The methods surveyed in this article were divided into the following three categories: Deep Iterative Registration, Supervised Transformation Estimation, and Unsupervised Transformation Estimation. Following a discussion of the methods that belong to each of the aforementioned categories in Sections 3, 4, and 5 respectively, future research directions and current trends are discussed in Section 6.

Figure 2: An overview of the number of deep learning based image registration works and deep learning based medical imaging works. The red line represents the trend line for medical imaging based approaches and the blue line represents the trend line for deep learning based medical image registration approaches. The dotted line represents extrapolation.

2 Deep Learning

Deep learning belongs to a larger class of machine learning methods that use neural networks with a large number of layers to learn representations of data Goodfellow et al. (2016). Based on the way that networks are trained, most deep learning approaches fall into one of two categories: supervised learning and unsupervised learning. Supervised learning involves the designation of a desired neural network output, while unsupervised learning involves drawing inferences from a set of data without the use of any manually defined labels Goodfellow et al. (2016); Schmidhuber (2015). Both supervised and unsupervised learning allow for the use of a variety of deep learning paradigms. In this section, several of those approaches will be explored, including convolutional neural networks, recurrent neural networks, reinforcement learning, and generative adversarial networks. Note that there are many publicly available libraries that can be used to build the networks described in this section, for example TensorFlow Abadi et al. (2016), MXNet Chen et al. (2015), Keras Chollet et al. (2015), Caffe Jia et al. (2014), and PyTorch Paszke et al. (2017).

2.1 Convolutional neural networks

Convolutional neural networks (CNNs) and their variants (such as the fully convolutional network (FCN)) are among the most commonly used deep neural networks for computer vision and image processing and analysis applications. CNNs are feed-forward neural networks that are often used to analyze images and perform tasks such as classification and object detection/recognition Krizhevsky et al. (2012); Simonyan and Zisserman (2014). They utilize a variation of the multilayer perceptron (MLP) and are famously translation invariant due to their parameter weight sharing Goodfellow et al. (2016). In each layer of these networks, a number of convolutional filters “slide” across the feature maps from the previous layer. The output is another set of feature maps that are constructed from the inner products of the kernel and the corresponding patches of the previous feature maps. The feature maps that result from these convolutions are stacked and inputted into the next layer of the network. This allows for hierarchical feature extraction from the image. Further, these operations can be performed patch-wise, which is useful for a number of computer vision tasks Hosny et al. (2018); Zagoruyko and Komodakis (2015). Because the construction of a feature map from either an input image/image pair or a feature map is linear, non-linear activation functions are used to introduce non-linearities and enhance the expressivity of feature maps Goodfellow et al. (2016).
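The sliding inner products and non-linear activation described above can be sketched in a few lines of NumPy; the edge-detecting kernel and toy image below are illustrative, not taken from any surveyed work:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2D convolution: slide the kernel across the image and take
    the inner product with each patch, producing one feature map."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def relu(x):
    """Non-linear activation applied element-wise to the feature map."""
    return np.maximum(x, 0.0)

# A left-to-right edge kernel applied to a toy step-edge image.
image = np.zeros((5, 5))
image[:, 2:] = 1.0                       # step edge between columns 1 and 2
kernel = np.array([[-1., 0., 1.],
                   [-1., 0., 1.],
                   [-1., 0., 1.]])
fmap = relu(conv2d_valid(image, kernel))
print(fmap.shape)                        # (3, 3); responds where the edge lies
```

In a real CNN the kernel weights are learned rather than hand-set, and many such kernels run in parallel to produce a stack of feature maps.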

Convolutional filters and their activations are often combined with pooling layers, either average or max, in a typical CNN to reduce the dimensionality of feature maps Szegedy et al. (2014); LeCun et al. (2015). Batch normalization (BN) is commonly used after convolutional layers as well Ioffe and Szegedy (2015) because of its ability to reduce internal covariate shift. Furthermore, many modern neural networks make use of residual connections Szegedy et al. (2017) and depthwise separable convolutions Chollet (2017). These networks can be trained in an end-to-end fashion using backpropagation to iteratively update the parameters that constitute the network Goodfellow et al. (2016); LeCun et al. (2015).

Additionally, randomly dropping connections in certain layers of a model during training, a strategy known as dropout, allows for the implicit use of an ensemble of models Srivastava et al. (2014). This is a popular regularization strategy that is frequently used to prevent overfitting.
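The pooling and dropout operations from the two preceding paragraphs can be sketched as follows; this is a minimal NumPy illustration, and the array sizes and dropout rate are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def max_pool2d(x, k=2):
    """Non-overlapping k x k max pooling to reduce feature-map size."""
    h, w = x.shape[0] // k * k, x.shape[1] // k * k
    x = x[:h, :w].reshape(h // k, k, w // k, k)
    return x.max(axis=(1, 3))

def dropout(x, p, training=True):
    """Inverted dropout: during training, zero each activation with
    probability p and rescale survivors by 1/(1-p) so the expected value
    is unchanged; at test time it is the identity (the full 'ensemble')."""
    if not training:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

a = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool2d(a)     # 2x2 map of per-block maxima
print(pooled)              # [[ 5.  7.] [13. 15.]]
```

Each random dropout mask effectively samples a different thinned sub-network, which is why training with dropout behaves like averaging an ensemble of models.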

2.2 Recurrent Neural Networks

Although CNNs and their variants are typically used to analyze data that exists in the spatial domain, recurrent neural networks (RNNs), which are composed of several of the network components described in the above section, can be used to analyze time series data. Each element (e.g. an image) in the time series is mapped to a feature representation, and the “current” representation is determined by a combination of the previous representations and the “current” input datum. RNNs can be “many-to-one” or “many-to-many” (i.e. the output of the RNN can be a single datum or time series data). Further, a gated variant of RNNs, the Long Short-Term Memory network (LSTM) Hochreiter and Schmidhuber (1997), can be used to model long-term dependencies by helping to prevent gradient vanishing/explosion.
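A minimal sketch of the “many-to-one” recurrence described above, using a vanilla (ungated) RNN cell with randomly initialized weights; the dimensions are arbitrary and chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def rnn_many_to_one(xs, Wxh, Whh, bh):
    """Vanilla RNN cell unrolled over a sequence: the 'current' hidden
    state combines the previous state with the current input; only the
    final state is returned (many-to-one)."""
    h = np.zeros(Whh.shape[0])
    for x in xs:
        h = np.tanh(Wxh @ x + Whh @ h + bh)
    return h

d_in, d_hid, T = 3, 4, 5
Wxh = rng.normal(size=(d_hid, d_in)) * 0.1   # input-to-hidden weights
Whh = rng.normal(size=(d_hid, d_hid)) * 0.1  # hidden-to-hidden weights
bh = np.zeros(d_hid)
seq = [rng.normal(size=d_in) for _ in range(T)]
h_T = rnn_many_to_one(seq, Wxh, Whh, bh)
print(h_T.shape)   # (4,)
```

An LSTM replaces the single tanh update with gated cell-state updates, which is what mitigates the vanishing/exploding gradients of this vanilla recurrence over long sequences.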

2.3 Reinforcement learning

Another popular deep learning strategy is reinforcement learning. Problems that use reinforcement learning can essentially be cast as Markov decision processes Littman (2001); Lovejoy (1991); Puterman (2014) associated with a tuple of a state, action, transition probability, reward, and discount factor. When an agent is in a particular state, it uses a policy in order to determine an action to take among a set of state-dependent actions Kaelbling et al. (1996). Upon performing the action that was selected by the policy, the agent transitions into the next state with a given probability and receives a reward. The goal of the agent is to maximize the total reward that it receives while performing a given task. Because the rewards that the agent will receive are subject to stochastic processes, the agent will seek to maximize the cumulative expected rewards, while using the discount factor in order to prioritize longer term rewards. The primary goal is to learn the optimal policy with respect to the expected future rewards Diuk et al. (2008); Mnih et al. (2015); Wang et al. (2015). Instead of doing this directly, most reinforcement learning paradigms learn the action-value function (Q-function) by using the Bellman equation Bellmann (1957). The process through which Q-functions are approximated is referred to as Q-learning. These approaches utilize value functions and action-value functions Wang et al. (2015) that determine the advantageous nature of a given state and state-action pair, respectively Van Hasselt et al. (2016); Wang et al. (2015). Further, an advantage function determines the advantageous nature of a given state-action pair relative to the other pairs Van Hasselt et al. (2016). These approaches have been applied to various video/board games and have often been able to demonstrate superhuman performance Chaslot et al. (2008); Grace et al. (2017); Guo et al. (2014); Silver et al. (2016, 2017); Takeuchi et al. (2008). The performance of such methods is often used as a benchmark that indicates the current state of deep learning research.
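The Bellman-equation update at the heart of Q-learning can be illustrated on a toy problem; the five-state chain environment and the hyperparameters below are invented for illustration and are unrelated to any registration task:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 5-state chain: actions 0 (left) / 1 (right); reward 1 only on
# reaching the rightmost state, which ends the episode.
n_states, n_actions, goal = 5, 2, 4
gamma, alpha, eps = 0.9, 0.5, 0.2   # discount, learning rate, exploration

def step(s, a):
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == goal), s2 == goal

Q = np.zeros((n_states, n_actions))
for _ in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy policy: mostly exploit, sometimes explore
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s2, r, done = step(s, a)
        # Bellman backup: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s, a] += alpha * (r + gamma * (0 if done else Q[s2].max()) - Q[s, a])
        s = s2

print([int(Q[s].argmax()) for s in range(goal)])   # learned policy: all 'right'
```

Deep Q-networks replace the table `Q` with a neural network over image-derived states, and dueling architectures split that network into the value and advantage streams mentioned above.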

2.4 Generative Adversarial Networks

A generative adversarial network (GAN) Goodfellow et al. (2014) is composed of two competing neural networks: a generator and a discriminator. The generator maps data from one domain to another; in the original implementation, it mapped a random noise vector to an image domain associated with a particular dataset. The discriminator is tasked with discerning between real data that originated from said domain and data produced by the generator. The goal of training GANs is to converge to a differentiable Nash equilibrium Ratliff et al. (2013), at which point generated data and real data are indistinguishable Goodfellow (2016).

When GANs are applied to medical image registration, they are commonly used for regularization. The generator predicts a transformation and the discriminator takes the resulting resampled images as its input. The discriminator is trained to discern between aligned image pairs and resampled image pairs following the generator’s prediction. The generator is typically trained using a linear combination of an adversarial loss function term (based on the discriminator’s predictions) and a target loss function term (e.g. Euclidean distance from ground truth). For both the generator and discriminator, a binary cross entropy (BCE) loss function is commonly used.
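The loss combination described above can be sketched as follows; the weight `alpha` and the toy inputs are illustrative assumptions, not values from any surveyed work:

```python
import numpy as np

def bce(p, y, eps=1e-7):
    """Binary cross entropy between discriminator outputs p and labels y."""
    p = np.clip(p, eps, 1 - eps)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

def generator_loss(d_on_resampled, pred_transform, gt_transform, alpha=0.1):
    """Linear combination of an adversarial term (the generator wants the
    discriminator to label its resampled images as 'aligned', i.e. 1) and
    a supervised target term (Euclidean distance from the ground truth
    transformation). The weight alpha is a hypothetical choice."""
    adv = bce(d_on_resampled, np.ones_like(d_on_resampled))
    target = float(np.linalg.norm(pred_transform - gt_transform))
    return target + alpha * adv

d_scores = np.array([0.3, 0.6])    # discriminator outputs in (0, 1)
pred = np.array([1.0, 0.0, 0.2])   # e.g. predicted rigid parameters
gt = np.array([1.0, 0.0, 0.0])
print(round(generator_loss(d_scores, pred, gt), 3))   # 0.286
```

The discriminator is trained with the same BCE on the opposite labels, pushing the two networks toward the equilibrium described above.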

Further discussion of deep learning based medical image analysis and various deep learning research directions outlined above is outside of the scope of this article. However, comprehensive review articles that survey the application of deep learning to medical image analysis Lee et al. (2017); Litjens et al. (2017), reinforcement learning Kaelbling et al. (1996), and the application of GANs to medical image analysis Kazeminia et al. (2018) are recommended to the interested readers.

3 Deep Iterative Registration

Ref                            | Learning           | Transform  | Modality | ROI         | Model
Eppenhof and Pluim (2018b)     | Metric             | Deformable | CT       | Thorax      | 9-layer CNN
Blendowski and Heinrich (2018) | Metric             | Deformable | CT       | Lung        | FCN
Simonovsky et al. (2016)       | Metric             | Deformable | MR       | Brain       | 5-layer CNN
Wu et al. (2013)               | Metric             | Deformable | MR       | Brain       | 2-layer CAE
Cheng et al. (2018)            | Metric             | Deformable | CT/MR    | Head        | 5-layer DNN
Sedghi et al. (2018)           | Metric             | Rigid      | MR/US    | Abdominal   | 5-layer CNN
Haskins et al. (2019)          | Metric             | Rigid      | MR/US    | Prostate    | 14-layer CNN
Matthew et al. (2018)          | Metric             | Rigid      | MR/US    | Fetal Brain | LSTM/STN
Krebs et al. (2017)            | RL Agent           | Deformable | MR       | Prostate    | 8-layer CNN
Liao et al. (2017)             | RL Agent           | Rigid      | CT/CBCT  | Spine/      | 8-layer CNN
Miao et al. (2017)             | Multiple RL Agents | Rigid      | X-ray/CT | Spine       | Dilated FCN
Ma et al. (2017)               | RL Agent           | Rigid      | MR/CT    | Spine       | Dueling
Table 1: Deep Iterative Registration Methods Overview

Automatic intensity-based image registration requires both a metric that quantifies the similarity between a moving image and a fixed image and an optimization algorithm that updates the transformation parameters such that the similarity between the images is maximized. Prior to the deep learning renaissance, several manually crafted metrics were frequently used for such registration applications, including: sum of squared differences (SSD), cross-correlation (CC), mutual information (MI) Maes et al. (1997); Viola and Wells III (1997), normalized cross correlation (NCC), and normalized mutual information (NMI). Early applications of deep learning to medical image registration are direct extensions of this classical framework Simonovsky et al. (2016); Wu et al. (2013, 2016). Several groups later used a reinforcement learning paradigm to iteratively estimate a transformation Krebs et al. (2017); Liao et al. (2017); Ma et al. (2017); Miao et al. (2017) because this application is more consistent with how practitioners perform registration.
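Minimal NumPy implementations of three of these manually crafted metrics (SSD, NCC, and MI via a joint intensity histogram) may make the comparison concrete; the bin count and test images below are arbitrary:

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences (lower means more similar)."""
    return float(np.sum((a - b) ** 2))

def ncc(a, b):
    """Normalized cross correlation (1 means perfectly correlated)."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float(np.mean(a * b))

def mutual_information(a, b, bins=16):
    """Mutual information from a joint intensity histogram; popular for
    multimodal registration because it assumes no linear relation
    between the two images' intensities."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px[:, None] * py[None, :])[nz])))

rng = np.random.default_rng(0)
fixed = rng.random((32, 32))
print(ssd(fixed, fixed), round(ncc(fixed, fixed), 3))   # 0.0 1.0
```

An iterative registration framework plugs one of these metrics (or a learned replacement) into an optimizer that updates the transformation parameters until similarity is maximized.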

A description of both types of methods is given in Table 1. We will survey earlier methods that used deep similarity based registration in Section 3.1 and then some more recently developed methods that use deep reinforcement learning based registration in Section 3.2.

Figure 3: A visualization of the registration pipeline for works that use deep learning to quantify image similarity in an intensity-based registration framework.

3.1 Deep Similarity based Registration

In this section, methods that use deep learning to learn a similarity metric are surveyed. This similarity metric is inserted into a classical intensity-based registration framework with a defined interpolation strategy, transformation model, and optimization algorithm. A visualization of this overall framework is given in Fig. 3. The solid lines represent data flows that are required during training and testing, while the dashed lines represent data flows that are required only during training. Note that this is the case for the remainder of the figures in this article as well.

3.1.1 Overview of Works

Although manually crafted similarity metrics perform reasonably well in the unimodal registration case, deep learning has been used to learn superior metrics. This section will first discuss approaches that use deep learning to augment the performance of unimodal intensity-based registration pipelines before moving on to multimodal registration.

Unimodal Registration

Wu et al. Wu et al. (2013, 2016) were the first to use deep learning to obtain an application-specific similarity metric for registration. They extracted the features that are used for unimodal, deformable registration of 3D brain MR volumes using a convolutional stacked autoencoder (CAE). They subsequently performed the registration using gradient descent to optimize the NCC of the two sets of features. This method outperformed diffeomorphic demons Vercauteren et al. (2009) and HAMMER Shen (2007) based registration techniques.
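The metric-plus-optimizer loop used by such methods can be illustrated with a toy example; here raw intensities stand in for the learned CAE features, circular integer translations stand in for the deformable transformation model, and an exhaustive search stands in for gradient descent:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross correlation of two equally sized images."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float(np.mean(a * b))

def shift(img, dy, dx):
    """Toy transform model: circular translation by whole pixels."""
    return np.roll(np.roll(img, dy, axis=0), dx, axis=1)

rng = np.random.default_rng(0)
fixed = rng.random((16, 16))
moving = shift(fixed, 2, -1)          # the fixed image under a known offset

# Iteratively evaluate the metric over candidate transforms and keep
# the one that maximizes similarity (the optimizer's role).
best = max(((ncc(fixed, shift(moving, dy, dx)), (dy, dx))
            for dy in range(-3, 4) for dx in range(-3, 4)))
print(best[1])                        # recovered translation: (-2, 1)
```

In the surveyed methods, the hand-set metric is replaced by a network's output and the search by a continuous optimizer over many more parameters, but the loop structure is the same.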

Recently, Eppenhof et al. Eppenhof and Pluim (2018b) estimated registration error for the deformable registration of 3D thoracic CT scans (inhale-exhale) in an end-to-end capacity. They used a 3D CNN to estimate the error map for inputted inhale-exhale pairs of thoracic CT scans. Like the above method, only learned features were used in this work.

Instead, Blendowski et al. Blendowski and Heinrich (2018) proposed the combined use of both CNN-based descriptors and manually crafted MRF-based self-similarity descriptors for lung CT registration. Although the manually crafted descriptors outperformed the CNN-based descriptors, optimal performance was achieved using both sets of descriptors. This indicates that, in the unimodal registration case, deep learning may not outperform manually crafted methods; however, it can be used to obtain complementary information.

Multimodal Registration

The advantages of the application of deep learning to intensity based registration are more obvious in the multimodal case, where manually crafted similarity metrics have had very little success.

Cheng et al. Cheng et al. (2016, 2018) recently used a stacked denoising autoencoder to learn a similarity metric that assesses the quality of the rigid alignment of CT and MR images. They showed that their metric outperformed NMI and local cross correlation (LCC) for their application.

In an effort to explicitly estimate image similarity in the multimodal case, Simonovsky et al. Simonovsky et al. (2016) used a CNN to learn the dissimilarity between aligned 3D T1 and T2 weighted brain MR volumes. Given this similarity metric, gradient descent was used in order to iteratively update the parameters that define a deformation field. This method was able to outperform MI based registration and set the stage for deep intensity based multimodal registration.

Additionally, Sedghi et al. Sedghi et al. (2018) performed the rigid registration of 3D US/MR (modalities with an even greater appearance difference than MR/CT) abdominal scans by using a 5-layer neural network to learn a similarity metric that is then optimized by Powell’s method. This approach also outperformed MI based registration.

Haskins et al. Haskins et al. (2019) learned a similarity metric for multimodal rigid registration of MR and transrectal US (TRUS) volumes by using a CNN to predict target registration error (TRE). Instead of using a traditional optimizer like the above methods, they used an evolutionary algorithm to explore the solution space prior to using a traditional optimization algorithm because of the learned metric’s lack of convexity. This registration framework outperformed MIND Heinrich et al. (2012) and MI based registration.

In stark contrast to the above methods, Wright et al. Matthew et al. (2018) used LSTM spatial co-transformer networks to iteratively register MR and US volumes group-wise. The recurrent spatial co-transformation occurred in three steps: image warping, residual parameter prediction, and parameter composition. They demonstrated that their method is more capable of quantifying image similarity than a previous multimodal image similarity quantification method that uses self-similarity context descriptors Heinrich et al. (2013).

3.1.2 Discussion and Assessment

Recent works have confirmed the ability of neural networks to assess image similarity in multimodal medical image registration. The results achieved by the approaches described in this section demonstrate that deep learning can be successfully applied to challenging registration tasks. However, the findings from Blendowski and Heinrich (2018) suggest that learned image similarity metrics may be best suited to complement existing similarity metrics in the unimodal case. Further, it is difficult to use these iterative techniques for real time registration.

3.2 Reinforcement Learning based Registration

In this section, methods that use reinforcement learning for their registration applications are surveyed. Here, a trained agent is used to perform the registration as opposed to a pre-defined optimization algorithm. A visualization of this framework is given in Fig. 4. Reinforcement learning based registration typically involves a rigid transformation model. However, it is possible to use a deformable transformation model.

Figure 4: A visualization of the registration pipeline for works that use deep reinforcement learning to implicitly quantify image similarity for image registration. Here, an agent learns to map states to actions based on rewards that it receives from the environment.

Liao et al. Liao et al. (2017) were the first to use reinforcement learning based registration, performing the rigid registration of cardiac and abdominal 3D CT images and cone-beam CT (CBCT) images. They used a greedy supervised approach for end-to-end training with an attention-driven hierarchical strategy. Their method outperformed MI based registration and semantic registration using probability maps.

Shortly after, Kai et al. Ma et al. (2017) used a reinforcement learning approach to perform the rigid registration of MR/CT chest volumes. This approach is derived from Q-learning and leverages contextual information to determine the depth of the projected images. The network used in this method is derived from the dueling network architecture Wang et al. (2015). Notably, this work also differentiates between terminal and non-terminal rewards. This method outperforms registration methods that are based on iterative closest points (ICP), landmarks, Hausdorff distance, Deep Q Networks, and the Dueling Network Wang et al. (2015).

Instead of training a single agent like the above methods, Miao et al. Miao et al. (2017) used a multi-agent system in a reinforcement learning paradigm to rigidly register X-Ray and CT images of the spine. They used an auto-attention mechanism to observe multiple regions and demonstrate the efficacy of a multi-agent system. They were able to significantly outperform registration approaches that used a state-of-the-art similarity metric given by De Silva et al. (2016).

As opposed to the above rigid registration based works, Krebs et al. Krebs et al. (2017) used a reinforcement learning based approach to perform the deformable registration of 2D and 3D prostate MR volumes. They used a low resolution deformation model for the registration and fuzzy action control to influence the stochastic action selection. The low resolution deformation model is necessary to restrict the dimensionality of the action space. This approach outperformed Elastix Klein et al. (2010) and LCC-Demons Lorenzi et al. (2013) based registration techniques.

Figure 5: A visualization of supervised single step registration.

The use of reinforcement learning is intuitive for medical image registration applications. One of the principal challenges for reinforcement learning based registration is handling high resolution deformation fields, whose dimensionality inflates the action space; rigid registration faces no such challenge. Because of the intuitive nature and recency of these methods, we expect that such approaches will receive more attention from the research community in the next few years.

Ref                        | Supervision                              | Transform  | Modality | ROI           | Model
Yang et al. (2016)         | Real Transforms                          | Deformable | MR       | Brain         | FCN
Cao et al. (2017)          | Real Transforms                          | Deformable | MR       | Brain         | 9-layer CNN
Lv et al. (2018)           | Real Transforms                          | Deformable | MR       | Abdominal     | CNN
Rohé et al. (2017)         | Real Transforms                          | Deformable | MR       | Cardiac       | SVF-Net
Sokooti et al. (2017)      | Synthetic Transforms                     | Deformable | CT       | Chest         | RegNet
Eppenhof and Pluim (2018a) | Synthetic Transforms                     | Deformable | CT       | Lung          | U-Net
Uzunova et al. (2017)      | Synthetic Transforms                     | Deformable | MR       | Brain/Cardiac | FlowNet
Ito and Ino (2018)         | Synthetic Transforms                     | Deformable | MR       | Brain         | GoogleNet
Sun et al. (2018)          | Synthetic Transforms                     | Deformable | CT/US    | Liver         | DVFNet
Yang (2017)                | Real + Synthetic Transforms              | Deformable | MR       | Brain         | FCN
Sloan et al. (2018)        | Synthetic Transforms                     | Rigid      | MR       | Brain         | 6-layer CNN / 10-layer FCN
Salehi et al. (2018)       | Synthetic Transforms                     | Rigid      | MR       | Brain         | 11-layer CNN / ResNet-18
Zheng et al. (2018)        | Synthetic Transforms                     | Rigid      | X-ray    | Bone          | 17-layer CNN + PDA Module
Miao et al. (2016b)        | Synthetic Transforms                     | Rigid      | X-ray/DDR| Bone          | 6-layer CNN
Chee and Wu (2018)         | Synthetic Transforms                     | Rigid      | MR       | Brain         | AIRNet
Hu et al. (2018c)          | Segmentations                            | Deformable | MR/US    | Prostate      | 30-layer FCN
Hering et al. (2018)       | Segmentations + Similarity Metric        | Deformable | MR/US    | Prostate      | U-Net / GAN
Hu et al. (2018a)          | Segmentations + Adversarial Loss         | Deformable | MR/US    | Prostate      | GAN
Fan et al. (2018b)         | Real Transforms + Similarity Metric      | Deformable | MR       | Brain         | U-Net
Yan et al. (2018)          | Synthetic Transforms + Adversarial Loss  | Rigid      | MR/US    | Prostate      | GAN
Table 2: Supervised Transformation Estimation Methods Overview

4 Supervised Transformation Estimation

Despite the early success of the previously described approaches, the transformation estimation in these methods is iterative, which can lead to slow registration. This is especially true in the deformable registration case where the solution space is high dimensional Lee et al. (2017). This motivated the development of networks that could estimate the transformation that corresponds to optimal similarity in one step. However, fully supervised transformation estimation (the exclusive use of ground truth data to define the loss function) has several challenges that are highlighted in this section.

A visualization of supervised transformation estimation is given in Fig. 5 and a description of notable works is given in Table 2. This section first discusses methods that use fully supervised approaches in Section 4.1 and then discusses methods that use dual/weakly supervised approaches in Section 4.2.

4.1 Fully Supervised Transformation Estimation

In this section, methods that used full supervision for single-step registration are surveyed. Using a neural network to perform registration as opposed to an iterative optimizer significantly speeds up the registration process.

4.1.1 Overview of works

Because the methods discussed in this section use a neural network to estimate transformation parameters directly, the use of a deformable transformation model does not introduce additional computational constraints. This is advantageous because deformable transformation models are generally superior to rigid transformation models Poulin et al. (2018). This section will first discuss approaches that use a rigid transformation model and then discuss approaches that use a deformable transformation model.

Rigid Registration

Miao et al. Miao et al. (2016a, b) were the first to use deep learning to predict rigid transformation parameters. They used a CNN to predict the transformation matrix associated with the rigid registration of 2D/3D X-ray attenuation maps and 2D X-ray images. Hierarchical regression is proposed in which the 6 transformation parameters are partitioned into 3 groups. Ground truth data was synthesized in this approach by transforming aligned data. This is the case for the next three approaches that are described as well. This approach outperformed MI, CC, and gradient correlation based iterative registration approaches.

Recently, Chee et al. Chee and Wu (2018) used a CNN to predict the transformation parameters used to rigidly register 3D brain MR volumes. In their framework, affine image registration network (AIRNet), the MSE between the predicted and ground truth affine transforms is used to train the network. They are able to outperform iterative MI based registration for both the unimodal and multimodal cases.

That same year, Salehi et al. Salehi et al. (2018) used a deep residual regression network, a correction network, and a bivariant geodesic distance based loss function to rigidly register T1 and T2 weighted 3D fetal brain MRs for atlas construction. The use of the residual network to initially register the image volumes prior to the forward pass through the correction network allowed for an enhancement of the capture range of the registration. This approach was evaluated for both slice-to-volume registration and volume-to-volume registration. They validated the efficacy of their geodesic loss term and outperformed NCC registration.

Additionally, Zheng et al. Zheng et al. (2018) proposed the integration of a pairwise domain adaptation module (PDA) into a pre-trained CNN that performs the rigid registration of pre-operative 3D X-Ray images and intraoperative 2D X-ray images using a limited amount of training data. Domain adaptation was used to address the discrepancy between synthetic data that was used to train the deep model and real data.

Sloan et al. Sloan et al. (2018) used a CNN to regress the rigid transformation parameters for the registration of T1 and T2 weighted brain MRs. Both unimodal and multimodal registration were investigated in this work. The parameters that constitute the convolutional layers used to extract low-level features in each image were shared only in the unimodal case; in the multimodal case, these parameters were learned separately. This approach also outperformed MI based image registration.

Deformable Registration

Unlike the previous section, methods that use both real and synthesized ground truth labels will be discussed. Methods that use clinical/publicly available ground truth labels for training are discussed first. This ordering reflects the fact that simulating realistic deformable transformations is more difficult than simulating realistic rigid transformations.

First, Yang et al. Yang et al. (2016) predicted the deformation field with an FCN that registers 2D/3D intersubject brain MR volumes in a single step, using a U-net like architecture Ronneberger et al. (2015). Further, they used large deformation diffeomorphic metric mapping (LDDMM) to provide a basis, used the initial momentum values of the pixels of the image volumes as the network input, and evolved these values to obtain the predicted deformation field. This method outperformed semi-coupled dictionary learning based registration Cao et al. (2015).

The following year, Rohe et al. Rohé et al. (2017) also used a U-net Ronneberger et al. (2015) inspired network to estimate the deformation field used to register 3D cardiac MR volumes. Mesh segmentations are used to compute the reference transformation for a given image pair and SSD between the prediction and ground truth is used as the loss function. This method outperformed LCC Demons based registration Lorenzi et al. (2013).

That same year, Cao et al. Cao et al. (2017) used a CNN to map input image patches of a pair of 3D brain MR volumes to their respective displacement vectors. The totality of these displacement vectors for a given image constitutes the deformation field that is used to perform the registration. Additionally, they used the similarity between inputted image patches to guide the learning process. Further, they used an equalized active-points guided sampling strategy in which patches with higher gradient magnitudes and displacement values are more likely to be sampled for training. This method outperforms SyN Avants et al. (2008) and Demons Vercauteren et al. (2009) based registration methods.
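
The sampling idea can be sketched as follows (a simplified 2D version of our own construction; Cao et al. additionally weight by displacement magnitude, which is omitted here):

```python
import numpy as np

def weighted_patch_centers(image, n_samples, rng=None):
    """Sample patch-center indices with probability proportional to the local
    gradient magnitude, so informative high-gradient regions dominate training."""
    rng = np.random.default_rng(rng)
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy).ravel() + 1e-8       # keep every voxel reachable
    flat = rng.choice(image.size, size=n_samples, replace=True, p=mag / mag.sum())
    return np.stack(np.unravel_index(flat, image.shape), axis=1)  # (n, 2) rows/cols

# Toy image with one vertical edge: samples should cluster around columns 7-8,
# where the central-difference gradient is nonzero.
img = np.zeros((16, 16))
img[:, 8:] = 1.0
centers = weighted_patch_centers(img, 50, rng=0)
```

The same idea extends to 3D by taking the gradient over three axes and unraveling into (row, column, slice) triples.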

Recently, Jun et al. Lv et al. (2018) used a CNN to perform the deformable registration of abdominal MR images to compensate for the deformation caused by respiration. This approach achieved registration results superior to those obtained using non-motion-corrected registration and local affine registration.

Unlike many of the other approaches discussed in this paper, Yang et al. Yang (2017) quantified the uncertainty associated with the deformable registration of 3D T1 and T2 weighted brain MRs using a low-rank Hessian approximation of the variational Gaussian distribution of the transformation parameters. This method was evaluated on both real and synthetic data.

Just as deep learning practitioners use random transformations to enhance the diversity of their dataset, Sokooti et al. Sokooti et al. (2017) used random DVFs to augment their dataset. They used a multi-scale CNN to predict a deformation field. This deformation is used to perform intra-subject registration of 3D chest CT images. This method used late fusion as opposed to early fusion, in which the patches are concatenated and used as the input to the network. The performance of their method is competitive with B-Spline based registration Sokooti et al. (2017).
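
A minimal sketch of this style of augmentation (our own construction, assuming SciPy is available; Sokooti et al. generate more structured multi-scale DVFs) blurs white noise into a smooth random displacement field and applies it to an image:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def random_smooth_dvf(shape, max_disp=3.0, sigma=4.0, rng=None):
    """Generate a random, smoothly varying 2D displacement vector field (DVF)
    by Gaussian-blurring white noise, then bounding the maximum displacement."""
    rng = np.random.default_rng(rng)
    dvf = np.stack([gaussian_filter(rng.standard_normal(shape), sigma)
                    for _ in range(2)])
    dvf *= max_disp / (np.abs(dvf).max() + 1e-8)
    return dvf                                   # shape (2, H, W)

def warp(image, dvf):
    """Warp an image with a DVF using linear interpolation."""
    grid = np.meshgrid(*[np.arange(s) for s in image.shape], indexing="ij")
    coords = [g + d for g, d in zip(grid, dvf)]
    return map_coordinates(image, coords, order=1, mode="nearest")

img = np.random.default_rng(0).random((32, 32))
dvf = random_smooth_dvf(img.shape, rng=1)
warped = warp(img, dvf)
```

Each (image, DVF) pair then yields a synthetic training example whose ground truth transformation is known exactly.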

Such approaches have notable, but also limited ability to enhance the size and diversity of datasets. These limitations motivated the development of more sophisticated ground truth generation. The rest of the approaches described in this section use simulated ground truth data for their applications.

For example, Eppenhof et al. Eppenhof and Pluim (2018a) used a 3D CNN to perform the deformable registration of inhale-exhale 3D lung CT image volumes. A series of multi-scale, random transformations of aligned image pairs eliminate the need for manually annotated ground truth data while also maintaining realistic image appearance. Further, as is the case with other methods that generate ground truth data, the CNN can be trained using relatively few medical images in a supervised capacity.

Unlike the above works, Uzunova et al. Uzunova et al. (2017) generated ground truth data using statistical appearance models (SAMs). They used a CNN to estimate the deformation field for the registration of 2D brain MRs and 2D cardiac MRs, and adapt FlowNet Dosovitskiy et al. (2015) for their application. They demonstrated that training FlowNet using SAM generated ground truth data resulted in superior performance to CNNs trained using either randomly generated ground truth data or ground truth data obtained using the registration method described in Ehrhardt et al. (2015).

Unlike the other methods in this section that use random transformations or manually crafted methods to generate ground truth data, Ito et al. Ito and Ino (2018) used a CNN to learn plausible deformations for ground truth data generation. They evaluated their approach on the 3D brain MR volumes in the ADNI dataset and outperformed the MI based approach proposed in Ikeda et al. (2014).

4.1.2 Discussion and Assessment

Supervised transformation estimation has allowed for real time, robust registration across applications. However, such works are not without their limitations. Firstly, the quality of the registrations using this framework is dependent on the quality of the ground truth registrations. The quality of these labels is, of course, dependent upon the expertise of the practitioner. Furthermore, these labels are fairly difficult to obtain because there are relatively few individuals with the expertise necessary to perform such registrations. Transformations of training data and the generation of synthetic ground truth data can address such limitations. However, it is important to ensure that simulated data is sufficiently similar to clinical data. These challenges motivated the development of partially supervised/unsupervised approaches, which will be discussed next.

4.2 Dual/Weakly Supervised Transformation Estimation

Dual supervision refers to the use of both ground truth data and some metric that quantifies image similarity to train a model. On the other hand, weak supervision refers to using the overlap of segmentations of corresponding anatomical structures to design the loss function. This section will discuss the contributions of such works in Section 4.2.1 and then discuss the overall state of this research direction in Section 4.2.2.

4.2.1 Overview of works

Figure 6: A visualization of deep single step registration where the agent is trained using dual supervision. The loss function is determined using both a metric that quantifies image similarity and ground truth data.

First, this section will discuss methods that use dual supervision and then methods that use weak supervision. Recently, Fan et al. Fan et al. (2018b) used hierarchical, dual-supervised learning to predict the deformation field for 3D brain MR registration. They amend the traditional U-Net architecture Ronneberger et al. (2015) by using “gap-filling” (i.e., inserting convolutional layers after the U-type ends of the architecture) and coarse-to-fine guidance. This approach leveraged both the similarity between the predicted and ground truth transformations and the similarity between the warped and fixed images to train the network. The architecture detailed in this method outperformed the traditional U-Net architecture, and the dual supervision strategy was verified by ablating the image similarity loss term. A visualization of dual supervised transformation estimation is given in Fig. 6.
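
In schematic form, a dual supervised objective of this kind can be written as a weighted sum of a transformation term and an image term (a toy sketch of our own; Fan et al.'s actual hierarchical, coarse-to-fine loss is more elaborate):

```python
import numpy as np

def dual_supervised_loss(pred_field, gt_field, warped, fixed, alpha=0.5):
    """Dual supervision: weighted sum of (i) MSE between predicted and
    ground-truth deformation fields and (ii) MSE between the warped moving
    image and the fixed image. `alpha` balances the two supervision signals."""
    field_term = np.mean((np.asarray(pred_field) - np.asarray(gt_field)) ** 2)
    image_term = np.mean((np.asarray(warped) - np.asarray(fixed)) ** 2)
    return float(alpha * field_term + (1.0 - alpha) * image_term)

# Perfect field prediction but a constant intensity mismatch of 1:
f = np.zeros((2, 8, 8))
loss = dual_supervised_loss(f, f, np.ones((8, 8)), np.zeros((8, 8)), alpha=0.5)
```

Ablating the image term (setting alpha to 1) recovers purely supervised training, which is the comparison Fan et al. use to validate the strategy.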

On the other hand, Yan et al. Yan et al. (2018) used a GAN Goodfellow et al. (2014) framework to perform the rigid registration of 3D MR and TRUS volumes. In this work, the generator was trained to estimate a rigid transformation, while the discriminator was trained to discern between images aligned using the ground truth transformations and images aligned using the predicted transformations. Both the Euclidean distance to the ground truth and an adversarial loss term are used to construct the loss function in this method, which outperformed both MIND based and MI based registration. Note that the adversarial supervision strategy used in this approach is similar to the ones used in a number of unsupervised works that will be described in the next section. A visualization of adversarial transformation estimation is given in Fig. 7.

Figure 7: A visualization of an adversarial image registration framework. Here, the generator is trained using output from the discriminator. The discriminator takes the form of a learned metric here.

Unlike the above methods that used dual supervision, Hu et al. Hu et al. (2018b, c) recently used label similarity to train their network to perform MR-TRUS registration. In their initial work Hu et al. (2018b), they used two neural networks, global-net and local-net, to estimate the global affine transformation with 12 degrees of freedom and the local dense deformation field, respectively. The local-net takes as input the concatenation of the fixed image and the moving image transformed by the global-net. In their later work Hu et al. (2018c), they combine these networks in an end-to-end framework. This method outperformed NMI based and NCC based registration. A visualization of weakly supervised transformation estimation is given in Fig. 8. In another work, Hu et al. Hu et al. (2018a) simultaneously maximized label similarity and minimized an adversarial loss term to predict the deformation for MR-TRUS registration. The adversarial term acts as a regularizer that forces the predicted transformation to produce a realistic image, given proper hyperparameter selection. The performance of this registration framework was inferior to that of their previous framework described above; however, they showed that adversarial regularization is superior to standard bending energy based regularization. Building upon the progress made with respect to both dual and weak supervision, Hering et al. Hering et al. (2018) introduced a loss function based on both label overlap and a similarity metric for cardiac motion tracking via the deformable registration of 2D cine-MR images. Both segmentation overlap and an edge based normalized gradient fields distance were used to construct the loss function. Their method outperformed a multilevel registration approach similar to the one proposed in Rühaak et al. (2013).
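
A weakly supervised label-overlap loss of the kind used in these works can be sketched as follows (our own minimal version; Hu et al. use a multi-scale Dice over several anatomical labels):

```python
import numpy as np

def soft_dice(seg_warped, seg_fixed, eps=1e-6):
    """Soft Dice overlap between a warped moving-image label map and the
    fixed-image label map; 1 - Dice serves as a weakly supervised loss."""
    a, b = np.asarray(seg_warped, float), np.asarray(seg_fixed, float)
    return float((2 * (a * b).sum() + eps) / (a.sum() + b.sum() + eps))

# A 4x4 square label and a copy shifted by 2 columns overlap in half their area.
seg = np.zeros((8, 8))
seg[2:6, 2:6] = 1
shifted = np.roll(seg, 2, axis=1)
dice = soft_dice(seg, shifted)
```

Because the overlap is computed on segmentations rather than intensities, the same loss applies unchanged across modalities, which is what makes weak supervision attractive for MR-TRUS registration.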

Figure 8: A visualization of deep single step registration where the agent is trained using label similarity (i.e., weak supervision). Manually annotated segmentations are used to define the loss function used to train the network.

4.2.2 Discussion and Assessment

Direct transformation estimation marked a major breakthrough for deep learning based image registration, and with full supervision, promising results have been obtained. However, such techniques require a large number of detailed annotated images for training. Partially/weakly supervised transformation estimation methods alleviate the limitations associated with the trustworthiness and expense of ground truth labels, but they still require manually annotated data (e.g., ground truth transformations and/or segmentations). On the other hand, weak supervision allows for similarity quantification in the multimodal case. Further, partial supervision allows for the aggregation of methods that can be used to assess the quality of a predicted registration. As a result, there is growing interest in these research areas.

Ref | Loss Function | Transform | Modality | ROI | Model
Jiang and Shackleford (2018) | SSD | Deformable | CT | Chest | Multi-scale
Ghosal and Ray (2017) | UB SSD | Deformable | MR | Brain | 19-layer
Zhang (2018) | MSD | Deformable | MR | Brain | ICNet
Shu et al. (2018) | MSE | Deformable | SEM | Neurons | 11-layer
Dalca et al. (2018) | MSE | Deformable | MR | Brain | VoxelMorph
Sheikhjafari et al. (2018) | MSE | Deformable | Cine MR | Cardiac | 8-layer FCNet
Kuang and Schmah (2018) | CC | Deformable | MR | Brain | FAIM
Li and Fan (2018) | NCC | Deformable | MR | Brain | 8-layer
Cao et al. (2018) | NCC | Deformable | CT, MR | Pelvis | U-Net
de Vos et al. (2017) | NCC | Deformable | MR | Cardiac | DIRNet
de Vos et al. (2018) | NCC | Deformable | MR | Cardiac | DLIR
Ferrante et al. (2018) | NCC | Deformable | X-ray, MR | Bone, Cardiac | U-Net, STN
Sun and Zhang (2018) | L2 distance + image gradient | Deformable | MR, US | Brain | FCN
Neylon et al. (2017) | Predicted TRE | Deformable | CT | Head/Neck | FCN
Fan et al. (2018a) | BCE | Deformable | MR | Brain | GAN
Mahapatra (2018) | NMI + SSIM + VGG outputs | Deformable | MR, FA/color fundus | Cardiac, Retinal | GAN
Mahapatra et al. (2018) | NMI + SSIM + VGG outputs | Deformable | X-ray | Bone | GAN
Yoo et al. (2017) | MSE of AE outputs | Deformable | ssEM | Neurons | CAE
Wu et al. (2016) | MSE of stacked AE outputs | Deformable | MR | Brain | Stacked AE
Wu et al. (2013) | NCC of ISA outputs | Deformable | MR | Brain | Stacked ISA
Krebs et al. (2018a) | Log likelihood | Deformable | MR | Brain | cVAE
Liu and Leung (2017) | SSD of MIND + PCANet outputs | Deformable | CT, MR | Chest, Brain | FCN, PCANet
Kori et al. (2018) | SSD of VGG outputs | Rigid | MR | Brain | CNN, MLP

Table 3: Unsupervised Transformation Estimation Methods Overview

5 Unsupervised Transformation Estimation

Despite the success of the methods described in the previous sections, the difficulty of acquiring reliable ground truth remains a significant hindrance, which has motivated a number of groups to explore unsupervised approaches. One key innovation that has enabled these works is the spatial transformer network (STN) Jaderberg et al. (2015); several methods use an STN to perform the deformations associated with their registration applications. This section discusses unsupervised methods that utilize image similarity metrics (Section 5.1) and feature representations of image data (Section 5.2) to train their networks. A description of notable works is given in Table 3.
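
The core operation an STN contributes is resampling the moving image at displaced coordinates in a way that is differentiable inside a network. A plain NumPy sketch of that bilinear warp (our own illustration, ignoring the autodiff machinery) is:

```python
import numpy as np

def bilinear_warp(image, dx, dy):
    """Resample `image` at (x + dx, y + dy) with bilinear interpolation: the
    resampling step an STN performs given a predicted displacement field."""
    h, w = image.shape
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = np.clip(xx + dx, 0, w - 1.0)
    y = np.clip(yy + dy, 0, h - 1.0)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bot = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bot

# A constant 1-pixel shift along x reproduces a column shift away from the border.
img = np.arange(16.0).reshape(4, 4)
shifted = bilinear_warp(img, np.full((4, 4), 1.0), np.zeros((4, 4)))
```

Because every step above is composed of arithmetic and interpolation, gradients of a downstream similarity loss can flow back to the displacement field, which is what makes unsupervised training possible.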

5.1 Similarity Metric based Unsupervised Transformation Estimation

5.1.1 Standard Methods

This section begins by discussing approaches that use a common similarity metric with common regularization strategies to define their loss functions. Later in the section, approaches that use more complex similarity metric based strategies are discussed. A visualization of standard similarity metric based transformation estimation is given in Fig. 9.

Figure 9: A visualization of deep single step registration where the network is trained using a metric that quantifies image similarity. Therefore, the approach is unsupervised.

Motivated to overcome the difficulty of obtaining ground truth data, Li et al. Li and Fan (2017, 2018) trained an FCN to perform deformable intersubject registration of 3D brain MR volumes using “self-supervision.” NCC between the warped and fixed images and several common regularization terms (e.g., smoothness constraints) constitute the loss function in this method. Although many manually defined similarity metrics fail in the multimodal case (with the occasional exception of MI), they are often suitable for the unimodal case. The method detailed in this work outperforms ANTs based registration and the deep learning methods proposed by Sokooti et al. Sokooti et al. (2017) (discussed previously) and Yoo et al. Yoo et al. (2017) (discussed in the next section).
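
Global NCC, the similarity term used here and in several of the following works, can be sketched in a few lines (our own version; Li and Fan combine it with smoothness regularizers over the predicted field):

```python
import numpy as np

def ncc(a, b, eps=1e-8):
    """Normalized cross-correlation; maximizing NCC (equivalently, minimizing
    1 - NCC) is a standard unimodal unsupervised registration loss."""
    a = np.asarray(a, float) - np.mean(a)
    b = np.asarray(b, float) - np.mean(b)
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

img = np.random.default_rng(0).random((16, 16))
# Unlike SSD/MSE, NCC is invariant to linear intensity changes:
same = ncc(img, 2.0 * img + 3.0)
```

This intensity invariance is why NCC works well within a modality but, like most manually defined metrics, breaks down when the intensity relationship between modalities is nonlinear.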

Further, de Vos et al. de Vos et al. (2017) used NCC to train an FCN to perform the deformable registration of 4D cardiac cine MR volumes. A DVF is used in this method to deform the moving volume. Their method outperforms Elastix based registration Klein et al. (2010).

In another work, de Vos et al. de Vos et al. (2018) used a multi-stage, multi-scale approach to perform unimodal registration on several datasets. NCC and a bending energy regularization term are used to train the networks that predict an affine transformation and subsequent coarse-to-fine deformations using a B-Spline transformation model. In addition to validating their multi-stage approach, they show that their method outperforms conventional Elastix based registration both with and without bending energy regularization.

The unsupervised deformable registration framework used by Ghosal et al. Ghosal and Ray (2017) minimizes the upper bound of the SSD (UB SSD) between the warped and fixed 3D brain MR images. The design of their network was inspired by the SKIP architecture Long et al. (2015). This method outperforms log-demons based registration.

Shu et al. Shu et al. (2018) used a coarse-to-fine, unsupervised deformable registration approach to register images of neurons acquired using a scanning electron microscope (SEM). The mean squared error (MSE) between the warped and fixed volumes is used as the loss function. Their approach is competitive with, and faster than, the SIFT Flow framework Liu et al. (2011).

Sheikhjafari et al. Sheikhjafari et al. (2018) used learned latent representations to perform the deformable registration of 2D cardiac cine MR volumes; deformation fields are obtained by decoding a learned latent embedding. This latent representation is used as the input to a network composed of 8 fully connected layers to obtain the transformation. The sum of absolute errors (SAE) is used as the loss function. This method outperforms a moving mesh correspondence based method described in Punithakumar et al. (2017).

Stergios et al. Stergios et al. (2018) used a CNN to both linearly and locally register inhale-exhale pairs of lung MR volumes; the affine transformation and the deformation are thus jointly estimated. The loss function is composed of an MSE term and regularization terms. Their method outperforms several state-of-the-art methods that do not utilize ground truth data, including Demons Lorenzi et al. (2013), SyN Avants et al. (2008), and a deep learning based method that uses an MSE loss term. Further, the inclusion of the regularization terms is validated by an ablation study.

The successes of deep similarity metric based unsupervised registration motivated Neylon et al. Neylon et al. (2017) to use a neural network to learn the relationship between image similarity metric values and TRE when registering CT image volumes, in order to robustly assess registration performance. The network achieved subvoxel accuracy in 95% of cases. Similarly inspired, Balakrishnan et al. Balakrishnan et al. (2018a, b) proposed a general framework for unsupervised image registration that is, in theory, applicable to both the unimodal and multimodal cases. The networks are trained using a selected, manually defined image similarity metric (e.g., NCC, NMI).

In a follow-up paper, Dalca et al. Dalca et al. (2018) cast deformation prediction as variational inference. Diffeomorphic integration is combined with a transformer layer: the network predicts a velocity field, and squaring and rescaling layers integrate it to obtain the predicted deformation. MSE is used as the similarity metric and, along with a regularization term, defines the loss function. Their method outperforms ANTs based registration Avants et al. (2011) and the deep learning based method described in Balakrishnan et al. (2018a).
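
The squaring and rescaling integration can be sketched as follows (a NumPy version of our own, assuming SciPy is available; the published layers perform the same recurrence on the GPU inside the network):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def integrate_velocity(vel, n_steps=6):
    """Scaling and squaring: integrate a stationary velocity field of shape
    (2, H, W) into a displacement field. Start from vel / 2**n_steps, then
    compose the map with itself n_steps times: u(x) <- u(x) + u(x + u(x))."""
    disp = vel / (2.0 ** n_steps)
    grid = np.meshgrid(*[np.arange(s) for s in vel.shape[1:]], indexing="ij")
    for _ in range(n_steps):
        coords = [g + d for g, d in zip(grid, disp)]
        # Sample the current displacement at the displaced locations and add it.
        disp = disp + np.stack(
            [map_coordinates(d, coords, order=1, mode="nearest") for d in disp])
    return disp

# A constant velocity field integrates to a pure translation equal to itself.
vel = np.ones((2, 16, 16)) * 0.5
disp = integrate_velocity(vel)
```

Composing small, nearly invertible steps is what keeps the final deformation diffeomorphic, which direct displacement prediction does not guarantee.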

Shortly after, Kuang et al. Kuang and Schmah (2018) used a CNN and STN inspired framework to perform the deformable registration of T1-weighted brain MR volumes. The loss function is composed of an NCC term and a regularization term. This method uses Inception modules, a low capacity model, and residual connections instead of skip connections. They compared their method with VoxelMorph (the method proposed by Balakrishnan et al., described above) Balakrishnan et al. (2018b) and uTIlzReg GeoShoot Vialard et al. (2012) on the LPBA40 and Mindboggle 101 datasets and demonstrated superior performance with respect to both.

Building upon the progress made by the previously described metric-based approaches, Ferrante et al. Ferrante et al. (2018) used a transfer learning based approach to perform unimodal registration of both X-ray and cardiac cine images. The network is trained on data from a source domain using NCC as the primary loss function term and tested in a target domain. They used a U-net like architecture Ronneberger et al. (2015) and an STN Jaderberg et al. (2015) to perform feature extraction and transformation estimation, respectively. They demonstrated that transfer learning with either domain as the source produces effective results. This method outperformed the Elastix registration technique Klein et al. (2010).

Although applying similarity metric based approaches to the multimodal case is difficult, Sun et al. Sun and Zhang (2018) proposed an unsupervised method for 3D MR/US brain registration that uses a 3D CNN that consists of a feature extractor and a deformation field generator. This network is trained using a similarity metric that incorporates both pixel intensity and gradient information. Further, both image intensity and gradient information are used as inputs into the CNN.

5.1.2 Extensions

Cao et al. Cao et al. (2018) also applied similarity metric based training to the multimodal case. Specifically, they used intra-modality image similarity to supervise the multimodal deformable registration of 3D pelvic CT/MR volumes. The NCC between the moving image warped using the ground truth transformation and the moving image warped using the predicted transformation is used as the loss function. This work utilizes “dual” supervision in the sense that the intra-modality supervision previously described is applied to both the CT and the MR images; this is not to be confused with the dual supervision strategies described earlier.

Inspired by the limiting nature of the asymmetric transformations that typical unsupervised methods estimate, Zhang et al. Zhang (2018) used their Inverse-Consistent Deep Network (ICNet) to learn symmetric diffeomorphic transformations that align each of the brain MR volumes into the same space. Unlike other works that use standard regularization strategies, this work introduces an inverse-consistent regularization term and an anti-folding regularization term to ensure that a highly weighted smoothness constraint does not result in folding. Finally, the MSD between the two images allows this network to be trained in an unsupervised manner. This method outperformed SyN based registration Avants et al. (2008), Demons based registration Lorenzi et al. (2013), and several deep learning based approaches.
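
Folding can be detected through the Jacobian determinant of the predicted map, which anti-folding regularizers penalize when it goes non-positive (a 2D sketch of our own; ICNet's published term is formulated via the gradient of the deformation but targets the same failure mode):

```python
import numpy as np

def jacobian_det_2d(disp):
    """Jacobian determinant of x -> x + u(x) for u of shape (2, H, W), with
    u[0] the row (y) displacement and u[1] the column (x) displacement.
    Non-positive values indicate that the deformation folds over itself."""
    duy_dy, duy_dx = np.gradient(disp[0])
    dux_dy, dux_dx = np.gradient(disp[1])
    return (1 + duy_dy) * (1 + dux_dx) - duy_dx * dux_dy

def folding_penalty(disp):
    """Penalize the magnitude of the determinant wherever it is non-positive."""
    det = jacobian_det_2d(disp)
    return float(np.maximum(0.0, -det).sum())

identity_disp = np.zeros((2, 16, 16))                 # det == 1 everywhere
yy, xx = np.meshgrid(np.arange(16.), np.arange(16.), indexing="ij")
folding = np.stack([np.zeros_like(xx), -2.0 * xx])    # x -> -x: det == -1
```

Adding such a penalty to the loss lets the smoothness weight stay high without the optimizer hiding folds inside otherwise smooth fields.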

The next three approaches described in this section used a GAN for their applications. Unlike the GAN based approaches described previously, these methods use neither ground truth data nor manually crafted segmentations. Mahapatra et al. Mahapatra (2018) used a GAN to implicitly learn the density function that represents the range of plausible deformations of cardiac cine images and multimodal retinal images (retinal color fundus images and fluorescein angiography (FA) images). In addition to NMI, the structural similarity index measure (SSIM), and a feature perceptual loss term (the SSD between VGG outputs), the loss function comprises conditional and cyclic constraints, which are based on recent advances in adversarial frameworks. Their approach outperforms Elastix based registration and the method proposed by de Vos et al. de Vos et al. (2017).

Further, Fan et al. Fan et al. (2018a) used a GAN to perform unsupervised deformable image registration of 3D brain MR volumes. Unlike most other unsupervised works that use a manually crafted similarity metric to determine the loss function and unlike the previous approach that used a GAN to ensure that the predicted deformation is realistic, this approach uses a discriminator to assess the quality of the alignment. This approach outperforms Diffeomorphic Demons and SyN registration on every dataset except for MGH10. Further, the use of the discriminator for supervision of the registration network is superior to the use of ground truth data, SSD, and CC on all datasets.

Different from all of the previously described works (not just the GAN based ones), Mahapatra et al. Mahapatra et al. (2018) proposed simultaneous segmentation and registration of chest X-rays using a GAN framework. The network takes 3 inputs: the reference image, the floating image, and the segmentation mask of the reference image; it outputs the segmentation mask of the transformed image and the deformation field. Three discriminators are used to assess the quality of the generated outputs (deformation field, warped image, and segmentation) using cycle consistency and a Dice metric. The generator is additionally trained using NMI, SSIM, and a feature perceptual loss term.

Finally, instead of predicting a deformation field given a fixed parameterization as the other methods in this section do, Jiang et al. Jiang and Shackleford (2018) used a CNN to learn an optimal parameterization of an image deformation using a multi-grid B-Spline method and L1-norm regularization. They use this approach to parameterize the deformable registration of 4D CT thoracic image volumes. Here, SSD is used as the similarity metric and L-BFGS-B is used as the optimizer. The convergence rate using the parameterized deformation model obtained using the proposed method is faster than the one obtained using a traditional L1-norm regularized multi-grid parameterization.

5.1.3 Discussion and Assessment

Image similarity based unsupervised registration has received a lot of attention from the research community recently because it bypasses the need for expert labels of any kind, which means that model performance does not depend on the expertise of the practitioner. Further, extensions of the original similarity metric based methods that introduce more sophisticated similarity metrics (e.g., the discriminator of a GAN) and/or regularization strategies have yielded promising results. However, it is still difficult to quantify image similarity in multimodal registration applications, so the scope of unsupervised, image similarity based works is largely confined to the unimodal case. Given that multimodal registration is often needed in clinical applications, we expect to see more papers in the near future that tackle this challenging problem.

5.2 Feature based Unsupervised Transformation Estimation

In this section, methods that use learned feature representations to train neural networks are surveyed. Like the methods surveyed in the previous section, these methods do not require ground truth data. Approaches that create unimodal registration pipelines are presented first, followed by an approach that tackles multimodal registration. A visualization of feature based transformation estimation is given in Fig. 10.

5.2.1 Unimodal Registration

Yoo et al. Yoo et al. (2017) used an STN to register serial-section electron microscopy (ssEM) images. An autoencoder is trained to reconstruct fixed images, and the L2 distance between the reconstructed fixed images and the corresponding warped moving images, along with several regularization terms, constitutes the loss function. This approach outperforms the bUnwarpJ registration technique Arganda-Carreras et al. (2006) and the Elastic registration technique Saalfeld et al. (2012).

In the same year, Liu et al. Liu and Leung (2017) proposed a tensor based MIND method using a principal component analysis based network (PCANet) Chan et al. (2015) for both unimodal and multimodal registration. Both inhale-exhale pairs of thoracic CT volumes and multimodal pairs of brain MR images were used for experimental validation. MI based, residual complexity (RC) based Myronenko and Song (2010), and the original MIND based Heinrich et al. (2012) registration techniques were all outperformed by the proposed method.

Figure 10: A visualization of feature based unsupervised image registration. Here, a feature extractor is used to map inputted images to a feature space to facilitate the prediction of transformation parameters.

Krebs et al. Krebs et al. (2018a, b) performed the registration of 2D brain and cardiac MRs and bypassed the need for spatial regularization using a stochastic latent space learning approach. A conditional variational autoencoder Doersch (2016) is used to ensure that the parameter space follows a prescribed probability distribution. The loss function is defined by the negative log likelihood of the fixed image given the latent representation and the warped volume, together with the KL divergence of the latent distribution from a prior distribution. This method outperforms the Demons technique Lorenzi et al. (2013) and the deep learning method described in Balakrishnan et al. (2018a).

5.2.2 Multimodal Registration

Unlike all of the other methods described in this section, Kori et al. Kori et al. (2018) performed feature extraction and affine transformation parameter regression for the multimodal registration of 2D T1 and T2 weighted brain MRs in an unsupervised capacity using pre-trained networks. The images are binarized, and the Dice score between the moving and fixed images is then used as the cost function. As the appearance difference between these two modalities is not significant, the use of pre-trained models can be reasonably effective.

5.2.3 Discussion and Assessment

Performing multimodal image registration in an unsupervised capacity is significantly more difficult than performing unimodal registration because manually crafted similarity metrics struggle to quantify the similarity between images of different modalities, and the unsupervised techniques described above generally struggle to establish voxel-to-voxel correspondence across modalities. The use of unsupervised learning to obtain feature representations from which an optimal transformation can be determined has recently generated significant interest from the research community. Along with the previously discussed image similarity based methods, we expect feature based unsupervised registration to continue to attract attention. Further, extension to the multimodal case (especially for applications with significant appearance differences between images) is likely to be a prominent research focus in the next few years.

6 Research Trends and Future Directions

In this section, we summarize the current research trends and future directions of deep learning in medical image registration. As we can see from Fig. 2, some research trends have emerged. First, deep learning based medical image registration seems to be following the observed trend for the general application of deep learning to medical image analysis. Second, unsupervised transformation estimation methods have been garnering more attention recently from the research community. Further, deep learning based methods consistently outperform traditional optimization based techniques Nazib et al. (2018). Based on the observed research trends, we speculate that the following research directions will receive more attention in the research community.

6.1 Deep Adversarial Image Registration

We further speculate that GANs will be used more frequently in deep learning based image registration in the next few years. As described above, GANs can serve several different purposes in deep learning based medical image registration: ensuring that predicted transformations are realistic, using a discriminator as a learned similarity metric, and using a GAN to perform image translation to transform a multimodal registration problem into a unimodal registration problem.

Unconstrained deformation field prediction can result in warped moving images with unrealistic organ appearances. A common approach is to add the L2 norm of the predicted deformation field to the loss function. However, Hu et al. Hu et al. (2018a) explored the use of a GAN like framework to produce realistic deformations. Constraining the deformation prediction using a discriminator resulted in superior performance relative to the use of L2 norm regularization.

Further, GANs have been used in several works to obtain a learned similarity metric. Several recent works Fan et al. (2018a); Yan et al. (2018) use a discriminator to discern between aligned and misaligned image pairs. This is particularly useful in the multimodal registration case, where manually crafted similarity metrics famously have little success. Because this allows the generator to be trained without ground truth transformations, further research into using discriminators as similarity metrics will likely enable unsupervised multimodal registration.
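
A minimal sketch of the idea follows, with a single linear unit standing in for the deep discriminator CNNs of Fan et al. (2018a) and Yan et al. (2018); the weights and bias here are hypothetical:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator_score(fixed, warped, w, b=1.0):
    """Stand-in for a learned discriminator: one linear unit over the
    voxel-wise absolute difference, squashed into the probability that
    the pair is aligned. Real models are deep CNNs trained on aligned
    vs. misaligned pairs; w and b here are hypothetical."""
    return sigmoid(w @ np.abs(fixed - warped).ravel() + b)

def generator_loss(fixed, warped, w):
    """Registration-network loss when the discriminator acts as the
    similarity metric: minimize -log D(fixed, warped)."""
    p = discriminator_score(fixed, warped, w)
    return float(-np.log(np.clip(p, 1e-7, 1.0)))

rng = np.random.default_rng(0)
img = rng.random((8, 8))
w = -np.ones(img.size)  # larger differences -> lower "aligned" probability
assert generator_loss(img, img, w) < generator_loss(img, 1.0 - img, w)
```

Note that no ground truth transformation appears anywhere in the loss, which is what makes the approach attractive for the multimodal case.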

Lastly, GANs can be used to map medical images in a source domain (e.g., MR) to a target domain (e.g., CT) Choi et al. (2018); Isola et al. (2017); Liu et al. (2017); Yi et al. (2017), regardless of whether paired training data is available Zhu et al. (2017). This is advantageous because many unimodal unsupervised registration methods define their loss functions using similarity metrics that often fail in the multimodal case. If image translation could be performed as a pre-processing step, these commonly used similarity metrics could then define the loss function.
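
For instance, once a separately trained translator (hypothetical here) has mapped an MR image into a pseudo-CT, a standard mono-modal metric such as normalized cross-correlation can drive the registration loss; a numpy sketch:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation: a standard mono-modal similarity
    metric that becomes applicable once image translation has mapped
    both inputs into a common modality. Returns a value in [-1, 1]."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float(np.mean(a * b))

# Hypothetical pipeline (translator is an assumed, pre-trained GAN):
#   pseudo_ct = translator(moving_mr)   # multimodal -> unimodal
#   loss = -ncc(warp(pseudo_ct), fixed_ct)
rng = np.random.default_rng(0)
img = rng.random((32, 32))
assert abs(ncc(img, img) - 1.0) < 1e-4  # a perfectly aligned pair scores ~1
```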

6.2 Reinforcement Learning based Registration

We also project that reinforcement learning will be used more commonly for medical image registration in the next few years because it is intuitive and mimics the manner in which physicians perform registration. It should be noted that there are unique challenges associated with reinforcement learning based registration, including the high dimensionality of the action space in the deformable case. However, we believe that such limitations are surmountable, as there is already one proposed method that applies reinforcement learning based registration to a deformable transformation model Krebs et al. (2017).
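
To illustrate the sequential-decision framing (a toy greedy agent, not the trained policies of Liao et al. (2017) or Krebs et al. (2017)), each action perturbs one transform parameter by a fixed step and the agent repeatedly takes whichever action most improves a similarity score:

```python
import numpy as np

def greedy_registration_agent(score_fn, init_params, step=1.0, iters=50):
    """Toy agent for rigid registration framed as sequential decision
    making: the discrete action space is a +/- step on each transform
    parameter, and the agent greedily picks the action that most
    improves the similarity score. A trained RL agent would instead
    learn a policy mapping image observations to these actions."""
    params = np.asarray(init_params, dtype=float)
    for _ in range(iters):
        best, best_score = None, score_fn(params)
        for i in range(len(params)):
            for delta in (+step, -step):
                cand = params.copy()
                cand[i] += delta
                s = score_fn(cand)
                if s > best_score:
                    best, best_score = cand, s
        if best is None:   # no action improves the score: terminate
            break
        params = best
    return params

# Toy score peaking at translation (3, -2); the agent should recover it.
target = np.array([3.0, -2.0])
score = lambda p: -np.sum((p - target) ** 2)
assert np.allclose(greedy_registration_agent(score, np.zeros(2)), target)
```

The dimensionality problem mentioned above is visible even here: a dense deformation field would make the inner action loop intractably large, which is why Krebs et al. act on a low-dimensional parameterization of the deformation.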

6.3 Raw Imaging Domain Registration

This article has focused on surveying methods that perform registration using reconstructed images. However, we speculate that it is possible to incorporate reconstruction into an end-to-end deep learning based registration pipeline. In 2016, Wang (2016) postulated that deep neural networks could be used to perform image reconstruction. Several works Yao et al. (2018); Zhu et al. (2018); Rivenson et al. (2018) have since demonstrated the ability of deep learning to map data points from the raw acquisition domain to the reconstructed image domain. It is therefore reasonable to expect that registration pipelines that take raw data as input and output registered, reconstructed images will be developed within the next few years.
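
As a sketch of where reconstruction slots into such a pipeline, classical inverse-FFT MR reconstruction can stand in for the learned raw-to-image mapping that these works would replace it with:

```python
import numpy as np

def reconstruct_mr(kspace):
    """Classical MR reconstruction: inverse FFT of fully sampled k-space.
    In the envisioned end-to-end pipeline, a learned network would replace
    this step and feed its output directly into a registration model, so
    that gradients can flow from the registration loss back into
    reconstruction."""
    return np.abs(np.fft.ifft2(np.fft.ifftshift(kspace)))

# Round trip: simulate k-space from a known image, then reconstruct it.
rng = np.random.default_rng(0)
image = rng.random((16, 16))
kspace = np.fft.fftshift(np.fft.fft2(image))
assert np.allclose(reconstruct_mr(kspace), image)
```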

7 Conclusion

In this article, we have examined the recent works that use deep learning to perform medical image registration. As each application has its own unique challenges, deep learning based frameworks must be carefully designed for each. Most deep learning based medical image registration applications nonetheless share similar challenges (e.g., the lack of large databases and the difficulty of robustly labeling medical images). Recent successes have demonstrated the impact of applying deep learning to medical image registration, a trend that can be observed across medical imaging applications. Many exciting future works are sure to build on the recent progress outlined in this paper.


  • Abadi et al. (2016) Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al. (2016). Tensorflow: a system for large-scale machine learning. In OSDI, volume 16, pages 265–283.
  • Alom et al. (2018) Alom, M. Z., Taha, T. M., Yakopcic, C., Westberg, S., Hasan, M., Van Esesn, B. C., Awwal, A. A. S., and Asari, V. K. (2018). The history began from alexnet: A comprehensive survey on deep learning approaches. arXiv preprint arXiv:1803.01164.
  • Ambinder (2005) Ambinder, E. P. (2005). A history of the shift toward full computerization of medicine. Journal of oncology practice, 1(2):54–56.
  • Arganda-Carreras et al. (2006) Arganda-Carreras, I., Sorzano, C. O., Marabini, R., Carazo, J. M., Ortiz-de Solorzano, C., and Kybic, J. (2006). Consistent and elastic registration of histological sections using vector-spline regularization. In International Workshop on Computer Vision Approaches to Medical Image Analysis, pages 85–95. Springer.
  • Avants et al. (2008) Avants, B. B., Epstein, C. L., Grossman, M., and Gee, J. C. (2008). Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Medical image analysis, 12(1):26–41.
  • Avants et al. (2011) Avants, B. B., Tustison, N. J., Song, G., Cook, P. A., Klein, A., and Gee, J. C. (2011). A reproducible evaluation of ants similarity metric performance in brain image registration. Neuroimage, 54(3):2033–2044.
  • Balakrishnan et al. (2018a) Balakrishnan, G., Zhao, A., Sabuncu, M. R., Guttag, J., and Dalca, A. V. (2018a). An unsupervised learning model for deformable medical image registration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9252–9260.
  • Balakrishnan et al. (2018b) Balakrishnan, G., Zhao, A., Sabuncu, M. R., Guttag, J., and Dalca, A. V. (2018b). Voxelmorph: A learning framework for deformable medical image registration. arXiv preprint arXiv:1809.05231.
  • Bellmann (1957) Bellmann, R. (1957). Dynamic Programming. Princeton University Press, Princeton, NJ.
  • Blendowski and Heinrich (2018) Blendowski, M. and Heinrich, M. P. (2018). Combining mrf-based deformable registration and deep binary 3d-cnn descriptors for large lung motion estimation in copd patients. International journal of computer assisted radiology and surgery, pages 1–10.
  • Cao et al. (2015) Cao, T., Singh, N., Jojic, V., and Niethammer, M. (2015). Semi-coupled dictionary learning for deformation prediction. In Biomedical Imaging (ISBI), 2015 IEEE 12th International Symposium on, pages 691–694. IEEE.
  • Cao et al. (2018) Cao, X., Yang, J., Wang, L., Xue, Z., Wang, Q., and Shen, D. (2018). Deep learning based inter-modality image registration supervised by intra-modality similarity. arXiv preprint arXiv:1804.10735.
  • Cao et al. (2017) Cao, X., Yang, J., Zhang, J., Nie, D., Kim, M., Wang, Q., and Shen, D. (2017). Deformable image registration based on similarity-steered cnn regression. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 300–308. Springer.
  • Chan et al. (2015) Chan, T.-H., Jia, K., Gao, S., Lu, J., Zeng, Z., and Ma, Y. (2015). Pcanet: A simple deep learning baseline for image classification? IEEE Transactions on Image Processing, 24(12):5017–5032.
  • Chaslot et al. (2008) Chaslot, G., Bakkes, S., Szita, I., and Spronck, P. (2008). Monte-carlo tree search: A new framework for game ai. In AIIDE.
  • Chee and Wu (2018) Chee, E. and Wu, J. (2018). Airnet: Self-supervised affine registration for 3d medical images using neural networks. arXiv preprint arXiv:1810.02583.
  • Chen et al. (2015) Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., Xiao, T., Xu, B., Zhang, C., and Zhang, Z. (2015). Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274.
  • Cheng et al. (2016) Cheng, X., Zhang, L., and Zheng, Y. (2016). Deep similarity learning for multimodal medical images. In International conference on medical image computing and computer-assisted intervention.
  • Cheng et al. (2018) Cheng, X., Zhang, L., and Zheng, Y. (2018). Deep similarity learning for multimodal medical images. Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, 6(3):248–252.
  • Choi et al. (2018) Choi, Y., Choi, M., Kim, M., Ha, J.-W., Kim, S., and Choo, J. (2018). Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8789–8797.
  • Chollet (2017) Chollet, F. (2017). Xception: Deep learning with depthwise separable convolutions. arXiv preprint arXiv:1610.02357.
  • Chollet et al. (2015) Chollet, F. et al. (2015). Keras.
  • Dalca et al. (2018) Dalca, A. V., Balakrishnan, G., Guttag, J., and Sabuncu, M. R. (2018). Unsupervised learning for fast probabilistic diffeomorphic registration. arXiv preprint arXiv:1805.04605.
  • De Silva et al. (2016) De Silva, T., Uneri, A., Ketcha, M., Reaungamornrat, S., Kleinszig, G., Vogt, S., Aygun, N., Lo, S., Wolinsky, J., and Siewerdsen, J. (2016). 3d–2d image registration for target localization in spine surgery: investigation of similarity metrics providing robustness to content mismatch. Physics in Medicine & Biology, 61(8):3009.
  • de Vos et al. (2018) de Vos, B. D., Berendsen, F. F., Viergever, M. A., Sokooti, H., Staring, M., and Išgum, I. (2018). A deep learning framework for unsupervised affine and deformable image registration. Medical Image Analysis.
  • de Vos et al. (2017) de Vos, B. D., Berendsen, F. F., Viergever, M. A., Staring, M., and Išgum, I. (2017). End-to-end unsupervised deformable image registration with a convolutional neural network. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pages 204–212. Springer.
  • Diuk et al. (2008) Diuk, C., Cohen, A., and Littman, M. L. (2008). An object-oriented representation for efficient reinforcement learning. In Proceedings of the 25th international conference on Machine learning, pages 240–247. ACM.
  • Doersch (2016) Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
  • Dosovitskiy et al. (2015) Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., Van Der Smagt, P., Cremers, D., and Brox, T. (2015). Flownet: Learning optical flow with convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 2758–2766.
  • Ehrhardt et al. (2015) Ehrhardt, J., Schmidt-Richberg, A., Werner, R., and Handels, H. (2015). Variational registration. In Bildverarbeitung für die Medizin 2015, pages 209–214. Springer.
  • Eppenhof and Pluim (2018a) Eppenhof, K. A. and Pluim, J. P. (2018a). Pulmonary ct registration through supervised learning with convolutional neural networks. IEEE transactions on medical imaging.
  • Eppenhof and Pluim (2018b) Eppenhof, K. A. J. and Pluim, J. P. (2018b). Error estimation of deformable image registration of pulmonary ct scans using convolutional neural networks. Journal of Medical Imaging, 5(2):024003.
  • Fan et al. (2018a) Fan, J., Cao, X., Xue, Z., Yap, P.-T., and Shen, D. (2018a). Adversarial similarity network for evaluating image alignment in deep learning based registration. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 739–746. Springer.
  • Fan et al. (2018b) Fan, J., Cao, X., Yap, P.-T., and Shen, D. (2018b). Birnet: Brain image registration using dual-supervised fully convolutional networks. arXiv preprint arXiv:1802.04692.
  • Ferrante et al. (2018) Ferrante, E., Oktay, O., Glocker, B., and Milone, D. H. (2018). On the adaptability of unsupervised cnn-based deformable image registration to unseen image domains. In International Workshop on Machine Learning in Medical Imaging, pages 294–302. Springer.
  • Ghosal and Ray (2017) Ghosal, S. and Ray, N. (2017). Deep deformable registration: Enhancing accuracy by fully convolutional neural net. Pattern Recognition Letters, 94:81–86.
  • Goodfellow (2016) Goodfellow, I. (2016). Nips 2016 tutorial: Generative adversarial networks. arXiv preprint arXiv:1701.00160.
  • Goodfellow et al. (2016) Goodfellow, I., Bengio, Y., Courville, A., and Bengio, Y. (2016). Deep learning, volume 1. MIT press Cambridge.
  • Goodfellow et al. (2014) Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680.
  • Grace et al. (2017) Grace, K., Salvatier, J., Dafoe, A., Zhang, B., and Evans, O. (2017). When will ai exceed human performance? evidence from ai experts. arXiv preprint arXiv:1705.08807.
  • Guo et al. (2014) Guo, X., Singh, S., Lee, H., Lewis, R. L., and Wang, X. (2014). Deep learning for real-time atari game play using offline monte-carlo tree search planning. In Advances in neural information processing systems, pages 3338–3346.
  • Haskins et al. (2019) Haskins, G., Kruecker, J., Kruger, U., Xu, S., Pinto, P. A., Wood, B. J., and Yan, P. (2019). Learning deep similarity metric for 3d mr-trus image registration. International Journal of Computer Assisted Radiology and Surgery, 14:417–425.
  • He et al. (2016) He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778.
  • Heinrich et al. (2012) Heinrich, M. P., Jenkinson, M., Bhushan, M., Matin, T., Gleeson, F. V., Brady, M., and Schnabel, J. A. (2012). Mind: Modality independent neighbourhood descriptor for multi-modal deformable registration. Medical image analysis, 16(7):1423–1435.
  • Heinrich et al. (2013) Heinrich, M. P., Jenkinson, M., Papież, B. W., Brady, M., and Schnabel, J. A. (2013). Towards realtime multimodal fusion for image-guided interventions using self-similarities. In International conference on medical image computing and computer-assisted intervention, pages 187–194. Springer.
  • Hering et al. (2018) Hering, A., Kuckertz, S., Heldmann, S., and Heinrich, M. (2018). Enhancing label-driven deep deformable image registration with local distance metrics for state-of-the-art cardiac motion tracking. arXiv preprint arXiv:1812.01859.
  • Hill et al. (2001) Hill, D. L., Batchelor, P. G., Holden, M., and Hawkes, D. J. (2001). Medical image registration. Physics in medicine and biology, 46(3):R1–R45.
  • Hochreiter and Schmidhuber (1997) Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8):1735–1780.
  • Hosny et al. (2018) Hosny, A., Parmar, C., Quackenbush, J., Schwartz, L. H., and Aerts, H. J. (2018). Artificial intelligence in radiology. Nature Reviews Cancer, page 1.
  • Hu et al. (2018a) Hu, Y., Gibson, E., Ghavami, N., Bonmati, E., Moore, C. M., Emberton, M., Vercauteren, T., Noble, J. A., and Barratt, D. C. (2018a). Adversarial deformation regularization for training image registration neural networks. arXiv preprint arXiv:1805.10665.
  • Hu et al. (2018b) Hu, Y., Modat, M., Gibson, E., Ghavami, N., Bonmati, E., Moore, C. M., Emberton, M., Noble, J. A., Barratt, D. C., and Vercauteren, T. (2018b). Label-driven weakly-supervised learning for multimodal deformarle image registration. In Biomedical Imaging (ISBI 2018), 2018 IEEE 15th International Symposium on, pages 1070–1074. IEEE.
  • Hu et al. (2018c) Hu, Y., Modat, M., Gibson, E., Li, W., Ghavami, N., Bonmati, E., Wang, G., Bandula, S., Moore, C. M., Emberton, M., et al. (2018c). Weakly-supervised convolutional neural networks for multimodal image registration. Medical image analysis, 49:1–13.
  • Ikeda et al. (2014) Ikeda, K., Ino, F., and Hagihara, K. (2014). Efficient acceleration of mutual information computation for nonrigid registration using cuda. IEEE J. Biomedical and Health Informatics, 18(3):956–968.
  • Ioffe and Szegedy (2015) Ioffe, S. and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167.
  • Isola et al. (2017) Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. arXiv preprint.
  • Ito and Ino (2018) Ito, M. and Ino, F. (2018). An automated method for generating training sets for deep learning based image registration. In The 11th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 2: BIOIMAGING, pages 140–147. INSTICC, SciTePress.
  • Jaderberg et al. (2015) Jaderberg, M., Simonyan, K., Zisserman, A., et al. (2015). Spatial transformer networks. In Advances in neural information processing systems, pages 2017–2025.
  • Jia et al. (2014) Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., and Darrell, T. (2014). Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM international conference on Multimedia, pages 675–678. ACM.
  • Jiang and Shackleford (2018) Jiang, P. and Shackleford, J. A. (2018). Cnn driven sparse multi-level b-spline image registration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9281–9289.
  • Kaelbling et al. (1996) Kaelbling, L. P., Littman, M. L., and Moore, A. W. (1996). Reinforcement learning: A survey. Journal of artificial intelligence research, 4:237–285.
  • Kazeminia et al. (2018) Kazeminia, S., Baur, C., Kuijper, A., van Ginneken, B., Navab, N., Albarqouni, S., and Mukhopadhyay, A. (2018). Gans for medical image analysis. arXiv preprint arXiv:1809.06222.
  • Klein et al. (2010) Klein, S., Staring, M., Murphy, K., Viergever, M. A., and Pluim, J. P. (2010). Elastix: a toolbox for intensity-based medical image registration. IEEE transactions on medical imaging, 29(1):196–205.
  • Kori et al. (2018) Kori, A., Kumari, K., and Krishnamurthi, G. (2018). Zero shot learning for multi-modal real time image registration.
  • Krebs et al. (2017) Krebs, J., Mansi, T., Delingette, H., Zhang, L., Ghesu, F. C., Miao, S., Maier, A. K., Ayache, N., Liao, R., and Kamen, A. (2017). Robust non-rigid registration through agent-based action learning. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 344–352. Springer.
  • Krebs et al. (2018a) Krebs, J., Mansi, T., Mailhé, B., Ayache, N., and Delingette, H. (2018a). Learning structured deformations using diffeomorphic registration. arXiv preprint arXiv:1804.07172.
  • Krebs et al. (2018b) Krebs, J., Mansi, T., Mailhé, B., Ayache, N., and Delingette, H. (2018b). Unsupervised probabilistic deformation modeling for robust diffeomorphic registration. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pages 101–109. Springer.
  • Krizhevsky et al. (2012) Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105.
  • Kuang and Schmah (2018) Kuang, D. and Schmah, T. (2018). Faim–a convnet method for unsupervised 3d medical image registration. arXiv preprint arXiv:1811.09243.
  • LeCun et al. (2015) LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. nature, 521(7553):436.
  • Lee et al. (2017) Lee, J.-G., Jun, S., Cho, Y.-W., Lee, H., Kim, G. B., Seo, J. B., and Kim, N. (2017). Deep learning in medical imaging: general overview. Korean journal of radiology, 18(4):570–584.
  • Li and Fan (2017) Li, H. and Fan, Y. (2017). Non-rigid image registration using fully convolutional networks with deep self-supervision. arXiv preprint arXiv:1709.00799.
  • Li and Fan (2018) Li, H. and Fan, Y. (2018). Non-rigid image registration using self-supervised fully convolutional networks without training data. In Biomedical Imaging (ISBI 2018), 2018 IEEE 15th International Symposium on, pages 1075–1078. IEEE.
  • Liao et al. (2017) Liao, R., Miao, S., de Tournemire, P., Grbic, S., Kamen, A., Mansi, T., and Comaniciu, D. (2017). An artificial agent for robust image registration. In AAAI, pages 4168–4175.
  • Litjens et al. (2017) Litjens, G., Kooi, T., Bejnordi, B. E., Setio, A. A. A., Ciompi, F., Ghafoorian, M., van der Laak, J. A., Van Ginneken, B., and Sánchez, C. I. (2017). A survey on deep learning in medical image analysis. Medical image analysis, 42:60–88.
  • Littman (2001) Littman, M. L. (2001). Value-function reinforcement learning in markov games. Cognitive Systems Research, 2(1):55–66.
  • Liu et al. (2011) Liu, C., Yuen, J., and Torralba, A. (2011). Sift flow: Dense correspondence across scenes and its applications. IEEE transactions on pattern analysis and machine intelligence, 33(5):978–994.
  • Liu et al. (2017) Liu, M.-Y., Breuel, T., and Kautz, J. (2017). Unsupervised image-to-image translation networks. In Advances in Neural Information Processing Systems, pages 700–708.
  • Liu and Leung (2017) Liu, Q. and Leung, H. (2017). Tensor-based descriptor for image registration via unsupervised network. In Information Fusion (Fusion), 2017 20th International Conference on, pages 1–7. IEEE.
  • Long et al. (2015) Long, J., Shelhamer, E., and Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3431–3440.
  • Lorenzi et al. (2013) Lorenzi, M., Ayache, N., Frisoni, G. B., Pennec, X., (ADNI, A. D. N. I., et al. (2013). Lcc-demons: a robust and accurate symmetric diffeomorphic registration algorithm. NeuroImage, 81:470–483.
  • Lovejoy (1991) Lovejoy, W. S. (1991). Computationally feasible bounds for partially observed markov decision processes. Operations research, 39(1):162–175.
  • Lv et al. (2018) Lv, J., Yang, M., Zhang, J., and Wang, X. (2018). Respiratory motion correction for free-breathing 3d abdominal mri using cnn-based image registration: a feasibility study. The British Journal of Radiology, 91:20170788.
  • Ma et al. (2017) Ma, K., Wang, J., Singh, V., Tamersoy, B., Chang, Y.-J., Wimmer, A., and Chen, T. (2017). Multimodal image registration with deep context reinforcement learning. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 240–248. Springer.
  • Maes et al. (1997) Maes, F., Collignon, A., Vandermeulen, D., Marchal, G., and Suetens, P. (1997). Multimodality image registration by maximization of mutual information. IEEE transactions on Medical Imaging, 16(2):187–198.
  • Mahapatra (2018) Mahapatra, D. (2018). Elastic registration of medical images with gans. arXiv preprint arXiv:1805.02369.
  • Mahapatra et al. (2018) Mahapatra, D., Ge, Z., Sedai, S., and Chakravorty, R. (2018). Joint registration and segmentation of xray images using generative adversarial networks. In International Workshop on Machine Learning in Medical Imaging, pages 73–80. Springer.
  • Matthew et al. (2018) Matthew, J., Hajnal, J. V., Rueckert, D., and Schnabel, J. A. (2018). Lstm spatial co-transformer networks for registration of 3d fetal us and mr brain images. In Data Driven Treatment Response Assessment and Preterm, Perinatal, and Paediatric Image Analysis, pages 149–159. Springer.
  • Miao et al. (2017) Miao, S., Piat, S., Fischer, P., Tuysuzoglu, A., Mewes, P., Mansi, T., and Liao, R. (2017). Dilated fcn for multi-agent 2d/3d medical image registration. arXiv preprint arXiv:1712.01651.
  • Miao et al. (2016a) Miao, S., Wang, Z. J., and Liao, R. (2016a). A cnn regression approach for real-time 2d/3d registration. IEEE transactions on medical imaging, 35(5):1352–1363.
  • Miao et al. (2016b) Miao, S., Wang, Z. J., Zheng, Y., and Liao, R. (2016b). Real-time 2d/3d registration via cnn regression. In Biomedical Imaging (ISBI), 2016 IEEE 13th International Symposium on, pages 1430–1434. IEEE.
  • Mnih et al. (2015) Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540):529.
  • Myronenko and Song (2010) Myronenko, A. and Song, X. (2010). Intensity-based image registration by minimizing residual complexity. IEEE transactions on medical imaging, 29(11):1882–1891.
  • Nazib et al. (2018) Nazib, A., Fookes, C., and Perrin, D. (2018). A comparative analysis of registration tools: Traditional vs deep learning approach on high resolution tissue cleared data. arXiv preprint arXiv:1810.08315.
  • Neylon et al. (2017) Neylon, J., Min, Y., Low, D. A., and Santhanam, A. (2017). A neural network approach for fast, automated quantification of dir performance. Medical physics, 44(8):4126–4138.
  • Paszke et al. (2017) Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., and Lerer, A. (2017). Automatic differentiation in pytorch. In NIPS-W.
  • Poulin et al. (2018) Poulin, E., Boudam, K., Pinter, C., Kadoury, S., Lasso, A., Fichtinger, G., and Ménard, C. (2018). Validation of mri to trus registration for high-dose-rate prostate brachytherapy. Brachytherapy, 17(2):283–290.
  • Punithakumar et al. (2017) Punithakumar, K., Boulanger, P., and Noga, M. (2017). A gpu-accelerated deformable image registration algorithm with applications to right ventricular segmentation. IEEE Access, 5:20374–20382.
  • Puterman (2014) Puterman, M. L. (2014). Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons.
  • Ratliff et al. (2013) Ratliff, L. J., Burden, S. A., and Sastry, S. S. (2013). Characterization and computation of local nash equilibria in continuous games. In 2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 917–924. IEEE.
  • Ren et al. (2015) Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems, pages 91–99.
  • Rivenson et al. (2018) Rivenson, Y., Zhang, Y., Günaydın, H., Teng, D., and Ozcan, A. (2018). Phase recovery and holographic image reconstruction using deep learning in neural networks. Light: Science & Applications, 7(2):17141.
  • Rohé et al. (2017) Rohé, M.-M., Datar, M., Heimann, T., Sermesant, M., and Pennec, X. (2017). Svf-net: Learning deformable image registration using shape matching. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 266–274. Springer.
  • Ronneberger et al. (2015) Ronneberger, O., Fischer, P., and Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pages 234–241. Springer.
  • Rühaak et al. (2013) Rühaak, J., Heldmann, S., Kipshagen, T., and Fischer, B. (2013). Highly accurate fast lung ct registration. In Medical Imaging 2013: Image Processing, volume 8669, page 86690Y. International Society for Optics and Photonics.
  • Saalfeld et al. (2012) Saalfeld, S., Fetter, R., Cardona, A., and Tomancak, P. (2012). Elastic volume reconstruction from series of ultra-thin microscopy sections. Nature methods, 9(7):717.
  • Salehi et al. (2018) Salehi, S. S. M., Khan, S., Erdogmus, D., and Gholipour, A. (2018). Real-time deep registration with geodesic loss. arXiv preprint arXiv:1803.05982.
  • Schmidhuber (2015) Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural networks, 61:85–117.
  • Sedghi et al. (2018) Sedghi, A., Luo, J., Mehrtash, A., Pieper, S., Tempany, C. M., Kapur, T., Mousavi, P., and Wells III, W. M. (2018). Semi-supervised deep metrics for image registration. arXiv preprint arXiv:1804.01565.
  • Sheikhjafari et al. (2018) Sheikhjafari, A., Noga, M., Punithakumar, K., and Ray, N. (2018). Unsupervised deformable image registration with fully connected generative neural network. In International conference on Medical Imaging with Deep Learning.
  • Shen (2007) Shen, D. (2007). Image registration by local histogram matching. Pattern Recognition, 40(4):1161–1172.
  • Shu et al. (2018) Shu, C., Chen, X., Xie, Q., and Han, H. (2018). An unsupervised network for fast microscopic image registration. In Medical Imaging 2018: Digital Pathology, volume 10581, page 105811D. International Society for Optics and Photonics.
  • Silver et al. (2016) Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. (2016). Mastering the game of go with deep neural networks and tree search. nature, 529(7587):484.
  • Silver et al. (2017) Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., et al. (2017). Mastering the game of go without human knowledge. Nature, 550(7676):354.
  • Simonovsky et al. (2016) Simonovsky, M., Gutiérrez-Becker, B., Mateus, D., Navab, N., and Komodakis, N. (2016). A deep metric for multimodal registration. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 10–18. Springer.
  • Simonyan and Zisserman (2014) Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
  • Sloan et al. (2018) Sloan, J. M., Goatman, K. A., and Siebert, J. P. (2018). Learning rigid image registration - utilizing convolutional neural networks for medical image registration. In 11th International Joint Conference on Biomedical Engineering Systems and Technologies, pages 89–99. SCITEPRESS-Science and Technology Publications.
  • Sokooti et al. (2017) Sokooti, H., de Vos, B., Berendsen, F., Lelieveldt, B. P., Išgum, I., and Staring, M. (2017). Nonrigid image registration using multi-scale 3d convolutional neural networks. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 232–239. Springer.
  • Srivastava et al. (2014) Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958.
  • Stergios et al. (2018) Stergios, C., Mihir, S., Maria, V., Guillaume, C., Marie-Pierre, R., Stavroula, M., and Nikos, P. (2018). Linear and deformable image registration with 3d convolutional neural networks. In Image Analysis for Moving Organ, Breast, and Thoracic Images, pages 13–22. Springer.
  • Sun and Zhang (2018) Sun, L. and Zhang, S. (2018). Deformable mri-ultrasound registration using 3d convolutional neural network. In Simulation, Image Processing, and Ultrasound Systems for Assisted Diagnosis and Navigation, pages 152–158. Springer.
  • Sun et al. (2018) Sun, Y., Moelker, A., Niessen, W. J., and van Walsum, T. (2018). Towards robust ct-ultrasound registration using deep learning methods. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pages 43–51. Springer.
  • Szegedy et al. (2017) Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A. A. (2017). Inception-v4, inception-resnet and the impact of residual connections on learning. In AAAI, volume 4, page 12.
  • Szegedy et al. (2014) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2014). Going deeper with convolutions. CoRR, abs/1409.4842.
  • Takeuchi et al. (2008) Takeuchi, S., Kaneko, T., and Yamaguchi, K. (2008). Evaluation of monte carlo tree search and the application to go. In Computational Intelligence and Games, 2008. CIG’08. IEEE Symposium On, pages 191–198. IEEE.
  • Uzunova et al. (2017) Uzunova, H., Wilms, M., Handels, H., and Ehrhardt, J. (2017). Training cnns for image registration from few samples with model-based data augmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 223–231. Springer.
  • Van Hasselt et al. (2016) Van Hasselt, H., Guez, A., and Silver, D. (2016). Deep reinforcement learning with double q-learning. In AAAI, volume 2, page 5. Phoenix, AZ.
  • Vercauteren et al. (2009) Vercauteren, T., Pennec, X., Perchant, A., and Ayache, N. (2009). Diffeomorphic demons: Efficient non-parametric image registration. NeuroImage, 45(1):S61–S72.
  • Vialard et al. (2012) Vialard, F.-X., Risser, L., Rueckert, D., and Cotter, C. J. (2012). Diffeomorphic 3d image registration via geodesic shooting using an efficient adjoint calculation. International Journal of Computer Vision, 97(2):229–241.
  • Viola and Wells III (1997) Viola, P. and Wells III, W. M. (1997). Alignment by maximization of mutual information. International Journal of Computer Vision, 24(2):137–154.
  • Wang (2016) Wang, G. (2016). A perspective on deep imaging. arXiv preprint arXiv:1609.04375.
  • Wang et al. (2015) Wang, Z., Schaul, T., Hessel, M., Van Hasselt, H., Lanctot, M., and De Freitas, N. (2015). Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581.
  • Wu et al. (2013) Wu, G., Kim, M., Wang, Q., Gao, Y., Liao, S., and Shen, D. (2013). Unsupervised deep feature learning for deformable registration of mr brain images. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 649–656. Springer.
  • Wu et al. (2016) Wu, G., Kim, M., Wang, Q., Munsell, B. C., and Shen, D. (2016). Scalable high-performance image registration framework by unsupervised deep feature representations learning. IEEE Transactions on Biomedical Engineering, 63(7):1505–1516.
  • Yan et al. (2018) Yan, P., Xu, S., Rastinehad, A. R., and Wood, B. J. (2018). Adversarial image registration with application for mr and trus image fusion. arXiv preprint arXiv:1804.11024.
  • Yang et al. (2018) Yang, Q., Yan, P., Zhang, Y., Yu, H., Shi, Y., Mou, X., Kalra, M. K., Zhang, Y., Sun, L., and Wang, G. (2018). Low dose ct image denoising using a generative adversarial network with wasserstein distance and perceptual loss. IEEE Transactions on Medical Imaging.
  • Yang (2017) Yang, X. (2017). Uncertainty Quantification, Image Synthesis and Deformation Prediction for Image Registration. PhD thesis, The University of North Carolina at Chapel Hill.
  • Yang et al. (2016) Yang, X., Kwitt, R., and Niethammer, M. (2016). Fast predictive image registration. In Deep Learning and Data Labeling for Medical Applications, pages 48–57. Springer.
  • Yao et al. (2018) Yao, R., Ochoa, M., Intes, X., and Yan, P. (2018). Deep compressive macroscopic fluorescence lifetime imaging. In Biomedical Imaging (ISBI 2018), 2018 IEEE 15th International Symposium on, pages 908–911. IEEE.
  • Yi et al. (2017) Yi, Z., Zhang, H., Tan, P., and Gong, M. (2017). Dualgan: Unsupervised dual learning for image-to-image translation. arXiv preprint.
  • Yoo et al. (2017) Yoo, I., Hildebrand, D. G., Tobin, W. F., Lee, W.-C. A., and Jeong, W.-K. (2017). ssEMnet: Serial-section electron microscopy image registration using a spatial transformer network with learned features. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pages 249–257. Springer.
  • Zagoruyko and Komodakis (2015) Zagoruyko, S. and Komodakis, N. (2015). Learning to compare image patches via convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4353–4361.
  • Zhang (2018) Zhang, J. (2018). Inverse-consistent deep networks for unsupervised deformable image registration. arXiv preprint arXiv:1809.03443.
  • Zheng et al. (2018) Zheng, J., Miao, S., Wang, Z. J., and Liao, R. (2018). Pairwise domain adaptation module for cnn-based 2-d/3-d registration. Journal of Medical Imaging, 5(2):021204.
  • Zhu et al. (2018) Zhu, B., Liu, J. Z., Cauley, S. F., Rosen, B. R., and Rosen, M. S. (2018). Image reconstruction by domain-transform manifold learning. Nature, 555(7697):487.
  • Zhu et al. (2017) Zhu, J.-Y., Park, T., Isola, P., and Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint.
  • Zitova and Flusser (2003) Zitova, B. and Flusser, J. (2003). Image registration methods: a survey. Image and Vision Computing, 21(11):977–1000.