Image registration has been widely used in medical image analysis, for instance, atlas-based image segmentation [2, 16], anatomical shape analysis [20, 33], and motion correction in dynamic imaging . The problem of deformable image registration is typically formulated as an optimization, seeking for a nonlinear and dense (voxelwise) spatial transformation between images. In many applications, it is desirable that such transformations be diffeomorphisms, i.e., differentiable, bijective mappings with differentiable inverses. In this paper, we focus on diffeomorphic image registration highlighting with a set of critical features: (i) it captures large deformations that often occur in brain shape variations, lung motions, or fetal movements; (ii) the topology of objects in the image remain intact; and (iii) no non-differentiable artifacts, such as creases or sharp corners, are introduced. However, achieving an optimal solution of diffeomorphic image registration, especially for large-scale or high-resolution images (e.g., a 3D brain MRI scan with the size of ), is computationally intensive and time-consuming.
Attempts at speeding up diffeomorphic image registration have been made in recent works by improving numerical approximation schemes. For example, Ashburner and Friston  employ a Gauss-Newton method to accelerate the convergence of large deformation diffeomorphic metric mapping (LDDMM) algorithm . Zhang et al. propose a low dimensional approximation of diffeomorphic transformations, resulting in fast computation of the gradients for iterative optimization [32, 27]. While these methods have led to substantial reductions in running time, such gradient-based optimization still takes minutes to finish. Instead of minimizing a complex registration energy function [8, 31]
, an alternative approach has leveraged deep learning techniques to improve registration speed by building prediction models of transformation parameters. Such algorithms typically adopt convolutional neural networks (CNNs) to learn a mapping between pairwise images and associated spatial transformations in training dataset[28, 21, 7, 10, 9]. Registration of new testing images is then achieved rapidly by evaluating the learned mapping on given volumes. While the aforementioned deep learning approaches are able to fast predict the deformation parameters in testing, the training process is extremely slow and memory intensive due to the high dimensionality of deformation parameters in imaging space. In addition, enforcing the smoothness constraints of transformations when large deformation occurs is challenging in neural networks.
To address this issue, we propose a novel learning-based registration framework DeepFLASH in a low dimensional bandlimited space, where the diffeomorphic transformations are fully characterized with much fewer dimensions. Our work is inspired by a recent registration algorithm FLASH (Fourier-approximated Lie Algebras for Shooting) 
with the novelty of (i) developing a learning-based predictive model that further speeds up the current registration algorithms; (ii) defining a set of complex-valued operations (e.g., complex convolution, complex-valued rectifier, etc.) and complex-valued loss function of transformations entirely in a bandlimited space; and (iii) proving that our model can be easily implemented by a dual-network in the space of real-valued functions with a careful design of the network architecture. To the best of our knowledge, we are the first to introduce the low dimensional Fourier representations of diffeomorphic transformations to learning-based registration algorithms. In contrast to traditional methods that learn spatial transformations in a high dimensional imaging space, our method dramatically reduces the computational complexity of the training process where iterative computation of gradient terms are required. This greatly alleviates the problem of time-consuming and expensive training for deep learning based registration networks. Another major benefit of DeepFLASH is that the smoothness of diffeomorphic transformations is naturally preserved in the bandlimited space with low frequency components. Note that while we implement DeepFLASH in the context of convolutional neural network (CNN), it can be easily adapted to a variety of other neural networks, such as fully connected network (FCN), or recurrent neural network (RNN). We demonstrate the effectiveness of our model in both 2D synthetic and 3D real brain MRI data.
2 Background: Deformable Image Registration
In this section, we briefly review the fundamentals of deformable image registration in the setting of LDDMM algorithm with geodesic shooting [26, 29]. While this paper focuses on LDDMM, the theoretical tool developed in our model is widely applicable to various registration frameworks, for example, stationary velocity fields.
Consider a source image and a target image as square-integrable functions defined on a torus domain (). The problem of diffeomorphic image registration is to find the shortest path (also known as geodesic) of diffeomorphic transformations , such that the deformed image at time point is similar to . An explicit energy function of LDDMM with geodesic shooting is formulated as an image matching term plus a regularization term that guarantees the smoothness of the transformation fields
where is a positive weight parameter, denotes a Jacobian matrix, and
is a distance function that measures the similarity between images. Commonly used distance metrics include sum-of-squared difference (L2-norm) of image intensities, normalized cross correlation (NCC) , and mutual information (MI) . The deformation is defined as an integral flow of the time-varying Eulerian velocity field that lies in the tangent space of diffeomorphisms . Here
is a symmetric, positive-definite differential operator that maps a tangent vectorinto the dual space , with its inverse . The denotes a paring of a momentum vector with a tangent vector .
The geodesic at the minimum of (2) is uniquely determined by integrating the geodesic constraint, a.k.a. Euler-Poincaré differential equation (EPDiff) [1, 19], which is computationally expensive in high dimensional image spaces. A recent model FLASH demonstrated that the entire optimization of LDDMM with geodesic shooting can be efficiently carried in a low dimensional bandlimited space with dramatic speed improvement . This is based on the fact that the velocity fields do not develop high frequencies in the Fourier domain (as shown in Fig. 1). We briefly review the basic concepts below.
Let and denote the space of Fourier representations of diffeomorphisms and velocity fields respectively. Given time-dependent velocity field , the diffeomorphism in the finite-dimensional Fourier domain can be computed as
where is the frequency of an identity element,
is a tensor product, representing the Fourier frequencies of a Jacobian matrix with central difference approximation, and
is a circular convolution with zero padding to avoid aliasing111To prevent the domain from growing infinity, we truncate the output of the convolution in each dimension to a suitable finite set..
The Fourier representation of the geodesic constraint (EPDiff) is defined as
where is the truncated matrix-vector field auto-correlation. The operator is the discrete divergence of a vector field. Here is a smoothing operator with its inverse
, which is the Fourier transform of a commonly used Laplacian operator, with a positive weight parameter and a smoothness parameter . The Fourier coefficients of is, i.e., , where is a d-dimensional frequency vector.
3 Our Method: DeepFLASH
We introduce a learning-based registration network DeepFLASH in a low dimensional bandlimited space , with newly defined operators and functions in a complex vector space . Since the spatial transformations can be uniquely determined by an initial velocity (as introduced in Eq. (3)), we naturally integrate this new parameterization in the architecture of DeepFLASH. To simplify the notation, we drop the time index of in remaining sections.
Analogous to , we use optimal registration results, denoted as
, estimated by numerical optimization of the LDDMM algorithm as part of the training data. Our goal is then to predict an initial velocityfrom image patches of the moving and target images. Before introducing DeepFLASH, we first define a set of complex-valued operations and functions that provide key components of the neural architecture.
Consider a -dimensional complex-valued vector of input signal and a real-valued kernel , we have a complex-valued convolution as
where denotes the real part of a complex-valued vector, and denotes an imaginary part.
Following a recent work on complex networks in the output layer, i.e.,
3.1 Loss function
Let a labeled training dataset including pairwise images and their associated optimal initial velocity fields be , where is the number of pairwise images. We model a prediction function by using a convolutional neural network (CNN), with being a real-valued weight matrix for convolutional layers. We then define a loss function as
where is a positive parameter balancing between function and a regularity term on the weight matrix . While we use norm as regularity in this paper, it is generic to other commonly used operators such as , or norm.
The optimization problem of Eq. (6) is typically solved by using gradient-based methods. The weight matrix is updated by moving in the direction of the loss function’s steepest descent, found by its gradient .
3.2 Network Architecture: Decoupled Complex-valued Network
While we are ready to design a complex-valued registration network based on CNN, the implementation of such network is not straightforward. Trabelsi et al. developed deep complex networks to specifically handle complex-valued inputs at the cost of computational efficiency . In this section, we present an efficient way to decouple the proposed complex-valued registration network into a combination of regular real-valued networks. More specifically, we construct a dual-net that separates the real and imaginary part of complex-valued network in an equivalent manner.
Given the fact that the computation of real and imaginary parts in the convolution (Eq. (4)) and the activation function (Eq. (5)) are separable, we next show that the loss defined in Eq. (6) can be equivalently constructed by the real and imaginary part individually. To simplify the notation, we define the predicted initial velocity and rewrite Eq. (6) as
Let and in Eq. (7), we then obtain
Fig. 2 visualizes the architecture of our proposed model DeepFLASH. The input source and target images are defined in the Fourier space with the real part of frequencies as and vs. the imaginary parts as and . We train real and imaginary parts in two individual neural networks and . The optimization stays in a low dimensional bandlimited space without converting the coefficients (, ) to high dimensional imaging domain back and forth. It is worthy to mention that our decoupled network is not constrained by any specific architecture. The CNN network in Fig. 2 can be easily replaced by a variety of the state-of-the-art models, e.g., U-net , or fully connected neural network.
3.3 Computational complexity
It has been previously shown that the time complexity of convolutional layers  is , where is the index of a convolutional layer and denotes the number of convolutional layers. The , are defined as the number of input channels, output channels and the kernel size of the -th layer respectively. Here is the output dimension of the -th layer, i.e., when the last layer predicts transformation fields for 3D images with the dimension of .
Current learning approaches for image registration have been performed in the original high dimensional image space [7, 28]. In contrast, our proposed model DeepFLASH significantly reduces the dimension of into a low dimensional bandlimited space (where ). This makes the training of traditional registration networks much more efficient in terms of both time and memory consumption.
4 Experimental Evaluation
We demonstrate the effectiveness of our model by training and testing on both 2D synthetic data and 3D real brain MRI scans.
2D synthetic data.
3D brain MRI.
We include public T1-weighted 3D brain MRI scans from Alzheimer’s Disease Neuroimaging Initiative (ADNI) , Open Access Series of Imaging Studies (OASIS) , Autism Brain Imaging Data Exchange (ABIDE) , and LONI Probabilistic Brain Atlas Individual Subject Data (LPBA40)  with subjects. Due to the difficulty of preserving the diffeomorphic property across individual subjects particularly with large age variations, we carefully evaluate images from subjects aged from 60 to 90. All MRIs were all pre-processed as , isotropic voxels, and underwent skull-stripped, intensity normalized, bias field corrected and pre-aligned with affine transformation.
To further validate our model accuracy through segmentation labels, we use manually delineated anatomical structures in LPBA40 dataset. We randomly select pairs of MR images with segmentation labels from LPBA40. We then carefully down-sampled all images and labels from the dimension of to .
To validate the registration performance of DeepFLASH on both 2D and 3D data, we run a recent state-of-the-art image registration algorithm FLASH [30, 32] on 2D synthetic data and pairs of 3D MR images to optimal solutions, which are used as ground truth. We then randomly choose 500 pairs of 2D data and 1000 pairs of 3D MRIs from the rest of the data as testing cases. We set registration parameter for the operator , the number of time steps for Euler integration in geodesic shooting as . We adopt the band-limited dimension of the initial velocity field as , which has been shown to produce comparable registration accuracy . We set the batch size as with learning rate for network training, then run epochs for 2D synthetic training and epochs for 3D brain training.
Next, we compare our 2D prediction with registration performed in full-spatial domain. For 3D testing, we compare our method with three baseline algorithms, including FLASH  (a fast optimization-based image registration method), Voxelmorph  (an unsupervised registration in image space) and Quicksilver  (a supervised method that predicts transformations in a momentum space). We also compare the registration time of DeepFLASH with traditional optimization-based methods, such as vector momenta LDDMM (VM-LDDMM) , and symmetric diffeomorphic image registration with cross-correlation (SyN) from ANTs . For fair comparison, we train all baseline algorithms on the same dataset and report their best performance from published source codes (e.g., https://github.com/rkwitt/quicksilver, https://github.com/balakg/voxelmorph).
To better validate the transformations generated by DeepFLASH, we perform registration-based segmentation and examine the resulting segmentation accuracy over eight brain structures, including Putamen (Puta), Cerebellum (Cer), Caudate (Caud), Hippocampus (Hipp), Insular cortex (Cor), Cuneus (Cune), brain stem (Stem) and superior frontal gyrus (Gyrus). We evaluate a volume-overlapping similarity measurement, also known as SrensenDice coefficient, between the propagated segmentation and the manual segmentation .
We demonstrate the efficiency of our model by comparing quantitative time and GPU memory consumption across all methods. All optimal solutions for training data are generated on an i7, 9700K CPU with 32 GB internal memory. The training and prediction procedure of all learning-based methods are performed on Nvidia GTX 1080Ti GPUs.
Fig.3 visualizes the deformed images, transformation fields, and the determinant of Jacobian (DetJac) of transformations estimated by DeepFLASH and a registration method that performed in full-spatial image domain. Note that the value of DetJac indicates how volume changes, for instance, there is no volume change when , while volume shrinks when and expands when . The value of DetJac smaller than zero indicates an artifact or singularity in the transformation field, i.e., a failure to preserve the diffeomorphic property when the effect of folding occurs. Both methods show similar patterns of volume changes over transformation fields. Our predicted results are fairly close to the estimates from registration algorithms in full spatial domain.
Fig.4 visualizes the deformed images and the determinant of Jacobian with pairwise registration on 3D brain MRIs for all methods. It demonstrates that our method DeepFLASH is able to produce comparable registration results with little to no loss of the accuracy. In addition, our method gracefully guarantees the smoothness of transformation fields without artifacts and singularities.
The left panel of Fig.5 displays an example of the comparison between the manually labeled segmentation and propagated segmentation deformed by transformation field estimated from our algorithm. It clearly shows that our generated segmentations align fairly well with the manual delineations. The right panel of Fig.5 compares quantitative results of the average dice for all methods, with the observations that our method slightly outperforms the baseline algorithms.
reports the statistics of dice scores (mean and variance) of all methods over eight brain structures fromregistration pairs. Our method DeepFLASH produces comparable dice scores with significantly fast training of the neural networks.
Table.1 reports the time consumption of optimization-based registration methods, as well as the training and testing time of learning-based registration algorithms. Our method can predict the transformation fields between images approximately times faster than optimization-based registration methods. In addition, DeepFLASH outperforms learning-based registration approaches in testing, while with significantly reduced computational time in training.
|Methods||Training Time (GPU hours)||
Fig.7 provides the average training and testing GPU memory usage across all methods. It has been shown that our proposed method dramatically lowers the GPU footprint compared to other learning-based methods in both training and testing.
In this paper, we present a novel diffeomorphic image registration network with efficient training process and inference. In contrast to traditional learning-based registration methods that are defined in the high dimensional imaging space, our model is fully developed in a low dimensional bandlimited space with the diffeomorphic property of transformation fields well preserved. Based on the fact that current networks are majorly designed in real-valued spaces, we are the first to develop a decoupled-network to solve the complex-valued optimization with the support of rigorous math foundations. Our model substantially lowers the computational complexity of the neural network and significantly reduces the time consumption on both training and testing, while preserving a comparable registration accuracy. The theoretical tools developed in our work is flexible/generic to a wide variety of the state-of-the-art networks, e.g., FCN, or RNN. To the best of our knowledge, we are the first to characterize the diffeomorphic deformations in Fourier space via network learning. This work also paves a way for further speeding up unsupervised learning for registration models.
-  Vladimir Arnold. Sur la géométrie différentielle des groupes de lie de dimension infinie et ses applications à l’hydrodynamique des fluides parfaits. In Annales de l’institut Fourier, volume 16, pages 319–361, 1966.
-  John Ashburner and Karl J Friston. Unified segmentation. Neuroimage, 26(3):839–851, 2005.
-  John Ashburner and Karl J Friston. Diffeomorphic registration using geodesic shooting and gauss–newton optimisation. NeuroImage, 55(3):954–967, 2011.
-  Brian B Avants, Charles L Epstein, Murray Grossman, and James C Gee. Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Medical image analysis, 12(1):26–41, 2008.
-  Brian B Avants, Nicholas J Tustison, Gang Song, Philip A Cook, Arno Klein, and James C Gee. A reproducible evaluation of ants similarity metric performance in brain image registration. Neuroimage, 54(3):2033–2044, 2011.
-  Guha Balakrishnan, Amy Zhao, Mert R Sabuncu, John Guttag, and Adrian V Dalca. An unsupervised learning model for deformable medical image registration. In , pages 9252–9260, 2018.
-  Guha Balakrishnan, Amy Zhao, Mert R Sabuncu, John Guttag, and Adrian V Dalca. Voxelmorph: a learning framework for deformable medical image registration. IEEE transactions on medical imaging, 2019.
-  Mirza Faisal Beg, Michael I Miller, Alain Trouvé, and Laurent Younes. Computing large deformation metric mappings via geodesic flows of diffeomorphisms. International journal of computer vision, 61(2):139–157, 2005.
-  Tian Cao, Nikhil Singh, Vladimir Jojic, and Marc Niethammer. Semi-coupled dictionary learning for deformation prediction. In 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI), pages 691–694. IEEE, 2015.
-  Chen-Rui Chou, Brandon Frederick, Gig Mageras, Sha Chang, and Stephen Pizer. 2d/3d image registration using regression learning. Computer Vision and Image Understanding, 117(9):1095–1106, 2013.
-  Adriana Di Martino, Chao-Gan Yan, Qingyang Li, Erin Denio, Francisco X Castellanos, Kaat Alaerts, Jeffrey S Anderson, Michal Assaf, Susan Y Bookheimer, Mirella Dapretto, et al. The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism. Molecular psychiatry, 19(6):659, 2014.
-  Lee R Dice. Measures of the amount of ecologic association between species. Ecology, 26(3):297–302, 1945.
-  Anthony F Fotenos, AZ Snyder, LE Girton, JC Morris, and RL Buckner. Normative estimates of cross-sectional and longitudinal brain volume decline in aging and ad. Neurology, 64(6):1032–1039, 2005.
-  Kaiming He and Jian Sun. Convolutional neural networks at constrained time cost. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5353–5360, 2015.
-  Clifford R Jack Jr, Matt A Bernstein, Nick C Fox, Paul Thompson, Gene Alexander, Danielle Harvey, Bret Borowski, Paula J Britson, Jennifer L. Whitwell, Chadwick Ward, et al. The alzheimer’s disease neuroimaging initiative (adni): Mri methods. Journal of Magnetic Resonance Imaging: An Official Journal of the International Society for Magnetic Resonance in Medicine, 27(4):685–691, 2008.
-  Sarang Joshi, Brad Davis, Matthieu Jomier, and Guido Gerig. Unbiased diffeomorphic atlas construction for computational anatomy. NeuroImage, 23, Supplement1:151–160, 2004.
-  Michael Leventon, William M Wells III, and Eric Grimson. Multiple view 2d-3d mutual information registration. In Image Understanding Workshop, volume 20, page 21. Citeseer, 1997.
-  Ruizhi Liao, Esra A Turk, Miaomiao Zhang, Jie Luo, P Ellen Grant, Elfar Adalsteinsson, and Polina Golland. Temporal registration in in-utero volumetric mri time series. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 54–62. Springer, 2016.
-  Micheal I Miller, Alain Trouvé, and L. Younes. Geodesic shooting for computational anatomy. Journal of Mathematical Imaging and Vision, 24(2):209–228, 2006.
-  Marc Niethammer, Yang Huang, and François X Vialard. Geodesic regression for image time-series. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 655–662. Springer, 2011.
-  Marc-Michel Rohé, Manasi Datar, Tobias Heimann, Maxime Sermesant, and Xavier Pennec. Svf-net: Learning deformable image registration using shape matching. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 266–274. Springer, 2017.
-  Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pages 234–241. Springer, 2015.
-  David W Shattuck, Mubeena Mirza, Vitria Adisetiyo, Cornelius Hojatkashani, Georges Salamon, Katherine L Narr, Russell A Poldrack, Robert M Bilder, and Arthur W Toga. Construction of a 3d probabilistic atlas of human cortical structures. Neuroimage, 39(3):1064–1080, 2008.
-  Nikhil Singh, Jacob Hinkle, Sarang Joshi, and P Thomas Fletcher. A vector momenta formulation of diffeomorphisms for improved geodesic regression and atlas construction. In 2013 IEEE 10th International Symposium on Biomedical Imaging, pages 1219–1222. IEEE, 2013.
-  Chiheb Trabelsi, Olexa Bilaniuk, Ying Zhang, Dmitriy Serdyuk, Sandeep Subramanian, João Felipe Santos, Soroush Mehri, Negar Rostamzadeh, Yoshua Bengio, and Christopher J Pal. Deep complex networks. arXiv preprint arXiv:1705.09792, 2017.
-  François X Vialard, Laurent Risser, Daniel Rueckert, and Colin J Cotter. Diffeomorphic 3d image registration via geodesic shooting using an efficient adjoint calculation. International Journal of Computer Vision, 97(2):229–241, 2012.
-  Jian Wang, Wei Xing, Robert M Kirby, and Miaomiao Zhang. Data-driven model order reduction for diffeomorphic image registration. In International Conference on Information Processing in Medical Imaging, pages 694–705. Springer, 2019.
-  Xiao Yang, Roland Kwitt, Martin Styner, and Marc Niethammer. Quicksilver: Fast predictive image registration–a deep learning approach. NeuroImage, 158:378–396, 2017.
-  Laurent Younes, Felipe Arrate, and Michael I Miller. Evolutions equations in computational anatomy. NeuroImage, 45(1):S40–S50, 2009.
-  Miaomiao Zhang and P Thomas Fletcher. Finite-dimensional lie algebras for fast diffeomorphic image registration. In International Conference on Information Processing in Medical Imaging, pages 249–260. Springer, 2015.
-  Miaomiao Zhang and P Thomas Fletcher. Fast diffeomorphic image registration via fourier-approximated lie algebras. International Journal of Computer Vision, 127(1):61–73, 2019.
-  Miaomiao Zhang, Ruizhi Liao, Adrian V Dalca, Esra A Turk, Jie Luo, P Ellen Grant, and Polina Golland. Frequency diffeomorphisms for efficient image registration. In International Conference on Information Processing in Medical Imaging, pages 559–570. Springer, 2017.
-  Miaomiao Zhang, William M Wells, and Polina Golland. Low-dimensional statistics of anatomical variability via compact representation of image deformations. In International conference on medical image computing and computer-assisted intervention, pages 166–173. Springer, 2016.