We present HyperMorph, a learning-based strategy for deformable image registration that removes the need to tune important registration hyperparameters during training. Classical registration methods solve an optimization problem to find a set of spatial correspondences between two images, while learning-based methods leverage a training dataset to learn a function that generates these correspondences. The quality of the results for both types of techniques depends greatly on the choice of hyperparameters. Unfortunately, hyperparameter tuning is time-consuming and typically involves training many separate models with various hyperparameter values, potentially leading to suboptimal results. To address this inefficiency, we introduce amortized hyperparameter learning for image registration, a novel strategy to learn the effects of hyperparameters on deformation fields. The proposed framework learns a hypernetwork that takes in an input hyperparameter and modulates a registration network to produce the optimal deformation field for that hyperparameter value. In effect, this strategy trains a single, rich model that enables rapid, fine-grained discovery of hyperparameter values from a continuous interval at test-time. We demonstrate that this approach can be used to optimize multiple hyperparameters considerably faster than existing search strategies, leading to a reduced computational and human burden and increased flexibility. We also show that this has several important benefits, including increased robustness to initialization and the ability to rapidly identify optimal hyperparameter values specific to a registration task, dataset, or even a single anatomical region - all without retraining the HyperMorph model. Our code is publicly available at http://voxelmorph.mit.edu.
Deformable image registration aims to find a set of dense correspondences that accurately align two images. Classical optimization-based techniques for image registration have been thoroughly studied, yielding mature mathematical frameworks and widely used software tools [2, 4, 8, 21, 50, 54]. Learning-based registration methods employ image datasets to learn a function that rapidly computes the deformation field between image pairs [7, 20, 48, 52, 56, 58, 59]. These methods involve choosing registration hyperparameters that dramatically affect the quality of the estimated deformation field. Optimal hyperparameter values can differ substantially across image modality and anatomy, and even small changes can have a large impact on accuracy. Choosing appropriate hyperparameter values is therefore a crucial step in developing, evaluating, and deploying registration methods.
Tuning these hyperparameters most often involves grid or random search techniques that evaluate separate models for discrete hyperparameter values (Figure 1). In practice, researchers typically perform a sequential process of optimizing and validating models with a small subset of hyperparameter values, adapting this subset, and repeating. Optimal hyperparameter values are selected based on model performance, generally determined by human evaluation or additional validation data such as anatomical annotations. This approach requires considerable computational and human effort, which may lead to suboptimal parameter choices, misleading negative results, and impeded progress, especially when researchers resort to using values from the literature that are not adequate for their specific dataset or registration task.
In this work, we introduce a substantially different approach, HyperMorph, to tackle registration hyperparameters: amortized hyperparameter learning for image registration. Our contributions are:
Method. We propose an end-to-end strategy to learn the effects of registration hyperparameters on deformation fields with a single, rich model, replacing the traditional hyperparameter tuning process (Figure 1). In effect, a HyperMorph model is a hypernetwork that approximates a landscape of registration networks for a range of hyperparameter values, by learning a continuous function of the hyperparameters. Users need to train only a single HyperMorph model, which enables rapid test-time image registration for any hyperparameter value. This eliminates the need to train a multitude of separate models, each for a fixed hyperparameter value, since HyperMorph accurately estimates their outputs at a fraction of the computational and human effort. In addition, HyperMorph enables rapid, accurate hyperparameter tuning for registration tasks involving many hyperparameters, in which computational complexity renders grid-search techniques ineffective.
Properties. By exploiting weight-sharing, a single HyperMorph model is efficient to train compared to training the many individual registration models it is able to encompass. We show that HyperMorph is also significantly more robust to initialization than standard registration models, indicating that it better avoids local minima and reducing the need to retrain models with different initializations.
Utility. HyperMorph enables rapid discovery of optimal hyperparameter values at test-time, either through visual assessment or automatic optimization in the continuous hyperparameter space. We demonstrate the substantial utility of this approach by using a single HyperMorph model to identify the optimal hyperparameter values for different datasets, different anatomical regions, or different registration tasks. HyperMorph also offers more precise tuning compared to grid or sequential search.
Image Registration. Classical approaches independently estimate a deformation field by optimizing an energy function for each image pair. These include elastic models, B-spline based deformations, discrete optimization methods [16, 23], Demons, SPM, LDDMM [8, 13, 27, 31, 46, 60], DARTEL, and symmetric normalization (SyN). Recent learning-based approaches make use of convolutional neural networks (CNNs) to learn a function that rapidly computes the deformation field for an image pair. Supervised models learn to reproduce deformation fields estimated or simulated by other methods [20, 37, 48, 52, 59], whereas unsupervised strategies train networks that optimize a loss function similar to classical cost functions and do not require the ground-truth registrations needed by supervised methods [7, 15, 28, 36, 56].
Generally, these methods rely on at least one hyperparameter that balances the optimization of an image-matching term with that of a regularization or smoothness term. Additional hyperparameters are often used in the loss terms, such as the neighborhood size of local normalized cross-correlation or the number of bins in mutual information. Choosing optimal hyperparameter values for classical registration algorithms is a tedious process since pair-wise registration typically requires tens of minutes or more to compute. While learning-based methods enable much faster test-time registration, individual model training is expensive and can require days to converge, causing the hyperparameter search to consume hundreds of GPU-hours [7, 28, 56].
Hyperparameter Optimization. Hyperparameter optimization algorithms jointly solve a validation objective with respect to model hyperparameters and a training objective with respect to the model weights. The simplest approaches treat model training as a black-box function and include grid, random, and sequential search. Bayesian optimization is a more sample-efficient strategy, leveraging a probabilistic model of the objective function to search and evaluate hyperparameter performance. Both approaches are often inefficient, since the algorithms involve repeated optimizations for each hyperparameter evaluation. Enhancements to these strategies have improved performance by extrapolating learning curves before full convergence [19, 34] and evaluating low-fidelity approximations of the black-box function. Other adaptations use bandit-based approaches to selectively allocate resources to favorable models [30, 38]. Gradient-based techniques differentiate through the nested optimization to approximate gradients as a function of the hyperparameters [40, 42, 47]. These approaches are computationally costly and require evaluation of a metric on a comprehensive, labeled validation set, which may not be available for every registration task.
Hypernetworks. Hypernetworks are networks that output the weights of a primary network [26, 35, 51]. Recently, hypernetworks have gained traction as efficient methods of gradient-based hyperparameter optimization since they enable easy differentiation through the entire model with respect to the hyperparameters of interest. For example, SMASH uses hypernetworks to output the weights of a network conditioned on its architecture. Similar work employs hypernetworks to optimize weight decay in classification networks and demonstrates that sufficiently sized hypernetworks are capable of approximating its global effect [39, 41]. HyperMorph extends hypernetworks, combining them with learning-based registration to estimate the effect of hyperparameter values on deformations.
Deformable image registration methods find a dense, non-linear correspondence field between a moving image m and a fixed image f, and can employ a variety of hyperparameters. We follow current unsupervised learning-based registration methods and define a network g_θ(m, f) = φ with parameters θ that takes as input the image pair {m, f} and outputs the optimal deformation field φ.
Our key idea is to model a hypernetwork that learns the effect of loss hyperparameters on the desired registration. Given loss hyperparameters Λ of interest, we define the hypernetwork function h(Λ; θ_h) = θ with parameters θ_h that takes as input sample values for Λ and outputs the parameters θ of the registration network (Figure 2). We learn optimal hypernetwork parameters θ_h using stochastic gradient methods, optimizing the loss

  θ̂_h = argmin_{θ_h} E_{Λ∼p(Λ)} [ L(θ_h; D, Λ) ],

where D is a dataset of images, p(Λ) is a prior probability over the hyperparameters, and L is a registration loss involving the hyperparameters Λ. For example, the distribution p(Λ) can be uniform over some predefined range, or it can be adapted based on prior expectations. At every mini-batch, we sample a set of hyperparameter values from this distribution and use these both as input to the network h and in the loss function for that iteration.
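As an illustrative, framework-free sketch of this training scheme, the toy stand-ins below (a linear "registration network" and a one-hidden-layer hypernetwork, not the paper's architectures) show how the sampled hyperparameter value enters both the hypernetwork input and the loss at each iteration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "registration network": a single linear map whose parameters theta
# are emitted by the hypernetwork (a stand-in for a full U-Net).
def reg_net(theta, x):
    return x @ theta.reshape(2, 2)

# Hypernetwork h(lam; phi): a tiny MLP mapping a hyperparameter value
# lam in [0, 1] to the flattened registration-network parameters theta.
def hyper_net(phi, lam):
    h = np.tanh(phi["W1"] * lam + phi["b1"])
    return h @ phi["W2"] + phi["b2"]

phi = {"W1": rng.normal(size=8), "b1": np.zeros(8),
       "W2": rng.normal(size=(8, 4)) * 0.1, "b2": np.zeros(4)}

# Hyperparameter-conditioned loss: lam is used twice, as input to the
# hypernetwork and as the weighting between the two loss terms.
def loss(phi, x, y, lam):
    theta = hyper_net(phi, lam)
    sim = np.mean((reg_net(theta, x) - y) ** 2)  # image-matching stand-in
    reg = np.mean(theta ** 2)                    # regularity stand-in
    return (1.0 - lam) * sim + lam * reg

# One iteration: draw lam from its prior, evaluate the loss, then a
# gradient step on phi would follow (autodiff omitted in this sketch).
lam = rng.uniform(0.0, 1.0)
x, y = rng.normal(size=(5, 2)), rng.normal(size=(5, 2))
value = loss(phi, x, y, lam)
```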
Unsupervised Model Instantiations. Following unsupervised learning-based registration, we use the loss function

  L(θ_h; D, Λ) = (1 − λ) L_sim(f, m ∘ φ; λ_sim) + λ L_reg(φ; λ_reg),

where m ∘ φ represents the moving image m warped by the deformation φ. The loss term L_sim measures image similarity and might involve hyperparameters λ_sim, whereas L_reg quantifies the spatial regularity of the deformation field and might involve hyperparameters λ_reg. The regularization hyperparameter λ balances the relative importance of the separate terms, and Λ = {λ, λ_sim, λ_reg}.
When registering images of the same modality, we use standard similarity metrics for L_sim: mean-squared error (MSE) and local normalized cross-correlation (NCC). NCC includes a hyperparameter defining the neighborhood size. For cross-modality registration, we use normalized mutual information (NMI), which involves a hyperparameter controlling the number of histogram bins.
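For intuition, here is a minimal 2D sketch of local NCC with its neighborhood-size hyperparameter (the window size `win` and the epsilon are our notational choices; the paper's implementation operates on 3D volumes):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Local NCC: squared correlation of zero-centered win x win patches,
# averaged over all patch locations. `win` is the neighborhood-size
# hyperparameter discussed in the text.
def local_ncc(a, b, win=9, eps=1e-8):
    A = sliding_window_view(a, (win, win)).astype(float)
    B = sliding_window_view(b, (win, win)).astype(float)
    A = A - A.mean(axis=(-2, -1), keepdims=True)
    B = B - B.mean(axis=(-2, -1), keepdims=True)
    num = (A * B).sum(axis=(-2, -1)) ** 2
    den = (A * A).sum(axis=(-2, -1)) * (B * B).sum(axis=(-2, -1)) + eps
    return float(np.mean(num / den))
```

An image compared with itself scores close to 1, while unrelated noise images score near 0, which is what makes the metric usable as a similarity term.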
We parameterize the deformation field φ with a stationary velocity field (SVF) and integrate it within the network to obtain a diffeomorphism, which is invertible by design [1, 2, 15]. We regularize φ using L_reg(φ) = (1/2)‖∇u‖², where u is the displacement field of the deformation φ = Id + u.
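A minimal 2D sketch of this weighted loss, assuming an MSE similarity term and a displacement field stored as a (2, H, W) array (shapes and the discrete gradient are illustrative simplifications):

```python
import numpy as np

# lam-weighted registration loss: (1 - lam) * L_sim + lam * L_reg, with
# MSE as L_sim and a mean squared spatial gradient of the displacement
# field u as L_reg, following the text above.
def registration_loss(fixed, moved, disp, lam):
    sim = np.mean((fixed - moved) ** 2)
    # disp has shape (2, H, W): x- and y-displacement components.
    gx = np.gradient(disp, axis=1)
    gy = np.gradient(disp, axis=2)
    reg = 0.5 * np.mean(gx ** 2 + gy ** 2)
    return (1.0 - lam) * sim + lam * reg
```

A perfectly aligned pair under a constant (gradient-free) displacement incurs zero loss for any lam, while lam trades similarity against smoothness otherwise.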
Semi-supervised Model Instantiation. Building on recent learning-based methods that use additional volume information during training [7, 28, 29], we also apply HyperMorph to the semi-supervised setting by modifying the loss function to incorporate existing training segmentation maps:

  L(θ_h; D, Λ) = (1 − λ) L_sim(f, m ∘ φ) + λ L_reg(φ) + γ L_seg(s_f, s_m ∘ φ),

where L_seg is a segmentation similarity metric, usually the Dice coefficient, weighted by the hyperparameter γ, and s_m and s_f are the segmentation maps of the moving and fixed images, respectively.
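For reference, a small sketch of the hard Dice overlap that L_seg is typically based on; actual training would use a soft, differentiable variant over one-hot probability maps, and the label set here is hypothetical:

```python
import numpy as np

# Mean Dice overlap across a list of labels: 2|A ∩ B| / (|A| + |B|)
# per label, averaged. Empty labels on both maps count as perfect overlap.
def dice(seg_a, seg_b, labels):
    scores = []
    for label in labels:
        a, b = seg_a == label, seg_b == label
        denom = a.sum() + b.sum()
        scores.append(2.0 * (a & b).sum() / denom if denom else 1.0)
    return float(np.mean(scores))
```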
Given a test image pair (m, f), a trained HyperMorph model can efficiently yield the deformation field φ as a function of the important hyperparameters Λ. If no external information is available, optimal hyperparameters may be rapidly tuned in an interactive fashion. However, landmarks or segmentation maps are sometimes available for validation subjects, enabling rapid automatic tuning.
Interactive. Sliders can be used to change hyperparameter values in near real-time until the user is visually satisfied with the registration of some image pair (m, f). In some cases, the user might choose different settings when studying specific regions of the image. For example, the optimal value of the hyperparameter λ (balancing the regularization and the image-matching term) can vary by anatomical structure in the brain (see Figure 7). This interactive tuning technique is possible because of HyperMorph's ability to efficiently yield the effect of hyperparameter values Λ on the deformation φ.
Automatic. If segmentation maps are available for validation, a single trained HyperMorph model enables hyperparameter optimization using

  Λ̂ = argmax_Λ Σ_{(m,f)} L_seg(s_f, s_m ∘ φ_Λ),    (4)

where {s} is a set of validation segmentation maps and φ_Λ is the deformation produced by the registration network with parameters h(Λ; θ_h), as before. We implement this optimization by freezing the learned hypernetwork parameters θ_h, treating the input Λ as a parameter to be learned, and using stochastic gradient strategies to rapidly optimize (4).
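The freeze-and-optimize idea can be sketched without a deep-learning framework. Below, a hypothetical smooth validation objective stands in for Dice evaluated through the frozen HyperMorph model, and the scalar hyperparameter is tuned by gradient ascent with finite-difference gradients (the paper backpropagates through the frozen model instead):

```python
import numpy as np

# Hypothetical validation objective: a smooth stand-in for mean Dice as a
# function of lam, peaking at lam = 0.35 (an assumed value, for the sketch).
def val_dice(lam):
    return 0.8 - (lam - 0.35) ** 2

# Test-time tuning: with the hypernetwork weights frozen, treat the scalar
# input lam as the only free parameter and ascend the validation objective.
def tune(objective, lam=0.9, lr=0.5, steps=100, eps=1e-4):
    for _ in range(steps):
        grad = (objective(lam + eps) - objective(lam - eps)) / (2 * eps)
        lam = float(np.clip(lam + lr * grad, 0.0, 1.0))
    return lam

best = tune(val_dice)
```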
The hypernetwork we use in the experiments consists of four fully connected layers, each with 64 units and ReLU activation, except for the final layer, which uses a Tanh activation. The proposed method applies to any registration network architecture, and we treat the hypernetwork and the registration network as a single, large network. The only trainable parameters θ_h are those of the hypernetwork. We implement HyperMorph with the open-source VoxelMorph library, using a U-Net-like registration architecture. The U-Net in this network consists of a 4-layer convolutional encoder (with 16, 32, 32, and 32 channels), a 4-layer convolutional decoder (with 32 channels for each layer), and 3 more convolutional layers (of 32, 16, and 16 channels). We use the ADAM optimizer during training.
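A plain-NumPy sketch of this hypernetwork body (four 64-unit layers, ReLU then final Tanh, as described above); the linear head emitting the registration-network weights is our simplification, and `n_theta` is a tiny stand-in for the full U-Net parameter count:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(n_in, n_out):
    w = rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, n_out))
    return w, np.zeros(n_out)

# Four fully connected layers of 64 units: ReLU on the first three,
# Tanh on the last. The head mapping to registration weights is assumed.
n_theta = 10
body = [dense(1, 64), dense(64, 64), dense(64, 64), dense(64, 64)]
head = dense(64, n_theta)

def hypernet(lam):
    h = np.atleast_2d(lam)
    for i, (w, b) in enumerate(body):
        h = h @ w + b
        h = np.tanh(h) if i == len(body) - 1 else np.maximum(h, 0.0)
    w, b = head
    return h @ w + b

theta = hypernet([[0.5]])  # parameters emitted for hyperparameter 0.5
```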
We demonstrate that a single HyperMorph model performs on par with and captures the behavior of a rich landscape of individual registration networks trained with separate hyperparameter values, while incurring substantially less computational cost and human effort. Next, we illustrate considerable improvements in robustness to initialization. We then demonstrate the powerful utility of HyperMorph for rapid hyperparameter optimization at validation — for different subpopulations of data, registration types, and individual anatomical structures. Finally, we analyze the effects of hypernetwork size and hyperparameter sampling. Our experiments span within-modality and cross-modality as well as within-subject and cross-subject tasks.
Datasets. We use two large sets of 3D brain magnetic resonance (MR) images. The first is a multi-site dataset of 30,495 T1-weighted (T1w) scans gathered across 8 public datasets: ABIDE, ADHD200, ADNI, GSP, MCIC, PPMI, OASIS, and UK Biobank. We divide this dataset into train, validation, and test sets of sizes 10,000, 10,000, and 10,495, respectively. The second dataset involves a multi-modal collection of 1,558 T1w, T2-weighted (T2w), multi-flip-angle, and multi-inversion-time images gathered from in-house data and the public ADNI and HCP datasets. We divide this dataset into train, validation, and test sets of sizes 528, 515, and 515, respectively. All MRI scans are conformed to a 256×256×256 1-mm isotropic grid space, bias-corrected, and skull-stripped using FreeSurfer, and we also produce automated segmentation maps for evaluation. We affinely normalize and uniformly crop all images to 160×192×224 volumes.
Evaluation. For evaluation, we compute the volume overlap of anatomical label maps using the Dice metric.
Baseline Models. HyperMorph can be applied to any learning-based registration architecture, and we seek to validate its ability to capture the effects of hyperparameters on the inner registration network. To enable this insight, we train standard VoxelMorph models with architectures identical to the registration network as baselines, each with its own fixed set of hyperparameters.
We aim to evaluate whether a single HyperMorph model is capable of encapsulating a landscape of baseline models.
Setup. We first assess how the accuracy and computational cost of a single HyperMorph model compare to a standard grid search over the regularization weight λ. We separately train HyperMorph as well as VoxelMorph baselines using the similarity metrics MSE (scaled by a constant estimated image noise) and NCC (with a fixed window size) for within-modality registration and NMI for cross-modality registration, for which we train 13, 13, and 11 baseline models, respectively. We validate the trained networks on 100 random image pairs for visualization. For hyperparameter optimization after training, we use a subset of 20 pairs.
Additionally, we assess the ability of HyperMorph to learn the effect of multiple hyperparameters simultaneously. We first train a HyperMorph model treating the regularization weight λ and the local NCC window size as hyperparameters. We also train a semi-supervised HyperMorph model based on a subset of six labels, holding out six other labels for validation. In this experiment, the hyperparameters of interest are λ and the relative weight γ of the semi-supervised loss (3). Training baselines would require a two-dimensional grid search over 3D models and is computationally prohibitive. Consequently, we conduct these experiments in 2D on a mid-coronal slice, using baselines for 25 combinations of hyperparameter values.
|            | Robustness (Dice SD across inits) |         | Runtime (total GPU-hours) |                |
|------------|-----------------------------------|---------|---------------------------|----------------|
|            | MSE                               | NMI     | 1 hyperparameter          | 2 hyperparameters |
| HyperMorph | 1.97e-1                           | 2.46e-1 | 146.9 (32.0)              | 4.2 (0.6)      |
| Baseline   | 5.50e-1                           | 5.32e-1 | 765.3 (249.1)             | 44.0 (4.6)     |
Table 1: Comparison between HyperMorph and baseline grid-search techniques in terms of model variability across random initializations (left) and runtime (right), with standard deviations in parentheses.
Results. Computational Cost. A single HyperMorph model requires substantially less time to converge than a baseline-model grid search. For single-hyperparameter tests, HyperMorph requires about 5 times fewer GPU-hours than a grid search with baseline models (Table 1). For models with two hyperparameters, the difference is even more striking, with HyperMorph requiring more than 10 times fewer GPU-hours than the baseline models.
Performance. Figures 3 and 4 show that HyperMorph yields optimal hyperparameter values similar to those obtained from a dense grid of baseline models, despite its significant computational advantage: across the single-hyperparameter experiments, the optimal values identified by HyperMorph differ only marginally from those of the grid search, and the resulting differences in Dice are negligible. The multi-hyperparameter experiments yield similarly small Dice differences. In practice, fewer baselines might be trained at first for a coarser hyperparameter search, resulting either in a suboptimal hyperparameter choice or in a sequential search with significant manual overhead.
Overall, a single HyperMorph model is able to capture the behavior of a range of baseline models individually optimized for different hyperparameters, facilitating optimal hyperparameter choices and accuracy at a substantial reduction in computational cost. We emphasize that the goal of the experiment is not to compare HyperMorph to a particular registration tool, but to demonstrate the effect that this strategy can have on an existing registration network.
Setup. We evaluate the robustness of each strategy to network initialization. We repeat the previous single-hyperparameter experiment with MSE and NMI, retraining four HyperMorph models and four sets of baselines, each set comprising models trained for five values of the hyperparameter λ. For each training run, we re-initialize all kernel weights using Glorot uniform with a different seed. We evaluate each model using 100 image pairs and compare the standard deviation (SD) across initializations of the HyperMorph and baseline networks.
Results. Figure 5 shows that HyperMorph is substantially more robust (lower SD) to initialization than the baselines, suggesting that HyperMorph is less likely to converge to local minima. Across the entire range of λ, the average Dice SD for HyperMorph models trained with MSE is nearly 3 times lower than the baseline SD, and for NMI-trained models, the HyperMorph SD is more than 2 times lower than the baseline SD (Table 1). This result further emphasizes the computational efficiency provided by HyperMorph, since in typical hyperparameter searches, models are often trained multiple times for each hyperparameter value to negate potential bias from initialization variability.
Setup. Interactive Tuning. We demonstrate the utility of HyperMorph through an interactive tool that enables visual optimization of hyperparameters even if no segmentation data are available. The user can explore the effect of continuously varying hyperparameter values using a single trained model and choose an optimal deformation manually at high precision. Interactive tuning can be explored at http://voxelmorph.mit.edu.
Automatic Tuning. When anatomical annotations are available for validation, we demonstrate rapid, automatic optimization of the regularization weight λ across a variety of applications. In each experiment, we identify the optimal weight given 20 registration pairs and use 100 registration pairs for evaluation. First, we investigate how the optimal λ differs across subpopulations and anatomical regions. We train HyperMorph on a subset of image pairs across the entire T1w training set, and at validation we optimize λ separately for each of ABIDE, GSP, PPMI, and UK Biobank. With this same model, we identify the optimal λ separately for each of 10 anatomical regions. Second, we explore how the optimal λ differs between cross-sectional and longitudinal registration; for HyperMorph trained on both within-subject and cross-subject pairs from ADNI, we optimize λ separately for validation pairs within and across subjects.
Results. Figures 6 and 7 show that the optimal λ varies substantially across subpopulations, registration tasks, and anatomical regions. For example, PPMI and ABIDE require a significantly different value of λ than GSP and the UK Biobank. Importantly, with a suboptimal choice of hyperparameters, these datasets would have yielded considerably lower registration quality (Dice scores). The variability in the optimal hyperparameter values is likely caused by differences between the datasets: the average age of the ABIDE population is lower than those of the other datasets, while the PPMI scans are of lower quality. Similarly, cross-subject and within-subject registration require different levels of regularization. Finally, Figure 7 illustrates that the optimal λ varies by anatomical region, suggesting that regularization weights should be chosen by users depending on their tasks downstream from the registration. On average, the automatic hyperparameter optimization takes only minutes using 20 validation pairs.
The vast majority of existing registration pipelines assume a single hyperparameter value to be optimal for an entire dataset, or even across multiple datasets. Our results highlight the importance of HyperMorph as a rapid, easy-to-use tool for finding optimal hyperparameters, interactively or automatically, for different subpopulations, tasks, or even individual anatomical regions, without the need to retrain models.
We evaluate the impact of different hypernetwork sizes and hyperparameter sampling methods on HyperMorph accuracy. We carry out these experiments in the context of 3D registration, using MSE as the image-similarity term and evaluating models on 100 image pairs.
Setup. Hypernetwork Size. To evaluate the effect of hypernetwork capacity, we train four separate HyperMorph models, with 16, 32, 64, and 128 nodes at all hypernetwork layers, respectively, and validate model accuracy against the baseline results.
Hyperparameter Sampling. In our experiments, we observe that sampling regularization weights λ from a uniform distribution during HyperMorph training results in accurate estimations of baseline models for most of the hyperparameter range, but less accurate estimations at the extreme values λ = 0 and λ = 1 (values corresponding to similarity-only or regularization-only loss functions). To investigate whether even these extreme values can be approximated, we over-sample the end-point values of the hyperparameter at a fixed rate. To assess the influence of this rate on registration accuracy, we train and validate 3 separate HyperMorph models for different over-sampling rates and compare the final accuracy against VoxelMorph baselines.
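A sketch of such end-point over-sampling (the helper name and the rate value are illustrative): draw uniformly, but with a fixed probability replace the draw by one of the two end points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample lam uniformly on [0, 1], but with probability `rate` replace a
# draw by an end point in {0, 1}, so that extreme loss weightings are
# seen more often during training.
def sample_lam(rate, size):
    lam = rng.uniform(0.0, 1.0, size)
    ends = rng.random(size) < rate
    lam[ends] = rng.integers(0, 2, size)[ends].astype(float)
    return lam

lam = sample_lam(0.2, 20000)  # ~20% of draws land exactly on 0 or 1
```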
Results. Hypernetwork Size. Figure 8A shows that HyperMorph registration accuracy increases with hypernetwork size up to a point: a hypernetwork with 64 or more nodes per layer is sufficient for learning the effect of the regularization weight in 3D registration. Surprisingly, we find essentially no difference in total training or inference time across hypernetwork sizes. We use 64 nodes per hypernetwork layer in the previous experiments.
Hyperparameter Sampling. HyperMorph models trained with large over-sampling rates closely match the expected registration accuracy at the end-point values of λ but sacrifice registration accuracy across the rest of the range of λ (Figure 8B). For example, without over-sampling, the deviation from the baseline Dice is largest at the end points, whereas aggressive over-sampling reduces the end-point deviation at the cost of accuracy at intermediate values. We emphasize that over-sampling is only necessary to estimate appropriate representations at extreme hyperparameter values, and, in most cases, uniform sampling will suffice. We use an intermediate over-sampling rate in our previous experiments.
The accuracy of deformable image registration algorithms greatly depends upon the choice of hyperparameters. In this work, we present HyperMorph, a learning-based strategy that removes the need to repeatedly train models to quantify the effects of hyperparameters on model performance. HyperMorph employs a hypernetwork that takes the desired hyperparameter as input and predicts the parameters of a registration network tuned to that value. In contrast to existing learning-based methods, HyperMorph estimates optimal deformation fields for arbitrary image pairs and any hyperparameter value from a continuous interval by exploiting weight-sharing across the landscape of registration networks. A single HyperMorph model then enables fast hyperparameter tuning at test-time, requiring dramatically less compute and human time compared to existing methods. This is a significant advantage over registration frameworks that are optimized across discrete, predefined hyperparameter values to find the optimal configuration.
We demonstrate that a single HyperMorph model facilitates discovery of continuous optimal hyperparameter values for different dataset subpopulations, registration tasks, or even individual anatomical regions. This last result indicates a potential benefit and future direction of estimating a spatially varying field of smoothness hyperparameters for simultaneously optimal registration of all anatomical structures. HyperMorph also provides the flexibility to identify the ideal hyperparameter for an individual image pair. For example, a pair of subjects with very different anatomies would benefit from weak regularization allowing warps of high non-linearity. We believe HyperMorph will drastically alleviate the burden of retraining networks with different hyperparameter values and thereby enable efficient development of finely optimized models for image registration.
Support for this research was provided in part by the BRAIN Initiative Cell Census Network grant U01MH117023, the National Institute for Biomedical Imaging and Bioengineering (P41EB015896, 1R01EB023281, R01EB006758, R21EB018907, R01EB019956, P41EB030006), the National Institute on Aging (1R56AG064027, 1R01AG064027, 5R01AG008122, R01AG016495), the National Institute of Mental Health (R01 MH123195), the National Institute for Neurological Disorders and Stroke (R01NS0525851, R21NS072652, R01NS070963, R01NS083534, 5U01NS086625, 5U24NS10059103, R01NS105820), the NIH Blueprint for Neuroscience Research (5U01-MH093765), part of the multi-institutional Human Connectome Project, the Eunice Kennedy Shriver National Institute of Child Health and Human Development (K99HD101553), Shared Instrumentation Grants 1S10RR023401, 1S10RR019307, and 1S10RR023043, and the Wistron Corporation. In addition, BF has a financial interest in CorticoMetrics, and his interests were reviewed and are managed by Massachusetts General Hospital and Partners HealthCare in accordance with their conflict of interest policies.
Bajcsy, R., Kovacic, S.: Multiresolution elastic matching. Computer Vision, Graphics, and Image Processing 46, 1–21 (1989)
Cao, Y., Miller, M.I., Winslow, R.L., Younes, L.: Large deformation diffeomorphic metric mapping of vector fields. IEEE TMI 24(9), 1216–1230 (2005)
Domhan, T., Springenberg, J.T., Hutter, F.: Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves. In: Twenty-Fourth International Joint Conference on Artificial Intelligence (2015)
Glocker, B., Komodakis, N., Tziritas, G., Navab, N., Paragios, N.: Dense image registration through MRFs and efficient linear programming. MedIA 12(6), 731–741 (2008)
Wu, G., Kim, M., Wang, Q., Munsell, B.C., Shen, D.: Scalable high-performance image registration framework by unsupervised deep feature representations learning. IEEE Transactions on Biomedical Engineering 63(7), 1505–1516 (2015)