Leveraging Unsupervised Image Registration for Discovery of Landmark Shape Descriptor

by Riddhish Bhalodia, et al.

In current biological and medical research, statistical shape modeling (SSM) provides an essential framework for the characterization of anatomy/morphology. Such analysis is often driven by the identification of a relatively small number of geometrically consistent features found across the samples of a population. These features can subsequently provide information about the population shape variation. Dense correspondence models can provide ease of computation and yield an interpretable low-dimensional shape descriptor when followed by dimensionality reduction. However, automatic methods for obtaining such correspondences usually require image segmentation followed by significant preprocessing, which is taxing in terms of both computation and human resources. In many cases, the segmentation and subsequent processing require manual guidance and anatomy-specific domain expertise. This paper proposes a self-supervised deep learning approach for discovering landmarks from images that can directly be used as a shape descriptor for subsequent analysis. We use landmark-driven image registration as the primary task to force the neural network to discover landmarks that register the images well. We also propose a regularization term that allows for robust optimization of the neural network and ensures that the landmarks uniformly span the image domain. The proposed method circumvents segmentation and preprocessing and directly produces a usable shape descriptor using just 2D or 3D images. In addition, we propose two variants of the training loss function that allow prior shape information to be integrated into the model. We apply this framework to several 2D and 3D datasets to obtain their shape descriptors, and analyze their utility for various applications.




1 Introduction

Statistical shape modeling (SSM) is an indispensable tool for the analysis of anatomy and biological structures. Such models can be viewed as a composite of two distinct steps: shape representation and shape analysis. Shape representation is a quantifiable description of the shape/structure of a sample from a population of anatomies (usually given as a cohort of images or surface meshes) that is consistent with the population statistics and is easy to use for subsequent analysis. There are two prominent families of algorithms for shape representation: (i) landmarks, which express shapes as point clouds that define an explicit correspondence map from one shape to another using invariant points across populations that vary in their form, and (ii) deformation fields, which rely on transformations between images to encode implicit shape information. Shape analysis then uses these shape representations to analyze the population's statistics; in most cases, the representation is projected onto a low-dimensional space via principal component analysis (PCA). This low-dimensional representation is used as a shape descriptor for subsequent shape analysis. Beyond the analysis of the different modes of shape variation captured by this descriptor, it can also be utilized in other applications. For instance, the shape descriptor can serve as features to perform classification of different morphological classes [27], can quantify the severity of a particular deformity [5], or can be employed to interpret and discover shape characteristics that are associated with a particular disease [12]. We consider such downstream applications, which depend on how well the shape descriptors characterize the given shape, to showcase the efficacy of the shape descriptor.

Due to their simplicity and computational efficiency, correspondence-based models are the most prominently used models for shape representation. Correspondences are landmarks on the anatomy that are geometrically consistent across the samples of the population. In the earliest works [43], correspondence was achieved by handpicked landmarks corresponding to distinguishable features. The field has come a long way, with many state-of-the-art correspondence discovery algorithms [41, 14]. However, many of these algorithms require segmentation of the anatomy from images as well as heavy pre-processing. Such segmentation and/or pre-processing often come with a significant computational overhead as well as human resource costs. Segmentation of some anatomies is prone to subjective decisions and hence requires domain expertise. These problems prevent automated correspondence discovery models from being fully end-to-end, i.e., an automated pipeline that, at inference, inputs just images to produce shape descriptors for analysis.

In recent years, deep learning and neural network models have had a significant impact on both image registration and shape analysis. With their ability to learn complex functions, several methods [6, 36] have proposed learning correspondence from images, bypassing the need for segmentation and preprocessing. However, these methods are supervised and data-hungry: they require considerable training data with correspondences, which is not always available in clinical applications. They also need anatomy segmentation and preprocessing for the training set, which might not be readily available. Deep networks have also played an essential role in developing computationally fast and unsupervised learning-based algorithms for image registration (e.g., [3]) that perform equivalently to state-of-the-art, optimization-based registration methods. However, transformations are not as friendly as correspondences for shape analysis; they often require the development of a fixed atlas [30]. The systems that process image-to-image transformations express shape information in a high-dimensional space. Typically, for shape analysis, a low-dimensional space is preferred, and therefore, these representations are projected onto a low-dimensional space via PCA (or some equivalent for nonlinear spaces), and the modes of shape variation need to be analyzed by domain experts to check their usability in downstream applications.

To address the above-stated challenges, we propose an end-to-end system for extracting a shape descriptor from only a population of input images. Ideally, this shape descriptor would not require any post-processing for subsequent analysis. This paper proposes a self-supervised deep learning approach for landmark discovery that uses image registration as the primary task. The proposed method alleviates the need for segmentation and heavy preprocessing (even during model training) to obtain a landmark-based shape descriptor. The discovered landmarks are relatively low in number; hence, they can be directly used for shape analysis and bypass the post-processing required to convert the representation into a low-dimensional space. The work presented here is an extension of the preliminary work presented in [8]. This work significantly extends and improves on the previous paper in the following ways:

  • Additional experiments, results, and analysis on several different datasets with associated downstream applications for shape descriptors.

  • We propose two different model variants that can incorporate prior information about shape into the model during training and can implicitly enforce the landmarks to encode such information.

  • We propose an additional image matching loss function that preserves local structure and allows for cross-modality registration or usage of datasets with large intensity variations.

2 Related Work

Since the groundbreaking work of D'Arcy Thompson [43], who utilized manually placed landmarks to study variations in the shapes of fishes, statistical shape modeling (SSM) has become an indispensable tool for medical researchers and biologists. SSM finds applications in various fields such as cardiology [21], neurology [22], growth modeling [17], orthopaedics [26], and instrument design [23]. Shape representation for SSM can be achieved via explicit representation of points on surfaces [18, 40], direct usage of surface meshes or distance transforms [35] or their features [11], or, implicitly, via functional maps [37] or deformation fields [4].

Correspondence-based models, or particle distribution models (PDMs) [24], place a dense set of particles onto the shapes' surfaces. Automatic PDM algorithms rely on non-linear optimization that reduces the complexity of the generative model [14, 18]. In most cases, PCA is used to project the high-dimensional shape space to a low-dimensional shape descriptor [43, 5]. Since these algorithms require heavy pre-processing/segmentation, deep learning has been used to learn correspondences directly from a population of 2D/3D images [6, 36]. These methods, being supervised, still incur pre-processing overhead during training and also need large datasets or data-augmentation methods to learn effectively. Neither requirement is easily met in many cases, especially with medical data.

Also relevant is the work on deformable registration of images, which has been used as a tool for shape representation [4] and atlas building [30]. The implicit deformation fields are hard to interpret. Therefore, having a fixed atlas to which each image in the population is deformed helps standardize the fields. Deep learning-based unsupervised registration (e.g., [3]) has attracted a lot of attention in recent years. This unsupervised registration framework has been extended to model diffeomorphic registration [16], construct image atlases [15], and leverage empirical information about the shape population [7].

Another relevant body of work stems from the computer vision literature that uses image alignment to obtain dense feature maps [19, 39], in a similar vein to the widely popular scale-invariant feature transform (SIFT) [34] features. Other works focus on utilizing convolutional neural networks (CNNs) to learn surface features and use them to obtain correspondence points [10] or as shape features for subsequent correspondence optimization [1]. All these works concentrate on discovering surface/shape features using CNNs, whereas our work proposes an unsupervised approach for landmark discovery.

3 Methods

This section covers the necessary background for statistical shape modeling and image registration, the proposed model architecture and training, loss functions and optimization, and generalized model variants.

3.1 Shape Analysis and Image Registration

Statistical shape modeling (SSM) can be broadly categorized into two parts: (i) shape representation and (ii) shape analysis. Shape representation entails taking the raw data (which can be in the form of images, meshes, label maps, etc.) and expressing it in a usable, quantifiable form for subsequent shape analysis. Shape analysis then finds relevant statistics from the shape representation pertinent to the downstream application. To reiterate, in this paper, we refer to downstream applications as the applications that utilize the shape descriptors to perform a task on given data, for example, using shape descriptors as features for shape classification. In this paper, we restrict our shape representation to be in the form of point correspondences, which are a geometrically consistent set of 2D/3D points across the population of shapes. Hence, each shape from a population of shapes can be expressed via $X_i \in \mathbb{R}^{M \times d}$, where $M$ is the number of landmarks/correspondences per shape and $d$ is the space dimension ($d = 2, 3$ for 2D and 3D shapes, respectively). We can use these $X_i$'s for shape analysis, which usually involves performing PCA and using the low-dimensional representation for analysis of shape modes of variation.
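As a quick sketch of this last step, PCA on flattened landmark vectors can be computed via the SVD. The function name and toy data below are our own illustration, not part of the proposed method:

```python
import numpy as np

def pca_descriptor(X, k):
    # X: (N, M*d) flattened landmark sets; returns k-dim shape descriptors,
    # the mean shape, and the top-k principal directions
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return (X - mu) @ Vt[:k].T, mu, Vt[:k]

# toy population: M = 10 landmarks in d = 2, one dominant mode of variation
rng = np.random.default_rng(3)
base, mode = rng.random((1, 20)), rng.random((1, 20))
shapes = base + 0.1 * rng.standard_normal((30, 1)) * mode

Z, mu, W = pca_descriptor(shapes, k=2)
assert Z.shape == (30, 2)                          # one 2-vector per shape
assert np.allclose(mu + Z @ W, shapes, atol=1e-6)  # descriptor reconstructs
```

Each row of `Z` is then the low-dimensional shape descriptor used for subsequent analysis.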

Landmarks also play an important role in image registration. A common underlying assumption in image registration is that a well-registered image will also match the landmarks placed at anatomically relevant features. Furthermore, landmarks can be used to perform image registration; for instance, radial basis functions (RBFs) can be used to parametrize the image deformation based on landmarks (used as control points) provided on the source and the target images. Mathematically, if $T$ represents the image transformation, we can model the deformation as follows:

$$T(x) = Ax + b + \sum_{i=1}^{L} w_i\, \phi(\|x - c_i\|)$$

Here, $\phi$ represents the RBF used, $x$ represents the coordinates of the image, and $c_i$ represents the control points. If we are given the control points on the source and target images, we can solve the linear system of equations to find the parameters $\{A, b, w_i\}$ and can apply the transformation to the entire image coordinate grid. The transformed coordinates are interpolated to obtain the warped image from the source image.
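To make the landmark-driven warp concrete, here is a minimal 2D thin-plate-spline fit in NumPy under the formulation above. Function names and test points are our own; the paper solves the same kind of bordered linear system:

```python
import numpy as np

def tps_kernel(r):
    # thin-plate spline basis phi(r) = r^2 log r, with phi(0) = 0
    with np.errstate(divide="ignore", invalid="ignore"):
        k = r ** 2 * np.log(r)
    return np.nan_to_num(k)

def fit_tps(src, dst):
    # solve the bordered linear system for a 2D TPS warp src -> dst
    n = src.shape[0]
    r = np.linalg.norm(src[:, None, :] - src[None, :, :], axis=-1)
    P = np.hstack([np.ones((n, 1)), src])       # affine part [1, x, y]
    A = np.zeros((n + 3, n + 3))
    A[:n, :n], A[:n, n:], A[n:, :n] = tps_kernel(r), P, P.T
    b = np.vstack([dst, np.zeros((3, 2))])
    return np.linalg.solve(A, b)                # (n+3, 2): RBF weights, affine

def apply_tps(params, src, pts):
    # evaluate T(x) = Ax + b + sum_i w_i phi(||x - c_i||) at query points
    n = src.shape[0]
    w, a = params[:n], params[n:]
    r = np.linalg.norm(pts[:, None, :] - src[None, :, :], axis=-1)
    return tps_kernel(r) @ w + np.hstack([np.ones((len(pts), 1)), pts]) @ a

src = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [.5, .5]])
dst = src + np.array([0.1, -0.05])   # pure translation of the control points
params = fit_tps(src, dst)
# the warp interpolates the control points exactly
assert np.allclose(apply_tps(params, src, src), dst, atol=1e-6)
```

Applying `apply_tps` to a dense coordinate grid gives the transformed coordinates at which the source image is interpolated.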

Figure 1: Network Architecture

3.2 Model Description

Here, we propose a model to obtain anatomically relevant landmarks directly from images. To achieve this, we rely on the assumption that a good image registration between a pair of images should indicate good anatomical feature correspondence. This assumption perfectly ties in with self-supervised learning, where we use image registration as a primary task and, in turn, obtain key landmark locations on the input images.

The resulting architecture is broken into three components, broadly described as follows:

  1. Landmark Encoder: This is a CNN that operates on an image and outputs a vector of landmark points. Since we work with the source ($I_S$) and target ($I_T$) images, the landmark encoder is used twice but shares weights, similar to a Siamese architecture [33].

  2. RBF Linear Solver: We use the landmarks to construct a linear system matrix $A$ and its associated output vector $b$. For clarification, we note here that the landmarks discovered by the landmark encoder function as control points for the RBF-based registration. For the transformation parameters $w$, we have $Aw = b$. The RBF linear solver module consists of formulating this system of equations and solving it to find the transformation parameters. The matrix construction and linear solver are described in 0.A.

  3. Spatial Warp Module: We use the transformation parameters to interpolate the source image and obtain the registered image $I_R$. This can be easily performed using a spatial transformer unit [28].

A detailed description of the architecture with layer descriptions is given in 0.B. Throughout this work, we perform all of our experiments using the thin-plate spline (TPS) as the kernel basis function, i.e., $\phi(r) = r^2 \log r$. The network architecture is described in Figure 1.
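The spatial warp module then amounts to sampling the source image at the transformed coordinates. Below is a simplified NumPy stand-in for the spatial transformer's bilinear sampling, our own sketch rather than the paper's implementation:

```python
import numpy as np

def bilinear_sample(I, coords):
    # sample image I at fractional (row, col) coords, clamped at the border
    H, W = I.shape
    r = np.clip(coords[..., 0], 0, H - 1)
    c = np.clip(coords[..., 1], 0, W - 1)
    r0, c0 = np.floor(r).astype(int), np.floor(c).astype(int)
    r1, c1 = np.minimum(r0 + 1, H - 1), np.minimum(c0 + 1, W - 1)
    fr, fc = r - r0, c - c0
    top = I[r0, c0] * (1 - fc) + I[r0, c1] * fc
    bot = I[r1, c0] * (1 - fc) + I[r1, c1] * fc
    return top * (1 - fr) + bot * fr

rng = np.random.default_rng(4)
I = rng.random((6, 6))
grid = np.stack(np.meshgrid(np.arange(6.), np.arange(6.), indexing="ij"), -1)
# warping with the identity transform returns the image unchanged
assert np.allclose(bilinear_sample(I, grid), I)
```

In the model, `coords` would be the TPS-transformed coordinate grid, and the sampling is differentiable with respect to it, which is what allows end-to-end training.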

3.3 Loss Function and Regularization

The training loss function of the proposed network can be given as follows:

$$\mathcal{L} = \mathcal{L}_{match}(I_T, I_R) + \lambda\, \mathcal{L}_{reg}(A)$$

The first term is the image matching or registration loss between the target image and the registered (i.e., warped source) image. The second term is the regularization of the registration system, applied to the matrix $A$ from the RBF linear solver module. We describe both terms in detail in the following paragraphs.

3.3.1 Image Matching Loss

In the classical literature on image registration, there is a research emphasis on the interpretation as well as the effect of utilizing different loss functions in the context of image matching optimization [42]. In the context of deep learning, the methods are restricted to loss functions that allow for back-propagation. The two most commonly used image-matching functions in deep learning-based image registration are the $\ell_2$ and the normalized cross-correlation (NCC) losses. These are given as follows:

$$\mathcal{L}_{\ell_2}(I_T, I_R) = \|I_T - I_R\|_2^2$$

$$\mathcal{L}_{NCC}(I_T, I_R) = -\sum_{x \in \Omega} \mathrm{NCC}\big(P_{I_T}(x), P_{I_R}(x)\big)$$

Here, $P_I(x)$ represents an image patch on image $I$ centered at voxel location $x$ with a patch size of $p$. $\Omega$ denotes the set of all patches/possible patch centers. The function $\mathrm{NCC}$ represents the normalized cross-correlation between two patches.
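A direct (unvectorized) reading of the NCC loss above, sketched in NumPy; the per-patch normalization is what makes the loss invariant to affine intensity changes (helper names are ours):

```python
import numpy as np

def ncc(p, q, eps=1e-8):
    # normalized cross-correlation between two patches
    p, q = p - p.mean(), q - q.mean()
    return (p * q).sum() / (np.linalg.norm(p) * np.linalg.norm(q) + eps)

def ncc_loss(I_t, I_r, p=3):
    # negative mean NCC over all p x p patch positions (2D, dense patches)
    h, w = I_t.shape
    scores = [ncc(I_t[i:i + p, j:j + p], I_r[i:i + p, j:j + p])
              for i in range(h - p + 1) for j in range(w - p + 1)]
    return -float(np.mean(scores))

rng = np.random.default_rng(0)
I = rng.random((8, 8))
J = rng.random((8, 8))
assert np.isclose(ncc_loss(I, 2.0 * I + 1.0), -1.0)  # affine-intensity invariant
assert ncc_loss(I, J) > ncc_loss(I, I)               # mismatch costs more
```

A practical implementation would vectorize the patch extraction, but the loss value is the same.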

Both the $\ell_2$ loss and NCC work on the pixel intensities and work well when the data population has a consistent intensity profile. CT scans are a good example of images with a consistent intensity profile. However, in datasets such as cardiac MRI, the intensity histograms of individual MRI scans are highly variable, and these intensity-driven loss functions will fail to capture structural matching. These losses will also fail when the input data comes from two different sources or modalities, such as two different scanners/centers or a dataset containing both T1-weighted and T2-weighted MRI images. In such scenarios, pixel intensities are not the correct measure to quantify image matching, and we need losses that can capture structural correlation. Therefore, we also use the modality independent neighborhood descriptor (MIND) features to formulate a registration loss; several other recent registration works have used MIND features as a loss [45]. MIND features rely on image patches; for a given image $I$ at a pixel/voxel location $x$, its image patch is denoted as $P_I(x)$, with $p$ being the patch size. The MIND feature for an image at a pixel/voxel is given as:

$$\mathrm{MIND}(I, x, r) = \frac{1}{Z} \exp\left(-\frac{D\big(P_I(x), P_I(x+r)\big)}{V(I, x)}\right)$$

Here, $r$ is the displacement vector, $D$ is the distance between two patches, and $V(I, x)$ is the local variance of an image patch. The match loss function using these MIND features is given as:

$$\mathcal{L}_{MIND}(I_T, I_R) = \frac{1}{|\Omega||R|} \sum_{x \in \Omega} \sum_{r \in R} \big|\mathrm{MIND}(I_T, x, r) - \mathrm{MIND}(I_R, x, r)\big|$$

$\Omega$ is the set of voxel locations and $R$ is the set of displacements used. We generally use a set of displacement vectors describing a local neighborhood (such as the 4-neighborhood for 2D images or the 6-neighborhood for 3D). The loss function is parametrized by (i) the patch size $p$, and (ii) the patch distance $D$. In all of our experiments using the MIND loss, both in 2D and 3D, we use an isotropic patch of size 3, with the local-neighborhood displacement sets described above.
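A simplified 2D sketch of the MIND descriptor under the definition above, with the patch distance D taken as the mean squared difference and the local variance estimated from the neighborhood distances. This is a common simplification; the names and details here are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def patch(I, x, y, p=3):
    # square patch of side p centered at (x, y); assumes interior pixels
    h = p // 2
    return I[x - h:x + h + 1, y - h:y + h + 1]

def mind(I, x, y, offsets, p=3, eps=1e-8):
    # simplified MIND descriptor at (x, y): one entry per displacement r
    c = patch(I, x, y, p)
    d = np.array([((c - patch(I, x + dx, y + dy, p)) ** 2).mean()
                  for dx, dy in offsets])     # patch distances D
    v = d.mean() + eps                        # local variance estimate V(I, x)
    f = np.exp(-d / v)
    return f / f.sum()                        # the 1/Z normalization

offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # 4-neighborhood displacements
rng = np.random.default_rng(1)
I = rng.random((16, 16))
# the descriptor is invariant to affine intensity changes, unlike raw pixels
assert np.allclose(mind(I, 8, 8, offsets), mind(3.0 * I + 5.0, 8, 8, offsets),
                   atol=1e-6)
```

Comparing such descriptors between the target and registered images, rather than raw intensities, is what makes the MIND loss usable across scanners and modalities.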

3.3.2 Regularization

The linear system used to solve for the RBF warp parameters requires that the matrix $A$ is non-singular. However, the positions of the landmarks coming from the landmark encoder are arbitrary. Hence, during the optimization, the matrix $A$ can be poorly conditioned or even singular. A singular matrix has an infinite condition number, and a poorly conditioned matrix has a large condition number. Such a scenario can result in an infinite number of solutions to the linear system and can cause the optimization to break. To ensure stable optimization, we introduce a regularization term that minimizes the condition number of the matrix $A$, given as:

$$\mathcal{L}_{reg}(A) = \|A\|_F\, \|A^{-1}\|_F$$

$\|\cdot\|_F$ denotes the Frobenius norm of the matrix; this allows us to easily differentiate the regularizer. Minimizing the condition number of $A$ provides an additional benefit. We note that a poorly conditioned $A$ occurs when one or more pairs of landmarks (used for the construction of $A$) are very close together. Hence, making the matrix well-conditioned will force the landmarks to spread out across the image, allowing us to span larger regions. This effect is showcased in Figure 2, demonstrated on diatoms as a simple example. One can note that landmarks spread along the diatom boundary are achieved even with no regularization. However, we see many landmarks lying close together, which can cause the optimization to break. This is indeed the case, as the optimization without regularization is observed to break (generate NaNs) because of the singularity of the kernel matrix $A$. With points close together, the overall shape descriptor is not informative. The regularized version spreads landmarks more uniformly over the image domain. The landmarks that are far from the boundary can be removed by the redundancy removal described in the next section. The effect of regularization on the registration loss is described in the results section 4.1.
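The regularizer's effect can be checked numerically: nearly coincident control points blow up the Frobenius condition number of the bordered TPS matrix. A small NumPy sketch follows; our construction of the matrix follows the standard TPS system and may differ in detail from Appendix 0.A:

```python
import numpy as np

def tps_system_matrix(pts):
    # bordered TPS kernel matrix (our stand-in for the matrix A of Sec. 3.2)
    n = len(pts)
    r = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
    with np.errstate(divide="ignore", invalid="ignore"):
        K = np.nan_to_num(r ** 2 * np.log(r))
    P = np.hstack([np.ones((n, 1)), pts])
    A = np.zeros((n + 3, n + 3))
    A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.T
    return A

def frob_condition(A):
    # differentiable surrogate: ||A||_F * ||A^{-1}||_F
    return np.linalg.norm(A, "fro") * np.linalg.norm(np.linalg.inv(A), "fro")

spread = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [.5, .5]])
clumped = spread.copy()
clumped[4] = [1e-3, 1e-3]   # nearly coincident with the first landmark
# clustered landmarks drive the condition number up; penalizing it pushes
# the landmarks to spread over the image domain
assert frob_condition(tps_system_matrix(clumped)) > \
       frob_condition(tps_system_matrix(spread))
```

In training, the same quantity would be computed with automatic differentiation so its gradient flows back to the landmark encoder.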

Figure 2: Effect of regularization (right), where the top image is un-regularized and the bottom one is regularized. Effect of redundancy removal (left), where top figure is before redundancy removal and bottom figure is after.

3.4 Training Procedure

The entire model is trained jointly with Adam [32], and network parameters are initialized randomly in the showcased experiments. However, one can imagine a warm-start initialization for the output layer of the landmark encoder, for instance, using the mean landmarks from a pre-computed PDM on the population. The choice of the hyper-parameter $\lambda$ controls the amount of regularization and can be chosen via cross-validation. For a given set of images, the training is performed on a set of image pairs. For smaller datasets, we use all possible image pairs, and for larger datasets, we can either randomly choose image pairs or employ a sampling heuristic. As the model is trained on pairs of images, even small datasets (as small as ~50 images) can be used effectively for training the model. Such low-resource learning is imperative for medical imaging, where data is scarce and often insufficient to train neural networks effectively.

In this work, we treat the number of landmarks to be predicted as a hyper-parameter. As the landmark locations can be arbitrary, there could be landmarks that are redundant in characterizing the structure of interest. To remove such landmarks, we use a simple heuristic based on the assumption that removing a redundant landmark will not affect the registration. Using this heuristic, we compute the change in mean registration accuracy from removing one landmark at a time (this is performed on an already trained model as post-processing). This difference is the importance value attached to each landmark: the more significant the difference, the more important the landmark is for performing accurate registration. We remove landmarks in a greedy fashion, that is, we follow these steps:

  1. For the remaining set of landmarks, compute the registration loss. Initially, the set of remaining landmarks is the same as the starting set of landmarks.

  2. Temporarily remove one landmark at a time and compute the registration loss and its difference from the one computed in step 1. Do this for all remaining landmarks.

  3. Select the landmark with the least difference (least importance) and remove it.

  4. Repeat steps 1-3 until either the desired number of landmarks is reached or a difference threshold is exceeded.

This removal allows for a smaller and more informative landmark-based shape descriptor, and the effect is shown in Figure 2.
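The greedy procedure above can be sketched as follows; `reg_loss` is a hypothetical stand-in for evaluating the trained model's registration loss on a subset of landmark indices:

```python
import numpy as np

def greedy_prune(landmarks, reg_loss, keep, tol=np.inf):
    # steps 1-4: repeatedly drop the landmark whose removal changes the
    # registration loss the least (its "importance"), until `keep` remain
    # or every removal would raise the loss by more than `tol`
    remaining = list(range(len(landmarks)))
    while len(remaining) > keep:
        base = reg_loss(remaining)                       # step 1
        diffs = [(reg_loss([j for j in remaining if j != i]) - base, i)
                 for i in remaining]                     # step 2
        diff, victim = min(diffs)                        # step 3
        if diff > tol:                                   # step 4
            break
        remaining.remove(victim)
    return remaining

# toy loss: only landmarks 0-2 matter for registration; 3 and 4 are redundant
def toy_loss(idx):
    return sum(10.0 for i in (0, 1, 2) if i not in idx)

assert sorted(greedy_prune(np.zeros((5, 2)), toy_loss, keep=3)) == [0, 1, 2]
```

In practice `reg_loss` would re-run the RBF solve and warp with only the selected landmarks and average the matching loss over validation pairs.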

Regularization and redundancy removal: The regularization term tends to spread the particles evenly across the image and is applied as a soft constraint alongside the image matching loss. The regularization acts in conjunction with the registration loss, i.e., if a feature in an image exhibits a higher registration loss, the particles will be distributed to match that feature better. In cases where the anatomy of interest is localized, a lower registration loss in that region would cause the redundancy removal to disregard the spread-out particles. In such scenarios, the registration loss must be spatially weighted to introduce a preference for the localized anatomy; this model variant is introduced in the following section. Furthermore, the redundancy removal needs to be applied carefully, with quality control, so as not to remove particles from the region of interest. We can also modify the redundancy removal process to only look at selective regions of the image while computing the registration accuracy.

Note on Training and Inference Time: We trained the network (2D architecture described in Appendix 0.B) with 30 2D landmarks on a dataset of 100 toy images (that is, 10,000 pairs, the actual training size). We utilize a single 12GB NVIDIA TITAN V GPU. The training time for one epoch with a batch size of 20 is 6.2 minutes, and the inference time for a single scan is 0.4 seconds. Another important aspect is the GPU memory requirement, which for this experiment is 4600MB using the PyTorch deep learning framework.

3.5 Model Variants

Weak Supervision Variant: The proposed model can be easily modified for a weakly supervised learning framework to introduce a prior that informs the landmark positioning via changes to its loss function. In many cases, medical data can be presented along with some form of shape description. The most common of these forms is via segmentation of the anatomy of interest. However, in a typical scenario, only a limited number of data have segmentation associated with the image. In such cases, segmentation can be used to improve the landmark positioning by the following changes to the model:

  • We use the learned transformation parameters to transform a source segmentation image ($S_S$) to the registered segmentation ($S_R$).

  • We introduce a matching function between the registered segmentation and the target segmentation ($S_T$) and optimize the model with the updated loss. This loss term is only activated when both the source and target segmentations are present, providing weak supervision for the landmark (shape descriptor) discovery task.

The loss function can thus be expressed as:

$$\mathcal{L} = \mathcal{L}_{match}(I_T, I_R) + \lambda\, \mathcal{L}_{reg}(A) + \mathbb{1}[S_S, S_T]\, \mathcal{L}_{match}(S_T, S_R)$$

Here, $\mathbb{1}[S_S, S_T]$ is an indicator variable that is 1 when both $S_S$ and $S_T$ exist and zero otherwise.

There are two other aspects to note here: (i) the inputs to the landmark encoder are still images, and therefore, during testing, we do not need an additional segmentation input, and (ii) instead of binary segmentations, any other form of shape information that can be deformed can be used as well, such as signed distance transforms or correspondences.

Localized Variant: In some instances, the anatomical area of interest is localized, and we would like the landmarks to be expressive of that particular location. For instance, in CT scans of the femur, the diagnosis and characterization of cam-type femoroacetabular impingement [38] is done by analyzing the localized region below the femoral head and how it differs from a representative femur shape or that of a healthy patient. Such localized characterization requires the shape descriptor to contain sufficient information about this localized anatomy of interest. To achieve this, we again assume that image registration drives landmark location, i.e., for the landmarks to be more expressive of the localized region, we want to achieve the best image registration in that region. Therefore, we propose a simple modification to the loss function:

$$\mathcal{L} = \mathcal{L}_{match}(B \odot I_T,\, B \odot I_R) + \lambda\, \mathcal{L}_{reg}(A)$$

Here, $B$ represents a mask covering the localized region of interest, and $\odot$ denotes element-wise multiplication. In a special case, these masks are fixed (i.e., the same) for the given dataset if the anatomy of interest occupies a common space across the image population, i.e., the images are roughly aligned.

4 Results

This section shows the results of the proposed methods on different 2D/3D datasets and is divided into subsections corresponding to each dataset. We also demonstrate the usefulness of the landmark-based shape descriptor obtained in each case, paired with a downstream application. This section also includes an analysis of regularization, redundancy removal, and the application of the different proposed framework variants. In most cases, the number of epochs is chosen via early stopping based on the best validation loss.

4.1 Diatom Dataset

Diatoms are a biological group of algae found in different water bodies and fossilized deposits. Diatoms are unicellular and are categorized into different classes based on their shape/structure. Any characterization of diatoms is based on their size and shape; therefore, this is a perfect dataset for applying the proposed method and finding landmark-based shape descriptors. The diatoms dataset (downloaded from the DIADIST project page: rbg-web2.rbge.org.uk/DIADIST/) contains 2D isolated cellular images from four different diatom morphologies, namely, Eunotia (68 samples), Fragilariforma (100 samples), Gomphonema (100 samples), and Stauroneis (72 samples). This dataset was collected as part of the automatic diatom identification and classification project [20]. The data is split 80%/10%/10% into training, validation, and testing sets.

We train the proposed network with the $\ell_2$ loss for image matching, with a regularization parameter found using cross-validation (as described ahead), and with 16 landmarks. We also keep four pre-determined landmarks on the corners while computing the warp; these are not learned by the network. We train the network (using the 2D image architecture given in 0.B) for 20 epochs on all possible image pairs, with no additional data augmentation. As a post-processing step, we perform the redundancy removal described in Section 3.4 to retain 11 landmarks. The results shown in Figure 3 highlight the structural correspondence between different diatom classes. We can notice that some of the landmarks are not precisely on the border of the shape. Such positioning arises from the fully unsupervised training of the model with respect to the landmarks. The network has no prior on how and where to place the landmarks, and hence, the placement of the landmarks is the result of achieving the best possible registration loss. For instance, the placement of landmark number 9 in Figure 3 is dictated by the registration error: placing it inside the image border, in a region of reasonably similar intensity, reduces the image matching loss.

Downstream Application: These landmarks are easy to use as shape-based features of diatoms. We perform spectral clustering [44] using these landmarks as features, and it separates the classes into clusters well, except for Fragilariforma and Eunotia, which exhibit very similar shapes when their scales match. For classification, a multi-layer perceptron (with a single hidden layer) can distinguish between these four classes using the landmarks as inputs with 100% test accuracy.

Figure 3: Diatoms results. The left images show landmarks (after redundancy removal) on test images from the four different classes; the landmarks are in correspondence. The top-right plot shows the effect of the regularization parameter on the registration accuracy. The two scatter plots on the bottom right show the results of performing unsupervised clustering using landmarks as features, compared to ground-truth labels.

Regularization parameter: Using this simple dataset, we also showcase the selection of the regularization parameter via cross-validation. We perform three-fold cross-validation with different values of $\lambda$ and compute the average registration loss at every fold. The plot for this experiment is shown on the top right of Figure 3. It highlights that there is an optimal $\lambda$ that minimizes the registration loss, which is a notable result. Since the regularization does not act on the network parameters or limit the space of solutions, there is no guarantee that generalization in registration performance should improve. The regularization term is only introduced for optimization stability and to ensure uniform particle spread, tasks that it performs well. However, empirically, we see an optimal $\lambda$ that achieves the best generalization, showing that the proposed regularization term also improves the generalization of the network.

4.2 Metopic Craniosynostosis Dataset

Metopic craniosynostosis [29] is a morphological condition of the skull/cranium that affects infants. It occurs due to premature fusion of the metopic suture in the skull, and subsequent brain growth causes deformed head shapes. The morphological symptoms include a triangular-shaped forehead (trigonocephaly) [31] and compensatory expansion of the back of the head. In severe cases, along with abnormal morphology, patients suffer increased intracranial pressure, causing several neurological complications. In current practice, the severity of metopic craniosynostosis is gauged subjectively by surgeons, which affects the subsequent treatment protocol; the usual treatment entails a risky surgical procedure for severe cases or continued observation for milder ones. In recent research, the skull shape of metopic patients and its deviation from normal have been used to devise an objective severity measure [31, 6]. These methods use CT scans that undergo segmentation and/or further processing for shape representation, steps that involve manual and computational overhead.

We use the proposed method directly on the CT scans and aim to obtain a shape descriptor that can subsequently be used for severity quantification of metopic craniosynostosis. Our dataset comprises cranial CT scans of infants between 5 and 15 months of age, acquired at UPMC Children’s Hospital of Pittsburgh between 2002 and 2016. Of these, 27 are scans of patients diagnosed with metopic craniosynostosis; the diagnosis was performed by a board-certified craniofacial plastic surgeon using the CT images as well as a physical examination. The remaining 93 patients were trauma patients who underwent CT scans but demonstrated no morphological abnormalities; these scans form our control set. All CT scans were acquired as part of a routine clinical care protocol and under an IRB-approved registry. We collected this data under HIPAA protocols, with de-identification of the scans. In the following analysis, we refer to the CT scans from the control set as controls and those diagnosed with metopic craniosynostosis as metopic. We train the network only on the control set, with an 80%/10%/10% split for training, validation, and testing, respectively. We use ℓ2 as our image-matching loss function because the dataset has minimal intensity variation across samples and the primary structure of the cranium is well defined. We use 100 3D landmarks (8 fixed on the corners) and a regularization parameter discovered via cross-validation. We perform redundant-landmark removal by selecting the best 50 landmarks as post-processing. In addition to the test set, we also evaluate the model on the CT scans diagnosed with metopic craniosynostosis; these scans are not observed by the model during training, as it is trained only on the control set. Figure 4 shows qualitative results of the registration between a single source image (metopic) and two targets, one metopic and one from the control set.

Figure 4: Metopic craniosynostosis results, showing a single source image (metopic) with two different target images (A: metopic, B: control). The top row shows the landmark positions in 3D space with an overlay of the segmentation mask and the mid-axial, mid-sagittal, and mid-coronal slices. The bottom row shows the registration via overlaid segmentation masks.

Downstream Application:

Using the landmarks from the proposed method as a shape descriptor, we can compute the Mahalanobis distance/Z-score of each scan with respect to the population of control scans. If we let $\{z_i\}_{i=1}^{N}$ be the shape descriptors of the $N$ scans from the control set, with mean $\mu$ and covariance $\Sigma$, the Mahalanobis distance/Z-score for a scan with shape descriptor $z$ is given by:

$$ d(z) = \sqrt{(z - \mu)^T \, \Sigma^{-1} \, (z - \mu)} $$
This Z-score represents the deviation of a scan from the control population (which has no symptoms of cranial deformity); hence, a metopic skull should have a larger Z-score than a control scan. Figure 5 (left) shows that this is indeed the case. We also compare the efficacy of the obtained landmarks as shape descriptors against dense correspondences from a point distribution model (PDM). For this, we utilize the ShapeWorks [13] package, a state-of-the-art method for automatic correspondence discovery; as a PDM method, ShapeWorks has been used in the context of metopic severity prediction [5]. ShapeWorks produces a set of 2048 correspondences on the surfaces of the cranial CT scans using their segmented images, which is reduced to a low-dimensional PCA loading vector to be used as a shape descriptor. We compare the efficacy of both shape descriptors in characterizing the severity of metopic craniosynostosis by computing the Mahalanobis distance (Z-score) of each shape with respect to the control data distribution. Pearson’s correlation between the two sets of scores is 0.81, and the corresponding scatter plot (after normalizing each set of Z-scores) is given in Figure 5. The Z-scores are normalized by dividing by the maximum Z-score of each set, so that they lie between 0 and 1 for better visualization in the scatter plot; this normalization does not affect the correlation score. Such a significant correlation shows that both methods capture similar shape information for characterizing the severity of the condition. Additionally, we compare the Z-scores from the proposed method with aggregate severity scores from the ratings of 16 craniosynostosis experts. Each rating uses a Likert scale between 0 and 5, with 5 being the most severe; only the 27 metopic scans are rated. The Z-scores and the aggregate ratings show a positive correlation, with a Pearson’s coefficient of 0.64 (see Figure 5). In comparison, Pearson’s correlation between the Mahalanobis distance from ShapeWorks and the expert ratings is only 0.28.
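The Mahalanobis Z-score used above reduces to a few lines of numpy; this sketch assumes the control descriptors are stacked row-wise and the covariance is invertible (the rank-deficient case is handled in the femur experiment):

```python
import numpy as np

def z_score(z, controls):
    """Mahalanobis distance of descriptor z from the control population.

    controls: (N, d) array of control shape descriptors (rows = scans).
    """
    mu = controls.mean(axis=0)
    cov = np.cov(controls, rowvar=False)
    diff = z - mu
    # sqrt of the quadratic form (z - mu)^T Sigma^{-1} (z - mu)
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))
```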

Figure 5: Metopic severity analysis. The plot on the left shows the histogram of Z-scores using the landmarks as a shape descriptor. The figure in the middle shows the correlation with the Z-scores from the state-of-the-art correspondence model. The figure on the right shows the correlation between the Z-scores of the metopic scans and the aggregate expert ratings of these scans.

4.3 Cam-type Femoroacetabular Impingement (cam-FAI)

Figure 6: Cam-FAI. The left figure shows the location of cam-FAI, and the right figure shows the location of the mask on a median femur anatomy, denoting the region of interest used to gauge the shape descriptor of the pathology.

Femoroacetabular impingement (FAI) [2] is an orthopedic disorder that affects movement in the hip joint. Cam-FAI is a primary cause of hip osteoarthritis and is characterized by abnormal bone growth on the femoral head; it affects a localized region, as shown in Figure 6. The deviation of a femur anatomy diagnosed with cam-FAI from a representative femur anatomy of a healthy patient population is of interest, as this deviation can inform operative decisions and subsequent treatment planning. With the localized variant of the proposed method (see Section 3.4), we aim to discover a shape descriptor that captures the localized structure of the cam-FAI pathology. In this study, we use a dataset of 59 CT scans of the femur bone, of which 50 are from patients without any diagnosed morphological defect in their femurs; we call this the control set. The remaining 9 CT scans are from patients diagnosed with the cam-FAI deformity. All data was originally collected for research purposes, specifically for the evaluation of hip bio-mechanics [26, 25] at the Orthopaedics Research Laboratory, School of Medicine, University of Utah. All participants provided informed consent prior to participation in this University of Utah IRB-approved study. The data contains scans of both left and right femurs, and all right femurs have been reflected about the mid-sagittal plane for consistent orientation across the dataset. We use the median CT scan of the control set to define a common/fixed mask image for the image-matching loss (Eq. 9); the mask is defined by selecting a bounding box around the anatomy of interest and then blurring it with a Gaussian filter (Figure 6). We apply the method with a regularization parameter found by cross-validation, train the model for 20 epochs on the joint set (controls and cam-FAI-diagnosed scans) with 80 landmarks, and perform redundancy removal by selecting the top 40 landmarks. Figure 7 shows the landmarks on a single source-target image pair and the corresponding registration output. We notice that the registration error is low in the region of interest, as highlighted by the close-up section of the registration.

Figure 7: Femur results, showing landmark discovery and registration for a single source image (a cam-FAI-diagnosed subject) and target image (normal). The bottom right shows a zoomed-in view of the registration error near the location of interest.

Downstream Application: We want to characterize how well the mean pathology is captured. In this experiment, we use the segmentation masks of the femur anatomy, which are available for each image. We want to discover the mean anatomy of the femurs from the control set and the mean of the femurs diagnosed with cam-FAI; for a given image set, we follow these steps:

  1. We compute the mean landmarks (using the landmarks discovered by the proposed model) over the set of CT scans.

  2. We register each segmentation mask to the mean landmarks (using the landmarks discovered on that image), giving a per-image approximation of the mean image.

  3. We average these approximations over all the images in the population, which gives us the mean segmentation image.
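The three steps above can be sketched as follows; `warp` is a hypothetical registration routine (e.g., the landmark-driven RBF warp) that maps an image from its own landmarks to the mean landmarks:

```python
import numpy as np

def mean_shape_image(landmarks, masks, warp):
    """Compute the mean segmentation image from per-scan landmarks.

    landmarks: (N, M, D) array, one landmark set per scan;
    masks: list of N segmentation volumes;
    warp(image, src_pts, dst_pts) -> image is an assumed interface.
    """
    mean_lm = landmarks.mean(axis=0)            # step 1: mean landmarks
    approx = [warp(m, lm, mean_lm)              # step 2: register each mask
              for m, lm in zip(masks, landmarks)]
    return np.mean(approx, axis=0)              # step 3: average approximations
```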

We follow the above steps for the control set, obtaining a representative shape for healthy femur anatomy (normal mean shape), and for the set of cam-FAI scans, obtaining a representative femur shape with cam-FAI (cam-FAI mean shape). We compare the two using the surface-to-surface distance from the cam-FAI mean to the normal mean; this distance is projected onto the mesh of the normal mean shape in Figure 8. Negative values indicate regions where the average cam-FAI anatomy lies outward of the average normal, whereas positive values indicate it lies inward. We see that the visualized cam-FAI pathology (outward regions) matches the clinically plausible location of the cam-FAI-affected region on the femur bone. We believe this behavior would be even more pronounced with more pathological scans in training. This experiment and the subsequent results are promising, as the cam-FAI deformity is subtle and difficult to capture; capturing such subtle variation would normally require full segmentation and dense correspondences [2]. Furthermore, we also compute the Mahalanobis distance, using the landmarks as shape descriptors and the control scans as the base distribution, to verify whether the scans with a cam-FAI diagnosis deviate from the control population. Note that the number of scans is smaller than the dimensionality of the landmark descriptor, so the covariance would be a singular matrix and the Mahalanobis distance would produce non-interpretable values; we therefore first apply PCA to the landmarks, reducing the dimension to the number of scans, which captures 100% of the shape variation. From the histogram in Figure 8, we see that the scans with a cam-FAI diagnosis deviate significantly from the control set, indicating that the shape descriptor captures the localized pathology well.
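This rank-deficient case can be sketched as follows: PCA via SVD projects the descriptors onto the modes that carry all the control-set variance, making the covariance invertible before the Mahalanobis computation (the function name is illustrative):

```python
import numpy as np

def mahalanobis_with_pca(z, controls):
    """Mahalanobis distance when descriptors outnumber scans (d > N).

    Project onto the PCA modes of the control set (which capture 100% of
    its variance) so the covariance becomes invertible.
    """
    mu = controls.mean(axis=0)
    centered = controls - mu
    # PCA via SVD; keep only components with non-negligible variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[s > 1e-8]                   # (k, d) principal directions
    proj = centered @ basis.T              # (N, k) loadings of the controls
    cov = np.cov(proj, rowvar=False)       # now full-rank
    diff = basis @ (z - mu)
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))
```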

Figure 8: Cam-FAI comparison with the normal population: (Left) the surface-to-surface distance between the mean segmentation of the normal mean shape and that of the cam-FAI mean shape, projected onto the mesh of the normal mean shape. (Right) the logarithm of the Mahalanobis distance of each scan from the normal population, using the landmarks as features.

4.4 Cardiac LGE Dataset

This dataset consists of 3D late gadolinium enhancement (LGE) images of the left atrium (LA) for 207 patients. All scans are from patients diagnosed with irregular heartbeats, or atrial fibrillation (AF), and were acquired after the first ablation, a procedure used for the treatment of AF. Cardiac MR imaging was performed on AF patients presenting at the University of Utah Hospital’s Electrophysiology Clinic. Image sequences include a respiratory- and ECG-gated MRA acquired during continuous gadolinium contrast agent injection (0.1 mmol/kg, Multihance [Bracco Diagnostic Inc.]), followed by a 15-minute post-contrast LGE sequence. Images were acquired on either a 1.5 T or 3 T clinical MR scanner (Siemens Medical Solutions) using phased-array receiver coils. LGE-MRI scans were acquired approximately 15 minutes after the contrast agent injection, at the end-diastole phase of the cardiac cycle. The scanning protocol utilized a 3D inversion-recovery, respiration-navigated, ECG-gated, gradient echo pulse sequence. Typical image acquisition parameters include the following: free breathing using navigator gating, a transverse imaging volume with voxel size = 1.25 × 1.25 × 2.5 mm (reconstructed to 0.625 × 0.625 × 1.25 mm), and inversion time = 270–320 ms. Inversion times for the LGE-MRI scan were identified using a TI scout scan. Other parameters for the 1.5 T scanner included a repetition time of 5.4 ms, an echo time of 2.3 ms, and a flip angle of 20°. Scans performed on the 3 T scanner used a repetition time of 3.1 ms, an echo time of 1.4 ms, and a flip angle of 14°. ECG gating was used to acquire a small subset of phase-encoding views during the diastolic phase of the LA cardiac cycle.

This is a very challenging dataset for two reasons: (i) the LGE images are low-resolution, noisy, and highly varied in their intensity profiles (see the image pair in Figure 9); and (ii) the LA shape is not easily distinguishable intensity-wise from neighboring anatomies. We employ our model on this dataset to test its limits; the goal of this experiment is not accurate registration but rather usable shape descriptors. The downstream task for this application is the prediction of atrial fibrillation recurrence from the LA shape, expressed via the proposed landmark shape descriptor. Due to the challenging nature of the data, we use distance transforms via the weak-supervision variant of the framework, using all 207 distance transforms. We apply the MIND loss for both image-matching terms (on the LGE images and the distance transforms), with a regularization parameter and 100 landmarks. Additionally, we perform a center-of-mass alignment of the images (using the corresponding segmentations) followed by cropping; this step is necessary to highlight the LA shape and isolate it from neighboring anatomies. We train the network for 15 epochs and perform redundancy removal, retaining 50 landmarks per image. A single source-target image pair with landmarks is shown in Figure 9. The registration performance is not as good as on the other datasets, owing to the low image quality and the high variability of LGE intensities and LA shapes.

Figure 9: Results on LGE images, showing landmarks on a source-target image pair and the registration of their segmentations. We also show mid-axial slices of the source and target images to illustrate the poor quality of the data.

Downstream Application: The LA scans are acquired post-ablation; however, AF can occur again even after ablation, which is known as AF recurrence, so patients are monitored post-ablation for recurrence. The shape of the LA and the left-atrial appendage has been shown to be predictive of AF recurrence [9]. For recurrence prediction, we again use a simple multi-layer perceptron, with three hidden layers, which yields a test accuracy of . In comparison, we also use ShapeWorks to place a dense set of 2048 particles on the left atrium shapes and perform PCA to reduce them to a 20-dimensional shape descriptor; using this with the same MLP architecture for atrial fibrillation recurrence classification, we get a test accuracy of . This shows that the discovered landmark shape descriptor captures almost the same information as a dense correspondence model for this particular downstream task. A better initialization of the landmarks, introducing a degree of supervision and domain expertise into the model, could improve the shape descriptor; however, such experiments would take away from the completely unsupervised and domain-independent nature of the framework.
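For illustration, the PCA-plus-MLP recurrence baseline can be sketched on synthetic stand-in descriptors (all data below is synthetic; the real inputs would be the dense correspondences or the learned landmarks, and the hidden-layer widths are assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in: 50 landmarks in 3D -> 150-dim descriptors, with a
# binary recurrence label (hypothetical data for the sketch only).
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0.0, 1, (60, 150)),
                    rng.normal(0.8, 1, (60, 150))])
y = np.repeat([0, 1], 60)

Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)
# PCA to a 20-dimensional shape descriptor, then a three-hidden-layer MLP.
clf = make_pipeline(PCA(n_components=20),
                    MLPClassifier(hidden_layer_sizes=(64, 32, 16),
                                  max_iter=2000, random_state=0))
clf.fit(Xtr, ytr)
print(clf.score(Xte, yte))
```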

5 Conclusions

This paper proposes an end-to-end framework for obtaining a usable shape descriptor directly from a set of 2D/3D images. The proposed model is a self-supervised network that works under the assumption that anatomically consistent landmarks will register a pair of images well under a particular class of transformations. The model consists of a landmark encoder, an RBF solver, and a spatial transformer during training; at test time, we only use the landmark encoder to obtain a set of landmarks on a given image. The methodology was initially proposed in our previous work [8]; this paper provides detailed explanations and significantly extends it by introducing different image-matching loss functions, two loss-function variants that incorporate prior shape information, and extensive experimentation on several 2D and 3D datasets. We find that the landmark shape descriptor obtained via the proposed model can be used directly for shape analysis and subsequent downstream tasks such as disease classification and severity quantification.

6 Acknowledgements

The National Institutes of Health supported this work under grant numbers NIBIB-U24EB029011, NIAMS-R01AR076120, NHLBI-R01HL135568, NIBIB-R01EB016701, NIBIB-R21EB026061, and NIGMS-P41GM103545. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. We would also like to thank Dr. Jesse Goldstein, Dr. Andrew Anderson, Dr. Nassir Marrouche, and Dr. Penny Atkins for making their data available for use in this manuscript.

Appendix 0.A RBF Solver for Image Registration

Consider the 2D image case, with $(x, y)$ specifying a coordinate on the image grid. Let $u(x, y)$ and $v(x, y)$ be the transformations acting on each coordinate, so the entire transformation is given as $T(x, y) = (u(x, y), v(x, y))$. Each individual transformation follows the RBF equation (Eq. 1), which results in:

$$ u(x, y) = \sum_{i=1}^{M} w_i^x \, \phi(\|(x, y) - p_i\|) + a_0^x + a_1^x x + a_2^x y, $$
$$ v(x, y) = \sum_{i=1}^{M} w_i^y \, \phi(\|(x, y) - p_i\|) + a_0^y + a_1^y x + a_2^y y. $$

Here, $(w^x, a^x)$ and $(w^y, a^y)$ are the unknown parameters for the $x$ and $y$ coordinates, respectively. Now, given a set of $M$ control points on the source and target images ($\{p_i\}$ and $\{q_i\}$), we have $T(p_i) = q_i$. This gives us $2M$ linear equations; we need six more equations to solve for the $2M + 6$ unknowns. We arrive at these via the constraint that the RBF part of the transformation should have no linear or constant term, i.e.,

$$ \sum_{i=1}^{M} w_i^x = 0, \qquad \sum_{i=1}^{M} w_i^x \, p_{i,x} = 0, \qquad \sum_{i=1}^{M} w_i^x \, p_{i,y} = 0, $$

with three analogous equations for the $y$ coordinate. We therefore have a system of equations given as:

$$ \begin{bmatrix} K & P \\ P^T & 0 \end{bmatrix} \begin{bmatrix} w \\ a \end{bmatrix} = \begin{bmatrix} q \\ 0 \end{bmatrix}, $$

where $K_{ij} = \phi(\|p_i - p_j\|)$, $P_i = [1, \; p_{i,x}, \; p_{i,y}]$, $w$ stacks the RBF weights, $a$ stacks the affine parameters, and $q$ stacks the target control points (one such system per coordinate). To solve for the transformation parameters, we simply solve this linear system.
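As an illustrative numpy sketch of assembling and solving this system, using the thin-plate kernel φ(r) = r² log r as one admissible RBF choice (the specific kernel used in the paper is not restated here):

```python
import numpy as np

def tps_fit(src, dst):
    """Solve the RBF/TPS linear system for M 2D control points.

    src, dst: (M, 2) control points on source and target.
    Returns RBF weights w (M, 2) and affine parameters a (3, 2).
    """
    M = src.shape[0]
    d = np.linalg.norm(src[:, None, :] - src[None, :, :], axis=-1)
    K = np.where(d > 0, d**2 * np.log(d + 1e-12), 0.0)   # phi(r) = r^2 log r
    P = np.hstack([np.ones((M, 1)), src])                # (M, 3): [1, x, y]
    A = np.zeros((M + 3, M + 3))
    A[:M, :M], A[:M, M:], A[M:, :M] = K, P, P.T          # [[K, P], [P^T, 0]]
    b = np.zeros((M + 3, 2))
    b[:M] = dst
    sol = np.linalg.solve(A, b)
    return sol[:M], sol[M:]

def tps_apply(pts, src, w, a):
    """Evaluate the fitted transformation at arbitrary points."""
    d = np.linalg.norm(pts[:, None, :] - src[None, :, :], axis=-1)
    K = np.where(d > 0, d**2 * np.log(d + 1e-12), 0.0)
    P = np.hstack([np.ones((len(pts), 1)), pts])
    return K @ w + P @ a
```

By construction, the fitted warp interpolates the control points exactly: applying it to `src` reproduces `dst`.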

Appendix 0.B Architectural Details

The architecture of the landmark encoder used for all 2D experiments is given in Figure 10 (bottom). It comprises four blocks, with the first two blocks having two convolutional layers each and the last two having four. Each block is followed by max pooling with a factor of 2. Each convolutional layer has a kernel size of . The activation function used throughout is ReLU, except at the output layer, where we use the hyperbolic tangent. The output points are in the range -1 to 1 and are then scaled to the original coordinates using the image dimensions.

Figure 10: Detailed Network Architectures (Top) For 3D data, (Bottom) for 2D data.

For 3D, we have a smaller architecture due to memory constraints; it consists of four blocks with two convolutional layers each. Each block is followed by max pooling with a factor of 2. Each convolutional layer has a kernel size of . The activation function used here is leaky-ReLU, except at the output layer, where we use the hyperbolic tangent.
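A PyTorch sketch consistent with the 2D encoder description above; the kernel size (3) and channel widths are assumptions, as those values are elided in the text:

```python
import torch
import torch.nn as nn

class LandmarkEncoder2D(nn.Module):
    """Four blocks with 2, 2, 4, 4 conv layers, each block followed by 2x
    max pooling; ReLU activations and a tanh output in [-1, 1]. Kernel
    size and channel widths are illustrative assumptions."""

    def __init__(self, n_landmarks=16, in_ch=1):
        super().__init__()
        self.n_landmarks = n_landmarks
        layers, ch = [], in_ch
        for n_conv, out_ch in zip([2, 2, 4, 4], [16, 32, 64, 128]):
            for _ in range(n_conv):
                layers += [nn.Conv2d(ch, out_ch, kernel_size=3, padding=1),
                           nn.ReLU()]
                ch = out_ch
            layers.append(nn.MaxPool2d(2))
        self.features = nn.Sequential(*layers)
        self.head = nn.Sequential(nn.Flatten(),
                                  nn.LazyLinear(2 * n_landmarks),
                                  nn.Tanh())

    def forward(self, x):
        # Points in [-1, 1], later scaled by the image dimensions.
        return self.head(self.features(x)).view(-1, self.n_landmarks, 2)

net = LandmarkEncoder2D(n_landmarks=16)
pts = net(torch.zeros(2, 1, 64, 64))
print(pts.shape)  # torch.Size([2, 16, 2])
```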


  • [1] P. Agrawal, R. T. Whitaker, and S. Y. Elhabian (2017) Learning deep features for automated placement of correspondence points on ensembles of complex shapes. In Medical Image Computing and Computer Assisted Intervention − MICCAI 2017, Cham, pp. 185–193. Cited by: §2.
  • [2] P. R. Atkins, S. Y. Elhabian, P. Agrawal, M. D. Harris, R. T. Whitaker, J. A. Weiss, C. L. Peters, and A. E. Anderson (2017) Quantitative comparison of cortical bone thickness using correspondence-based shape modeling in patients with cam femoroacetabular impingement. Journal of Orthopaedic Research 35 (8), pp. 1743–1753. Cited by: §4.3, §4.3.
  • [3] G. Balakrishnan, A. Zhao, M. Sabuncu, J. Guttag, and A. V. Dalca (2019) VoxelMorph: a learning framework for deformable medical image registration. IEEE TMI: Transactions on Medical Imaging 38, pp. 1788–1800. Cited by: §1, §2.
  • [4] M. F. Beg, M. I. Miller, A. Trouvé, and L. Younes (2005) Computing large deformation metric mappings via geodesic flows of diffeomorphisms. International journal of computer vision 61 (2), pp. 139–157. Cited by: §2, §2.
  • [5] R. Bhalodia, L. A. Dvoracek, A. M. Ayyash, L. Kavan, R. Whitaker, and J. A. Goldstein (2020) Quantifying the severity of metopic craniosynostosis: a pilot study application of machine learning in craniofacial surgery. Journal of Craniofacial Surgery 31 (3), pp. 697–701. Cited by: §1, §2, §4.2.
  • [6] R. Bhalodia, S. Y. Elhabian, L. Kavan, and R. T. Whitaker (2018) DeepSSM: a deep learning framework for statistical shape modeling from raw images. Shape in Medical Imaging : International Workshop, ShapeMI, MICCAI 2018 11167, pp. 244–257. Cited by: §1, §2, §4.2.
  • [7] R. Bhalodia, S. Y. Elhabian, L. Kavan, and R. T. Whitaker (2019) A cooperative autoencoder for population-based regularization of CNN image registration. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2019, Cham, pp. 391–400. Cited by: §2.
  • [8] R. Bhalodia, L. Kavan, and R. T. Whitaker (2020) Self-supervised discovery of anatomical shape landmarks. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2020, Cham, pp. 627–638. Cited by: §1, §5.
  • [9] E. T. Bieging, A. Morris, B. D. Wilson, C. J. McGann, N. F. Marrouche, and J. Cates (2018) Left atrial shape predicts recurrence after atrial fibrillation catheter ablation. Journal of cardiovascular electrophysiology. Cited by: §4.4.
  • [10] D. Boscaini, J. Masci, E. Rodolà, and M. Bronstein (2016) Learning shape correspondence with anisotropic convolutional neural networks. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, pp. 3197–3205. Cited by: §2.
  • [11] S. Bouix, J. C. Pruessner, D. L. Collins, and K. Siddiqi (2005) Hippocampal shape analysis using medial surfaces. Neuroimage 25 (4), pp. 1077–1089. Cited by: §2.
  • [12] J. Cates, E. Bieging, A. Morris, G. Gardner, N. Akoum, E. Kholmovski, N. Marrouche, C. McGann, and R. S. MacLeod (2014) Computational shape models characterize shape change of the left atrium in atrial fibrillation. Clinical Medicine Insights: Cardiology 8, pp. CMC–S15710. Cited by: §1.
  • [13] J. Cates, S. Elhabian, and R. Whitaker (2017) ShapeWorks: particle-based shape correspondence and visualization software. In Statistical Shape and Deformation Analysis, pp. 257–298. Cited by: §4.2.
  • [14] J. Cates, P. T. Fletcher, M. Styner, M. Shenton, and R. Whitaker (2007) Shape modeling and analysis with entropy-based particle systems. In Proceedings of Information Processing in Medical Imaging (IPMI), pp. 333–345. Cited by: §1, §2.
  • [15] A. Dalca, M. Rakic, J. Guttag, and M. Sabuncu (2019) Learning conditional deformable templates with convolutional networks. In Advances in Neural Information Processing Systems, Vol. 32, pp. . Cited by: §2.
  • [16] A. V. Dalca, G. Balakrishnan, J. Guttag, and M. R. Sabuncu (2018) Unsupervised learning for fast probabilistic diffeomorphic registration. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 729–738. Cited by: §2.
  • [17] M. Datar, J. Cates, P. T. Fletcher, S. Gouttard, G. Gerig, and R. Whitaker (2009) Particle based shape regression of open surfaces with applications to developmental neuroimaging. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 167–174. Cited by: §2.
  • [18] R. H. Davies, C. J. Twining, T. F. Cootes, J. C. Waterton, and C. J. Taylor (2002-05) A minimum description length approach to statistical shape modeling. IEEE Transactions on Medical Imaging 21 (5), pp. 525–537. Cited by: §2, §2.
  • [19] D. DeTone, T. Malisiewicz, and A. Rabinovich (2018) SuperPoint: self-supervised interest point detection and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 224–236. Cited by: §2.
  • [20] H. Du Buf, M. Bayer, S. Droop, R. Head, S. Juggins, S. Fischer, H. Bunke, M. Wilkinson, J. Roerdink, J. Pech-Pacheco, et al. (1999) Diatom identification: a double challenge called adiac. In Proceedings 10th International Conference on Image Analysis and Processing, pp. 734–739. Cited by: §4.1.
  • [21] G. Gardner, A. Morris, K. Higuchi, R. MacLeod, and J. Cates (2013-04) A point-correspondence approach to describing the distribution of image features on anatomical surfaces, with application to atrial fibrillation. In IEEE 10th International Symposium on Biomedical Imaging, Vol. , pp. 226–229. Cited by: §2.
  • [22] G. Gerig, M. Styner, D. Jones, D. Weinberger, and J. Lieberman (2001) Shape analysis of brain ventricles using spharm. In Proceedings IEEE Workshop on Mathematical Methods in Biomedical Image Analysis (MMBIA 2001), Vol. , pp. 171–178. External Links: Document, ISSN Cited by: §2.
  • [23] A. Goparaju, I. Csecs, A. Morris, E. Kholmovski, N. Marrouche, R. Whitaker, and S. Elhabian (2018) On the evaluation and validation of off-the-shelf statistical shape modeling tools: a clinical application. In International Workshop on Shape in Medical Imaging, pp. 14–27. Cited by: §2.
  • [24] U. Grenander, Y. Chow, and D. M. Keenan (1991) Hands: A pattern theoretic study of biological shapes. Springer, New York. Cited by: §2.
  • [25] M. D. Harris, S. P. Reese, C. L. Peters, J. A. Weiss, and A. E. Anderson (2013) Three-dimensional quantification of femoral head shape in controls and patients with cam-type femoroacetabular impingement. Annals of biomedical engineering 41 (6), pp. 1162–1171. Cited by: §4.3.
  • [26] M. D. Harris, M. Datar, R. T. Whitaker, E. R. Jurrus, C. L. Peters, and A. E. Anderson (2013) Statistical shape modeling of cam femoroacetabular impingement. Journal of Orthopaedic Research 31 (10), pp. 1620–1626. External Links: ISSN 1554-527X Cited by: §2, §4.3.
  • [27] H. Hufnagel, X. Pennec, J. Ehrhardt, H. Handels, and N. Ayache (2007) Shape analysis using a point-based statistical shape model built on correspondence probabilities. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 959–967. Cited by: §1.
  • [28] M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu (2016) Spatial transformer networks. External Links: 1506.02025 Cited by: item .
  • [29] D. Johnson and A. O. M. Wilkie (2011) Craniosynostosis. European Journal of Human Genetics 19 (4), pp. 369–376. Cited by: §4.2.
  • [30] S. Joshi, B. Davis, M. Jomier, and G. Gerig (2004) Unbiased diffeomorphic atlas construction for computational anatomy. NeuroImage; Supplement issue on Mathematics in Brain Imaging 23 (Supplement1), pp. S151–S160. Cited by: §1, §2.
  • [31] R. Kellogg, A. C. Allori, G. F. Rogers, and J. R. Marcus (2012) Interfrontal angle for characterization of trigonocephaly: part 1: development and validation of a tool for diagnosis of metopic synostosis. Journal of Craniofacial Surgery 23 (3), pp. 799–804. Cited by: §4.2.
  • [32] D. P. Kingma and J. Ba (2017) Adam: a method for stochastic optimization. External Links: 1412.6980 Cited by: §3.4.
  • [33] G. Koch, R. Zemel, and R. Salakhutdinov (2015) Siamese neural networks for one-shot image recognition. In ICML deep learning workshop, Vol. 2. Cited by: item .
  • [34] D. G. Lowe (2004) Distinctive image features from scale-invariant keypoints. International journal of computer vision 60 (2), pp. 91–110. Cited by: §2.
  • [35] C. S. Mendoza, N. Safdar, K. Okada, E. Myers, G. F. Rogers, and M. G. Linguraru (2014) Personalized assessment of craniosynostosis via statistical shape modeling. Medical Image Analysis 18 (4), pp. 635–646. Cited by: §2.
  • [36] F. Milletari, A. Rothberg, J. Jia, and M. Sofka (2017) Integrating statistical prior knowledge into convolutional neural networks. In Medical Image Computing and Computer Assisted Intervention, M. Descoteaux, L. Maier-Hein, A. Franz, P. Jannin, D. L. Collins, and S. Duchesne (Eds.), Cham, pp. 161–168. Cited by: §1, §2.
  • [37] M. Ovsjanikov, M. Ben-Chen, J. Solomon, A. Butscher, and L. Guibas (2012) Functional maps: a flexible representation of maps between shapes. ACM Transactions on Graphics (TOG) 31 (4), pp. 30. Cited by: §2.
  • [38] S. Pun, D. Kumar, and N. E. Lane (2015) Review: femoroacetabular impingement. Arthritis & Rheumatology 67 (1), pp. 17–27. Cited by: §3.5.
  • [39] I. Rocco, R. Arandjelovic, and J. Sivic (2017) Convolutional neural network architecture for geometric matching. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 6148–6157. Cited by: §2.
  • [40] M. Styner, C. Brechbuhler, G. Szekely, and G. Gerig (2000-03) Parametric estimate of intensity inhomogeneities applied to MRI. IEEE Transactions on Medical Imaging 19 (3), pp. 153–165. Cited by: §2.
  • [41] M. Styner, I. Oguz, S. Xu, C. Brechbuehler, D. Pantazis, J. Levitt, M. Shenton, and G. Gerig (2006-07) Framework for the statistical shape analysis of brain structures using spharm-pdm. Note: http://hdl.handle.net/1926/215 Cited by: §1.
  • [42] H. D. Tagare and M. Rao (2014) Why does mutual-information work for image registration? a deterministic explanation. IEEE transactions on pattern analysis and machine intelligence 37 (6), pp. 1286–1296. Cited by: §3.3.1.
  • [43] D. Thompson (1917) On growth and form. Cambridge University Press. Cited by: §1, §2, §2.
  • [44] U. Von Luxburg (2007) A tutorial on spectral clustering. Statistics and computing 17 (4), pp. 395–416. Cited by: §4.1.
  • [45] Z. Xu, J. Luo, J. Yan, R. Pulya, X. Li, W. Wells, and J. Jagadeesan (2020) Adversarial uni-and multi-modal stream networks for multimodal image registration. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 222–232. Cited by: §3.3.1.