1 Introduction
Statistical shape modeling (SSM) enables the quantitative study of anatomical forms in medical and biological sciences. Shape models provide a geometrical description of each shape that is statistically consistent across the studied shape population. A classical approach for SSM is to use landmark points to represent shapes, where landmark points often correspond to specific anatomical or morphological features. Landmark-based SSM was first described in the pioneering work of D'Arcy Thompson [49]. These landmark correspondences provide a flexible and easy-to-use shape representation to capture population variability using statistical methods such as principal component analysis (PCA), deriving shape descriptors that capture anatomically relevant morphological features. More recent works have shown a push towards obtaining dense sets of correspondences across the population automatically
[14, 16, 46]. Another method of quantifying shape differences is using the coordinate transformations that align the population of images/shapes to a predefined atlas [4]. These automated methods for representing shapes and their subsequent analysis find applications in several medical domains, such as orthopedics [26, 5], implant design [23, 34], neuroscience [19, 56], and cardiology [18, 8]. The current pipeline for shape analysis relies on obtaining a shape representation (e.g., correspondences, transformations) and the associated shape descriptors, followed by an application-dependent use of these shape descriptors. The shape descriptor is typically a low-dimensional representation learned from the population shape data; PCA scores are an example of such a shape descriptor. Subsequent analyses are application-dependent and may range from binary prediction of a pathology [25] to localized detection of a morphological defect [26] or testing scientific hypotheses about subtleties of the morphology [47].
Point distribution models (PDMs) [24] rely on dense correspondences for shape representation: geometrically consistent sets of particles placed on each anatomical surface in the shape population. The positions of these correspondences can be optimized across the population using different metrics, such as entropy [14] or minimum description length [16]. These metrics aim to reduce the statistical complexity of the model and ensure a compact statistical representation. SPHARM-PDM [46], on the other hand, utilizes a parametric representation of the surface using spherical harmonics. These state-of-the-art correspondence models require anatomy segmentation followed by a lengthy pipeline of preprocessing steps, such as image resampling, rigid alignment, and smoothing, followed by a computationally expensive optimization to establish shape correspondences. These time- and resource-intensive steps are necessary not only for learning the shape model, but also for inferring shape descriptors for a new image given a pre-trained shape model, rendering such models impractical for a fully automated pipeline.
On the other hand, deformation fields can represent shapes directly from images via image-based registration and hence alleviate the need for anatomy segmentation. However, shape representation via implicit transformations between images is not intuitive and can be challenging for clinicians to understand and interpret. Additionally, obtaining a concrete shape descriptor for a single shape requires a notion of an atlas image, which is not always easily obtained. Furthermore, to accurately represent only the shape information, we either need anatomy segmentation or a way to disentangle shape and background components from the deformation fields. Due to these drawbacks, especially the lack of an easily interpretable shape representation, this paper focuses only on PDM-based shape representation models.
This paper addresses the problems in existing PDM models by proposing a novel framework, DeepSSM. The framework proposes deep convolutional neural network (CNN) architectures that learn to discover the low-dimensional shape descriptor and the associated correspondences directly from images. DeepSSM allows for a fully end-to-end inference pipeline that bypasses the intensive preprocessing and segmentation that require manual supervision. Additionally, the model's inference is computationally fast compared to the heavy optimization in state-of-the-art PDM approaches. To train the deep network robustly with limited data (a typical challenge in medical imaging applications), we also introduce a model-based data-augmentation strategy that enables the generation of thousands of training samples that preserve population-specific statistics. Furthermore, this paper provides a comprehensive and systematic approach for developing supervised deep learning-based models for image-to-shape inference tasks, using two different architectural variants along with loss functions and strategies tailored to individual datasets and subsequent clinical applications.
A preliminary variant of this work was presented in conference papers [8] and [7]. Here, we significantly extend and complement this framework by proposing the following novel additions:

A modified, more general approach for data augmentation.

A novel architectural variant for obtaining a nonlinear shape descriptor for complex datasets, a modification to the original training loss function, and a novel alternative loss function utilized for certain datasets.

Comprehensive experimentation on three different datasets with improved evaluations, both quantitative and via downstream usage of the shape descriptors. This experimental spectrum also provides a detailed analysis of which architecture and loss function to use for each dataset or application.
In summary, DeepSSM provides a blueprint for inferring population-specific statistical representations of shapes from images. This blueprint proposes data augmentation to handle limited data, different neural network architectures for various applications, and training strategies to learn shape descriptors. Finally, it includes a set of examples and guidelines for using the proposed architecture in real-world SSM applications, providing a launching point for further image-to-shape models. Figure 1 illustrates DeepSSM as a framework and its inference pipeline compared to the standard correspondence-based method.
2 Related Work
The DeepSSM framework learns the function that maps input images onto a landmark/correspondence-based shape space. In related literature, explicit correspondences have been expressed using geometric parametrizations [16, 45] as well as functional maps [42]. However, particle distribution models [14, 16, 46] are cumbersome and require manual input. DeepSSM learns a PDM and can be supervised with any given choice of PDM; here we choose to work with the optimization-based PDM ShapeWorks [13], since it observes the entire population and leads to less biased models compared to pairwise approaches such as SPHARM [46]. An extensive comparison between such PDMs is presented in [23], which establishes ShapeWorks as the state-of-the-art PDM model.
Research on atlas construction and image registration is also relevant; these techniques typically involve different parametrizations for image-to-image transformations, such as B-splines [44], deformable fields with an elastic penalty [2], and diffeomorphic static or dynamic flow fields [4]. These deformation models have proven useful for shape analysis [31, 17], but they have mostly been exploited for image alignment [4]. In recent years, there has been a lot of interest in deep learning-based unsupervised registration models [3, 55] that provide a computationally fast alternative to classical models while maintaining registration accuracy. While these models are applicable directly to images, successfully utilizing them for shape analysis requires applying them to the segmentation of the anatomy [6]. Additionally, expressing the shape representation of a single sample requires a notion of an atlas image that might not always be available.
Convolutional neural networks (CNNs) [35] for 3D volumes have been explored extensively in the medical imaging community [54, 43]. Deep learning has paved the way for several applications in detection [57], classification [36], and segmentation [1, 43]. Other works incorporate shape priors: [41] regularizes anatomy segmentation by learning low-dimensional nonlinear shape representations as a constraint, while [15] utilizes a variational autoencoder as a prior generative model on brain label maps for unsupervised segmentation. Regression from 3D volumes, which is a more relevant application for DeepSSM, has been explored in limited capacity. In [29], the orientation and position of the heart are regressed from 2D ultrasound, and [12] utilizes a regression network to find a displacement vector between nearby patches in order to improve a registration model. Another related recent work [39] demonstrates the efficacy of using PCA scores in regression of the landmark positions used for a 2D ultrasound segmentation task. Similarly, [51] uses PCA scores as a prior for probabilistic 2D surface reconstruction, and [50] extends this to probabilistic 3D surface reconstruction from sparse 2D images. The DeepSSM framework proposed in this work and the proof of concept described in [8, 7] extend the idea of using PCA scores to 3D images, and also explore the construction and use of a nonlinear low-dimensional shape descriptor to replace PCA scores. DeepSSM also proposes a novel data-augmentation scheme to counter the problem of data availability in medical imaging applications, and provides strategies for using the model under different scenarios involving data or downstream applications.
3 Methods
This section focuses on the proposed DeepSSM framework, which entails the data augmentation and the different DeepSSM variants. The specific details of the network architectures and the associated implementation can be found in 0.A.
3.1 Data Augmentation
Deep learning models are data-hungry, and data is usually scarce in medical imaging applications; therefore, data-augmentation methods are typically used to significantly increase the training sample size. For statistical shape modeling, the given population size is typically in the range of 100-200 samples; this is insufficient to train a CNN, as the network will overfit. To handle this data scarcity, the DeepSSM framework includes a model-based data-augmentation approach to generate augmented training samples that follow the statistics of the given population.
We start with a set of original images and obtain a point distribution model (PDM), also called a correspondence model, on these shapes. We use ShapeWorks [13] to generate the PDMs used in the experiments presented in this work. In a recent benchmarking study [23, 22], ShapeWorks was shown to discover clinically relevant shape differences. However, any set of correspondences (i.e., a PDM or even manual landmarks) can be utilized for the proposed model. This PDM gives us a set of surface correspondence points for the anatomy associated with each image; we denote the original correspondence points of the $n$-th shape as $C_n$.
The first step in data augmentation is to define a generative model over the existing PDM space (also known as the shape space). This allows us to sample plausible shapes from the distribution. In previous iterations of DeepSSM, we performed PCA on the shape space and modeled it parametrically using a Gaussian model [7] or a Gaussian mixture model (GMM) [8]. However, when the underlying distribution of the original shape space is nonlinear, the Gaussian model will ignore vital subtleties in the distribution. Also, a GMM with too few clusters is prone to overestimating the variance and sampling implausible shapes.
To counteract these problems, we propose a nonparametric kernel density estimate (KDE) for modeling the PCA space of the correspondences. We found this to be a robust augmentation strategy that generates plausible shapes while still capturing all the subtleties of the original shape space. Augmentation is performed in PCA space and not directly on the shape space, as this preserves the smoothness of the anatomy, making the result a more plausible shape without noise. Additionally, the augmentation utilizes all available PCA modes, i.e., all $N-1$ of them, which, along with providing smoothing, also ensures that we do not destroy any inherent characteristics that the shape distribution may exhibit. Let $z_n \in \mathbb{R}^d$ represent the PCA scores of the correspondences $C_n$; then we define the density estimate as:

$p(z) = \frac{1}{N} \sum_{n=1}^{N} \mathcal{N}(z \mid z_n, \sigma^2 I_d)$  (1)

where $d$ is the dimension of the PCA subspace. Here, the kernel bandwidth $\sigma$ is computed as the average nearest-neighbor distance in the PCA subspace. When a new PCA score $\tilde{z}$ is sampled from the kernel of a training sample $z_n$, we obtain the sampled correspondences $\tilde{C}$ via PCA reconstruction. Once we have the sampled correspondence representation, we need the associated image to complete the data-augmentation process. We find the original correspondences $C_n$ whose kernel $\tilde{z}$ was sampled from and use the corresponding raw image $I_n$. Using the thin plate spline (TPS) warp between $C_n$ and $\tilde{C}$ (which serve as accurate landmarks on the anatomy surface), we warp the image $I_n$. Using this technique, we can generate thousands of training pairs of images and their corresponding PCA score vectors/correspondence points while preserving the shape population statistics. The entire pipeline is depicted in Figure 2.
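The KDE sampling step can be sketched in a few lines. Below is a minimal NumPy illustration (function and variable names are ours, not from the DeepSSM codebase), with the kernel bandwidth set to the average nearest-neighbor distance as described:

```python
import numpy as np

def kde_sample(scores, n_samples, rng=None):
    """Sample new PCA score vectors from a KDE over the training scores.

    `scores` is an (N, d) array of per-shape PCA scores. The kernel
    bandwidth is the average nearest-neighbor distance. Returns the
    samples and, for each sample, the index of the training score whose
    kernel it was drawn from (used to select the image to TPS-warp).
    """
    rng = np.random.default_rng() if rng is None else rng
    N, d = scores.shape
    # pairwise distances; mask the zero self-distance on the diagonal
    dists = np.linalg.norm(scores[:, None, :] - scores[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    sigma = dists.min(axis=1).mean()          # average nearest-neighbor distance
    idx = rng.integers(0, N, size=n_samples)  # pick a kernel uniformly at random
    samples = scores[idx] + sigma * rng.standard_normal((n_samples, d))
    return samples, idx
```

Each sampled score vector is then mapped back to correspondences via PCA reconstruction, and the returned index identifies the original image to warp with the TPS transform.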
3.2 DeepSSM Variants
Using the above data augmentation technique, we form the training data composed of (i) raw 3D images, (ii) correspondences associated with each scan, and (iii) the PCA scores associated with the correspondences. In the following sections, we present the DeepSSM architecture and its associated loss functions. We also introduce a novel alternative architecture that utilizes a nonlinear representation of the correspondences in place of the PCA scores.
3.2.1 Basic Framework
The DeepSSM framework predicts a PCA score representation of the correspondence points from their associated 3D volumes/images. Each input is the image and correspondence pair $(I_n, C_n)$, and $z_n$ is the PCA score of correspondence $C_n$. We use a CNN-based architecture with five convolution layers and two fully connected layers to perform a regression task from a raw 3D CT/MRI volume to correspondences (see Figure 3 [A]). A detailed architectural description is provided in 0.A. Throughout the paper, we refer to this architecture as Base-DeepSSM. This CNN block is the score encoder, given as $f_\theta$, that learns the PCA score shape descriptor $\hat{z}_n = f_\theta(I_n)$. We recover the correspondences from the PCA scores by passing them through a final fully connected layer. This final layer does not have a nonlinear activation and is given as $\hat{C}_n = A\hat{z}_n + b$. The weight $A$ and bias $b$ are kept fixed (they are not updated) during training and are set to the PCA basis and mean shape, respectively. This pair of fixed weights and bias performs the PCA reconstruction, and we obtain the predicted correspondences $\hat{C}_n$ as the output. Therefore, we obtain both the correspondence shape representation and the associated shape descriptor (in the form of PCA scores) from the image. Theoretically, the parameters of the last layer could be updated along with the network; however, the PCA score as a shape descriptor is more interpretable and is useful for visualizing modes of shape variation.
For the training loss, we can directly apply supervision on the correspondences as the network architecture itself handles the PCA encoding. This is given as:
$L_C = \frac{1}{N} \sum_{n=1}^{N} \lVert C_n - \hat{C}_n \rVert_2^2$  (2)
Note about loss functions: Initial iterations of DeepSSM, provided in [7, 8], utilized supervision on the PCA scores directly and defined the loss function as follows:
$L_z = \frac{1}{N} \sum_{n=1}^{N} \lVert z_n - \hat{z}_n \rVert_2^2$  (3)
However, there is a key drawback to learning the PCA scores directly, as they are inherently ordered and, in the typical setting, have an arbitrary scale per component. Components that express major modes of variation exhibit numerically larger PCA scores than those that express small and subtle shape variations. Given the nature of neural networks and the loss function in Eq. 3, the network tends to generalize better on the larger PCA modes than on the smaller ones, missing subtle shape statistics that might be important for subsequent analysis. The correspondence-based loss function (Eq. 2) solves this problem of nonuniform error across the PCA scores: the correspondence points are equally weighted, so all shape variations are equally captured. The correspondence-based loss still relies on the PCA shape descriptor; the only difference is that the PCA encoding is handled implicitly by the network architecture. On all the datasets presented in Section 4, we found that models trained with Eq. 2 as the training loss outperform those trained with Eq. 3; therefore, we only analyze models trained with Eq. 2 in this paper.
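The fixed PCA-reconstruction layer and the correspondence loss can be illustrated with a small NumPy sketch (a toy stand-in with names of our choosing; the actual model implements this as a frozen fully connected layer at the end of a CNN):

```python
import numpy as np

def pca_decode_layer(pca_basis, mean_shape):
    """Return a 'frozen' linear layer C_hat = W z + b, where W is the
    (3M, d) PCA basis and b the flattened mean shape. Its parameters
    are never updated during training."""
    W, b = pca_basis, mean_shape
    return lambda z: W @ z + b

def correspondence_loss(C, C_hat):
    """L2 loss applied on correspondences (Eq. 2) rather than PCA scores."""
    return np.mean((C - C_hat) ** 2)

# toy check: encoding then decoding a shape reproduces it up to the
# truncation error of the retained PCA modes
rng = np.random.default_rng(0)
shapes = rng.standard_normal((50, 30))   # 50 toy shapes, 10 points in 3D
mean = shapes.mean(axis=0)
U, S, Vt = np.linalg.svd(shapes - mean, full_matrices=False)
W = Vt[:5].T                             # keep the 5 dominant modes
decode = pca_decode_layer(W, mean)
z = W.T @ (shapes[0] - mean)             # PCA scores of one shape
loss = correspondence_loss(shapes[0], decode(z))
```

Because the final layer is a fixed linear map, gradients of the correspondence loss flow through it to the score encoder, so supervision on correspondences still trains the PCA-score bottleneck.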
3.2.2 TL-DeepSSM
Regression of the shape to its PCA scores is sufficient to capture all the linear shape variations in the population but will fail to capture the subtle nonlinear changes that might be of interest. Although the data augmentation relies on PCA spaces, the correspondences may showcase nonlinear variations (since we utilize all the PCA modes for augmentation), and the PCA score-based shape descriptor may be limited in describing the anatomical variations. For this, we propose a variant DeepSSM architecture that does not rely on the PCA score as a shape descriptor. We take inspiration from the TL network [20] architecture, which is applied to 2D-to-3D reconstruction and relies on discovering a common manifold that adheres to two different representations of the same object. We utilize this idea to find a low-dimensional latent representation that is common to the PDM representation and its corresponding raw image. The input is an image and correspondence pair $(I_n, C_n)$. The network architecture of the TL variant consists of two parts: (i) the autoencoder, with encoder $E$ and decoder $D$, that learns the latent code $h_n = E(C_n)$ for each correspondence set $C_n$, and (ii) the network that learns the latent code from the image (we call this the T-flank, given as $T$, with $\hat{h}_n = T(I_n)$). This is shown in the bottom block of Figure 3, and a detailed description of the network architecture is given in 0.A. The loss function of the entire architecture is given as follows:
$L_{AE} = \frac{1}{N} \sum_{n=1}^{N} \lVert C_n - D(E(C_n)) \rVert_2^2$  (4)

$L_{T} = \frac{1}{N} \sum_{n=1}^{N} \lVert E(C_n) - T(I_n) \rVert_2^2$  (5)

$L_{joint} = L_{AE} + \lambda L_{T}$  (6)

where $E$, $D$, and $T$ denote the correspondence encoder, the decoder, and the T-flank, respectively, and $\lambda$ weights the two terms.
We break the training routine into three parts (similar to [20]). First, we train the autoencoder on the correspondences with the reconstruction loss in Eq. 4. Next, we train the T-flank with the latent-matching loss in Eq. 5 while the autoencoder weights are kept frozen. Finally, we train the entire model jointly with the loss in Eq. 6. For inference on a testing sample, one can directly obtain the correspondences from an image by passing it through the T-flank and the decoder.
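The staged objectives can be made numerically concrete with linear stand-ins for the three sub-networks (purely illustrative assumptions on our part; the real encoder, decoder, and T-flank are deep networks):

```python
import numpy as np

rng = np.random.default_rng(0)
d_corr, d_img, d_lat, N = 30, 40, 8, 16
C = rng.standard_normal((N, d_corr))   # correspondence vectors
I = rng.standard_normal((N, d_img))    # stand-in image features

# hypothetical linear stand-ins for the three sub-networks
E = rng.standard_normal((d_lat, d_corr)) * 0.1   # correspondence encoder
D = rng.standard_normal((d_corr, d_lat)) * 0.1   # decoder
T = rng.standard_normal((d_lat, d_img)) * 0.1    # T-flank (image -> latent)

def L_AE(C):
    # stage 1 (Eq. 4): autoencoder reconstruction of the correspondences
    return np.mean((C - (E @ C.T).T @ D.T) ** 2)

def L_T(C, I):
    # stage 2 (Eq. 5): match the image-derived latent to the AE latent
    return np.mean(((E @ C.T) - (T @ I.T)) ** 2)

def L_joint(C, I, lam=1.0):
    # stage 3 (Eq. 6): joint fine-tuning objective
    return L_AE(C) + lam * L_T(C, I)
```

Stage 1 would minimize `L_AE` over `E` and `D`, stage 2 `L_T` over `T` with the autoencoder frozen, and stage 3 `L_joint` over all three; inference composes the decoder with the T-flank.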
3.3 Focal Loss
We have showcased two different DeepSSM architectures; both are trained using the $L_2$ norm as the training loss. However, some anatomy datasets need a different loss, specifically when the given anatomy exhibits very subtle shape variations in localized regions, i.e., when the variance of the particles across the data is significant only for a small subset of correspondences. Amidst this variance imbalance across the particles, if we use the $L_2$ correspondence loss given in Eq. 2, the network degenerates to producing the mean shape as the output, independent of the input image. To handle such datasets, we utilize the focal loss [37], which was initially proposed for handling class imbalance in object detection tasks but has since been extended to regression applications [38]. For a pair of input $y$ and prediction $\hat{y}$, the focal loss is given as:
$\mathcal{F}(y, \hat{y}) = \sigma\big(a(\lvert y - \hat{y} \rvert - c)\big)\, \lvert y - \hat{y} \rvert^2$  (7)

where $\sigma(\cdot)$ denotes the sigmoid function.
Here, $a$ and $c$ are hyperparameters, where $a$ is a scaling factor and $c$ decides the threshold. If the particle difference $\lvert y - \hat{y} \rvert$ is below $c$, the contribution of the particle to the overall loss is reduced. The scaling parameter $a$ is used to dampen or accentuate the loss based on the magnitude of $\lvert y - \hat{y} \rvert - c$. This focal loss allows for better generalization on datasets with subtle variations and prevents the model from degenerating to learning the mean shape. To incorporate the focal loss into Base-DeepSSM, we modify the loss in Eq. 2 to the following, where $\mathcal{F}$ denotes the focal loss of Eq. 7:

$L^F_C = \frac{1}{N} \sum_{n=1}^{N} \mathcal{F}(C_n, \hat{C}_n)$  (8)
For the TL-DeepSSM, the focal loss (denoted $\mathcal{F}$, Eq. 7) is applied both for autoencoder training and for training the latent space; hence, the loss functions are modified as follows:

$L^F_{T} = \frac{1}{N} \sum_{n=1}^{N} \mathcal{F}(E(C_n), T(I_n))$  (9)

$L^F_{AE} = \frac{1}{N} \sum_{n=1}^{N} \mathcal{F}(C_n, D(E(C_n)))$  (10)

$L^F_{joint} = L^F_{AE} + \lambda L^F_{T}$  (11)
The hyperparameter $c$ is estimated using a heuristic. We compute the absolute difference per point for each sample in the training data with respect to the mean shape, i.e., $\lvert C_n^m - \mu^m \rvert$ for sample $n$ and correspondence $m$, where $\mu$ is the mean shape. We choose $c$ as the value corresponding to the 90th percentile of these differences. We set the scaling hyperparameter $a$ to 10 in all our experiments for the particle-based focal loss (Eq. 8 and Eq. 10), as in most cases $c$ lies between 1 and 2. The TL-DeepSSM with the focal loss has the focal loss applied at two different places, each with its own $a$ and $c$ parameters. First, for the correspondence autoencoder, $a$ and $c$ are discovered similarly to those of Base-DeepSSM, as the loss is applied on the correspondences. Second, for the latent space, we use the trained autoencoder and the latent codes of the training data to estimate $c$ as described above; here, the estimates of $c$ are on the order of 10 for all datasets, and hence we set $a$ to 1 in all cases.
4 Results
In this section, we describe the application of DeepSSM on three different datasets provided in separate subsections. We also describe the specific training methodology and hyperparameters used for each dataset in their respective sections. Some training and implementation details are common across experiments; these are compiled together in 0.A for clarity.
Evaluation: In the upcoming subsections, we evaluate DeepSSM to show that it performs similarly to a state-of-the-art PDM model by comparing the dense correspondences. We reiterate here that for a state-of-the-art PDM, we need to segment the raw images and process them before feeding them into the PDM algorithm. For each dataset, we evaluate both Base-DeepSSM and TL-DeepSSM, with and without focal loss. Error is captured via the average root mean square error (RMSE) between the predicted 3D correspondences and those of the state-of-the-art PDM. We compute this by averaging the RMSE for the $x$, $y$, and $z$ coordinates as follows:

$\text{RMSE} = \frac{\text{RMSE}_x + \text{RMSE}_y + \text{RMSE}_z}{3}$  (12)
where, for $N$ 3D correspondences, $\text{RMSE}_x = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{x}_i)^2}$, and similarly for the $y$ and $z$ coordinates. We also compute the RMSE for each correspondence point, i.e., the per-point error across the test samples. The mean and standard deviation of these per-point RMSEs are visualized as a heatmap on top of the mean shape. The per-point RMSE informs us how well DeepSSM is able to model different local anatomy. Along with the RMSE, we also report the surface-to-surface distance between the ground truth mesh and the mesh reconstructed from the particles predicted by DeepSSM. The surface-to-surface distance gives a more accurate portrayal of how well the particles adhere to the shape and gives insight into whether or not they can be used to facilitate anatomy segmentation. Additionally, to validate the efficacy of DeepSSM as an estimator of shape representation, we use the DeepSSM correspondences for different downstream analysis applications and compare the results to those of a state-of-the-art PDM. These downstream applications are different for each dataset and are described separately in the corresponding subsections.
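The two error metrics can be computed as follows (a minimal NumPy sketch; the array layouts and function names are our assumptions):

```python
import numpy as np

def avg_rmse(P, P_hat):
    """Average of the per-axis RMSEs over N 3D correspondences (Eq. 12).
    P, P_hat: (N, 3) arrays of reference and predicted points."""
    per_axis = np.sqrt(np.mean((P - P_hat) ** 2, axis=0))  # RMSE_x, RMSE_y, RMSE_z
    return per_axis.mean()

def per_point_rmse(P, P_hat):
    """Per-correspondence RMSE across S test samples, used for the
    heatmaps on the mean shape. P, P_hat: (S, N, 3) arrays."""
    sq_err = np.sum((P - P_hat) ** 2, axis=-1)   # (S, N) squared distances
    return np.sqrt(np.mean(sq_err, axis=0))      # (N,) per-point errors
```

The surface-to-surface distance is computed on meshes rather than particles and would require a mesh library, so it is omitted here.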
Inference Time: One of the contributions of the DeepSSM model is fast inference on new samples, i.e., using a trained model to predict correspondences from a new image. A comparison of the inference time for a single scan using DeepSSM and the state-of-the-art PDM, ShapeWorks [13], is shown in Table 1, where we see the drastic improvement that DeepSSM provides. The DeepSSM times are for Base-DeepSSM; it should be noted that TL-DeepSSM has very similar inference times. In these results, ShapeWorks was executed on a cluster node with 192 Intel Xeon Gold 6252 CPUs @ 2.10GHz (HT), with multithreading. The DeepSSM results are computed on the same machine utilizing the CPU for a fair comparison. These results report only the inference time and ignore the processing time required for the input, which is very high for ShapeWorks (or any PDM), as it includes segmentation and heavy preprocessing, and is zero for DeepSSM, since it acts directly on images.
Data  Particles  ShapeWorks [13]  DeepSSM

Metopic  2048  39 minutes  0.56 seconds 
LA  1024  31 minutes  0.28 seconds 
Femur  1024  65 minutes  1.25 seconds 
4.1 Metopic Craniosynostosis
Craniosynostosis is a morphological defect affecting infants, in which one of the sutures of the skull fuses prematurely, and subsequent brain growth leads to an abnormal head shape. In severe cases, craniosynostosis can lead to increased intracranial pressure, causing symptoms such as headaches, drowsiness, and vision changes; it can also cause several neurological problems. Metopic craniosynostosis is the form in which the metopic suture (also called the frontal suture) fuses prematurely. It leads to an unnaturally triangular head shape, a morphological symptom known as trigonocephaly [32]. Compensatory growth in metopic craniosynostosis also causes expansion at the back of the skull. Metopic craniosynostosis is difficult to identify from CT scans, as the suture typically fuses at an early age, making it difficult to ascertain whether or not the fusion was premature. The current diagnosis protocol involves subjective observation of the head shape. The diagnosis affects the subsequent treatment protocol, namely whether or not to perform corrective craniofacial surgery. This treatment is risky by nature and requires an objective measure of the severity of the condition; such a measure would guide surgeons toward a better-informed diagnosis. In this section, we utilize DeepSSM to construct a shape model on cranial data. The end goal is to quantify the deviation of a given shape diagnosed with metopic craniosynostosis from the population of normal cranial shapes.
Data Description and Processing: Cranial CT scans of infants between 5 and 15 months of age were acquired at UPMC Children's Hospital of Pittsburgh. A subset of 28 scans was diagnosed with metopic craniosynostosis by a board-certified craniofacial plastic surgeon utilizing the CT images as well as a physical examination. The remaining 92 scans are CT scans of trauma subjects acquired as a byproduct of standard clinical practice; these scans do not exhibit any deformities and can be treated as a set of controls. All the data is IRB-approved and was acquired utilizing standard de-identification and HIPAA protocols. Along with the CT scans, we have corresponding segmentations of the cranium. We separate this dataset into 105 training and 15 testing images. We process the images and segmentations for the training data and utilize the ShapeWorks [13] package to generate the initial PDM. To satisfy GPU memory requirements, all images are downsampled isotropically by a factor of 4, resulting in a voxel spacing of 4 mm in each direction. The PDM consists of 1024 3D points and uses 20 PCA components (capturing 99% of the variability). We perform data augmentation and generate 5000 additional data points. The 5000 augmented plus 105 original training images are divided into training and validation sets with a 90%-10% split.
Training Specifics: We train the Base-DeepSSM with 20 PCA components (explaining 99% of data variability), using the $L_2$ loss on correspondences (Eq. 2) for 100 epochs, with early stopping with respect to the validation loss. Similarly, we also train the Base-DeepSSM with the focal loss given by Eq. 8 for 100 epochs. Finally, we train a TL-DeepSSM with a bottleneck of 64 dimensions, with and without the focal loss. We train the autoencoder for 5000 epochs, the T-flank for 100 epochs, and the joint model for 50 epochs. The value of the hyperparameter $c$ for the focal loss was discovered by the heuristic described in Section 3.3 and is 1.09 for particles (in Base- and TL-DeepSSM) and 10.9 for the TL latent space.
Evaluation and Analysis: The average RMSE and the per-point RMSE are given in Figure 4, and the average surface-to-surface distance for each test sample is provided via boxplots in Figure 6. These metrics, namely the RMSE, per-point RMSE, and surface-to-surface distance, are computed as described at the beginning of Section 4. Qualitative results are showcased in Figure 5; the heatmap is the surface-to-surface distance between the original mesh and the mesh reconstructed from the particles predicted by Base-DeepSSM (without focal loss). From Figure 4, we can see that the prediction error is centered around the eye sockets and frontal ridge; however, the average RMSE is well within the subvoxel margin, as the input images have a spacing of 4 mm. Both normal and metopic test scans capture the shape well, with subvoxel accuracy in most areas. The leftmost metopic scan in Figure 5 is the result with the largest error, and the maximum error is localized in the region near the orbital rim. The orbital rim is a sharp and thin structure, and the original PDM on which DeepSSM is trained does not provide very accurate correspondences in this region. These inaccuracies persist in DeepSSM training, and the trained model also exhibits this error in the localized area. This is a data problem; the neural network does not learn structures that are not well expressed by the data.
The improvement provided by the focal loss is minimal, both quantitatively and qualitatively; however, we can see from Figure 4 that the distribution of the RMSE error is more spread out and less localized in specific regions. Because the focal loss does not provide a significant improvement on this dataset, we use the Base-DeepSSM and TL-DeepSSM trained without focal loss for the downstream application and analysis.
Downstream Task – Severity Quantification: Our end goal is to utilize DeepSSM to provide clinicians with an end-to-end tool where they can input an image and acquire a severity score. The severity score is designed to measure the abnormality of a subject with respect to the normal population (samples with a negative diagnosis for metopic craniosynostosis). We utilize an unsupervised learning method that models the variations in the normal population and learns the deviation between a test image and this population [48]. The method employs the Mahalanobis distance/Z-score, which uses the variance in the normal population to normalize the multidimensional difference of the target subject. It can be regarded as a Z-score in multiple dimensions and effectively scales up characteristics that are rare in the shape statistics of the normal population.
The sample count is usually limited and much lower than the shape dimension; we account for the variability outside the sample subspace using probabilistic principal component analysis (PPCA). PPCA estimates the full-rank covariance as given by Equations (13) and (14). Here, $d$ is the shape dimension, $q$ is the subspace dimension, $\mu$ is the empirical mean, the columns of $U$ are the eigenvectors, and $\Lambda$ is the diagonal matrix of eigenvalues of the covariance $\Sigma$. We can use the eigendecomposition to whiten a shape deviation, project back to the shape space, and get the pointwise Mahalanobis distance, which explains the contributions across the shape domain to the final severity score (as shown in Equation (15)).

$\sigma^2 = \frac{1}{d - q} \sum_{j=q+1}^{d} \lambda_j$  (13)

$\Sigma = U \tilde{\Lambda} U^{T}, \quad \tilde{\Lambda} = \operatorname{diag}(\lambda_1, \ldots, \lambda_q, \sigma^2, \ldots, \sigma^2)$  (14)

$m = U \tilde{\Lambda}^{-1/2} U^{T} (x - \mu)$  (15)
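A compact NumPy sketch of the PPCA-based severity computation described above (our reconstruction under the stated assumptions; the paper's exact estimator may differ in details such as the residual-variance floor):

```python
import numpy as np

def ppca_whitener(X, q):
    """Fit PPCA on control shapes X (n samples x d dims). Returns the
    mean, the eigenvectors U, and the eigenvalues lam with the trailing
    (d - q) values replaced by the residual variance sigma^2."""
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = Xc.T @ Xc / (len(X) - 1)
    U, S, _ = np.linalg.svd(cov)          # eigenvalues S in descending order
    sigma2 = S[q:].mean()                 # averaged residual variance
    lam = np.concatenate([S[:q], np.full(len(S) - q, sigma2)])
    return mu, U, lam

def severity(x, mu, U, lam):
    """Whitened deviation of shape x from the control population. The norm
    is the scalar severity score; projecting back gives the pointwise
    Mahalanobis contributions visualized as heatmaps."""
    w = (U / np.sqrt(lam)).T @ (x - mu)   # whiten the deviation
    return np.linalg.norm(w), U @ w       # score, pointwise map
```

Fitting on the control scans only and then scoring metopic scans yields the severity values compared against the expert ratings below.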
Evaluation results for the severity scores are shown in Table 2. For each shape model, we fitted PPCA (with a 95%-variance subspace) on 72 control scans and calculated the severity scores for the remaining 48 scans (28 metopic and 20 normal). The latter group is provided with aggregated expert ratings of the degree of metopic craniosynostosis, collected through an online portal from 36 craniofacial experts. We displayed the 3D segmentation and asked the physicians to estimate the deformity on a 5-point Likert scale. We modeled the expert ratings by latent trait theory [52, 40], which parameterizes potential raters' bias and inconsistency, and obtained the aggregated latent severity using maximum likelihood estimation.
We use both Pearson and Spearman correlation coefficients to measure the agreement between the predicted severity scores and the latent trait recovered from the expert ratings. The results presented in Table 2 suggest that both base and TL-DeepSSM produce severity scores that are highly correlated with the expert consensus. In addition, we calculate the area under the ROC curve (AUC) using the binary diagnosis to evaluate the efficacy of the predicted severity scores for classification. The AUC also suggests that DeepSSM-based severity is a good indicator for identifying scans with metopic craniosynostosis. Both the AUC and correlation scores for the DeepSSM variants are comparable to results using ShapeWorks [13], verifying that DeepSSM can perform comparably with a state-of-the-art PDM on this downstream task.
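The three evaluation metrics used here can each be computed in a few lines of numpy. This is an illustrative sketch (the paper does not state which library was used); the helper names are hypothetical, and the Spearman sketch omits tie correction.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation between two score vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def spearman_r(a, b):
    """Spearman correlation: Pearson on the ranks (no tie correction here)."""
    rank = lambda x: np.argsort(np.argsort(np.asarray(x))).astype(float)
    return pearson_r(rank(a), rank(b))

def auc(scores, labels):
    """AUC as the probability that a positive case outranks a negative one
    (Mann-Whitney formulation; ties counted as one half)."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))
```

A severity score that ranks every metopic scan above every control achieves AUC = 1.0 regardless of its absolute scale, which is why AUC complements the correlation metrics.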
Table 2: Severity-score evaluation for each shape model.

Metric      | Base  | TL    | ShapeWorks [13]
Pearson R   | 0.82  | 0.82  | 0.85
Spearman R  | 0.80  | 0.81  | 0.83
AUC         | 0.946 | 0.98  | 0.932
We display the point-wise Mahalanobis distance across the shape domain in Figure 7. The point-wise Mahalanobis distance is obtained by averaging the whitened deviations of all metopic scans. It is then projected onto the surface normals and visualized as a heat map on top of the mean control shape in both top and side views. The heat maps show that the center of the frontal bone protrudes while the sides shrink; this morphological deviation showcases trigonocephaly, which is a clinically expected observation. Therefore, the DeepSSM shape model captures the characteristic deformation pattern of metopic craniosynostosis.
4.2 Cam-type Femoroacetabular Impingement
Cam-type femoroacetabular impingement (cam-FAI) is characterized by abnormal bone growth on the femoral head. This abnormal growth causes excessive friction between the hip socket (part of the large pelvis bone) and the femoral head [26]. The friction can cause eventual joint damage leading to pain and restricted movement, and is a leading cause of hip osteoarthritis. Modeling the shape of a joint population of femurs diagnosed with and without cam-FAI can characterize the local femoral head deformity (the cam lesion) with respect to the femoral shape statistics of healthy subjects. Such a characterization can provide surgeons with an anatomical map of the deformity, thereby guiding surgical intervention. It can also help clinicians understand the pathology more intuitively, correct local abnormal morphology, and preserve normal bone function. We utilize DeepSSM models to achieve this characterization and compare them to state-of-the-art PDM models, showcasing the efficacy of DeepSSM for such tasks while mitigating the need for segmentation and added preprocessing.
Data Description and Processing: For training the DeepSSM models, we have 49 CT images of the femur bone. Of these, 42 subjects are healthy with no diagnosed morphological abnormality in the femur; we call these control scans. The remaining 7 scans are diagnosed with cam-FAI. For the initial shape model, we use ShapeWorks [13] to generate 1024 correspondences on these shapes. Next, we apply data augmentation to these 49 scans, producing 5000 additional data points. The combined 5049 scans are split 90-10% into training and validation sets. We utilize 20 PCA modes (capturing 99% of the variability) for training DeepSSM. In addition to the training and validation data, we reserve 7 control and 2 cam-FAI scans for testing the DeepSSM model. Each image is downsampled by a factor of 4 to easily fit GPU memory requirements, resulting in 3D volumes with an isotropic voxel spacing of 2mm.
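The "20 PCA modes capturing 99% variability" criterion above amounts to choosing the smallest number of modes whose cumulative eigenvalue fraction reaches a target. A minimal sketch of this selection (the function name is hypothetical; ShapeWorks performs its own PCA internally):

```python
import numpy as np

def num_modes_for_variance(correspondences, target=0.99):
    """Smallest number of PCA modes whose eigenvalues explain `target`
    of the total variance in the training correspondence matrix (n x dm)."""
    X = correspondences - correspondences.mean(axis=0)
    lam = np.linalg.svd(X, compute_uv=False) ** 2   # variance per mode (unnormalized)
    frac = np.cumsum(lam) / lam.sum()
    return int(np.searchsorted(frac, target) + 1)
```

For a population dominated by one mode of variation this returns 1; for isotropic noise it returns nearly the full dimensionality, so the count is a rough measure of how compact the shape statistics are.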
Training Specifics: We train the femur model for base DeepSSM and TL-DeepSSM, each with and without the focal loss. We train base DeepSSM with 20 PCA modes, without focal loss via Eq. 2 and with focal loss via Eq. 8. We train each of these models for 50 epochs with early stopping based on the validation loss. Finally, we train TL-DeepSSM with and without the focal loss, where we train the autoencoder for 10000 epochs, followed by training the T-flank for 50 epochs and joint training for 25 epochs. The value of the focal-loss hyperparameter c was discovered by the heuristic described in Section 3.3: 1.32 for particles (in base and TL-DeepSSM) and 6.3 for the TL latent space.
Results and Analysis: The average and per-point RMSE are given in Figure 4. The surface-to-surface distance is computed between the mesh reconstructed from the DeepSSM-predicted correspondences and the mesh obtained from the ground-truth segmentation of the femur bone, and is given in Figure 6. We have two key observations. First, the focal loss improves the RMSE scores only marginally, but this improvement is crucial in capturing the subtle variability of the femur anatomy. We also see from Figure 4 that the models with focal loss have a more diffuse error that is less persistent in the greater trochanter area. Second, the TL variant outperforms base DeepSSM, but not by a large margin, indicating that the PCA scores as a shape descriptor capture most of the variability. In this dataset, we notice that models trained without the focal loss degenerate to producing similar predictions in areas with large shape variation; because such areas are limited in number, the loss remains low in value. This is undesirable, as the areas with larger variance are the most important in characterizing the anatomy, and we want DeepSSM to capture them. Qualitative results using base DeepSSM with focal loss are shown in Figure 8, which showcases the surface-to-surface distance on testing images (both controls and cam-FAI).
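The surface-to-surface distance used here is typically computed point-to-triangle with a mesh library; as a self-contained stand-in, a nearest-vertex approximation illustrates the idea (the function names are hypothetical, and this approximation is only accurate for densely sampled meshes):

```python
import numpy as np

def vertex_to_surface_distance(verts_a, verts_b):
    """One-sided surface distance: for each vertex of mesh A, the distance
    to the nearest vertex of mesh B (a common stand-in for exact
    point-to-triangle distance when meshes are densely sampled)."""
    diff = verts_a[:, None, :] - verts_b[None, :, :]   # (na, nb, 3) pairwise offsets
    d = np.sqrt((diff ** 2).sum(axis=-1))              # (na, nb) pairwise distances
    return d.min(axis=1)                               # per-vertex nearest distance

def symmetric_surface_distance(verts_a, verts_b):
    """Symmetric (bidirectional) mean surface distance between two meshes."""
    da = vertex_to_surface_distance(verts_a, verts_b)
    db = vertex_to_surface_distance(verts_b, verts_a)
    return 0.5 * (da.mean() + db.mean())
```

The per-vertex distances from `vertex_to_surface_distance` are what get painted onto the mesh for heat-map figures like Figure 8, while the symmetric mean gives a single summary number per scan.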
We notice that the error on the lesser trochanter and the cam lesion is relatively high in some cases, for two reasons. First, the plane at which the femurs are cut is arbitrary, and in some cases (in both training and testing images) the entire lesser trochanter is not captured. This inconsistency in cutting planes introduces high and uncertain shape variation in that region that is hard for the network to learn. This is a data issue and can be fixed by standardizing the scans fed as input to the model. Second, the error on the cam lesion occurs due to data imbalance, i.e., there are many more controls than cam-FAI-diagnosed femurs. As a result, extreme cam cases (such as the second cam case in Figure 8) are not completely captured by the DeepSSM model. The model can only encode the variations present in the training data; if the dataset were larger and well balanced between the two classes, the performance of the model would improve.
Downstream Analysis – Group Differences: As mentioned earlier, it is of interest to the clinical community to capture the statistical morphological difference between the cam-FAI femur shape and the representative control femur shape. This characterization is better represented by models trained with the focal loss; hence, for this downstream analysis we use base and TL-DeepSSM trained with the focal loss. We construct two groups, one for controls and the other for pathology (cam-FAI), compute the difference between their mean shapes, and showcase this difference on a mesh. This is termed a group difference [26]. We compute this group difference utilizing the DeepSSM-predicted particles, as well as via the ShapeWorks PDM. The entire dataset (both testing and training) is used for these group differences, shown in Figure 9. Each group difference showcases the change going from the pathological mean shape to the control mean shape and is overlaid on top of the mean pathological shape. We can see that the group differences for the state-of-the-art PDM and DeepSSM are very similar, showcasing that DeepSSM can be used to obtain correspondences without heavy preprocessing and segmentation steps, and that these correspondences can be used to characterize the cam deformity.
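Because the correspondence particles are in correspondence across subjects, the group difference reduces to a per-particle mean difference. A minimal sketch (the function name is hypothetical):

```python
import numpy as np

def group_difference(particles_control, particles_pathology):
    """Mean-shape difference between two groups of correspondence particles,
    each of shape (n_subjects, m, 3). Returns per-particle difference vectors
    (pathological mean -> control mean) and their magnitudes, ready to
    overlay on the pathological mean mesh."""
    mean_c = particles_control.mean(axis=0)    # (m, 3) control mean shape
    mean_p = particles_pathology.mean(axis=0)  # (m, 3) pathological mean shape
    diff = mean_c - mean_p                     # arrows from pathology toward control
    return diff, np.linalg.norm(diff, axis=1)  # vectors and per-particle magnitudes
```

The magnitudes color the mesh, while the vectors give the direction of the morphological change at each particle, e.g., inward at the cam lesion.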
TL-DeepSSM predicts a latent space that captures nonlinear variations while also capturing the morphological differences between normal and cam-FAI femoral shapes. To show this, we move along the line connecting the mean normal and mean pathological latent representations, and at each sample we apply the decoder to obtain the corresponding correspondences. Figure 10 shows the difference between each of these samples and the mean normal shape, overlaid on the mean normal shape. We observe the clinically expected outcome: the femur anatomy smoothly exhibits inward motion around the cam lesion as we move toward the pathological mean.
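The latent-space walk just described can be sketched as follows; `interpolate_latent` and its arguments are hypothetical names, and the decoder stands in for the trained TL-DeepSSM decoder.

```python
import numpy as np

def interpolate_latent(decoder, z_normal, z_pathology, steps=5):
    """Walk the line between the mean normal and mean pathological latent
    codes, decode each sample into correspondence particles, and return
    each decoded shape's deviation from the normal-mean shape."""
    samples = []
    for t in np.linspace(0.0, 1.0, steps):
        z_t = (1.0 - t) * z_normal + t * z_pathology   # convex combination of the means
        samples.append(decoder(z_t))                   # decode back to particle space
    base = samples[0]                                  # decoded normal mean
    return [s - base for s in samples]                 # deviations from the normal mean
```

Overlaying each deviation on the normal mean shape produces a smooth animation of the normal-to-pathological morphological change, which is what Figure 10 visualizes frame by frame.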
4.3 Atrial Fibrillation
Atrial fibrillation (AF) is an irregular heartbeat (arrhythmia) that can lead to several complications, such as stroke, blood clots, and even cardiac arrest. Catheter ablation is a therapeutic procedure commonly utilized to treat AF; however, many patients exhibit a recurrence of AF after ablation. This AF recurrence has been demonstrated to be related to the shape of the left atrium [9, 8]. We utilize DeepSSM models and their predicted shape descriptors to validate this hypothesis and estimate AF recurrence based on the shape of the left atrium.
Data Description and Processing: For the left atrium (LA) experiments, we utilize 206 late gadolinium enhancement (LGE) MRI images from patients diagnosed with atrial fibrillation. These scans were acquired after the first ablation. Of the 206 scans, 122 are used for training, and the remaining 84 scans form a carefully curated testing set, ensuring an even distribution of intensity and shape variation between training and testing data. A PDM of 1024 correspondences is generated using ShapeWorks [13] for the training set; then, using this as a base shape model, correspondences are also found for the test set for evaluation. We extract 19 PCA modes from the PDM, capturing 95% of the shape variability. We then run data augmentation on the training set to create 5000 additional samples, 80% of which are used in training and 20% in validation. For training purposes, the MRIs are downsampled by a factor of 2, going from 0.625mm to 1.25mm isotropic voxel spacing.
Training Specifics: We train base DeepSSM with and without the focal loss for 100 epochs, using the 19 PCA scores that capture 95% of the data variability. We use early stopping to evaluate performance at the best epoch. For TL-DeepSSM (with and without the focal loss), we train the autoencoder for 10000 epochs, followed by 100 epochs to train the latent space and 25 epochs of joint training. An autoencoder bottleneck of 64 was chosen for TL-DeepSSM. The focal-loss hyperparameter c is 2.54 for the particle loss (base and TL-DeepSSM) and 13.76 for the TL-DeepSSM latent-space loss.
Results and Analysis: This is a challenging dataset, first in terms of the intensity variations present in the LGE images (Figure 11), with some images being significantly degraded and noisy. Second, in terms of shape variation, the left atrium anatomy exhibits natural clustering based on the shape of the atrium and on the different numbers of pulmonary veins and their orientations [28]. Additionally, determining where the pulmonary veins should end in a segmentation is arbitrary across samples and across annotators. Due to this highly nonlinear shape distribution, TL-DeepSSM outperforms the other variants considerably. The RMSE results are shown in Figure 4, and the surface-to-surface results are shown in Figure 6. As with the craniosynostosis dataset, the results with and without the focal loss do not differ much, quantitatively or qualitatively. The TL network provides considerable improvement in the average RMSE, as seen in Figure 4, with the majority of the remaining error still concentrated around the pulmonary veins, similar to base DeepSSM. Qualitative results are showcased in Figure 11, which depicts the surface-to-surface distance between the ground-truth mesh and the mesh reconstructed from the TL-DeepSSM-predicted particles. We notice that the surface-to-surface distance in these scans is as large as 11 mm, which would ordinarily be a significant error. However, this error is primarily localized to the pulmonary vein region, which has inconsistent initial segmentation. The left atrium body, whose shape is indicative of AF recurrence, exhibits near sub-voxel accuracy.
Downstream Task – AF Recurrence Prediction: As the focal loss does not make much difference on this dataset, this downstream analysis is performed using base and TL-DeepSSM trained without the focal loss. Left atrium shape is indicative of AF recurrence [26]. In our dataset, we are also provided with binary outcome data indicating whether or not the patient had AF recurrence post-ablation. Using this binary information, we aim to train a supervised predictor that estimates the AF recurrence probability from the shape descriptor. State-of-the-art PDM models provide correspondences, and a natural choice of shape descriptor is the PCA scores. However, not all PCA modes exhibit anatomical changes that discriminate left atria with and without AF recurrence. Hence, we perform feature trimming via a heuristic: we compute the absolute difference between the mean shape descriptors of the normal and AF-recurrence left atria and select the ten dimensions with the highest magnitude. Using this trimmed PCA-scores shape descriptor, we train a multilayer perceptron for classification with 122 training and 84 testing samples. We perform this analysis for PCA scores from
ShapeWorks [13] and base DeepSSM. Furthermore, we perform the same analysis utilizing the latent space of TL-DeepSSM as the shape descriptor, again using the models trained without the focal loss. The results are showcased in Table 3. We see that the PDM and base DeepSSM perform similarly, indicating that the shape model from DeepSSM can directly provide a useful shape descriptor that performs comparably with the state-of-the-art PDM. We also see that the TL-DeepSSM latent space outperforms the PCA-scores space of the other models; this can be attributed to the nonlinear nature of the anatomical variation, which is better captured by an autoencoder than by PCA. Hence, TL-DeepSSM learns a richer shape descriptor for the left atrium.

Table 3: AF recurrence classification accuracy.

Model    | ShapeWorks | Base DeepSSM | TL-DeepSSM
Accuracy | 61.9%      | 59.5%        | 71.4%
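The feature-trimming heuristic described above can be sketched as follows before handing the trimmed descriptor to the MLP classifier; `trim_shape_descriptor` is a hypothetical name for illustration.

```python
import numpy as np

def trim_shape_descriptor(scores, labels, k=10):
    """Feature-trimming heuristic: keep the k descriptor dimensions with
    the largest absolute difference between the class means (recurrence
    vs. non-recurrence), to feed a downstream classifier such as an MLP.

    scores: (n, p) shape descriptors; labels: (n,) binary outcomes.
    Returns the trimmed (n, k) descriptor and the kept dimension indices."""
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, bool)
    gap = np.abs(scores[labels].mean(axis=0) - scores[~labels].mean(axis=0))
    keep = np.argsort(gap)[::-1][:k]    # indices of the most discriminative modes
    return scores[:, keep], keep
```

Note that the heuristic ranks dimensions only by mean separation, so the classifier still has to handle within-class variance; it simply discards modes that carry no class signal at all.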
Since TL-DeepSSM produces a more discriminative shape descriptor for AF recurrence prediction, we move along the latent space of TL-DeepSSM, similar to the analysis performed in Section 4.2 (Figure 10). Reiterating, we find the mean of the normal left atria and the mean of the left atria diagnosed with AF recurrence in the latent space. We move along the line joining these means and reconstruct the samples back in the correspondence space. We take the difference of each of these samples from the normal mean and project the variation onto the normal mean shape, as shown in Figure 12. This is the reverse of the visualization performed in Figure 10 and is done for ease of visualization; both directions still demonstrate the shape variation from the normal to the pathological shape. From Figure 12, we see that as we move toward the pathology, the left atrium shows outward movement, indicating increased bulging of the left atrium as a morphological change observed in AF recurrence patients. This observation suggests that the left atrium is more spherical in cases diagnosed with AF recurrence, which aligns with previous studies [11, 10] showing that the sphericity of the left atrium body is associated with recurrence and can be used to predict AF recurrence after ablation. Another work [53] observes larger asymmetry in the left atrium shapes of patients with AF recurrence; this also aligns with our observations in Figure 12, which exhibits more expansion on one side than the other. This agreement with previous works strengthens the argument that TL-DeepSSM's latent space captures the anatomical variations that differentiate left atrium shapes with and without AF recurrence after ablation, and can be effectively used as a predictor for the same.
5 Discussion
The goal of DeepSSM is to arrive at a shape descriptor that adequately captures the shape variability in a population so that it can be used in downstream applications without resource-heavy processing. In this manuscript, via four different datasets and applications, we showcase that the correspondence model provided by DeepSSM and its variants is comparable in performance with the state-of-the-art PDM, if not better in some scenarios. This validates the efficacy of DeepSSM and provides a novel way to achieve shape modeling on new images without the need for segmentation, preprocessing, and optimization. Additionally, DeepSSM inference computationally surpasses current methods by obtaining correspondences on a new scan in seconds, as compared to 1-2 hours when using ShapeWorks [13] and other PDMs. This processing-free and fast inference also allows shape-model-based applications to have a fully end-to-end pipeline that could easily be deployed, for instance, on cloud services. DeepSSM is not a single network; it is a customizable framework/methodology for obtaining statistical shape models directly from images, customizable in both architectural variant and loss function. This manuscript aims to provide a comprehensive description of the linear and nonlinear architectures and related loss functions of DeepSSM and how they can be adapted to individual datasets/applications.
5.1 When to use each variant?
We have proposed two different loss functions as well as two different architectures. Throughout our experimentation, we have discovered that the utility of these variants differs based on the data/application the model is being used for. This discussion sheds light on where each variant might be helpful over others.
As mentioned above, the choice of variant depends on the data, the application, or both. Let us start with applications of the shape descriptor. These can be broadly divided into two categories: (i) the shape descriptor is used as a feature for a subsequent prediction/classification/clustering model, and (ii) the descriptor is examined to understand the morphology of a sample or group of samples. In case (i), we can use any shape descriptor, and it is up to the subsequent model to extract the shape information. In case (ii), the interpretability of the shape descriptor plays an important role; PCA scores provide this interpretability, and hence base DeepSSM works better for such applications.
Now let us consider the data, which is where most of the subjectivity comes into play. For most datasets, TL-DeepSSM outperforms the base model (Figure 4 and Figure 6). For datasets with limited shape variability and no natural clustering of morphology, TL-DeepSSM performs only marginally better quantitatively. This marginal difference does not affect the downstream application; hence, it is better to use base DeepSSM, which provides a more interpretable shape descriptor. In contrast, with anatomies like the left atrium, where shape variations are highly nonlinear, the quantitative performance of TL-DeepSSM significantly outshines the other variants. Furthermore, TL-DeepSSM's shape descriptor performs better at downstream tasks, showcasing that TL-DeepSSM should be used with such datasets. In other datasets, the variation across the population is localized to a few small regions, while the rest of the anatomy is mostly constant; such a scenario occurs with the femur dataset (Section 4.2). In such cases, utilizing the focal loss breaks the network out of the local minima of learning the data mean or a few cluster means.
6 Acknowledgments
This work was supported by the National Institutes of Health under grant numbers NIBIBU24EB029011, NIAMSR01AR076120, NHLBIR01HL135568, NIBIBR21EB026061, and NIGMSP41GM103545. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
The authors would like to thank the Division of Cardiovascular Medicine (data were collected under Nassir Marrouche, MD, oversight and currently managed by Brent Wilson, MD, PhD) and the Orthopaedic Research Laboratory (ORL) (data were collected under Andrew Anderson, PhD, oversight) at the University of Utah for providing the MRI/CT scans and the corresponding segmentation of the left atrium and femur datasets, with a special thanks to Evgueni Kholmovski for assisting in MRI image acquisition.
Appendix 0.A Network and Implementation Details
0.A.1 Network Architecture
The architecture for the base DeepSSM is shown in Figure 13[A]; its last fully connected layer, which performs the PCA reconstruction from the predicted loadings, is kept fixed. The latent encoder has five convolution layers with [12, 24, 48, 96, 192] output channels, as shown in the figure, followed by three fully connected layers with output sizes of 384, 96, and K. Here, K is the number of PCA loadings used for a given model. We have isotropic max-pooling by a factor of 2 after the first, third, and fifth convolution layers; this is denoted by the size factor shown at each block as (1), (1/2), and (1/4). The output of this latent encoder is the predicted PCA-loading shape descriptor. It is passed through a linear fully connected layer whose weights are initialized by the PCA basis and whose bias is initialized by the mean shape. The parameters of this layer are fixed for base DeepSSM; hence, it performs the PCA reconstruction, giving us the predicted correspondences. We use parametric ReLU [27] as our nonlinear activation function, and each layer is followed by batch normalization [30].

For TL-DeepSSM (shown in Figure 13[B]), the T-flank network is the same as the latent encoder of the base DeepSSM, except that instead of producing the predicted PCA loadings, it predicts a K-dimensional latent vector. The second part of TL-DeepSSM is an autoencoder consisting of two fully connected layers each for the encoder and decoder, as shown in the figure; the nonlinear activation used here is the leaky ReLU.
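The distinctive element of the base architecture, the frozen final layer that turns predicted loadings into correspondences, can be illustrated with a small numpy sketch (the actual network is implemented in PyTorch; the class name here is hypothetical):

```python
import numpy as np

class FixedPCADecoder:
    """Sketch of the frozen final layer of base DeepSSM: a linear map whose
    weight matrix is the PCA basis and whose bias is the mean shape, so the
    predicted loadings are reconstructed into correspondence coordinates."""

    def __init__(self, pca_basis, mean_shape):
        self.W = pca_basis    # (dm, K): one column per retained PCA mode
        self.b = mean_shape   # (dm,): flattened mean correspondence shape

    def __call__(self, loadings):
        # Parameters stay fixed during training; only the encoder that
        # produces `loadings` is learned.
        return self.W @ loadings + self.b
```

Because this layer is fixed, the supervision on correspondences backpropagates through a known linear map, which constrains the encoder's output to live in the PCA loading space.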
0.A.2 Implementation Details
The weights of the network are initialized by Xavier initialization [21]. The networks were implemented in PyTorch, and experiments were performed on an NVIDIA TITAN V GPU with 12GB of memory. In all cases, the batch size was selected to be between 5-10; our experiments found that the results are not sensitive to the batch size. Adam [33] is used for optimization in all cases. For base DeepSSM experiments, we use an initial learning rate of 0.001 with a cosine annealing learning rate scheduler, along with a weight decay of 0.00005. For TL-DeepSSM, the autoencoder training was performed with a fixed learning rate of 0.0001 and a weight decay of 0.00005, followed by T-flank and joint training using an initial learning rate of 0.001 with cosine annealing and a weight decay of 0.00005. Finally, we use early stopping for convergence by detecting the lowest validation loss within the maximum number of epochs. All these optimization parameters are the same for both the base particle loss (Eq. 2) and the focal loss (Eq. 8).

References
 [1] (2015) SegNet: a deep convolutional encoderdecoder architecture for robust semantic pixelwise labelling. arXiv preprint arXiv:1505.07293. Cited by: §2.
 [2] (1989) Multiresolution elastic matching. Computer vision, graphics, and image processing 46 (1), pp. 1–21. Cited by: §2.
 [3] (2018) An unsupervised learning model for deformable medical image registration. In CVPR, pp. 9252–9260. Cited by: §2.
 [4] (2005) Computing large deformation metric mappings via geodesic flows of diffeomorphisms. International journal of computer vision 61 (2), pp. 139–157. Cited by: §1, §2.

 [5] (2020) Quantifying the severity of metopic craniosynostosis: a pilot study application of machine learning in craniofacial surgery. The Journal of Craniofacial Surgery 31 (3), pp. 697. Cited by: §1.
 [6] (2019) A cooperative autoencoder for population-based regularization of CNN image registration. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 391–400. Cited by: §2.
 [7] (2018) DeepSSM: A deep learning framework for statistical shape modeling from raw images. In Shape In Medical Imaging at MICCAI, Lecture Notes in Computer Science, Vol. 11167, pp. 244–257. Cited by: §1, §2, §3.1, §3.2.1.
 [8] (2018) Deep learning for endtoend atrial fibrillation recurrence estimation. In Computing in Cardiology, CinC 2018, Maastricht, The Netherlands, September 2326, 2018, Cited by: §1, §1, §2, §3.1, §3.2.1, §4.3.
 [9] (2018) Left atrial shape predicts recurrence after atrial fibrillation catheter ablation. Journal of cardiovascular electrophysiology. Cited by: §4.3.
 [10] (2014) Reversal of spherical remodelling of the left atrium after pulmonary vein isolation: incidence and predictors. Europace 16 (6), pp. 840–847. Cited by: §4.3.
 [11] (2013) Left atrial sphericity: a new method to assess atrial remodeling. impact on the outcome of atrial fibrillation ablation. Journal of cardiovascular electrophysiology 24 (7), pp. 752–759. Cited by: §4.3.
 [12] (2018) Deformable image registration using a cueaware deep regression network. IEEE Transactions on Biomedical Engineering 65 (9), pp. 1900–1911. Cited by: §2.
 [13] (2017) ShapeWorks: particlebased shape correspondence and visualization software. In Statistical Shape and Deformation Analysis, pp. 257–298. Cited by: §2, §3.1, §4.1, §4.1, §4.2, §4.3, §4.3, Table 1, Table 2, Table 3, §4, §5.
 [14] (2007) Shape modeling and analysis with entropybased particle systems. In IPMI, pp. 333–345. Cited by: §1, §1, §2, Table 2.

 [15] (2018) Anatomical priors in convolutional networks for unsupervised biomedical segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9290–9299. Cited by: §2.
 [16] (2002-05) A minimum description length approach to statistical shape modeling. IEEE Transactions on Medical Imaging 21 (5), pp. 525–537. External Links: Document, ISSN 0278-0062 Cited by: §1, §1, §2.
 [17] (2014) Morphometry of anatomical shape complexes with dense deformations and sparse parameters. NeuroImage 101, pp. 35–49. Cited by: §2.
 [18] (2013-04) A point-correspondence approach to describing the distribution of image features on anatomical surfaces, with application to atrial fibrillation. In 2013 IEEE 10th International Symposium on Biomedical Imaging, pp. 226–229. External Links: Document, ISSN 1945-7928 Cited by: §1.
 [19] (2001) Shape analysis of brain ventricles using SPHARM. In Proceedings IEEE Workshop on Mathematical Methods in Biomedical Image Analysis (MMBIA 2001), pp. 171–178. External Links: Document Cited by: §1.
 [20] (2016) Learning a predictable and generative vector representation for objects. CoRR abs/1603.08637. Cited by: §3.2.2, §3.2.2.

 [21] (2010, 13–15 May) Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, Vol. 9, pp. 249–256. Cited by: §0.A.2.
 [22] (2020) Benchmarking off-the-shelf statistical shape modeling tools in clinical applications. arXiv preprint arXiv:2009.02878. Cited by: §3.1.
 [23] (2018) On the evaluation and validation of offtheshelf statistical shape modeling tools: a clinical application. In International Workshop on Shape in Medical Imaging, pp. 14–27. Cited by: §1, §2, §3.1.
 [24] (1991) Hands: A pattern theoretic study of biological shapes. Springer, New York. Cited by: §1.
 [25] (2018) Deep shape analysis on abdominal organs for diabetes prediction. In International Workshop on Shape in Medical Imaging, pp. 223–231. Cited by: §1.
 [26] (2013) Statistical shape modeling of cam femoroacetabular impingement. Journal of Orthopaedic Research 31 (10), pp. 1620–1626. External Links: ISSN 1554527X, Link, Document Cited by: §1, §1, §4.2, §4.2, §4.3.

 [27] (2015) Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. CoRR abs/1502.01852. External Links: Link, 1502.01852 Cited by: §0.A.1.
 [28] (2012) Left atrial anatomy revisited. Circulation: Arrhythmia and Electrophysiology 5 (1), pp. 220–228. Cited by: §4.3.
 [29] (2017) Temporal heartnet: towards humanlevel automatic analysis of fetal cardiac screening video. In MICCAI 2017, pp. 341–349. External Links: ISBN 9783319661858 Cited by: §2.
 [30] (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32Nd International Conference on International Conference on Machine Learning  Volume 37, ICML’15, pp. 448–456. External Links: Link Cited by: §0.A.1.
 [31] (2012) Diffeomorphic sulcal shape analysis on the cortex. IEEE transactions on medical imaging 31 (6), pp. 1195–1212. Cited by: §2.
 [32] (2012) Interfrontal angle for characterization of trigonocephaly: part 1: development and validation of a tool for diagnosis of metopic synostosis. Journal of Craniofacial Surgery 23 (3), pp. 799–804. Cited by: §4.1.
 [33] (2017) Adam: a method for stochastic optimization. External Links: 1412.6980 Cited by: §0.A.2.
 [34] (2010) Optimisation of orthopaedic implant design using statistical shape space analysis based on level sets. Medical image analysis 14 (3), pp. 265–275. Cited by: §1.
 [35] (1998-11) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324. External Links: Document, ISSN 0018-9219 Cited by: §2.
 [36] (2014-12) Medical image classification with convolutional neural network. In 2014 13th International Conference on Control Automation Robotics & Vision (ICARCV), pp. 844–848. External Links: Document Cited by: §2.
 [37] (2017) Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision, pp. 2980–2988. Cited by: §3.3.
 [38] (2018) Deep regression tracking with shrinkage loss. In Proceedings of the European conference on computer vision (ECCV), pp. 353–369. Cited by: §3.3.
 [39] (2017) Integrating statistical prior knowledge into convolutional neural networks. In Medical Image Computing and Computer Assisted Intervention, M. Descoteaux, L. MaierHein, A. Franz, P. Jannin, D. L. Collins, and S. Duchesne (Eds.), Cham, pp. 161–168. Cited by: §2.
 [40] (1998) Statistical analysis with latent variables. Mplus User’s guide 2012. Cited by: §4.1.
 [41] (2017) Anatomically constrained neural networks (ACNN): application to cardiac image enhancement and segmentation. CoRR abs/1705.08302. External Links: Link, 1705.08302 Cited by: §2.
 [42] (2012) Functional maps: a flexible representation of maps between shapes. ACM Transactions on Graphics (TOG) 31 (4), pp. 30. Cited by: §2.
 [43] (2015) U-Net: convolutional networks for biomedical image segmentation. CoRR abs/1505.04597. External Links: Link, 1505.04597 Cited by: §2.
 [44] (1999) Nonrigid registration using free-form deformations: application to breast MR images. IEEE Transactions on Medical Imaging 18 (8), pp. 712–721. Cited by: §2.
 [45] (2000-03) Parametric estimate of intensity inhomogeneities applied to MRI. IEEE Transactions on Medical Imaging 19 (3), pp. 153–165. Cited by: §2.
 [46] (2006-07) Framework for the statistical shape analysis of brain structures using SPHARM-PDM. Note: http://hdl.handle.net/1926/215 Cited by: §1, §1, §2.
 [47] Left atrial shape predicts recurrence after atrial fibrillation catheter ablation. Journal of Cardiovascular Electrophysiology. External Links: Document Cited by: §1.
 [48] (2020) Unsupervised shape normality metric for severity quantification. arXiv preprint arXiv:2007.09307. Cited by: §4.1.
 [49] (1942) On growth and form.. On growth and form.. Cited by: §1.
 [50] (2020) Probabilistic 3d surface reconstruction from sparse MRI information. CoRR abs/2010.02041. External Links: Link, 2010.02041 Cited by: §2.
 [51] (2018) Uncertainty quantification in cnnbased surface prediction using shape priors. CoRR abs/1807.11272. External Links: Link, 1807.11272 Cited by: §2.
 [52] (1993) A latent trait finite mixture model for the analysis of rating agreement. Biometrics, pp. 823–835. Cited by: §4.1.
 [53] (2017) Novel computational analysis of left atrial anatomy improves prediction of atrial fibrillation recurrence after ablation. Frontiers in physiology 8, pp. 68. Cited by: §4.3.
 [54] (2019) Deeply supervised 3d fully convolutional networks with group dilated convolution for automatic mri prostate segmentation. Medical physics 46 (4), pp. 1707–1718. Cited by: §2.
 [55] (2017) Quicksilver: fast predictive image registration–a deep learning approach. NeuroImage 158, pp. 378–396. Cited by: §2.
 [56] (2008) Hippocampus shape analysis and latelife depression. PLoS One 3 (3), pp. e1837. Cited by: §1.
 [57] (2015) 3D deep learning for efficient and robust landmark detection in volumetric data. In MICCAI 2015, pp. 565–572. External Links: ISBN 9783319245539 Cited by: §2.