Unsupervised Shape Normality Metric for Severity Quantification

07/18/2020 ∙ by Wenzheng Tao, et al. ∙ 0

This work describes an unsupervised method to objectively quantify the abnormality of general anatomical shapes. The severity of an anatomical deformity often serves as a determinant in the clinical management of patients. However, experiential bias and distinctive random residuals among specialist individuals bring variability in diagnosis and patient management decisions, irrespective of the objective deformity degree. Therefore, supervised methods are prone to be misled given insufficient labeling of pathological samples that inevitably preserve human bias and inconsistency. Furthermore, subjects demonstrating a specific pathology are naturally rare relative to the normal population. To avoid relying on sufficient pathological samples by fully utilizing the power of normal samples, we propose the shape normality metric (SNM), which requires learning only from normal samples and zero knowledge about the pathology. We represent shapes by landmarks automatically inferred from the data and model the normal group by a multivariate Gaussian distribution. Extensive experiments on different anatomical datasets, including skulls, femurs, scapulae, and humeri, demonstrate that SNM can provide an effective normality measurement, which can significantly detect and indicate pathology. Therefore, SNM offers promising value in a variety of clinical applications.



There are no comments yet.


page 7

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Many medical conditions are indicated by pathological shapes, such as abnormal skull shape (craniosynostosis), hip deformities (femoroacetabular impingement), and shoulder dislocation (scapula and humerus injury). Current medical imaging methods aim to identify abnormal anatomical configurations and guide physicians toward diagnosis and treatment plans; severe cases may warrant surgical intervention, while observation might suffice for mild deformities. But how can one objectively quantify the severity of a pathological three-dimensional (3d) shape deformity? There is no easy answer, because ”normal” anatomy represents a spectrum, comprised of a variety of different shapes and sizes due to genetic and environmental factors, that does not necessarily result in pathology or compromised function.

Current clinical practice frequently relies on severity assessment by medical professionals. However, despite the best efforts of clinicians and their rigorous medical training, there is a concern for bias in individual clinicians[25],due to variables such as specific training sites or years of practice. For example, there is a wide disparity of opinion regarding the diagnosis of mild nonsyndromic metopic synostosis. [31]

. The concern of bias applies also to supervised machine learning approaches


, because the bias in the training data may translate to bias in the model predictions. Some types of biases are well studied (e.g., discrimination due to gender or ethnicity), but the biases in evaluating shape pathologies are more subtle and are poorly understood. Limited number of pathological samples, especially with relatively uncommon diseases, such as craniosynostosis, poses a problem of data skewness in supervised analysis.

To circumvent these issues, in this paper we explore an unsupervised approach to quantify the severity of shape pathologies. By unsupervised, we mean that our method relies solely on examples of shapes of normal anatomical variations. They are provided as point correspondences that describe normal shape variations in a population (as in point distribution models [9, 6, 23]). We don’t use any example shapes of a given pathology (such as metopic craniosynostosis), which means that our method is fully general and agnostic to any specific type of pathologies. Because it merely quantifies the distance from normal anatomical variations, we call it “Shape Normality Metric” (SNM).

We found that even without being informed by examples of specific pathologies, SNM is very useful in predicting severity of shape deformity, as we validated by retroactively comparing the SNM results with evaluations by human experts. However, compared with supervised methods, the SNM method has distinct advantages: no pathological training data required, thus immune to any biases present in such data. We foresee a practical utility of SNM in clinical decision making – not as a method to automatically issue a diagnosis, but rather by providing a new tool to the clinicians, akin to a more sophisticated ”measuring tape” (one which measures statistical shape differences).

2 Related work

Unsupervised severity quantification makes heavy use of statistical shape modeling [27]. It involves representing a set of 2D/3D shapes such that it captures the population statistics, followed by application-specific analysis. A popular way of shape representation is via correspondence based methods/particle distribution models [14]. Correspondences are geometrically consistent points placed manually or automatically on the set of shapes, simple to use and convenient to parametrize shape for subsequent statistics. Recent years have seen a rise of methods that automatically places a dense set of correspondences on a population of anatomical shapes [26, 11, 6]

. The usual methodology of analyzing dense correspondence representation is to express them in a low-dimensional space via principal component analysis (PCA)

[5] and use that as the shape descriptor. These methods have proved useful in many fronts of medical imaging such as orthopedics [17] and cardiology [13]. They have also been used in parametrizing skull shape [10], serving as a platform for our proposed modeling on pediatric skulls.

Also related is the field of unsupervised anomaly detection. These methods usually learn the distribution of normal samples and detect the anomaly by measuring its distance/dissimilarity. One of the most common ways to detect anomalies from a given distribution is via the use of Mahalanobis distance. A robust variant of the Mahalanobis distance based on the Minimum Covariance Determinant has been proposed, tackling the sensitivity to outliers in observed normal samples


. Another branch adopts deep learning related techniques while avoiding its preconditions on a large number of pathological samples. AnoGAN is proposed to identify anomalous images, composed of a Generative Adversarial Networks together with a novel anomaly scoring scheme

[24]. Further, Auto-encoder based methods are utilized for the detection of lesion regions by learning data distribution of healthy brain MRIs [7]. Recently, Variational Auto-Encoders have been used to check and localize suspicious parts within images [32].

There have been efforts to quantify the severity of metopic craniosynostosis by parametrizing cranial shapes. Diagnosis is particularly challenging as the metopic suture normally closes before 1 year of age [29] which can limit the methods for detecting suture fusion (used for other forms of craniosynostosis) [12, 21]. Common approaches thus far involve defining a simple geometric measure on the skull and correlate that with the craniosynostosis severity, the most popular of these being the interfrontal angle (IFA) [18], which captures the degree of trigonocephaly (triangular frontal head shape). IFA can be calculated from the CT scans and has been effective in distinguishing normal and metopic head shapes [30, 18], though its utility is limited in predicting the severity among abnormal shapes. Recent work used expert supervision with an expressive shape descriptor to measure the craniosynostosis severity [2], however, concerns exist about biases in the export ratings.

3 Materials and methods

By denoting the -th 3D shape as , where indicates whether the shape is normal or pathological , we have the shape observations .

Shapes are continuous surfaces and thus inconvenient for calculation. With Shapeworks [4], we obtain the particle-based representation. Specifically, each shape is described by a cloud of ordered particles serving as correspondences across different shape samples. This particle cloud is denoted as

, a flattened vector of these

correspondences’ coordinates .

Figure 1: Flow chart of SNM with training and testing pipelines.

As shown in Figure 1

, there is 1) training, and 2) inference (testing) pipeline. In training, we used Shapeworks to obtain correspondences of present normal shapes which will be frozen during inference. Meanwhile, Shapeworks estimates density function of these correspondences

. Next for each new shape where is unknown, we run Shapeworks on jointly with previously frozen normal correspondences . This way we obtain the correspondences for shape , as the optimal landmark points under the density function of , which both faithfully encode the shape information of and are consistent with .

With the shapes represented in the form of coordinate vectors, we can apply probabilistic methods to model the shape variations. To consider both seen and unseen variations with limited normal samples for training, we adopt the Probabilistic Principal Component Analysis (PPCA) [3].

3.1 Probabilistic Principle Component Analysis

Assume we have normal observations , where the -th observation is a -dimensional feature vector . In the generative view of PPCA, we assume that the principle normal shape variations originate from a low-dimensional vector in a compact latent space , with . Specifically,


where obeys standard multivariate Gaussian distribution ,

is the weighting matrix controlling the linear transformation,

is the expectation of normal shape’s correspondences, and is the -dimensional noise.

With Maximum Likelihood Estimation, we have the latent parameters estimated from normal observations:


are the descending eigen-values of the empirical co-variance matrix

, , is the diagonal matrix with largest , has the corresponding eigen-vectors as its columns, and

is an arbitrary orthogonal matrix. As

does not affect we simply set to the identity.

3.2 Shape Normality Metric

3.2.1 Overall Mahalanobis Distance

To measure the abnormality of , we calculate its Mahalanobis distance as the Shape Normality Metric (SNM):


The distance in Equation 3 consists of 2 components. First is the Mahalanobis distance in the -dimensional latent space , measuring deformity in normal variations:


The other is the normalized distance to in the -dimensional null space, measuring deformity in abnormal variations undetected by the training normal shapes:


where has columns as orthonormal null vectors to . It generalizes SNM for unseen deformity, with its superiority corroborated in our experiments.

3.2.2 Point-wise Mahalanobis Distance.

Equation 3 can also be regarded as a -norm, of the new shape’s whitened deviations from mean:


With whitened deviations in -th correspondence coordinates expressed as , we use the whitened deviation’s -norm to measure the shape’s local deformity:


Equation 7 normalize the deviation on different correspondences to lie in the same variance scale so that they are comparable.

4 Results

In this section, we present experimental results on various medical conditions demonstrating that SNM provides accurate deformity scores, which are strongly indicative of general pathology deformations. For Metopic Craniosynostosis, SNM outperforms deformity ratings issued by human experts and is comparable with their aggregation, according to Area Under Curve (AUC) [16] with the known diagnosis as ground truth.

4.1 Metopic Craniosynostosis

Metopic Craniosynostosis is characterized by skull deformation due to the premature fusion of the metopic suture. Affected individuals typically exhibit trigonocephaly with the classic triad of a narrow ”keel-shaped” forehead, biparietal widening, and hypotelorism.

4.1.1 Metopic Craniosynostosis dataset

We aggregated 124 head CT scans of patients between 6-16 months old. 30 (24%) of these patients are diagnosed with Metopic Craniosynostosis; while the remaining 94 (76%) are trauma patients with no known intracranial or calvarial abnormality (normal patients). All 124 scans were segmented as outer skull surfaces. We used 74 randomly selected normal skull shapes for training, and left 20 normal shapes, as well as all 30 pathological shapes, for testing.

To collect the ground truth deformity degrees of the 50 testing skulls, we displayed their 3d segmentation on our website and asked physicians to estimate their deformity on a 5 point Likert scale. In total, 31 physicians participated in the survey, among them 14 rated all of the 50 skulls and the other 17 rated 20 skulls randomly. We modeled the expert ratings by Latent trait theory [28, 22]

which accounts for potential raters’ bias and inconsistency and estimated the continuous latent severity of the 50 skulls using Maximum Likelihood Estimation (MLE). The AUC of diagnosis and ratings from the 14 physicians as individuals has mean of 0.8860 and standard deviation of 0.03426, with the highest one being 0.9458. AUC of diagnosis and MLE severity is 0.9850, indicating that aggregation of physicians estimations is more accurate than individuals.

4.1.2 Severity predictions

We trained SNM on 74 normal skulls and predicted deformity scores for the 50 testing skulls. To unify the skulls as inputs for SNM, we cropped the skulls above the plane defined by the nasion and two porion points. Next, they went through 2 pipelines in Figure 1. In the training pipeline, we ran Shapeworks optimization on the 74 normal skulls jointly, obtained their correspondences of 2048 particles, calculated their mean and covariance matrix using PPCA. PPCA requires specification of the latent subspace dimension and by convention we set it as the dimension where it explains 95% of the variance. In the testing pipeline, for every testing skull, we fed it into Shapeworks together with the frozen normal correspondences, got its correspondences and applied deformity analysis by calculating Mahalanobis Distance.

Figure 2: Evaluation of Shape Normality Metric on Craniosynostosis. [PPCA-Column@d] is using the Mahalanobis distance in the -dimensional latent subspace as Equation 4. [PPCA-Null@d] is using the Mahalanobis distance in the null space as Equation 5. [PPCA-Trunc@] is using the Mahalanobis distance in the whole space as Equation 3, where is either principle subspace dimension or explained variance ratio. Our SNM is [PPCA-Trunc@0.95]

4.1.3 Correlation analysis

The agreement between SNM and the continuous latent severity is evaluated by Pearson Correlation Coefficient [1] and Spearman Correlation Coefficient [19]. The agreement between SNM and the diagnosis labels is evaluated by AUC.

The results are shown in Figure 2. Our SNM is [PPCA-Trunc@0.95], which achieves the Pearson Correlation Coefficient of 0.7501, Spearman Correlation Coefficient of 0.7881 and AUC of 0.965, showing strong agreement with the ground truth. Also, AUC is even higher than the highest of our 14 physicians, which is encouragingly showing that our SNM is a better indicator in the case of Craniosynostosis compared with deformity estimations by individual physicians.

We also compare the proposed SNM metric with several baselines. First, [PPCA-Null@0] is essentially calculating the Mahalanobis distance with an isotropic covariance matrix and it has relatively poor performance. This indicates that it is disadvantageous to treat all the deformation components with the same variance. It can also be shown in Figure 3, where different point-wise deviations of the same testing skull are compared using MeshLab [8]

. In the one whitened by SNM, point-wise signed distance is more symmetric and similar to root of Chi-squared distribution, and abnormal variations are accentuated on the forehead (Metopic Craniosynostosis pattern). Figure

2 also shows that SNM is not very sensitive to hyper-parameter settings, as 3 different explained variance ratios generate similar performance which is close to the best presented.

Figure 3: Unified point-wise signed distance on the same pathological skull, femur, scapula and humerus: original (left) and whitened by SNM (PPCA-Trunc@0.95) using Equation 7 (right). Whitened is more symmetric and accentuated the principal pathological pattern (in yellow frame).
Figure 4: Evaluation of Shape Normality Metric on Femoroacetabular impingement, Shoulder Dislocation-Humerus and Scapula. Notation is the same as Figure 2

4.2 Cam-FAI and Shoulder Dislocation

Cam-type femoroacetabular impingement can cause hip pain and is characterized by Pincer morphology with acetabular overcoverage [15]. We collected 37 femur bones with Cam-FAI and 59 normal ones.

The type of shoulder dislocation we studied is characterized by aberrant positioning of the superior portion of the humerus away from its usual location in the glenohumeral joint (the ball and socket joint of the shoulder). It can result in bone deformation located primarily at the joint. We collected 53 humerus scans and 54 scapula scans from patients diagnosed with shoulder dislocation, as well as 41 normal humerus and scapula scans. These respectively formed our humerus and scapula datasets.

For the 3 datasets above, we segmented the scans to extract the outer bone surfaces and cropped them uniformly. We evaluated SNM on each of the 3 datasets using 3-repeat 3-fold cross-validation. Specifically, we randomly shuffled the normal samples data, divided it into 3 folds, trained SNM on each pair of 2 folds, predicted severity on the remaining fold along with pathological samples and calculated the AUC with the predictions and diagnosis. This entire process is repeated 3 times. For each dataset, we report the average of these 9 AUC scores.

As is shown in Figure 4, on all the 3 datasets, [PPCA-Column@d] methods are much worse than the others, indicating that the informative variations mainly lie in the null space which is not captured by the training samples. This indicates it is an advantage of SNM to calculate Mahalanobis distance in the entire space. The results also show that SNM is not sensitive to the hyper-parameters. For the explained variance ratio, all 3 different explained variance ratios are generating nearly the best scores. For the number of particles in Shapeworks, we used 1024 particles for Femur and Humerus, and we did 2 parallel experiments using 2048 and 4096 particles for Scapula. All the numbers are providing visually adequate and smooth coverage of the shape. As shown in Figure 4, SNM is not sensitive to the number of particles as well. And the whitened deviations are shown in Figure 3, where the pathological variations are highlighted after whitening. While it is hard to find a predominant pattern for the scapula, maybe due to its complicated structure and deformation patterns.

5 Conclusions

We proposed Shape Normality Metric (SNM) as an objective and unsupervised method to measure shape deformity. SNM employs particle-based shape representation and applies Probabilistic Principle Component Analysis to calculate Mahalanobis distance in the whole space, properly accounting for both seen and unseen variations. SNM does not require any data or information about pathological samples and is thus unaffected by pathological population size limitations and/or bias in ratings issued by human experts. Extensive experiments on real-world datasets validated that SNM provides an accurate and reproducible metric for a wide range of shape deformity quantification. Therefore, SNM has the potential to be applied to a variety of clinical scenarios, with particular promise in objectifying communication on shape deformations and planning interventions.


  • [1] J. Benesty, J. Chen, Y. Huang, and I. Cohen (2009) Pearson correlation coefficient. In Noise reduction in speech processing, pp. 1–4. Cited by: §4.1.3.
  • [2] R. Bhalodia, L. A. Dvoracek, A. M. Ayyash, L. Kavan, R. Whitaker, and J. A. Goldstein (2020) Quantifying the severity of metopic craniosynostosis: a pilot study application of machine learning in craniofacial surgery. Journal of Craniofacial Surgery. Cited by: §1, §2.
  • [3] C. M. Bishop (2006) Pattern recognition and machine learning. springer. Cited by: §3.
  • [4] J. Cates, S. Elhabian, and R. Whitaker (2017) Shapeworks: particle-based shape correspondence and visualization software. In Statistical Shape and Deformation Analysis, pp. 257–298. Cited by: §3.
  • [5] J. Cates, P. T. Fletcher, M. Styner, H. C. Hazlett, and R. Whitaker (2008) Particle-based shape analysis of multi-object complexes. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 477–485. Cited by: §2.
  • [6] J. Cates, P. T. Fletcher, M. Styner, M. Shenton, and R. Whitaker (2007) Shape modeling and analysis with entropy-based particle systems. In Biennial International Conference on Information Processing in Medical Imaging, pp. 333–345. Cited by: §1, §2.
  • [7] X. Chen and E. Konukoglu (2018) Unsupervised detection of lesions in brain mri using constrained adversarial auto-encoders. arXiv preprint arXiv:1806.04972. Cited by: §2.
  • [8] P. Cignoni, M. Callieri, M. Corsini, M. Dellepiane, F. Ganovelli, and G. Ranzuglia (2008)

    Meshlab: an open-source mesh processing tool.

    In Eurographics Italian chapter conference, Vol. 2008, pp. 129–136. Cited by: §4.1.3.
  • [9] T. F. Cootes, C. J. Taylor, et al. (2004)

    Statistical models of appearance for computer vision

    Technical report, University of Manchester. Cited by: §1.
  • [10] M. Datar, J. Cates, P. T. Fletcher, S. Gouttard, G. Gerig, and R. Whitaker (2009) Particle based shape regression of open surfaces with applications to developmental neuroimaging. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 167–174. Cited by: §2.
  • [11] R. H. Davies, C. J. Twining, T. F. Cootes, J. C. Waterton, and C. J. Taylor (2002-05) A minimum description length approach to statistical shape modeling. IEEE Transactions on Medical Imaging 21 (5), pp. 525–537. External Links: Document, ISSN 0278-0062 Cited by: §2.
  • [12] K. A. Eley, S. R. Watt-Smith, F. Sheerin, and S. J. Golding (2014) Black Bone MRI: a potential alternative to ct with three-dimensional reconstruction of the craniofacial skeleton in the diagnosis of craniosynostosis. European Radiology 24 (10), pp. 2417–2426. Cited by: §2.
  • [13] G. Gardner, A. Morris, K. Higuchi, R. MacLeod, and J. Cates (2013-04) A point-correspondence approach to describing the distribution of image features on anatomical surfaces, with application to atrial fibrillation. In 2013 IEEE 10th International Symposium on Biomedical Imaging, Vol. , pp. 226–229. External Links: Document, ISSN 1945-7928 Cited by: §2.
  • [14] U. Grenander, Y. Chow, and D. M. Keenan (1991) Hands: A pattern theoretic study of biological shapes. Springer, New York. Note: Cited by: §2.
  • [15] K. Hack, G. Di Primio, K. Rakhra, and P. E. Beaulé (2010) Prevalence of cam-type femoroacetabular impingement morphology in asymptomatic volunteers. JBJS 92 (14), pp. 2436–2444. Cited by: §4.2.
  • [16] J. A. Hanley and B. J. McNeil (1982) The meaning and use of the area under a receiver operating characteristic (roc) curve.. Radiology 143 (1), pp. 29–36. Cited by: §4.
  • [17] M. D. Harris, M. Datar, R. T. Whitaker, E. R. Jurrus, C. L. Peters, and A. E. Anderson (2013) Statistical shape modeling of cam femoroacetabular impingement. Journal of Orthopaedic Research 31 (10), pp. 1620–1626. External Links: ISSN 1554-527X, Link, Document Cited by: §2.
  • [18] R. Kellogg, A. C. Allori, G. F. Rogers, and J. R. Marcus (2012) Interfrontal angle for characterization of trigonocephaly: part 1: development and validation of a tool for diagnosis of metopic synostosis. Journal of Craniofacial Surgery 23 (3), pp. 799–804. Cited by: §2.
  • [19] S. Kokoska and D. Zwillinger (2000)

    CRC standard probability and statistics tables and formulae

    Crc Press. Cited by: §4.1.3.
  • [20] C. Leys, O. Klein, Y. Dominicy, and C. Ley (2018) Detecting multivariate outliers: use a robust variant of the mahalanobis distance. Journal of Experimental Social Psychology 74, pp. 150–156. Cited by: §2.
  • [21] C. S. Mendoza, N. Safdar, K. Okada, E. Myers, G. F. Rogers, and M. G. Linguraru (2014) Personalized assessment of craniosynostosis via statistical shape modeling. Medical Image Analysis 18 (4), pp. 635–646. Cited by: §2.
  • [22] L. K. Muthén and B. O. Muthén (1998) Statistical analysis with latent variables. Mplus User’s guide 2012. Cited by: §4.1.1.
  • [23] I. Oguz, J. Cates, M. Datar, B. Paniagua, T. Fletcher, C. Vachet, M. Styner, and R. Whitaker (2016) Entropy-based particle correspondence for shape populations. International journal of computer assisted radiology and surgery 11 (7), pp. 1221–1232. Cited by: §1.
  • [24] T. Schlegl, P. Seeböck, S. M. Waldstein, U. Schmidt-Erfurth, and G. Langs (2017) Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In International conference on information processing in medical imaging, pp. 146–157. Cited by: §2.
  • [25] G. T. Sica (2006) Bias in research studies. Radiology 238 (3), pp. 780–789. Cited by: §1.
  • [26] M. Styner, I. Oguz, S. Xu, C. Brechbuehler, D. Pantazis, J. Levitt, M. Shenton, and G. Gerig (2006-07) Framework for the statistical shape analysis of brain structures using spharm-pdm. Note: http://hdl.handle.net/1926/215 Cited by: §2.
  • [27] D. W. Thompson et al. (1942) On growth and form.. On growth and form.. Cited by: §2.
  • [28] J. S. Uebersax and W. M. Grove (1993) A latent trait finite mixture model for the analysis of rating agreement. Biometrics, pp. 823–835. Cited by: §4.1.1.
  • [29] J. Weinzweig, R. E. Kirschner, A. Farley, P. Reiss, J. Hunter, L. A. Whitaker, and S. P. Bartlett (2003) Metopic synostosis: defining the temporal sequence of normal suture fusion and differentiating it from synostosis on the basis of computed tomography images. Plastic and Reconstructive Surgery 112 (5), pp. 1211–1218. Cited by: §2.
  • [30] B. C. Wood, C. S. Mendoza, A. K. Oh, E. Myers, N. Safdar, M. G. Linguraru, and G. F. Rogers (2016) What’s in a name? accurately diagnosing metopic craniosynostosis using a computational approach. Plastic and Reconstructive Surgery 137 (1), pp. 205–213. Cited by: §2.
  • [31] S. T. Yee, J. A. Fearon, A. K. Gosain, M. R. Timbang, F. A. Papay, and G. Doumit (2015) Classification and management of metopic craniosynostosis. Journal of Craniofacial Surgery 26 (6), pp. 1812–1817. Cited by: §1.
  • [32] D. Zimmerer, F. Isensee, J. Petersen, S. Kohl, and K. Maier-Hein (2019) Unsupervised anomaly localization using variational auto-encoders. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 289–297. Cited by: §2.