DmoGpm: Scalable Dynamic Multi Shape Modelling
Statistical shape models (SSMs) are state-of-the-art medical image analysis tools for extracting and explaining features across a set of biological structures. However, a principled and robust way to combine shape and pose features has been elusive due to three main issues: 1) non-homogeneity of the data (data with linear and non-linear natural variation across features), 2) non-optimal representation of the 3D motion (rigid transformation representations that are not proportional to the kinetic energy that moves an object from one position to the other), and 3) artificial discretization of the models. In this paper, we propose a new framework for dynamic multi-object statistical modelling for the analysis of human joints in a continuous domain. Specifically, we normalise shape and dynamic spatial features in the same linearized statistical space, permitting the use of linear statistics; we adopt an optimal 3D motion representation for more accurate rigid transformation comparisons; and we provide a 3D shape and pose prediction protocol using Markov chain Monte Carlo sampling-based fitting. The framework affords an efficient generative dynamic multi-object modelling platform for biological joints. We validate the framework using controlled synthetic data. Finally, the framework is applied to an analysis of the human shoulder joint to compare its performance with standard SSM approaches in shape prediction, while adding the advantage of determining the relative pose between bones in a complex. Excellent validity is observed, and the shoulder joint shape-pose prediction results suggest that the novel framework may have utility for a range of medical image analysis applications. Furthermore, the framework is generic and can be extended to more than two objects, making it suitable for clinical and diagnostic methods for the management of joint disorders.
Finding a minimum subset of patterns that can generally and specifically represent a given set of structures with feature classes is a frequently researched area [1]. A common strategy is to estimate a subspace of homogeneous patterns (for example, geometric shapes) through models. This inductive learning approach leads to models that allow for automated observation and interpretation of the learned patterns. In the medical image analysis domain, pattern analysis has to contend with two existing problems. First, patterns may present across non-homogeneous feature classes. An example is morpho-functional analysis, where determining the changes in relative spatial dynamics of different anatomical structures adds an additional pattern class on top of the morphological (geometric) pattern class
[2]. Second, patterns in some feature classes may belong to non-linear spaces. The current trend is to formalise the problem as an inverse problem and use deep-learning-based generative models [3, 4]. However, deep generative models tend to have high specificity but low generality, in some cases explaining out-of-training data inputs with high confidence while being wrong [5, 6, 7]. Furthermore, deep generative models require large numbers of training examples to be successful. Finally, the interpretability of deep generative models remains complex. Thus, non-parametric analysis of non-linear variations of non-homogeneous features for different medical image analysis tasks (shape completion, morphometric analysis, segmentation) with the same model remains desirable.

With regard to the analysis of shape, a well-established and understood formalism for analysing 3D geometric variation in a linearized statistical space exists in the form of statistical shape modelling (SSM). SSMs typically model the data in Euclidean vector space using principal component analysis (PCA), which treats feature variation as a linear combination of local translations only
[9, 8, 10, 11]. While limited in capturing the non-linearity of shape space, the validity of this linearization for rigid shapes has been codified in the literature for single anatomical structures [9, 8, 11]. Efforts to faithfully represent the non-linearity of shape space have been reported but have not become mainstream due to computational inefficiency and a lack of robustness [12]. Recently, Lüthi et al. (2018) [11] introduced a generalisation of SSMs, referred to as Gaussian process morphable models (GPMMs). In this framework, the parametric low-dimensional model is represented as a Gaussian process over deformation fields obtained from training examples. In contrast to discrete models that depend on artificial discretization, GPMMs are inherently continuous, permitting arbitrary discretization of the domain on which the model is defined. However, GPMMs embed neither inter-object shape correlation nor the anatomo-physiological relationship between articulating objects. This is because current SSMs and GPMMs are unable to represent, in a statistically robust and intuitive way, an articulating anatomical complex composed of several rigid sub-structures which can move relative to each other.

Similar to the shape case, pose variation using both linear and non-linear statistical descriptions has been reported [14, 17, 16, 18, 15, 13, 19]. Previous studies have reported the embedding of the relative pose variation of articulating bones into SSM frameworks, in the form of statistical shape and pose models (SSPMs) [14, 17]. Others have reported statistical pose models (SPMs) only; that is, models of variation across spatial orientations without inclusion of shape [20]. A straightforward approach to modelling shape and pose together is to model joint flexibility implicitly by incorporating joint motion in statistical space [18]. However, this assumes that pose variations are a statistical property of anatomy.
Another approach is to concatenate pose and anatomical features into a joint vector and apply linear PCA to obtain joint principal components [16]. However, this describes the pose transformations in Euclidean vector (linear) space, contrary to reality: rigid transformations belong to a manifold and thus need to be described using non-linear statistical methods.
Various efforts to model shape and pose features together have been reported in the literature. Bossa and Olmos (2007) proposed parametric low-dimensional models of several subjects in different poses, with the pose variation representation obtained through principal geodesic analysis (PGA), a non-linear extension of PCA [19]. Shape and pose features of the brain cortex have been concatenated in a long vector from which standard multivariate statistics were extracted [17]. Fitzpatrick et al. (2011) [15] reported on a characterisation of the relationship between shape and contact mechanics of the patella and femur using an SSM of the articulating surfaces between both bones; the authors considered the patellofemoral contact space as a new object and modelled it as a standard SSM. Chen et al. (2014) proposed an articulated SSM for segmentation of the wrist from computed tomographic (CT) images without concatenating the shape and pose features; their model was a combination of an SSM and an SPM developed using PGA [21]. All the above PGA-based shape and pose models used a standard representation (Rodrigues, quaternion, or Euler) for modelling the rigid transformations describing the pose. Moreau et al. (2017) reported that the standard representation is limited in its ability to model rigid transformations of non-compact objects (with respect to a shape compactness measure). They instead proposed a new norm for statistical analysis of rigid transformations that is robust for analysing non-compact objects. However, they only demonstrated the new norm for SPMs, without considering shape and pose variability analysis. A non-linear shape modelling approach based on differential coordinates (thus avoiding global rigid transformation through local rotation and stretching) has also been reported [22, 12]. While this approach captures the inherent non-linearity of shape variation, it still suffers from artificial discretization, which prevents marginalization of the resultant probabilistic model. Additionally, it does not allow for modelling multi-object structures.
We hypothesise that, for a principled approach to multi-object shape and pose analysis, the following limitations identified in the literature need to be addressed: 1) features of different classes (morphology and pose) need to be properly normalised to obtain a homogeneous analysis space; 2) motion requires an optimal representation inducing an efficient associated norm for comparison of rigid transformations; and 3) models need to be built in a continuous domain. We propose a novel framework that addresses all the above limitations by modelling the shape and pose variation of complex articulated objects composed of multiple rigid objects. We refer to the proposed modelling framework as Dynamic Multi-object Gaussian Process Modelling (DMOGPM). The code for this project is available online for research use: https://github.com/rassaire/DmoGpm.
The structure of the rest of the paper is as follows. In section 2, we identify and describe previous and related work in the domain of SSPMs and GPMMs. In section 3, we describe the DMOGPM framework and explain 1) how the framework is extended and generalised to a joint consisting of n objects, with the spatial positions projected into a linear space using PGA; 2) how the new framework addresses the inconsistent representation of motion that is typical when modelling joints with non-compact structures, such as long bones, using the standard representation; and 3) how we adopt an energy displacement representation that is more suitable for reconstruction of the objects of interest. In section 4, we validate the framework using synthetic data with predefined and accurately known shape and spatial dynamics. We also propose an automatic pipeline for dynamic multi-object prediction using a combination of Markov chain Monte Carlo sampling and the DMOGPM approach, which allows a model to be registered to pathological or incomplete multi-objects while avoiding local minima. In section 5, we apply the framework to complex shoulder joint data to demonstrate the utility of the approach compared to the state-of-the-art sample-kernel-based GPMM.
Before describing our DMOGPM framework, we summarise the main concepts behind traditional statistical shape models, multi-object statistical shape and pose models, and GPMMs, on which we will capitalise to synthesize the dynamic multi-object framework.
Let us consider a multi-object dataset with $N$ examples, where each example is composed of $n$ objects; for example, the glenohumeral joint with $N$ training data samples, each having two rigid bony objects: scapula and humerus. The underlying assumption in the development of multi-object statistical shape and pose models (SSPMs) is that there exists a representational space where the relationship between shape and pose can be learnt from the dataset. This concept of SSPMs is well adapted to modelling articulated objects in motion. The $j^{th}$ object of the $i^{th}$ data sample can be represented by $s_i^j$, a vector of discrete landmarks with known correspondence between the examples of the object, and the reference object $\bar{s}^j$ by a coordinate vector of points; typically, the reference is one selected object from the database that should be closest to the mean. The rigid transformation $\gamma_i^j$ that aligns the $j^{th}$ object of the $i^{th}$ example to its reference object is defined as

$$\gamma_i^j = \operatorname*{argmin}_{\gamma \in \Gamma} \; d\left(\gamma(s_i^j),\, \bar{s}^j\right),$$

where $\Gamma$ is the space of rigid transformations defining a Lie group and $d$ is the sum-of-distances metric. The Lie group (smooth manifold) defined by the system of $n$ objects is the direct product of the rigid transformation groups, denoted $\Gamma^n = \Gamma \times \dots \times \Gamma$. Shape variations are relatively small and are traditionally assumed to belong to a vector space, while pose (rigid transformation) variations belong to a Riemannian manifold, which is not a vector space. In contrast to shape models that are built using classical PCA, pose models are built using PGA, which is in essence a PCA in a space tangential to the manifold [19, 23]. Several studies use PGA to perform linear statistics on a Riemannian manifold [19, 21, 14, 13]. To perform the PGA, the transformation $\gamma_i^j$ for the $j^{th}$ object is represented in the tangent space to $\Gamma$ at the mean rigid transformation $\bar{\gamma}^j$ using the log mapping:

$$\tilde{\gamma}_i^j = \mathrm{Log}_{\bar{\gamma}^j}\left(\gamma_i^j\right), \tag{1}$$

where $\tilde{\gamma}_i^j$ holds the parameters of the rigid transformation: Euler, Rodrigues, or quaternion parameters. These representations describe rotations in 3D; the Rodrigues and quaternion parameterizations represent a 3D rotation as an axis vector and a rotation angle, while the Euler parameterization represents it with three angles. For the rest of the paper these representations will be called standard representations (SR). To map the transformation back to the manifold, an exponential mapping is used:

$$\gamma_i^j = \mathrm{Exp}_{\bar{\gamma}^j}\left(\tilde{\gamma}_i^j\right).$$
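As an illustration of the log and exponential mappings for the rotational part, the following Python sketch (not the paper's implementation; it uses SciPy's rotation-vector, i.e. axis-angle, parameterization) maps rotations to and from the tangent space at a mean rotation:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def log_map(R, R_mean):
    """Tangent-space (rotation-vector) coordinates of R at the mean rotation."""
    return (Rotation.from_matrix(R_mean).inv() * Rotation.from_matrix(R)).as_rotvec()

def exp_map(v, R_mean):
    """Map tangent-space coordinates v back onto the rotation manifold."""
    return (Rotation.from_matrix(R_mean) * Rotation.from_rotvec(v)).as_matrix()

# Round trip: Exp and Log are mutually inverse around the same mean
R_mean = Rotation.random(random_state=0).as_matrix()
R = Rotation.random(random_state=1).as_matrix()
assert np.allclose(exp_map(log_map(R, R_mean), R_mean), R)
```

Averaging or doing PCA on the rotation vectors returned by `log_map` is the essence of PGA restricted to rotations.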
To build the SSPMs, the shape and pose features of the $i^{th}$ example are concatenated in a joint feature vector [17]:

$$v_i = \left(w_s\, s_i^1, \ldots, w_s\, s_i^n,\; w_p\, \tilde{\gamma}_i^1, \ldots, w_p\, \tilde{\gamma}_i^n\right),$$

where $w_s$ and $w_p$ denote shape and pose weightings, respectively, that balance the relative importance between shape and pose within the model. The resulting norm of the SR is defined as

$$\left\|\tilde{\gamma}\right\|^2 = \|r\|^2 + \lambda \|t\|^2, \tag{2}$$

where $r$ and $t$ are the rotation and translation vectors, respectively, and $\lambda$ is the scaling factor. To build the model, an additional assumption is that the joint feature residual dataset $\{v_i - \bar{v}\}_{i=1}^{N}$ can be modelled by a normal distribution, similar to point distribution models (PDMs) [9], as follows:

$$v \sim \mathcal{N}\left(\bar{v}, \Sigma\right),$$

where the joint mean $\bar{v}$ and the covariance matrix $\Sigma$ are estimated from the dataset as

$$\bar{v} = \frac{1}{N} \sum_{i=1}^{N} v_i, \qquad \Sigma = \frac{1}{N-1} \sum_{i=1}^{N} \left(v_i - \bar{v}\right)\left(v_i - \bar{v}\right)^{T}.$$
The model is obtained by performing PCA, which in a probabilistic interpretation leads to

$$v = \bar{v} + \sum_{k} \alpha_k \sqrt{\lambda_k}\, u_k, \qquad \alpha_k \sim \mathcal{N}(0, 1),$$

where $u_k$ is the combined eigenvector of the covariance matrix (associated with the eigenvalue $\lambda_k$) obtained from singular value decomposition (SVD). The SSPM is then split into two parts: the shape component, typically a PDM, defined as

$$s = \bar{s} + \sum_{k} \alpha_k \sqrt{\lambda_k}\, u_k^{s}, \tag{3}$$

and the pose component, typically an SPM, defined as

$$\gamma = \mathrm{Exp}_{\bar{\gamma}}\left(\sum_{k} \alpha_k \sqrt{\lambda_k}\, u_k^{p}\right), \tag{4}$$

where $u_k^{s}$ and $u_k^{p}$ are the shape and pose blocks of $u_k$. A new instance of a discrete SSPM is obtained as $\gamma(s)$, which places the shape at its relative spatial position.
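A discrete PDM of this kind can be sketched in a few lines of Python/NumPy (an illustrative sketch, not the paper's code; `build_pdm` and `sample_shape` are hypothetical helper names):

```python
import numpy as np

def build_pdm(X):
    """X: (N, 3m) matrix of N vectorised training shapes in correspondence.
    Returns the mean, the PCA eigenvalues, and the principal directions."""
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    eigvals = s**2 / (X.shape[0] - 1)  # eigenvalues of the sample covariance
    return mean, eigvals, Vt           # rows of Vt are the eigenvectors u_k

def sample_shape(mean, eigvals, Vt, alpha):
    """Draw a shape instance from standard-normal coefficients alpha."""
    k = len(alpha)
    return mean + (alpha * np.sqrt(eigvals[:k])) @ Vt[:k]
```

Setting all coefficients to zero returns the mean shape; drawing alpha from a standard normal generates plausible new instances within the span of the training data.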
Gaussian process morphable models [11] are an extension of the PDMs in equation 3. In contrast to PDMs, GPMMs model deformations defined on a continuous domain, although they can always be discretized to obtain a model that is mathematically equivalent to a PDM. The advantage of such a model is that, after training, the statistical descriptions may be marginalized (or customized kernels may be used) on a region of interest, which provides better interpretability of the underlying information. Additionally, freedom in defining the covariance function may be useful because explicit bias kernels can be added based on expert knowledge. This flexibility of GPMMs overcomes a typical limitation of traditional SSMs, whereby insufficient training examples result in a loss of model generality [24]. In GPMMs, shape correspondence, established across examples, leads to deformation fields defined from the reference shape to the examples in the training data. These deformations are then modelled as Gaussian processes (GPs) on the reference shape domain $\Omega$, thus making them independent of discretization. GPMMs are formulated as follows:
$$u \sim \mathcal{GP}\left(\mu, k\right), \tag{5}$$

where, for all points $x, x' \in \Omega$, the mean and covariance (kernel) functions are defined as

$$\mu(x) = \frac{1}{N} \sum_{i=1}^{N} u_i(x), \qquad k(x, x') = \frac{1}{N-1} \sum_{i=1}^{N} \left(u_i(x) - \mu(x)\right)\left(u_i(x') - \mu(x')\right)^{T}. \tag{6}$$

The function $u_i$ is the deformation field associated with the dense correspondence established between the reference and the $i^{th}$ example. The deformations can be represented in terms of an orthogonal set of basis functions $\phi_k$:

$$u = \mu + \sum_{k} \alpha_k \sqrt{\lambda_k}\, \phi_k, \qquad \alpha_k \sim \mathcal{N}(0, 1), \tag{7}$$

where $(\lambda_k, \phi_k)$ are the eigenvalue/eigenfunction couples. These couples are obtained using the Karhunen-Loève expansion and the Nyström approximation [25, 26], as explained in [11]. This leads to a probabilistic PCA-based model of shape defined directly on the deformations.

In this section we develop the dynamic multi-object Gaussian process model (DMOGPM) framework, which incorporates:
a homogeneous (linear) analysis space for shape and spatial orientation features,
a kinetic-energy-based motion representation for more accurate comparison of rigid transformations,
a continuous representation to describe feature variation across the training dataset.
The DMOGPM framework is aimed at modelling multi-objects (with $n$ objects) and their relative spatial positions as a deformation field from a reference joint $\left(\bar{s}^1, \ldots, \bar{s}^n\right)$, where the $j^{th}$ object shape of the joint, at its relative spatial position, is represented as

$$s^j = \gamma^j\left(\bar{s}^j + u_s^j(\bar{s}^j)\right), \tag{8}$$

with the rigid transformation $\gamma^j$ defined, for some pose deformation $u_p^j$, through the exponential map at the mean pose, and $\bar{s}^j$ being the coordinate vector of the set of points of the $j^{th}$ reference object. We model the joint deformations $u = (u_s, u_p)$ as a Gaussian process with mean function $\mu$ and covariance (kernel) function $k$.
The DMOGPMs are built in four steps. First, anatomical correspondence is established across the multi-objects to guarantee the homology of shape-pose features. Second, the optimal representation of the spatial dynamics associated with the shapes is obtained. Third, linear and non-linear transformations are normalized to obtain a homogeneous space for linear statistics. Finally, the dynamic multi-object is modelled in a continuous domain.
We define a dynamic multi-object (joint) as a set of several biological structures (or objects) from different families of shapes (e.g. humerus, scapula) together with their relative spatial positions. Establishing correspondence across several structures while keeping their anatomo-physiological relationship is not a straightforward task. Here we extend the framework presented in our previous work [27] from two to $n$ biological structures. For a single biological structure, amongst several different approaches [28, 29, 30], correspondence can typically be achieved through rigid registration followed by non-rigid registration performed using free-form deformation (FFD) models such as GPMMs [11]. For multi-objects, not only must shape correspondence be established across objects, but the relative spatial position between them must also be preserved. To register objects while keeping their anatomo-physiological relationship, we assume that one of the objects remains fixed while the others move relative to it. Individual FFD models ($n$ models) are built for each object family and used to register the objects of their respective families. After the registration, the inverse of each rigid registration transformation is used to retrieve the original spatial position of the moving objects. This operation retrieves the anatomo-physiological relationship between moving objects. The rigid transformation of the fixed object is finally applied to all moving objects to obtain their anatomo-physiological relationship with the fixed object. Dense correspondence (shape and spatial dynamics) is established across the dataset by using the same FFD reference for the registration.
Let $\rho^j$ be the rigid transformation that maps the $j^{th}$ moving object ($j \neq f$) to its reference, and $\rho^f$ the rigid transformation that maps the fixed object to its reference. The registered object with its spatial position retrieved is obtained as

$$s^{j,\mathrm{reg}} = \rho^f\!\left((\rho^j)^{-1}\left(\bar{s}^j + \hat{u}^j(\bar{s}^j)\right)\right), \qquad \hat{u}^j = \operatorname*{argmin}_{u \in F}\; D\!\left(s^j,\, \bar{s}^j + u(\bar{s}^j)\right) + \eta\, \|u\|_F,$$

where $F$ is the space of deformation fields, $\|\cdot\|_F$ is a norm in $F$, $D$ is a similarity measure, and $\eta$ a regularisation coefficient.
The literature on SSPMs usually emphasizes the common shape-pose space required in medical image processing. The standard representation of rigid transformations may not always encapsulate the kinetic energy necessary to move the points of an object from one position to another, particularly for non-compact shapes. This reduces the efficiency of this parameterization for statistical analysis of rigid transformations [13]. We therefore propose to extend the kinetic-energy-based representation of Moreau et al. (2017) to a continuous space and use it to encode rigid transformations in the DMOGPM. We refer to this new transformation representation as the energy displacement representation (EDR).
We represent a multi-object complex as a set of $n$ objects, each object defined by its shape and its spatial position parameters. A multi-object is represented as a concatenated vector space denoted $X = (X_s, X_p)$, where $X_s = (\Omega_s^1, \ldots, \Omega_s^n)$ and $X_p = (\Omega_p^1, \ldots, \Omega_p^n)$, with $\Omega_s^j$ and $\Omega_p^j$ representing the point domains of the shape and the spatial orientation features for the $j^{th}$ object of the joint, respectively.
Now let us assume there are $N$ multi-object examples in the training dataset. Let us explicitly define the rigid transformation representation, a displacement field representation that extends [13] to a continuous domain and that will be used to define the joint feature representations. We define the EDR of the $i^{th}$ example of the $j^{th}$ object on its reference spatial-position domain $\Omega_p^j$, with its respective reference shape and reference pose points. The EDR and its associated metric are defined as

$$u_{p,i}^j(x) = \gamma_i^j(x) - x, \qquad d\!\left(\gamma_1, \gamma_2\right)^2 = \sum_{x \in \Omega_p^j} \left\|\gamma_1(x) - \gamma_2(x)\right\|^2, \tag{9}$$

where $\gamma_1, \gamma_2 \in \Gamma$, a Lie group of rigid transformations.
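In discrete form, the EDR of a rigid transform over a set of reference points, and the metric it induces, can be sketched as follows (an illustrative Python sketch; the function names are ours, not the paper's):

```python
import numpy as np

def edr(R, t, ref_points):
    """Displacement field of the rigid transform (R, t) over the reference points."""
    return ref_points @ R.T + t - ref_points

def edr_metric(R1, t1, R2, t2, ref_points):
    """Distance between two rigid transforms induced by their displacement fields."""
    diff = edr(R1, t1, ref_points) - edr(R2, t2, ref_points)
    return np.sqrt((diff**2).sum())
```

Because the displacement magnitudes grow with the lever arm of each point, the metric naturally weights rotations by how far they move the actual geometry, which is the property that makes it better behaved for non-compact shapes than angle-based norms.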
To normalise shape and pose features, we define unified representations of the shapes and their relative spatial dynamics. These representations are joint deformation fields over which a Gaussian process (GP) will be defined (section 3.4). A reference multi-object is chosen from the training examples; this joint should be representative of the dataset, that is, approximately similar to the mean of the dataset. The joint deformation field of the $i^{th}$ example in the training dataset is a pair of deformation fields $(u_{s,i}, u_{p,i})$, where $u_{s,i}$ maps a point of the reference shape objects to the corresponding point of the $i^{th}$ shape example, and $u_{p,i}$ maps a point of the reference pose objects to the corresponding point of the $i^{th}$ pose example. The joint deformation field is defined as

$$u_i = \left(u_{s,i},\, u_{p,i}\right), \tag{10}$$

where

$$u_{s,i}(x) = s_i(x) - x, \;\; x \in X_s, \qquad u_{p,i}(x) = \gamma_i(x) - x, \;\; x \in X_p. \tag{11}$$

Before obtaining the residual features (around the mean) that will represent the normalised features, the joint function of the mean shape and spatial transformations has to be defined. We estimate this mean function, $\mu = (\mu_s, \mu_p)$, using the mean shape and pose deformation fields as

$$\mu(x) = \frac{1}{N} \sum_{i=1}^{N} u_i(x), \tag{12}$$

with the mean pose $\bar{\gamma}(x) = x + \mu_p(x)$.
It should be noted that this mean is obtained directly as the mean of the vector fields (thanks to the EDR), in contrast to the SR, where the Fréchet mean must be estimated through numerical optimization.
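To recover a rigid transform from an averaged displacement field, one can project the mean field back onto the rigid-transformation manifold with an orthogonal Procrustes (Kabsch) fit. A Python sketch of this step, under our own naming (an illustration, not the paper's code):

```python
import numpy as np

def mean_pose_from_fields(fields, ref_points):
    """fields: (N, m, 3) displacement fields of N rigid transforms over m
    reference points. Averages the fields, then fits the closest rigid
    transform (R, t) with the Kabsch algorithm."""
    target = ref_points + fields.mean(axis=0)
    c_ref, c_tgt = ref_points.mean(axis=0), target.mean(axis=0)
    H = (ref_points - c_ref).T @ (target - c_tgt)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_tgt - R @ c_ref
    return R, t
```

For fields that all come from the same rigid transform, the fit recovers that transform exactly; for a population of transforms it yields the EDR mean pose without any iterative optimization.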
Thus far, the pose representations across the training dataset belong to a manifold, which is a non-linear space. To obtain the unified space of shape and pose features, the rigid transformations are projected (that is, the relative spatial transformations are linearized) onto the space tangential to the manifold at the mean rigid transformation, using the EDR (eq. 9). The relative spatial transformations are linearized through a bijective mapping.

Let us define the functions associated with the EDR between two rigid transformations $\gamma_1$, $\gamma_2$:

$$\ell_{\gamma_1}(\gamma_2)(x) = \gamma_2(x) - \gamma_1(x), \qquad x \in X_p. \tag{13}$$

The joint deformation fields projected onto the vector space over which the GP will be defined are

$$\tilde{u}_i = \left(u_{s,i},\; \ell_{\bar{\gamma}}(\gamma_i)\right). \tag{14}$$
From a training dataset of $N$ examples with $n$ objects each, the deformation fields defined on the continuous domain $X$ are modelled by a Gaussian process defined as

$$u \sim \mathcal{GP}\left(\mu, k\right), \tag{15}$$

where $k$ is the kernel function defined as

$$k(x, x') = \frac{1}{N-1} \sum_{i=1}^{N} \left(\tilde{u}_i(x) - \mu(x)\right)\left(\tilde{u}_i(x') - \mu(x')\right)^{T}. \tag{16}$$
The multi-object deformation fields are represented by an orthogonal set of basis functions denoted $\phi_i = \left(\phi_i^s, \phi_i^p\right)$, defined as

$$u = \mu + \sum_{i} \alpha_i \sqrt{\lambda_i}\, \phi_i, \qquad \alpha_i \sim \mathcal{N}(0, 1), \tag{17}$$

where $\phi_i^s$ and $\phi_i^p$ represent the shape and the energy displacement components of the basis function. The $j^{th}$ object is modelled as a combination of a shape deformation field and a rigid transformation (obtained using the exponential map) as defined below:

$$s^j = \gamma^j\!\left(\bar{s}^j + \mu_s^j(\bar{s}^j) + \sum_i \alpha_i \sqrt{\lambda_i}\, \phi_i^{s,j}(\bar{s}^j)\right), \qquad \gamma^j = \mathrm{Exp}_{\bar{\gamma}^j}\!\left(\sum_i \alpha_i \sqrt{\lambda_i}\, \phi_i^{p,j}\right), \tag{18}$$

where $\alpha_i \sim \mathcal{N}(0, 1)$ and the pair $(\lambda_i, \phi_i)$ is an eigenvalue/eigenfunction couple. These couples are obtained using the Karhunen-Loève expansion and the Nyström approximation, as in equation 7. The eigenfunction $\phi_i$ is a continuous multi-object deformation field defined on $X$, in the space tangential to $\Gamma^n$.
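The Karhunen-Loève/Nyström step can be illustrated numerically: eigen-decompose the kernel matrix at a finite set of training points, then extend the eigenvectors to arbitrary query points through the kernel. A Python sketch under our own naming, with a simple scalar Gaussian kernel standing in for the learned sample kernel:

```python
import numpy as np

def gauss(X, Y, s=1.0):
    """Scalar Gaussian kernel matrix between two point sets."""
    d2 = ((X[:, None, :] - Y[None, :, :])**2).sum(-1)
    return np.exp(-d2 / s**2)

def low_rank_gp(kernel, pts, n_modes):
    """Discrete Karhunen-Loeve expansion: leading eigenpairs of the kernel matrix."""
    evals, evecs = np.linalg.eigh(kernel(pts, pts))
    order = np.argsort(evals)[::-1][:n_modes]
    return evals[order], evecs[:, order]

def nystrom_extend(kernel, pts, evals, evecs, query):
    """Nystrom extension: evaluate the eigenfunctions at arbitrary new points."""
    return kernel(query, pts) @ evecs / evals
```

At the training points themselves the extension reproduces the eigenvectors exactly; between them it interpolates smoothly, which is what makes the model independent of any particular discretization.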
For example, figure 2 shows how an object is sampled from the model. A shape instance of the $j^{th}$ object and its associated spatial position can be generated from our probabilistic model, for coefficients $\alpha_i \sim \mathcal{N}(0, 1)$, as

$$s^j = \gamma^j\!\left(\bar{s}^j + u_s^j(\bar{s}^j)\right), \tag{19}$$

with $u_s^j = \mu_s^j + \sum_i \alpha_i \sqrt{\lambda_i}\, \phi_i^{s,j}$ and $\gamma^j$ obtained from the sampled pose component as in equation 18, where $\bar{s}^j$ is the reference shape of the $j^{th}$ object.
As with GPMMs, our DMOGPM framework allows a given model to operate on a specific part of the domain using marginal probability properties, which DMOGPMs retain.

Let us assume the DMOGPM is a joint probabilistic model over the domain $X$. The marginal probabilistic model over a sub-domain of $X$ is the model obtained from the joint probability restricted to that specific domain.
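For a discretized model, marginalization amounts to selecting the corresponding entries of the mean vector and the corresponding block of the covariance matrix, since marginals of a Gaussian are again Gaussian. A minimal Python sketch (the helper name is ours):

```python
import numpy as np

def marginalise(mean, cov, point_ids):
    """Restrict a discretised GP (mean vector and covariance matrix with
    3 coordinates per point) to the points listed in point_ids."""
    idx = np.concatenate([np.arange(3 * i, 3 * i + 3) for i in point_ids])
    return mean[idx], cov[np.ix_(idx, idx)]
```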
Another important aspect of the DMOGPM framework is its potential for making the model "class specific", that is, holding the variation of a specific class constant. For example, the framework includes static multi-object models: setting the pose parameters constant yields a classic SSM of the joint. Conversely, it includes a motion model: setting the shape parameters constant leads to a pose-only statistical model.
Another advantage of the DMOGPMs is the possibility of increasing the variability around the mean without retraining the model, as in GPMMs. The number of training examples may be insufficient to provide a subspace of shapes representing all possible shape variations within the family, which introduces bias. Extending the variability using a bias kernel can overcome this limitation. Our framework allows for the understanding of such variability and the definition of an appropriate kernel that encodes it. A new kernel can be added to the trained model kernel without any need to reuse the training examples. An example of a kernel that can increase smoothness in a model is the Gaussian kernel, defined on a specific domain by

$$k_g(x, x') = s \exp\!\left(-\frac{\|x - x'\|^2}{\sigma^2}\right) I_3,$$

where the parameter $s$ defines the scale of the average error and $\sigma$ the range over which deformations are correlated [11]. The new DMOGPM kernel can be a combination of the pre-trained kernel with different Gaussian kernels in each domain, defined by

$$k_{\mathrm{new}}(x, x') = k(x, x') + \sum_{j=1}^{n} k_g^j(x, x'),$$

where $I_3$ is the unit matrix.
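Augmenting a trained kernel with a smooth bias kernel can be sketched as follows (a Python illustration under our own naming; the scale and range values are arbitrary):

```python
import numpy as np

def gaussian_kernel(x, y, s, sigma):
    """Matrix-valued Gaussian kernel: s * exp(-||x - y||^2 / sigma^2) * I3."""
    return s * np.exp(-((x - y)**2).sum() / sigma**2) * np.eye(3)

def augmented_kernel(k_trained, s, sigma):
    """Combine a trained (sample) kernel with a smooth Gaussian bias kernel,
    without touching the training examples."""
    return lambda x, y: k_trained(x, y) + gaussian_kernel(x, y, s, sigma)
```

Since sums of positive-definite kernels are positive definite, the augmented function is again a valid GP covariance.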
In order to demonstrate the utility of the framework in prediction from observations of target examples, we adopted the Markov chain Monte Carlo (MCMC) and Metropolis-Hastings algorithms, as in [21, 24]. MCMC methods are computational tools for performing approximate inference with intractable probabilistic models by making use of another distribution from which one can sample, in order to ultimately draw samples from the true posterior distribution.

Let us consider target objects with arbitrary relative positions to each other. The goal of the MCMC fitting process is to find the best model parameters (a multi-object deformation field in our case) that optimally represent the target example. The uncertainties related to pose estimation are embedded in the DMOGPM and therefore need not be modelled separately. The target multi-object example may consist of partial objects; for instance, images of bones with slight cracks caused by osteomalacia, or any anatomical structure that is incomplete on images due to pathology or image artifacts. The advantage of the DMOGPM is that, when confronted with a missing part on one object, the other objects in the target example can be used as additional observations, since the correlation between the objects is embedded in the model. The DMOGPM with MCMC fitting can thus be used to obtain a posterior model of one object while another is missing.
Let $o$ be the observation (target multi-object complex), a set of objects at various relative positions. Note that $o$ could be partial, i.e. we may have fewer than $n$ objects. The posterior model is estimated using Bayes' rule:

$$p(u \mid o) = \frac{p(o \mid u)\, p(u)}{p(o)},$$

where $p(o)$ is intractable, $p(u)$ is the model (prior), and $p(o \mid u)$ the likelihood. To estimate the posterior DMOGPM, a multi-object likelihood is proposed.
Full shape and pose estimation requires a likelihood that captures both within-object and between-object features. We propose a multi-feature-class comparison likelihood that has a global and a local component. The global component ensures the prediction of all feature classes, while each local component ensures the predominance of its feature class in the global likelihood. The local likelihood compares the observation $o^j$ of the target with its instance $M^j[u]$ generated from the model, using independent feature-wise comparison. It can be seen as a probability distribution over possible shape-pose samples evaluated for the target, and it is defined as

$$\ell^j\!\left(o^j \mid u\right) \propto \exp\!\left(-\frac{1}{\sigma_j^2}\, d\!\left(o^j, M^j[u]\right)^2\right). \tag{20}$$

The global likelihood compares the observation of the target with the model instance across all objects, and it is defined as

$$L(o \mid u) = \prod_{j=1}^{n} \ell^j\!\left(o^j \mid u\right). \tag{21}$$
A sample $u'$ is accepted as a new state for the $j^{th}$ object with the Metropolis decision probability $\alpha^j$, and for the full target with the probability $\alpha$, defined by

$$\alpha^j = \min\!\left(1,\; \frac{\ell^j(o^j \mid u')\, p(u')\, Q(u \mid u')}{\ell^j(o^j \mid u)\, p(u)\, Q(u' \mid u)}\right), \tag{22}$$

$$\alpha = \min\!\left(1,\; \frac{L(o \mid u')\, p(u')\, Q(u \mid u')}{L(o \mid u)\, p(u)\, Q(u' \mid u)}\right), \tag{23}$$

where $Q$ is the proposal generator. Finally, each proposal is fed through the sequence of Metropolis acceptance decisions

$$u' \sim Q(\cdot \mid u) \;\longrightarrow\; \alpha^1, \ldots, \alpha^n \;\longrightarrow\; \alpha. \tag{24}$$

The predicted target is the instance of the posterior model with the highest probability $p(u \mid o)$.
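The core Metropolis-Hastings loop, stripped of the shape-model specifics, can be sketched in Python as follows (an illustrative generic sampler, not the paper's implementation; it assumes a symmetric proposal so the $Q$ terms cancel):

```python
import numpy as np

def metropolis_hastings(log_posterior, propose, theta0, n_steps, rng):
    """Random-walk Metropolis: returns the visited state with the highest
    (unnormalised) log posterior, mirroring the best-sample prediction rule."""
    theta, lp = theta0, log_posterior(theta0)
    best, best_lp = theta, lp
    for _ in range(n_steps):
        cand = propose(theta, rng)
        lp_cand = log_posterior(cand)
        # Metropolis decision: accept with probability min(1, p(cand)/p(theta))
        if np.log(rng.uniform()) < lp_cand - lp:
            theta, lp = cand, lp_cand
            if lp > best_lp:
                best, best_lp = theta, lp
    return best
```

For the DMOGPM fit, `theta` would be the vector of model coefficients and `log_posterior` the sum of the log prior and the multi-object log likelihood of equations 20-21.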
To validate the DMOGPM framework, we created a synthetic data set with precisely defined modes of shape and pose variation. We avoided real biological data because these typically exhibit complex shape and pose variations, making them difficult to use for benchmarking and validation. Our synthetic data consisted of surface mesh data of a "lollipop", as defined in [31]. The lollipop has a shaft, a neck, and a head. We used this template to generate surface data with meshes of identical surface topology. The only difference in shape between the generated meshes was the length of the major axis defining the ellipsoid of the lollipop head; this was varied uniformly over a fixed range, so the shape variation was entirely encoded using one degree of freedom. The shaft and neck areas were identical for all synthetic data.
To evaluate the potential of our framework for modelling shape correlation between different objects, we created artificial joints using two lollipops per scenario. Each joint was composed of two lollipops with major axes $a_1$ and $a_2$ for object 1 and object 2, respectively. The span of the dataset of joints was created by varying $a_1$ and $a_2$ so as to create a negative correlation between objects 1 and 2 (figure 3, left).
The correlations between samples from the DMOGPM and between the synthetically created examples were compared to determine whether our model could recover the correlation between objects present in the training data. Our model captured the correlation prescribed in the training dataset: the DMOGPM showed a non-zero slope at the chosen significance level, with a negative correlation coefficient matching the training dataset. The regression curves of the training examples and of our model's examples were the same, illustrating the precision with which our model captured the correlation between objects of the synthetic joints.
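A correlation check of this kind reduces to comparing correlation coefficients and regression slopes of the head-length pairs. A minimal Python sketch with hypothetical, perfectly anti-correlated head lengths (illustrative values, not the paper's data):

```python
import numpy as np

# Hypothetical head lengths for object 1 and object 2 across sampled joints
a1 = np.linspace(1.0, 3.0, 9)
a2 = 5.0 - a1                      # perfect negative correlation, as designed

r = np.corrcoef(a1, a2)[0, 1]      # correlation coefficient
slope, intercept = np.polyfit(a1, a2, 1)
```

In practice, `a1` and `a2` would be measured on model samples and on the training examples, and the two regression lines compared.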
Prediction using statistical models remains challenging for medical image processing tasks because it requires observations that may not be available. The models developed here have the potential to make predictions of one object with limited or no observations (the available features) of that object, by leveraging information from the other object in a complex. The assumption is that there is a good correlation between the two objects. For the rest of the study we define four types of observations of target objects (see table I) that will be used for predictions.
Type I: The full object to be predicted is available.
Type II: A partial object to be predicted is available.
Type III: Anatomical features (e.g. landmarks) of the object to be predicted are available.
Type IV: No knowledge of the object to be predicted is available.
To test this hypothesis, we generated joints of lollipops with parameters $a_1$ and $a_2$ between the minimum and maximum values present in the training dataset. For each joint, object 1 was predicted from the observation of object 2 only, and conversely. We compared the results obtained for each object with a Type I observation to those obtained with a Type IV observation. Table II shows the root mean square (RMS) distance and Hausdorff distance (HD) errors of the predictions. The average RMS error was 0.002 and 0.001 mm for the prediction of object 1 and object 2 with a Type I observation, respectively. The average HD error was 0.9 and 0.82 mm for the prediction of object 1 and object 2 with a Type IV observation, respectively. These errors are relatively large compared to those obtained with a Type I observation. However, the result obtained with a Type IV observation shows that the correlation embedded in the model provides additional information that can be useful in obtaining accurate predictions.
                                   RMS (mm)         HD (mm)
Object 1 with Type I observation   0.002 ± 2.2e-6   0.014 ± 1.7e-3
Object 1 with Type IV observation  0.4 ± 0.06       0.9 ± 0.22
Object 2 with Type I observation   0.001 ± 2.9e-6   0.01 ± 0.001
Object 2 with Type IV observation  0.22 ± 0.02      0.82 ± 0.2
To evaluate the potential of our framework for modelling shape and spatial dynamics together, we simulated joint motion. First, we created a dataset in which shape does not correlate with motion. For each joint generated above, we rotated the second lollipop (object 2) relative to the first one (object 1) using the Euler angle convention for describing rigid transformations. The second lollipop was moved according to the Euler angles, with only one angle varied through four values, defining a motion (figure 3, right). Defining the same motion for each joint created a dataset where shape was uncorrelated with motion, because each pair of shapes performed the same movement. Second, we created a dataset in which the motion is constrained by the shape. This was done by simulating the same movement as above, but with the maximum angle inversely proportional to the shape parameter, thus creating a correlation between shape and motion.
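The two synthetic motion datasets above can be sketched as follows. This assumes rotation about a single axis (here z) and an illustrative inverse-proportionality constant for the shape-constrained case; both are our assumptions, not values from the paper.

```python
import numpy as np

def rot_z(angle_deg):
    """Rotation matrix about the z-axis for an angle in degrees."""
    a = np.deg2rad(angle_deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])

def simulate_motion(points, max_angle, n_poses=4):
    """Rotate object-2 points through n_poses angles from 0 up to max_angle."""
    angles = np.linspace(0.0, max_angle, n_poses)
    return [points @ rot_z(a).T for a in angles], angles

# Two points on object 2 of a toy lollipop joint.
pts = np.array([[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])

# Uncorrelated dataset: every joint performs the same motion.
poses_a, angles_a = simulate_motion(pts, max_angle=90.0)

# Shape-constrained dataset: max angle inversely proportional to a shape
# parameter r of object 2 (constant 90.0 chosen for illustration).
r = 2.0
poses_b, angles_b = simulate_motion(pts, max_angle=90.0 / r)
```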
We built a model using the synthetic training dataset with no correlation between shape and pose variation. The model behaved as expected: the first PC explained shape-only variation without any pose variation, and the second PC explained pose-only variation. Figure 4 shows the variation along the first and second PCs about the mean. In the first PC, the pose is fixed and only the shape parameters of the two objects vary, which corresponds to the correlation induced in the training dataset. Along this PC, the second lollipop's head length decreases until it becomes unrealistically sharp, owing to the large sampling range, as indicated by the star (*) in figure 4. In the second PC, the shape parameters of both objects are fixed at their mean values and only the pose parameter varies, representing the abduction motion in the training dataset. That shape and pose variation are explained by two different PCs mirrors the shape and pose variation across the training dataset, confirming that the model represents the training dataset.
We also built a model using synthetic training data in which pose variation was constrained by shape. This model also behaved as expected: the first PC captured shape variation as well as pose variation, and the second explained pose-only variation, as shown in Figure 5. A visual inspection of the first PC reveals that the shape parameters of both objects vary and that the pose variation is inversely proportional to the second shape parameter, mirroring the training dataset. In the second PC, the shape parameters of both objects are fixed at their mean values and only the pose parameter varies. The shape and pose variation explained by the first PC thus show a correlation between shape and pose similar to that in the training dataset.
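The PC inspection used in the two experiments above amounts to PCA on concatenated (shape, pose) feature vectors. The sketch below uses toy data mimicking the uncorrelated case, with one latent factor driving two shape features and an independent factor driving one pose feature, so PCA should separate shape-only and pose-only components; all feature dimensions and scales are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
shape_factor = rng.normal(size=n)   # latent shape variation
pose_factor = rng.normal(size=n)    # independent latent pose variation

# Concatenated feature vectors: two shape features, one pose feature.
X = np.column_stack([
    3.0 * shape_factor,
    -3.0 * shape_factor,
    2.0 * pose_factor,
])

# PCA via SVD of the centred data.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)     # fraction of variance per PC

# Loadings: PC1 should load on the shape columns only, PC2 on the pose column.
pc1, pc2 = np.abs(Vt[0]), np.abs(Vt[1])
```

With independent shape and pose factors, the first two PCs account for essentially all variance, and each loads on only one feature class, as in the uncorrelated lollipop model.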
A DMOGPM using the EDR was compared to one using the SR, to determine whether the model with the EDR better explains the spatial dynamics of the objects. We sampled the PC describing the simulated motion and compared it with the motion prescribed in the training data. Figure 6 shows the first PC and the third PC, which account for motion in the DMOGPM with the EDR and the DMOGPM with the SR, respectively. The motion described by the DMOGPM with the EDR is more similar to that prescribed in the training data than the motion described by the DMOGPM with the SR. For example, the position of the mean shape is at a lower angle in the model with the SR than in the training motion, and the sampled poses about the mean deviate completely from the training-data motion plane. The mean pose for the model with the SR is not different from that of the training data, but the pose samples are not distributed around the mean; this is because the SR does not use object features (it is independent of the object features).
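Why the choice of rigid-transform representation matters for linear statistics can be illustrated with a simplified stand-in for the EDR-versus-SR comparison (the paper's EDR itself is more involved): naively averaging 2D rotation matrices entrywise leaves the rotation group, whereas averaging in a linearized domain (here, the angle) does not.

```python
import numpy as np

def rot2d(theta):
    """2D rotation matrix for an angle in radians."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

thetas = np.deg2rad([10.0, 80.0])

# Naive representation: average the matrix entries directly.
mean_matrix = sum(rot2d(t) for t in thetas) / len(thetas)
det_naive = np.linalg.det(mean_matrix)   # < 1, so not a valid rotation

# Linearized representation: average the angles, then map back.
mean_rot = rot2d(np.mean(thetas))
det_linear = np.linalg.det(mean_rot)     # 1 up to rounding: a valid rotation
```

The naive mean shrinks the transformation (its determinant drops below 1), which is one reason a linearized, feature-aware motion representation yields statistics that stay consistent with the training poses.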
We also used the DMOGPM built with the EDR and the one built with the SR to predict lollipop joints with the second object at various poses, predicting each with Type I observation. The same synthetic examples used to test object correlation above (see Table II) were used. The test dataset includes two positions of object 2 relative to the first object, for both the models without and with shape-pose correlation. Figures 7 and 8 show boxplots of the surface-to-surface RMS and HD errors in predicting the second object. The DMOGPM with the EDR outperforms the DMOGPM with the SR when there is shape-pose correlation. Without shape-pose correlation, the DMOGPM with the SR performed slightly better than the one with the EDR. This slight outperformance occurs even though the motion described by the SR model is not statistically derived from the training motion, since some of the test poses still belong to the training poses.
To further demonstrate our framework on clinical and medical image analysis tasks, we applied our method to the shoulder, the most complex joint of the human body. The complexity of the shoulder joint can be seen from both the shape and the dynamics points of view. The joint contains the scapula, one of the bones with the most complex shape, and the humerus, one of the longest bones of the body. This complexity already confounds separate morphometric analyses of each bone, let alone analyses in which both bones are considered together. Furthermore, in motion terms, the shoulder joint has the most degrees of freedom of any joint. This leads to difficulties in segmenting its constituent bones at their relative spatial positions.
We applied DMOGPMs to shoulder joint data (composed of two articulating bones, the scapula and the humerus) to demonstrate different scenarios in which the framework could be used. The data consisted of mesh surfaces segmented from computed tomography (CT) images of the bilateral shoulders of fresh cadavers collected from the Division of Clinical Anatomy and Biological Anthropology, Faculty of Health Sciences, University of Cape Town, South Africa. Institutional ethics approval was granted for this study (Approval No: HREC 546/2017). Shoulder joints from both female and male remains were used; none presented with shoulder pathology at the time of death. The age of the imaged decedents ranged from 21 to 90 years. The CT volumes were imported into a medical modelling tool (Amira v6.2.0, http://www.fei.com/) for segmentation and 3D surface rendering.
In order to establish correspondence across joints within the dataset, we used the surface meshes to create a median virtual shape for the humerus and the scapula separately, using the iterative median closest point-Gaussian mixture model (IMCP-GMM) [26]. The median virtual shapes were used as morphologically unbiased templates to build separate GPMMs. These models were used to register all samples across the dataset using the joint registration method described in section 3.1. Table III shows the average surface-to-surface RMS and HD between the original meshes and their estimations after the registration process, for both humeri and scapulae. The registration errors were low for both the humerus and the scapula, indicating that adequate anatomical correspondence was established across examples in the training datasets. However, the humerus registration errors were lower than the scapula registration errors, which could be attributed to the greater shape complexity of the scapula relative to the humerus [26].

           RMS (mm)      HD (mm)
Humeri     0.73 ± 0.18   2.62 ± 0.84
Scapulae   1.11 ± 0.2    3.01 ± 0.8
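The two registration-error metrics reported in Table III can be sketched on point clouds, approximating surface-to-surface distance by closest-point distance. The brute-force nearest-neighbour search below is for illustration only; a KD-tree would be used on real meshes.

```python
import numpy as np

def closest_dists(a, b):
    """For each point in a, the distance to its nearest point in b."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1)

def rms_distance(a, b):
    """Symmetric root-mean-square closest-point distance."""
    d = np.concatenate([closest_dists(a, b), closest_dists(b, a)])
    return float(np.sqrt(np.mean(d**2)))

def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance (worst-case closest-point distance)."""
    return float(max(closest_dists(a, b).max(), closest_dists(b, a).max()))

# Tiny example: a mesh and a slightly displaced estimate of it.
a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
b = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.5]])
```

The RMS summarises average registration accuracy, while the HD exposes the single worst-matched region, which is why both are reported.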
Since the CT scans were obtained from cadavers, the humeri were in random positions relative to their corresponding scapulae. Some of these positions were not physiologically realistic, so using them for motion analysis would have led to an unrealistic study. To overcome this issue, we simulated artificial motion through a biomechanics protocol.
To simulate the motion, the humerus was considered the mobile object while the scapula was fixed. A random shoulder joint from the in-correspondence dataset was used as a template, and artificial pose variations were obtained in four steps [27]: 1) determining the humeral head centre of rotation using a sphere-fitting algorithm [32]; 2) defining the International Society of Biomechanics (ISB) recommended coordinate systems for both the scapula and the humerus [16]; 3) obtaining the neutral position of the humerus by aligning the humerus and scapula coordinate systems and then shifting the scapula ISB origin to the glenoid centre; and 4) generating ten poses by moving the humerus around the glenoid centre in a single plane (simulating abduction). Five poses were generated for each joint.
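Step 1 of this protocol, estimating the humeral-head centre of rotation, is commonly done with a linear least-squares sphere fit: writing ||p - c||^2 = r^2 as 2 c·p + (r^2 - ||c||^2) = ||p||^2 makes the problem linear in the centre c and the auxiliary scalar k = r^2 - ||c||^2. The sketch below is a generic sphere fit under that formulation, not necessarily the exact algorithm of [32]; the test sphere's centre and 24 mm radius are illustrative.

```python
import numpy as np

def fit_sphere(points):
    """Least-squares sphere fit to (n, 3) points; returns (centre, radius)."""
    A = np.column_stack([2.0 * points, np.ones(len(points))])
    b = np.sum(points**2, axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    centre, k = sol[:3], sol[3]
    radius = np.sqrt(k + centre @ centre)
    return centre, radius

# Synthetic check: points on a sphere roughly the scale of a humeral head.
rng = np.random.default_rng(2)
u = rng.normal(size=(500, 3))
u /= np.linalg.norm(u, axis=1, keepdims=True)   # unit directions
pts = np.array([5.0, -3.0, 10.0]) + 24.0 * u    # centre (5, -3, 10), r = 24
centre, radius = fit_sphere(pts)
```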
To demonstrate the performance of our model in predicting shoulder anatomy, we compared the results obtained using the DMOGPM with those obtained using a single-structure SSM built as a GPMM of the humerus. The prediction was done using MCMC sampling. For this purpose, a marginalised shape-only DMOGPM of the shoulder was extracted from the global DMOGPM, since only shape features were being predicted. We demonstrate the prediction of the humerus from the observation of the scapula in different scenarios. We created a test dataset of shoulder joints (some from the training data and some outside the training set) for shoulder joint shape feature prediction.
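The MCMC-based fitting can be sketched with a bare-bones random-walk Metropolis-Hastings sampler: model coefficients have a standard-normal prior, the model instance is mean + U @ alpha, and the observed features enter through a Gaussian likelihood. The toy basis U, the observation noise level, and the chain settings are all illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy linear shape model: 4 observed features, 2 coefficients.
mean = np.zeros(4)
U = rng.normal(size=(4, 2))
alpha_true = np.array([1.5, -0.5])
observed = mean + U @ alpha_true + rng.normal(0.0, 0.05, size=4)

def log_posterior(alpha, sigma=0.05):
    """Standard-normal prior on alpha plus Gaussian observation likelihood."""
    residual = observed - (mean + U @ alpha)
    return -0.5 * alpha @ alpha - 0.5 * residual @ residual / sigma**2

alpha = np.zeros(2)
lp = log_posterior(alpha)
samples = []
for _ in range(4000):
    prop = alpha + rng.normal(0.0, 0.1, size=2)   # random-walk proposal
    lp_prop = log_posterior(prop)
    if np.log(rng.uniform()) < lp_prop - lp:      # Metropolis accept/reject
        alpha, lp = prop, lp_prop
    samples.append(alpha)

posterior_mean = np.mean(samples[1000:], axis=0)  # discard burn-in
```

The posterior mean of the coefficients gives the predicted instance; in the paper's setting the observation would be the available Type I-III features of the target bone.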
First, we predicted the humerus with Type IV observation, that is, having only the patient's scapula. Table IV shows the errors in predicting the humeri without any direct observation using the DMOGPM. Despite the somewhat large errors, it can be concluded that when there is a strong correlation between the scapula and the humerus, a meaningful prediction of one given the other is possible. These predictions stem from the correlation between scapula and humerus shapes captured by our modelling framework; such predictions would not be possible with single-structure models or SSMs. Second, we predicted humeri with Type III observation: two anatomical landmarks (the tips of the lateral and medial epicondyles) were selected on the distal humerus targeted for prediction. Our aim here was to predict the humeri when presented with a scapula and the two humerus landmarks using the DMOGPM. Again, we compared the DMOGPM predictions of the humeri with predictions made using a humerus SSM. Figure 10 shows boxplots of the prediction errors of the humerus using the DMOGPM and the single-structure humerus SSM. The prediction errors with the DMOGPM were low compared with those of the SSM. The outperformance of the DMOGPM relative to the single-structure SSM could be attributed to the additional knowledge of the humerus afforded through its corresponding scapula. Finally, humeri were predicted with Type II observation. We simulated humerus fracture scenarios by removing part of the humeral surfaces to obtain partial humeri. The same test was performed between the DMOGPM and the single-structure SSM of the humeri for prediction of full humeri from partial humeri. Figure 11 shows boxplots of the RMS and HD prediction errors of the humerus using the DMOGPM and the SSM. The DMOGPM outperformed the single-structure SSM, and figure 9 shows a reconstructed scapula with the DMOGPM and individual SSMs. Again, this performance could be due to the shape correlation between the humerus and the scapula, which can be considered to carry latent information about the humerus.
We also predicted scapulae from observations of partial scapulae using the DMOGPM and a single-structure scapula GPMM; that is, scapula fractures were simulated on the distal part of the scapula, and the DMOGPM and the scapula SSM were used to predict those scapulae. Figure 13 shows the prediction errors; the DMOGPM outperformed the scapula SSM.
Humerus with Type IV observation   RMS (mm)       HD (mm)
DMOGPM                             2.9 ± 1.18     14.0 ± 5.23
Single-structure GPMM              Not possible   Not possible
To evaluate the performance of the DMOGPM in predicting the training motion, the simulated abduction motion was predicted. The scapula and the humerus (Type I observation) were predicted with the humerus at different poses relative to the scapula. Five test poses were generated within the motion range, simulating abduction as in the training dataset. The prediction was done using the DMOGPM with the EDR and with the SR, in order to compare the performance of the two models in predicting shape at the relative positions. Figure 14 shows boxplots of the prediction errors obtained using the pose-marginalised DMOGPM with the EDR and with the SR. The DMOGPM with the EDR outperformed that with the SR: for both the scapula and the humerus, the median prediction error with the EDR model was less than that with the SR. Figure 12 shows a reconstructed shoulder using the DMOGPM.
Understanding patterns and explaining them in a statistical framework so that we can draw meaningful inferences is a vast area of research. This study presents a framework that tackles the existing problems in the combined statistical representation of multi-object shape and pose structures by providing a homogeneous yet simple way of building data-driven generative models. The common aim in statistical modelling of multi-object structures is to efficiently explain shape variability and spatial dynamics across a population in a low-dimensional space. This study focused on bringing the two feature classes involved (shape and pose) into a common linearized space, thus finding a homogeneous analysis space. While multiple methods exist to statistically represent shape variation in a linear space, its pose counterpart is often ignored in the literature, which may lead to incorrect variability in such shape-pose models. The proposed framework not only uses a homogeneous and linear statistical representation of both shape and pose features, it also embeds the model in a continuous domain, thereby making the model generative. This type of framework is especially needed in the medical imaging domain, where the structures of interest can be human articulating joints. Human joint structures present nonlinear and non-homogeneous feature classes in terms of their bone shapes and poses. Optimal and homogeneous statistical representation of the variation across these structures using a data-driven modelling framework is of utmost importance to maintain the clinical accuracy of the inferences drawn from the model. In this study, we have illustrated that the DMOGPM modelling framework can provide valid combined representations of shape and pose for multi-object structures.
We have provided a systematic validation of the DMOGPM framework using closely controlled synthetic data (lollipops) and illustrated its application on experimental medical data (the shoulder joint). The framework is able to incorporate more than two objects in a compact representation using Gaussian processes, meaning that the developed models provide a continuous analysis space for variations of shape and spatial dynamics. Marginalizing DMOGPMs allows us to analyze and/or predict shape-only, pose-only, or combined features. This important characteristic of DMOGPMs allows the framework to be applied in multiple settings, ranging from pre-surgical planning to population-based implant design.
Results of the validation with a synthetic dataset demonstrate the efficiency of the DMOGPM framework in explaining correlations across a given population of dynamic multi-object structures. The DMOGPM framework can accurately explain shape correlation between objects and correlation between shape and spatial variation, an advantage of multi-object models in prediction compared to single-structure SSMs. Furthermore, the DMOGPM also preserves an optimal representation of 3D motion using the EDR approach, as illustrated on the non-compact lollipop dataset. We also applied our framework to human shoulder data to address one of the challenges in medical image processing, namely prediction from observations. The DMOGPM demonstrated the ability to predict one of the shoulder bones from the observation of the other. The prediction would be more accurate if more observations (partial or full) were available; for example, estimating the humerus from observations of the clavicle and the scapula would be better than from the observation of the scapula only. This could be of interest in surgical settings such as shoulder arthroplasty, especially when one of the joint bones is partially or completely missing (due to trauma).
The challenge of automatically assessing pose variation in medical image analysis can stem from the inability to standardize pose during image acquisition. This is true both for multiple images of the same individual and across different individuals. We evaluated our model in this scenario by predicting a shoulder joint with the humerus at different positions relative to the scapula, around a predetermined joint centre, simulating abduction. The results suggest that the DMOGPM can be incorporated into automatic segmentation pipelines for such scenarios. Future work would involve extending the DMOGPM with an intensity model [33]. This would allow the model to be used for automatic segmentation of images (MRI and CT scans), especially dynamic MRI, with due consideration of pose variations. Furthermore, the DMOGPM could be used to optimize surgery; for example, by exploiting the anatomical correlation between joint objects, accurate joint segmentation could be achieved for the design of better joint implants. Additionally, parametric optimization could be investigated, which is computationally less expensive than MCMC and could improve the clinical utility of the models. Moreover, the proposed framework could be used to answer broader biological questions.
Gaussian kernel with scale and variance parameters
normal distribution with zero mean and unit covariance
object of an example in the training data
number of training examples
number of objects constituting a training example
rigid transformation associated with a shape sample from the DMOGPM
space of similarity transformations
concatenated vector of the shape and pose features of an example
combined eigenvector of shape and pose for SSPMs
population mean
population covariance
Gaussian random vector
shape sample from SSPMs
pose sample from SSPMs
deformation field associated with an example
reference shape domain
reference pose domain
reference shape
eigenvalue of the GPMMs
eigenvector of the GPMMs
object of the fixed object family
reference of the fixed object family
multi-object deformation field
object of a training example
rigid transformation between the reference and the object
rigid transformation between the reference and the object
registered object corresponding to the object
multi-object concatenated vector of shape and spatial position
spatial position displacement field of an example
spatial position component
shape component
energy displacement representation
reference associated with the multi-object deformation field of an example
DMOGPM mean
shape component of the mean
pose component of the mean
target multi-object example to be predicted
shape parameter of the synthetic object
shape parameter of the synthetic object
Euler angle about the X-axis
Euler angle about the Y-axis
Euler angle about the Z-axis
DMOGPM
reference of an object family
coordinate vector of the points of an object
rigid transformation between the reference and the object
The authors would like to thank Marcel Lüthi (Department of Mathematics and Computer Science, University of Basel, Basel, Switzerland) for invaluable input and discussion in preparation of this study.
Toshev, A. and Szegedy, C., 2014. Deeppose: Human pose estimation via deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 16531660).
Rasmussen, C.E., 2003, February. Gaussian processes in machine learning. In Summer School on Machine Learning (pp. 6371). Springer, Berlin, Heidelberg.