Automatic image segmentation is an important enabling technology in most medical imaging applications that involve soft tissue, e.g., neurology, cardiology, and oncology. In particular, a preoperative anatomical representation of the left atrium (LA) is important for ablation guidance, fibrosis quantification, and biophysical modeling in atrial fibrillation patients [1, 2, 3, 4]. Difficult segmentation problems typically have two aspects. First, image features such as intensity and texture are noisy and unreliable. Second, anatomical boundaries are ill-defined, often in the presence of low contrast, background clutter, and partial volumes, while irregular shapes with high variability limit the ability to find invariant features. Nonetheless, if one operates in a context with expectations of particular classes of anatomies (e.g., the LA), such challenges can be addressed by means of shape prior information that guides and constrains the segmentation process, motivating a Bayesian formulation. In the LA segmentation context, shape-driven methods have been found to be the most appropriate for addressing the inherent challenges: a thin myocardial wall, surrounding anatomical structures with similar image intensities, and topological variants pertaining to pulmonary vein arrangements. In this paper, we propose a Bayesian surface-based segmentation framework built on a mixture-based global shape prior along with a feature-based local intensity prior.
Traditionally, when dealing with low-quality image segmentation, statistical shape information has proved helpful in delineating correct object boundaries (e.g., [7, 8]). For example, active shape models and their variants incorporate over-restrictive shape constraints in the form of statistical shape models (SSM) by limiting the solution to a low-dimensional linear subspace defined via training shape exemplars. However, these approaches are limited in their ability to accommodate shapes that are not represented in a low-dimensional description; a typical situation arises in applications with both small- and large-scale shape variability. Further, these methods handle only unimodal, Gaussian-like shape densities. When it comes to modeling complex shape distributions, a single Gaussian cannot adequately model cases of nonlinear shape variation where the probability distribution is multimodal.
On the other end of the spectrum, multiatlas-based segmentation (MAS) approaches require a large database of atlases to capture a wide range of shape variation, where label maps are propagated to the test image through registration. In essence, MAS can be viewed as a nonparametric estimate of the prior probabilities that converges to the true densities as the number of atlases grows. To reduce the computational burden introduced by registration, atlas selection is usually performed to exclude irrelevant atlases that might misguide the segmentation process. More generally, kernel-based methods (e.g., [8, 13]) are popular strategies for dealing with complex distributions in explicitly or implicitly represented training data. Under mild assumptions, they converge to the true distribution in the limit of infinite sample size. For example, Cremers et al. used a kernel density estimate (KDE) to derive nonlinear shape priors based on a shape distance between implicitly embedded training shapes. Nonetheless, such methods are prone to overfitting due to small sample sizes in high-dimensional shape spaces, limiting the generality of the resulting models on unseen examples. Specifically, in multivariate density estimation, KDE requires a larger kernel width to accommodate exemplars as the dimension of a variable increases, which eventually results in model under-fit due to high bias. Further, for kernel-based methods, the dimension of the parameter space is proportional to the training sample size.
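To make the kernel-width trade-off concrete, a KDE shape prior over flattened landmark vectors can be sketched as follows. This is an illustrative NumPy sketch using a plain Euclidean shape distance, rather than the level-set-based distance of Cremers et al.; all variable names are ours.

```python
import numpy as np

def kde_shape_prior(shape, train_shapes, h):
    """Gaussian-kernel density estimate of a shape prior.

    Each shape is a flattened landmark vector. The kernel width h
    must grow with the dimension to keep the estimate from collapsing
    onto the exemplars, which is the bias/variance trade-off
    discussed in the text.
    """
    d = train_shapes.shape[1]
    sq_dists = np.sum((train_shapes - shape) ** 2, axis=1)
    kernels = np.exp(-0.5 * sq_dists / h**2) / (np.sqrt(2 * np.pi) * h) ** d
    return kernels.mean()

# toy example: 5 training "shapes" in a 6-D landmark space
rng = np.random.default_rng(0)
train = rng.normal(size=(5, 6))
p_near = kde_shape_prior(train[0], train, h=1.0)        # near a training shape
p_far = kde_shape_prior(train[0] + 10.0, train, h=1.0)  # far from all shapes
```

Note that the model stores all five exemplars: the parameter count grows linearly with the training sample size, as stated above.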
A finite mixture of Gaussians can approximate a full kernel density and capture nonlinearities in the distribution, or subpopulations in the underlying shape subspace, where the expectation-maximization (EM) framework is usually used to find the maximum likelihood estimate of the mixture parameters. However, modeling the shape distribution after projection onto a low-dimensional subspace (e.g., [14, 16]) will often collapse or mix the subpopulations, which derails learning of the mixture structure of the underlying shape space. Nonlinearity of shape statistics can also be modeled by lifting training shapes to a higher, possibly infinite-dimensional feature (a.k.a. kernel) space where the shape distribution is assumed to be Gaussian, yet this approach results in an infinite-dimensional optimization scheme while sacrificing the efficiency of optimizing in low-dimensional subspaces. Further, one can obtain only an approximate solution for the reverse mapping from feature space to shape space.
The proposed Bayesian formulation relies on maximum-a-posteriori estimation in a generative statistical model that incorporates a global, nonlinear shape prior, modeled as a mixture of Gaussian components, and a local, nonlinear intensity prior in the form of image features automatically learned via deep autoencoders. The Gaussian mixture model (GMM) takes into account the linear subspace spanned by each mixture component to avoid the classical problem of model overfitting in high-dimensional spaces, and can adequately model nonlinear shape variations. To use these shape priors in segmentation, we treat the identity of the mixture component as a latent variable and marginalize over it within a generalized expectation-maximization (GEM) framework. We present a conditional-maximization scheme that alternates between a closed-form solution for component-specific shape parameters, providing a global update-based optimization strategy, and an intensity-based energy minimization that translates the global notion of a nonlinear shape prior into a set of local penalties. Preliminary results show improved accuracy over traditional shape-based approaches.
Given an image $\mathcal{I} = \{I(\mathbf{x}_i)\}_{i=1}^{N}$, where $N$ is the total number of voxels and $I(\mathbf{x}_i)$ is the intensity value of the voxel located at $\mathbf{x}_i$, the Bayesian formulation of the segmentation problem amounts to finding the optimal surface $\mathcal{S}$ that maximizes the log-posterior probability $\log p(\mathcal{S} \mid \mathcal{I})$, where a shape's surface is represented by a dense set of geometrically consistent points (landmarks). In order to obtain segmentations that preserve the global shape characteristics of the shape population of interest, the segmentation process is influenced by the prior, in the form of the shape probability distribution. We model this prior distribution as a finite mixture of Gaussians, parameterized by the mixture component identity $z \in \{1, \dots, K\}$, which is treated as a latent variable, and the model parameters $\Theta_k$ for the $K$ components. Hence, given the latent variable $z$, the log-posterior can be written as
$$\log p(\mathcal{S} \mid \mathcal{I}) = \log p(\mathcal{S}, z \mid \mathcal{I}) - \log p(z \mid \mathcal{I}, \mathcal{S}), \qquad (1)$$
where $p(z \mid \mathcal{I}, \mathcal{S})$ defines the conditional probability of the latent variable given the input image $\mathcal{I}$ and the estimated surface $\mathcal{S}$.
2.1 Generalized EM for shape-driven segmentation
The use of the log-posterior allows us to marginalize over the latent variable $z$. Using generalized expectation-maximization (GEM) [19, p. 318], we iterate to find the mode of the marginal posterior by averaging over $z$. GEM starts with an initial estimate of the surface and iteratively refines it by marginalizing over $z$. Within each iteration, the posterior is guaranteed to increase, i.e., to converge to a local maximum. Given the current surface guess $\mathcal{S}^{(t)}$, GEM consists of two steps: (1) the E-step, which computes the latent conditional probability distribution $p(z \mid \mathcal{I}, \mathcal{S}^{(t)})$, and (2) the M-step, which finds a new surface $\mathcal{S}^{(t+1)}$ such that $p(\mathcal{S}^{(t+1)} \mid \mathcal{I}) \geq p(\mathcal{S}^{(t)} \mid \mathcal{I})$. The details of the algorithm are as follows.
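For concreteness, the component responsibilities of the E-step can be sketched as in the following NumPy toy example (full covariances, our variable names; the intensity likelihood cancels because it is assumed conditionally independent of the component identity $z$):

```python
import numpy as np

def e_step(surface, means, covs, weights):
    """E-step of GEM: responsibilities p(z = k | S^(t)) for each mixture
    component. The intensity term drops out of the ratio because it is
    assumed conditionally independent of the shape-generating component."""
    log_post = np.empty(len(weights))
    for k, (mu, cov, w) in enumerate(zip(means, covs, weights)):
        diff = surface - mu
        _, logdet = np.linalg.slogdet(cov)
        log_post[k] = np.log(w) - 0.5 * (logdet + diff @ np.linalg.solve(cov, diff))
    log_post -= log_post.max()          # stabilize before exponentiating
    resp = np.exp(log_post)
    return resp / resp.sum()

# toy 2-component example in a 3-D "shape space"
means = [np.zeros(3), np.full(3, 5.0)]
covs = [np.eye(3), np.eye(3)]
weights = [0.5, 0.5]
resp = e_step(np.array([0.1, -0.2, 0.0]), means, covs, weights)
```

A full GEM iteration would alternate this step with a surface update (the M-step described next).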
E-step: The conditional probability of the latent variable given the input image and the current surface can be computed as
$$p(z = k \mid \mathcal{I}, \mathcal{S}^{(t)}) = \frac{p(\mathcal{S}^{(t)} \mid z = k)\, p(z = k)}{\sum_{k'} p(\mathcal{S}^{(t)} \mid z = k')\, p(z = k')}, \qquad (2)$$
where the intensity model in the vicinity of the current surface guess is assumed to be conditionally independent of the shape-generating component, leaving the E-step fully shape-driven. Note the effect of the prior probability $p(z)$ on the expectation step: the conditional component distribution favors mixture components with higher support/proportion, and the subsequent M-step pulls the new surface towards the most probable mixture component(s) that generated the shape being segmented, as judged by the local intensity and global shape priors.
M-step: Taking expectations on both sides of (1) while treating $z$ as a random variable with the distribution $p(z \mid \mathcal{I}, \mathcal{S}^{(t)})$ yields
$$\log p(\mathcal{S} \mid \mathcal{I}) = \mathbb{E}_{z}\!\left[\log p(\mathcal{S}, z \mid \mathcal{I})\right] - \mathbb{E}_{z}\!\left[\log p(z \mid \mathcal{I}, \mathcal{S})\right], \qquad (3)$$
where the left side of (1) does not depend on $z$. The second term in (3) is maximized when $\mathcal{S} = \mathcal{S}^{(t)}$ (the key result of EM). Hence, it is sufficient for the new surface estimate $\mathcal{S}^{(t+1)}$ to maximize the first term of (3), which is the expected complete-data log-likelihood.
The intensity term gives rise to an intensity model in the vicinity of an estimated surface that resembles the $k$th component. Intensity features (learned via a deep autoencoder) are assumed to follow a normal distribution in the learned feature space, with parameters $(\boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)$ estimated from the training data, where $\boldsymbol{\mu}_j$ is the mean intensity feature at the $j$th point and $\boldsymbol{\Sigma}_j$ is the corresponding covariance matrix. Notice that $\mathcal{S}^{(t+1)}$ increases the first term of (3), since it is obtained by maximizing that term. In addition, it decreases the second term of (3), because that term is maximized only when $\mathcal{S}^{(t+1)} = \mathcal{S}^{(t)}$. Hence, if $\mathcal{S}^{(t+1)} = \mathcal{S}^{(t)}$, then the GEM method has converged to a local maximum of the posterior $p(\mathcal{S} \mid \mathcal{I})$.
2.2 Mixture modeling of global shape priors
The combination of a high-dimensional shape space and a small number of example shapes usually makes modeling of shape priors subject to the curse of dimensionality, which hinders the effectiveness of mixture learning and density estimation in high-dimensional spaces. To avoid overfitting, we parameterize the covariance matrix of each component through its eigendecomposition, in a manner similar to high-dimensional data clustering. Hence we can learn the Gaussian mixture model taking into consideration the dominant linear subspace spanned by each mixture component, where the noise variance is assumed to be isotropic and contained in a subspace orthogonal to the component's subspace.
We first model a dense set of homologous landmarks via particle-based modeling. Assume that the anatomy-specific shape space of all shapes defined using the landmark representation comprises $K$ mixture components. Each component is parameterized by $\Theta_k = \{\pi_k, \boldsymbol{\mu}_k, d_k, \boldsymbol{\Lambda}_k, \mathbf{W}_k, \sigma_k\}$ and learned via high-dimensional EM, where $\pi_k$ is the component probability/proportion, $\boldsymbol{\mu}_k$ is the mean vector, $d_k$ is the intrinsic dimension of the $k$th mixture component, i.e., the number of dominant modes of shape variation, $\boldsymbol{\Lambda}_k$ is a diagonal matrix containing the $d_k$ largest eigenvalues of the component's full covariance matrix $\boldsymbol{\Sigma}_k$, $\mathbf{W}_k$ is the orthonormal matrix containing the corresponding eigenvectors, and $\sigma_k$ is the standard deviation of the off-subspace noise. Hence the component-specific subspace can be defined as follows:
$$\mathcal{S} = \boldsymbol{\mu}_k + \mathbf{W}_k \boldsymbol{\beta}_k,$$
where $\boldsymbol{\beta}_k$ encodes the shape parameters w.r.t. the $k$th component subspace. The shape probability distribution is defined as a weighted sum of component-conditional probabilities using the mixture weights $\pi_k$. Notice that the component-wise model beyond the first $d_k$ modes consists of an isotropic variance in the directions orthogonal to the subspace, which has the effect of penalizing pointwise differences from the learned, $d_k$-dimensional subspace. Hence, the surface probability conditional on the $k$th mixture component becomes a product of two marginal and independent Gaussians: within-subspace and off-subspace.
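The factored component-conditional density can be sketched as follows; this is an illustrative NumPy implementation of the within-/off-subspace decomposition, with symbols matching the parameterization above:

```python
import numpy as np

def component_log_density(s, mu, W, lam, sigma):
    """log p(s | z = k), factored into a Gaussian over the d leading
    modes (within-subspace) and an isotropic Gaussian over the
    orthogonal residual (off-subspace).
    W: (D, d) orthonormal eigenvector matrix; lam: (d,) leading
    eigenvalues; sigma: off-subspace noise standard deviation."""
    D, d = W.shape
    r = s - mu
    b = W.T @ r                       # within-subspace shape parameters
    r_perp = r - W @ b                # residual orthogonal to the subspace
    log_in = -0.5 * (b**2 / lam + np.log(2 * np.pi * lam)).sum()
    log_off = -0.5 * ((r_perp**2).sum() / sigma**2
                      + (D - d) * np.log(2 * np.pi * sigma**2))
    return log_in + log_off
```

The factored form never builds the $D \times D$ covariance, which is the point of the component-specific subspace parameterization: it agrees with the full Gaussian whose covariance is $\mathbf{W}\boldsymbol{\Lambda}\mathbf{W}^\top + \sigma^2(\mathbf{I} - \mathbf{W}\mathbf{W}^\top)$.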
We illustrate this training step in Fig. 1, which shows a low-dimensional subspace embedding of the training samples, colored according to their class membership in a three-component mixture model, along with example LA segmentations from each component. One can observe key differences between the components, such as the number and shape of the pulmonary veins.
2.3 Autoencoding local intensity priors
Similar to the active shape model (ASM) approach, we model local intensity profiles at the object surface, consisting of the normalized first derivative of image intensities oriented along the approximate normal to the surface and centered at each landmark location, with a radius of $r$ voxels and a total length of $2r+1$ (Fig. 2). Corresponding profiles (i.e., those that share the same landmark) are combined across all training images but are treated independently of one another. The traditional segmentation approach uses the Mahalanobis distance of candidate profiles to the training data as a cost function to be minimized; however, it assumes the feature space to be first-order linear and normally distributed. To overcome this limitation, we employ deep autoencoders to adaptively learn higher-order nonlinear features. In this case, we use a two-layer sparse autoencoder with 10 hidden units to generate pseudolinear representations of the feature space for each landmark. Due to the computationally expensive nature of this step, we determined these parameters through a qualitative empirical analysis, but there is potential for improvement via more rigorous cross-validation. To improve the statistics and capture lateral intensity changes, we append additional profiles that are parallel to the main profile, offset by a subvoxel distance, with interpolated intensity values, to form “thick” profiles. We repeat this for multiple image resolutions and store the autoencoder as well as the encoded output.
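A minimal sketch of the profile construction described above, assuming a unit surface normal and an interpolating intensity sampler; the L1 normalization of the derivative profile is standard ASM practice and is assumed here:

```python
import numpy as np

def intensity_profile(sample, center, normal, r):
    """Normalized first-derivative intensity profile of length 2r+1,
    sampled along the (unit) surface normal at a landmark.
    `sample(p)` returns the interpolated image intensity at point p."""
    offsets = np.arange(-r, r + 1)
    vals = np.array([sample(center + float(t) * normal) for t in offsets])
    grad = np.gradient(vals)            # first derivative along the profile
    scale = np.abs(grad).sum()          # ASM-style L1 normalization
    return grad / scale if scale > 0 else grad
```

The “thick” variant described in the text would concatenate the outputs of this function for several parallel, subvoxel-offset centers before encoding.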
2.4 Segmentation via conditional maximization
The marginal posterior optimization problem in (4) involves two types of latent variables: the surface $\mathcal{S}$ to be estimated and the component-specific shape parameters $\boldsymbol{\beta}_k$. To seek an optimal solution, we propose a conditional-maximization scheme [19, p. 312] that alternates between two optimization phases: in the first phase, we optimize for $\mathcal{S}$ given $\boldsymbol{\beta}_k$, and in the second phase we optimize for $\boldsymbol{\beta}_k$ given the current surface. Starting with the coarsest image resolution, we take the mean landmark positions from the training data as the initial model and iterate through the following steps.
Localized feature-based optimization: For each landmark position, we generate a series of overlapping candidate profiles of dimension $2r+1$ that extend above and below the original profile by a specified search length $\ell$. For each candidate profile, we generate features using the pre-trained autoencoders. We then take the Mahalanobis distance of each candidate profile's features to the encoded training features and update the landmark position to the center of the optimal profile, i.e., the one with the smallest distance. This step ensures that landmarks are locally optimal, but they may no longer be globally optimal, as no shape information is encoded in these features.
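The candidate-profile search can be sketched as follows. The `encode` callable stands in for the trained autoencoder's encoder and is hypothetical; the feature mean and inverse covariance are assumed precomputed from the encoded training profiles:

```python
import numpy as np

def best_shift(candidates, encode, f_mean, f_cov_inv):
    """Slide over candidate profiles (centred at shifts -L..+L along the
    normal) and return the signed shift whose encoded features have the
    smallest Mahalanobis distance to the training feature statistics."""
    dists = []
    for prof in candidates:
        diff = encode(prof) - f_mean
        dists.append(diff @ f_cov_inv @ diff)
    L = (len(candidates) - 1) // 2
    return int(np.argmin(dists)) - L   # signed landmark displacement
```

The landmark is then moved by the returned (signed) number of steps along its normal.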
Component-specific shape parameter optimization: For a given surface (from the intensity-based step), the component-specific shape parameters can be obtained by projecting the surface onto the component-specific subspace, leading to a closed-form solution. By limiting the model parameters to what is considered a normal contour with respect to shape, the landmark positions become globally optimal. We repeat this alternation a predetermined number of times at each resolution, from coarse to fine.
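A sketch of this closed-form update; the ±3σ clamp on each mode is the usual ASM plausibility bound, assumed here rather than specified in the text:

```python
import numpy as np

def update_shape_parameters(s, mu, W, lam, c=3.0):
    """Project a candidate surface onto a component's subspace (the
    closed-form shape-parameter solution) and clamp each mode to
    +/- c standard deviations, keeping the result a plausible shape
    with respect to that component."""
    b = W.T @ (s - mu)                              # closed-form projection
    b = np.clip(b, -c * np.sqrt(lam), c * np.sqrt(lam))
    return mu + W @ b, b                            # constrained surface, parameters
```

Because the columns of `W` are orthonormal, the projection is exact when the surface already lies in the subspace within the clamp bounds.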
We tested these methods on 100 3D MRI images, separated into 80 for training and 20 for testing. Fig. 3 shows a sample MRI image with the provided segmentation; the segmented volume is also displayed separately in blue along with the modeled landmark positions. We first build a shape model of 2048 homologous landmark positions modeled with a three-component Gaussian mixture model. We then model local intensity gradients for each position across all samples using an autoencoder on profiles 11 voxels in length (optimized through cross-validation). To improve model statistics, we encode additional parallel information to create “thick” profiles. This model was then used to segment the 20 test images according to the methods described above. To assess the effectiveness of our method, we evaluate results both with and without autoencoded local intensity profiles, as well as with and without the mixture model. Results are in Table 1. While there appears to be some improvement in the median Dice coefficient, this may not be the best metric for evaluating the proposed method. We also evaluated the Euclidean distance between the final landmark positions from our segmentations and those of the ground truth, which allows a more detailed point-by-point analysis of error, as shown in Fig. 4. Here we see two examples from the test data set showing clear improvement in segmentation results when using autoencoded profiles. We truncate the color scale at 10 (1 cm) and flag any higher error as an extreme outlier. Error is generally limited to image edges around the pulmonary veins when using autoencoded intensity profiles, as opposed to nearly universally high error without them. These results could be further improved by initializing the model with the component mean shapes rather than the global mean used at present.
| Median Dice Coefficient | 0.743 | 0.747 | 0.729 | 0.739 |
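The two evaluation measures reported above can be computed as in this short sketch:

```python
import numpy as np

def dice(a, b):
    """Dice overlap between two binary segmentation masks."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def landmark_errors(pred, gt):
    """Per-landmark Euclidean distances between (n, 3) point sets,
    as used for the point-by-point error maps in Fig. 4."""
    return np.linalg.norm(np.asarray(pred) - np.asarray(gt), axis=1)
```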
This paper proposed a shape- and feature-based generative model for left atrium segmentation, modeling non-Gaussian global shape priors as a mixture of Gaussians on landmark-based representations of the training data and learning feature-based representations for local intensity priors. The mixture parameters were learned in the high-dimensional shape space by taking into account component-specific subspaces to avoid overfitting. The method used a variant of the ASM-based segmentation framework that relies on maximum-a-posteriori estimation with a marginalization over class membership within a generalized EM framework. While these results are preliminary, they suggest that ASMs for LA segmentation can be improved through optimized prior modeling: local optimization through autoencoding of nonlinear intensity features and global optimization through mixture modeling of shape parameters increase the accuracy of the original method.
Acknowledgment: This work was supported by the National Institutes of Health [grant numbers R01-HL135568-01 and P41-GM103545-19].
-  Calkins, H., Hindricks, G., Cappato, R., Kim, Y.H., Saad, E.B., Aguinaga, L., Akar, J.G., Badhwar, V., Brugada, J., Camm, J., et al.: 2017 HRS/EHRA/ECAS/APHRS/SOLAECE expert consensus statement on catheter and surgical ablation of atrial fibrillation. Heart Rhythm 14(10), e275–e444 (2017)
-  McGann, C., Akoum, N.W., Patel, A.G., Kholmovski, E.G., Revelo, P., Damal, K., Wilson, B., Cates, J., Harrison, A., Ranjan, R., Burgon, N., Greene, T., Kim, D., Dibella, E.V.R., Parker, D., MacLeod, R.S., Marrouche, N.F.: Atrial fibrillation ablation outcome is predicted by left atrial remodeling on MRI. Circulation: Arrhythmia and Electrophysiology 7(1), 23–30 (2014)
-  Hansen, B.J., Zhao, J., Csepe, T.A., Moore, B.T., Li, N., Jayne, L.A., Kalyanasundaram, A., Lim, P., Bratasz, A., Powell, K.A., Simonetti, O.P., Higgins, R.S., Kilic, A., Mohler, P.J., Janssen, P.M., Weiss, R., Hummel, J.D., Fedorov, V.V.: Atrial fibrillation driven by micro-anatomic intramural re-entry revealed by simultaneous sub-epicardial and sub-endocardial optical mapping in explanted human hearts. European Heart Journal 36(35), 2390–2401 (2015). https://doi.org/10.1093/eurheartj/ehv233, http://dx.doi.org/10.1093/eurheartj/ehv233
-  Zhao, J., Hansen, B.J., Wang, Y., Csepe, T.A., Sul, L.V., Tang, A., Yuan, Y., Li, N., Bratasz, A., Powell, K.A., Kilic, A., Mohler, P.J., Janssen, P.M.L., Weiss, R., Simonetti, O.P., Hummel, J.D., Fedorov, V.V.: Three-dimensional integrated functional, structural, and computational mapping to define the structural “fingerprints” of heart-specific atrial fibrillation drivers in human heart ex vivo. Journal of the American Heart Association 6(8) (2017). https://doi.org/10.1161/JAHA.117.005922, https://www.ahajournals.org/doi/10.1161/JAHA.117.005922
-  Tobon-Gomez, C., Geers, A.J., Peters, J., Weese, J., Pinto, K., Karim, R., Ammar, M., Daoudi, A., Margeta, J., Sandoval, Z., et al.: Benchmark for algorithms segmenting the left atrium from 3D CT and MRI datasets. IEEE Transactions on Medical Imaging 34(7), 1460–1473 (2015)
-  Ho, S.Y., Cabrera, J.A., Sanchez-Quintana, D.: Left atrial anatomy revisited. Circulation: Arrhythmia and Electrophysiology 5(1), 220–228 (2012)
-  Rousson, M., Paragios, N.: Prior knowledge, level set representations & visual grouping. International Journal of Computer Vision 76(3), 231–243 (2008)
-  Cremers, D., Osher, S.J., Soatto, S.: Kernel density estimation and intrinsic alignment for shape priors in level set segmentation. International Journal of Computer Vision 69(3), 335–351 (2006)
-  Cootes, T., Taylor, C., Cooper, D., Graham, J.: Active shape models-their training and application. Computer Vision and Image Understanding 61(1), 38 – 59 (1995). https://doi.org/10.1006/cviu.1995.1004
-  Cremers, D., Kohlberger, T., Schnörr, C.: Shape statistics in kernel space for variational image segmentation. Pattern Recognition 36(9), 1929–1943 (2003)
-  Awate, S., Whitaker, R.: Multiatlas segmentation as nonparametric regression. IEEE Transactions on Medical Imaging 33(9), 1803–1817 (Sept 2014)
-  Aljabar, P., Heckemann, R.A., Hammers, A., Hajnal, J.V., Rueckert, D.: Multi-atlas based segmentation of brain images: atlas selection and its effect on accuracy. Neuroimage 46(3), 726–738 (2009)
-  Wimmer, A., Soza, G., Hornegger, J.: A generic probabilistic active shape model for organ segmentation. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2009, pp. 26–33. Springer (2009)
-  Rousson, M., Cremers, D.: Efficient kernel density estimation of shape and intensity priors for level set segmentation. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2005, pp. 757–764. Springer (2005)
-  Friedman, J.H., Stuetzle, W., Schroeder, A.: Projection pursuit density estimation. Journal of the American Statistical Association 79(387), 599–608 (1984). https://doi.org/10.1080/01621459.1984.10478086
-  Cootes, T.F., Taylor, C.J.: A mixture model for representing shape variation. Image and Vision Computing 17(8), 567–573 (1999)
-  Dasgupta, S.: Learning mixtures of Gaussians. In: Foundations of Computer Science, 1999. 40th Annual Symposium on. pp. 634–644. IEEE (1999)
-  Bouveyron, C., Girard, S., Schmid, C.: High-dimensional data clustering. Computational Statistics & Data Analysis 52(1), 502–519 (2007)
-  Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B.: Bayesian Data Analysis. Chapman & Hall/CRC Texts in Statistical Science (2003)
-  Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.A.: Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research 11(Dec), 3371–3408 (2010)
-  Taha, A.A., Hanbury, A.: Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool. BMC Medical Imaging 15, 29 (Aug 2015). https://doi.org/10.1186/s12880-015-0068-x, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4533825/