Image Segmentation with Pseudo-marginal MCMC Sampling and Nonparametric Shape Priors

09/03/2018 ∙ by Ertunc Erdil, et al. ∙ THE UNIVERSITY OF UTAH Sabancı University 0

In this paper, we propose an efficient pseudo-marginal Markov chain Monte Carlo (MCMC) sampling approach to draw samples from posterior shape distributions for image segmentation. The computation time of the proposed approach is independent from the size of the training set used to learn the shape prior distribution nonparametrically. Therefore, it scales well for very large data sets. Our approach is able to characterize the posterior probability density in the space of shapes through its samples, and to return multiple solutions, potentially from different modes of a multimodal probability density, which would be encountered, e.g., in segmenting objects from multiple shape classes. Experimental results demonstrate the potential of the proposed approach.



There are no comments yet.


This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Incorporating prior shape density into the segmentation process has been widely studied in the literature kass1988snakes , van2002active , milborrow2008locating , de2003adapting , wang2002improved , heimann2009statistical . These methods can usually handle Gaussian-like, unimodal shape prior densities. Kim et al. kim2007nonparametric and Cremers et al. cremers2006kernel

incorporate nonparametric density estimation based shape priors into the segmentation process using level sets. Therefore, these methods and their variants can learn “multimodal" shape densities, which can be encountered in problems involving shape densities containing multiple classes of shapes

foulonneau2009multi , chen2009level , yang2013shape , souganli2014combining , cremers2007review , mesadi2017disjunctive , erdil2017nonparametric , 8051060 . These methods minimize an energy function and find a solution at a local optimum. This does not provide any measure of the degree of confidence/uncertainty in that result and any information about the characteristics of the posterior density.

There are a limited number of Markov chain Monte Carlo (MCMC) based image segmentation methods in the literature. Most of these methods generate samples from the posterior density by assuming the prior density is uniform fan2007mcmc , chang2011efficient , chang2012efficient . The only sampling-based segmentation approach in the literature that uses a shape prior is the one proposed by Chen et al. chen2009markov . However, the method cannot handle topological changes in shapes.

Our major contribution in this paper is a pseudo-marginal Markov chain Monte Carlo (MCMC) sampling-based image segmentation approach that exploits nonparametric shape priors. The proposed approach is able to characterize the posterior density through its samples. Our approach can learn from very large data sets efficiently by using pseudo-marginal sampling. To the best of our knowledge, this is the first approach that performs pseudo-marginal MCMC shape sampling-based image segmentation through an energy functional that uses nonparametric shape priors and level sets. Also, unlike other MCMC sampling-based segmentation approaches in the literature, the proposed approach perfectly satisfies the necessary conditions to implement MCMC sampling which is a crucial step for developing an MCMC sampler.

A precursor to this work was presented in erdil2016mcmc . This paper advances that prior work in two major ways: (1) while erdil2016mcmc approximately satisfies the necessary conditions of MCMC sampling, the approach presented in this paper perfectly satisfies these conditions; (2) unlike erdil2016mcmc we use pseudo-marginal sampling to be able to learn shape densities from very large data sets; the method in erdil2016mcmc becomes inefficient with large training data.

2 Model and problem definition

The image segmentation problem involves estimating an unknown segmenting curve for an object given an observed image where is the set of the values that the pixels of can take. We are particularly interested in problems in which the prior shape distribution has components corresponding to different object classes. We denote the class of the object by where is the total number of classes, which is known. For simplicity we assume that

has a uniform distribution over

so that .

We ultimately aim to estimate a binary segmenting curve where ’s indicate the background and ’s indicate the object. The conditional density of given is independent from and is denoted by . We construct this density based on the piecewise-constant version of the Mumford-Shah functional mumford1989optimal , chan2001active which is a very common data fidelity term for image segmentation. To present the curves, we use level sets, which we define as a mapping . Our choice of level sets is to handle topological changes and use gradient flows effectively in our methodology. We denote the level set of as .

We also have a training set of binary curves that are grouped into classes, , where each is the collection of segmented curves for class . Let us also define the level set representation of the training set as , where each with , the level set representation of . Now we can define the prior distribution for given the class as , where is a Gaussian density with mean and covariance matrix . This prior corresponds to a mixture of kernels with centers with kernel size kim2007nonparametric . To determine the kernel size , we use an ML kernel with leave-one-out silverman1986density . Let us also define as the pseudo-inverse111Note that is not invertible, so we define a pseudo-inverse. of such that . We use to rewrite the conditional density of the data in terms of as . We can also write the joint density of , , and as .

The Bayesian image segmentation problem can be formulated as finding the posterior distribution of given which is . However, estimating can be difficult since the summation over classes makes the distribution hard to infer, e.g., using Monte Carlo sampling methods. Alternatively, we aim for the joint posterior distribution of and given , , whose marginal is still the desired posterior .

3 Methodology

3.1 Metropolis-Hastings within Gibbs

We aim to sample from using Gibbs sampling by sampling from the conditional densities and in an alternating fashion geman1984stochastic , gelfand1990sampling . However, since the full conditional is hard to sample from, we update by using a Metropolis-Hastings (MH) move, which leads to the well known Metropolis-Hastings within Gibbs (MHwG) algorithm geweke2001bayesian .

Both conditional densities in MHwG involve which needs to be evaluated during MH updates. This can be too costly when

is large, which occurs when we have a big training set. Therefore, towards a more computationally efficient MCMC algorithm that scales with the training data size, we consider the following unbiased estimator of

via subsampling: where is a set of subsamples generated via sampling without replacement and . This approximation of the prior leads to the approximation of the conditional posterior densities and . Using this approximation does not generally guarantee that the Markov Chain has an equilibrium distribution that is exactly . To achieve a correct MH algorithm which uses the approximation and still targets , we adopt the pseudo-marginal MH algorithm of andrieu2009pseudo ; the details are described in the following section.

3.2 Pseudo-marginal MHwG

Assume that we have a non-negative random variable

such that given and , its conditional density given and , , satisfies . We choose as an approximation to , in particular where is defined in Section 3.1, and its probability density corresponds to the generation process of this approximation. (It will become clear that in fact we do not have to calculate at all but we should be able to sample from it.) Note that is a random variable since we generate randomly when computing . We can define the extended posterior density with the new variable added as . When we integrate out, we see that samples for and from will admit the desired posterior : . Now, the problem of generating samples from can be replaced with the problem of generating samples from .

We propose a pseudo-marginal MHwG sampling procedure to generate samples from . Note the important remark that this algorithm also targets , hence exactly. There are two major steps of the proposed approach which we explain in detail in the following: (1) condition the posterior on and update and , (2) update and by conditioning the posterior on .

Update step for the class and : The distribution from which we sample and in Metropolis-Hastings is . We can write this distribution as . Since we regard the proposal mechanism as a joint update of and , the proposal generates from the density . Note that denotes the candidate samples generated from the proposal distribution. We take as a uniform distribution and andrieu2009pseudo . Once and are sampled, they are either accepted with probability


or the current values of and are kept. The probabilities in the MH ratio in (1) can be computed exactly which guarantees satisfying the necessary conditions to implement MCMC sampling.

Update step for the level set and : The next step is to sample and from the conditional density . To achieve this, we perform Metropolis-Hastings sampling as we use for sampling and . The conditional density can be written as . Also, for joint sampling of candidates we can write the proposal density as where is sampled uniformly from , and is generated using subsampling. Then, the Metropolis-Hastings ratio can be computed as follows:


Design of the proposal distribution: The crucial part in (2) is designing the proposal distribution to generate a candidate curve from . Let us define an energy function whose gradient drives the curve to the desired location using the data and the training images in a particular class as . When the training data set is too large, calculating may be too expensive as discussed earlier. An unbiased estimator of would be obtained as where . The proposal distribution after sampling the training image in class is then constructed as . Here, the shift term is an approximation222Note that the term is a discrete approximation w.r.t. the level set representation chan2001active , so that the whole expression is an approximation to . to the gradient w.r.t.  and is given by where is the element-wise power operation.

In the design of , the most important part is selecting a covariance matrix, , that generates smooth perturbations since smoother curves are more likely kim2007nonparametric , fan2007mcmc . In the proposed approach, we compute a positive semi-definite covariance matrix such that it generates smooth random perturbations. Given an image, we first generate an matrix

by drawing i.i.d. samples from a unit Gaussian distribution. Then, we construct another

matrix where each column of is a smoothed version of each row in . By assuming is constructed by multiplying by a matrix , we can find the matrix as since is generally invertible. Given , a covariance matrix can be computed by . However, generally is not positive semi-definite since is not generally a full rank matrix. Therefore, we find the closest positive semi-definite matrix to using the approach in highamNearest which we take it as .

4 Experimental results

We compare running time of the proposed pseudo-marginal sampling approach and conventional sampling approach which uses all training examples to estimate the prior shape density. We perform experiments on the MNIST data set lecun1998gradient which contains 60,000 training examples. We construct training sets with various sizes from to by randomly selecting equal number of samples from each digit class. We generate samples using both the pseudo-marginal sampling and conventional sampling on a test image by using training sets with different sizes. The plot in Figure 1 shows average running time as a function of training set size for both pseudo-marginal sampling and conventional MCMC sampling. The average single sample generation time of the proposed approach does not change as the training set size increases since we choose in all experiments. We also measure the segmentation accuracy of all samples with the ground truth using Dice score dice1945measures . Average Dice score333Note that Dice score takes values in where indicates the perfect match with the ground truth. result for pseudo-marginal sampling is whereas it is for the conventional sampling. The very slight decrease in Dice results of pseudo-marginal sampling can be acceptable in many applications when the huge gain in running time is considered.

Figure 1: Average running time for producing a single sample as a function of training set size.
(a) Test image
(b) Kim et al. kim2007nonparametric
(c) The proposed approach
Figure 2: Marginal confidence bounds obtained by samples on a test image. Note that in (c) red indicates the least and blue indicates the most confidence boundaries.

We also run the proposed approach on the test image in Figure 2(a) and generate samples. The algorithm generates samples from digit classes and as shown in Figure 2(c). The optimization-based approach of Kim et al. kim2007nonparametric finds a single segmentation solution shown in Figure 2(b). This experiment shows that our algorithm can produce samples from different modes.

5 Conclusion

We propose a pseudo-marginal Markov chain Monte Carlo (MCMC) sampling-based image segmentation approach that exploits nonparametric shape priors. The proposed approach generates samples from the posterior distribution to avoid shortcomings of the optimization-based approaches which include getting stuck at local optima and being unable to characterize the posterior density. The proposed MCMC sampling approach deals with all these problem while being computationally efficient, unlike the conventional MCMC approaches, by using pseudo-marginal sampling principles. Moreover, our pseudo-marginal shape sampler perfectly satisfies the necessary conditions to implement MCMC sampling which is crucial for ensuring the generated samples come from the desired distribution. Existing methods in the literature only approximately satisfy these conditions.


  • (1) Andrieu, C., Roberts, G.O.: The pseudo-marginal approach for efficient Monte Carlo computations. The Annals of Statistics pp. 697–725 (2009)
  • (2) de Bruijne, M., van Ginneken, B., Viergever, M.A., Niessen, W.J.: Adapting active shape models for 3D segmentation of tubular structures in medical images. In: Information Processing in Medical Imaging, pp. 136–147. Springer (2003)
  • (3) Chan, T.F., Vese, L., et al.: Active contours without edges. IEEE transactions on Image processing 10(2), 266–277 (2001)
  • (4) Chang, J., Fisher, J.: Efficient MCMC sampling with implicit shape representations.

    In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2081–2088. IEEE (2011)

  • (5) Chang, J., Fisher III, J.W.: Efficient topology-controlled sampling of implicit shapes. In: Proceedings of the IEEE International Conference on Image Processing (ICIP) (2012)
  • (6) Chen, S., Radke, R.J.: Level set segmentation with both shape and intensity priors. In: International Conference on Computer Vision, pp. 763–770. IEEE (2009)
  • (7) Chen, S., Radke, R.J.: Markov chain Monte Carlo shape sampling using level sets. In: IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops), pp. 296–303. IEEE (2009)
  • (8)

    Cremers, D., Osher, S.J., Soatto, S.: Kernel density estimation and intrinsic alignment for shape priors in level set segmentation.

    International Journal of Computer Vision 69(3), 335–351 (2006)
  • (9) Cremers, D., Rousson, M., Deriche, R.: A review of statistical approaches to level set segmentation: integrating color, texture, motion and shape. International journal of computer vision 72(2), 195–215 (2007)
  • (10) Dice, L.R.: Measures of the amount of ecologic association between species. Ecology 26(3), 297–302 (1945)
  • (11) Erdil, E., Ghani, M.U., Rada, L., Argunsah, A.O., Unay, D., Tasdizen, T., Cetin, M.: Nonparametric joint shape and feature priors for image segmentation. IEEE Transactions on Image Processing (2017)
  • (12) Erdil, E., Yildirim, S., Cetin, M., Tasdizen, T.: MCMC shape sampling for image segmentation with nonparametric shape priors. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 411–419 (2016)
  • (13) Fan, A.C., Fisher III, J.W., Wells III, W.M., Levitt, J.J., Willsky, A.S.: MCMC curve sampling for image segmentation. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 477–485. Springer (2007)
  • (14) Foulonneau, A., Charbonnier, P., Heitz, F.: Multi-reference shape priors for active contours. International journal of computer vision 81(1), 68–81 (2009)
  • (15) Gelfand, A.E., Smith, A.F.: Sampling-based approaches to calculating marginal densities. Journal of the American statistical association 85(410), 398–409 (1990)
  • (16) Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on pattern analysis and machine intelligence (6), 721–741 (1984)
  • (17) Geweke, J., Tanizaki, H.: Bayesian estimation of state-space models using the Metropolis–Hastings algorithm within Gibbs sampling. Computational Statistics & Data Analysis 37(2), 151–170 (2001)
  • (18) Heimann, T., Meinzer, H.P.: Statistical shape models for 3D medical image segmentation: a review. Medical Image Analysis 13(4), 543–563 (2009)
  • (19) Higham, N.J.: Computing a nearest symmetric positive semidefinite matrix. Linear Algebra and its Applications 103, 103 – 118 (1988)
  • (20) Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active contour models. International Journal of Computer Vision 1(4), 321–331 (1988)
  • (21) Kim, J., Çetin, M., Willsky, A.S.: Nonparametric shape priors for active contour-based image segmentation. Signal Processing 87(12), 3021–3044 (2007)
  • (22) LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
  • (23) Mesadi, F., Cetin, M., Tasdizen, T.: Disjunctive normal parametric level set with application to image segmentation. IEEE Transactions on Image Processing 26(6), 2618–2631 (2017)
  • (24) Mesadi, F., Erdil, E., Cetin, M., Tasdizen, T.: Image segmentation using disjunctive normal Bayesian shape and appearance models. IEEE Transactions on Medical Imaging PP(99), 1–1 (2017)
  • (25) Milborrow, S., Nicolls, F.: Locating facial features with an extended active shape model. In: European Conference on Computer Vision (ECCV), pp. 504–513. Springer (2008)
  • (26) Mumford, D., Shah, J.: Optimal approximations by piecewise smooth functions and associated variational problems. Communications on pure and applied mathematics 42(5), 577–685 (1989)
  • (27) Silverman, B.W.: Density estimation for statistics and data analysis, vol. 26. CRC press (1986)
  • (28) Soğanlı, A., Uzunbaş, M.G., Çetin, M.: Combining learning-based intensity distributions with nonparametric shape priors for image segmentation. Signal, Image and Video Processing 8(4), 789–798 (2014)
  • (29) Van Ginneken, B., Frangi, A.F., Staal, J.J., Romeny, B.M., Viergever, M.A.: Active shape model segmentation with optimal features. IEEE Transactions on Medical Imaging 21(8), 924–933 (2002)
  • (30) Wang, W., Shan, S., Gao, W., Cao, B., Yin, B.: An improved active shape model for face alignment. In: IEEE International Conference on Multimodal Interfaces, p. 523 (2002)
  • (31) Yang, R., Mirmehdi, M., Xie, X., Hall, D.: Shape and appearance priors for level set-based left ventricle segmentation. IET Computer Vision 7(3), 170–183 (2013)