the MAP estimate). Image segmentation, particularly when the signal-to-noise ratio (SNR) is low, is one such task. However, because the space of shapes is infinitely large, direct inference or sampling is often difficult, if not infeasible. In these cases, Markov chain Monte Carlo (MCMC) sampling approaches can be used to compute empirical estimates of marginal probabilities. Recently, Chang and Fisher derived an efficient MCMC method for sampling from distributions over implicit shapes (i.e., level sets). We improve upon that algorithm in two ways: first, we improve the convergence rate by defining a Gibbs-like iteration in which every sample is accepted; second, we demonstrate how to efficiently incorporate both local and global topological constraints on sampled shapes.
Many approaches formulate image segmentation as an energy optimization. One can derive a related Bayesian inference procedure by viewing the energy functional as the log of a probability function,
$$p(L \mid I) \propto \exp\bigl(\pm E(L; I)\bigr),$$
where $L$ is the labeling associated with some segmentation, $I$ is the observed image, and the sign in the exponent depends on whether one is maximizing or minimizing. In PDE-based level-set methods, the labeling is determined by the sign of the level-set function, $\varphi$. In graph-based segmentation algorithms, such as ST-Mincut or Normalized Cuts, $L$ is the label assignment. Often, the energy functional decomposes into a data fidelity term and a regularization of the segmentation (Normalized Cuts being an exception). Bayesian formulations treat the former as the data likelihood and the latter as a prior on segmentations:
$$p(L \mid I) \propto p(I \mid L)\, p(L).$$
While both the data likelihood and the prior are user-defined, the form of the prior varies considerably with the optimization method: a curve-length penalty is typical in level-set methods, while a neighborhood affinity is typical in graph-based methods. We refer to these two forms as the PDE-based energy and the graph-based energy, respectively.
Another form of prior information incorporates the topology of the segmented object. Han et al. first showed this using topology-preserving level-set methods, where the topology of the segmented object was not allowed to change. This methodology was later extended by Segonne to topology-controlled level-set methods, where the topology was allowed to change, but only within an allowable set of topologies. To our knowledge, topology-constrained sampling methods have not previously been considered.
Current methods for sampling implicit shapes include the curve sampling of Fan et al., the level-set sampling of Chen and Radke, and the BFPS algorithm of Chang and Fisher, all of which use Metropolis-Hastings MCMC sampling procedures. As shown in the latter work, the first two methods converge very slowly and cannot accommodate any topological changes. BFPS, however, acts directly on the level-set function: it generates proposals by sampling a sparse set of delta functions and smoothing them with an FIR filter, with delta locations and heights biased by the gradient of the energy to increase the Hastings ratio. Empirical results demonstrated that BFPS was orders of magnitude faster than the earlier methods and, unlike them, its representation could change topology. Here, we show how to efficiently sample from a distribution over segmentations under both the PDE-based and graph-based energies. In contrast to previous MCMC samplers, proposals at each iteration are accepted with certainty, achieving an order-of-magnitude speedup in convergence. Additionally, we incorporate topological control so as to exploit prior knowledge of the shape topology.
2 Metropolis-Hastings MCMC Sampling
We begin with a brief discussion of two MCMC sampling algorithms. For notational convenience, we drop the dependence on the observed image, $I$, in distributions. We denote the target distribution, i.e., the distribution from which we would like to sample, by $\pi$. MCMC methods are applicable when one can compute a value proportional to the target distribution but cannot sample from it directly. Distributions defined over the infinite-dimensional space of shapes fall into this category. MCMC methods address this problem by constructing a first-order Markov chain whose stationary distribution is exactly the target distribution. For this condition to hold, it is sufficient to show that the chain is ergodic and satisfies detailed balance. The Metropolis-Hastings sampler (MH-MCMC) is one such approach. At time $t$ in the chain, the algorithm samples from some user-defined proposal distribution, $q(\hat{L} \mid L^{(t)})$, and assigns the transition probability
$$\Pr\bigl(L^{(t+1)} = \hat{L}\bigr) = \min(1, r), \qquad r = \frac{\pi(\hat{L})\, q(L^{(t)} \mid \hat{L})}{\pi(L^{(t)})\, q(\hat{L} \mid L^{(t)})},$$
where $r$ is the Hastings ratio, $L^{(t)}$ is the current labeling, and $\hat{L}$ is the proposed labeling. As this transition probability satisfies detailed balance, the correct stationary distribution is ensured as long as the chain is ergodic. A single sample from the target distribution is then generated by repeating this proposal/transition step until the chain converges.
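The proposal/transition step above can be sketched generically. The following is a minimal illustration in Python, using a toy one-dimensional target rather than a distribution over shapes:

```python
import numpy as np

def mh_step(x, log_target, propose, log_q, rng):
    """One Metropolis-Hastings transition.

    log_target: unnormalized log-probability of the target distribution.
    propose:    draws a candidate given the current state.
    log_q:      log proposal density, log q(b | a).
    """
    x_new = propose(x, rng)
    # Hastings ratio in the log domain:
    #   r = pi(x') q(x | x') / (pi(x) q(x' | x))
    log_r = (log_target(x_new) + log_q(x, x_new)
             - log_target(x) - log_q(x_new, x))
    if np.log(rng.random()) < min(0.0, log_r):
        return x_new   # accept
    return x           # reject: the chain repeats the current state
```

With a symmetric proposal (e.g., a Gaussian random walk), the two $q$ terms cancel and the rule reduces to the original Metropolis sampler.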
2.1 Gibbs Sampling
Gibbs sampling is a special case of MH-MCMC in which the acceptance ratio evaluates to one at every iteration. Gibbs proposals typically select a random dimension and sample from the target distribution conditioned on all other dimensions. Empirical evidence (shown in Figure 1) indicates that Gibbs sampling exhibits slow convergence when learning appearance models.
Block Gibbs sampling, in which a group of dimensions is sampled conditioned on all other dimensions, allows for larger moves. The proposed algorithm is related to this type of Gibbs sampling.
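For concreteness, a single-site Gibbs update for a binary labeling under a hypothetical Ising-style model (not the energy functional used in this paper) looks like the following:

```python
import numpy as np

def gibbs_site_step(labels, y, x, data_ll, beta, rng):
    """Resample one binary pixel from its exact conditional given all
    other pixels (illustrative Ising-style model).

    data_ll: per-pixel log-likelihood ratio favoring the foreground.
    beta:    smoothness weight on agreeing 4-neighbor edges.
    """
    h, w = labels.shape
    log_odds = data_ll[y, x]
    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ny, nx = y + dy, x + dx
        if 0 <= ny < h and 0 <= nx < w:
            log_odds += beta if labels[ny, nx] == 1 else -beta
    p_fg = 1.0 / (1.0 + np.exp(-log_odds))
    labels[y, x] = 1 if rng.random() < p_fg else 0
    return labels
```

Because the draw is exactly the conditional distribution, every such move is accepted; the slow convergence comes from moving only one dimension at a time.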
3 Gibbs-Inspired Metropolis-Hastings
BFPS achieves a high acceptance ratio at each iteration by biasing the proposal with the gradient of the energy functional. Here, we design a similar proposal such that samples are accepted with certainty. Given some previous level-set function, $\varphi$, we generate a proposal, $\hat{\varphi}$, with the following steps: (1) sample a mask, $m$, that selects a subset of pixels, and (2) add a constant value, $c$, to all pixels within this mask. We can express this as
$$\hat{\varphi} = \varphi + c\, m,$$
where the mask, $m$, is a set of indicator variables with the same size as $\varphi$. The support of the mask can be of any shape and size, though in practice we use circles of random center and radius. Deferring the choice of $q(c \mid m, \varphi)$, we write the proposal likelihood as
$$q(\hat{\varphi} \mid \varphi) = q(m)\, q(c \mid m, \varphi).$$
Figure 2 shows a notional mask overlaid on a level-set function, $\varphi$, with the heights of the pixels in the support of the mask plotted on the real line.
The dotted line marks the zero-height, which splits pixels into foreground and background. As all pixels in the support of the mask are summed with the same constant, $c$, choosing a $c$ is equivalent to shifting the zero-height by $-c$. The resulting proposed zero-height can fall into one of $N+1$ different ranges, where $N$ counts the pixels in the support of the mask. The random range, $r$, of the perturbation can take on integer values in $\{-N^{-}, \ldots, N^{+}\}$, where $N^{-}$ and $N^{+}$ count the pixels in the support of the mask that have negative and positive height, respectively.
Sampling a perturbation, $c$, can then be decomposed into sampling a range followed by a value within that range. Most energy functionals depend only on the signs of the pixels, which are conveniently determined solely by the range. If we choose
$$q(c \mid r) = \frac{1}{w_r}$$
(i.e., a uniform distribution), we can write the perturbation likelihood as
$$q(c) = q(r)\, q(c \mid r) = \frac{q(r)}{w_r},$$
where $w_r$ is the width of range $r$ (note that $r$ is deterministic conditioned on $c$). Because the value of $c$ within a range does not affect the sign of the resulting level-set function, the proposed labeling, $L_r$, can be expressed as a function of the range alone. By choosing the following Gibbs-like proposal for the ranges,
$$q(r) \propto \pi(L_r)\, w_r,$$
the Hastings ratio evaluates to one, and every proposal is accepted.
One subtlety exists in the selection of endpoints for the first and last ranges. Since these two ranges extend to infinity, their corresponding widths are also infinite. We therefore restrict both of these ranges to be of some finite length, $\gamma$. While any finite value will suffice, in practice we fix $\gamma$ to a constant. With this endpoint selection, another complication arises; consider the illustrative distributions for the zero-height line shown in Figure 3.
In these examples, some constant $\gamma$ is chosen, and the current zero-height line is marked with the dotted line. We can consider these two distributions as arising from two different mask selections. Figure 3(a) shows what we call a valid mask, in which the current zero-height line falls within the non-zero probability space. Figure 3(b), however, shows what we call an invalid mask, in which the current zero-height line falls outside the non-zero probability space. Here, one could propose a forward perturbation such that the resulting proposed zero-height line falls in the non-zero probability space; however, the probability of the backward transition that brings the proposed zero-height line back to the current one would be zero, because the current line lies in the zero-probability space. We note that if all pixels are initialized to a constant height of magnitude less than $\gamma$, then every possible (initial) mask will be valid. As the chain progresses, some masks will become invalid, in which case we only sample valid masks. Because any mask that contains both positive and negative pixels is valid, there are always valid masks to sample from.
3.1 Algorithm Description
The resulting algorithm, called the Gibbs-Inspired Metropolis-Hastings Shape Sampler (GIMH-SS), proceeds as follows: (1) sample a circular mask, $m$, with a random center and radius; (2) sample a range, $r$, according to the Gibbs-like proposal above; (3) sample a perturbation, $c$, uniformly within this range; (4) compute $\hat{\varphi} = \varphi + c\, m$; (5) repeat from Step 1 until convergence.
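The five steps can be sketched as follows. This is a simplified illustration that assumes a toy data-only energy, uses an assumed endpoint length `gamma`, and omits the mask-validity bookkeeping described above:

```python
import numpy as np

def gimh_ss_step(phi, data, rng, gamma=10.0):
    """One GIMH-SS iteration on a toy data-only energy (sketch).

    phi:  level-set function (2-D array); the foreground is {phi > 0}.
    data: per-pixel log-likelihood ratio favoring foreground, so that
          log pi(L) = sum of data over foreground pixels + const.
    """
    h, w = phi.shape
    # Step 1: circular mask with random center and radius.
    cy, cx = rng.integers(0, h), rng.integers(0, w)
    rad = rng.uniform(1.0, min(h, w) / 2.0)
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= rad ** 2

    order = np.argsort(phi[mask])            # masked heights, ascending
    heights = phi[mask][order]
    vals = data[mask][order]
    # Pixel i flips sign when the shift c crosses -heights[i]; the end
    # ranges are truncated to the finite length gamma.
    t = -heights[::-1]                       # ascending thresholds
    edges = np.concatenate(([t[0] - gamma], t, [t[-1] + gamma]))
    widths = np.maximum(np.diff(edges), 1e-12)   # guard against ties
    # In range k, the k largest-height pixels become foreground.
    fg_logp = np.concatenate(([0.0], np.cumsum(vals[::-1])))
    logp = fg_logp + np.log(widths)          # Gibbs-like range weights
    p = np.exp(logp - logp.max())
    p /= p.sum()
    k = rng.choice(len(p), p=p)              # Step 2: sample a range
    c = rng.uniform(edges[k], edges[k + 1])  # Step 3: uniform within it
    out = phi.copy()
    out[mask] += c                           # Step 4: shift masked pixels
    return out
```

Since the range weights are proportional to the target probability of each induced labeling, no accept/reject step is needed.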
When calculating the likelihood of each range in Step 2, only a value proportional to the true likelihood is needed; normalization can be performed after the fact because all ranges are enumerated. Equivalently, the energy, which is related to the likelihood via a logarithm, only needs to be calculated up to a constant offset. We therefore compute the change in energy as the pixels in each range are included or excluded. This can be computed efficiently for all ranges using two passes. On each pass, we begin at the range $r = 0$, set its energy to zero, and proceed in either the positive (first pass) or negative (second pass) direction. At each subsequent range, we compute the change in energy when one additional pixel changes sign. In graph-based energies, if a $k$-affinity graph is used, only $k$ computations are required at each range evaluation, one for each edge. This procedure is iterated until all ranges are computed.
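The incremental evaluation can be illustrated on a hypothetical 4-neighbor ($k=4$) Ising-style energy, where flipping one pixel requires touching only its $k$ incident edges:

```python
import numpy as np

def ising_energy(labels, beta=1.0):
    """Full graph-based energy: -beta for every agreeing 4-neighbor edge."""
    agree = (labels[:, :-1] == labels[:, 1:]).sum()
    agree += (labels[:-1, :] == labels[1:, :]).sum()
    return -beta * float(agree)

def delta_energy_flip(labels, y, x, beta=1.0):
    """Energy change if labels[y, x] flips, touching only its k = 4 edges."""
    h, w = labels.shape
    d = 0.0
    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ny, nx = y + dy, x + dx
        if 0 <= ny < h and 0 <= nx < w:
            if labels[y, x] == labels[ny, nx]:
                d += beta   # an agreeing edge is broken by the flip
            else:
                d -= beta   # a disagreeing edge becomes agreeing
    return d
```

Accumulating these per-pixel deltas while sweeping outward from the current range yields all range energies in a single pass per direction.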
Unlike graph-based energy functionals, the curve-length penalty used in PDE-based energy functionals can be a non-local and fairly expensive computation. In optimization-based segmentation algorithms, the evolution of the level set is driven by the gradient of the curve length (i.e., the curvature), which can be calculated locally and efficiently. If a subpixel-accurate curve-length calculation is required, then the energy functional will indeed depend on the value of the perturbation, $c$, within a particular range, $r$. As an approximation, we compute a pixel-accurate curve length by first setting all pixels on the boundary to a fixed height, initializing a signed-distance function, and then calculating the curve length via
$$\ell(\varphi) = \int_{\Omega} \delta\bigl(\varphi(x)\bigr)\, \bigl\lVert \nabla \varphi(x) \bigr\rVert \, dx$$
and a discrete approximation to the Dirac delta function, $\delta(\cdot)$.
If a first-order-accurate signed-distance function is obtained using a fast marching method, the height at a particular pixel depends only on pixels in its 3x3 neighborhood. This dependence is illustrated in Figure 4(a), where changing the sign of the center black pixel affects the height of the signed-distance function at all of the gray pixels. From the curve-length integral, the local calculation depends on the magnitude of the gradient of the level-set function; if a centered finite difference is used in the $x$ and $y$ directions, the local computation at a pixel depends on its neighbors above, below, left, and right. Thus, if the heights change for the gray pixels in Figure 4(a), the curve-length computation changes at all of the gray pixels in Figure 4(b). The change in curve length from flipping the sign of the center pixel is consequently a function of the signs of the 21 pixels in Figure 4(b). We precompute all $2^{21}$ (approximately 2 million) combinations so that the curve-length penalty can be computed efficiently by a simple table lookup.
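The lookup-table indexing can be sketched as follows, assuming the 21-pixel dependency region is the 5x5 window minus its four corners (consistent with the counts above):

```python
# Stencil of the 21 pixels whose signs determine the local change in
# curve length (assumed: the 5x5 window minus its corners).
STENCIL = [(dy, dx) for dy in range(-2, 3) for dx in range(-2, 3)
           if not (abs(dy) == 2 and abs(dx) == 2)]

def pattern_index(signs):
    """Pack 21 booleans (pixel signs, in STENCIL order) into an integer
    key for the precomputed curve-length lookup table."""
    idx = 0
    for bit, positive in enumerate(signs):
        if positive:
            idx |= 1 << bit
    return idx

TABLE_SIZE = 1 << 21   # 2,097,152 entries (about 2 million)
```

Each table entry would hold the change in curve length for its sign pattern, so the penalty per range reduces to one index computation and one memory read.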
3.2 Convergence Speeds
To illustrate the performance of GIMH-SS, we run the sampling algorithm on the image shown in Figure 5; this is the same image used in the comparison of Chang and Fisher. We run each of the four algorithms for a fixed number of seconds. As in that comparison, an approximate algorithm is used for the methods of Fan et al. and Chen and Radke that does not necessarily ensure detailed balance: certain random initializations require topological changes to converge to the correct segmentation, which neither method can accommodate, so in these situations samples are accepted with a certain probability instead of being rejected outright. We note that without this approximation, those algorithms would never converge to the correct solution. While BFPS exhibits running times that are orders of magnitude faster than those two methods, GIMH-SS is almost an order of magnitude faster than BFPS: the average energy obtained using GIMH-SS both rises faster and converges sooner. Additionally, the gradient of the energy functional (which can be difficult to evaluate) is never needed in the proposal.
3.3 Relation to Block Gibbs Sampling
Block Gibbs sampling first selects a mask of pixels (or dimensions) to change, and then samples from the target distribution conditioned on all other pixels. In binary segmentation, a block of $N$ pixels would require one to evaluate an exponential number ($2^{N}$) of configurations, which is intractable for a reasonably sized mask.
The GIMH-SS algorithm similarly selects a mask of pixels. The level-set function orders the masked pixels in the sense that, if a pixel of height $h$ changes sign, all masked pixels with heights between zero and $h$ must change sign as well. Consequently, the algorithm samples from a restricted subset of the conditional distribution, resulting in a linear number ($N+1$) of configurations. We note that ergodicity is ensured because the ordering of pixels by the level-set function changes over time.
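The restriction to threshold-consistent configurations can be made explicit. The helper below enumerates the $N+1$ labelings of the masked pixels that are reachable by adding a single constant:

```python
def threshold_configs(heights):
    """Enumerate the N + 1 labelings of the masked pixels reachable by
    adding a single constant: the top-k pixels by height form the
    foreground, for k = 0 .. N."""
    n = len(heights)
    order = sorted(range(n), key=lambda i: heights[i])  # ascending
    configs = []
    for k in range(n + 1):
        fg = set(order[n - k:])
        configs.append(tuple(1 if i in fg else 0 for i in range(n)))
    return configs
```

Any labeling inconsistent with the height ordering (e.g., a low pixel in the foreground while a higher one stays in the background) is unreachable in a single move, which is exactly why only $N+1$ rather than $2^{N}$ configurations must be evaluated.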
4 Topology-Controlled Sampling
In this section, we extend GIMH-SS to a topology-controlled sampler. The topology of a continuous, compact surface is often described by its genus (i.e., the number of “handles”). Digital topology is the discrete counterpart of continuous topology, where regions are represented via binary variables on a grid.
In digital topology, connectivity must be defined for the foreground (FG) and background (BG), describing how pixels in a local neighborhood are connected. For example, in 2D, a 4-connected region corresponds to each pixel being connected to its neighbors above, below, left, and right, while an 8-connected region corresponds to each pixel being connected to all eight pixels in its 3x3 neighborhood. Connectivity must be defined jointly for the foreground and background to avoid topological paradoxes; as shown by Kong and Rosenfeld, the valid (FG, BG) connectivity pairs in 2D are (4, 8) and (8, 4). Given a pair of connectivities, the topological numbers of Bertrand at a particular pixel (one for the FG and one for the BG) count the number of connected components the pixel is connected to in its 3x3 neighborhood. Figure 6 shows a few neighborhoods with their corresponding topological numbers.
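A straightforward (if unoptimized) computation of these local topological numbers is sketched below. It counts connected components of the punctured 3x3 neighborhood that are adjacent to the center pixel, which is a simplification of Bertrand's full geodesic-neighborhood definition:

```python
def topo_number(nbhd, conn):
    """Local topological number of the center of a 3x3 boolean patch.

    nbhd: 3x3 array-like of {0, 1}; the center value is ignored.
    conn: 4 or 8 (the connectivity of the region being counted).
    """
    pts = [(r, c) for r in range(3) for c in range(3)
           if (r, c) != (1, 1) and nbhd[r][c]]

    def adj(a, b):
        dr, dc = abs(a[0] - b[0]), abs(a[1] - b[1])
        return max(dr, dc) == 1 if conn == 8 else dr + dc == 1

    seen, count = set(), 0
    for p in pts:
        if p in seen:
            continue
        comp, stack = [], [p]      # flood-fill one component
        seen.add(p)
        while stack:
            q = stack.pop()
            comp.append(q)
            for s in pts:
                if s not in seen and adj(q, s):
                    seen.add(s)
                    stack.append(s)
        # count only components adjacent to the center pixel
        if any(adj(q, (1, 1)) for q in comp):
            count += 1
    return count
```

For example, a full ring of foreground neighbors yields one component, while two opposite edge neighbors under 4-connectivity yield two.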
While these topological numbers reflect topological changes, they do not distinguish the splitting or merging of regions from the creation or destruction of a handle. Segonne defines two additional topological numbers, one each for the foreground and background, which count the number of unique connected components the pixel is connected to over the entire image domain. These extended numbers depend on how pixels are connected outside of the 3x3 region and allow one to distinguish all topological changes.
By labeling each connected component in the foreground and background, Segonne shows that the extended foreground number can be computed efficiently when a pixel is added to the foreground, and the extended background number when a pixel is added to the background. In 3D, one must also calculate the extended background number when adding a pixel to the foreground, which can be computationally expensive. We show here that this calculation is not necessary in 2D.
Consider the two pixels marked in Figure 7.
Removing the first pixel from the foreground splits the region, while removing the second destroys a handle. We emphasize that in this 2D case, both configurations have the same local topological numbers. In 3D, this is generally not the case, because for the second pixel the two background regions can actually be connected in another 2D slice of the volume. In fact, in 2D, the destruction of a handle in the foreground corresponds directly to a merging of regions in the background; likewise, the splitting of regions in the foreground corresponds directly to the creation of a handle in the background. Because of this one-to-one mapping, we do not need to compute the expensive extended background number when adding a pixel to the foreground. Table 1 summarizes the topological changes of the foreground and background in 2D as a function of the four topological numbers.
[Table 1: topological changes in 2D when adding a pixel to the foreground (FG) or background (BG), as a function of the four topological numbers.]
4.1 Topology-Controlled GIMH-SS
In this section, we summarize how to sample from the space of segmentations while enforcing topology constraints (cf. [Chang2012tech] for details). While the goal is similar to that of topology-controlled level-set methods, simply altering the level-set velocity within an MCMC framework will not preserve detailed balance. A naïve approach generates proposals using GIMH-SS and rejects samples that violate topology constraints; such an approach wastes significant computation generating samples that are rejected solely because of their topology. We take a different approach: we only generate proposed samples from the set of allowable topological changes. Recalling the discussion of ranges in GIMH-SS, one can determine which ranges correspond to allowable topologies and which do not. Ranges corresponding to restricted topologies have their likelihoods set to zero in the topology-controlled version of GIMH-SS (TC-GIMH-SS). The methodology presented here treats the topology as a hard constraint; a distribution over topologies could be implemented by weighting ranges based on their topologies rather than eliminating restricted ones.
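Zeroing out restricted ranges amounts to masking the range weights before normalization, as in this small sketch:

```python
import numpy as np

def constrain_ranges(log_probs, allowed):
    """Renormalize Gibbs-like range weights after zeroing out ranges
    whose labelings violate the topology constraint (hard constraint).
    A soft prior over topologies could down-weight instead of zeroing."""
    lp = np.where(allowed, log_probs, -np.inf)
    p = np.exp(lp - lp[allowed].max())   # numerically stable exponentiation
    return p / p.sum()
```

Because range $r=0$ never changes any pixel sign, at least one range is always allowed, so the normalization is well defined.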
Similar to GIMH-SS, TC-GIMH-SS makes two passes to calculate the likelihoods of the ranges. Starting at $r = 0$, we proceed in either the positive or negative direction. Because $r = 0$ corresponds to the range that does not change the sign of any pixel, it can never correspond to a topological change. As the algorithm iterates through the possible ranges, it maintains a list of pixels that violate a topological constraint. If this violated list is empty after a range is considered, then the range corresponds to an allowable topology. At each iteration, whenever a pixel in the violated list becomes allowed to change sign, it is removed, and its neighbors in the violated list are re-checked, as their topological constraints may have changed. Throughout this process, the labels of each connected component must be maintained for each range.
While TC-GIMH-SS applies to almost any PDE-based or graph-based energy functional, we make a specific choice for demonstration purposes. Similar to Kim et al., we learn nonparametric probability densities over intensities and combine mutual information with a curve-length penalty. We impose four different topology constraints on the foreground: unconstrained (GIMH-SS), topology-preserving (TP), genus-preserving (GP), and connected-component-preserving (CCP). The TP sampler does not allow any topology changes, the GP sampler allows only the splitting and merging of regions, and the CCP sampler allows only the creation and destruction of handles. Typical samples under each of these constraints are shown in Figure 8. When the topology constraint is incorrect for the object (e.g., using TP or GP), the resulting sample may be undesirable (e.g., creating an isthmus connecting the two connected components of the background). When the topology is correct, however, more robust results are obtained; for example, the CCP constraint removes some of the noise in the background.
The usefulness of the topology constraint relies on a valid initialization. Histogram images display the marginal probability that a pixel belongs to the foreground or background, where lighter colors indicate higher foreground likelihood. The computed histogram images are shown in Figure 9 for each of the topology constraints.
We initialize the samples using either a random circle containing the foreground (FG Init) or a random circle placed anywhere in the image (Random Init). While not always the case, the effects of incorrect topology constraints are sometimes mitigated when looking at marginal statistics; for example, in the FG initialization case, the isthmus in the TP and GP constraints is no longer visible. Additionally, if the initialization captures only one connected component of the background (which is possible with random initialization), samplers prohibiting the splitting of regions (TP and CCP) will not be able to capture the entire region. This is reflected in the histogram image with the gray center.
Consider the low-SNR image of Figure 10. Chang and Fisher demonstrated that sampling improves results compared with optimization-based methods in low-SNR cases. When the problem is less likelihood-dominated, as it is here, the prior has greater impact. The top row of Figure 10 shows the histogram images obtained using each of the topological constraints; one can see remnants of the isthmuses in the TP and GP cases.
Thresholding the normalized histogram image at a value $\tau$ reveals a quantile of the segmentation. For example, if $\tau = 0.9$, the foreground region of the thresholded histogram contains the pixels that are in the foreground for at least 90% of the samples. We show the 95%, 50%, and 5% quantile segmentations in Figure 10. Since reducing the threshold never shrinks the foreground segmentation, we can overlay these quantiles on top of each other. In the 5% quantile, we can clearly see the isthmuses in the TP and GP cases; this result is poor because the wrong topology (i.e., the wrong prior) was used. With the CCP constraint, however, results improve relative to the unconstrained case because much of the background noise is removed.
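Quantile segmentations follow from a simple per-pixel threshold of the histogram image; note the nesting property that makes overlaying possible:

```python
import numpy as np

def quantile_segmentation(hist_img, tau):
    """Threshold a normalized histogram image (per-pixel fraction of
    samples in which the pixel was foreground) at level tau."""
    return hist_img >= tau

# toy 2x2 histogram image (hypothetical values)
hist = np.array([[0.97, 0.60],
                 [0.30, 0.02]])
seg_95 = quantile_segmentation(hist, 0.95)   # strictest quantile
seg_05 = quantile_segmentation(hist, 0.05)   # loosest quantile
```

Since lowering $\tau$ can only add pixels to the foreground, the 95% segmentation is always contained in the 50% and 5% segmentations.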
The CCP constraint is particularly useful when an unknown number of handles exists (e.g., for deformable objects): an object with a known number of handles in 3D can, when projected onto a 2D plane, exhibit any number of handles. We show two example images of a human and the resulting thresholded histogram images in Figure 11. In the first image, the handles formed by the arms are not captured well by TP and GP. In the second image, vignetting causes the unconstrained and GP samplers to group some background with the foreground.
We have presented a new MCMC sampler for implicit shapes and shown how to design a proposal such that every proposed sample is accepted. Unlike previous methods, GIMH-SS efficiently samples from both PDE-based and graph-based energy functionals and does not require evaluating the gradient of the energy functional (which may be hard to compute). Additionally, GIMH-SS was extended to include hard topological constraints by proposing only samples that are topologically consistent with prior knowledge.
This research was partially supported by the Air Force Office of Scientific Research under Award No. FA9550-06-1-0324. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the Air Force.
-  J. Chang and J. W. Fisher III, “Efficient MCMC Sampling with Implicit Shape Representations,” in IEEE CVPR, 2011, pp. 2081–2088.
-  D. M. Greig, B. T. Porteous, and A. H. Seheult, “Exact maximum a posteriori estimation for binary images,” Jrnl. of the Royal Statistical Society, vol. 51, pp. 271–279, 1989.
-  Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Trans. PAMI, vol. 23, pp. 1222–1239, November 2001.
-  J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Trans. PAMI, vol. 22, no. 8, pp. 888–905, 2000.
-  X. Han, C. Xu, and J. L. Prince, “A topology preserving level set method for geometric deformable models,” IEEE Trans. PAMI, vol. 25, no. 6, pp. 755–768, 2003.
-  F. Ségonne, “Active contours under topology control–genus preserving level sets,” Int. J. Comput. Vision, vol. 79, pp. 107–117, August 2008.
-  A. C. Fan, J. W. Fisher III, W. M. Wells, J. J. Levitt, and A. S. Willsky, “MCMC curve sampling for image segmentation.,” MICCAI, vol. 10, no. Pt 2, pp. 477–485, 2007.
-  S. Chen and R. J. Radke, “Markov chain Monte Carlo shape sampling using level sets,” IEEE ICCV Workshops, pp. 296–303, 2009.
-  W. K. Hastings, “Monte Carlo sampling methods using Markov chains and their applications,” Biometrika, vol. 57, no. 1, pp. 97–109, 1970.
-  J. Kim, J. W. Fisher III, A. Yezzi, M. Cetin, and A. S. Willsky, “A nonparametric statistical method for image segmentation using information theory and curve evolution.,” IEEE Trans. Image Proc., vol. 14, no. 10, pp. 1486–1502, 2005.
-  T. Y. Kong and A. Rosenfeld, “Digital topology: introduction and survey,” Comput. Vision Graph. Image Process., vol. 48, pp. 357–393, December 1989.
-  G. Bertrand, “Simple points, topological numbers and geodesic neighborhoods in cubic grids,” Pattern Recogn. Lett., vol. 15, pp. 1003–1011, October 1994.