Graph cuts provides an ingenious technique for image segmentation that relies on transforming the problem of energy minimization into the problem of determining the maximum flow or minimum cut on an edge-weighted graph. By using graph cuts, segmentations can be found efficiently in low-order polynomial time. When images are noisy, segmentation performed using only image data produces poor results. In these cases, regularization is needed.
Typically, when one is analyzing images, one has expectations for what he or she is looking for – people, birds, cars, cows, bacteria, whatever the object may be. In these cases it is natural to try to restrict segmentation to results that match the shape of the desired object. Within any class of objects there is often variability in shape. Such variability can arise even when dealing with shape-invariant objects, for example by changes in pose or perspective. Probabilistic expression of shape knowledge is a natural way of capturing this variability.
1.1 Related prior work
In Chang et al (2012), we presented a technique for tracking an object whose boundary motion is modeled using the level-set method. Such objects do not retain any particular a-priori shape. We remarked that one could adapt our regularization method to the segmentation of objects that do retain their shapes. The purpose of this article is to explicitly present a method for performing such a task.
Our method is grounded in the theory of graph cuts-based image segmentation with shape-based regularization, where segmentation is performed using a-priori shape knowledge. The prior literature on this subject is vast. Briefly, we mention a few influential articles. Slabaugh and Unal (2005) developed a graphcuts-based method for segmentation assuming that the desired object is an ellipse. Allowing for more flexibility, Freedman and Zhang (2005) provided a method to segment any particular shape based on the signed distance embedding. Some methods have concentrated on looking at shapes with certain characteristics such as compactedness or local convexity Veksler (2008); Das et al (2009). Other methods have treated the task of space specification as a statistical problem.
. Two of the most common forms of shape representation are kernel-PCA and kernel-based level-set embeddings. Both methods learn a probability distribution for the target shape based on a set of training shapes or templates. They differ in how they use the training images. Kernel PCA-based methods project the training templates into principal-component space and define probability distributions over the resulting vector spaces. Level-set-based methods define a distance between implicitly embedded shapes and define a probability distribution with respect to the distance.
One major disadvantage of the PCA-based approach is that the resulting probability distribution for the shape is Gaussian. Given a set of training templates, this approach will tend to favor shapes that are more similar to the space-averaged shape, thereby potentially making certain training shapes improbable. This problem also exists for exponential-family shape priors where the log-prior is a linear combination of the training templates. Combinations of shapes may not be valid shapes Cremers et al (2008), and under this approach certain training templates may become improbable as well. Instead, we opt for a kernel-density based approach.
Kernel density estimation (KDE) is a non-parametric method for the estimation of probability distributions. Its main advantage is its flexibility and its ability to capture multiple modes. Our method here, and that given in Chang et al (2012), can be described as a graph-cuts adaptation of the method of Cremers et al (2006). In their paper, they treated shape-prior specification as a density estimation problem and used KDE to derive a prior from an ensemble of training shapes. The resulting energy functionals, being nonlinear, require some intervention to use in a graph-cuts framework.
In this manuscript, we make the main contribution of demonstrating the use of majorization-minimization (MM) for the iterative relaxation of a nonlinear energy functional using graph-cuts. The MM algorithm is a generalization of the well-known expectation-maximization (EM) algorithm. It is often useful for linearizing nonlinear objective functions. We use MM to find a surrogate minimization problem that is solved using graph-cuts. In doing so, we expose a mathematical link between the energy functional ofCremers et al (2006), and the linear energy functionals commonly used for graph-cuts with statistical priors Malcolm et al (2007a); Dambreville et al (2008); Zhu-Jacquot and Zabih (2007). The subsequent use of graph cuts results in a computationally more efficient approach relative to that of using level set approaches Grady and Alvino (2009).
2 Mathematical Method
Signed-distance functions provide a handy tool to represent shapes mathematically. Shapes or regions can be implicitly represented by a function , which for every is the signed Euclidean distance from to the boundary of . In this paper we take the convention that is positive if , and negative if .
In many applications, since the shape of an object is only approximately known, it is advantageous to represent the knowledge of the shape probabilistically. We use a kernel density estimate of the distribution of possible shapes embedded as a collection of individual discretized level sets. Let us first denote
the characteristic function for a region,
Then, for two shapes and , embedded as signed distance functions and , we introduce an (asymmetric) pairwise shape energy
where . This expression is a generalization of other shape energies seen in the literature, where commonly Malcolm et al (2007b); Dambreville et al (2008); Zhu-Jacquot and Zabih (2007); Cremers et al (2008); Vu and Manjunath (2008); Lempitsky et al (2008); Heimann and Meinzer (2009); Lempitsky et al (2012). Although symmetric shape energies are desirable Cremers and Soatto (2003), for general use in our optimization scheme we require an asymmetric energy that does not depend on the Euclidean-distance shape embedding of the evolving segmentation. If one uses , then it is possible to make this energy symmetric with small modifications (see Discussion).
To make this energy robust to changes in scale, orientation, and location, a transformation that maps points in the reference frame of to points in the reference frame of is needed. In Appendix A, we provide a method for finding rigid transformations of the form
where is a scaling factor, is a rotation matrix, and is a translation vector. These three parameters are chosen to minimize Eq. 2.1 (See Appendix A for details on calculating these parameters). This method worked well for our examples; however, other shape alignment schemes may be used Lee and Park (1994); Jiang and Tomasi (2008); Viola and Wells III (1997); Tabbone and Wendling (2003); Belongie et al (2002).
After transformation, we can use the transformation-minimized energy in Eq. 2.1 to assemble references shapes into a probability distribution by using a kernel of the form
We can then represent a probability distribution over shapes, according to:
where are weighting coefficients that sum to one. This representation of the prior is a kernel density estimate Silverman (1986) (KDE) of the distribution of shapes. We set , the reciprocal temperature, to be the scale-normalized kernel width Silverman (1986); Cremers and Grady (2006):
where is the inertial scale of a shape as defined in Jiang and Tomasi (2008). Note that this expression of the shape prior is potentially multi-modal.
2.2 Generative image modeling
The task is to identify an object in an image , where refers to the dimension of the color space used. We model an image probabilistically with distributions of intensity values conditional on a parameter vector , and region:
Information about and
can be incorporated as prior probability distributionsand . With these prior distributions, we can write the joint posterior distribution , where is an energy (not to be confused with ).
We wish to infer the segmentation
by maximizing the posterior probability relative to. To this end, we will maximize the logarithm of the posterior, or equivalently, minimize the energy
The energy functional in Eq. 7 resembles the energy functional given by Cremers et al (2006), where level-set based gradient descent is used for relaxation. Unfortunately, optimization performed in this manner is slow, since only small deformations of an evolving contour occur in each iteration.
We take an iterative two-step approach to minimizing this energy. Given the segmentation estimate in the n-th step, we minimize the energy with respect to O’Hagan et al (2004).
Given , we find the optimal by iteratively minimizing
In Eq. 8, we have replaced logarithm-sum term of Eq. 7 with a linearized majorization term, by exploiting the concavity of the logarithm function. The majorization term defines a surrogate minimization problem that is more easily solved – in our case using graph cuts – albeit at the price of iteration. Such an algorithm is commonly known as a majorization-minimization or MM algorithm Hunter and Lange (2004) (see Appendix B). This approach is generally possible for other energies that contain components of the form where the function is convex, and is a functional that is dependent on .
2.4 Graph cuts for surrogate energy relaxation
Here, we describe minimization of the surrogate energy in Eq. 8
using graph cuts. Graph cut methods have their grounding in combinatorial optimization theory, and are concerned with finding the minimum-cost cut in an undirected graph. A cut is a partition of a connected graph into two disconnected sets. The cost of a cut is the sum of the edge weights along a cut, and a max-flow min-cut algorithm finds the cut with the lowest cost. To use graph cuts for image segmentation, we must express our energy function in terms of edge-weights on a graph. FollowingBoykov and Jolly (2001), we begin by expressing the energy given in Eq. 8 as a function of the vertices and edges of a graph :
In the graph that we construct, each pixel is assigned a node, and edges exist between nodes representing neighboring pixels (Fig 2). The neighbor edges are known as neighbor-links (n-links). Two special nodes called the source (foreground) and sink (background) are added, along with edges connecting these nodes to each pixel node (Fig 2). These edges are known as terminal-links (t-links).
We want to infer an unknown two-coloring on the nodes of the graph that represents inclusion of a node into either the foreground set , or the background set . This inference involves splitting the graph into two parts (Fig 2), where a pixel’s connection to the source represents inclusion into the foreground set.
Discretizing the shape divergence (Eq. 2.1) over an eight-connected neighbor graph yields
where refers to the set of tuples of nodes that are neighboring in space. The level set function is taken to be the Euclidean distance embedding of , transformed to the center, scale, and orientation of . As mentioned before, we use an asymmetric representation of the shape divergence since the signed-distance function is a property of a global segmentation and is not obtainable from local labeling.
Using this expression, one sees that
The neighbor energy given in Eq. 12 obeys the submodularity property, that is,
To embed into the graph, we set the weights of the t-links between each pixel and the source to the following
and the weights of the t-links between each pixel and the sink to the following
The cutting of the edge from an to the source implies that , so it adds to the cost of the cut by the contribution of into the total energy as if . In other words, these weights can be interpreted as a pixel’s strength of belonging to each region.
To embed into the graph, we set the n-links between pairwise neighboring pixels and to
Our surrogate energy is now minimized by finding the minimum cut of the graph. For details on how to perform this optimization, we refer the reader to Boykov and Kolmogorov (2004).
We have implemented our method in Java using the Fiji application programming interface Schindelin (2008) and tested it on the regularization of luminosity-based segmentation of a variety of objects in images (Fig 3). For comparison, we have obtained results of segmentation using a shape-free global length penalty (El Zehiry et al, 2007). In these examples, we have used Laplacian statistics for and uninformative priors for (by setting . The parameter was set to , and the inverse-temperature was set to . The algorithm was terminated when the labeling became stationary.
The first images we processed were low-contrast noisy images of a van. A van (Dodge B2500) was driven across a bumpy road and videotaped as it entered and exited the camcorder’s field of view. Snapshots of the recording were randomly extracted, showing the van in different regions an of the image, under different orientations and different scale factors. We manually constructed a shape prior for the van by hand-drawing five templates of a van in many poses. These templates of the van were aligned using the method detailed in Appendix A. Shape-free segmentation of the van captured the dorsal aspect of the van, but was inaccurate in delineating its ventral features. Using the shape prior, we obtained an admissible result that is recognizable as a van. Although a van maintains a rigid structure, its shape in images can still be considered probabilistic due to uncertainty in pose.
We also processed pictures of leaves from the Caltech 101 image database111http://www.vision.caltech.edu/Image_Datasets/leaves/. In the given examples, background objects interfere with the segmentation. Without a shape prior, segmentations based on pixel intensities include these background distractors. To construct templates, we hand-traced six representative shapes for each of the two types of leaves. The regularized segmentations that we obtained were able to control for the background distractors.
In Fig 4, we segmented a five-pointed leaf using a prior composed of both three-tip and five-tip templates. Although the mean shape of the hybrid-shape prior is a leaf with seven tips, the algorithm relaxed into the correct five-tip state.
We have presented a method for regularization of segmentation using a shape prior that is learned through kernel-density-estimation. Using a majorization-minimization trick, we showed how one can minimize the resulting energy, that includes a non-linear kernel-density term, using graph-cuts. Although there have been previous studies that include statistical shape priors in graph-cuts segmentation Zhu-Jacquot and Zabih (2007); Cremers and Grady (2006), these methods have used unimodal shape prior distributions where the input shape templates no longer retain their a-priori weights. Our shape-prior is a multi-modal kernel density estimator like the one used by Cremers et al (2006), thereby enjoying many of the benefits outlined in Cremers et al (2006).
The advantage of our method however is speed, which can be a significant limitation in applications involving “big data." It is also an issue in tracking applications, where many successive images need to be processed. Using an MM algorithm, we find surrogate energy functionals that can be minimized quickly using graph cuts. The surrogate energy functional (Eq 8) contains an iteratively re-weighted shape regularization term, where shapes that are the most-similar the current segmentation estimate are the most influential. In our examples, convergence of segmentation defined as stationarity of the labeling typically takes place in few () iterations.
Closer examination of Eq. 8 is revealing. Essentially, our algorithm takes advantage of the mathematical link between the relaxation of the nonlinear kernel-shape-prior energy functional of Cremers et al (2006) and iterative relaxation of the linear shape-prior energy functional found in many studies on incorporating shape priors into graph cuts Zhu-Jacquot and Zabih (2007); Cremers and Grady (2006); Vu and Manjunath (2008). As a result, our method has a computational complexity that is similar to that of these graph-cuts methods, which also rely on iterative template-alignment, while retaining many of the advantages of kernel-density estimation.
As in other iterative schemes for image segmentation, our optimization method is local. Consequently, the iteration can become trapped in local minima, as in the level-set method of Cremers et al (2006). It is important to note, however, that the notion of locality is different between this method and in the level-set method. In the level-set method, a boundary is propogated to steady state, where the boundary undergoes only small deformations between iterations.
In this method, the boundary is able to jump between iteratations Chang et al (2012). The locality of our approach refers to locality in the influence of the templates – the weights – as the segmentation iterates. Each iteration involves the optimization of an energy (Eq. 8) that is linear in . While the method we provide for minimization of this energy is local, because of the locality of the template-alignment procedure, other non-local techniques for minimizing the surrogate energy Lempitsky et al (2008) may be used. In this manner, one could trade-off speed for globality.
To prevent poorly-aligned templates from dominating over the likelihood, particuarly in early iterations, one may treat as an annealment parameter. Please see Vu and Manjunath (2008).
4.2 Computational considerations
Each iteration of the MM-algorithm is solved quickly using graph-cuts. Many efficient algorithms for this optimization exist, yielding excellent performance on modern hardware. There are many articles discussing the performance advantages of graph-cuts Boykov et al (2001); Grady and Alvino (2009). For even more speed, there are GPU-accelerated approaches Vineet and Narayanan (2008), and flow-reuse approaches Kohli and Torr (2007).
The minimization of the surrogate energy is not the bottleneck of our method. To unleash the full potential of this shape regularization method, one needs to optimize the processing steps that are in the periphery the graph cut optimization – the inference of the image intensity model, and the alignment of the templates.
As far as we know, other methods of shape-regularized segmentation also require these processing steps. Fortunately, these operations can be easily parallelized. Furthermore, they need not be repeated in every step of the algorithm. The image intensity parameters need only change if a significant proportion of the labels change. Similarly, the templates only need re-alignment if the segmentation has significantly changed. To improve performance, one may calculate the transformation needed for a single template, and use that transformation for the entire set of templates. One may also use a fast moment-based methodJiang and Tomasi (2008); Pei and Lin (1995) to provide an initial state for shape alignment.
4.3 Future directions
The MM algorithm is useful for linearizing objective functionals. In this paper, we have presented it as a technique for linearizing a nonlinear shape prior, yielding a linearized surrogate energy functional. It may prove to be a useful tool for the graph-cut community for the relaxation other useful nonlinear functionals. Finally, the MM and EM algorithms may have use in inference of more-robust likelihood models. In our opinion, it is an avenue worth investigating.
JC acknowledges support by grant number T32GM008185 from the National Institute of General Medical Sciences. JC and TC also acknowledge support from the the National Science Foundation through grants DMS-1032131 and DMS-1021818, and from the Army Research Office through grant 58386MA.
Appendix A Shape alignment
In order to align the shape templates, we choose the affine transformation that minimizes the shape energy
In this expression, we have rewritten the contour integral on in terms of an integral over the zero-level set of . Here, we develop a local Newton-Raphson algorithm for finding the optimal transform. Let us denote the column vector of transformation parameters . We estimate using the iterative updates
To populate the matrix, we need to calculate the gradients with respect to , , and . For , the first-order derivatives take the form
The sign function comes about from differentiation of the absolute value function in the distributional sense.
, the second-order derivatives constitute tensors that take the form
If , the first term is zero and the following term is added
The contour integrals in these expressions can be calculated using a regularized version of the Dirac delta function such as
Similarly, the characteristic function can be interpreted as the Heaviside function acting on the signed-distance shape embedding, which can be approximated using an approximation of the Heaviside function such as
The nonzero transformation derivatives are as follows
Appendix B MM algorithm for iterative graph cuts
The shape term can make minimization of the energy difficult, since its formulation involves a sum within a logarithm, making the energy functional nonlinear with respect to the labeling of pixels (into background and foreground) in the image.
To linearize the shape contribution, we will derive a surrogate function with separated terms. A function , with fixed , is said to majorize a function at if the following holds (Hunter and Lange, 2004)
We wish to perform iterative inference by finding a sequence of segmentations , where majorizes Eq. 7. By the descent property of the MM algorithm (Hunter and Lange, 2004), this sequence converges to a local minimum.
For any convex function , the following holds (Hunter and Lange, 2004)
Noting that is convex, we have
verifying that the inequality condition (Eq. 16) holds. In the case that , we have
- Belongie et al (2002) Belongie S, Malik J, Puzicha J (2002) Shape matching and object recognition using shape contexts. Pattern Analysis and Machine Intelligence, IEEE Transactions on 24(4):509–522
- Boykov and Jolly (2001) Boykov Y, Jolly M (2001) Interactive graph cuts for optimal boundary & region segmentation of objects in nd images. In: Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, IEEE, vol 1, pp 105–112
- Boykov and Kolmogorov (2004) Boykov Y, Kolmogorov V (2004) An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. Pattern Analysis and Machine Intelligence, IEEE Transactions on 26(9):1124–1137
- Boykov et al (2001) Boykov Y, Veksler O, Zabih R (2001) Fast approximate energy minimization via graph cuts. Pattern Analysis and Machine Intelligence, IEEE Transactions on 23(11):1222–1239
- Chang et al (2012) Chang J, Chou T, Brennan K (2012) Tracking monotonically advancing boundaries in image sequences using graph cuts and recursive kernel shape priors. Medical Imaging, IEEE Transactions on 13(5):1008–1020
- Cremers and Grady (2006) Cremers D, Grady L (2006) Statistical priors for efficient combinatorial optimization via graph cuts. Computer Vision–ECCV 2006 pp 263–274
- Cremers and Soatto (2003) Cremers D, Soatto S (2003) A pseudo-distance for shape priors in level set segmentation. In: 2nd IEEE workshop on variational, geometric and level set methods in computer vision, Citeseer, pp 169–176
- Cremers et al (2006) Cremers D, Osher S, Soatto S (2006) Kernel density estimation and intrinsic alignment for shape priors in level set segmentation. International journal of Computer Vision 69(3):335–351
Cremers et al (2008)
Cremers D, Schmidt F, Barthel F (2008) Shape priors in variational image segmentation: Convexity, lipschitz continuity and globally optimal solutions. In: Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, IEEE, pp 1–6
- Dambreville et al (2008) Dambreville S, Rathi Y, Tannenbaum A (2008) A framework for image segmentation using shape models and kernel space shape priors. IEEE transactions on pattern analysis and machine intelligence pp 1385–1399
- Das et al (2009) Das P, Veksler O, Zavadsky V, Boykov Y (2009) Semiautomatic segmentation with compact shape prior. Image and Vision Computing 27(1-2):206–219
- El Zehiry et al (2007) El Zehiry N, Xu S, Sahoo P, Elmaghraby A (2007) Graph cut optimization for the Mumford-Shah model. In: The Seventh IASTED International Conference on Visualization, Imaging and Image Processing, ACTA Press, pp 182–187
- Freedman and Drineas (2005) Freedman D, Drineas P (2005) Energy minimization via graph cuts: Settling what is possible. In: Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, IEEE, vol 2, pp 939–946
- Freedman and Zhang (2005) Freedman D, Zhang T (2005) Interactive graph cut based segmentation with shape priors. In: Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, IEEE, vol 1, pp 755–762
- Grady and Alvino (2009) Grady L, Alvino C (2009) The piecewise smooth mumford–shah functional on an arbitrary graph. Image Processing, IEEE Transactions on 18(11):2547–2561
- Heimann and Meinzer (2009) Heimann T, Meinzer H (2009) Statistical shape models for 3d medical image segmentation: A review. Medical image analysis 13(4):543–563
- Hunter and Lange (2004) Hunter D, Lange K (2004) A tutorial on MM algorithms. The American Statistician 58(1):30–37
- Jiang and Tomasi (2008) Jiang T, Tomasi C (2008) Robust shape normalization based on implicit representations. In: Pattern Recognition, 2008. ICPR 2008. 19th International Conference on, IEEE, pp 1–4
- Kohli and Torr (2007) Kohli P, Torr PH (2007) Dynamic graph cuts for efficient inference in markov random fields. Pattern Analysis and Machine Intelligence, IEEE Transactions on 29(12):2079–2088
- Lee and Park (1994) Lee S, Park J (1994) Nonlinear shape normalization methods for the recognition of large-set handwritten characters. Pattern Recognition 27(7):895–902
- Lempitsky et al (2008) Lempitsky V, Blake A, Rother C (2008) Image segmentation by branch-and-mincut. Computer Vision–ECCV 2008 pp 15–29
- Lempitsky et al (2012) Lempitsky V, Blake A, Rother C (2012) Branch-and-mincut: Global optimization for image segmentation with high-level priors. Journal of Mathematical Imaging and Vision pp 1–15
- Malcolm et al (2007a) Malcolm J, Rathi Y, Tannenbaum A (2007a) Graph cut segmentation with nonlinear shape priors. In: Image Processing, 2007. ICIP 2007. IEEE International Conference on, IEEE, vol 4
- Malcolm et al (2007b) Malcolm J, Rathi Y, Tannenbaum A (2007b) Multi-object tracking through clutter using graph cuts. In: Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, IEEE, pp 1–5
O’Hagan et al (2004)
O’Hagan A, Forster J, Kendall M (2004) Bayesian inference. Arnold
- Pei and Lin (1995) Pei SC, Lin CN (1995) Image normalization for pattern recognition. Image and Vision Computing 13(10):711–723
- Schindelin (2008) Schindelin J (2008) Fiji is just ImageJ – batteries included. In: Proceedings of the ImageJ User and Developer Conference, Luxembourg
- Silverman (1986) Silverman B (1986) Density estimation for statistics and data analysis, vol 26. Chapman & Hall/CRC
- Slabaugh and Unal (2005) Slabaugh G, Unal G (2005) Graph cuts segmentation using an elliptical shape prior. In: Image Processing, 2005. ICIP 2005. IEEE International Conference on, IEEE, vol 2
- Tabbone and Wendling (2003) Tabbone S, Wendling L (2003) Binary shape normalization using the radon transform. In: Discrete Geometry for Computer Imagery, Springer, pp 184–193
- Veksler (2008) Veksler O (2008) Star shape prior for graph-cut image segmentation. Computer Vision–ECCV 2008 pp 454–467
- Vineet and Narayanan (2008) Vineet V, Narayanan P (2008) Cuda cuts: Fast graph cuts on the gpu. In: Computer Vision and Pattern Recognition Workshops, 2008. CVPRW’08. IEEE Computer Society Conference on, IEEE, pp 1–8
- Viola and Wells III (1997) Viola P, Wells III W (1997) Alignment by maximization of mutual information. International journal of computer vision 24(2):137–154
- Vu and Manjunath (2008) Vu N, Manjunath B (2008) Shape prior segmentation of multiple objects with graph cuts. In: Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, IEEE, pp 1–8
- Zhu-Jacquot and Zabih (2007) Zhu-Jacquot J, Zabih R (2007) Graph cuts segmentation with statistical shape priors for medical images. In: Signal-Image Technologies and Internet-Based System, 2007. SITIS’07. Third International IEEE Conference on, IEEE, pp 631–635