Image Segmentation by Discounted Cumulative Ranking on Maximal Cliques

09/24/2010 ∙ by Joao Carreira, et al. ∙ University of Bonn

We propose a mid-level image segmentation framework that combines multiple figure-ground (FG) hypotheses, constrained at different locations and scales, into interpretations that tile the entire image. The problem is cast as optimization over sets of maximal cliques sampled from the graph connecting non-overlapping, putative figure-ground segment hypotheses. Potential functions over cliques combine unary Gestalt-based figure quality scores and pairwise compatibilities among spatially neighboring segments, constrained by T-junctions and the boundary interface statistics resulting from projections of real 3d scenes. Learning the model parameters is formulated as rank optimization, alternating between sampling image tilings and optimizing their potential function parameters. State-of-the-art results are reported on both the Berkeley and the VOC2009 segmentation datasets, where a 28% improvement over the state of the art is achieved.

1 Introduction

Segmenting an image into multiple regions has long been considered a plausible precursor of many high-level visual recognition routines. Indeed, if image regions could be extracted so that they at least partly overlap the projections of visible surfaces in the scene, such interpretations could later be lifted to high-level scene percepts by invoking part-based object models and scene consistency rules. This has motivated research into (hierarchical) multipart image segmentations, for which many excellent methods are available [1, 2, 3, 4]. But finding good multipart image segmentations in one step has proven difficult, partly due to the inherently local nature of the grouping process. The competition constraints implicit in various methods make it difficult to integrate scene constraints and mid-level grouping into early computations, and can influence results in ways that do not always correlate with scene properties. Learning segmentation models has also been problematic, partly because of insufficient support for reliable feature extraction, and because inference, the inner core of learning, is usually very expensive.

The alternative computational framework we pursue assembles multipart image interpretations by tiling multiple figure-ground image segment hypotheses using mid-level scene constraints. The problem of hypothesis selection and consistent (full) image segmentation is formulated as optimization over sets of maximal cliques, sampled from a graph that connects non-overlapping image segments. By designing and learning clique potentials that encode both intrinsic, unary Gestalt segment properties and pairwise spatial compatibilities accounting for plausible configurations of neighboring, spatially non-overlapping segments, we are able to eliminate many implausible image segments and tilings that cannot arise from the projection of surfaces in typical, structured 3d scenes. We show that this strategy achieves state-of-the-art results on the Berkeley and VOC2009 benchmarks.

1.1 Related work

Approaches to image segmentation include normalized cuts [1], mean shift [2], and minimum spanning trees [3]. They are usually run multiple times, to increase the probability that some of the retrieved segments capture full objects, or significant parts of them. Another methodology for obtaining multiple segmentations is to aggregate them in a hierarchy, two well-known examples being multigrid methods [5] and Ultrametric Contour Maps [4]. The latter achieved state-of-the-art results on a number of challenging segmentation datasets. These algorithms partition the image into a number of regions using pairwise pixel dependencies. Direct learning is usually targeted at finding the parameters of local affinities [4, 6]. Other techniques work at coarser scales by optimizing over superpixels. This allows features to be computed over a larger spatial support. Ren and Malik [7] learn a classification model to combine superpixels based on their Gestalt properties. Hoiem et al. [8] proposed a model that reasons jointly over scene geometry and occlusion boundaries, progressively merging superpixels so as to maximize the likelihood of a qualitative 3d scene interpretation. Our goal is instead complementary: a set of consistent full image segmentation hypotheses, computed based on mid-level Gestalt cues and implicit 3d constraints.

While multipart image segmentation algorithms are the most commonly used, a number of figure-ground methods have recently been pursued. Bagon et al. [9] proposed an algorithm that generates figure-ground segmentations by maximizing a self-similarity criterion around a user-selected image point. Malisiewicz and Efros [10] showed that good object-level segments can be obtained by merging pairs and triplets of segments from multipart segmentations, but at the expense of also generating a large number of implausible ones. Carreira and Sminchisescu [11] generate a compact set of segments using parametric min-cuts and learn to score them using region and Gestalt-based features. These algorithms were shown to be quite successful in extracting full object segments, suggesting that a promising research direction is to develop methods that combine multiple figure-ground segmentations (or just segments obtained at multiple scales, potentially from different methods) into plausible full image segmentations. Still missing is a formal multiple-hypothesis computational framework for consistent selection (tiling) and learning, which we pursue here. Providing a compact set of multiple hypotheses rather than a single answer is desirable for learning, for high-level, informed processing, and for graceful performance degradation.

Organization:

In sec. 2 we present our maximal clique formulation, including both the search procedure and the parameterization of the clique potentials. Sec. 3 describes our ranking-based learning framework, which alternates between sampling new tilings (a discrete optimization step) and optimizing the parameters of our clique potentials (a continuous problem) against the test error measure, here the full image segmentation quality. Sec. 4 discusses our mid-level unary and pairwise segment terms based on Gestalt measures and the statistics of projected boundaries of 3d surfaces, including T-junctions and extremal edges. We show inference and learning statistics, as well as experiments on the Berkeley and Pascal VOC 2009 segmentation datasets, in sec. 5. We conclude with ideas for future work in sec. 6.

2 Image tiling as sampling maximal cliques

Given a set of segments S, our aim is to generate several tilings t such that no two segments in t overlap and t has a high score F_w(t). Consider for that a graph G = (V, E), called the consistency graph, where the vertices V are the segments in S. Two vertices are connected by an edge if the corresponding segments do not overlap. (While disallowing overlap increases the exposure to imperfect boundary alignments between the available segments, it leads to a dramatic reduction in the solution space and does not require additional processing to assign pixels lying on the intersection of overlapping segments.) A clique of G, which is a fully connected subgraph of G, corresponds to a set of segments that can form a tiling. A clique is called maximal (also known as an inclusion-maximal clique) if it is not included in any other clique, and hence a larger clique cannot be obtained by adding vertices to it. In our case a maximal clique corresponds to a tiling that cannot be extended using any other segment in S. A maximum clique of a graph is a clique with the largest number of vertices. A maximum weighted clique is a clique that maximizes the sum of the weights associated with its vertices.
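As a concrete illustration of this construction, the following Python sketch (our own, not the paper's Matlab implementation) builds the consistency graph from boolean segment masks; the tolerance parameter max_overlap_pixels is a hypothetical knob, since the paper disallows any overlap.

import numpy as np

# Minimal sketch: build the consistency graph from segment masks.
# masks: list of HxW boolean arrays, one per figure-ground hypothesis.
def consistency_graph(masks, max_overlap_pixels=0):
    n = len(masks)
    A = np.zeros((n, n), dtype=bool)  # A[i, j] = True iff i and j are compatible
    for i in range(n):
        for j in range(i + 1, n):
            # Segments are connected iff their pixel overlap is empty
            # (or below the assumed tolerance).
            overlap = np.logical_and(masks[i], masks[j]).sum()
            A[i, j] = A[j, i] = overlap <= max_overlap_pixels
    return A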

We formulate the search for tilings as finding maximal cliques t with high potential

    F_w(t) = Σ_{s_i ∈ t} w_uᵀ φ_u(s_i) + Σ_{(s_i, s_j) ∈ N(t)} w_pᵀ φ_p(s_i, s_j)        (1)

where φ_u(s_i) and φ_p(s_i, s_j) are feature vectors extracted for, respectively, segment s_i and pairs of image neighbors (s_i, s_j) (the set of neighboring pairs in t is denoted N(t)), and w = (w_u, w_p) are the corresponding weights, learned as described in Section 3.

The problem of finding the maximum (weighted) clique of a general graph is known to be both NP-complete and hard to approximate within a given bound [12]. Existing algorithms produce a single solution that equals or approximates the maximum clique, and in the weighted case maximization is done only over unary terms associated with vertices. Our setting is different: we desire multiple tilings for each image, and the potential of a clique (tiling) depends on both unary and pairwise terms. Enumerating all cliques to find the optimum is not feasible, as we deal with many vertices (over 150): there are C(n, k) candidate subsets of size k in a graph with n vertices, each verified in O(k²), so enumerating all cliques of size k costs O(nᵏ k²). Finding a maximal clique, in contrast, can be done in time linear in the number of vertices, by starting with one vertex and adding each of the other vertices in some order. But graphs that have a large maximum clique can also have maximal cliques of arbitrarily small size. To obtain multiple estimates we follow a two-step greedy approach: (i) starting from each vertex, generate a maximal clique; (ii) refine each solution using a local search in the space of maximal cliques, based on the trained cost function. We generate up to |S| different tilings, ranked in decreasing order of F_w. Notice that our approach is based on established strategies for approximating the maximum clique (step 1 is known as a sequential greedy heuristic and step 2 as a local search heuristic [12]). Algorithm 1 describes the proposed method.

Input: Pool of segments S, weights w, features φ.

1:  (s_1, ..., s_n) ← segments in S, in decreasing order of w_uᵀ φ_u(s_i) /* unary terms */
2:  for i = 1 to n do
3:     t_i ← {s_i} /* initialize clique */
4:     /* Step 1: sequential greedy heuristic to “build” a maximal clique */
5:     for j = 1 to n, j ≠ i do
6:        if s_j does not overlap any segment in t_i then
7:           t_i ← t_i ∪ {s_j}
8:        end if
9:     end for
10:    /* Step 2: local search heuristic for solution refinement */
11:    repeat
12:       for all s_k ∉ t_i not overlapping s_i do
13:          t' ← (t_i \ {s ∈ t_i : s overlaps s_k}) ∪ {s_k} /* remove segments that overlap s_k, add s_k */
14:          extend t' to a maximal clique as in lines 5–9
15:          if F_w(t') > F_w(t_i) /* see eq. 1 */ then
16:             t_i ← t'
17:          end if
18:       end for
19:    until convergence
20: end for

Output: Pool of tilings {t_1, ..., t_n} for the current image, ranked in decreasing order of F_w.

Algorithm 1 FG-Tiling(S, w, φ) - Discrete optimization for image tilings.
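The following Python sketch mirrors Algorithm 1 under stated assumptions: A is the boolean consistency matrix from the sketch above, unary holds the unary scores used for the initial ordering, and F is a callable implementing the clique potential of eq. 1; all names are ours, not the authors'.

import numpy as np

# Sketch of FG-Tiling (Algorithm 1). Cliques are sets of segment indices.
def fg_tiling(A, unary, F, max_iters=3):
    n = len(unary)
    order = list(np.argsort(-np.asarray(unary)))  # decreasing unary score

    def extend(clique):
        # Step 1: sequential greedy heuristic up to a maximal clique.
        for j in order:
            if j not in clique and all(A[j, c] for c in clique):
                clique.add(j)
        return clique

    tilings = []
    for i in range(n):
        t = extend({i})
        # Step 2: local search in the space of maximal cliques.
        for _ in range(max_iters):
            improved = False
            for k in range(n):
                if k in t or not A[i, k]:
                    continue  # candidate k must not overlap the seed segment i
                # Remove members overlapping k, add k, re-extend (lines 13-14).
                cand = extend({s for s in t if A[s, k]} | {k})
                if F(cand) > F(t):
                    t, improved = cand, True
            if not improved:
                break
        tilings.append(t)
    return sorted(tilings, key=F, reverse=True)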

Complexity:

The size of the largest clique that can be formed with a certain vertex is bounded by the degree of that vertex plus one, in our case by at most n, the number of segments. If a set is kept containing the segments in S that do not overlap any segment in t_i, the complexity of step 1 is O(n²): at most n steps are needed to build t_i from the list of sorted segments, and n is an upper bound for both the loop in step 1 and the overlap verification inside it.

Step 2 can be executed in O(I·n³), where I is the maximum number of iterations allowed (in experiments we use a small fixed value for I). The inner loop over all candidates s_k is bounded by n, as s_k must not overlap s_i. Rejecting the segments in t_i that overlap s_k is also bounded by n, as all segments previously in t_i are mutually non-overlapping. Finally, extending t' to a maximal clique has the same complexity as step 1, namely O(n²).

Ordering the segments is done only once, so the complexity of running FG-Tiling for all n seed segments is O(n log n + n(n² + I·n³)), where the dominant worst-case component is O(n⁴) if I is fixed. In practice our Matlab implementation takes on average 20 seconds per image on the BSDS test set.

3 Learning mid-level vision

Assume we are given features φ_u(s_i) and φ_p(s_i, s_j), computed, respectively, for segments and for pairs of segments that are neighbors in the image, i.e. share a common boundary and do not overlap. We search for the weights w such that the ranking of tilings induced by F_w (eq. 1) is as close as possible to the ranking induced by the quality of the tilings with respect to the ground truth.

Input: Segments S^I for each image I in the training set, features φ, rank K.

1:  w ← w_0 /* initialize weights */
2:  for all training images I do
3:     T^I ← FG-Tiling(S^I, w, φ) /* extract initial tilings for image I */
4:  end for
5:  repeat
6:     w ← argmax of the rank objective (eq. 2) over the current tilings /* continuous optimization */
7:     for all training images I do
8:        T^I ← T^I ∪ FG-Tiling(S^I, w, φ) /* extract new tilings for image I */
9:     end for
10: until no improvement

Output: Weights w.

Algorithm 2 Learning algorithm that estimates parameters w.

The learning process alternates between the discrete optimization of tilings, where it runs FG-Tiling with the current parameters to create a new pool of tilings for each image in the training set, and a continuous parameter optimization step that finds parameters maximizing an objective function on the produced tilings, the same measure used at test time: the overlap with ground truth (Algorithm 2). Instead of aiming to place only the best tiling in the first position, which might be impossible, we design a scoring (with the best-only case, K = 1, as a special case) that aims at ranking tilings in decreasing order of their quality. For an image I, weights w, and a pool of tilings T^I, where t_r is the tiling at rank r when sorting T^I in decreasing order of the value of F_w, the objective function is:

    J_I(w) = Σ_{r=1..K} γ_r · o(t_r)        (2)

where o(t_r) is the quality of t_r measured using the ground truth, γ_r is the weighting of rank r, and K is the rank parameter, which determines the constraint we want to enforce (e.g. K = 1 for only the best-ranked tiling, K equal to the pool size for a full ordering). We define o(t) as the average covering of t with respect to all ground truth segmentations, as in [4]. The covering is the sum of overlaps between each individual segment in a ground truth segmentation and the closest segment in a tiling, weighted by the area of the ground truth segment; O(s, s') = |s ∩ s'| / |s ∪ s'| is the standard overlap measure between segments s and s' [13]. For rank weighting we use a decay γ_r that decreases with the rank r. This decay is similar to the Discounted Cumulative Gain (DCG) [14], which uses a logarithmic reduction factor of the form 1/log₂(1+r); DCG penalizes the error in the first ranks more aggressively, and we found it to work slightly less well in our tests. (For an image segment graph, clique potentials can also be used to define a constrained probability distribution over partitions: we can write a Gibbs distribution over cliques, p(t) ∝ exp(F_w(t)), and learn w using maximum likelihood, with the partition function approximated by summing only over the cliques computed by FG-Tiling. This approach will be presented in an upcoming technical report.) Here we choose a different loss, one that directly optimizes the overlap measure used during test time. Notice, however, our very different use of cliques compared to product expansions in graphical models. Along this path, modeling the nodes as binary variables in a random field would neither produce the semantics we need, nor necessarily lead to clique-consistent inference.
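To make the objective concrete, here is a small Python sketch (ours) of the covering-based quality o(t) and the per-image objective of eq. 2. Segments are represented as sets of pixel indices, and the DCG-style discount inside is only an illustrative stand-in, since the paper's own decay differs from it.

import math

# Sketch of the per-image rank objective (eq. 2).
def overlap(a, b):
    # Standard overlap (intersection-over-union) between two pixel sets.
    return len(a & b) / len(a | b)

def covering(tiling, ground_truth):
    # Area-weighted best overlap of each ground-truth segment,
    # normalized by the total ground-truth area.
    total = sum(len(g) for g in ground_truth)
    return sum(len(g) * max(overlap(g, t) for t in tiling)
               for g in ground_truth) / total

def rank_objective(tilings, quality, K):
    # `tilings` must be sorted in decreasing order of F_w.
    gamma = lambda r: 1.0 / math.log2(1 + r)  # assumed (DCG-style) discount
    return sum(gamma(r) * quality(t)
               for r, t in enumerate(tilings[:K], start=1))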

4 Mid-level image descriptors

Our model aims to generate full image tilings with properties similar to those of ground truth segmentations produced by human annotators. We use both unary features inspired by Gestalt properties and pairwise features sensitive to the boundary statistics arising from projections of 3d surfaces. These features are computed once and do not change during learning and inference. All features are individually normalized to zero mean and unit standard deviation.

Unary Descriptors: As unary features, we primarily use the ones proposed in [11], which include the amount of contrast along the boundary of the segment (8 features), region properties such as position in the image, area, and orientation (18 features), as well as Gestalt properties such as convexity and dissimilarity between the segment interior and the rest of the image in terms of intensity and texture (8 features).

We complemented the unary features of [11] with a novel set of responses quantifying center-surround dissimilarity. We define three image strips of increasing width around each segment, and compute how dissimilar each strip and the segment are according to different local features: hue, rgb, SIFT, and textons. For each type of local feature and each strip, dissimilarity is determined as the chi-square distance between the histogram of quantized local features in the strip and in the segment, resulting in 12 features (4 local feature types × 3 strips). The local features are sampled on a regular grid, with color and SIFT patches extracted at two sizes each. The textons are the ones used in globalPb [4]; the other local features are quantized into a fixed number of bins, with the codebook obtained in each image at test time by k-means.
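As a sketch of how one such dissimilarity response can be computed, assuming the quantized local-feature codes inside the segment and inside one strip are given (function names and arguments are ours):

import numpy as np

def chi_square(h1, h2, eps=1e-10):
    # Chi-square distance between two normalized histograms.
    h1 = h1 / max(h1.sum(), eps)
    h2 = h2 / max(h2.sum(), eps)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def center_surround_feature(codes_segment, codes_strip, n_bins):
    # Histogram the quantized codes (e.g. textons or k-means color
    # codewords) and compare segment interior vs. surrounding strip.
    h_seg = np.bincount(codes_segment, minlength=n_bins).astype(float)
    h_strip = np.bincount(codes_strip, minlength=n_bins).astype(float)
    return chi_square(h_seg, h_strip)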


Pairwise Descriptors: We define a segment neighborhood relation between pairs of segments sharing a boundary and not overlapping. The occurrence of such pairs is usually non-accidental, particularly in our pool of figure-ground segmentations, because we do not consider the ground. Segments that are artifacts of the particular parameters and location constraints that generated them will tend to have few neighbors. Computing this type of neighborhood can be done robustly by growing all segments by a small amount (4 pixels in our implementation) and then detecting the pairs that overlap. The pairwise features capture the configuration of pairs of segments. We use two sets of pairwise features. The first encodes pairwise region properties such as relative area, position, and orientation (18 features).

We also employ features that signal occlusion. In ground truth segmentations, neighboring segments often correspond to projections of objects at different depths, which results in distinctive image statistics. These are sufficiently informative even for determining which of the two neighboring regions corresponds to the occluding surface in 3d space, the so-called figure-ground assignment problem [15, 16]. The occluding segment usually has a higher convexity coefficient and is often surrounded by the occluded segment. Let c(s) be the unary convexity feature of segment s; the relative convexity feature then compares c(s_i) and c(s_j) for a neighboring pair. Let the length of the adjacent boundary between two segments be b_ij, and the segment perimeters be p_i and p_j; surroundedness is then defined from the fractions of each perimeter lying on the shared boundary, b_ij/p_i and b_ij/p_j. Other important occlusion features are T-junctions, boundary patterns shaped as a T, usually caused by the intersection of the boundaries of two objects in an occlusion relationship. Typically, the location of the leg of the T indicates which segment is occluding the other. T-junctions were used in recent approaches to figure-ground assignment, as an energy term for triplets of regions in CRFs [15, 16, 17]. Here we model them directly as a pairwise segment compatibility feature, by measuring the consistency with which the legs of the T-junctions belong to the same segment, with each junction weighted by the quality of its fit to a T, as opposed to being Y-like; the weighting depends on the angle formed by the leg of the junction with the base, and the sums run over all junctions between the pair of segments. When the leg of a junction lies on the boundary separating both segments, or the leg is not on the boundary of either segment, its weight is set to 0. Junctions are hard to detect when considering pixel intensities locally, even for humans [18]. But given a pair of neighboring segments this can be done robustly, as illustrated in fig. 1.
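A sketch of the two simpler cues, under our own assumptions about their functional forms (a signed convexity difference and perimeter fractions; the paper's exact formulas use the same quantities):

def relative_convexity(c_i, c_j):
    # Occluding segments tend to be more convex; assumed signed difference.
    return c_i - c_j

def surroundedness(b_ij, p_i, p_j):
    # Fractions of each segment's perimeter lying on the shared boundary;
    # the occluded segment often surrounds the occluder.
    return b_ij / max(p_i, 1.0), b_ij / max(p_j, 1.0)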

The shading along region borders was shown to provide information about occlusion in both computational [19] and psychophysical tests, under the name of extremal edges [20]. The phenomenon is explained by the illumination gradient tending to be orthogonal to the boundary on the occluding side. We implement the gradient orthogonality feature as in [16] and produce the compatibility feature as its absolute value. The absolute value is used because we are not interested here in determining which segment is in front, only in having an occlusion indicator.

Figure 1: Our T-junction detector works on all pairs of non-overlapping and spatially neighboring segments. In order to detect junctions, we grow the two regions plus their shared background, sum the three binary masks, and find the points in the image where the sum is maximized (first image on the left). These initial junction points are then improved by solving a least squares problem that minimizes the distance to the closest line segments approximating the boundaries of the two regions (second and third images). To form the base and the leg of the T, these line segments are clustered into two sets based on their orientation, using agglomerative clustering, and a line is fit to each cluster (fourth image). The cluster having a line segment endpoint with maximum minimal deviation from the junction along its fitted line is set as the base of the T. The final result is shown in the last image on the right.
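The initial junction-point detection from fig. 1 can be sketched as follows (our code; the dilation radius and helper names are assumptions):

import numpy as np
from scipy.ndimage import binary_dilation

def initial_junction_points(mask_a, mask_b, grow=4):
    # Grow the two regions plus their shared background, sum the three
    # binary masks, and keep the pixels where the sum is maximal.
    structure = np.ones((2 * grow + 1, 2 * grow + 1), dtype=bool)
    background = ~(mask_a | mask_b)
    total = (binary_dilation(mask_a, structure).astype(int)
             + binary_dilation(mask_b, structure).astype(int)
             + binary_dilation(background, structure).astype(int))
    return np.argwhere(total == total.max())  # candidate junction points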

5 Experiments

Our inference and learning methods were tested on the Berkeley Dataset (BSDS) [21] and on the Pascal VOC 2009 Segmentation Dataset (VOC2009) [13]. For comparison we show results of the Oriented Watershed Transform Ultrametric Contour Maps using globalPb as contour detector (gPb-owt-ucm) [4].

We generate a pool of segments using the publicly available implementation of Constrained Parametric Min-Cuts (CPMC) [11], which produces nested sets of segments around rectangular seeds placed on a regular grid, together with a predicted quality for each segment. Per image, an average of 194 segments is generated for the BSDS test set and 156 for the VOC2009 validation set. This algorithm was recently shown to produce compact sets of segments that accurately cover ground truth objects.

Fig. 2 shows the evaluation of FG-Tiling and two baselines, Enum-1min and Constrained-random, on the BSDS dataset. All methods produce maximal cliques, i.e. tilings whose segments do not overlap and which cannot be extended using the current pool of segments. For each method, the produced tilings are ranked using the scoring function in eq. 1.

Enum-1min is an algorithm that recursively and exhaustively enumerates maximal cliques until a given budget of 1 minute per image is reached, and returns the highest-scoring cliques found. (The 1 minute given to Enum-1min is about 3× the average time of FG-Tiling on the BSDS test set. Without the time constraint, the algorithm had not finished enumerating cliques after 48 hours on a test image for which a pool of figure-ground segmentations had been used.) Similar to line 1 of FG-Tiling, Enum-1min first sorts the segments based on their unary scores. During enumeration it quickly finds one tiling similar to the result of step 1 in FG-Tiling; however, within 1 minute it produces only small variations of that tiling, as also seen in fig. 2, right. Constrained-random is similar to step 1 of Algorithm 1, with the difference that in line 1 the order of the segments is randomized. The method gets a few “lucky shots”, which explains the rather high values in the plot in fig. 2, left, but overall the average quality of the produced tilings is much lower than for the other two methods (23% less than FG-Tiling on the BSDS test set). FG-Tiling balances the diversity and quality of the produced tilings and gives the best results of all methods.

Figure 2: Evaluation of FG-Tiling, Constrained-random, and Enum-1min on the BSDS dataset. Left: highest quality vs. number of tilings considered. Center: average quality at a given rank (when a tiling at that rank exists). Quality was measured with respect to the ground truth using the covering measure (see sec. 5 for details). Right: histogram of pairwise similarity between the produced segmentations.

During learning, for the initial run of FG-Tiling we set the weights corresponding to the pairwise terms to zero. The weights w_u corresponding to the unary terms are set using linear regression, such that w_uᵀ φ_u(s) approximates, for each segment s, its best overlap with the set of ground truth segments for the image. Parameter optimization is done using a Quasi-Newton method. During this step, the sum of the objective (eq. 2) over all images in the training set and their corresponding pools of tilings is maximized. The first time this step is executed, the initial weight estimates required to initialize the search are obtained using linear regression over all tilings produced for the training set, with the quality o(t) as the target for each tiling t. The inner loop (line 6) needs on average 15 iterations to converge. The outer loop (lines 5–10) saturates after a few iterations (3–4), and both the quality of the first-ranked tiling and the highest quality over all tilings for each image are maximized.
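The unary-weight initialization reduces to an ordinary least-squares fit; a minimal sketch, assuming Phi stacks the unary feature vectors (one row per segment) and y holds the regression targets (e.g. each segment's best overlap with the ground truth):

import numpy as np

def init_unary_weights(Phi, y):
    # Least-squares fit of w_u so that Phi @ w_u approximates y.
    w_u, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w_u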

Figure 3: Left, center: progress of learning with rank optimization on the BSDS dataset. Left: progress of the first-ranked and of the highest-quality tilings on the training and testing sets. Iteration 0 corresponds to the results with the initial weights w_0; iteration 1: the same tilings after the first optimization step; iterations 2–3: after new tilings and learned weights. Center: highest quality vs. number of segmentations retained on the BSDS test set. Right: highest quality vs. number of segmentations retained with no rank learning, and with rank learning for two settings of the rank parameter K.

Fig. 3 shows the progress of learning on the Berkeley Segmentation Dataset (BSDS) [21] and a comparison of the results without learning, learning with K = 1, and learning with a larger K. We observe that, compared to K = 1, the larger K produces a slightly better ranking also in the first position, presumably due to the additional constraints from lower ranks.

BSDS            OIS    First   ODS    BIS
max. possible   0.73   0.73    0.73   1.00
gPb-owt-ucm     0.64   -       0.58   0.74
FG-Tiling       0.64   0.58    -      0.78

VOC2009         OIS    First   ODS    BIS
max. possible   1.00   1.00    1.00   1.00
gPb-owt-ucm     0.58   -       0.45   0.61
FG-Tiling       0.74   0.52    -      0.78
Table 1: Average coverings on the test set of BSDS and on the validation set of VOC2009. The OIS scores for FG-Tiling are obtained considering a maximum of 64 and 73 tilings per image, respectively, which equals the average number of segmentations produced by gPb-owt-ucm [4] on the BSDS test and VOC2009 validation sets (notice however that our method uses considerably fewer segments: on average 194 and 156, as opposed to 1100 and 1043 for gPb-owt-ucm). If FG-Tiling uses the same number of segments but more tilings (on average 176 and 140 on BSDS and VOC2009, respectively), OIS scores of 0.66 and 0.76 are obtained.

Table 1 shows the results of benchmarks on the test set of BSDS and on the validation set of VOC2009. The values represent average covering scores of the ground truth segmentations by the output segmentations. BIS measures the best covering of the ground truth segmentations by individual segments from any segmentation produced by the evaluated method. OIS and ODS were used in [4] to evaluate the results of gPb-owt-ucm. They were introduced in the context of hierarchical segmentation, where scale is used to navigate from coarser to finer segmentations. The optimal image scale (OIS) measures, for each image, the quality of the produced segmentation that best covers the ground truth. The optimal dataset scale (ODS) measures the quality of the segmentations when the same scale is selected for all images; the scale to be evaluated is chosen to maximize the score on the test set. "First" evaluates the results using the predicted best segmentation for each image. "First" is only applicable to our method, since the segmentations from gPb-owt-ucm do not have associated scores from which to select a single segmentation. ODS is not applicable to our method, as FG-Tiling generates independent segmentations. Note that "First" does not use any ground truth information to select the tiling evaluated for each image.
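For clarity, the OIS and "First" scores can be computed from per-image pools as in this small sketch (ours), where qualities[i] lists the covering scores of image i's tilings in decreasing order of the predicted score F_w:

def ois(qualities):
    # Optimal image scale: best tiling per image (uses ground truth).
    return sum(max(q) for q in qualities) / len(qualities)

def first(qualities):
    # Predicted best tiling per image (rank 1; no ground truth used).
    return sum(q[0] for q in qualities) / len(qualities)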

The BSDS dataset has multiple ground truth (human) segmentations for each image. To evaluate the quality of a segmentation, the average over all ground truth segmentations for that image is taken. As the provided human segmentations differ from one another, the upper bound for OIS, "First", and ODS on the BSDS test set is 0.73. A score of 1.00 for BIS could still be obtained, by generating segments that perfectly cover all ground truth segments.

The results obtained by FG-Tiling are competitive on BSDS and superior on VOC2009. Note that the given VOC2009 scores do not use the "segmentation challenge" evaluation, which requires recognition, but instead evaluate the quality of unlabeled segmentations, like the method we compare with [4]. The results of gPb-owt-ucm on VOC2009 were computed by us using the code provided by the authors, and are consistent with their published results on VOC2008.

6 Conclusions

We have proposed a mid-level computational learning and inference framework for image segmentation that tiles multiple figure-ground hypotheses into a complete interpretation. The inference problem is formulated as searching for high-scoring maximal cliques in a graph connecting non-overlapping, putative figure-ground hypotheses. Clique potentials are based on both intrinsic Gestalt segment quality and compatibilities among neighboring image segments, as derived from the statistics of 3d scene boundaries. Learning is formulated as optimizing the ranking of the best K hypotheses, directly on the testing error, which measures the overlap between image tilings and ground truth human annotations. We have empirically analyzed the performance of our learning and inference components and have shown that they achieve state-of-the-art results on the Berkeley and VOC2009 segmentation benchmarks. On the latter, the proposed method improves on the state of the art by 28% when considering the full set of generated tilings, and by 16% for the predicted best tiling. In future work we plan to combine segmentation and partial recognition, in order to interpret images that contain both familiar and unknown objects.

References

  • [1] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.
  • [2] D. Comaniciu and P. Meer. Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5):603–619, 2002.
  • [3] P. Felzenszwalb and D. Huttenlocher. Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2):167–181, 2004.
  • [4] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik. From contours to regions: An empirical evaluation. In IEEE International Conference on Computer Vision and Pattern Recognition, 2009.
  • [5] E. Sharon, M. Galun, D. Sharon, R. Basri, and A. Brandt. Hierarchy and adaptivity in segmenting visual scenes. Nature, 442(7104):810–813, 2006.
  • [6] T. Cour, N. Gogin, and J. Shi. Learning spectral graph segmentation. In International Conference on Artificial Intelligence and Statistics, 2005.
  • [7] X. Ren and J. Malik. Learning a classification model for segmentation. IEEE International Conference on Computer Vision, 2003.
  • [8] D. Hoiem, A. Efros, and M. Hebert. Recovering surface layout from an image. International Journal of Computer Vision, 75(1):151–172, 2007.
  • [9] S. Bagon, O. Boiman, and M. Irani. What is a good image segment? a unified approach to segment extraction. In European Conference on Computer Vision, 2008.
  • [10] T. Malisiewicz and A. Efros. Improving spatial support for objects via multiple segmentations. In British Machine Vision Conference, 2007.
  • [11] J. Carreira and C. Sminchisescu. Constrained Parametric Min-Cuts for Automatic Object Segmentation. In IEEE International Conference on Computer Vision and Pattern Recognition, 2010.
  • [12] I. Bomze, M. Budinich, P. Pardalos, and M. Pelillo. The maximum clique problem. In Handbook of Combinatorial Optimization, pages 1–74. Kluwer Academic Publishers, 1999.
  • [13] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2009 (VOC2009) Results. http://www.pascal-network.org/challenges/VOC/voc2009/workshop/index.html.
  • [14] K. Järvelin and J. Kekäläinen. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 20(4):422–446, 2002.
  • [15] X. Ren, C. Fowlkes, and J. Malik. Figure/ground assignment in natural images. In European Conference on Computer Vision, 2006.
  • [16] I. Leichter and M. Lindenbaum. Boundary ownership by lifting to 2.1d. In IEEE International Conference on Computer Vision, 2009.
  • [17] D. Hoiem, A. Stein, A. A. Efros, and M. Hebert. Recovering occlusion boundaries from a single image. In IEEE International Conference on Computer Vision, 2007.
  • [18] J. McDermott. Psychophysics with junctions in real images. Journal of Vision, 2(7):131–131, November 2002.
  • [19] P. Huggins, H. Chen, P. Belhumeur, and S. Zucker. Finding folds: On the appearance and identification of occlusion. In IEEE International Conference on Computer Vision and Pattern Recognition, 2001.
  • [20] T. Ghose and S. Palmer. Surface convexity and extremal edges in depth and figure-ground perception. Journal of Vision, 5(8):970–970, September 2005.
  • [21] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In IEEE International Conference on Computer Vision, 2001.