Evaluation Framework of Superpixel Methods with a Global Regularity Measure

03/17/2019 ∙ by Remi Giraud, et al. ∙ 0

In the superpixel literature, the comparison of state-of-the-art methods can be biased by the non-robustness of some metrics to decomposition aspects, such as the superpixel scale. Moreover, most recent decomposition methods allow the user to set a shape regularity parameter, which can have a substantial impact on the measured performance. In this paper, we introduce an evaluation framework that aims to unify the comparison process of superpixel methods. We investigate the limitations of existing metrics and propose to evaluate each of the three core decomposition aspects: color homogeneity, respect of image objects, and shape regularity. To measure the regularity aspect, we propose a new global regularity measure (GR), which addresses the non-robustness of state-of-the-art metrics. We evaluate recent superpixel methods with these criteria, at several superpixel scales and regularity levels. The proposed framework reduces the bias in the comparison process of state-of-the-art superpixel methods. Finally, we demonstrate that the proposed GR measure is correlated with the performance of various applications.




1 Introduction

Superpixel decomposition methods, which group pixels into homogeneous regions, became popular with Ref. ren2003, and many have been proposed in the past years [felzenszwalb2004, vedaldi2008, moore2008, levinshtein2009, veksler2010, liu2011, achanta2012, vandenbergh2012, conrad2013, buyssens2014, machairas2015, li2015, yao2015, rubio2016, giraud2017_scalp]. Such decompositions provide a low-level image representation, while trying to respect the image contours. Superpixels are usually used as a pre-processing step in many computer vision methods such as: object localization [fulkerson2009], stereo and occlusion processing [zhang2011stereo], contour detection and segmentation [arbelaez2011], multi-class object segmentation [gould2008, yang2010, tighe2010, gould2014, giraud2017spm], data associations across views [sawhney2014], face labeling [kae2013], or adapted neural network architectures [liu2015deep, gadde2016superpixel]. For most methods, the computational cost depends on the number of elements to process, and the superpixel representation is well adapted since this number is drastically reduced. Moreover, superpixels can also directly improve the accuracy of applications such as labeling [arbelaez2011], since they gather homogeneous pixels in larger areas, thus reducing the potential noise of independent pixel labeling. Finally, contrary to the classical multi-resolution approach, which decreases the image size by averaging the information, superpixels preserve the image geometry and content.

The definition of an optimal superpixel decomposition depends on the application. Nevertheless, according to the literature, the desired properties of a superpixel method should be the following. (i) The color clustering must group pixels into homogeneous areas in terms of color features, for instance in RGB or CIELab space. (ii) The decomposition boundaries should adhere to the image contours, i.e., the superpixels should not overlap with several image objects. (iii) The superpixels should have regular shapes and consistent sizes within the decomposition. Regular superpixels provide an easier analysis of the image content, and reach better performance for applications such as tracking [chang2013, reso2013]. Since these properties cannot be optimal at the same time [neubert2014compact], most decomposition methods compute a trade-off between these aspects in their model. For instance, Figure 1 shows synthetic examples (a) and (d) where a trade-off between the considered aspects must be computed to decompose the images into three superpixels. In Figures 1(b) and (e), the color homogeneity (i) is optimal, while the respect of image contours (ii) and shape regularity (iii) are respectively favored in (c) and (f). Such examples illustrate how a superpixel decomposition method should behave to optimize each criterion.

Figure 1: Examples of trade-off between the superpixel properties for the decomposition of images (a) and (d) into three superpixels. The decompositions in (b) and (e) are optimal in terms of color homogeneity, while the ones in (c) and (f) are respectively optimal in terms of respect of image contours and regularity.

Many evaluation metrics dedicated to the superpixel context were proposed to evaluate the consistency of a decomposition with respect to the three above-mentioned superpixel properties [levinshtein2009, liu2011, achanta2012, schick2012, neubert2012, wang2017superpixel]. These metrics include the intra-cluster variation (ICV) [benesova2014] or explained variation (EV) [moore2008] for color homogeneity (i), the undersegmentation error (UE) [levinshtein2009, achanta2012, neubert2012], achievable segmentation accuracy (ASA) [liu2011], or boundary recall (BR) [martin2004] for criterion (ii), on the respect of image objects, and circularity (C) [schick2012] or mismatch factor [machairas2015] for criterion (iii), of superpixel shape regularity. These metrics offer different interpretations and evaluations of the superpixel properties. For instance, the circularity [schick2012] and the local regularity metric introduced in Ref. wang2017superpixel respectively evaluate circular and square shapes as the most regular ones, which is debatable.

Many recent methods allow the user to set a parameter that enforces or relaxes the regularity constraint of the superpixel shape, e.g., Ref. achanta2012, buyssens2014, machairas2015, li2015, yao2015, giraud2017_scalp. We show in Figure 2 that the same superpixel method can produce different results according to this parameter. While the regular decomposition provides visually consistent superpixels, with approximately the same shape and size, the irregular one more accurately follows the color variations, at the expense of dissimilar superpixel shapes.

Figure 2: Example of decompositions obtained with the same method [achanta2012] with different regularity settings. The difference between the decompositions can be expressed with the variance of distances between superpixel barycenters (normalized by the average distance), which differs between the irregular decomposition and the regular one.

Since the regularity can be set in most recent methods [vandenbergh2012, achanta2012, buyssens2014, machairas2015, li2015, yao2015, giraud2017_scalp], this parameter should be carefully tuned during the evaluation of performance. In each superpixel paper that introduces a decomposition method with such a parameter, the authors give a default regularity setting that generally provides a trade-off between segmentation accuracy and superpixel shape regularity. However, when comparing the state-of-the-art methods, even if the results are computed with the default settings of the initial paper, there can be an important bias in the comparison, since most evaluation metrics or superpixel-based pipelines achieve very different performance depending on the regularity of the decompositions.

In the recent study of Ref. stutz2016, the regularity settings used are not mentioned, although the ranking between methods could be altered with different regularity parameters, while Ref. strassburg2015influence shows that the regularity can be in line with the performance of segmentation and labeling applications. Therefore, we recommend evaluating the performance with accurate metrics, and for several superpixel scales and regularity levels. In this way, the measured performance of superpixel methods is not biased by the chosen regularity setting.

1.1 Contributions

In this paper, we propose an evaluation framework of superpixel methods. This work aims to unify the evaluation process across superpixel works and to enable a clear assessment of their performances.

  • We take a global view of existing superpixel metrics and investigate their limitations. We show that the evaluation can be reduced to one criterion per decomposition property: color homogeneity, respect of image objects and shape regularity.

  • We address the non-robustness of the state-of-the-art regularity metrics with the proposed global regularity (GR) measure, which relevantly evaluates the shape regularity and consistency of the superpixels. (An implementation of the proposed GR metric is available at: www.labri.fr/~rgiraud/downloads)

  • We report an evaluation of the state-of-the-art methods on this GR criterion. Contrary to the standard superpixel literature, to reduce the bias in the evaluation process, we recommend performing the evaluation for several regularity levels, evaluated with GR, since it expresses the range of potential results of each superpixel method.

  • Finally, we demonstrate on various applications that the regularity constraint has a substantial impact on performances, and that our GR metric is correlated with the obtained results.

1.2 Outline

In this paper, we first present in Section 3 the existing superpixel metrics, and investigate their limitations. To address the non-robustness of the regularity metrics, we propose in Section 4 a new global regularity measure. In Section 5, we use the considered metrics to compare the state-of-the-art superpixel methods according to several superpixel scales and regularity levels. Finally, we demonstrate the impact of the regularity setting on several applications in Section LABEL:sec:impact_regu.

2 Superpixel Decomposition Methods

We first present a non-exhaustive overview of existing superpixel methods. An extensive benchmark evaluation of methods has been proposed in [stutz2016]. Nevertheless, that paper mainly focuses on the evaluation of performance on superpixel metrics and does not compare methods according to their regularity parameter. In this work, we show that the regularity of superpixel decomposition is a major criterion to compare methods. We group the decomposition methods as graph, coarse-to-fine, or gradient-ascent-based approaches. Note that the method type does not reflect the properties of the generated superpixels. For instance, as illustrated in Figure 19, two graph-based methods such as Ref. ren2003 and Ref. felzenszwalb2004 respectively produce very regular and irregular superpixels. We report in Table LABEL:table:methods a summary of the properties of some state-of-the-art methods. We consider the type and computational complexity of each method, and their ability to control the number of computed superpixels and the shape regularity of the superpixels.

2.1 Graph-based Methods

Most of these methods consider the image as a graph and seek a local minimum of a given functional. These approaches can produce either regular [ren2003, buyssens2014] or irregular decompositions [felzenszwalb2004]. The term superpixel was introduced in [ren2003], which uses normalized cuts [shi2000] to decompose an image into regular regions. The computational cost of the normalized cuts algorithm, which computes segmentations based on contour and texture features, is substantial and has limited its application.

The method proposed in [felzenszwalb2004] performs a fast agglomerative clustering but generates very irregular regions in terms of size and shape. Such irregularity can be an issue, for instance, when trying to compare superpixel neighborhoods (see Section LABEL:sec:nearest), since it prevents the matching of similar patterns.

In [moore2008], the superpixels are constrained to fit to a lattice using a contour prior, and the decomposition is refined using optimal cuts. However, despite this constraint, the produced superpixels have irregular shapes, and the method appears to be very dependent on the input contour map. [veksler2010] uses a graph cut approach, and the regularity is enforced by associating pixels to a set of overlapping regular patches at multiple scales. Finally, [liu2011] aims to maximize the entropy rate when cutting the graph. The generated decomposition may be quite irregular, but this method obtains close to state-of-the-art performance.

More recently, [buyssens2014] proposed an eikonal-based method to iteratively refine the superpixel boundaries. To overcome the initial regular seed placement, a refinement step is considered with a color geodesic distance to add new superpixels.

Figure 3: Illustration of the steps of the coarse-to-fine ETPS [yao2015] (top) and K-means-based iterative clustering SLIC [achanta2012] (bottom) methods.

2.2 Coarse-to-Fine-based Methods

Coarse-to-fine approaches consider the image at different resolution levels and progressively converge towards a pixel-level decomposition. A coarse-to-fine framework has been proposed with the SEEDS method [vandenbergh2012], which, similarly to quad-tree-based approaches, iteratively segments the image by considering smaller block sizes. SEEDS uses a histogram-based energy, which is particularly robust to noise [stutz2016, giraud2017_scalp], while producing results in reduced computational time. However, this method provides irregular superpixels and does not allow direct control of the number of elements.

More recently, the coarse-to-fine ETPS method was proposed in [yao2015]. Similarly to SEEDS [vandenbergh2012], at each level, the method considers the blocks at the boundaries of superpixel labels, and the transfer of a block to a neighboring superpixel is allowed only if it does not violate the connectivity, i.e., cut a superpixel into separate parts. Contrary to [vandenbergh2012], ETPS enables setting the regularity of the decomposition. In its default settings, it produces regular superpixels, but it can generate very irregular ones when the shape constraint is relaxed. Overall, such coarse-to-fine frameworks achieve a low computational burden by focusing the refinement of the clustering on blocks at the superpixel boundaries.

2.3 Gradient-ascent-based Methods

Methods such as Mean shift [comaniciu2002] and Quick shift [vedaldi2008] use a mode-seeking framework to group pixels into regions. While [comaniciu2002] generates superpixels with highly irregular shapes, in a substantial computational time, [vedaldi2008] considers an initial segmentation to improve the decomposition accuracy, but without explicit control of the superpixel size, and requires more time to perform than [comaniciu2002]. Moreover, these methods present the major drawback of not allowing the control of the superpixel number.

In [levinshtein2009], regular superpixels are generated, with control on their number, using geometric flows starting from a regular grid of initial seeds. Nevertheless, [levinshtein2009] does not allow the shape constraint to be set, and produces very regular superpixels at the expense of poor adherence to contours.

Although it does not refer to the computed regions as superpixels, [vincent91] proposed to decompose an image using a watershed approach. Without additional constraints, such watershed methods, which perform a gradient ascent to a local minimum, do not constrain the region shapes or sizes. Hence, [vincent91] produces superpixels of highly irregular shapes. To constrain the region shapes, a spatial term is added to the model in [neubert2014compact, machairas2015]. Overall, gradient-based methods can produce both very regular [levinshtein2009] and very irregular superpixels [vedaldi2008], but these decompositions do not reach state-of-the-art performance.

2.4 K-means-based Methods

Since its introduction, the simple linear iterative clustering (SLIC) method [achanta2012] has been extensively studied. As its name suggests, the method is straightforward and performs an iterative K-means clustering of pixels starting from an initial grid. The clustering is computed based on a spatial and color distance in the CIELab color space, and can produce either irregular or regular decompositions by setting the trade-off between the two distances. Finally, the SLIC decompositions are computed an order of magnitude faster than other approaches such as [veksler2010] or [liu2011] and can perform well on standard superpixel metrics. For these reasons, SLIC has been considered in a substantial number of computer vision pipelines, e.g., medical image segmentation [lucchi2012supervoxel], optical flow computation [lu2013patch], or saliency detection [zhu2014saliency], and has been extended in many works, e.g., Ref. kim2013sp, neubert2014compact, jia2015, li2015, liu2016manifold, agoes2016, zhang2016, huang2016, rubio2016, giraud2017_scalp, wang2017, which directly aim to improve the consistency of the SLIC decompositions with respect to the three above-mentioned superpixel properties.

For superpixel-based object recognition methods such as [gould2008, gould2014], the use of regular decompositions is mandatory, i.e., the use of methods that provide superpixels with approximately the same size and compact shapes. Regular decompositions provide fewer superpixel boundaries, generally producing more visually satisfying results. Moreover, most of these methods allow the number of superpixels to be set. For superpixel-based video tracking applications [wang2011], regular approaches better handle the time consistency of the decomposition. The tracking of object trajectories within a scene is improved with consistent superpixel decompositions over time [chang2013].

As stated in Section 1, the three superpixel decomposition aspects to evaluate are: homogeneity, regularity and respect of the image objects. In this section, we focus on the SLIC [achanta2012] framework, since it is the most used superpixel method and it obtains close to state-of-the-art performance. The recent improvements described in Sections LABEL:sec:slic_homo, LABEL:sec:slic_regu and LABEL:sec:slic_contour aim at optimizing the three decomposition aspects, and reach state-of-the-art performance, as shown in Section 5.

The algorithm of [achanta2012] consists of a fast iterative framework that groups pixels into homogeneous regions according to simple color features (average CIELab color). The decomposition starts from a regular grid, composed of blocks of size $s \times s$ pixels, with $s = \sqrt{|I|/K}$, where $|I|/K$ is the ratio between the number of pixels $|I|$ and the number of desired superpixels $K$. The superpixels $S_k$ are each described by an average feature cluster $C_k = [F_k, X_k]$, which contains the average CIELab colors of pixels $F_k = [l_k, a_k, b_k]$ and the spatial barycenter $X_k = [x_k, y_k]$ of $S_k$. For each cluster, all pixels within a fixed window of $(2s+1) \times (2s+1)$ pixels centered on its barycenter $X_k$ are tested for association at each iteration of the clustering process. The superpixel size is hence constrained to lie within a $(2s+1) \times (2s+1)$ bounding box. A pixel $p = [F_p, X_p]$ is compared to a cluster $C_k$ using a spatial distance $d_s(p, C_k) = \|X_p - X_k\|_2^2$, and a color distance $d_c(p, C_k) = \|F_p - F_k\|_2^2$:

$$D(p, C_k) = d_c(p, C_k) + d_s(p, C_k)\,\frac{m^2}{s^2}, \tag{1}$$

with $m$ the regularity parameter that sets the compromise between the color distance $d_c$ and the spatial distance $d_s$ in $D$. High values of $m$ produce more regular superpixels, i.e., closer to the initial square grid, while small values of $m$ allow the superpixels to better fit to the image contours at the expense of higher variability in the superpixel sizes and shapes. The pixel $p$ is associated at each iteration to the superpixel that minimizes $D$, the sum of the color and spatial distances. We illustrate the iterative clustering process of [achanta2012] in the bottom part of Figure 3, starting from a regular grid and iteratively converging towards superpixels that fit to the image content. This approach is performed on simple color features and achieves close to state-of-the-art results in a reduced computational time. Many approaches have then been proposed to improve this framework, generally by adding features to (1).
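The role of the regularity parameter can be illustrated with a minimal sketch. It assumes the common SLIC-style form in which the clustering distance is the squared color distance plus the squared spatial distance weighted by $m^2/s^2$; the function names `slic_distance` and `assign` are hypothetical and only serve this illustration.

```python
import numpy as np

def slic_distance(pixel_color, pixel_xy, cluster_color, cluster_xy, m, s):
    """Squared color distance plus squared spatial distance weighted by m^2 / s^2.

    Assumed form (illustration only): D = d_c + d_s * m^2 / s^2.
    """
    d_c = float(np.sum((np.asarray(pixel_color, float) - np.asarray(cluster_color, float)) ** 2))
    d_s = float(np.sum((np.asarray(pixel_xy, float) - np.asarray(cluster_xy, float)) ** 2))
    return d_c + d_s * (m ** 2) / (s ** 2)

def assign(pixel_color, pixel_xy, clusters, m, s):
    """Return the index of the (color, xy) cluster that minimizes the distance."""
    costs = [slic_distance(pixel_color, pixel_xy, c, xy, m, s) for c, xy in clusters]
    return int(np.argmin(costs))

# Two clusters: one spatially close but with a different color,
# one spatially far but with a similar color.
clusters = [((50, 0, 0), (0, 0)), ((60, 0, 0), (10, 0))]
pixel_color, pixel_xy = (59, 0, 0), (2, 0)

low_m = assign(pixel_color, pixel_xy, clusters, m=1, s=10)    # color dominates
high_m = assign(pixel_color, pixel_xy, clusters, m=40, s=10)  # space dominates
```

With a small `m` the pixel joins the cluster with the closest color, while a large `m` attaches it to the spatially closest cluster, which is the behavior that drives decompositions towards the initial grid.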

Figure 4: Illustration of the iterative clustering algorithm [achanta2012] after an increasing number of iterations, from (a) to (c).

To enforce the color homogeneity within the SLIC model, recent works propose to compute content-sensitive superpixels using manifolds which are embedded in the spatial and color space [liu2016manifold], while other approaches improve the performance by computing the clustering in a high dimensional feature space, e.g., [li2015, giraud2017_scalp]. Several works proposed to consider a geodesic color distance along the path of pixels towards the superpixel barycenter [wang2011, rubio2016]. Such a term may further enforce the homogeneity since the colors along the path must also be close to the average superpixel color.

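Similarly to patch-based approaches, the pixels in a square area, or patch, centered on the current pixel $p$ are considered in the proposed distance $D_n$:

$$D_n(p, C_k) = \sum_{q \in P(p)} w_{p,q}\, \|F_q - F_k\|_2^2, \tag{2}$$

with $P(p)$ the square patch centered on $p$, and $F$, the image features (e.g., in [achanta2012], $F = [l, a, b]$). Note that a Gaussian kernel $w_{p,q}$, defined in (LABEL:filtering), is also used to favor the contribution of the closest pixels.

The complexity of the proposed distance (2) is $O(N)$, with $N = |P(p)|$, the number of pixels in the neighborhood.

The distance between features in (2) reads:

$$\sum_{q \in P(p)} w_{p,q}\, \|F_q - F_k\|_2^2 = \sum_{q \in P(p)} w_{p,q}\, \|F_q\|_2^2 - 2\, F_k \cdot \sum_{q \in P(p)} w_{p,q}\, F_q + \|F_k\|_2^2 \sum_{q \in P(p)} w_{p,q}. \tag{3}$$

The terms $\sum_{q} w_{p,q} \|F_q\|_2^2$, $\sum_{q} w_{p,q} F_q$ and $\sum_{q} w_{p,q}$, which only depend on the initial image, can be pre-computed. The complexity of the proposed distance is hence reduced to $O(1)$ instead of $O(N)$.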
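The pre-computation argument can be checked numerically. The sketch below assumes that the patch distance is a weighted sum of squared feature differences between the patch pixels and the cluster feature; under that assumption, expanding the square leaves three sums that depend only on the image. The function names are hypothetical and chosen for illustration.

```python
import numpy as np

def patch_distance_direct(patch_feats, weights, cluster_feat):
    """O(N) evaluation: weighted sum of squared feature differences over the patch."""
    diff = patch_feats - cluster_feat
    return float(np.sum(weights * np.sum(diff ** 2, axis=1)))

def precompute_terms(patch_feats, weights):
    """Image-only terms, computed once per pixel before the clustering starts."""
    a = float(np.sum(weights * np.sum(patch_feats ** 2, axis=1)))  # sum_q w ||F_q||^2
    b = np.sum(weights[:, None] * patch_feats, axis=0)             # sum_q w F_q
    c = float(np.sum(weights))                                     # sum_q w
    return a, b, c

def patch_distance_fast(terms, cluster_feat):
    """O(1) evaluation from the pre-computed terms (expansion of the square)."""
    a, b, c = terms
    return a - 2.0 * float(np.dot(cluster_feat, b)) + c * float(np.dot(cluster_feat, cluster_feat))

rng = np.random.default_rng(0)
patch_feats = rng.normal(size=(9, 3))   # a 3x3 patch with 3 features per pixel
weights = rng.random(9)                 # stand-in for the Gaussian kernel values
cluster_feat = rng.normal(size=3)

direct = patch_distance_direct(patch_feats, weights, cluster_feat)
fast = patch_distance_fast(precompute_terms(patch_feats, weights), cluster_feat)
```

Both evaluations agree up to floating-point error, and only `patch_distance_fast` needs to be called inside the clustering loop, since the cluster feature is the only quantity that changes between iterations.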

Most methods generally consider a regularity parameter to set the superpixel regularity. As can be seen in Figure 2, the SLIC method [achanta2012] can produce irregular shapes. In [giraud2017_scalp], the information of pixels on the linear path [bresenham1965] to the superpixel barycenter is considered to enforce the regularity.

Figure 5: Example of SLIC decomposition [achanta2012] (a) with irregular shape (b), set regular with the color distance on the linear path (LABEL:newdist0) of [giraud2017_scalp] (c).
Figure 6: Illustration of geodesic and linear paths, respectively in red and blue colors between two pixels.

Since the optimal color homogeneity may not be correlated with the respect of image objects, contour information can also be used on such a direct path [giraud2017_scalp, wang2017]. In [wang2017], a binary contour map is used to prevent the association of a pixel to a superpixel if an edge is crossed on the direct path to the superpixel barycenter. In [giraud2017_scalp], the maximum intensity of a smooth contour prior is considered on the linear path and included in the clustering distance of [achanta2012].

3 Standard Superpixel Metrics and Limitations

In this section, we present the existing superpixel metrics that were progressively introduced in the literature to evaluate the color homogeneity, the respect of image objects or the shape regularity, and we show their limitations in order to focus only on the relevant ones. In Sections 3.1 and 3.3, we present color homogeneity and shape regularity metrics that evaluate, for an image $I$, a superpixel decomposition $S = \{S_k\}_{k \in \{1, \dots, |S|\}}$ composed of $|S|$ superpixels $S_k$, where $|\cdot|$ denotes the cardinality of the considered element. In Section 3.2, we present metrics that evaluate the respect of image objects and compare the decomposition to a ground truth denoted $G = \{G_i\}_{i \in \{1, \dots, |G|\}}$, with $G_i$ a segmented region.

3.1 Homogeneity of Color Clustering

The homogeneity of the color clustering is a core aspect of the superpixel decomposition. Most methods compute a trade-off between spatial and color distances to compute the superpixels. The ability to gather homogeneous pixels should hence be considered in the comparison process, but this aspect is rarely evaluated in state-of-the-art superpixel works. In Section LABEL:sec:impact_regu, we show that color homogeneity is particularly interesting for image compression [levinshtein2009, wang2013].

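The homogeneity of colors can be evaluated by comparing the average colors of superpixels to the colors of the pixels in the initial image. The intra-cluster variation (ICV) [benesova2014] has been proposed to measure such color variation for a decomposition $S$:

$$\text{ICV}(S) = \frac{1}{|S|} \sum_{S_k} \frac{\sqrt{\sum_{p \in S_k} \left(I(p) - \mu(S_k)\right)^2}}{|S_k|}, \tag{4}$$

with $\mu(S_k)$, the average color of the superpixel $S_k$.

The explained variation (EV) [moore2008] was also proposed to evaluate the homogeneity of the color clustering and is defined as:

$$\text{EV}(S) = \frac{\sum_{S_k} |S_k| \left(\mu(S_k) - \mu(I)\right)^2}{\sum_{p \in I} \left(I(p) - \mu(I)\right)^2}, \tag{5}$$

with $\mu(I)$, the average color of the image $I$.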

The ICV metric presents several drawbacks. It is not normalized by the image variance, which prevents comparable evaluation between different images [stutz2016], and the superpixel size is not considered, so the measure is not robust to the superpixel scale. In Figure 7, we illustrate these issues on synthetic examples. The dynamic range and size of the same image with the same decomposition are altered, and these transformations impact the measure of ICV.

Figure 7: Comparison of ICV and EV to measure the superpixel color homogeneity. The dynamic and dimension of the same image (a) are respectively modified in (b) and (c). EV is robust to these transformations contrary to ICV.

The EV metric is robust to these transformations. It expresses the color variance contained in each superpixel, and simple calculations provide another explicit formulation:

$$\text{EV}(S) = 1 - \frac{\sum_{S_k} \sum_{p \in S_k} \left(I(p) - \mu(S_k)\right)^2}{\sum_{p \in I} \left(I(p) - \mu(I)\right)^2} = 1 - \sum_{S_k} \frac{|S_k|}{|I|} \frac{\sigma(S_k)^2}{\sigma(I)^2}. \tag{6}$$

In Figure 8(c), we illustrate the color variance $\sigma(S_k)^2$ within each superpixel $S_k$. Higher variance within a superpixel will lower the EV criterion, so high EV values express homogeneous color clustering. Despite the robustness of this criterion, it was not considered in the main state-of-the-art superpixel works. As in Ref. stutz2016, we recommend the use of EV to evaluate color homogeneity.
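As a minimal sketch of the explained variation, the function below computes it in two ways for a single-channel image: as the ratio of between-superpixel variation to total variation, and as one minus the ratio of within-superpixel variation to total variation. The restriction to one channel and the name `explained_variation` are assumptions made for illustration.

```python
import numpy as np

def explained_variation(image, labels):
    """Return EV computed with the between-cluster and the within-cluster forms.

    image: 2D array of intensities (single channel, assumed non-constant).
    labels: 2D integer array of superpixel labels with the same shape.
    """
    image = np.asarray(image, dtype=float)
    labels = np.asarray(labels)
    mu = image.mean()
    total = np.sum((image - mu) ** 2)
    between, within = 0.0, 0.0
    for k in np.unique(labels):
        values = image[labels == k]
        between += values.size * (values.mean() - mu) ** 2
        within += np.sum((values - values.mean()) ** 2)
    return between / total, 1.0 - within / total

image = np.array([[0, 1, 10, 11],
                  [1, 0, 11, 10]])
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])

ev_between, ev_within = explained_variation(image, labels)
# Changing the dynamic range of the image leaves EV unchanged.
ev_scaled, _ = explained_variation(3 * image + 7, labels)
```

Both forms give the same value, and an affine change of the intensities does not affect it, which is the robustness property discussed above.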

Figure 8: Example of superpixel decomposition (a), with average colors (b), and color variance within each superpixel (c).

3.2 Respect of Image Objects

The most considered aspect in the superpixel evaluation is the respect of the image objects. An accurate decomposition should have superpixels that do not overlap with multiple objects. To evaluate this aspect, many metrics were proposed. In the literature, undersegmentation error (UE), achievable segmentation accuracy (ASA) and boundary recall (BR) measures are mainly reported. While UE and ASA compute overlaps with the ground truth regions, BR is a contour adherence measure, which is usually correlated to the first two metrics.

Regarding the UE, several definitions were proposed [levinshtein2009, achanta2012, neubert2012]. UE evaluates the number of pixels that cross ground truth region boundaries. The initial UE formulation [levinshtein2009], denoted $\text{UE}_L$, is defined as:

$$\text{UE}_L(S, G) = \frac{1}{|G|} \sum_{G_i} \frac{\left(\sum_{S_k,\, S_k \cap G_i \neq \emptyset} |S_k|\right) - |G_i|}{|G_i|}. \tag{7}$$

The measure was discussed in several works, e.g., Ref. achanta2012, vandenbergh2012, neubert2012, since any superpixel $S_k$ that has an overlap with the ground truth segment $G_i$ penalizes the metric. Therefore, $\text{UE}_L$ is very sensitive to small overlaps and does not accurately reflect the respect of the image objects. A method to reduce the overlap considered in (7) was proposed in Ref. achanta2012, but it requires a parameter. Recent state-of-the-art works tend to use the parameter-free formulation of UE introduced by Ref. neubert2012:

$$\text{UE}(S, G) = \frac{1}{|I|} \sum_{S_k} \sum_{G_i} \min\left\{|S_k \cap G_i|,\, |S_k \setminus G_i|\right\}. \tag{8}$$

This formulation respectively considers the intersection or the non-overlapping superpixel area in case of small or large overlap with the ground truth region, and addresses the non-robustness of former UE definitions.
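A minimal sketch of the parameter-free formulation follows, assuming that for each pair of superpixel and ground truth region the smaller of the intersection and the non-overlapping superpixel area is accumulated, then normalized by the number of pixels. The function name and the toy label maps are hypothetical.

```python
import numpy as np

def undersegmentation_error(sp, gt):
    """Min-based undersegmentation error between two label maps of equal shape."""
    sp = np.asarray(sp).ravel()
    gt = np.asarray(gt).ravel()
    total = 0
    for k in np.unique(sp):
        gt_in_k = gt[sp == k]
        size_k = gt_in_k.size
        # Regions that do not intersect the superpixel contribute min(0, |S_k|) = 0.
        _, counts = np.unique(gt_in_k, return_counts=True)
        for inter in counts:
            total += min(int(inter), size_k - int(inter))
    return total / sp.size

# One row of six pixels: the second superpixel leaks one pixel into region 0.
sp = np.array([[0, 0, 0, 1, 1, 1]])
gt = np.array([[0, 0, 0, 0, 1, 1]])
ue = undersegmentation_error(sp, gt)
```

In this toy case the leaking superpixel contributes one pixel for its small overlap with region 0 and one pixel for the part lying outside region 1, so the leak is counted twice.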

The ASA measure [liu2011] also aims at evaluating the overlap of superpixels with the ground truth. It is reported in most of the superpixel literature as follows:

$$\text{ASA}(S, G) = \frac{1}{|I|} \sum_{S_k} \max_{G_i} |S_k \cap G_i|. \tag{9}$$

For each superpixel $S_k$, the largest possible overlap with a ground truth region $G_i$ is considered, and higher values of ASA indicate better results.
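The largest-overlap rule can be sketched as follows, assuming ASA is the sum over superpixels of the largest intersection with a ground truth region, divided by the number of pixels. The function name and the toy label maps are hypothetical.

```python
import numpy as np

def achievable_segmentation_accuracy(sp, gt):
    """Fraction of pixels correctly labeled when each superpixel takes its majority region."""
    sp = np.asarray(sp).ravel()
    gt = np.asarray(gt).ravel()
    total = 0
    for k in np.unique(sp):
        # Count the pixels of each ground truth region inside superpixel k.
        _, counts = np.unique(gt[sp == k], return_counts=True)
        total += int(counts.max())
    return total / sp.size

sp = np.array([[0, 0, 0, 1, 1, 1]])
gt = np.array([[0, 0, 0, 0, 1, 1]])
asa = achievable_segmentation_accuracy(sp, gt)
```

Here the first superpixel lies entirely in one region and the second keeps two of its three pixels, so five of the six pixels can be labeled correctly.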

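Another measure is extensively reported in the superpixel literature to evaluate the adherence to object contours: the boundary recall (BR). This metric evaluates the detection of ground truth contours by the superpixel boundaries such that:

$$\text{BR}(S, G) = \frac{1}{|B(G)|} \sum_{p \in B(G)} \delta\left[\min_{q \in B(S)} \|p - q\|_2 < \epsilon\right], \tag{10}$$

with $B(G)$ and $B(S)$ the sets of ground truth and superpixel boundary pixels, $\delta[a] = 1$ when $a$ is true and $0$ otherwise, and $\epsilon$ a distance threshold that has to be set, for instance to $2$ pixels [liu2011, vandenbergh2012]. Each ground truth boundary pixel is considered as detected if a superpixel boundary is at less than an $\epsilon$ distance.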


Although the BR measure (10) has been extensively reported in the literature and is recommended in Ref. stutz2016, it does not express the respect of image objects or the contour detection performance, as it only considers true positive contour detections. Hence, the number of computed superpixel contours is not considered, and very irregular methods can obtain higher BR results. Figure 9 compares two decompositions with the same number of superpixels that both have maximal BR. One of the decompositions is very irregular and produces a high number of superpixel boundary pixels, and this aspect is not considered in the BR metric. For these reasons, recent works such as Ref. machairas2015, zhang2016, giraud2017_scalp report BR results according to the contour density (CD), which measures the number of superpixel boundaries such that $\text{CD}(S) = |B(S)|/|I|$, with $B(S)$ the set of superpixel boundary pixels. Nevertheless, BR needs a parameter to be set and is not sufficient to reflect the respect of image objects. BR should only be considered as a tool for evaluation of contour detection performance, as shown in Section LABEL:sec:impact_regu.
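The limitation of BR can be reproduced on a toy example. The sketch below makes several assumptions that are not taken from the paper: a pixel is marked as a boundary when its label differs from its right or bottom neighbor, distances are Euclidean and computed by brute force, and a ground truth boundary pixel counts as detected when a superpixel boundary pixel lies at a distance strictly below the threshold. All function names are hypothetical.

```python
import numpy as np

def boundary_map(labels):
    """Mark pixels whose label differs from the right or bottom neighbor."""
    labels = np.asarray(labels)
    b = np.zeros(labels.shape, dtype=bool)
    b[:, :-1] |= labels[:, :-1] != labels[:, 1:]
    b[:-1, :] |= labels[:-1, :] != labels[1:, :]
    return b

def boundary_recall(sp, gt, eps=2.0):
    """Fraction of ground truth boundary pixels with a superpixel boundary closer than eps."""
    gt_pts = np.argwhere(boundary_map(gt)).astype(float)
    sp_pts = np.argwhere(boundary_map(sp)).astype(float)
    if len(gt_pts) == 0:
        return 1.0
    if len(sp_pts) == 0:
        return 0.0
    # Pairwise distances between ground truth and superpixel boundary pixels.
    dists = np.linalg.norm(gt_pts[:, None, :] - sp_pts[None, :, :], axis=2)
    return float(np.mean(dists.min(axis=1) < eps))

def contour_density(sp):
    """Fraction of image pixels that are superpixel boundary pixels."""
    return float(boundary_map(sp).mean())

gt = np.array([[0, 0, 0, 0, 1, 1, 1, 1]] * 2)
sp_close = np.array([[0, 0, 0, 0, 0, 1, 1, 1]] * 2)   # boundary shifted by one pixel
sp_dense = np.arange(16).reshape(2, 8)                # every pixel is its own superpixel

br_close, cd_close = boundary_recall(sp_close, gt), contour_density(sp_close)
br_dense, cd_dense = boundary_recall(sp_dense, gt), contour_density(sp_dense)
```

Both decompositions reach the maximal BR, but the dense one does so by placing boundaries almost everywhere, which only the contour density reveals.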

Figure 9: Examples of irregular [yao2015] (b) and regular [achanta2012] (c) decomposition of an image (a), with maximal BR measure. BR evaluates the detection of ground truth contours so both decompositions can have maximal BR measure, although the irregular one produces more superpixel boundaries.

The study of Ref. stutz2016 claims that the UE (8) and ASA (9) measures are correlated, so that $\text{UE} \approx 1 - \text{ASA}$, making redundant the use of both for superpixel evaluation. Under relevant assumptions, we now show that the true relation between the two measures is:

$$\text{ASA}(S, G) = 1 - \frac{\text{UE}(S, G)}{2}. \tag{11}$$

In Appendix LABEL:sec:asa_demo, we demonstrate that the relation (11) is exact when all superpixels have a major overlap, i.e., a ground truth region intersects with more than half of the superpixel area. This case is illustrated in Figure 10(b), where a superpixel $S_k$ has a major overlap with a region $G_i$. Note that this assumption is necessarily true when each superpixel overlaps with only two regions or when the ground truth is binary. In Appendix LABEL:sec:asa_demo, we measure the error of (11) on state-of-the-art methods. This error appears to be negligible, underlining the likelihood of this assumption.
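The relation can be sketched in a few lines, assuming the min-based definition of UE over superpixel and region overlaps, the largest-overlap definition of ASA, and that both the superpixels and the ground truth regions partition the image. The index $j_k$, introduced here for illustration, denotes the region that covers more than half of $S_k$; the complete demonstration remains the one given in the appendix.

```latex
% Assumption (major overlap): for each S_k there is a region G_{j_k}
% such that |S_k \cap G_{j_k}| > |S_k| / 2.
\begin{align*}
\sum_{G_i} \min\{|S_k \cap G_i|,\, |S_k \setminus G_i|\}
  &= |S_k \setminus G_{j_k}| + \sum_{i \neq j_k} |S_k \cap G_i| \\
  &= 2\,\bigl(|S_k| - |S_k \cap G_{j_k}|\bigr)
   = 2\,\bigl(|S_k| - \max_{G_i} |S_k \cap G_i|\bigr), \\
\text{UE}(S, G)
  &= \frac{2}{|I|} \sum_{S_k} \bigl(|S_k| - \max_{G_i} |S_k \cap G_i|\bigr)
   = 2\,\bigl(1 - \text{ASA}(S, G)\bigr).
\end{align*}
```

The first equality holds because, for $i \neq j_k$, the intersection $|S_k \cap G_i|$ is below $|S_k|/2$ and is therefore the smaller of the two terms, while for $i = j_k$ the smaller term is the non-overlapping area. The last equality uses $\sum_{S_k} |S_k| = |I|$.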

Figure 10: Examples of decomposition where a superpixel $S_k$ overlaps with multiple ground truth regions $G_i$. In (b), contrary to (a), a region overlaps with more than half of $S_k$, which corresponds to the major overlap hypothesis ensuring (11).

Hence, since the UE measure has been proven to be non-robust, and since we demonstrate that studying UE is equivalent to studying ASA, we recommend restricting the evaluation of the respect of image objects to the ASA results (9).

3.3 Regularity

Superpixel decompositions compute a lower-level image representation. Since the desired number of elements is given, and most methods tend to compute regions with approximately the same size, superpixels are generally created in smooth areas (see for instance Figure 2). To produce a consistent decomposition, most superpixel methods compute a trade-off between color clustering and regularity, which has been proven to have an impact on application performances [veksler2010, reso2013, strassburg2015influence]. Therefore, the regularity of the superpixel shapes is a core aspect when evaluating and comparing decomposition methods.

In Ref. schick2012, the circularity metric was introduced to locally evaluate the compactness of the superpixels. This measure is the usual local regularity metric, and has been considered in state-of-the-art works [reso2013, buyssens2014, zhang2016, giraud2017_scalp], and benchmarks [schick2014, stutz2016, wang2017superpixel]. This circularity C is defined for a superpixel shape $S_k$ as follows:

$$\text{C}(S_k) = \frac{4 \pi |S_k|}{|P(S_k)|^2}, \tag{12}$$

where $P(S_k)$ is the superpixel perimeter. The regularity is hence considered as the ability to produce circular areas.
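To make the bias towards circular shapes concrete, the sketch below evaluates the isoperimetric quotient $4\pi \cdot \text{area} / \text{perimeter}^2$ on ideal continuous shapes, assuming this is the form of the circularity. The helper names are hypothetical, and discrete superpixels would give slightly different values depending on how the perimeter is estimated.

```python
import math

def circularity(area, perimeter):
    """Isoperimetric quotient: 1 for a disk, lower for any other shape."""
    return 4.0 * math.pi * area / perimeter ** 2

def square(side):
    """Area and perimeter of a square."""
    return side ** 2, 4.0 * side

def disk(radius):
    """Area and perimeter of a disk."""
    return math.pi * radius ** 2, 2.0 * math.pi * radius

def regular_hexagon(side):
    """Area and perimeter of a regular hexagon."""
    return 1.5 * math.sqrt(3.0) * side ** 2, 6.0 * side

c_disk = circularity(*disk(1.0))
c_hexagon = circularity(*regular_hexagon(1.0))
c_square = circularity(*square(1.0))
```

The quotient is scale invariant and ranks the disk first, the hexagon second and the square last, so a perfectly regular square grid cannot reach the maximal value under this measure.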

While the circularity independently evaluates the local compactness of each superpixel, other works propose to evaluate the consistency of the shapes within the decomposition. In Ref. wang2017superpixel, the variance of superpixel sizes is considered. Ref. machairas2015 goes further and proposes an adapted version of the mismatch factor [strachan1990] to measure the consistency in terms of size and shape. The mismatch factor is computed as $1 - J$, with $J$ being the standard Jaccard index [jaccard1901], which computes the overlap between two regions $R_1$ and $R_2$ such that $J(R_1, R_2) = |R_1 \cap R_2| / |R_1 \cup R_2|$. The global measure J for a superpixel decomposition $\mathcal{S} = \{S_k\}$ is computed as follows [machairas2015]:

$$J(\mathcal{S}) = \frac{1}{|\mathcal{S}|} \sum_{S_k \in \mathcal{S}} J(S_k^*, \hat{S}^*), \qquad (13)$$

with $S_k^*$ the registered superpixel $S_k$, so its barycenter is at the origin of the coordinate system, and $\hat{S}^*$ the binary average shape of $\mathcal{S}$. In the following, we consider the J metric when comparing to Ref. machairas2015, since, contrary to the mismatch factor, its interpretation is consistent with the other presented metrics, i.e., higher value is better. Eq. (13) compares each superpixel to the binary average shape of the decomposition using $J$. To compute $\hat{S}^*$, the registered superpixels are first averaged into $S^*$ such that:

$$S^* = \frac{1}{|\mathcal{S}|} \sum_{S_k \in \mathcal{S}} S_k^*. \qquad (14)$$

A thresholded shape $S_t^*$ is then defined by binarization of $S^*$ with respect to a threshold $t$. The binary average shape is finally defined as $\hat{S}^* = S_{t_0}^*$, with $t_0$ the threshold for which $|S_{t_0}^*|$ matches $\mu = |I|/|\mathcal{S}|$, the average superpixel size. We illustrate these definitions in Figure 11. We represent a superpixel decomposition $\mathcal{S}$, the corresponding average of superpixel shapes $S^*$, and the binary average shape $\hat{S}^*$.
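The Jaccard index, the mismatch factor and the global J measure described above can be sketched as follows. This is a minimal illustration that assumes regions are given as sets of pixel coordinates and that the registered shapes and the binary average shape have already been computed. The function names are hypothetical and do not come from Ref. machairas2015.

```python
def jaccard(r1, r2):
    """Jaccard index |R1 n R2| / |R1 u R2| between two pixel sets."""
    r1, r2 = set(r1), set(r2)
    union = r1 | r2
    # Two empty regions are treated as identical (assumption of this sketch).
    return len(r1 & r2) / len(union) if union else 1.0

def mismatch_factor(r1, r2):
    """Mismatch factor, computed as 1 - J (lower is better)."""
    return 1.0 - jaccard(r1, r2)

def global_j(registered_shapes, binary_average_shape):
    """Unweighted mean of the Jaccard index between each shape and the average."""
    scores = [jaccard(s, binary_average_shape) for s in registered_shapes]
    return sum(scores) / len(scores)

block = {(x, y) for x in range(2) for y in range(2)}  # 2x2 square
bar = {(x, 0) for x in range(4)}                      # 1x4 bar

j_same = jaccard(block, block)            # identical regions
j_diff = jaccard(block, bar)              # 2 shared pixels out of 6
j_global = global_j([block, bar], block)  # each shape counts equally
```

Note that `global_j` gives the same weight to every superpixel regardless of its size, which is one of the limitations discussed in this section.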

Figure 11: Illustration of the average superpixel shape definition [machairas2015]. A decomposition is considered in (a). The superpixel shapes are registered and averaged into (b) to provide the binary average shape (c).


Although the circularity metric (12), introduced in Ref. schick2012, is the local regularity metric usually used in the literature [reso2013, buyssens2014, zhang2016, giraud2017_scalp, schick2014, stutz2016], it presents several drawbacks. For instance, the relevance of this metric and of the underlying regularity definition has been discussed [machairas2015], since the circularity considers a circle or a hexagon as more regular than a square, and is very sensitive to boundary smoothness [giraud2017_src]. As a consequence, some methods, such as Ref. zhang2016, start from a hexagonal distribution of superpixel seeds, and design their spatial constraint to fit a hexagonal grid in order to obtain higher circularity. However, the superpixel literature more generally refers to regularity as the ability to produce convex shapes with smooth boundaries. Therefore, squares, circles and hexagons should all be considered as regular shapes. Moreover, since most superpixel methods generate a regular square grid when their regularity parameter is set to its maximum value, it would make sense to obtain the highest regularity measure for such a square decomposition.

The mismatch factor or Jaccard index (13) was introduced by Ref. machairas2015 to evaluate the shape consistency within the whole decomposition, but it does not consider the superpixel size, so each shape contributes equally. Moreover, by computing a thresholded average shape, the metric appears to be non-robust to large shape outliers, leading to potentially irrelevant measures. Finally, with such a metric, which only considers the shape consistency, decomposing an image into its lines or into stretched rectangles would give the highest regularity measure.

Contrary to EV and ASA, which give relevant measures of (i) color homogeneity and (ii) respect of image objects, respectively, no existing metric provides a robust and accurate regularity measure of a superpixel decomposition. As a consequence, we propose in Section 4 a new global regularity measure that addresses the limitations of state-of-the-art ones.

4 A new Regularity Measure

In the literature, the circularity (12) has been proposed to measure the shape regularity [schick2012], and the mismatch factor to evaluate the shape consistency across the decomposition [machairas2015]. These measures present several drawbacks, which we address in this work by introducing two new metrics (18) and (19), combined into the proposed global regularity (GR) measure. GR robustly evaluates both shape regularity and consistency over the decomposition, and we demonstrate in Section 5 and in the section on the impact of regularity that it provides a relevant evaluation of the performances of superpixel methods.

4.1 Shape Regularity

As stated in Section 3.3, an accurate superpixel shape regularity measure should provide the highest results for convex shapes, such as squares, circles or hexagons, and penalize unbalanced shapes while considering noisy boundaries. To express such a measure, we propose to combine all these aspects into a new shape regularity criteria (SRC) [giraud2017_src]. The convexity of a shape $S$ is considered, i.e., the smoothness of its contours and the overlap with its convex hull $H_S$, which entirely contains $S$. To evaluate both overlap with the convex hull and contour smoothness, we first define $CC(S) = |P(S)|/|S|$, the ratio between the perimeter and the area of a shape $S$, which is linked with the Cheeger constant for convex shapes [caselles2009]. Then, we introduce our criteria of regularity (CR) as:

$$CR(S) = \frac{CC(H_S)}{CC(S)} = \frac{|P(H_S)|\,|S|}{|P(S)|\,|H_S|}. \qquad (15)$$

Since the convex hull $H_S$ entirely contains $S$ and has a lower perimeter, the CR measure is between 0 and 1, and is maximal for convex shapes such as squares, circles or hexagons.

Contour smoothness. Finally, the regularity of the superpixel borders must be considered. The convexity measure (CO) compares the number of boundary pixels of the shape to that of its convex hull. Although this measure is generally in line with the overlap with the convex hull (SO), it is mostly dependent on the border smoothness and penalizes noisy superpixels:

$$CO(S) = \frac{|P(H_S)|}{|P(S)|}. \qquad (16)$$

Nevertheless, the comparison to the convex hull is not sufficient to define the regularity of a superpixel.
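To make the comparison to the convex hull concrete, the sketch below computes a CR-like ratio, assuming that CR is the perimeter-to-area ratio of the convex hull divided by the perimeter-to-area ratio of the shape. For simplicity, shapes are assumed to be simple polygons given by their vertices, with Euclidean perimeter and shoelace area, rather than discrete pixel masks. The function names are hypothetical and this is not the authors' implementation.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(sequence):
        chain = []
        for p in sequence:
            # Drop points that make a clockwise or collinear turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half_hull(pts) + half_hull(reversed(pts))

def polygon_area(vertices):
    """Shoelace formula for a simple polygon."""
    n = len(vertices)
    twice = sum(vertices[i][0] * vertices[(i + 1) % n][1]
                - vertices[(i + 1) % n][0] * vertices[i][1] for i in range(n))
    return abs(twice) / 2.0

def polygon_perimeter(vertices):
    n = len(vertices)
    return sum(math.dist(vertices[i], vertices[(i + 1) % n]) for i in range(n))

def cc(vertices):
    """Perimeter-to-area ratio of a polygon."""
    return polygon_perimeter(vertices) / polygon_area(vertices)

def cr(vertices):
    """Ratio CC(hull) / CC(shape): 1 for convex polygons, lower otherwise."""
    return cc(convex_hull(vertices)) / cc(vertices)

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
cross_shape = [(1, 0), (2, 0), (2, 1), (3, 1), (3, 2), (2, 2),
               (2, 3), (1, 3), (1, 2), (0, 2), (0, 1), (1, 1)]

cr_square = cr(square)      # convex shape: hull and shape coincide
cr_cross = cr(cross_shape)  # non-convex shape: penalized
```

A convex polygon coincides with its hull and obtains the maximal value, while the cross is penalized both for its lower overlap with the hull and for its longer contour.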

The balanced distribution of the pixels within the shape is another aspect to consider. Otherwise, shapes such as ellipses or lines would get the maximum CR results. We define the variance term $V_{xy}$ as the ratio between the minimum and maximum variance of pixel positions $x$ and $y$ that belong to $S$:

$$V_{xy}(S) = \frac{\min(\sigma_x, \sigma_y)}{\max(\sigma_x, \sigma_y)}, \qquad (17)$$

with $\sigma_x$ and $\sigma_y$ the standard deviations of the pixel positions. Such a measure penalizes dissimilarity in the pixel distribution, and $V_{xy}(S) = 1$ if, and only if, $\sigma_x = \sigma_y$, i.e., if the spatial pixel distribution is balanced.
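The variance term can be illustrated with a short sketch, assuming it is the ratio between the smallest and the largest standard deviation of the pixel coordinates along the two image axes. The function name `v_xy` and the handling of a single-pixel shape are assumptions of this example.

```python
import statistics

def v_xy(pixels):
    """Ratio min(sigma_x, sigma_y) / max(sigma_x, sigma_y) of a pixel list."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    sx = statistics.pstdev(xs)  # population standard deviation along x
    sy = statistics.pstdev(ys)  # population standard deviation along y
    if max(sx, sy) == 0:
        # Single pixel: treated as balanced (assumption of this sketch).
        return 1.0
    return min(sx, sy) / max(sx, sy)

square = [(x, y) for x in range(4) for y in range(4)]     # 4x4 square
rectangle = [(x, y) for x in range(8) for y in range(2)]  # 8x2 rectangle

v_square = v_xy(square)        # balanced spread: ratio of 1
v_rectangle = v_xy(rectangle)  # stretched along x: ratio well below 1
```

Both shapes are convex, so a hull-based criterion alone would not separate them. The variance term is what penalizes the stretched rectangle.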

The proposed shape regularity criteria (SRC) is defined as follows:

$$SRC(\mathcal{S}) = \sum_{S_k \in \mathcal{S}} \frac{|S_k|}{|I|}\, CR(S_k)\, V_{xy}(S_k). \qquad (18)$$

Note that in practice, we use the square root of $V_{xy}$, so both criteria have similar variation ranges. SRC robustly and jointly evaluates convexity, contour smoothness and balanced pixel distribution.
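The aggregation over the decomposition can be sketched as a size-weighted sum. The example below assumes that the per-superpixel CR and variance terms are already available and that each superpixel is weighted by its share of the image. The input format (a list of triples) and the function name are hypothetical.

```python
import math

def src(superpixels, image_size):
    """Size-weighted sum of CR * sqrt(V_xy) over the decomposition.

    `superpixels` is an iterable of (size, cr, v_xy) triples, one per
    superpixel, where cr and v_xy are assumed to lie in [0, 1].
    """
    return sum(size / image_size * cr * math.sqrt(v)
               for size, cr, v in superpixels)

# Two superpixels covering a 100-pixel image: a large regular one and a
# smaller one that is both non-convex and unbalanced.
decomposition = [(60, 1.0, 1.0), (40, 0.5, 0.25)]
score = src(decomposition, image_size=100)
```

Because of the size weighting, a large irregular superpixel lowers the score more than a small one with the same local measures.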

(a) (b) (c)
Figure 12: Convex hull example on a synthetic shape. The overlap between the shape (a) and its convex hull (b) is shown in (c). The shape is contained in the hull and the overlap is such that .

Circularity vs Shape Regularity Criteria

To demonstrate the robustness and relevance of SRC (18) over circularity C (12), we consider in Figure 13 synthetic shapes that are split into three groups (regular, standard and irregular), and generated with smooth (top) and noisy boundaries (bottom). First, we present the drawbacks of the circularity, which reports a lower measure for the Square than for the Hexagon or the Ellipse. Since methods such as Ref. machairas2015, zhang2016 produce superpixels from a hexagonal grid, their regularity evaluation is very likely to be higher than that of methods starting from regular square grids. The circularity is also very sensitive to the contour smoothness, so regular and standard noisy shapes have similar measures, and the groups are no longer differentiated (see the bottom part of Figure 13). Finally, standard but smooth shapes have much higher circularity than noisy regular ones.

As can be seen in Figure 13, SO, $V_{xy}$ and CO taken independently are not sufficient to express the compactness of a shape. The proposed SRC combines all defined regularity properties. For instance, SO is representative for all shapes, except for the Ellipse and W, since they both have large overlap with their convex hull. $V_{xy}$ penalizes the Ellipse since it does not have a balanced pixel distribution, and CO considers the large amount of contour pixels in the W shape.

With SRC, we first note that the three regular shapes have the highest regularity measure, and that regular but noisy shapes have similar SRC to the smooth standard ones. Overall, since SRC is less dependent on the boundary smoothness, the shape groups are clearly separated in both the smooth and noisy cases, contrary to C. Finally, we show the metric evolution with the shape size in Figure 15. As stated in Ref. roussillon2010, due to discretization issues, the circularity must be thresholded so that it does not exceed 1, and it drops as the shape size increases. Therefore, this metric is not robust to the superpixel size, and the comparison of methods on circularity is relevant only if the compared superpixel decompositions have the same number of elements. Contrary to the circularity, the SRC metric is robust to the superpixel scale and provides a consistent evaluation of shape regularity.

Regular shapes: Square, Circle, Hexagon. Standard shapes: Ellipse, Cross, Bean. Irregular shapes: W, Split, U.
Figure 13: Comparison of circularity (C) and proposed shape regularity criteria (SRC) on three groups of synthetic shapes with smooth (top) and noisy boundaries (bottom). C appears to only favor circular shapes and does not separate standard and regular noisy ones. The SRC metric addresses these issues and clearly differentiates the shape groups in the smooth and noisy cases.
Figure 14: Robustness to the superpixel scale of the proposed shape regularity criteria (SRC) compared to the circularity (C).
(a) Separation of noisy shapes (b) Robustness to scale
Figure 15: Comparison of the circularity (C) and the proposed shape regularity criteria (SRC) on the ability to separate the three groups of noisy shapes (a) and the robustness to the superpixel scale (b).

Finally, a decomposition into superpixels can be seen as a preprocessing step, similar to the multi-resolution approach, but where the regions are no longer regular blocks. Since most decomposition methods start from a square grid and iteratively refine the superpixel borders, it would make sense to obtain the highest regularity measure with a square decomposition, i.e., with the regularity parameter set to infinity.

Nevertheless, the SRC and circularity metrics independently evaluate each superpixel shape without considering the global homogeneity of the decomposition. Superpixel methods, e.g., Ref. felzenszwalb2004, rubio2016, or other segmentation algorithms such as the quadtree partitioning method [tanimoto1975] can produce regions of very variable sizes. In Figure 16, we show an example of decomposition with superpixels having approximately the same size [achanta2012], and a standard quadtree-based partition [tanimoto1975], which produces larger regions in areas with lower color variance. Since the circularity and SRC measures independently consider each superpixel and report an average evaluation of local regularity, the quadtree partition obtains the highest measure although its elements do not have similar sizes. Such local metrics are thus not sufficient to express the global regularity of a decomposition.

Figure 16: Limitation of the local shape regularity metrics. The shape consistency is not considered so the quadtree decomposition gives higher measures than the decomposition obtained with the SLIC method [achanta2012]. The decompositions are represented with their Delaunay graphs, connecting the barycenters of adjacent superpixels.

4.2 Shape Consistency

The regularity of the superpixel shapes evaluated with a local criterion is relevant information, but it does not reflect their consistency in terms of shape and size within the decomposition. In Section 4.1, we define the shape regularity properties for a superpixel: convexity, boundary smoothness and balanced pixel distribution. Nevertheless, a perfectly regular decomposition of an image should be composed of similar regular shapes at the same scale. In other words, a relevant criterion should evaluate both the local shape regularity and how consistent the superpixels are in terms of shape and size within the decomposition.

As stated in Section 3.3, the mismatch factor (13) uses the Jaccard index to compare the superpixels to a thresholded average shape. As illustrated in Figure 17, this measure can be incoherent with the visual consistency of a decomposition. In these examples, the binary average shape is the same, and corresponds to the red areas. The J measure is low and incoherently decreases as the consistency is visually improved.

In this work, we propose the smooth matching factor (SMF), which directly compares the superpixels to the average shape $S^*$:

$$SMF(\mathcal{S}) = 1 - \sum_{S_k \in \mathcal{S}} \frac{|S_k|}{|I|} \cdot \frac{1}{2} \left\| \frac{S^*}{|S^*|} - \frac{S_k^*}{|S_k^*|} \right\|_1. \qquad (19)$$

SMF compares the spatial distributions of the average superpixel shape $S^*$ to each registered superpixel shape $S_k^*$. The SMF criteria should be close to 1 if the distributions of pixels within the shapes are similar, and close to 0 otherwise. Overall, the proposed SMF metric evaluates the consistency in terms of shape and size within a decomposition. By considering the average shape without thresholding, the evaluation is more robust to shape outliers. This criteria addresses the non-robustness of the mismatch factor (J) [machairas2015], as can be seen in Figure 17, where SMF is relevant according to the consistency of the decomposition. Finally, examples of superpixel decomposition on natural images are given in Figure 18 and illustrate that SMF is not correlated to the J measure.
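A possible way to compute such a consistency measure is sketched below. It assumes that SMF is one minus the size-weighted half L1 distance between the normalized average shape and each normalized registered shape, and that registration is a translation of the barycenter to the origin rounded to the pixel grid. The rounding rule and the function names are assumptions of this example.

```python
import math
from collections import defaultdict

def register(pixels):
    """Translate a shape so that its barycenter is (approximately) at the origin."""
    cx = sum(x for x, _ in pixels) / len(pixels)
    cy = sum(y for _, y in pixels) / len(pixels)
    # floor(v + 0.5) keeps distinct pixels distinct after the shift.
    return [(math.floor(x - cx + 0.5), math.floor(y - cy + 0.5))
            for x, y in pixels]

def smf(superpixels):
    """Smooth matching factor of a decomposition given as lists of pixels."""
    registered = [register(s) for s in superpixels]
    image_size = sum(len(s) for s in superpixels)
    # Average shape: mean occupancy of each registered position (no threshold).
    average = defaultdict(float)
    for shape in registered:
        for p in shape:
            average[p] += 1.0 / len(registered)
    total = sum(average.values())
    penalty = 0.0
    for shape in registered:
        inside = set(shape)
        weight = 1.0 / len(shape)
        # L1 distance between the two normalized spatial distributions.
        l1 = sum(abs(average[p] / total - (weight if p in inside else 0.0))
                 for p in average)
        penalty += len(shape) / image_size * l1 / 2.0
    return 1.0 - penalty

# Four identical 2x2 blocks tiling a 4x4 image: perfectly consistent.
tiles = [[(x + ox, y + oy) for x in range(2) for y in range(2)]
         for ox in (0, 2) for oy in (0, 2)]
# One 2x2 block and one 1x4 bar: same size, different shapes.
mixed = [[(x, y) for x in range(2) for y in range(2)],
         [(x, 5) for x in range(4)]]

smf_tiles = smf(tiles)
smf_mixed = smf(mixed)
```

Since the average shape is not thresholded, every registered superpixel contributes to it in proportion to its occupancy, which is the property the text relies on for robustness to shape outliers.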

Figure 17: Illustration of several decomposition examples with the corresponding average binary shape in red, and J (13) and SMF (19) values.
Figure 18: Decomposition examples with corresponding J (13) and SMF (19) measures.

4.3 Global Regularity Measure

As previously stated, a perfectly regular superpixel decomposition should be composed of compact shapes that are consistent in terms of size and shape. The SRC metric locally evaluates the shape regularity, while the SMF measures the shape consistency. To evaluate both aspects, we propose to combine these metrics in the global regularity (GR) measure:

$$GR(\mathcal{S}) = SRC(\mathcal{S})\, SMF(\mathcal{S}). \qquad (20)$$
In the following Section 5, we report the evaluation on the considered metrics (EV, ASA, GR) for state-of-the-art superpixel methods. These measures are reported according to the number of generated superpixels, and also according to the regularity, since this setting substantially impacts performances.

5 Comparison of State-of-the-Art Superpixel Methods

5.1 Dataset

We compare the performances of state-of-the-art methods on the standard Berkeley segmentation dataset (BSD) [martin2001]. This dataset contains 200 various test images of size $321 \times 481$ pixels. For each image, human segmentations are provided and considered as ground truth to evaluate the respect of image objects. At least five decompositions are provided per image, and the results presented in the following sections are averaged over all ground truths. Note that other datasets can be considered, e.g., Ref. gould2009decomposing, yamaguchi2012parsing, but the BSD [martin2001] is the most widely used dataset for comparing superpixel methods. Moreover, Ref. stutz2016 shows that decomposition algorithms that perform well on the BSD usually perform well on other datasets.

5.2 Considered Superpixel Methods

Many frameworks have been proposed to decompose an image into superpixels using either graph-based [felzenszwalb2004, veksler2010, liu2011, buyssens2014], watershed [vincent91, neubert2014compact, machairas2015], coarse-to-fine [vandenbergh2012, yao2015] or gradient-ascent [vedaldi2008, levinshtein2009, achanta2012, li2015, giraud2017_scalp] approaches (see Ref. stutz2016 for a detailed review of existing methods). In order to illustrate the proposed evaluation framework, and to validate our regularity measure, we consider the following state-of-the-art superpixel methods: TP [levinshtein2009], ERS [liu2011], SLIC [achanta2012], SEEDS [vandenbergh2012], ERGC [buyssens2014], WP [machairas2015], LSC [li2015], ETPS [yao2015] and SCALP [giraud2017_scalp]. As stated in the introduction, only the most recent methods, SLIC [achanta2012], ERGC [buyssens2014], WP [machairas2015], LSC [li2015], ETPS [yao2015] and SCALP [giraud2017_scalp], allow a regularity parameter to be set. A decomposition example for each considered method is given in Figure 19.

| | TP [levinshtein2009] | ERS [liu2011] | SEEDS [vandenbergh2012] | SLIC [achanta2012] | ERGC [buyssens2014] | WP [machairas2015] | LSC [li2015] | ETPS [yao2015] | SCALP [giraud2017_scalp] |
|---|---|---|---|---|---|---|---|---|---|
| Type | ga | gb | cf | ga | gb | ga | ga | cf | ga |
Table 1: Comparison of state-of-the-art superpixel methods, sorted as either graph-based (gb), coarse-to-fine-based (cf) or gradient-ascent-based (ga) approaches. We report the ability of methods to set the number of superpixels, and the regularity of the generated superpixels along with their complexity according to the image size .
Figure 19: Decomposition example of each considered state-of-the-art superpixel methods for approximately superpixels.

5.3 Quantitative Results

Figure 20: Comparison of circularity (C) and proposed shape regularity criteria (SRC) on shapes of various pixel sizes .

In this section, we compare the methods considered in Section 5.2 on the recommended metrics for several superpixel scales and several regularity levels. In Figure 22, we perform the standard evaluation of performances, according to the number of generated superpixels. The behavior of each method regarding the different decomposition aspects, homogeneity of color clustering (EV), respect of image objects (ASA), and regularity (GR), is respectively evaluated in Figures 22(a), (b) and (c). Methods such as TP [levinshtein2009] and WP [machairas2015], which produce very regular superpixels, appear to perform poorly on the other metrics. Although they report high regularity, the recent methods SCALP [giraud2017_scalp] and ETPS [yao2015] perform well on color homogeneity, evaluated with EV. SCALP [giraud2017_scalp] even performs best on respect of image objects, evaluated with ASA. We summarize this evaluation in Table 2. A superpixel decomposition cannot obtain maximum values on every criterion at the same time, and depending on the desired decomposition aspect, one would give more weight to a particular criterion.

Finally, as demonstrated in Section 4, we show in Figure 21 that the proposed GR measure differs from the J one [machairas2015]. The proposed GR provides a much smoother evaluation of regularity according to the superpixel size, especially for the Turbopixels (TP) [levinshtein2009] and Waterpixels (WP) [machairas2015] methods. Moreover, the hierarchy between methods is modified with our robust evaluation metric. For instance, SEEDS [vandenbergh2012] gets similar results to ERS [liu2011], and SCALP [giraud2017_scalp] gets higher results than SLIC [achanta2012]. This shows that the two measures are not equivalent, and we demonstrate in Sections 4 and 5 and in the section on the impact of regularity the robustness and relevance of the GR metric, which should be considered for the regularity evaluation of superpixel methods.

Figure 21: Results of state-of-the-art methods on the mismatch factor [machairas2015].
Figure 22: Evaluation of state-of-the-art superpixel methods on EV, ASA and GR according to the number of superpixels, with the methods default regularity settings. The hierarchy between the performances of methods tends to be consistent on all metrics.
Table 2: Average EV, ASA and GR on several scales , with the methods default regularity settings.